
Journal articles on the topic 'Speech processing systems'



Consult the top 50 journal articles for your research on the topic 'Speech processing systems.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Ibragimova, Sayora. "THE ADVANTAGE OF THE WAVELET TRANSFORM IN PROCESSING OF SPEECH SIGNALS." Technical Sciences 4, no. 3 (March 30, 2021): 37–41. http://dx.doi.org/10.26739/2181-9696-2021-3-6.

Full text
Abstract:
This work deals with the basic theory of the wavelet transform and multi-scale analysis of speech signals, and briefly reviews the main differences between the wavelet transform and the Fourier transform in the analysis of speech signals. It discusses the possibilities of applying wavelet analysis to speech recognition systems and its main advantages. In most existing systems for speech recognition and analysis, sound is treated as a stream of vectors whose elements are frequency responses; real-time speech processing with sequential algorithms therefore requires high-performance computing resources. Examples are given of how the method can be used to process speech signals and to build reference patterns for recognition systems. Key words: digital signal processing, Fourier transform, wavelet analysis, speech signal, wavelet transform
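The time-localization advantage the abstract describes can be seen in a minimal sketch (illustrative only, not code from the cited paper): one level of a Haar discrete wavelet transform. Unlike a Fourier coefficient, each wavelet coefficient is tied to a time position, which is why wavelets localize transient speech events.

```python
def haar_dwt(signal):
    """One Haar decomposition level: returns (approximation, detail) coefficients."""
    s = 2 ** -0.5  # orthonormal Haar scaling factor
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

# A burst at sample 4 shows up only in the detail coefficient for that
# position; smooth regions produce zero detail coefficients.
a, d = haar_dwt([1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0])
```

Repeating the decomposition on the approximation coefficients yields the multi-scale analysis mentioned above.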
APA, Harvard, Vancouver, ISO, and other styles
2

Dasarathy, Belur V. "Robust speech processing." Information Fusion 5, no. 2 (June 2004): 75. http://dx.doi.org/10.1016/j.inffus.2004.02.002.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Thompson, Laura A., and William C. Ogden. "Visible speech improves human language understanding: Implications for speech processing systems." Artificial Intelligence Review 9, no. 4-5 (October 1995): 347–58. http://dx.doi.org/10.1007/bf00849044.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Scott, Sophie K., and Carolyn McGettigan. "The neural processing of masked speech." Hearing Research 303 (September 2013): 58–66. http://dx.doi.org/10.1016/j.heares.2013.05.001.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Tasbolatov, M., N. Mekebayev, O. Mamyrbayev, M. Turdalyuly, and D. Oralbekova. "Algorithms and architectures of speech recognition systems." Psychology and Education Journal 58, no. 2 (February 20, 2021): 6497–501. http://dx.doi.org/10.17762/pae.v58i2.3182.

Full text
Abstract:
Digital processing of the speech signal and the voice recognition algorithm are very important for fast and accurate automatic assessment of recognition technology. A voice signal carries a large amount of information, and the direct analysis and synthesis of a complex speech signal is difficult because that information is embedded in the signal itself. Speech is the most natural way for people to communicate. The task of speech recognition is to convert speech into a sequence of words by means of a computer program. This article presents an algorithm for extracting MFCC features for speech recognition. The MFCC algorithm reduces the required processing power by 53% compared to the conventional algorithm. Automatic speech recognition is implemented using Matlab.
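As a rough illustration of the MFCC feature extraction the abstract refers to, the mel-scale mapping that determines filterbank spacing can be sketched as follows (the 2595/700 constants are the standard HTK-style formula, assumed rather than taken from the article):

```python
import math

def hz_to_mel(f_hz):
    """Map frequency in Hz to the perceptual mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used to place triangular filter centre frequencies."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Centre frequencies for a small filterbank between 0 Hz and 8 kHz:
low, high, n_filters = hz_to_mel(0.0), hz_to_mel(8000.0), 10
centres = [mel_to_hz(low + i * (high - low) / (n_filters + 1))
           for i in range(1, n_filters + 1)]
# Filters are spaced linearly in mel, hence increasingly wide in Hz.
```

The full MFCC pipeline then takes the log energy in each filter and applies a discrete cosine transform to decorrelate the coefficients.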
APA, Harvard, Vancouver, ISO, and other styles
6

Delic, Vlado, Darko Pekar, Radovan Obradovic, and Milan Secujski. "Speech signal processing in ASR&TTS algorithms." Facta universitatis - series: Electronics and Energetics 16, no. 3 (2003): 355–64. http://dx.doi.org/10.2298/fuee0303355d.

Full text
Abstract:
Speech signal processing and modeling in systems for continuous speech recognition and Text-to-Speech synthesis in Serbian language are described in this paper. Both systems are fully developed by the authors and do not use any third party software. Accuracy of the speech recognizer and intelligibility of the TTS system are in the range of the best solutions in the world, and all conditions are met for commercial use of these solutions.
APA, Harvard, Vancouver, ISO, and other styles
7

Funakoshi, Kotaro, Takenobu Tokunaga, and Hozumi Tanaka. "Processing Japanese Self-correction in Speech Dialog Systems." Journal of Natural Language Processing 10, no. 4 (2003): 33–53. http://dx.doi.org/10.5715/jnlp.10.4_33.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Hills, A., and K. Scott. "Perceived degradation effects in packet speech systems." IEEE Transactions on Acoustics, Speech, and Signal Processing 35, no. 5 (May 1987): 699–701. http://dx.doi.org/10.1109/tassp.1987.1165187.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Gransier, Robin, and Jan Wouters. "Neural auditory processing of parameterized speech envelopes." Hearing Research 412 (December 2021): 108374. http://dx.doi.org/10.1016/j.heares.2021.108374.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Moon, Todd K., Jacob H. Gunther, Cortnie Broadus, Wendy Hou, and Nils Nelson. "Turbo Processing for Speech Recognition." IEEE Transactions on Cybernetics 44, no. 1 (January 2014): 83–91. http://dx.doi.org/10.1109/tcyb.2013.2247593.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Arnold, Tim, and Helen J. A. Fuller. "An Ergonomic Framework for Researching and Designing Speech Recognition Technologies in Health Care with an Emphasis on Safety." Proceedings of the International Symposium on Human Factors and Ergonomics in Health Care 8, no. 1 (September 2019): 279–83. http://dx.doi.org/10.1177/2327857919081067.

Full text
Abstract:
Automatic speech recognition (ASR) systems and speech interfaces are becoming increasingly prevalent, including expanded use of these technologies to support work in health care. Computer-based speech processing has been studied and developed extensively over decades, and speech processing tools have been fine-tuned through the work of speech and language researchers. Researchers have described, and continue to describe, speech processing errors in medicine. The discussion in this paper proposes an ergonomic framework for speech recognition that expands this view of speech processing in supporting clinical work. With this end in mind, we hope to build on previous work and emphasize the need for increased human factors involvement in this area, while also situating the discussion of speech recognition in contexts already explored in the human factors domain. Human factors expertise can contribute by proactively describing and designing these critical, interconnected socio-technical systems with error tolerance in mind.
APA, Harvard, Vancouver, ISO, and other styles
12

Kai, Atsuhiko, and Seiichi Nakagawa. "Comparison of continuous speech recognition systems with unknown-word processing for speech disfluencies." Systems and Computers in Japan 29, no. 9 (August 1998): 43–53. http://dx.doi.org/10.1002/(sici)1520-684x(199808)29:9<43::aid-scj5>3.0.co;2-j.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Cecinati, Riccardo. "Integrated processing unit, particularly for connected speech recognition systems." Journal of the Acoustical Society of America 92, no. 2 (August 1992): 1199–200. http://dx.doi.org/10.1121/1.403986.

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Marshall, Stephen. "Processing of audio and visual speech for telecommunication systems." Journal of Electronic Imaging 8, no. 3 (July 1, 1999): 263. http://dx.doi.org/10.1117/1.482675.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Polifroni, Joseph, Imre Kiss, and Stephanie Seneff. "Speech for Content Creation." International Journal of Mobile Human Computer Interaction 3, no. 2 (April 2011): 35–49. http://dx.doi.org/10.4018/jmhci.2011040103.

Full text
Abstract:
This paper proposes a paradigm for using speech to interact with computers, one that complements and extends traditional spoken dialogue systems: speech for content creation. The literature in automatic speech recognition (ASR), natural language processing (NLP), sentiment detection, and opinion mining is surveyed to argue that the time has come to use mobile devices to create content on-the-fly. Recent work in user modelling and recommender systems is examined to support the claim that using speech in this way can result in a useful interface to uniquely personalizable data. A data collection effort recently undertaken to help build a prototype system for spoken restaurant reviews is discussed. This vision critically depends on mobile technology, for enabling the creation of the content and for providing ancillary data to make its processing more relevant to individual users. This type of system can be of use where only limited speech processing is possible.
APA, Harvard, Vancouver, ISO, and other styles
16

Auti, Nisha, Atharva Pujari, Anagha Desai, Shreya Patil, Sanika Kshirsagar, and Rutika Rindhe. "Advanced Audio Signal Processing for Speaker Recognition and Sentiment Analysis." International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (May 31, 2023): 1717–24. http://dx.doi.org/10.22214/ijraset.2023.51825.

Full text
Abstract:
Automatic Speech Recognition (ASR) technology has revolutionized human-computer interaction by allowing users to communicate with computer interfaces using their voice in a natural way. Speaker recognition is a biometric recognition method that identifies individuals based on their unique speech signal, with potential applications in security, communication, and personalization. Sentiment analysis is a statistical method that analyzes unique acoustic properties of the speaker's voice to identify emotions or sentiments in speech. This allows automated speech recognition systems to accurately categorize speech as Positive, Neutral, or Negative. While sentiment analysis has been developed for various languages, further research is required for regional languages. This project aims to improve the accuracy of automatic speech recognition systems by implementing advanced audio signal processing and sentiment analysis detection. The proposed system will identify the speaker's voice and analyze the audio signal to detect the context of speech, including the identification of foul language and aggressive speech. The system will be developed for the Marathi Language dataset, with potential for further development in other languages.
APA, Harvard, Vancouver, ISO, and other styles
17

Järvinen, Kari. "Digital speech processing: Speech coding, synthesis, and recognition." Signal Processing 30, no. 1 (January 1993): 133–34. http://dx.doi.org/10.1016/0165-1684(93)90056-g.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Varga, A., and F. Fallside. "A technique for using multipulse linear predictive speech synthesis in text-to-speech type systems." IEEE Transactions on Acoustics, Speech, and Signal Processing 35, no. 4 (April 1987): 586–87. http://dx.doi.org/10.1109/tassp.1987.1165151.

Full text
APA, Harvard, Vancouver, ISO, and other styles
19

Moore, Thomas J., and Richard L. McKinley. "Research on Speech Processing for Military Avionics." Proceedings of the Human Factors Society Annual Meeting 30, no. 13 (September 1986): 1331–35. http://dx.doi.org/10.1177/154193128603001321.

Full text
Abstract:
The Biological Acoustics Branch of the Armstrong Aerospace Medical Research Laboratory (AAMRL) is engaged in research in a number of speech related areas. This paper will describe the approach used to conduct research in the development and evaluation of military speech communication systems, mention the types of studies done using this approach and give examples of the types of data generated by these studies. Representative data will also be provided describing acoustic-phonetic changes that occur when speech is produced under acceleration.
APA, Harvard, Vancouver, ISO, and other styles
20

Fadel, Wiam, Toumi Bouchentouf, Pierre-André Buvet, and Omar Bourja. "Adapting Off-the-Shelf Speech Recognition Systems for Novel Words." Information 14, no. 3 (March 13, 2023): 179. http://dx.doi.org/10.3390/info14030179.

Full text
Abstract:
Current speech recognition systems with fixed vocabularies have difficulties recognizing Out-of-Vocabulary words (OOVs) such as proper nouns and new words. This leads to misunderstandings or even failures in dialog systems. Ensuring effective speech recognition is crucial for the proper functioning of robot assistants. Non-native accents, new vocabulary, and aging voices can cause malfunctions in a speech recognition system. If this task is not executed correctly, the assistant robot will inevitably produce false or random responses. In this paper, we used a statistical approach based on distance algorithms to improve OOV correction. We developed a post-processing algorithm to be combined with a speech recognition model. In this sense, we compared two distance algorithms: Damerau–Levenshtein and Levenshtein distance. We validated the performance of the two distance algorithms in conjunction with five off-the-shelf speech recognition models. Damerau–Levenshtein, as compared to the Levenshtein distance algorithm, succeeded in minimizing the Word Error Rate (WER) when using the MoroccanFrench test set with five speech recognition systems, namely VOSK API, Google API, Wav2vec2.0, SpeechBrain, and Quartznet pre-trained models. Our post-processing method works regardless of the architecture of the speech recognizer, and its results on our MoroccanFrench test set outperformed the five chosen off-the-shelf speech recognizer systems.
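The two distances the paper compares can be sketched with their standard textbook implementations (these are generic versions, not the authors' code; the vocabulary below is hypothetical). Damerau–Levenshtein additionally counts an adjacent transposition as a single edit, which is why it can correct swapped-letter OOV hypotheses more cheaply.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein: also allows adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

# Post-processing idea: snap an OOV hypothesis to the closest in-vocabulary word.
vocab = ["Rabat", "Casablanca", "Agadir"]  # hypothetical lexicon
best = min(vocab, key=lambda w: damerau_levenshtein("Rabta", w))
```

For "Rabta" vs. "Rabat", plain Levenshtein needs two substitutions while Damerau–Levenshtein needs one transposition, illustrating the WER gap the paper reports.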
APA, Harvard, Vancouver, ISO, and other styles
21

Puder, Henning, and Gerhard Schmidt. "Applied speech and audio processing." Signal Processing 86, no. 6 (June 2006): 1121–23. http://dx.doi.org/10.1016/j.sigpro.2005.07.034.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Salman, Hayder Mahmood, Vian S. Al-Doori, Hayder Sharif, Wasfi Hameed, and Rusul S. Bader. "Accurate Recognition of Natural language Using Machine Learning and Feature Fusion Processing." Fusion: Practice and Applications 10, no. 1 (2023): 128–42. http://dx.doi.org/10.54216/fpa.100108.

Full text
Abstract:
To enhance the performance of Chinese language pronunciation evaluation and speech recognition systems, researchers are focusing on developing intelligent techniques for multilevel fusion processing of data, features, and decisions using deep learning-based computer-aided systems. With a combination of score-level, rank-level, and hybrid-level fusion, as well as fusion optimization and fusion score improvement, these systems can effectively combine multiple models and sensors to improve the accuracy of information fusion. Additionally, intelligent systems for information fusion, including those used in robotics and decision-making, can benefit from techniques such as multimedia data fusion and machine learning for data fusion. Furthermore, optimization algorithms and fuzzy approaches can be applied to data fusion applications in cloud environments and e-systems, while spatial data fusion can be used to enhance the quality of image and feature data. In this paper, a new approach is presented to identify tonal language in continuous speech. This study proposes the Machine learning-assisted automatic speech recognition framework (ML-ASRF) for Chinese character and language prediction. Our focus is on extracting highly robust features and combining various speech signal sequences of deep models. The experimental results demonstrated that the recognition rate of the machine learning neural network is considerably higher than that of the conventional speech recognition algorithm, enabling more accurate human-computer interaction and increasing the efficiency of determining Chinese language pronunciation accuracy.
APA, Harvard, Vancouver, ISO, and other styles
23

Ali Abumalloh, Rabab, Hasan Muaidi Al-Serhan, Othman Bin Ibrahim, and Waheeb Abu-Ulbeh. "Arabic Part-of-Speech Tagger, an Approach Based on Neural Network Modelling." International Journal of Engineering & Technology 7, no. 2.29 (May 22, 2018): 742. http://dx.doi.org/10.14419/ijet.v7i2.29.14009.

Full text
Abstract:
POS tagging has gained the interest of researchers in computational linguistics in recent years. Part-of-speech tagging systems automatically assign the proper grammatical tag or morpho-syntactic category label to every word in the corpus according to its appearance in the text. POS tagging serves as a fundamental, preliminary step in linguistic analysis that can help in developing many natural language processing applications, such as word processing systems, spell checking systems, dictionary building, and parsing systems. The Arabic language has gained the interest of researchers, which has led to increasing demand for Arabic natural language processing systems. Artificial neural networks have been applied in many applications, such as speech recognition and part-of-speech prediction, but they are a comparatively new approach in part-of-speech tagging. In this research, we developed an Arabic POS tagger using an artificial neural network. A corpus of 20,620 words, manually assigned the appropriate tags, was developed and used to train the artificial neural network and to test the tagger's overall performance. The accuracy of the developed tagger reaches 89.04% on the testing dataset and 98.94% on the training dataset; combining the two datasets, the accuracy rate for the whole system is 96.96%.
APA, Harvard, Vancouver, ISO, and other styles
24

Romeu, E. S., and V. I. Syryamkin. "Possibilities for applied joint speech processing and computer vision systems." IOP Conference Series: Materials Science and Engineering 516 (April 26, 2019): 012044. http://dx.doi.org/10.1088/1757-899x/516/1/012044.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Bonte, Milene, Anke Ley, Wolfgang Scharke, and Elia Formisano. "Developmental refinement of cortical systems for speech and voice processing." NeuroImage 128 (March 2016): 373–84. http://dx.doi.org/10.1016/j.neuroimage.2016.01.015.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Savchenko, L. V., and A. V. Savchenko. "Fuzzy Phonetic Encoding of Speech Signals in Voice Processing Systems." Journal of Communications Technology and Electronics 64, no. 3 (March 2019): 238–44. http://dx.doi.org/10.1134/s1064226919030173.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Chen, Tsuhan. "Video signal processing systems and methods utilizing automated speech analysis." Journal of the Acoustical Society of America 112, no. 2 (2002): 368. http://dx.doi.org/10.1121/1.1507005.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Weinstein, C. J. "Opportunities for advanced speech processing in military computer-based systems." Proceedings of the IEEE 79, no. 11 (1991): 1626–41. http://dx.doi.org/10.1109/5.118986.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

de Abreu, Caio Cesar Enside, Marco Aparecido Queiroz Duarte, Bruno Rodrigues de Oliveira, Jozue Vieira Filho, and Francisco Villarreal. "Regression-Based Noise Modeling for Speech Signal Processing." Fluctuation and Noise Letters 20, no. 03 (January 30, 2021): 2150022. http://dx.doi.org/10.1142/s021947752150022x.

Full text
Abstract:
Speech processing systems are very important in different applications involving speech and voice quality, such as automatic speech recognition, forensic phonetics, and speech enhancement, among others. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, as a consequence, the speech quality. Therefore, estimating noise is one of the most important steps in speech processing, whether to reduce it before processing or to design robust algorithms. In this paper, a new approach to estimate noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE), and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to use the AE model to identify the noise type and estimate its power, which is then used in a speech processing system. Speech signals processed using the proposed method and classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling it to be used in real-time applications in the fields of robotics, telecommunications, and acoustic analysis.
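The Wiener filter used in the evaluation stage can be sketched in its standard form (a generic textbook formulation, not the paper's PLS-based model): given a noise power estimate per frequency bin, bins with low a priori SNR are attenuated.

```python
def wiener_gain(snr_prior):
    """Wiener filter gain G = xi / (1 + xi) for a priori SNR xi (linear scale)."""
    return snr_prior / (1.0 + snr_prior)

def enhance(power_spectrum, noise_power):
    """Apply per-bin Wiener gains given an estimated noise power spectrum."""
    enhanced = []
    for p, n in zip(power_spectrum, noise_power):
        snr_prior = max(p / n - 1.0, 0.0)  # crude a priori SNR estimate from the posterior SNR
        enhanced.append(wiener_gain(snr_prior) * p)
    return enhanced
```

The quality of the result hinges on the noise power estimate, which is exactly the quantity the paper's AE model is designed to supply.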
APA, Harvard, Vancouver, ISO, and other styles
30

Ungureanu, Dan, Stefan-Adrian Toma, Ion-Dorinel Filip, Bogdan-Costel Mocanu, Iulian Aciobăniței, Bogdan Marghescu, Titus Balan, Mihai Dascalu, Ion Bica, and Florin Pop. "ODIN112–AI-Assisted Emergency Services in Romania." Applied Sciences 13, no. 1 (January 3, 2023): 639. http://dx.doi.org/10.3390/app13010639.

Full text
Abstract:
The evolution of Natural Language Processing technologies has transformed them into viable choices for various accessibility features and for facilitating interactions between humans and computers. A subset of them consists of speech processing systems, such as Automatic Speech Recognition, which have become more accurate and more popular as a result. In this article, we introduce an architecture built around various speech processing systems to enhance Romanian emergency services. Our system is designed to help the operator evaluate various situations, with the end goal of reducing the response times of emergency services. We also release the largest high-quality speech dataset for Romanian, comprising more than 150 hours. Our architecture includes an Automatic Speech Recognition model to transcribe calls automatically and augment the operator's notes, as well as a speech emotion recognition model to classify the caller's emotions. We achieve state-of-the-art results on both tasks, and our demonstrator is designed to be integrated with the Romanian emergency system.
APA, Harvard, Vancouver, ISO, and other styles
31

Jamieson, Donald G., Vijay Parsa, Moneca C. Price, and James Till. "Interaction of Speech Coders and Atypical Speech II." Journal of Speech, Language, and Hearing Research 45, no. 4 (August 2002): 689–99. http://dx.doi.org/10.1044/1092-4388(2002/055).

Full text
Abstract:
We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPELTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated for their performance. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.
APA, Harvard, Vancouver, ISO, and other styles
32

Hu, J., C. C. Cheng, and W. H. Liu. "Processing of speech signals using a microphone array for intelligent robots." Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 219, no. 2 (March 1, 2005): 133–43. http://dx.doi.org/10.1243/095965105x9461.

Full text
Abstract:
For intelligent robots to interact with people, an efficient human-robot communication interface, such as voice command, is very important. However, recognizing voice commands or speech represents only part of speech communication; the physics of speech signals carries other information, such as speaker direction. Moreover, a basic element of processing the speech signal is recognition at the acoustic level, and recognition performance depends greatly on the quality of reception: in a noisy environment, the success rate can be very poor. As a result, prior to speech recognition, it is important to process the speech signals to extract the needed content while rejecting the rest (such as background noise). This paper presents a speech purification system for robots that improves the signal-to-noise ratio of reception, together with a multidirection calibration beamformer algorithm.
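The principle behind such a microphone-array beamformer can be sketched with a minimal delay-and-sum example (the generic technique, not the authors' calibration algorithm): steering delays that realign the wavefront make the speech add coherently across microphones while uncorrelated noise partially cancels.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer-sample steering delay and average."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
            for t in range(n)]

# Two microphones receiving the same source, the second two samples later:
sig = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
mic1 = sig
mic2 = [0.0, 0.0] + sig
aligned = delay_and_sum([mic1, mic2], [0, 2])  # recovers the source waveform
```

In practice the delays come from the estimated speaker direction, which is the extra information the abstract notes is carried by the physics of the speech signal.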
APA, Harvard, Vancouver, ISO, and other styles
33

Wu, Yixuan. "Application of deep learning-based speech signal processing technology in electronic communication." Applied and Computational Engineering 77, no. 1 (July 16, 2024): 106–11. http://dx.doi.org/10.54254/2755-2721/77/20240661.

Full text
Abstract:
In recent years, the artificial intelligence boom triggered by deep learning has been influencing and changing people's lifestyles. People are no longer satisfied with human-computer interaction through simple text commands; instead, they look forward to more convenient and faster communication methods such as voice interaction. Against this backdrop of innovative development, the application of speech signal processing systems is becoming increasingly widespread. It is therefore necessary to study the application of deep learning-based speech signal processing technology in electronic communication, which can provide valuable references and assistance for future development and promote the better development of this technology. In this paper, we first review the application of deep learning in speech signal enhancement, speech recognition, and speech synthesis from a theoretical analysis perspective. Then, we discuss the application of deep learning-based speech signal processing in electronic communication, including the application of models such as Transformer, LAS (Listen, Attend and Spell), and GFT-conformer, as well as some application scenarios in electronic communication. Finally, we identify the need for deeper application of deep learning technology in speech signal processing and electronic communication, with continuous optimization and adjustment.
APA, Harvard, Vancouver, ISO, and other styles
34

Smither, Janan Al-Awar. "The Processing of Synthetic Speech by Older and Younger Adults." Proceedings of the Human Factors Society Annual Meeting 36, no. 2 (October 1992): 190–92. http://dx.doi.org/10.1177/154193129203600211.

Full text
Abstract:
This experiment investigated the demands synthetic speech places on short term memory by comparing performance of old and young adults on an ordinary short term memory task. Items presented were generated by a human speaker or by a text-to-speech computer synthesizer. Results were consistent with the idea that the comprehension of synthetic speech imposes increased resource demands on the short term memory system. Older subjects performed significantly more poorly than younger subjects, and both groups performed more poorly with synthetic than with human speech. Findings suggest that short term memory demands imposed by the processing of synthetic speech should be investigated further, particularly regarding the implementation of voice response systems in devices for the elderly.
APA, Harvard, Vancouver, ISO, and other styles
35

Chien, Jen-Tzung, and Man-Wai Mak. "Guest Editorial: Modern Speech Processing and Learning." Journal of Signal Processing Systems 92, no. 8 (July 9, 2020): 775–76. http://dx.doi.org/10.1007/s11265-020-01577-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Islam, Rumana, Esam Abdel-Raheem, and Mohammed Tarique. "A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems." Applied Sciences 12, no. 5 (February 25, 2022): 2398. http://dx.doi.org/10.3390/app12052398.

Full text
Abstract:
This paper presents a pathological voice identification system employing signal processing techniques through cochlear implant models. The fundamentals of the biological process for speech perception are investigated to develop this technique. Two cochlear implant models are considered in this work: one uses a conventional bank of bandpass filters, and the other one uses a bank of optimized gammatone filters. The critical center frequencies of those filters are selected to mimic the human cochlear vibration patterns caused by audio signals. The proposed system processes the speech samples and applies a CNN for final pathological voice identification. The results show that the two proposed models adopting bandpass and gammatone filterbanks can discriminate the pathological voices from healthy ones, resulting in F1 scores of 77.6% and 78.7%, respectively, with speech samples. The obtained results of this work are also compared with those of other related published works.
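The gammatone filterbank the abstract mentions is built from impulse responses of the standard form g(t) = t^(n-1) e^(-2πbt) cos(2πf_c t); a brief sketch follows (the ERB-based bandwidth constants are the common Glasberg–Moore values, assumed rather than taken from the paper):

```python
import math

def gammatone_ir(f_c, fs=16000, order=4, duration=0.025):
    """Impulse response of a gammatone filter centred at f_c Hz."""
    b = 1.019 * 24.7 * (4.37 * f_c / 1000.0 + 1.0)  # ERB-scaled bandwidth
    out = []
    for k in range(int(duration * fs)):
        t = k / fs
        out.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                   * math.cos(2 * math.pi * f_c * t))
    return out

ir = gammatone_ir(1000.0)  # 25 ms response of a 1 kHz channel
```

Convolving the speech signal with a bank of such responses, with centre frequencies spaced to mimic cochlear vibration patterns, yields the channel outputs fed to the CNN classifier.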
APA, Harvard, Vancouver, ISO, and other styles
37

Kazi, Sara. "SPEECH RECOGNITION SYSTEM." International Journal of Scientific Research in Engineering and Management 08, no. 03 (March 22, 2024): 1–5. http://dx.doi.org/10.55041/ijsrem29567.

Full text
Abstract:
Speech recognition technology has witnessed remarkable progress in recent years, fueled by advancements in machine learning, deep neural networks, and signal processing techniques. This paper presents a comprehensive review of the current state-of-the-art in speech recognition systems, highlighting key methodologies and breakthroughs that have contributed to their improved performance. The paper explores various aspects, including acoustic modeling, language modeling, and the integration of contextual information, shedding light on the challenges faced and innovative solutions proposed in the field. Furthermore, the paper discusses the impact of large-scale datasets and transfer learning on the robustness and adaptability of speech recognition models. It delves into recent developments in end-to-end models and their potential to simplify the architecture while enhancing accuracy. The integration of real-time and edge computing for speech recognition applications is also explored, emphasizing the implications for practical implementations in diverse domains such as healthcare, telecommunications, and smart devices. In addition to reviewing the current landscape, the paper provides insights into future prospects and emerging trends in speech recognition research. The role of multimodal approaches, incorporating visual and contextual cues, is discussed as a potential avenue for further improvement. Ethical considerations related to privacy and bias in speech recognition systems are also addressed, emphasizing the importance of responsible development and deployment. By synthesizing current research findings and anticipating future directions, this paper contributes to the evolving discourse on speech recognition technologies, providing a valuable resource for researchers, practitioners, and industry professionals in the field. 
Key words: real-time processing, machine learning, deep neural networks, technology advancements, contextual information, large-scale datasets, transfer learning, end-to-end models, edge computing, multimodal approaches, ethical considerations, privacy, bias, future prospects, research review.
APA, Harvard, Vancouver, ISO, and other styles
38

Yu, Sabrina, Sherryse Corrow, Jason JS Barton, and Andrea Albonico. "Facial Identity And Facial Speech Processing In Developmental Prosopagnosia." Journal of Vision 22, no. 14 (December 5, 2022): 3422. http://dx.doi.org/10.1167/jov.22.14.3422.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Kosarev, Y. "Synergetics and 'insight' strategy for speech processing." Literary and Linguistic Computing 12, no. 2 (June 1, 1997): 113–18. http://dx.doi.org/10.1093/llc/12.2.113.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Finke, Mareike, Pascale Sandmann, Hanna Bönitz, Andrej Kral, and Andreas Büchner. "Consequences of Stimulus Type on Higher-Order Processing in Single-Sided Deaf Cochlear Implant Users." Audiology and Neurotology 21, no. 5 (2016): 305–15. http://dx.doi.org/10.1159/000452123.

Full text
Abstract:
Single-sided deaf subjects with a cochlear implant (CI) provide the unique opportunity to compare central auditory processing of the electrical input (CI ear) and the acoustic input (normal-hearing, NH, ear) within the same individual. In these individuals, sensory processing differs between their two ears, while cognitive abilities are the same irrespectively of the sensory input. To better understand perceptual-cognitive factors modulating speech intelligibility with a CI, this electroencephalography study examined the central-auditory processing of words, the cognitive abilities, and the speech intelligibility in 10 postlingually single-sided deaf CI users. We found lower hit rates and prolonged response times for word classification during an oddball task for the CI ear when compared with the NH ear. Also, event-related potentials reflecting sensory (N1) and higher-order processing (N2/N4) were prolonged for word classification (targets versus nontargets) with the CI ear compared with the NH ear. Our results suggest that speech processing via the CI ear and the NH ear differs both at sensory (N1) and cognitive (N2/N4) processing stages, thereby affecting the behavioral performance for speech discrimination. These results provide objective evidence for cognition to be a key factor for speech perception under adverse listening conditions, such as the degraded speech signal provided from the CI.
APA, Harvard, Vancouver, ISO, and other styles
41

Jamal, Marwa, and Tariq A. Hassan. "Speech Coding Using Discrete Cosine Transform and Chaotic Map." Ingénierie des systèmes d information 27, no. 4 (August 31, 2022): 673–77. http://dx.doi.org/10.18280/isi.270419.

Full text
Abstract:
Multimedia data has recently grown at an exponential rate, saturating everyday life. Various data modalities, including images, text, and video, play important roles in many domains and have wide application. However, the key problem in utilizing large-scale data is the cost of processing and the massive storage required. Efficient communication and economical storage therefore call for effective data compression techniques that reduce the volume of data. Speech coding, the process of converting voice signals into a more compressed form, is a central problem in digital speech processing. In this work, we demonstrate that a DCT combined with a chaotic system and run-length coding can implement speech coding at a very low bit rate with high reconstruction quality. Experimental results show a compression ratio of about 13% on the LibriSpeech dataset.
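The transform-and-encode pipeline named in this abstract can be illustrated with a minimal sketch: take the DCT of a speech frame, coarsely quantize the coefficients, and run-length encode the resulting zero runs. This is not the paper's exact codec (the chaotic-map stage is omitted, and the frame length and quantization step are hypothetical choices); it only shows why DCT plus run-length coding compresses well when most quantized coefficients are zero.

```python
# Sketch: compress one speech frame with a DCT, coarse quantization,
# and run-length coding of the (mostly zero) coefficients.
import math

def dct_ii(x):
    """Naive DCT-II of a real sequence."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N)) for k in range(N)]

def run_length_encode(symbols):
    """Encode a symbol list as (value, run_length) pairs."""
    out = []
    for s in symbols:
        if out and out[-1][0] == s:
            out[-1] = (s, out[-1][1] + 1)
        else:
            out.append((s, 1))
    return out

def compress_frame(frame, step=2.0):
    """Quantize DCT coefficients; small ones collapse into zero runs."""
    coeffs = dct_ii(frame)
    quantized = [round(c / step) for c in coeffs]
    return run_length_encode(quantized)
```

For a smooth frame, the energy concentrates in the first few DCT coefficients, so the run-length output is far shorter than the frame itself, which is the source of the compression gain.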
APA, Harvard, Vancouver, ISO, and other styles
42

Resende, Natália, and Andy Way. "Can Google Translate Rewire Your L2 English Processing?" Digital 1, no. 1 (March 4, 2021): 66–85. http://dx.doi.org/10.3390/digital1010006.

Full text
Abstract:
In this article, we address the question of whether exposure to the translated output of MT systems could result in changes in the cognitive processing of English as a second language (L2 English). To answer this question, we first conducted a survey with 90 Brazilian Portuguese L2 English speakers with the aim of understanding how and for what purposes they use web-based MT systems. To investigate whether MT systems are capable of influencing L2 English cognitive processing, we carried out a syntactic priming experiment with 32 Brazilian Portuguese speakers. We wanted to test whether speakers re-use in their subsequent speech in English the same syntactic alternative previously seen in the MT output, when using the popular Google Translate system to translate sentences from Portuguese into English. The results of the survey show that Brazilian Portuguese L2 English speakers use Google Translate as a tool supporting their speech in English as well as a source of English vocabulary learning. The results of the syntactic priming experiment show that exposure to an English syntactic alternative through GT can lead to the re-use of the same syntactic alternative in subsequent speech even if it is not the speaker’s preferred syntactic alternative in English. These findings suggest that GT is being used as a tool for language learning purposes and so is indeed capable of rewiring the processing of L2 English syntax.
APA, Harvard, Vancouver, ISO, and other styles
43

Yelle, Serena K., and Gina M. Grimshaw. "Hemispheric Specialization for Linguistic Processing of Sung Speech." Perceptual and Motor Skills 108, no. 1 (February 2009): 219–28. http://dx.doi.org/10.2466/pms.108.1.219-228.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Ito, Takayuki, Alexis R. Johns, and David J. Ostry. "Left Lateralized Enhancement of Orofacial Somatosensory Processing Due to Speech Sounds." Journal of Speech, Language, and Hearing Research 56, no. 6 (December 2013): 1875–81. http://dx.doi.org/10.1044/1092-4388(2013/12-0226).

Full text
Abstract:
Purpose Somatosensory information associated with speech articulatory movements affects the perception of speech sounds and vice versa, suggesting an intimate linkage between speech production and perception systems. However, it is unclear which cortical processes are involved in the interaction between speech sounds and orofacial somatosensory inputs. The authors examined whether speech sounds modify orofacial somatosensory cortical potentials that were elicited using facial skin perturbations. Method Somatosensory event-related potentials in EEG were recorded in 3 background sound conditions (pink noise, speech sounds, and nonspeech sounds) and also in a silent condition. Facial skin deformations that are similar in timing and duration to those experienced in speech production were used for somatosensory stimulation. Results The authors found that speech sounds reliably enhanced the first negative peak of the somatosensory event-related potential when compared with the other 3 sound conditions. The enhancement was evident at electrode locations above the left motor and premotor area of the orofacial system. The result indicates that speech sounds interact with somatosensory cortical processes that are produced by speech-production-like patterns of facial skin stretch. Conclusion Neural circuits in the left hemisphere, presumably in left motor and premotor cortex, may play a prominent role in the interaction between auditory inputs and speech-relevant somatosensory processing.
APA, Harvard, Vancouver, ISO, and other styles
45

Abdusalomov, Akmalbek Bobomirzaevich, Furkat Safarov, Mekhriddin Rakhimov, Boburkhon Turaev, and Taeg Keun Whangbo. "Improved Feature Parameter Extraction from Speech Signals Using Machine Learning Algorithm." Sensors 22, no. 21 (October 24, 2022): 8122. http://dx.doi.org/10.3390/s22218122.

Full text
Abstract:
Speech recognition refers to the capability of software or hardware to receive a speech signal, identify the speaker’s features in the speech signal, and recognize the speaker thereafter. In general, the speech recognition process involves three main steps: acoustic processing, feature extraction, and classification/recognition. The purpose of feature extraction is to illustrate a speech signal using a predetermined number of signal components. This is because all information in the acoustic signal is excessively cumbersome to handle, and some information is irrelevant in the identification task. This study proposes a machine learning-based approach that performs feature parameter extraction from speech signals to improve the performance of speech recognition applications in real-time smart city environments. Moreover, the principle of mapping a block of main memory to the cache is used efficiently to reduce computing time. The block size of cache memory is a parameter that strongly affects the cache performance. In particular, the implementation of such processes in real-time systems requires a high computation speed. Processing speed plays an important role in speech recognition in real-time systems. It requires the use of modern technologies and fast algorithms that increase the acceleration in extracting the feature parameters from speech signals. Problems with overclocking during the digital processing of speech signals have yet to be completely resolved. The experimental results demonstrate that the proposed method successfully extracts the signal features and achieves seamless classification performance compared to other conventional speech recognition algorithms.
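The three-step pipeline this abstract describes (acoustic processing, feature extraction, classification) typically begins with pre-emphasis, overlapping frames, and a per-frame feature. The sketch below shows that generic front end with log energy as a stand-in feature; the parameters are hypothetical and the paper's cache-aware optimizations are not reproduced.

```python
# Generic speech front-end sketch: pre-emphasis, overlapping framing,
# and a simple per-frame feature (log energy).
import math

def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[i] - alpha * signal[i - 1]
                          for i in range(1, len(signal))]

def frame_signal(signal, frame_len=25, hop=10):
    """Split into overlapping frames (lengths given in samples here)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def log_energy(frame, floor=1e-10):
    """Log energy, floored to avoid log(0) on silent frames."""
    return math.log(max(sum(s * s for s in frame), floor))

def extract_features(signal):
    """One feature value per frame; real systems use MFCCs or similar."""
    return [log_energy(f) for f in frame_signal(pre_emphasis(signal))]
```

Because each frame is processed independently over a small contiguous slice, this layout is also the natural unit for the cache-blocking ideas the abstract mentions.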
APA, Harvard, Vancouver, ISO, and other styles
46

Wingfield, Arthur, and Kimberly C. Lindfield. "Multiple Memory Systems in the Processing of Speech: Evidence from Aging." Experimental Aging Research 21, no. 2 (April 1995): 101–21. http://dx.doi.org/10.1080/03610739508254272.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Murthy, Hema A., and B. Yegnanarayana. "Speech processing using group delay functions." Signal Processing 22, no. 3 (March 1991): 259–67. http://dx.doi.org/10.1016/0165-1684(91)90014-a.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Ghezaiel, Wajdi, Amel Ben Slimane, and Ezzedine Ben Braiek. "On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 6 (December 1, 2016): 2766. http://dx.doi.org/10.11591/ijece.v6i6.9844.

Full text
Abstract:
<p>Usable speech is a novel concept for processing co-channel speech data: the aim is to extract minimally corrupted speech that is useful for various speech processing systems. In this paper, we are interested in co-channel speaker identification (SID). We employ a newly proposed usable-speech extraction method based on pitch information obtained from a linear multi-scale decomposition by the discrete wavelet transform. The idea is to retain the speech segments in which only one pitch is detected and remove the others. The detected usable speech was used as input to the speaker identification system. The system is evaluated on co-channel speech, and the results show a significant improvement for the speaker identification system across various Target-to-Interferer Ratios (TIR).</p>
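The keep-or-discard segment selection described in this abstract can be sketched with a toy detector. Instead of the paper's wavelet-based single-pitch test, this stand-in scores each segment by its normalized autocorrelation peak and keeps only strongly periodic segments; the segment length and threshold are hypothetical.

```python
# Toy usable-speech sketch: keep segments whose periodicity (peak
# normalized autocorrelation over candidate pitch lags) is strong.
import math

def periodicity(segment, min_lag=2):
    """Peak normalized autocorrelation over candidate pitch lags."""
    n = len(segment)
    energy = sum(s * s for s in segment)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, n // 2):
        r = sum(segment[i] * segment[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best

def usable_segments(signal, seg_len=64, threshold=0.5):
    """Return start indices of segments deemed usable."""
    kept = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        if periodicity(signal[start:start + seg_len]) >= threshold:
            kept.append(start)
    return kept
```

A clean voiced segment scores high (one speaker's pitch dominates), while silence or heavily overlapped speech scores low and is discarded before speaker identification.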
APA, Harvard, Vancouver, ISO, and other styles
49

Ghezaiel, Wajdi, Amel Ben Slimane, and Ezzedine Ben Braiek. "On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification." International Journal of Electrical and Computer Engineering (IJECE) 6, no. 6 (December 1, 2016): 2766. http://dx.doi.org/10.11591/ijece.v6i6.pp2766-2772.

Full text
Abstract:
<p>Usable speech is a novel concept for processing co-channel speech data: the aim is to extract minimally corrupted speech that is useful for various speech processing systems. In this paper, we are interested in co-channel speaker identification (SID). We employ a newly proposed usable-speech extraction method based on pitch information obtained from a linear multi-scale decomposition by the discrete wavelet transform. The idea is to retain the speech segments in which only one pitch is detected and remove the others. The detected usable speech was used as input to the speaker identification system. The system is evaluated on co-channel speech, and the results show a significant improvement for the speaker identification system across various Target-to-Interferer Ratios (TIR).</p>
APA, Harvard, Vancouver, ISO, and other styles
50

Stork, David G. "SOURCES OF NEURAL STRUCTURE IN SPEECH AND LANGUAGE PROCESSING." International Journal of Neural Systems 02, no. 03 (January 1991): 159–67. http://dx.doi.org/10.1142/s0129065791000157.

Full text
Abstract:
Because of the complexity and high dimensionality of the problem, speech recognition—perhaps more than any other problem of current interest in network research—will profit from human neurophysiology, psychoacoustics and psycholinguistics: approaches based exclusively on engineering principles will provide only limited benefits. Despite the great power of current learning algorithms in homogeneous or unstructured networks, a number of difficulties in speech recognition seem to indicate that homogeneous networks taken alone will be insufficient for the task, and that structure—representing constraints—will also be required. In the biological system, the sources of such structure include developmental and evolutionary effects. Recent considerations of the evolutionary sources of neural structure in the human speech and language systems, including models of the interrelationship between speech motor system and auditory system, are analyzed with special reference to neural network approaches.
APA, Harvard, Vancouver, ISO, and other styles