
Journal articles on the topic "Speech processing systems"


Below are the top 50 journal articles for research on the topic "Speech processing systems".


1

Ibragimova, Sayora. "THE ADVANTAGE OF THE WAVELET TRANSFORM IN PROCESSING OF SPEECH SIGNALS". Technical Sciences 4, no. 3 (March 30, 2021): 37–41. http://dx.doi.org/10.26739/2181-9696-2021-3-6.

Abstract:
This work deals with the basic theory of the wavelet transform and multi-scale analysis of speech signals, and briefly reviews the main differences between the wavelet transform and the Fourier transform in the analysis of speech signals. It discusses the possibilities of applying wavelet analysis to speech recognition systems and its main advantages. In most existing systems for speech recognition and analysis, sound is treated as a stream of vectors whose elements are frequency characteristics. Therefore, real-time speech processing with sequential algorithms requires high-performance computing resources. Examples show how this method can be used to process speech signals and to build reference patterns for recognition systems. Key words: digital signal processing, Fourier transform, wavelet analysis, speech signal, wavelet transform
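As an aside to this entry, the multi-scale decomposition the abstract refers to can be illustrated with a single level of the Haar transform, the simplest wavelet; this sketch is purely illustrative and is not taken from the paper:

```python
import math

def haar_step(signal):
    """One level of the (normalized) Haar wavelet decomposition:
    split an even-length signal into approximation (low-pass) and
    detail (high-pass) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail
```

Applying `haar_step` recursively to the approximation coefficients yields the multi-scale analysis mentioned in the abstract; for speech, the detail coefficients localize transients that a global Fourier transform smears over the whole analysis window.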
2

Dasarathy, Belur V. "Robust speech processing". Information Fusion 5, no. 2 (June 2004): 75. http://dx.doi.org/10.1016/j.inffus.2004.02.002.
3

Thompson, Laura A., and William C. Ogden. "Visible speech improves human language understanding: Implications for speech processing systems". Artificial Intelligence Review 9, no. 4-5 (October 1995): 347–58. http://dx.doi.org/10.1007/bf00849044.

4

Scott, Sophie K., and Carolyn McGettigan. "The neural processing of masked speech". Hearing Research 303 (September 2013): 58–66. http://dx.doi.org/10.1016/j.heares.2013.05.001.

5

Tasbolatov, M., N. Mekebayev, O. Mamyrbayev, M. Turdalyuly, and D. Oralbekova. "Algorithms and architectures of speech recognition systems". Psychology and Education Journal 58, no. 2 (February 20, 2021): 6497–501. http://dx.doi.org/10.17762/pae.v58i2.3182.

Abstract:
Digital processing of the speech signal and the voice recognition algorithm are very important for fast and accurate automatic scoring in recognition technology. A voice is a signal of infinite information. The direct analysis and synthesis of a complex speech signal is difficult because the information is contained in the signal. Speech is the most natural way for people to communicate. The task of speech recognition is to convert speech into a sequence of words using a computer program. This article presents an algorithm for extracting MFCCs for speech recognition; the MFCC algorithm reduces the required processing power by 53% compared to the conventional algorithm. Automatic speech recognition is implemented in Matlab.
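As context for the MFCC algorithm discussed in this abstract: MFCC extraction rests on the mel frequency scale. A minimal sketch of the widely used HTK-style mapping (the 2595·log10(1 + f/700) formula; the paper does not specify which variant it uses):

```python
import math

def hz_to_mel(f_hz):
    # the mel scale compresses high frequencies, mimicking pitch perception
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # inverse mapping, used to place mel filterbank edges back in Hz
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Filterbank center frequencies spaced evenly in mel, then mapped back to Hz with `mel_to_hz`, give the triangular filters whose log energies are cosine-transformed into MFCCs.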
6

Delic, Vlado, Darko Pekar, Radovan Obradovic, and Milan Secujski. "Speech signal processing in ASR&TTS algorithms". Facta universitatis - series: Electronics and Energetics 16, no. 3 (2003): 355–64. http://dx.doi.org/10.2298/fuee0303355d.

Abstract:
Speech signal processing and modeling in systems for continuous speech recognition and Text-to-Speech synthesis in Serbian language are described in this paper. Both systems are fully developed by the authors and do not use any third party software. Accuracy of the speech recognizer and intelligibility of the TTS system are in the range of the best solutions in the world, and all conditions are met for commercial use of these solutions.
7

FUNAKOSHI, KOTARO, TAKENOBU TOKUNAGA, and HOZUMI TANAKA. "Processing Japanese Self-correction in Speech Dialog Systems." Journal of Natural Language Processing 10, no. 4 (2003): 33–53. http://dx.doi.org/10.5715/jnlp.10.4_33.

8

Hills, A., and K. Scott. "Perceived degradation effects in packet speech systems". IEEE Transactions on Acoustics, Speech, and Signal Processing 35, no. 5 (May 1987): 699–701. http://dx.doi.org/10.1109/tassp.1987.1165187.

9

Gransier, Robin, and Jan Wouters. "Neural auditory processing of parameterized speech envelopes". Hearing Research 412 (December 2021): 108374. http://dx.doi.org/10.1016/j.heares.2021.108374.

10

Moon, Todd K., Jacob H. Gunther, Cortnie Broadus, Wendy Hou, and Nils Nelson. "Turbo Processing for Speech Recognition". IEEE Transactions on Cybernetics 44, no. 1 (January 2014): 83–91. http://dx.doi.org/10.1109/tcyb.2013.2247593.

11

Arnold, Tim, and Helen J. A. Fuller. "An Ergonomic Framework for Researching and Designing Speech Recognition Technologies in Health Care with an Emphasis on Safety". Proceedings of the International Symposium on Human Factors and Ergonomics in Health Care 8, no. 1 (September 2019): 279–83. http://dx.doi.org/10.1177/2327857919081067.

Abstract:
Automatic speech recognition (ASR) systems and speech interfaces are becoming increasingly prevalent, including an expanding use of these technologies to support work in health care. Computer-based speech processing has been extensively studied and developed over decades, and speech processing tools have been fine-tuned through the work of speech and language researchers. Researchers have described, and continue to describe, speech processing errors in medicine. The discussion provided in this paper proposes an ergonomic framework for speech recognition to expand and further describe this view of speech processing in supporting clinical work. With this end in mind, we hope to build on previous work and emphasize the need for increased human factors involvement in this area while also facilitating the discussion of speech recognition in contexts that have been explored in the human factors domain. Human factors expertise can contribute through proactively describing and designing these critical interconnected socio-technical systems with error-tolerance in mind.
12

Kai, Atsuhiko, and Seiichi Nakagawa. "Comparison of continuous speech recognition systems with unknown-word processing for speech disfluencies". Systems and Computers in Japan 29, no. 9 (August 1998): 43–53. http://dx.doi.org/10.1002/(sici)1520-684x(199808)29:9<43::aid-scj5>3.0.co;2-j.

13

Cecinati, Riccardo. "Integrated processing unit, particularly for connected speech recognition systems". Journal of the Acoustical Society of America 92, no. 2 (August 1992): 1199–200. http://dx.doi.org/10.1121/1.403986.

14

Marshall, Stephen. "Processing of audio and visual speech for telecommunication systems". Journal of Electronic Imaging 8, no. 3 (July 1, 1999): 263. http://dx.doi.org/10.1117/1.482675.

15

Polifroni, Joseph, Imre Kiss, and Stephanie Seneff. "Speech for Content Creation". International Journal of Mobile Human Computer Interaction 3, no. 2 (April 2011): 35–49. http://dx.doi.org/10.4018/jmhci.2011040103.

Abstract:
This paper proposes a paradigm for using speech to interact with computers, one that complements and extends traditional spoken dialogue systems: speech for content creation. The literature in automatic speech recognition (ASR), natural language processing (NLP), sentiment detection, and opinion mining is surveyed to argue that the time has come to use mobile devices to create content on-the-fly. Recent work in user modelling and recommender systems is examined to support the claim that using speech in this way can result in a useful interface to uniquely personalizable data. A data collection effort recently undertaken to help build a prototype system for spoken restaurant reviews is discussed. This vision critically depends on mobile technology, for enabling the creation of the content and for providing ancillary data to make its processing more relevant to individual users. This type of system can be of use where only limited speech processing is possible.
16

Auti, Nisha, Atharva Pujari, Anagha Desai, Shreya Patil, Sanika Kshirsagar, and Rutika Rindhe. "Advanced Audio Signal Processing for Speaker Recognition and Sentiment Analysis". International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (May 31, 2023): 1717–24. http://dx.doi.org/10.22214/ijraset.2023.51825.

Abstract: Automatic Speech Recognition (ASR) technology has revolutionized human-computer interaction by allowing users to communicate with computer interfaces using their voice in a natural way. Speaker recognition is a biometric recognition method that identifies individuals based on their unique speech signal, with potential applications in security, communication, and personalization. Sentiment analysis is a statistical method that analyzes unique acoustic properties of the speaker's voice to identify emotions or sentiments in speech. This allows for automated speech recognition systems to accurately categorize speech as Positive, Neutral, or Negative. While sentiment analysis has been developed for various languages, further research is required for regional languages. This project aims to improve the accuracy of automatic speech recognition systems by implementing advanced audio signal processing and sentiment analysis detection. The proposed system will identify the speaker's voice and analyze the audio signal to detect the context of speech, including the identification of foul language and aggressive speech. The system will be developed for the Marathi Language dataset, with potential for further development in other languages.
17

Järvinen, Kari. "Digital speech processing: Speech coding, synthesis, and recognition". Signal Processing 30, no. 1 (January 1993): 133–34. http://dx.doi.org/10.1016/0165-1684(93)90056-g.

18

Varga, A., and F. Fallside. "A technique for using multipulse linear predictive speech synthesis in text-to-speech type systems". IEEE Transactions on Acoustics, Speech, and Signal Processing 35, no. 4 (April 1987): 586–87. http://dx.doi.org/10.1109/tassp.1987.1165151.

19

Moore, Thomas J., and Richard L. McKinley. "Research on Speech Processing for Military Avionics". Proceedings of the Human Factors Society Annual Meeting 30, no. 13 (September 1986): 1331–35. http://dx.doi.org/10.1177/154193128603001321.

Abstract:
The Biological Acoustics Branch of the Armstrong Aerospace Medical Research Laboratory (AAMRL) is engaged in research in a number of speech related areas. This paper will describe the approach used to conduct research in the development and evaluation of military speech communication systems, mention the types of studies done using this approach and give examples of the types of data generated by these studies. Representative data will also be provided describing acoustic-phonetic changes that occur when speech is produced under acceleration.
20

Fadel, Wiam, Toumi Bouchentouf, Pierre-André Buvet, and Omar Bourja. "Adapting Off-the-Shelf Speech Recognition Systems for Novel Words". Information 14, no. 3 (March 13, 2023): 179. http://dx.doi.org/10.3390/info14030179.

Abstract:
Current speech recognition systems with fixed vocabularies have difficulties recognizing Out-of-Vocabulary words (OOVs) such as proper nouns and new words. This leads to misunderstandings or even failures in dialog systems. Ensuring effective speech recognition is crucial for the proper functioning of robot assistants. Non-native accents, new vocabulary, and aging voices can cause malfunctions in a speech recognition system. If this task is not executed correctly, the assistant robot will inevitably produce false or random responses. In this paper, we used a statistical approach based on distance algorithms to improve OOV correction. We developed a post-processing algorithm to be combined with a speech recognition model. In this sense, we compared two distance algorithms: Damerau–Levenshtein and Levenshtein distance. We validated the performance of the two distance algorithms in conjunction with five off-the-shelf speech recognition models. Damerau–Levenshtein, as compared to the Levenshtein distance algorithm, succeeded in minimizing the Word Error Rate (WER) when using the MoroccanFrench test set with five speech recognition systems, namely VOSK API, Google API, Wav2vec2.0, SpeechBrain, and Quartznet pre-trained models. Our post-processing method works regardless of the architecture of the speech recognizer, and its results on our MoroccanFrench test set outperformed the five chosen off-the-shelf speech recognizer systems.
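To make the comparison in this abstract concrete, here is a minimal sketch of the two distances (the restricted Damerau–Levenshtein variant, also known as optimal string alignment; illustrative code, not the authors' implementation):

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,      # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def damerau_levenshtein(a, b):
    """Restricted Damerau-Levenshtein: additionally counts an adjacent
    transposition as a single edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = a[i - 1] != b[j - 1]
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]
```

A transposition such as `cafe` → `cfae` costs 2 under plain Levenshtein (two substitutions) but only 1 under Damerau–Levenshtein, which is one reason the latter can better match OOV hypotheses against a lexicon in ASR post-correction.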
21

Puder, Henning, and Gerhard Schmidt. "Applied speech and audio processing". Signal Processing 86, no. 6 (June 2006): 1121–23. http://dx.doi.org/10.1016/j.sigpro.2005.07.034.

22

Salman, Hayder Mahmood, Vian S. Al Al-Doori, Hayder Sharif, Wasfi Hameed, and Rusul S. Bader. "Accurate Recognition of Natural language Using Machine Learning and Feature Fusion Processing". Fusion: Practice and Applications 10, no. 1 (2023): 128–42. http://dx.doi.org/10.54216/fpa.100108.

Abstract:
To enhance the performance of Chinese language pronunciation evaluation and speech recognition systems, researchers are focusing on developing intelligent techniques for multilevel fusion processing of data, features, and decisions using deep learning-based computer-aided systems. With a combination of score level, rank level, and hybrid level fusion, as well as fusion optimization and fusion score improvement, these systems can effectively combine multiple models and sensors to improve the accuracy of information fusion. Additionally, intelligent systems for information fusion, including those used in robotics and decision-making, can benefit from techniques such as multimedia data fusion and machine learning for data fusion. Furthermore, optimization algorithms and fuzzy approaches can be applied to data fusion applications in cloud environments and e-systems, while spatial data fusion can be used to enhance the quality of image and feature data. In this paper, a new approach is presented to identify tonal language in continuous speech. This study proposes the Machine learning-assisted automatic speech recognition framework (ML-ASRF) for Chinese character and language prediction. Our focus is on extracting highly robust features and combining various speech signal sequences of deep models. The experimental results demonstrated that the machine learning neural network recognition rate is considerably higher than that of the conventional speech recognition algorithm, which enables more accurate human-computer interaction and increases the efficiency of determining Chinese language pronunciation accuracy.
23

Ali Abumalloh, Rabab, Hasan Muaidi Al-Serhan, Othman Bin Ibrahim, and Waheeb Abu-Ulbeh. "Arabic Part-of-Speech Tagger, an Approach Based on Neural Network Modelling". International Journal of Engineering & Technology 7, no. 2.29 (May 22, 2018): 742. http://dx.doi.org/10.14419/ijet.v7i2.29.14009.

Abstract:
POS tagging has gained the interest of researchers in computational linguistics in recent years. Part-of-speech tagging systems automatically assign the proper grammatical tag or morpho-syntactical category label to every word in the corpus according to its appearance in the text. POS tagging serves as a fundamental and preliminary step in linguistic analysis that can help in developing many natural language processing applications, such as word processing systems, spell checking systems, dictionary building, and parsing systems. The Arabic language has attracted researchers' interest, leading to increasing demand for Arabic natural language processing systems. Artificial neural networks have been applied in many applications such as speech recognition and part-of-speech prediction, but they are a relatively new approach in part-of-speech tagging. In this research, we developed an Arabic POS tagger using an artificial neural network. A corpus of 20,620 words, manually assigned the appropriate tags, was developed and used to train the artificial neural network and to test the tagger's overall performance. The accuracy of the developed tagger reaches 89.04% on the testing dataset and 98.94% on the training dataset. Combining the two datasets, the accuracy rate for the whole system is 96.96%.
24

Romeu, E. S., and V. I. Syryamkin. "Possibilities for applied joint speech processing and computer vision systems". IOP Conference Series: Materials Science and Engineering 516 (April 26, 2019): 012044. http://dx.doi.org/10.1088/1757-899x/516/1/012044.

25

Bonte, Milene, Anke Ley, Wolfgang Scharke, and Elia Formisano. "Developmental refinement of cortical systems for speech and voice processing". NeuroImage 128 (March 2016): 373–84. http://dx.doi.org/10.1016/j.neuroimage.2016.01.015.

26

Savchenko, L. V., and A. V. Savchenko. "Fuzzy Phonetic Encoding of Speech Signals in Voice Processing Systems". Journal of Communications Technology and Electronics 64, no. 3 (March 2019): 238–44. http://dx.doi.org/10.1134/s1064226919030173.

27

Chen, Tsuhan. "Video signal processing systems and methods utilizing automated speech analysis". Journal of the Acoustical Society of America 112, no. 2 (2002): 368. http://dx.doi.org/10.1121/1.1507005.

28

Weinstein, C. J. "Opportunities for advanced speech processing in military computer-based systems". Proceedings of the IEEE 79, no. 11 (1991): 1626–41. http://dx.doi.org/10.1109/5.118986.

29

de Abreu, Caio Cesar Enside, Marco Aparecido Queiroz Duarte, Bruno Rodrigues de Oliveira, Jozue Vieira Filho, and Francisco Villarreal. "Regression-Based Noise Modeling for Speech Signal Processing". Fluctuation and Noise Letters 20, no. 03 (January 30, 2021): 2150022. http://dx.doi.org/10.1142/s021947752150022x.

Abstract:
Speech processing systems are very important in different applications involving speech and voice quality such as automatic speech recognition, forensic phonetics and speech enhancement, among others. In most of them, acoustic environmental noise is added to the original signal, decreasing the signal-to-noise ratio (SNR) and, by consequence, the speech quality. Therefore, estimating noise is one of the most important steps in speech processing, whether to reduce it before processing or to design robust algorithms. In this paper, a new approach to estimate noise from speech signals is presented and its effectiveness is tested in the speech enhancement context. For this purpose, partial least squares (PLS) regression is used to model the acoustic environment (AE) and a Wiener filter based on a priori SNR estimation is implemented to evaluate the proposed approach. Six noise types are used to create seven acoustically modeled noises. The basic idea is to consider the AE model to identify the noise type and estimate its power to be used in a speech processing system. Speech signals processed using the proposed method and classical noise estimators are evaluated through objective measures. Results show that the proposed method produces better speech quality than state-of-the-art noise estimators, enabling it to be used in real-time applications in the fields of robotics, telecommunications and acoustic analysis.
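The Wiener filter based on a priori SNR estimation mentioned in this abstract commonly applies, per frequency bin, the gain G = ξ/(1 + ξ), where ξ is the a priori SNR as a linear power ratio; a minimal sketch under that assumption (not the authors' exact implementation):

```python
def wiener_gain(snr_prio):
    """Wiener filter gain G = xi / (1 + xi) for a priori SNR xi
    (a linear power ratio, not dB)."""
    return snr_prio / (1.0 + snr_prio)

def apply_gain(spectrum_power, snr_prio):
    # attenuate each frequency bin according to its estimated a priori SNR
    return [p * wiener_gain(x) for p, x in zip(spectrum_power, snr_prio)]
```

Bins dominated by noise (ξ near 0) are suppressed toward zero, while bins dominated by speech (large ξ) pass nearly unchanged; the quality of the noise power estimate therefore directly determines the enhancement quality, which is the paper's motivation.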
30

Ungureanu, Dan, Stefan-Adrian Toma, Ion-Dorinel Filip, Bogdan-Costel Mocanu, Iulian Aciobăniței, Bogdan Marghescu, Titus Balan, Mihai Dascalu, Ion Bica, and Florin Pop. "ODIN112–AI-Assisted Emergency Services in Romania". Applied Sciences 13, no. 1 (January 3, 2023): 639. http://dx.doi.org/10.3390/app13010639.

Abstract:
The evolution of Natural Language Processing technologies transformed them into viable choices for various accessibility features and for facilitating interactions between humans and computers. A subset of them consists of speech processing systems, such as Automatic Speech Recognition, which became more accurate and more popular as a result. In this article, we introduce an architecture built around various speech processing systems to enhance Romanian emergency services. Our system is designed to help the operator evaluate various situations with the end goal of reducing the response times of emergency services. We also release the largest high-quality speech dataset of more than 150 h for Romanian. Our architecture includes an Automatic Speech Recognition model to transcribe calls automatically and augment the operator’s notes, as well as a Speech Recognition model to classify the caller’s emotions. We achieve state-of-the-art results on both tasks, while our demonstrator is designed to be integrated with the Romanian emergency system.
31

Jamieson, Donald G., Vijay Parsa, Moneca C. Price, and James Till. "Interaction of Speech Coders and Atypical Speech II". Journal of Speech, Language, and Hearing Research 45, no. 4 (August 2002): 689–99. http://dx.doi.org/10.1044/1092-4388(2002/055).

Abstract:
We investigated how standard speech coders, currently used in modern communication systems, affect the quality of the speech of persons who have common speech and voice disorders. Three standardized speech coders (GSM 6.10 RPELTP, FS1016 CELP, and FS1015 LPC) and two speech coders based on subband processing were evaluated for their performance. Coder effects were assessed by measuring the quality of speech samples both before and after processing by the speech coders. Speech quality was rated by 10 listeners with normal hearing on 28 different scales representing pitch and loudness changes, speech rate, laryngeal and resonatory dysfunction, and coder-induced distortions. Results showed that (a) nine scale items were consistently and reliably rated by the listeners; (b) all coders degraded speech quality on these nine scales, with the GSM and CELP coders providing the better quality speech; and (c) interactions between coders and individual voices did occur on several voice quality scales.
32

Hu, J., C. C. Cheng, and W. H. Liu. "Processing of speech signals using a microphone array for intelligent robots". Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering 219, no. 2 (March 1, 2005): 133–43. http://dx.doi.org/10.1243/095965105x9461.

Abstract:
For intelligent robots to interact with people, an efficient human-robot communication interface is very important (e.g. voice command). However, recognizing voice command or speech represents only part of speech communication. The physics of speech signals includes other information, such as speaker direction. Secondly, a basic element of processing the speech signal is recognition at the acoustic level. However, the performance of recognition depends greatly on the reception. In a noisy environment, the success rate can be very poor. As a result, prior to speech recognition, it is important to process the speech signals to extract the needed content while rejecting others (such as background noise). This paper presents a speech purification system for robots to improve the signal-to-noise ratio of reception and an algorithm with a multidirection calibration beamformer.
33

Wu, Yixuan. "Application of deep learning-based speech signal processing technology in electronic communication". Applied and Computational Engineering 77, no. 1 (July 16, 2024): 106–11. http://dx.doi.org/10.54254/2755-2721/77/20240661.

Abstract:
In recent years, the artificial intelligence boom triggered by deep learning has been influencing and changing people's lifestyles. People are no longer satisfied with human-computer interaction through simple text commands; instead, they look forward to more convenient and faster communication methods such as voice interaction. Against this backdrop, the application of speech signal processing systems is becoming increasingly widespread, so it is necessary to study the application of deep learning-based speech signal processing technology in electronic communication. This can provide valuable references and assistance for future work, promoting the further development of deep learning-based speech signal processing technology in electronic communication. In this paper, we first review the application of deep learning in speech signal enhancement, speech recognition, and speech synthesis from a theoretical perspective. Then, we discuss the application of deep learning-based speech signal processing in electronic communication, including the application of models such as Transformer, LAS (Listen, Attend and Spell), and GFT-conformer in speech signal processing. We also discuss some application scenarios of deep learning-based speech signal processing in electronic communication. Finally, we identify the need for deeper application of deep learning technology in speech signal processing and electronic communication, with continuous optimization and adjustment.
34

Smither, Janan Al-Awar. "The Processing of Synthetic Speech by Older and Younger Adults". Proceedings of the Human Factors Society Annual Meeting 36, no. 2 (October 1992): 190–92. http://dx.doi.org/10.1177/154193129203600211.

Abstract:
This experiment investigated the demands synthetic speech places on short term memory by comparing performance of old and young adults on an ordinary short term memory task. Items presented were generated by a human speaker or by a text-to-speech computer synthesizer. Results were consistent with the idea that the comprehension of synthetic speech imposes increased resource demands on the short term memory system. Older subjects performed significantly more poorly than younger subjects, and both groups performed more poorly with synthetic than with human speech. Findings suggest that short term memory demands imposed by the processing of synthetic speech should be investigated further, particularly regarding the implementation of voice response systems in devices for the elderly.
35

Chien, Jen-Tzung, and Man-Wai Mak. "Guest Editorial: Modern Speech Processing and Learning". Journal of Signal Processing Systems 92, no. 8 (July 9, 2020): 775–76. http://dx.doi.org/10.1007/s11265-020-01577-4.

36

Islam, Rumana, Esam Abdel-Raheem, and Mohammed Tarique. "A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems". Applied Sciences 12, no. 5 (February 25, 2022): 2398. http://dx.doi.org/10.3390/app12052398.

Abstract:
This paper presents a pathological voice identification system employing signal processing techniques through cochlear implant models. The fundamentals of the biological process for speech perception are investigated to develop this technique. Two cochlear implant models are considered in this work: one uses a conventional bank of bandpass filters, and the other one uses a bank of optimized gammatone filters. The critical center frequencies of those filters are selected to mimic the human cochlear vibration patterns caused by audio signals. The proposed system processes the speech samples and applies a CNN for final pathological voice identification. The results show that the two proposed models adopting bandpass and gammatone filterbanks can discriminate the pathological voices from healthy ones, resulting in F1 scores of 77.6% and 78.7%, respectively, with speech samples. The obtained results of this work are also compared with those of other related published works.
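For context on the gammatone filterbank mentioned here: an order-n gammatone filter centred at fc has the standard impulse response t^(n-1)·exp(−2πbt)·cos(2πfc·t). The sketch below assumes the common ERB-based bandwidth of Glasberg and Moore (b = 1.019·ERB(fc)); the paper's optimized filters may use different parameters:

```python
import math

def gammatone_ir(fc, fs, n=4, duration=0.025):
    """Sampled impulse response of an order-n gammatone filter centred
    at fc Hz, sampled at fs Hz, with ERB-derived bandwidth b."""
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # Glasberg-Moore ERB in Hz
    b = 1.019 * erb
    out = []
    for k in range(int(duration * fs)):
        t = k / fs
        out.append(t ** (n - 1) * math.exp(-2.0 * math.pi * b * t)
                   * math.cos(2.0 * math.pi * fc * t))
    return out
```

A bank of such filters, with the center frequencies spaced on the ERB scale, approximates the cochlear frequency decomposition that the paper's implant models mimic.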
37

Kazi, Sara. "SPEECH RECOGNITION SYSTEM". INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 03 (March 22, 2024): 1–5. http://dx.doi.org/10.55041/ijsrem29567.

Abstract:
Speech recognition technology has witnessed remarkable progress in recent years, fueled by advancements in machine learning, deep neural networks, and signal processing techniques. This paper presents a comprehensive review of the current state-of-the-art in speech recognition systems, highlighting key methodologies and breakthroughs that have contributed to their improved performance. The paper explores various aspects, including acoustic modeling, language modeling, and the integration of contextual information, shedding light on the challenges faced and innovative solutions proposed in the field. Furthermore, the paper discusses the impact of large-scale datasets and transfer learning on the robustness and adaptability of speech recognition models. It delves into recent developments in end-to-end models and their potential to simplify the architecture while enhancing accuracy. The integration of real-time and edge computing for speech recognition applications is also explored, emphasizing the implications for practical implementations in diverse domains such as healthcare, telecommunications, and smart devices. In addition to reviewing the current landscape, the paper provides insights into future prospects and emerging trends in speech recognition research. The role of multimodal approaches, incorporating visual and contextual cues, is discussed as a potential avenue for further improvement. Ethical considerations related to privacy and bias in speech recognition systems are also addressed, emphasizing the importance of responsible development and deployment. By synthesizing current research findings and anticipating future directions, this paper contributes to the evolving discourse on speech recognition technologies, providing a valuable resource for researchers, practitioners, and industry professionals in the field. 
Key Words: Real-time processing, Machine learning, Deep neural networks, Technology advancements, Contextual information, Large-scale datasets, Transfer learning, End-to-end models, Edge computing, Multimodal approaches, Ethical considerations, Privacy, Bias, Future prospects, Research review.
ABNT, Harvard, Vancouver, APA, and other styles
38

Yu, Sabrina, Sherryse Corrow, Jason JS Barton and Andrea Albonico. "Facial Identity And Facial Speech Processing In Developmental Prosopagnosia". Journal of Vision 22, no. 14 (December 5, 2022): 3422. http://dx.doi.org/10.1167/jov.22.14.3422.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
39

Kosarev, Y. "Synergetics and 'insight' strategy for speech processing". Literary and Linguistic Computing 12, no. 2 (June 1, 1997): 113–18. http://dx.doi.org/10.1093/llc/12.2.113.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
40

Finke, Mareike, Pascale Sandmann, Hanna Bönitz, Andrej Kral and Andreas Büchner. "Consequences of Stimulus Type on Higher-Order Processing in Single-Sided Deaf Cochlear Implant Users". Audiology and Neurotology 21, no. 5 (2016): 305–15. http://dx.doi.org/10.1159/000452123.

Full text of the source
Abstract:
Single-sided deaf subjects with a cochlear implant (CI) provide the unique opportunity to compare central auditory processing of the electrical input (CI ear) and the acoustic input (normal-hearing, NH, ear) within the same individual. In these individuals, sensory processing differs between their two ears, while cognitive abilities are the same irrespective of the sensory input. To better understand perceptual-cognitive factors modulating speech intelligibility with a CI, this electroencephalography study examined the central-auditory processing of words, the cognitive abilities, and the speech intelligibility in 10 postlingually single-sided deaf CI users. We found lower hit rates and prolonged response times for word classification during an oddball task for the CI ear when compared with the NH ear. Also, event-related potentials reflecting sensory (N1) and higher-order processing (N2/N4) were prolonged for word classification (targets versus nontargets) with the CI ear compared with the NH ear. Our results suggest that speech processing via the CI ear and the NH ear differs both at sensory (N1) and cognitive (N2/N4) processing stages, thereby affecting the behavioral performance for speech discrimination. These results provide objective evidence for cognition to be a key factor for speech perception under adverse listening conditions, such as the degraded speech signal provided from the CI.
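The latency comparison described above rests on standard event-related-potential averaging: epochs time-locked to the stimulus are averaged, and the latency of a negative deflection (such as N1) is read off within a search window. A minimal sketch in Python; the function names and the latency window are illustrative assumptions, not the authors' actual analysis pipeline:

```python
import numpy as np

def erp_average(epochs):
    """Average time-locked EEG epochs (trials x samples) into an ERP waveform."""
    return np.asarray(epochs).mean(axis=0)

def negative_peak_latency(erp, fs, tmin, tmax):
    """Latency in seconds of the most negative deflection within [tmin, tmax)."""
    lo, hi = int(tmin * fs), int(tmax * fs)
    return (lo + int(np.argmin(erp[lo:hi]))) / fs
```

Comparing `negative_peak_latency` of the CI-ear and NH-ear averages within the same window is one simple way to quantify the prolonged N1 reported here.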
ABNT, Harvard, Vancouver, APA, and other styles
41

Jamal, Marwa, and Tariq A. Hassan. "Speech Coding Using Discrete Cosine Transform and Chaotic Map". Ingénierie des systèmes d information 27, no. 4 (August 31, 2022): 673–77. http://dx.doi.org/10.18280/isi.270419.

Full text of the source
Abstract:
Recently, multimedia data has shown an exponential growth trend, saturating everyday human life. Various data modalities, including images, text, and video, play important roles in different fields and have wide application. However, the key problem in using large-scale data is the cost of processing and massive storage. Efficient communication and economical storage therefore require effective data compression techniques to reduce the volume of data. Speech coding, the process of converting voice signals into a more compressed form, is a central problem in digital speech processing. In this work, we demonstrate that a DCT combined with a chaotic system and run-length coding can implement very low bit-rate speech coding with high reconstruction quality. Experimental results show a compression ratio of about 13% when implemented on the Librispeech dataset.
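The DCT-plus-run-length idea can be sketched per frame: transform, discard high-frequency coefficients, quantize, and encode the resulting zero runs. A minimal NumPy sketch; the chaotic-map stage of the paper (used alongside the DCT) is omitted, and the frame size, `keep`, and `q` parameters are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def encode_frame(frame, keep=0.25, q=0.01):
    """DCT, drop high-frequency coefficients, quantize, run-length encode."""
    c = dct_matrix(len(frame)) @ frame
    c[int(len(c) * keep):] = 0.0           # discard high-frequency coefficients
    quant = np.round(c / q).astype(int)    # uniform quantization
    rle, i = [], 0
    while i < len(quant):                  # (value, run_length) pairs
        j = i
        while j < len(quant) and quant[j] == quant[i]:
            j += 1
        rle.append((int(quant[i]), j - i))
        i = j
    return rle

def decode_frame(rle, n, q=0.01):
    """Expand the run-length pairs, dequantize, and invert the DCT."""
    quant = np.concatenate([np.full(r, v, dtype=float) for v, r in rle])
    return dct_matrix(n).T @ (quant * q)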
ABNT, Harvard, Vancouver, APA, and other styles
42

Resende, Natália, and Andy Way. "Can Google Translate Rewire Your L2 English Processing?" Digital 1, no. 1 (March 4, 2021): 66–85. http://dx.doi.org/10.3390/digital1010006.

Full text of the source
Abstract:
In this article, we address the question of whether exposure to the translated output of MT systems could result in changes in the cognitive processing of English as a second language (L2 English). To answer this question, we first conducted a survey with 90 Brazilian Portuguese L2 English speakers with the aim of understanding how and for what purposes they use web-based MT systems. To investigate whether MT systems are capable of influencing L2 English cognitive processing, we carried out a syntactic priming experiment with 32 Brazilian Portuguese speakers. We wanted to test whether speakers re-use in their subsequent speech in English the same syntactic alternative previously seen in the MT output, when using the popular Google Translate system to translate sentences from Portuguese into English. The results of the survey show that Brazilian Portuguese L2 English speakers use Google Translate as a tool supporting their speech in English as well as a source of English vocabulary learning. The results of the syntactic priming experiment show that exposure to an English syntactic alternative through GT can lead to the re-use of the same syntactic alternative in subsequent speech even if it is not the speaker’s preferred syntactic alternative in English. These findings suggest that GT is being used as a tool for language learning purposes and so is indeed capable of rewiring the processing of L2 English syntax.
ABNT, Harvard, Vancouver, APA, and other styles
43

Yelle, Serena K., and Gina M. Grimshaw. "Hemispheric Specialization for Linguistic Processing of Sung Speech". Perceptual and Motor Skills 108, no. 1 (February 2009): 219–28. http://dx.doi.org/10.2466/pms.108.1.219-228.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
44

Ito, Takayuki, Alexis R. Johns and David J. Ostry. "Left Lateralized Enhancement of Orofacial Somatosensory Processing Due to Speech Sounds". Journal of Speech, Language, and Hearing Research 56, no. 6 (December 2013): 1875–81. http://dx.doi.org/10.1044/1092-4388(2013/12-0226).

Full text of the source
Abstract:
Purpose Somatosensory information associated with speech articulatory movements affects the perception of speech sounds and vice versa, suggesting an intimate linkage between speech production and perception systems. However, it is unclear which cortical processes are involved in the interaction between speech sounds and orofacial somatosensory inputs. The authors examined whether speech sounds modify orofacial somatosensory cortical potentials that were elicited using facial skin perturbations. Method Somatosensory event-related potentials in EEG were recorded in 3 background sound conditions (pink noise, speech sounds, and nonspeech sounds) and also in a silent condition. Facial skin deformations that are similar in timing and duration to those experienced in speech production were used for somatosensory stimulation. Results The authors found that speech sounds reliably enhanced the first negative peak of the somatosensory event-related potential when compared with the other 3 sound conditions. The enhancement was evident at electrode locations above the left motor and premotor area of the orofacial system. The result indicates that speech sounds interact with somatosensory cortical processes that are produced by speech-production-like patterns of facial skin stretch. Conclusion Neural circuits in the left hemisphere, presumably in left motor and premotor cortex, may play a prominent role in the interaction between auditory inputs and speech-relevant somatosensory processing.
ABNT, Harvard, Vancouver, APA, and other styles
45

Abdusalomov, Akmalbek Bobomirzaevich, Furkat Safarov, Mekhriddin Rakhimov, Boburkhon Turaev and Taeg Keun Whangbo. "Improved Feature Parameter Extraction from Speech Signals Using Machine Learning Algorithm". Sensors 22, no. 21 (October 24, 2022): 8122. http://dx.doi.org/10.3390/s22218122.

Full text of the source
Abstract:
Speech recognition refers to the capability of software or hardware to receive a speech signal, identify the speaker’s features in the speech signal, and recognize the speaker thereafter. In general, the speech recognition process involves three main steps: acoustic processing, feature extraction, and classification/recognition. The purpose of feature extraction is to illustrate a speech signal using a predetermined number of signal components. This is because all information in the acoustic signal is excessively cumbersome to handle, and some information is irrelevant in the identification task. This study proposes a machine learning-based approach that performs feature parameter extraction from speech signals to improve the performance of speech recognition applications in real-time smart city environments. Moreover, the principle of mapping a block of main memory to the cache is used efficiently to reduce computing time. The block size of cache memory is a parameter that strongly affects the cache performance. In particular, the implementation of such processes in real-time systems requires a high computation speed. Processing speed plays an important role in speech recognition in real-time systems. It requires the use of modern technologies and fast algorithms that increase the acceleration in extracting the feature parameters from speech signals. Problems with overclocking during the digital processing of speech signals have yet to be completely resolved. The experimental results demonstrate that the proposed method successfully extracts the signal features and achieves seamless classification performance compared to other conventional speech recognition algorithms.
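The acoustic-processing and feature-extraction steps named above follow a conventional front end: pre-emphasis, overlapping frames, windowing, and a spectral representation per frame. A minimal NumPy sketch of that front end; the paper's actual feature set and its cache-blocking optimization are not specified here, and the parameters below simply assume 16 kHz speech with 25 ms frames and a 10 ms hop:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (n_frames x frame_len)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def extract_features(x, frame_len=400, hop=160):
    """Pre-emphasis, Hamming windowing, and log power-spectrum features."""
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])      # pre-emphasis filter
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    return np.log(power + 1e-10)                    # log compresses dynamic range
```

Each row of the output is one frame's feature vector, the per-frame representation that the classification stage consumes.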
ABNT, Harvard, Vancouver, APA, and other styles
46

Wingfield, Arthur, and Kimberly C. Lindfield. "Multiple Memory Systems in the Processing of Speech: Evidence from Aging". Experimental Aging Research 21, no. 2 (April 1995): 101–21. http://dx.doi.org/10.1080/03610739508254272.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
47

Murthy, Hema A., and B. Yegnanarayana. "Speech processing using group delay functions". Signal Processing 22, no. 3 (March 1991): 259–67. http://dx.doi.org/10.1016/0165-1684(91)90014-a.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
48

Ghezaiel, Wajdi, Amel Ben Slimane and Ezzedine Ben Braiek. "On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification". International Journal of Electrical and Computer Engineering (IJECE) 6, no. 6 (December 1, 2016): 2766. http://dx.doi.org/10.11591/ijece.v6i6.9844.

Full text of the source
Abstract:
Usable speech is a novel concept for processing co-channel speech data, proposed to extract minimally corrupted speech that is useful for various speech processing systems. In this paper, we are interested in co-channel speaker identification (SID). We employ a newly proposed usable speech extraction method based on pitch information obtained from linear multi-scale decomposition by the discrete wavelet transform. The idea is to retain the speech segments in which only one pitch is detected and remove the others. The detected usable speech was used as input to a speaker identification system. The system is evaluated on co-channel speech, and results show a significant improvement across various Target to Interferer Ratios (TIR) for the speaker identification system.
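The retain-only-single-pitch idea can be illustrated frame by frame: decompose each frame to a coarse scale, attempt a pitch estimate, and keep the frame only when a confident pitch is found. A rough NumPy sketch; it substitutes a crude Haar approximation for the paper's linear multi-scale DWT and a simple autocorrelation estimator for its pitch detection, and every function name, threshold, and parameter below is an illustrative assumption:

```python
import numpy as np

def haar_approx(x, levels=2):
    """Crude DWT lowpass stand-in: repeated Haar approximation (halves the rate)."""
    for _ in range(levels):
        x = x[: len(x) - len(x) % 2]
        x = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    return x

def detect_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Autocorrelation pitch estimate in Hz, or None for unvoiced frames."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    if ac[0] <= 0 or hi >= len(ac):
        return None
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag if ac[lag] > 0.3 * ac[0] else None  # voicing threshold

def usable_frames(x, fs, frame_len, hop, levels=2):
    """Keep the indices of frames where one confident pitch is detected."""
    kept = []
    for i, start in enumerate(range(0, len(x) - frame_len + 1, hop)):
        approx = haar_approx(x[start:start + frame_len], levels)
        if detect_pitch(approx, fs / 2 ** levels) is not None:
            kept.append(i)
    return kept
```

A full usable-speech detector would additionally reject frames where two competing pitches appear (the co-channel case); this sketch only shows the keep/discard gating.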
ABNT, Harvard, Vancouver, APA, and other styles
49

Ghezaiel, Wajdi, Amel Ben Slimane and Ezzedine Ben Braiek. "On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification". International Journal of Electrical and Computer Engineering (IJECE) 6, no. 6 (December 1, 2016): 2766. http://dx.doi.org/10.11591/ijece.v6i6.pp2766-2772.

Full text of the source
Abstract:
Usable speech is a novel concept for processing co-channel speech data, proposed to extract minimally corrupted speech that is useful for various speech processing systems. In this paper, we are interested in co-channel speaker identification (SID). We employ a newly proposed usable speech extraction method based on pitch information obtained from linear multi-scale decomposition by the discrete wavelet transform. The idea is to retain the speech segments in which only one pitch is detected and remove the others. The detected usable speech was used as input to a speaker identification system. The system is evaluated on co-channel speech, and results show a significant improvement across various Target to Interferer Ratios (TIR) for the speaker identification system.
ABNT, Harvard, Vancouver, APA, and other styles
50

Stork, David G. "SOURCES OF NEURAL STRUCTURE IN SPEECH AND LANGUAGE PROCESSING". International Journal of Neural Systems 02, no. 03 (January 1991): 159–67. http://dx.doi.org/10.1142/s0129065791000157.

Full text of the source
Abstract:
Because of the complexity and high dimensionality of the problem, speech recognition—perhaps more than any other problem of current interest in network research—will profit from human neurophysiology, psychoacoustics and psycholinguistics: approaches based exclusively on engineering principles will provide only limited benefits. Despite the great power of current learning algorithms in homogeneous or unstructured networks, a number of difficulties in speech recognition seem to indicate that homogeneous networks taken alone will be insufficient for the task, and that structure—representing constraints—will also be required. In the biological system, the sources of such structure include developmental and evolutionary effects. Recent considerations of the evolutionary sources of neural structure in the human speech and language systems, including models of the interrelationship between speech motor system and auditory system, are analyzed with special reference to neural network approaches.
ABNT, Harvard, Vancouver, APA, and other styles