Academic literature on the topic "Audio speech recognition"

Author: Grafiati

Published: September 6, 2023

Contents

Journal articles
Theses
Books
Book chapters
Conference proceedings
Reports

Browse the topical lists of articles, books, theses, conference proceedings, and other scholarly sources on the topic "Audio speech recognition".

Journal articles on the topic "Audio speech recognition"

Beadles, Robert L. "Audio visual speech recognition". Journal of the Acoustical Society of America 87, no. 5 (May 1990): 2274. http://dx.doi.org/10.1121/1.399137.

Bahal, Akriti. "Advances in Automatic Speech Recognition: From Audio-Only To Audio-Visual Speech Recognition". IOSR Journal of Computer Engineering 5, no. 1 (2012): 31–36. http://dx.doi.org/10.9790/0661-0513136.

Hwang, Jung-Wook, Jeongkyun Park, Rae-Hong Park and Hyung-Min Park. "Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition". Applied Acoustics 211 (August 2023): 109478. http://dx.doi.org/10.1016/j.apacoust.2023.109478.

Nakadai, Kazuhiro and Tomoaki Koiwa. "Psychologically-Inspired Audio-Visual Speech Recognition Using Coarse Speech Recognition and Missing Feature Theory". Journal of Robotics and Mechatronics 29, no. 1 (February 20, 2017): 105–13. http://dx.doi.org/10.20965/jrm.2017.p0105.

Abstract

[Figure: System architecture of AVSR based on missing feature theory and P-V grouping]

Audio-visual speech recognition (AVSR) is a promising approach to improving the noise robustness of speech recognition in the real world. For AVSR, the auditory and visual units are the phoneme and viseme, respectively. However, these are often misclassified in the real world because of noisy input. To solve this problem, we propose two psychologically-inspired approaches. One is audio-visual integration based on missing feature theory (MFT) to cope with missing or unreliable audio and visual features for recognition. The other is phoneme and viseme grouping based on coarse-to-fine recognition. Preliminary experiments show that these two approaches are effective for audio-visual speech recognition. Integration based on MFT with an appropriate weight improves the recognition performance by −5 dB. This is the case even in a noisy environment, in which most speech recognition systems do not work properly. Phoneme and viseme grouping further improved the AVSR performance, particularly at a low signal-to-noise ratio.

This work is an extension of our publication "Tomoaki Koiwa et al.: Coarse speech recognition by audio-visual integration based on missing feature theory, IROS 2007, pp. 1751-1756, 2007."

BASYSTIUK, Oleh and Nataliia MELNYKOVA. "MULTIMODAL SPEECH RECOGNITION BASED ON AUDIO AND TEXT DATA". Herald of Khmelnytskyi National University. Technical sciences 313, no. 5 (October 27, 2022): 22–25. http://dx.doi.org/10.31891/2307-5732-2022-313-5-22-25.

Abstract

Machine translation systems simulate the work of a human translator, rendering text from one language into another. Their performance depends on the ability to understand the grammar rules of the language. In translation, the basic units are not individual words but word combinations or phraseological units that express different concepts; only by using them can more complex ideas be expressed in the translated text. The main feature of machine translation is that the input and output differ in length. Recurrent neural networks provide a way to work with inputs and outputs of different lengths. A recurrent neural network (RNN) is a class of artificial neural network that has connections between nodes. In this case, a connection refers to a connection from a more distant node to a less distant node. The presence of these connections allows the RNN to remember and reproduce the entire sequence of reactions to one stimulus. From a programming point of view, such networks are analogous to cyclic execution, and from a system point of view they are equivalent to a state machine. RNNs are commonly used to process word sequences in natural language processing. Usually, a hidden Markov model (HMM) and an N-gram language model are used to process a sequence of words. Deep learning has completely changed the approach to machine translation. Researchers in the deep learning field have created simple solutions based on machine learning that outperform the best expert systems. This paper reviews the main features of machine translation based on recurrent neural networks. It also highlights the advantages of RNN-based systems using the sequence-to-sequence model over statistical translation systems. Two machine translation systems based on the sequence-to-sequence model were constructed using the Keras and PyTorch machine learning libraries. Based on the results obtained, the libraries were analyzed and their performance compared.
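
The description of an RNN as a state machine executed in a loop can be illustrated with a short sketch. The following is not taken from the paper (which used Keras and PyTorch); it is a minimal pure-Python Elman-style cell with hand-picked, hypothetical weights, and the names `rnn_step` and `run_rnn` are invented here for illustration.

```python
import math


def rnn_step(x, h, w_xh, w_hh, b):
    """One recurrent update: h_new[i] = tanh(w_xh[i]·x + w_hh[i]·h + b[i])."""
    return [
        math.tanh(
            sum(w * xj for w, xj in zip(w_xh[i], x))
            + sum(w * hk for w, hk in zip(w_hh[i], h))
            + b[i]
        )
        for i in range(len(h))
    ]


def run_rnn(inputs, h0, w_xh, w_hh, b):
    """Feed a variable-length sequence through the cell, keeping every hidden state."""
    states = []
    h = h0
    for x in inputs:  # the "cyclic execution": the same cell is applied at each step
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states


# Toy weights chosen by hand (hypothetical; a real model would learn them).
W_XH = [[0.5], [-0.3]]            # input (size 1) -> hidden (size 2)
W_HH = [[0.1, 0.2], [0.0, 0.4]]   # hidden -> hidden, the recurrent connection
B = [0.0, 0.1]

# A single "stimulus" followed by two silent steps.
states = run_rnn([[1.0], [0.0], [0.0]], [0.0, 0.0], W_XH, W_HH, B)
for t, h in enumerate(states):
    print(t, [round(v, 4) for v in h])
```

After the first step the input is zero, yet the hidden state keeps changing from step to step. That carried-over state is the "memory" of a stimulus that the abstract refers to.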

Dupont, S. and J. Luettin. "Audio-visual speech modeling for continuous speech recognition". IEEE Transactions on Multimedia 2, no. 3 (2000): 141–51. http://dx.doi.org/10.1109/6046.865479.

Kubanek, M., J. Bobulski and L. Adrjanowicz. "Characteristics of the use of coupled hidden Markov models for audio-visual Polish speech recognition". Bulletin of the Polish Academy of Sciences: Technical Sciences 60, no. 2 (October 1, 2012): 307–16. http://dx.doi.org/10.2478/v10175-012-0041-6.

Abstract

This paper focuses on combining audio-visual signals for Polish speech recognition when the audio speech signal is highly disturbed. Recognition of audio-visual speech was based on combined hidden Markov models (CHMM). The described methods were developed for a single isolated command; nevertheless, their effectiveness indicated that they would also work similarly in continuous audio-visual speech recognition. Visual speech analysis is very difficult and computationally demanding, mostly because of the extreme amount of data that needs to be processed. Therefore, audio-video speech recognition is used only while the audio speech signal is exposed to a considerable level of distortion. The paper proposes the authors' own methods for lip edge detection and visual feature extraction. Moreover, a method of fusing speech characteristics for an audio-video signal was proposed and tested. A significant increase in recognition effectiveness and processing speed was noted during tests, given properly selected CHMM parameters, an adequate codebook size, and an appropriate fusion of audio-visual characteristics. The experimental results were very promising and close to those achieved by leading scientists in the field of audio-visual speech recognition.

Kacur, Juraj, Boris Puterka, Jarmila Pavlovicova and Milos Oravec. "Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications". Sensors 22, no. 16 (August 22, 2022): 6304. http://dx.doi.org/10.3390/s22166304.

Abstract

There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, a precision of representation (quantization), and complexities of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources.
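
One of the signal properties the abstract names, precision of representation (quantization), can be made concrete with a small experiment. The sketch below is an illustration only and not the article's test procedure: it assumes a full-scale 997 Hz sine sampled at 16 kHz and a mid-rise uniform quantizer, and the function names are hypothetical. It measures the signal-to-quantization-noise ratio (SQNR) at several bit depths.

```python
import math


def quantize(sample, bits):
    """Mid-rise uniform quantizer for samples in [-1.0, 1.0]."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor(sample / step)
    # Clamp so that sample == 1.0 maps to the top level instead of overflowing.
    index = max(-(levels // 2), min(levels // 2 - 1, index))
    return (index + 0.5) * step


def sqnr_db(signal, bits):
    """Signal-to-quantization-noise ratio in dB at the given bit depth."""
    signal_power = sum(s * s for s in signal)
    noise_power = sum((s - quantize(s, bits)) ** 2 for s in signal)
    return 10.0 * math.log10(signal_power / noise_power)


SAMPLE_RATE = 16000  # Hz, assumed for this example
TONE_HZ = 997.0      # test-tone frequency, assumed for this example
tone = [
    math.sin(2.0 * math.pi * TONE_HZ * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)  # one second of audio
]

for bits in (4, 8, 16):
    print(f"{bits:2d} bits: {sqnr_db(tone, bits):.1f} dB")
```

For a full-scale sine, the measured values should roughly follow the textbook approximation of 6.02·N + 1.76 dB for N bits, so each extra bit buys about 6 dB. How much of that precision a given recognition task actually needs is the kind of question the article investigates.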

Dar, Showkat Ahmad. "Emotion Recognition Based On Audio Speech". IOSR Journal of Computer Engineering 11, no. 6 (2013): 46–50. http://dx.doi.org/10.9790/0661-1164650.

Aucouturier, Jean-Julien and Laurent Daudet. "Pattern recognition of non-speech audio". Pattern Recognition Letters 31, no. 12 (September 2010): 1487–88. http://dx.doi.org/10.1016/j.patrec.2010.05.003.

Theses on the topic "Audio speech recognition"

Miyajima, C., D. Negi, Y. Ninomiya, M. Sano, K. Mori, K. Itou, K. Takeda and Y. Suenaga. "Audio-Visual Speech Database for Bimodal Speech Recognition". INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2005. http://hdl.handle.net/2237/10460.

Seymour, R. "Audio-visual speech and speaker recognition". Thesis, Queen's University Belfast, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492489.

Abstract

In this thesis, a number of important issues relating to the use of both audio and video information for speech and speaker recognition are investigated. A comprehensive comparison of different visual feature types is given, including both geometric and image transformation based features. A new geometric based method for feature extraction is described, as well as the novel use of curvelet based features. Different methods for constructing the feature vectors are compared, as well as feature vector sizes and the use of dynamic features. Each feature type is tested against three types of visual noise: compression, blurring and jitter. A novel method of integrating the audio and video information streams called the maximum stream posterior (MSP) is described. This method is tested in both speaker dependent and speaker independent audio-visual speech recognition (AVSR) systems, and is shown to be robust to noise in either the audio or video streams, given no prior knowledge of the noise. This method is then extended to form the maximum weighted stream posterior (MWSP) method. Finally, both the MSP and MWSP are tested in an audio-visual speaker recognition system (AVSpR). Experiments using the XM2VTS database will show that both of these methods can outperform standard methods in terms of recognition accuracy in situations where either stream is corrupted.

Pachoud, Samuel. "Audio-visual speech and emotion recognition". Thesis, Queen Mary, University of London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.528923.

Matthews, Iain. "Features for audio-visual speech recognition". Thesis, University of East Anglia, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266736.

Kaucic, Robert August. "Lip tracking for audio-visual speech recognition". Thesis, University of Oxford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360392.

Lucey, Simon. "Audio-visual speech processing". Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Abstract

Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated word speech and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
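
As a generic illustration of combining two classifiers, the sketch below fuses per-class posteriors from an audio classifier and a visual classifier with a weighted log-linear rule. This is a common late-integration scheme and is an assumption here, not the framework developed in the thesis; the function name, the class labels, the posteriors, and the weight are all hypothetical.

```python
import math


def fuse_log_linear(audio_post, visual_post, audio_weight):
    """Combine two posterior distributions defined over the same classes.

    score(c) = audio_weight * log p_audio(c) + (1 - audio_weight) * log p_visual(c)
    The scores are exponentiated and renormalised so that they sum to 1.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    eps = 1e-12  # guards against log(0)
    scores = {
        c: audio_weight * math.log(audio_post[c] + eps)
        + (1.0 - audio_weight) * math.log(visual_post[c] + eps)
        for c in audio_post
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


# Hypothetical posteriors for three word classes.
audio = {"yes": 0.40, "no": 0.45, "stop": 0.15}   # noisy audio: nearly undecided
visual = {"yes": 0.70, "no": 0.10, "stop": 0.20}  # lip shape favours "yes"

fused = fuse_log_linear(audio, visual, audio_weight=0.3)
best = max(fused, key=fused.get)
print(best, {c: round(p, 3) for c, p in fused.items()})
```

With `audio_weight=0.3` the visual evidence dominates and `best` is "yes". Setting `audio_weight=1.0` reduces the rule to the audio classifier alone, which would pick "no". Choosing the weight, for example as a function of the acoustic noise level, is the hard part that integration research addresses.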

Eriksson, Mattias. "Speech recognition availability". Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2651.

Abstract

This project investigates the importance of availability in the scope of dictation programs. Using speech recognition technology for dictating has not reached the public, and that may very well be a result of poor availability in today’s technical solutions.

I have constructed a persona character, Johanna, who personalizes the target user. I have also developed a solution that streams audio into a speech recognition server and sends back interpreted text. Johanna affirmed that the solution was successful in theory.

I then incorporated test users that tried out the solution in practice. Half of them do indeed claim that their usage has been and will continue to be increased thanks to the new level of availability.

Rao, Ram Raghavendra. "Audio-visual interaction in multimedia". Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing". Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Abstract

Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach will be shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing". Queensland University of Technology, 2008. http://eprints.qut.edu.au/17689/.

Books on the topic "Audio speech recognition"

Sen, Soumya, Anjan Dutta and Nilanjan Dey. Audio Processing and Speech Recognition. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5.

Ogunfunmi, Tokunbo, Roberto Togneri and Madihally Narasimha, eds. Speech and Audio Processing for Coding, Enhancement and Recognition. New York, NY: Springer New York, 2015. http://dx.doi.org/10.1007/978-1-4939-1456-2.

Junqua, Jean-Claude. Robustness in automatic speech recognition: Fundamentals and applications. Boston: Kluwer Academic Publishers, 1996.

Harrison, Mark. The use of interactive audio and speech recognition techniques in training. [U.K.]: [s.n.], 1993.

AVBPA '97 (1st 1997 Montana, Switzerland). Audio- and video-based biometric person authentication: First International Conference, AVBPA '97, Crans-Montana, Switzerland, March 1997: proceedings. Berlin: Springer, 1997.

Kittler, Josef and Mark S. Nixon, eds. Audio- and video-based biometric person authentication: 4th International Conference, AVBPA 2003, Guildford, UK, June 2003: proceedings. Berlin: Springer, 2003.

International Conference, AVBPA (1st 1997 Montana, Switzerland). Audio- and video-based biometric person authentication: First International Conference, AVBPA '97, Crans-Montana, Switzerland, March 12-14, 1997 : proceedings. Berlin: Springer, 1997.

IEEE Workshop on Automatic Speech Recognition and Understanding (1997 Santa Barbara, Calif.). 1997 IEEE Workshop on Automatic Speech Recognition and Understanding proceedings. Piscataway, NJ: Published under the sponsorship of the IEEE Signal Processing Society, 1997.

Minker, Wolfgang. Speech and human-machine dialog. Boston: Kluwer Academic Publishers, 2004.

Book chapters on the topic "Audio speech recognition"

Sen, Soumya, Anjan Dutta and Nilanjan Dey. "Audio Indexing". In Audio Processing and Speech Recognition, 1–11. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_1.

Sen, Soumya, Anjan Dutta and Nilanjan Dey. "Audio Classification". In Audio Processing and Speech Recognition, 67–93. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_4.

Richter, Michael M., Sheuli Paul, Veton Këpuska and Marius Silaghi. "Audio Signals and Speech Recognition". In Signal Processing and Machine Learning with Applications, 345–68. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-319-45372-9_18.

Luettin, Juergen and Stéphane Dupont. "Continuous audio-visual speech recognition". In Lecture Notes in Computer Science, 657–73. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0054771.

Sen, Soumya, Anjan Dutta and Nilanjan Dey. "Speech Processing and Recognition System". In Audio Processing and Speech Recognition, 13–43. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_2.

Sen, Soumya, Anjan Dutta and Nilanjan Dey. "Feature Extraction". In Audio Processing and Speech Recognition, 45–66. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_3.

Sen, Soumya, Anjan Dutta and Nilanjan Dey. "Conclusion". In Audio Processing and Speech Recognition, 95–96. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_5.

Sethu, Vidhyasaharan, Julien Epps and Eliathamby Ambikairajah. "Speech Based Emotion Recognition". In Speech and Audio Processing for Coding, Enhancement and Recognition, 197–228. New York, NY: Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4939-1456-2_7.

Karpov, Alexey, Alexander Ronzhin, Irina Kipyatkova, Andrey Ronzhin, Vasilisa Verkhodanova, Anton Saveliev and Milos Zelezny. "Bimodal Speech Recognition Fusing Audio-Visual Modalities". In Lecture Notes in Computer Science, 170–79. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-39516-6_16.

Kratt, Jan, Florian Metze, Rainer Stiefelhagen and Alex Waibel. "Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit". In Lecture Notes in Computer Science, 488–95. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-28649-3_60.

Conference proceedings on the topic "Audio speech recognition"

Ko, Tom, Vijayaditya Peddinti, Daniel Povey and Sanjeev Khudanpur. "Audio augmentation for speech recognition". In Interspeech 2015. ISCA, 2015. http://dx.doi.org/10.21437/interspeech.2015-711.

Li, Xinyu, Venkata Chebiyyam and Katrin Kirchhoff. "Speech Audio Super-Resolution for Speech Recognition". In Interspeech 2019. ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-3043.

Palecek, Karel and Josef Chaloupka. "Audio-visual speech recognition in noisy audio environments". In 2013 36th International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2013. http://dx.doi.org/10.1109/tsp.2013.6613979.

Fook, C. Y., M. Hariharan, Sazali Yaacob and AH Adom. "A review: Malay speech recognition and audio visual speech recognition". In 2012 International Conference on Biomedical Engineering (ICoBE). IEEE, 2012. http://dx.doi.org/10.1109/icobe.2012.6179063.

Sinha, Arryan and G. Suseela. "Deep Learning-Based Speech Emotion Recognition". In International Research Conference on IOT, Cloud and Data Science. Switzerland: Trans Tech Publications Ltd, 2023. http://dx.doi.org/10.4028/p-0892re.

Abstract

Speech emotion recognition (SER), as described in this study, uses neural networks to classify the emotion expressed in each utterance. It is centered on the idea that voice tone and pitch frequently reflect the underlying emotion. Speech emotion recognition aids in the classification of elicited emotions. An MLP classifier is used to classify the emotions; it allows flexible learning-rate selection. The RAVDESS dataset (Ryerson Audio-Visual Database of Emotional Speech and Song) will be used. Contrast, MFCC, mel spectrogram frequency, and chroma are some of the features that may be extracted from a given audio input. To facilitate feature extraction from the audio, the dataset will be labelled using decimal encoding. On the input audio samples, precision was found to be 80.28%. Additional testing confirmed this result.

Benhaim, Eric, Hichem Sahbi and Guillaume Vittey. "Continuous visual speech recognition for audio speech enhancement". In ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015. http://dx.doi.org/10.1109/icassp.2015.7178370.

Reikeras, Helge, Ben Herbst, Johan du Preez and Herman Engelbrecht. "Audio-Visual Speech Recognition using SciPy". In Python in Science Conference. SciPy, 2010. http://dx.doi.org/10.25080/majora-92bf1922-010.

Tan, Hao, Chenwei Liu, Yinyu Lyu, Xiao Zhang, Denghui Zhang and Zhaoquan Gu. "Audio Steganography with Speech Recognition System". In 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC). IEEE, 2021. http://dx.doi.org/10.1109/dsc53577.2021.00042.

Narisetty, Chaitanya, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel and Shinji Watanabe. "Joint Speech Recognition and Audio Captioning". In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022. http://dx.doi.org/10.1109/icassp43922.2022.9746601.

Yang, Karren, Dejan Markovic, Steven Krenn, Vasu Agrawal and Alexander Richard. "Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis". In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00805.

Reports on the topic "Audio speech recognition"

STANDARD OBJECT SYSTEMS INC. Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment. Fort Belvoir, VA: Defense Technical Information Center, January 2000. http://dx.doi.org/10.21236/ada373461.

Issues in Data Processing and Relevant Population Selection. OSAC Speaker Recognition Subcommittee, November 2022. http://dx.doi.org/10.29325/osac.tg.0006.

Abstract

In Forensic Automatic Speaker Recognition (FASR), forensic examiners typically compare audio recordings of a speaker whose identity is in question with recordings of known speakers to assist investigators and triers of fact in a legal proceeding. The performance of automated speaker recognition (SR) systems used for this purpose depends largely on the characteristics of the speech samples being compared. Examiners must understand the requirements of specific systems in use as well as the audio characteristics that impact system performance. Mismatch conditions between the known and questioned data samples are of particular importance, but the need for, and impact of, audio pre-processing must also be understood. The data selected for use in a relevant population can also be critical to the performance of the system. This document describes issues that arise in the processing of case data and in the selections of a relevant population for purposes of conducting an examination using a human supervised automatic speaker recognition approach in a forensic context. The document is intended to comply with the Organization of Scientific Area Committees (OSAC) for Forensic Science Technical Guidance Document.
