Academic literature on the topic "Audio speech recognition"

Create a correct reference in APA, MLA, Chicago, Harvard, and several other styles

Select a source:

Consult the thematic lists of journal articles, books, theses, conference proceedings, and other scholarly sources on the topic "Audio speech recognition."

Next to each source in the list of references there is an "Add to bibliography" button. Click on this button, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever this information is included in the metadata.

Journal articles on the topic "Audio speech recognition"

1

Beadles, Robert L. "Audio visual speech recognition." Journal of the Acoustical Society of America 87, no. 5 (May 1990): 2274. http://dx.doi.org/10.1121/1.399137.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bahal, Akriti. "Advances in Automatic Speech Recognition: From Audio-Only to Audio-Visual Speech Recognition." IOSR Journal of Computer Engineering 5, no. 1 (2012): 31–36. http://dx.doi.org/10.9790/0661-0513136.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Hwang, Jung-Wook, Jeongkyun Park, Rae-Hong Park, and Hyung-Min Park. "Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition." Applied Acoustics 211 (August 2023): 109478. http://dx.doi.org/10.1016/j.apacoust.2023.109478.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Nakadai, Kazuhiro, and Tomoaki Koiwa. "Psychologically-Inspired Audio-Visual Speech Recognition Using Coarse Speech Recognition and Missing Feature Theory." Journal of Robotics and Mechatronics 29, no. 1 (February 20, 2017): 105–13. http://dx.doi.org/10.20965/jrm.2017.p0105.

Full text
Abstract:
[Figure: System architecture of AVSR based on missing feature theory and P-V grouping] Audio-visual speech recognition (AVSR) is a promising approach to improving the noise robustness of speech recognition in the real world. For AVSR, the auditory and visual units are the phoneme and viseme, respectively. However, these are often misclassified in the real world because of noisy input. To solve this problem, we propose two psychologically-inspired approaches. One is audio-visual integration based on missing feature theory (MFT) to cope with missing or unreliable audio and visual features for recognition. The other is phoneme and viseme grouping based on coarse-to-fine recognition. Preliminary experiments show that these two approaches are effective for audio-visual speech recognition. Integration based on MFT with an appropriate weight improves the recognition performance by −5 dB. This is the case even in a noisy environment, in which most speech recognition systems do not work properly. Phoneme and viseme grouping further improved the AVSR performance, particularly at a low signal-to-noise ratio. This work is an extension of the publication "Tomoaki Koiwa et al.: Coarse speech recognition by audio-visual integration based on missing feature theory, IROS 2007, pp. 1751-1756, 2007."
APA, Harvard, Vancouver, ISO, and other styles
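
The key mechanism in MFT-based integration is a reliability weight on each stream. As a purely illustrative Python sketch (not taken from the paper; the weight value, array shapes, and scores below are invented for the example), per-frame audio and visual log-likelihoods can be fused log-linearly:

```python
import numpy as np

def fuse_streams(log_p_audio, log_p_video, audio_weight):
    # Log-linear fusion: each stream's per-frame class log-likelihoods are
    # weighted by its assumed reliability. Lowering audio_weight as acoustic
    # noise grows mirrors the role of the MFT-based weighting in the paper.
    return audio_weight * log_p_audio + (1.0 - audio_weight) * log_p_video

# Toy example: 5 frames, 3 candidate units (e.g. phoneme/viseme classes).
rng = np.random.default_rng(0)
log_audio = np.log(rng.dirichlet(np.ones(3), size=5))
log_video = np.log(rng.dirichlet(np.ones(3), size=5))
fused = fuse_streams(log_audio, log_video, audio_weight=0.3)  # noisy audio
print(fused.argmax(axis=1))  # per-frame decisions after integration
```
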
5

Basystiuk, Oleh, and Nataliia Melnykova. "Multimodal Speech Recognition Based on Audio and Text Data." Herald of Khmelnytskyi National University. Technical Sciences 313, no. 5 (October 27, 2022): 22–25. http://dx.doi.org/10.31891/2307-5732-2022-313-5-22-25.

Full text
Abstract:
Machine translation systems, which translate texts from one language to another, simulate the work of a human translator. Their performance depends on the ability to capture the grammar rules of the language. In translation, the basic units are not individual words but word combinations or phraseological units that express different concepts; only by using them can more complex ideas be expressed in the translated text. A defining feature of machine translation is that input and output lengths differ, and recurrent neural networks provide an approach that handles variable-length input and output. A recurrent neural network (RNN) is a class of artificial neural network with connections between nodes, where a connection may run from a more distant node back to a less distant one. These connections allow the RNN to remember and reproduce an entire sequence of reactions to one stimulus. From a programming point of view such networks are analogous to cyclic execution, and from a systems point of view they are equivalent to a state machine. RNNs are commonly used to process word sequences in natural language processing, where a hidden Markov model (HMM) and an n-gram language model have traditionally been used to process a sequence of words. Deep learning has completely changed the approach to machine translation, and researchers in the deep learning field have created simple machine-learning solutions that outperform the best expert systems. This paper reviews the main features of machine translation based on recurrent neural networks and highlights the advantages of RNN systems using the sequence-to-sequence model over statistical translation systems. Two machine translation systems based on the sequence-to-sequence model were built with the Keras and PyTorch machine learning libraries; based on the obtained results, the libraries were analysed and their performance compared.
APA, Harvard, Vancouver, ISO, and other styles
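
As a rough illustration of the sequence-to-sequence pattern the abstract describes (a generic sketch, not the authors' code; the vocabulary sizes, hidden size, and toy batch are arbitrary assumptions), a PyTorch encoder-decoder can be set up as follows:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder: variable-length input, variable-length output."""
    def __init__(self, src_vocab, tgt_vocab, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The encoder compresses the source sentence into a fixed-size state...
        _, state = self.encoder(self.src_emb(src_ids))
        # ...which initialises the decoder (teacher forcing during training).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (2, 7))  # two source sentences, 7 tokens each
tgt = torch.randint(0, 1200, (2, 9))  # two target sentences, 9 tokens each
print(model(src, tgt).shape)          # torch.Size([2, 9, 1200])
```
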
6

Dupont, S., and J. Luettin. "Audio-visual speech modeling for continuous speech recognition." IEEE Transactions on Multimedia 2, no. 3 (2000): 141–51. http://dx.doi.org/10.1109/6046.865479.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Kubanek, M., J. Bobulski, and L. Adrjanowicz. "Characteristics of the use of coupled hidden Markov models for audio-visual Polish speech recognition." Bulletin of the Polish Academy of Sciences: Technical Sciences 60, no. 2 (October 1, 2012): 307–16. http://dx.doi.org/10.2478/v10175-012-0041-6.

Full text
Abstract:
This paper focuses on combining audio-visual signals for Polish speech recognition under conditions of a highly disturbed audio speech signal. Recognition of audio-visual speech was based on coupled hidden Markov models (CHMM). The described methods were developed for single isolated commands; nevertheless, their effectiveness indicates that they would also work similarly in continuous audio-visual speech recognition. Visual speech analysis is very difficult and computationally demanding, mostly because of the extreme amount of data that needs to be processed; therefore, audio-video speech recognition is used only while the audio speech signal is exposed to a considerable level of distortion. The paper proposes the authors' own methods for lip-edge detection and visual feature extraction. Moreover, a method of fusing speech characteristics for an audio-video signal was proposed and tested. A significant increase in recognition effectiveness and processing speed was noted during tests, given properly selected CHMM parameters, an adequate codebook size, and an appropriate fusion of audio-visual characteristics. The experimental results were very promising and close to those achieved by leading researchers in the field of audio-visual speech recognition.
APA, Harvard, Vancouver, ISO, and other styles
8

Kacur, Juraj, Boris Puterka, Jarmila Pavlovicova, and Milos Oravec. "Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications." Sensors 22, no. 16 (August 22, 2022): 6304. http://dx.doi.org/10.3390/s22166304.

Full text
Abstract:
There are many speech and audio processing applications, and their number is growing. They cover a wide range of tasks, each placing different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, the precision of representation (quantization), and the complexity of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods, and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, that this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources.
APA, Harvard, Vancouver, ISO, and other styles
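
Two of the signal properties the article varies, frequency range and quantization precision, are easy to reproduce for experimentation. The following sketch is our illustration, not code from the paper (the test tone and bit depth are arbitrary): it band-limits a signal by downsampling and then requantizes it to an 8-bit grid.

```python
import numpy as np
from scipy.signal import resample_poly

sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # 1 s test tone standing in for speech

# Frequency range: downsampling to 8 kHz caps the usable bandwidth at 4 kHz.
x_8k = resample_poly(x, up=1, down=2)

# Precision of representation: requantize the float signal to an 8-bit grid.
def quantize(signal, bits):
    levels = 2 ** (bits - 1)
    return np.round(signal * levels) / levels

x_q = quantize(x_8k, bits=8)
snr = 10 * np.log10(np.sum(x_8k ** 2) / np.sum((x_8k - x_q) ** 2))
print(f"SNR after 8-bit quantization: {snr:.1f} dB")
```
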
9

Dar, Showkat Ahmad. "Emotion Recognition Based on Audio Speech." IOSR Journal of Computer Engineering 11, no. 6 (2013): 46–50. http://dx.doi.org/10.9790/0661-1164650.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Aucouturier, Jean-Julien, and Laurent Daudet. "Pattern recognition of non-speech audio." Pattern Recognition Letters 31, no. 12 (September 2010): 1487–88. http://dx.doi.org/10.1016/j.patrec.2010.05.003.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Theses on the topic "Audio speech recognition"

1

Miyajima, C., D. Negi, Y. Ninomiya, M. Sano, K. Mori, K. Itou, K. Takeda, and Y. Suenaga. "Audio-Visual Speech Database for Bimodal Speech Recognition." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2005. http://hdl.handle.net/2237/10460.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Seymour, R. "Audio-visual speech and speaker recognition." Thesis, Queen's University Belfast, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492489.

Full text
Abstract:
In this thesis, a number of important issues relating to the use of both audio and video information for speech and speaker recognition are investigated. A comprehensive comparison of different visual feature types is given, including both geometric and image transformation based features. A new geometric based method for feature extraction is described, as well as the novel use of curvelet based features. Different methods for constructing the feature vectors are compared, as well as feature vector sizes and the use of dynamic features. Each feature type is tested against three types of visual noise: compression, blurring and jitter. A novel method of integrating the audio and video information streams called the maximum stream posterior (MSP) is described. This method is tested in both speaker dependent and speaker independent audio-visual speech recognition (AVSR) systems, and is shown to be robust to noise in either the audio or video streams, given no prior knowledge of the noise. This method is then extended to form the maximum weighted stream posterior (MWSP) method. Finally, both the MSP and MWSP are tested in an audio-visual speaker recognition system (AVSpR). Experiments using the XM2VTS database show that both of these methods can outperform standard methods in terms of recognition accuracy in situations where either stream is corrupted.
APA, Harvard, Vancouver, ISO, and other styles
3

Pachoud, Samuel. "Audio-visual speech and emotion recognition." Thesis, Queen Mary, University of London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.528923.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Matthews, Iain. "Features for audio-visual speech recognition." Thesis, University of East Anglia, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266736.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Kaucic, Robert August. "Lip tracking for audio-visual speech recognition." Thesis, University of Oxford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360392.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Lucey, Simon. "Audio-visual speech processing." Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Full text
Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular for the tasks of isolated word speech recognition and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in-depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
APA, Harvard, Vancouver, ISO, and other styles
7

Eriksson, Mattias. "Speech recognition availability." Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2651.

Full text
Abstract:

This project investigates the importance of availability in the scope of dictation programs. Using speech recognition technology for dictation has not reached the general public, and that may very well be a result of the poor availability of today's technical solutions.

I have constructed a persona character, Johanna, who personifies the target user. I have also developed a solution that streams audio to a speech recognition server and sends back the interpreted text. Johanna affirmed that the solution was successful in theory.

I then brought in test users who tried the solution in practice. Half of them indeed report that their usage has increased, and will continue to increase, thanks to the new level of availability.

APA, Harvard, Vancouver, ISO, and other styles
8

Rao, Ram Raghavendra. "Audio-visual interaction in multimedia." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Full text
Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach is shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Audio speech recognition"

1

Sen, Soumya, Anjan Dutta, and Nilanjan Dey. Audio Processing and Speech Recognition. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Ogunfunmi, Tokunbo, Roberto Togneri, and Madihally Narasimha, eds. Speech and Audio Processing for Coding, Enhancement and Recognition. New York, NY: Springer New York, 2015. http://dx.doi.org/10.1007/978-1-4939-1456-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Junqua, Jean-Claude. Robustness in Automatic Speech Recognition: Fundamentals and Applications. Boston: Kluwer Academic Publishers, 1996.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Harrison, Mark. The Use of Interactive Audio and Speech Recognition Techniques in Training. [U.K.]: [s.n.], 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

AVBPA '97 (1st: 1997: Crans-Montana, Switzerland). Audio- and Video-Based Biometric Person Authentication: First International Conference, AVBPA '97, Crans-Montana, Switzerland, March 1997: Proceedings. Berlin: Springer, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kittler, Josef, and Mark S. Nixon, eds. Audio- and Video-Based Biometric Person Authentication: 4th International Conference, AVBPA 2003, Guildford, UK, June 2003: Proceedings. Berlin: Springer, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

International Conference AVBPA (1st: 1997: Crans-Montana, Switzerland). Audio- and Video-Based Biometric Person Authentication: First International Conference, AVBPA '97, Crans-Montana, Switzerland, March 12-14, 1997: Proceedings. Berlin: Springer, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

IEEE Workshop on Automatic Speech Recognition and Understanding (1997: Santa Barbara, Calif.). 1997 IEEE Workshop on Automatic Speech Recognition and Understanding: Proceedings. Piscataway, NJ: Published under the sponsorship of the IEEE Signal Processing Society, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Minker, Wolfgang. Speech and Human-Machine Dialog. Boston: Kluwer Academic Publishers, 2004.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Audio speech recognition"

1

Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Audio Indexing." In Audio Processing and Speech Recognition, 1–11. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Audio Classification." In Audio Processing and Speech Recognition, 67–93. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Richter, Michael M., Sheuli Paul, Veton Këpuska, and Marius Silaghi. "Audio Signals and Speech Recognition." In Signal Processing and Machine Learning with Applications, 345–68. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-319-45372-9_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Luettin, Juergen, and Stéphane Dupont. "Continuous audio-visual speech recognition." In Lecture Notes in Computer Science, 657–73. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0054771.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Speech Processing and Recognition System." In Audio Processing and Speech Recognition, 13–43. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Feature Extraction." In Audio Processing and Speech Recognition, 45–66. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Conclusion." In Audio Processing and Speech Recognition, 95–96. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Sethu, Vidhyasaharan, Julien Epps, and Eliathamby Ambikairajah. "Speech Based Emotion Recognition." In Speech and Audio Processing for Coding, Enhancement and Recognition, 197–228. New York, NY: Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4939-1456-2_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Karpov, Alexey, Alexander Ronzhin, Irina Kipyatkova, Andrey Ronzhin, Vasilisa Verkhodanova, Anton Saveliev, and Milos Zelezny. "Bimodal Speech Recognition Fusing Audio-Visual Modalities." In Lecture Notes in Computer Science, 170–79. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-39516-6_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Kratt, Jan, Florian Metze, Rainer Stiefelhagen, and Alex Waibel. "Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit." In Lecture Notes in Computer Science, 488–95. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-28649-3_60.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Audio speech recognition"

1

Ko, Tom, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. "Audio augmentation for speech recognition." In Interspeech 2015. ISCA: ISCA, 2015. http://dx.doi.org/10.21437/interspeech.2015-711.

Full text
APA, Harvard, Vancouver, ISO, and other styles
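
A widely used form of audio augmentation for speech recognition is speed perturbation: resampling each training utterance at a few rates around the original. The sketch below is an illustrative Python version under that assumption, not the paper's own implementation; the factors 0.9/1.0/1.1 are the conventional choices, and the synthetic tone stands in for a speech waveform.

```python
import numpy as np
import librosa

def speed_perturb(y, sr, factor):
    # Resample the waveform, then reinterpret the result at the original rate:
    # the content plays `factor` times faster and the pitch shifts with it.
    return librosa.resample(y, orig_sr=sr, target_sr=int(sr / factor))

sr = 16000
y = np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr).astype(np.float32)
augmented = {f: speed_perturb(y, sr, f) for f in (0.9, 1.0, 1.1)}
print({f: len(a) for f, a in augmented.items()})  # durations scale as 1/factor
```
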
2

Li, Xinyu, Venkata Chebiyyam, and Katrin Kirchhoff. "Speech Audio Super-Resolution for Speech Recognition." In Interspeech 2019. ISCA: ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-3043.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Palecek, Karel, and Josef Chaloupka. "Audio-visual speech recognition in noisy audio environments." In 2013 36th International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2013. http://dx.doi.org/10.1109/tsp.2013.6613979.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Fook, C. Y., M. Hariharan, Sazali Yaacob, and A. H. Adom. "A review: Malay speech recognition and audio visual speech recognition." In 2012 International Conference on Biomedical Engineering (ICoBE). IEEE, 2012. http://dx.doi.org/10.1109/icobe.2012.6179063.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Sinha, Arryan, and G. Suseela. "Deep Learning-Based Speech Emotion Recognition." In International Research Conference on IOT, Cloud and Data Science. Switzerland: Trans Tech Publications Ltd, 2023. http://dx.doi.org/10.4028/p-0892re.

Full text
Abstract:
Speech emotion recognition (SER), as described in this study, uses neural networks to classify the emotions expressed in speech. It is based on the observation that voice tone and pitch often reflect the underlying emotion, and it aids in the classification of elicited emotions. An MLP classifier is used to classify the emotions in the wave signal, allowing for flexible learning-rate selection. The RAVDESS dataset (Ryerson Audio-Visual Database of Emotional Speech and Song) is used. To extract characteristics from a particular audio input, features such as spectral contrast, MFCC, Mel spectrogram frequencies, and chroma may be employed. To facilitate the extraction of features from the audio scripts, the dataset is labelled using decimal encoding. Using the input audio samples, the precision was found to be 80.28%, and additional testing confirmed this result.
APA, Harvard, Vancouver, ISO, and other styles
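
To make the pipeline concrete, here is a minimal sketch of the kind of approach the abstract outlines, using librosa for the named feature families and scikit-learn's MLPClassifier. It is our illustration, not the authors' code: the synthetic waveforms and the two stand-in emotion labels replace real RAVDESS files.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def extract_features(y, sr):
    # Mean-pool each feature family over time, a common utterance-level recipe.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).mean(axis=1)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr).mean(axis=1)
    mel = librosa.feature.melspectrogram(y=y, sr=sr).mean(axis=1)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr).mean(axis=1)
    return np.concatenate([mfcc, chroma, mel, contrast])

# Synthetic stand-ins; real use would librosa.load() each RAVDESS file and take
# the emotion label encoded in its file name.
sr = 22050
rng = np.random.default_rng(0)
X = np.array([extract_features(rng.standard_normal(sr).astype(np.float32), sr)
              for _ in range(8)])
labels = np.array([0, 1] * 4)  # two stand-in emotion classes
clf = MLPClassifier(hidden_layer_sizes=(256,), solver='sgd',
                    learning_rate='adaptive', max_iter=300)
clf.fit(X, labels)
print(clf.predict(X[:2]))
```
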
6

Benhaim, Eric, Hichem Sahbi, and Guillaume Vittey. "Continuous visual speech recognition for audio speech enhancement." In ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015. http://dx.doi.org/10.1109/icassp.2015.7178370.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Reikeras, Helge, Ben Herbst, Johan du Preez, and Herman Engelbrecht. "Audio-Visual Speech Recognition using SciPy." In Python in Science Conference. SciPy, 2010. http://dx.doi.org/10.25080/majora-92bf1922-010.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Tan, Hao, Chenwei Liu, Yinyu Lyu, Xiao Zhang, Denghui Zhang, and Zhaoquan Gu. "Audio Steganography with Speech Recognition System." In 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC). IEEE, 2021. http://dx.doi.org/10.1109/dsc53577.2021.00042.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Narisetty, Chaitanya, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, and Shinji Watanabe. "Joint Speech Recognition and Audio Captioning." In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022. http://dx.doi.org/10.1109/icassp43922.2022.9746601.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Yang, Karren, Dejan Markovic, Steven Krenn, Vasu Agrawal, and Alexander Richard. "Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00805.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Organization reports on the topic "Audio speech recognition"

1

STANDARD OBJECT SYSTEMS INC. Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment. Fort Belvoir, VA: Defense Technical Information Center, January 2000. http://dx.doi.org/10.21236/ada373461.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Issues in Data Processing and Relevant Population Selection. OSAC Speaker Recognition Subcommittee, November 2022. http://dx.doi.org/10.29325/osac.tg.0006.

Full text
Abstract:
In forensic automatic speaker recognition (FASR), forensic examiners typically compare audio recordings of a speaker whose identity is in question with recordings of known speakers to assist investigators and triers of fact in a legal proceeding. The performance of automated speaker recognition (SR) systems used for this purpose depends largely on the characteristics of the speech samples being compared. Examiners must understand the requirements of the specific systems in use as well as the audio characteristics that impact system performance. Mismatch conditions between the known and questioned data samples are of particular importance, but the need for, and impact of, audio pre-processing must also be understood. The data selected for use in a relevant population can also be critical to the performance of the system. This document describes issues that arise in the processing of case data and in the selection of a relevant population for purposes of conducting an examination using a human-supervised automatic speaker recognition approach in a forensic context. The document is intended to comply with the requirements for an Organization of Scientific Area Committees (OSAC) for Forensic Science Technical Guidance Document.
APA, Harvard, Vancouver, ISO, and other styles