A ready-made bibliography on the topic "Audio speech recognition"

Create a correct reference in APA, MLA, Chicago, Harvard, and many other citation styles.

Consult the lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Audio speech recognition".

Next to every work in the bibliography there is an "Add to bibliography" button. Use it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a ".pdf" file and read its abstract online, when the corresponding details are available in the source metadata.

Journal articles on the topic "Audio speech recognition"

1. Beadles, Robert L. "Audio visual speech recognition". Journal of the Acoustical Society of America 87, no. 5 (May 1990): 2274. http://dx.doi.org/10.1121/1.399137.

2. Bahal, Akriti. "Advances in Automatic Speech Recognition: From Audio-Only To Audio-Visual Speech Recognition". IOSR Journal of Computer Engineering 5, no. 1 (2012): 31–36. http://dx.doi.org/10.9790/0661-0513136.

3. Hwang, Jung-Wook, Jeongkyun Park, Rae-Hong Park, and Hyung-Min Park. "Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition". Applied Acoustics 211 (August 2023): 109478. http://dx.doi.org/10.1016/j.apacoust.2023.109478.

4. Nakadai, Kazuhiro, and Tomoaki Koiwa. "Psychologically-Inspired Audio-Visual Speech Recognition Using Coarse Speech Recognition and Missing Feature Theory". Journal of Robotics and Mechatronics 29, no. 1 (February 20, 2017): 105–13. http://dx.doi.org/10.20965/jrm.2017.p0105.

Abstract:
[Figure: system architecture of AVSR based on missing feature theory and P-V grouping.] Audio-visual speech recognition (AVSR) is a promising approach to improving the noise robustness of speech recognition in the real world. For AVSR, the auditory and visual units are the phoneme and viseme, respectively. However, these are often misclassified in the real world because of noisy input. To solve this problem, we propose two psychologically-inspired approaches. One is audio-visual integration based on missing feature theory (MFT) to cope with missing or unreliable audio and visual features for recognition. The other is phoneme and viseme grouping based on coarse-to-fine recognition. Preliminary experiments show that these two approaches are effective for audio-visual speech recognition. Integration based on MFT with an appropriate weight improves the recognition performance by −5 dB. This is the case even in a noisy environment, in which most speech recognition systems do not work properly. Phoneme and viseme grouping further improved the AVSR performance, particularly at a low signal-to-noise ratio. This work is an extension of the publication "Tomoaki Koiwa et al.: Coarse speech recognition by audio-visual integration based on missing feature theory, IROS 2007, pp. 1751-1756, 2007."

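To make the stream integration described in this abstract more concrete, here is a loose, illustrative Python sketch (not the authors' MFT implementation) of reliability-weighted fusion of audio and visual log-likelihoods; the stream weight and the per-frame reliability masks are made-up example values.

```python
import numpy as np

def fuse_streams(audio_loglik, video_loglik, audio_rel, video_rel, w=0.7):
    """Combine per-frame, per-class log-likelihoods from two streams,
    scaling each stream by a reliability mask in [0, 1] (1 = reliable,
    0 = treated as missing) and by a static stream weight w."""
    a = w * audio_rel[:, None] * audio_loglik
    v = (1.0 - w) * video_rel[:, None] * video_loglik
    return a + v

# Toy example: 3 frames, 4 candidate phoneme/viseme classes.
rng = np.random.default_rng(0)
audio = np.log(rng.dirichlet(np.ones(4), size=3))
video = np.log(rng.dirichlet(np.ones(4), size=3))
audio_rel = np.array([1.0, 0.3, 0.0])  # third frame: audio judged missing
video_rel = np.ones(3)

fused = fuse_streams(audio, video, audio_rel, video_rel)
print(fused.argmax(axis=1))            # most likely class per frame
```

A real MFT-based system would estimate the reliability masks from the data (for example from local signal-to-noise estimates) rather than fix them by hand as above.
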
5. BASYSTIUK, Oleh, and Nataliia MELNYKOVA. "MULTIMODAL SPEECH RECOGNITION BASED ON AUDIO AND TEXT DATA". Herald of Khmelnytskyi National University. Technical sciences 313, no. 5 (October 27, 2022): 22–25. http://dx.doi.org/10.31891/2307-5732-2022-313-5-22-25.

Abstract:
Systems of machine translation of texts from one language to another simulate the work of a human translator. Their performance depends on the ability to understand the grammar rules of the language. In translation, the basic units are not individual words but word combinations or phraseological units that express different concepts. Only by using them can more complex ideas be expressed through the translated text. The main feature of machine translation is that the input and output have different lengths. The ability to work with inputs and outputs of different lengths is provided by recurrent neural networks. A recurrent neural network (RNN) is a class of artificial neural network that has connections between nodes, where a connection runs from a more distant node to a less distant one. These connections allow the RNN to remember and reproduce an entire sequence of reactions to one stimulus. From the point of view of programming, such networks are analogous to cyclic execution, and from the point of view of the system, such networks are equivalent to a state machine. RNNs are commonly used to process word sequences in natural language processing. Usually, a hidden Markov model (HMM) and an N-gram language model are used to process a sequence of words. Deep learning has completely changed the approach to machine translation; researchers in the deep learning field have created simple solutions based on machine learning that outperform the best expert systems. This paper reviews the main features of machine translation based on recurrent neural networks. The advantages of systems based on RNNs using the sequence-to-sequence model over statistical translation systems are also highlighted. Two machine translation systems based on the sequence-to-sequence model were constructed using the Keras and PyTorch machine learning libraries. Based on the obtained results, the libraries were analysed and their performance compared.

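As an illustration of the sequence-to-sequence model mentioned in the abstract, the sketch below builds a minimal LSTM encoder-decoder in Keras; the vocabulary sizes and layer widths are arbitrary toy values, not the authors' configuration.

```python
# A minimal encoder-decoder ("sequence-to-sequence") sketch in Keras.
from tensorflow import keras
from tensorflow.keras import layers

src_vocab, tgt_vocab, emb_dim, hidden = 5000, 6000, 128, 256

# Encoder: embeds the source sentence and summarizes it in the final LSTM state.
enc_in = keras.Input(shape=(None,), name="source_tokens")
enc_emb = layers.Embedding(src_vocab, emb_dim, mask_zero=True)(enc_in)
_, state_h, state_c = layers.LSTM(hidden, return_state=True)(enc_emb)

# Decoder: generates the target sentence conditioned on the encoder state.
dec_in = keras.Input(shape=(None,), name="target_tokens")
dec_emb = layers.Embedding(tgt_vocab, emb_dim, mask_zero=True)(dec_in)
dec_out, _, _ = layers.LSTM(hidden, return_sequences=True,
                            return_state=True)(dec_emb,
                                               initial_state=[state_h, state_c])
logits = layers.Dense(tgt_vocab, activation="softmax")(dec_out)

model = keras.Model([enc_in, dec_in], logits)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

Training such a model uses teacher forcing: the decoder receives the target sentence shifted by one position and learns to predict the next token, which is the standard recipe for sequence-to-sequence translation.
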
6. Dupont, S., and J. Luettin. "Audio-visual speech modeling for continuous speech recognition". IEEE Transactions on Multimedia 2, no. 3 (2000): 141–51. http://dx.doi.org/10.1109/6046.865479.

7. Kubanek, M., J. Bobulski, and L. Adrjanowicz. "Characteristics of the use of coupled hidden Markov models for audio-visual polish speech recognition". Bulletin of the Polish Academy of Sciences: Technical Sciences 60, no. 2 (October 1, 2012): 307–16. http://dx.doi.org/10.2478/v10175-012-0041-6.

Abstract:
This paper focuses on combining audio-visual signals for Polish speech recognition in conditions of a highly disturbed audio speech signal. Recognition of audio-visual speech was based on combined hidden Markov models (CHMM). The described methods were developed for a single isolated command; nevertheless, their effectiveness indicated that they would also work similarly in continuous audio-visual speech recognition. The problem of visual speech analysis is very difficult and computationally demanding, mostly because of the extreme amount of data that needs to be processed. Therefore, the method of audio-video speech recognition is used only while the audio speech signal is exposed to a considerable level of distortion. The authors' own methods of lip edge detection and visual feature extraction are proposed in this paper. Moreover, a method of fusing speech characteristics for an audio-video signal was proposed and tested. A significant increase in recognition effectiveness and processing speed was noted during tests, given properly selected CHMM parameters and an adequate codebook size, together with the appropriate fusion of audio-visual characteristics. The experimental results were very promising and close to those achieved by leading scientists in the field of audio-visual speech recognition.

8. Kacur, Juraj, Boris Puterka, Jarmila Pavlovicova, and Milos Oravec. "Frequency, Time, Representation and Modeling Aspects for Major Speech and Audio Processing Applications". Sensors 22, no. 16 (August 22, 2022): 6304. http://dx.doi.org/10.3390/s22166304.

Abstract:
There are many speech and audio processing applications and their number is growing. They may cover a wide range of tasks, each having different requirements on the processed speech or audio signals and, therefore, indirectly, on the audio sensors as well. This article reports on tests and evaluation of the effect of basic physical properties of speech and audio signals on the recognition accuracy of major speech/audio processing applications, i.e., speech recognition, speaker recognition, speech emotion recognition, and audio event recognition. A particular focus is on frequency ranges, time intervals, the precision of representation (quantization), and complexities of models suitable for each class of applications. Using domain-specific datasets, eligible feature extraction methods and complex neural network models, it was possible to test and evaluate the effect of basic speech and audio signal properties on the achieved accuracies for each group of applications. The tests confirmed that the basic parameters do affect the overall performance and, moreover, this effect is domain-dependent. Therefore, accurate knowledge of the extent of these effects can be valuable for system designers when selecting appropriate hardware, sensors, architecture, and software for a particular application, especially in the case of limited resources.

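For a concrete sense of the signal properties examined here, the following hypothetical Python sketch (not taken from the paper) band-limits a test signal and re-quantizes it to a coarser bit depth, the kind of degradation whose effect on recognition accuracy such a study measures; the filter order, band edges, and bit depth are arbitrary example values.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_limit(x, sr, lo_hz, hi_hz):
    """Keep only the lo_hz..hi_hz band of signal x sampled at sr."""
    sos = butter(8, [lo_hz, hi_hz], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, x)

def requantize(x, bits):
    """Re-quantize a float signal in [-1, 1] to the given bit depth."""
    levels = 2 ** (bits - 1)
    return np.round(np.clip(x, -1.0, 1.0) * levels) / levels

sr = 16000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)

narrowband = band_limit(x, sr, 300, 3400)   # telephone-like frequency range
coarse = requantize(narrowband, bits=8)     # 8-bit instead of 16-bit samples
print(coarse.shape, float(np.abs(coarse).max()))
```

Running the original and the degraded versions of the same recordings through the same recognizer is then one way to quantify how much each signal property matters for a given task.
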
9. Dar, Showkat Ahmad. "Emotion Recognition Based On Audio Speech". IOSR Journal of Computer Engineering 11, no. 6 (2013): 46–50. http://dx.doi.org/10.9790/0661-1164650.

10. Aucouturier, Jean-Julien, and Laurent Daudet. "Pattern recognition of non-speech audio". Pattern Recognition Letters 31, no. 12 (September 2010): 1487–88. http://dx.doi.org/10.1016/j.patrec.2010.05.003.

Doctoral dissertations on the topic "Audio speech recognition"

1. Miyajima, C., D. Negi, Y. Ninomiya, M. Sano, K. Mori, K. Itou, K. Takeda, and Y. Suenaga. "Audio-Visual Speech Database for Bimodal Speech Recognition". INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2005. http://hdl.handle.net/2237/10460.

2. Seymour, R. "Audio-visual speech and speaker recognition". Thesis, Queen's University Belfast, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492489.

Abstract:
In this thesis, a number of important issues relating to the use of both audio and video information for speech and speaker recognition are investigated. A comprehensive comparison of different visual feature types is given, including both geometric and image transformation based features. A new geometric based method for feature extraction is described, as well as the novel use of curvelet based features. Different methods for constructing the feature vectors are compared, as well as feature vector sizes and the use of dynamic features. Each feature type is tested against three types of visual noise: compression, blurring and jitter. A novel method of integrating the audio and video information streams called the maximum stream posterior (MSP) is described. This method is tested in both speaker dependent and speaker independent audio-visual speech recognition (AVSR) systems, and is shown to be robust to noise in either the audio or video streams, given no prior knowledge of the noise. This method is then extended to form the maximum weighted stream posterior (MWSP) method. Finally, both the MSP and MWSP are tested in an audio-visual speaker recognition system (AVSpR). Experiments using the XM2VTS database show that both of these methods can outperform standard methods in terms of recognition accuracy in situations where either stream is corrupted.

3. Pachoud, Samuel. "Audio-visual speech and emotion recognition". Thesis, Queen Mary, University of London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.528923.

4. Matthews, Iain. "Features for audio-visual speech recognition". Thesis, University of East Anglia, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266736.

5. Kaucic, Robert August. "Lip tracking for audio-visual speech recognition". Thesis, University of Oxford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360392.

6. Lucey, Simon. "Audio-visual speech processing". Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated word speech and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.

7. Eriksson, Mattias. "Speech recognition availability". Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2651.

Abstract:
This project investigates the importance of availability in the scope of dictation programs. Using speech recognition technology for dictating has not reached the public, and that may very well be a result of poor availability in today's technical solutions.

I have constructed a persona character, Johanna, who personalizes the target user. I have also developed a solution that streams audio into a speech recognition server and sends back interpreted text. Johanna affirmed that the solution was successful in theory.

I then incorporated test users who tried out the solution in practice. Half of them do indeed claim that their usage has been and will continue to be increased thanks to the new level of availability.

8. Rao, Ram Raghavendra. "Audio-visual interaction in multimedia". Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.

9. Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing". Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM. Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach is shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.

Books on the topic "Audio speech recognition"

1. Sen, Soumya, Anjan Dutta, and Nilanjan Dey. Audio Processing and Speech Recognition. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5.

2. Ogunfunmi, Tokunbo, Roberto Togneri, and Madihally Narasimha, eds. Speech and Audio Processing for Coding, Enhancement and Recognition. New York, NY: Springer New York, 2015. http://dx.doi.org/10.1007/978-1-4939-1456-2.

3. Junqua, Jean-Claude. Robustness in automatic speech recognition: Fundamentals and applications. Boston: Kluwer Academic Publishers, 1996.

4. Harrison, Mark. The use of interactive audio and speech recognition techniques in training. [U.K.]: [s.n.], 1993.

5. AVBPA '97 (1st: 1997: Montana, Switzerland). Audio- and video-based biometric person authentication: First International Conference, AVBPA '97, Crans-Montana, Switzerland, March 1997: proceedings. Berlin: Springer, 1997.

6. Kittler, Josef, and Mark S. Nixon, eds. Audio- and video-based biometric person authentication: 4th International Conference, AVBPA 2003, Guildford, UK, June 2003: proceedings. Berlin: Springer, 2003.

7. International Conference AVBPA (1st: 1997: Montana, Switzerland). Audio- and video-based biometric person authentication: First International Conference, AVBPA '97, Crans-Montana, Switzerland, March 12-14, 1997: proceedings. Berlin: Springer, 1997.

8. IEEE Workshop on Automatic Speech Recognition and Understanding (1997: Santa Barbara, Calif.). 1997 IEEE Workshop on Automatic Speech Recognition and Understanding proceedings. Piscataway, NJ: Published under the sponsorship of the IEEE Signal Processing Society, 1997.

9. Minker, Wolfgang. Speech and human-machine dialog. Boston: Kluwer Academic Publishers, 2004.

Book chapters on the topic "Audio speech recognition"

1. Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Audio Indexing". In Audio Processing and Speech Recognition, 1–11. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_1.

2. Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Audio Classification". In Audio Processing and Speech Recognition, 67–93. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_4.

3. Richter, Michael M., Sheuli Paul, Veton Këpuska, and Marius Silaghi. "Audio Signals and Speech Recognition". In Signal Processing and Machine Learning with Applications, 345–68. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-319-45372-9_18.

4. Luettin, Juergen, and Stéphane Dupont. "Continuous audio-visual speech recognition". In Lecture Notes in Computer Science, 657–73. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0054771.

5. Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Speech Processing and Recognition System". In Audio Processing and Speech Recognition, 13–43. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_2.

6. Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Feature Extraction". In Audio Processing and Speech Recognition, 45–66. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_3.

7. Sen, Soumya, Anjan Dutta, and Nilanjan Dey. "Conclusion". In Audio Processing and Speech Recognition, 95–96. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6098-5_5.

8. Sethu, Vidhyasaharan, Julien Epps, and Eliathamby Ambikairajah. "Speech Based Emotion Recognition". In Speech and Audio Processing for Coding, Enhancement and Recognition, 197–228. New York, NY: Springer New York, 2014. http://dx.doi.org/10.1007/978-1-4939-1456-2_7.

9. Karpov, Alexey, Alexander Ronzhin, Irina Kipyatkova, Andrey Ronzhin, Vasilisa Verkhodanova, Anton Saveliev, and Milos Zelezny. "Bimodal Speech Recognition Fusing Audio-Visual Modalities". In Lecture Notes in Computer Science, 170–79. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-39516-6_16.

10. Kratt, Jan, Florian Metze, Rainer Stiefelhagen, and Alex Waibel. "Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit". In Lecture Notes in Computer Science, 488–95. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-28649-3_60.

Conference papers on the topic "Audio speech recognition"

1. Ko, Tom, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. "Audio augmentation for speech recognition". In Interspeech 2015. ISCA: ISCA, 2015. http://dx.doi.org/10.21437/interspeech.2015-711.

2. Li, Xinyu, Venkata Chebiyyam, and Katrin Kirchhoff. "Speech Audio Super-Resolution for Speech Recognition". In Interspeech 2019. ISCA: ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-3043.

3. Palecek, Karel, and Josef Chaloupka. "Audio-visual speech recognition in noisy audio environments". In 2013 36th International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2013. http://dx.doi.org/10.1109/tsp.2013.6613979.

4. Fook, C. Y., M. Hariharan, Sazali Yaacob, and AH Adom. "A review: Malay speech recognition and audio visual speech recognition". In 2012 International Conference on Biomedical Engineering (ICoBE). IEEE, 2012. http://dx.doi.org/10.1109/icobe.2012.6179063.

5. Sinha, Arryan, and G. Suseela. "Deep Learning-Based Speech Emotion Recognition". In International Research Conference on IOT, Cloud and Data Science. Switzerland: Trans Tech Publications Ltd, 2023. http://dx.doi.org/10.4028/p-0892re.

Abstract:
Speech emotion recognition (SER), as described in this study, uses neural networks to classify the emotions expressed in each speech sample. It is centered on the idea that voice tone and pitch frequently reflect the underlying emotion, and it aids in the classification of elicited emotions. The MLP classifier is used to classify the emotions in an utterance represented as a wave signal, allowing for flexible learning-rate selection. The RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset will be used. To extract characteristics from a particular audio input, spectral contrast, MFCC, mel-spectrogram frequencies, and chroma are some of the features that may be employed. To facilitate feature extraction from the audio, the dataset will be labelled using decimal encoding. Using the input audio samples, the precision was found to be 80.28%. Additional testing confirmed this result.

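A rough sketch of the feature pipeline this abstract outlines is given below, assuming librosa for MFCC, chroma, mel-spectrogram, and spectral-contrast features and scikit-learn's MLPClassifier; the file names, labels, and network size are placeholders rather than the paper's actual setup.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def extract_features(path):
    """Return one fixed-length feature vector for an audio file."""
    y, sr = librosa.load(path, sr=None)
    stft = np.abs(librosa.stft(y))
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr), axis=1)
    return np.concatenate([mfcc, chroma, mel, contrast])

# Placeholder file list and emotion labels.
files = ["happy_01.wav", "sad_01.wav"]
labels = ["happy", "sad"]

X = np.array([extract_features(f) for f in files])
clf = MLPClassifier(hidden_layer_sizes=(300,), learning_rate="adaptive",
                    max_iter=500)
clf.fit(X, labels)
print(clf.predict(X))
```

In the RAVDESS corpus the emotion label is encoded in each file name, so the placeholder labels above would normally be parsed from the names rather than listed by hand.
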
6. Benhaim, Eric, Hichem Sahbi, and Guillaume Vittey. "Continuous visual speech recognition for audio speech enhancement". In ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015. http://dx.doi.org/10.1109/icassp.2015.7178370.

7. Reikeras, Helge, Ben Herbst, Johan du Preez, and Herman Engelbrecht. "Audio-Visual Speech Recognition using SciPy". In Python in Science Conference. SciPy, 2010. http://dx.doi.org/10.25080/majora-92bf1922-010.

8. Tan, Hao, Chenwei Liu, Yinyu Lyu, Xiao Zhang, Denghui Zhang, and Zhaoquan Gu. "Audio Steganography with Speech Recognition System". In 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC). IEEE, 2021. http://dx.doi.org/10.1109/dsc53577.2021.00042.

9. Narisetty, Chaitanya, Emiru Tsunoo, Xuankai Chang, Yosuke Kashiwagi, Michael Hentschel, and Shinji Watanabe. "Joint Speech Recognition and Audio Captioning". In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022. http://dx.doi.org/10.1109/icassp43922.2022.9746601.

10. Yang, Karren, Dejan Markovic, Steven Krenn, Vasu Agrawal, and Alexander Richard. "Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis". In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00805.

Reports on the topic "Audio speech recognition"

1. STANDARD OBJECT SYSTEMS INC. Advanced Audio Interface for Phonetic Speech Recognition in a High Noise Environment. Fort Belvoir, VA: Defense Technical Information Center, January 2000. http://dx.doi.org/10.21236/ada373461.

2. Issues in Data Processing and Relevant Population Selection. OSAC Speaker Recognition Subcommittee, November 2022. http://dx.doi.org/10.29325/osac.tg.0006.

Abstract:
In Forensic Automatic Speaker Recognition (FASR), forensic examiners typically compare audio recordings of a speaker whose identity is in question with recordings of known speakers to assist investigators and triers of fact in a legal proceeding. The performance of automated speaker recognition (SR) systems used for this purpose depends largely on the characteristics of the speech samples being compared. Examiners must understand the requirements of specific systems in use as well as the audio characteristics that impact system performance. Mismatch conditions between the known and questioned data samples are of particular importance, but the need for, and impact of, audio pre-processing must also be understood. The data selected for use in a relevant population can also be critical to the performance of the system. This document describes issues that arise in the processing of case data and in the selections of a relevant population for purposes of conducting an examination using a human supervised automatic speaker recognition approach in a forensic context. The document is intended to comply with the Organization of Scientific Area Committees (OSAC) for Forensic Science Technical Guidance Document.