Academic literature on the topic 'Visual speech recognition'

Author: Grafiati

Published: 4 June 2021

Last updated: 18 February 2022

Contents

Journal articles
Dissertations / Theses
Books
Book chapters
Conference papers

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Visual speech recognition.'

Journal articles on the topic "Visual speech recognition"

1

Beadles, Robert L. "Audio visual speech recognition." Journal of the Acoustical Society of America 87, no. 5 (May 1990): 2274. http://dx.doi.org/10.1121/1.399137.

2

Dupont, S., and J. Luettin. "Audio-visual speech modeling for continuous speech recognition." IEEE Transactions on Multimedia 2, no. 3 (2000): 141–51. http://dx.doi.org/10.1109/6046.865479.

3

Brahme, Aparna, and Umesh Bhadade. "Effect of Various Visual Speech Units on Language Identification Using Visual Speech Recognition." International Journal of Image and Graphics 20, no. 04 (October 2020): 2050029. http://dx.doi.org/10.1142/s0219467820500291.

Abstract:

In this paper, we describe our work on spoken language identification using Visual Speech Recognition (VSR) and analyze how the choice of visual speech unit used to transcribe the visual speech affects language recognition. We propose a new approach of word recognition followed by the word N-gram language model (WRWLM), which uses high-level syntactic features and the word bigram language model for language discrimination. Also, as opposed to the traditional visemic approach, we propose a holistic approach that uses the signature of a whole word, referred to as a “Visual Word”, as the visual speech unit for transcribing visual speech. The results show a Word Recognition Rate (WRR) of 88% and a Language Recognition Rate (LRR) of 94% in speaker-dependent cases, and 58% WRR and 77% LRR in speaker-independent cases, on an English and Marathi digit classification task. The proposed approach is also evaluated on continuous speech input. The results show that a spoken language identification rate of 50% is possible even though the WRR using Visual Speech Recognition is below 10%, using only 1 s of speech. There is also an improvement of about 5% in language discrimination compared to traditional visemic approaches.

4

Elrefaei, Lamiaa A., Tahani Q. Alhassan, and Shefaa S. Omar. "An Arabic Visual Dataset for Visual Speech Recognition." Procedia Computer Science 163 (2019): 400–409. http://dx.doi.org/10.1016/j.procs.2019.12.122.

5

Rosenblum, Lawrence D., Deborah A. Yakel, Naser Baseer, Anjani Panchal, Brynn C. Nodarse, and Ryan P. Niehus. "Visual speech information for face recognition." Perception & Psychophysics 64, no. 2 (February 2002): 220–29. http://dx.doi.org/10.3758/bf03195788.

6

Yu, Dahai, Ovidiu Ghita, Alistair Sutherland, and Paul F. Whelan. "A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition." IPSJ Transactions on Computer Vision and Applications 2 (2010): 25–38. http://dx.doi.org/10.2197/ipsjtcva.2.25.

7

Salama, Elham S., Reda A. El-Khoribi, and Mahmoud E. Shoman. "Audio-Visual Speech Recognition for People with Speech Disorders." International Journal of Computer Applications 96, no. 2 (June 18, 2014): 51–56. http://dx.doi.org/10.5120/16770-6337.

8

Nakadai, Kazuhiro, and Tomoaki Koiwa. "Psychologically-Inspired Audio-Visual Speech Recognition Using Coarse Speech Recognition and Missing Feature Theory." Journal of Robotics and Mechatronics 29, no. 1 (February 20, 2017): 105–13. http://dx.doi.org/10.20965/jrm.2017.p0105.

Abstract:

[Figure: System architecture of AVSR based on missing feature theory and P-V grouping] Audio-visual speech recognition (AVSR) is a promising approach to improving the noise robustness of speech recognition in the real world. For AVSR, the auditory and visual units are the phoneme and viseme, respectively. However, these are often misclassified in the real world because of noisy input. To solve this problem, we propose two psychologically-inspired approaches. One is audio-visual integration based on missing feature theory (MFT) to cope with missing or unreliable audio and visual features for recognition. The other is phoneme and viseme grouping based on coarse-to-fine recognition. Preliminary experiments show that these two approaches are effective for audio-visual speech recognition. Integration based on MFT with an appropriate weight improves the recognition performance by −5 dB. This is the case even in a noisy environment, in which most speech recognition systems do not work properly. Phoneme and viseme grouping further improved the AVSR performance, particularly at a low signal-to-noise ratio. This work is an extension of our publication “Tomoaki Koiwa et al.: Coarse speech recognition by audio-visual integration based on missing feature theory, IROS 2007, pp.1751-1756, 2007.”

9

Bahal, Akriti. "Advances in Automatic Speech Recognition: From Audio-Only To Audio-Visual Speech Recognition." IOSR Journal of Computer Engineering 5, no. 1 (2012): 31–36. http://dx.doi.org/10.9790/0661-0513136.

10

Seong, Thum Wei, M. Z. Ibrahim, and D. J. Mulvaney. "WADA-W: A Modified WADA SNR Estimator for Audio-Visual Speech Recognition." International Journal of Machine Learning and Computing 9, no. 4 (August 2019): 446–51. http://dx.doi.org/10.18178/ijmlc.2019.9.4.824.

Dissertations / Theses on the topic "Visual speech recognition"

1

Luettin, Juergen. "Visual speech and speaker recognition." Thesis, University of Sheffield, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264432.

2

Miyajima, C., D. Negi, Y. Ninomiya, M. Sano, K. Mori, K. Itou, K. Takeda, and Y. Suenaga. "Audio-Visual Speech Database for Bimodal Speech Recognition." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2005. http://hdl.handle.net/2237/10460.

3

Pachoud, Samuel. "Audio-visual speech and emotion recognition." Thesis, Queen Mary, University of London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.528923.

4

Matthews, Iain. "Features for audio-visual speech recognition." Thesis, University of East Anglia, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266736.

5

Seymour, R. "Audio-visual speech and speaker recognition." Thesis, Queen's University Belfast, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492489.

Abstract:

In this thesis, a number of important issues relating to the use of both audio and video information for speech and speaker recognition are investigated. A comprehensive comparison of different visual feature types is given, including both geometric and image transformation based features. A new geometric based method for feature extraction is described, as well as the novel use of curvelet based features. Different methods for constructing the feature vectors are compared, as well as feature vector sizes and the use of dynamic features. Each feature type is tested against three types of visual noise: compression, blurring and jitter. A novel method of integrating the audio and video information streams called the maximum stream posterior (MSP) is described. This method is tested in both speaker dependent and speaker independent audio-visual speech recognition (AVSR) systems, and is shown to be robust to noise in either the audio or video streams, given no prior knowledge of the noise. This method is then extended to form the maximum weighted stream posterior (MWSP) method. Finally, both the MSP and MWSP are tested in an audio-visual speaker recognition system (AVSpR). Experiments using the XM2VTS database will show that both of these methods can outperform standard methods in terms of recognition accuracy in situations where either stream is corrupted.

6

Rabi, Gihad. "Visual speech recognition by recurrent neural networks." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0010/MQ36169.pdf.

7

Kaucic, Robert August. "Lip tracking for audio-visual speech recognition." Thesis, University of Oxford, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360392.

8

Saeed, Mehreen. "Soft AI methods and visual speech recognition." Thesis, University of Bristol, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.299270.

9

Saenko, Ekaterina. "Articulatory features for robust visual speech recognition." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/28736.

Abstract:

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.
Includes bibliographical references (p. 99-105).
This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic segment. These units are defined heuristically by mapping several visually similar phonemes to one visual phoneme, sometimes referred to as a viseme. However, experimental evidence shows that phonetic models trained from visual data are not synchronous in time with acoustic phonetic models, indicating that visemes may not be the most natural building blocks of visual speech. Instead, we propose to model the visual signal in terms of the underlying articulatory features. This approach is a natural extension of feature-based modeling of acoustic speech, which has been shown to increase robustness of audio-based speech recognition systems. We start by exploring ways of defining visual articulatory features: first in a data-driven manner, using a large, multi-speaker visual speech corpus, and then in a knowledge-driven manner, using the rules of speech production. Based on these studies, we propose a set of articulatory features, and describe a computational framework for feature-based visual speech recognition. Multiple feature streams are detected in the input image sequence using Support Vector Machines, and then incorporated in a Dynamic Bayesian Network to obtain the final word hypothesis. Preliminary experiments show that our approach increases viseme classification rates in visually noisy conditions, and improves visual word recognition through feature-based context modeling.

10

Pass, A. R. "Towards pose invariant visual speech processing." Thesis, Queen's University Belfast, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.580170.

Books on the topic "Visual speech recognition"

1

Liew, Alan Wee-Chung. Visual speech recognition: Lip segmentation and mapping. Edited by Wang Shilin. Hershey PA: Medical Information Science Reference, 2009.

2

Hornegger, Joachim, ed. Pattern recognition and image processing in C++. Wiesbaden: Vieweg, 1995.

3

Windows speech recognition programming: With Visual Basic and ActiveX voice controls ; exploring Speech API (SAPI) & Software Developer Kit (SDK) for voice input & output enabling of Windows applications. New York: IUniverse, Inc., 2004.

4

Cranton, Wayne, and Mark Fihn, eds. Handbook of Visual Display Technology. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.

5

Lip Tracking for Audio-Visual Speech Recognition. Storming Media, 1997.

6

Schwab, Eileen C., and Howard C. Nusbaum, eds. Pattern recognition by humans and machines. Orlando, Fla: Academic Press, 1986.

7

Hornegger, Joachim, and Dietrich W. R. Paulus. Pattern Recognition and Image Processing in C++ (Vieweg Advanced Studies in Computer Science). Friedrich Vieweg & Sohn Verlag, 1995.

8

Integrating Face and Voice in Person Perception. Springer, 2012.

9

Cranton, Wayne, Janglin Chen, and Mark Fihn. Handbook of Visual Display Technology. Springer, 2016.

10

Cranton, Wayne, Janglin Chen, and Mark Fihn. Handbook of Visual Display Technology. Springer, 2012.

Book chapters on the topic "Visual speech recognition"

1

Luettin, Juergen, and Stéphane Dupont. "Continuous audio-visual speech recognition." In Lecture Notes in Computer Science, 657–73. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0054771.

2

Mahadevaswamy, U. B., M. Shashank Rao, S. Vrushab, C. Anagha, and V. Sangameshwar. "Visual Speech Processing and Recognition." In Advances in Intelligent Systems and Computing, 481–91. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-3383-9_44.

3

Wachsmuth, Sven, Gernot A. Fink, Franz Kümmert, and Gerhard Sagerer. "Using Speech in Visual Object Recognition." In Informatik aktuell, 428–35. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. http://dx.doi.org/10.1007/978-3-642-59802-9_54.

4

Gupta, Deepika, Preety Singh, V. Laxmi, and Manoj S. Gaur. "Boundary Descriptors for Visual Speech Recognition." In Computer and Information Sciences II, 307–13. London: Springer London, 2011. http://dx.doi.org/10.1007/978-1-4471-2155-8_39.

5

Yu, Dahai, Ovidiu Ghita, Alistair Sutherland, and Paul F. Whelan. "A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition." In Advances in Image and Video Technology, 398–409. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-540-92957-4_35.

6

Kratt, Jan, Florian Metze, Rainer Stiefelhagen, and Alex Waibel. "Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit." In Lecture Notes in Computer Science, 488–95. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-28649-3_60.

7

Ivanko, Denis, Dmitry Ryumin, Alexandr Axyonov, and Miloš Železný. "Designing Advanced Geometric Features for Automatic Russian Visual Speech Recognition." In Speech and Computer, 245–54. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99579-3_26.

8

Anwar, M. A., Jim F. Baldwin, and Trevor P. Martin. "Learning Fuzzy Rules for Visual Speech Recognition." In Adaptive Multimedia Retrieval, 164–75. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-25981-7_11.

9

Karpov, Alexey, Alexander Ronzhin, Irina Kipyatkova, Andrey Ronzhin, Vasilisa Verkhodanova, Anton Saveliev, and Milos Zelezny. "Bimodal Speech Recognition Fusing Audio-Visual Modalities." In Lecture Notes in Computer Science, 170–79. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-39516-6_16.

10

Singh, Preety, Vijay Laxmi, and Manoj Singh Gaur. "Visual Speech Recognition with Selected Boundary Descriptors." In Image Feature Detectors and Descriptors, 367–83. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-28854-3_14.

Conference papers on the topic "Visual speech recognition"

1

Devi, Sulochana, Siddhi Chokshi, Kritika Kotian, and Juili Warwatkar. "Visual Speech Recognition." In 2021 4th Biennial International Conference on Nascent Technologies in Engineering (ICNTE). IEEE, 2021. http://dx.doi.org/10.1109/icnte51185.2021.9487784.

2

Shillingford, Brendan, Yannis Assael, Matthew W. Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, et al. "Large-Scale Visual Speech Recognition." In Interspeech 2019. ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-1669.

3

Fook, C. Y., M. Hariharan, Sazali Yaacob, and AH Adom. "A review: Malay speech recognition and audio visual speech recognition." In 2012 International Conference on Biomedical Engineering (ICoBE). IEEE, 2012. http://dx.doi.org/10.1109/icobe.2012.6179063.

4

Galatas, Georgios, Gerasimos Potamianos, Alexandros Papangelis, and Fillia Makedon. "Audio visual speech recognition in noisy visual environments." In the 4th International Conference. New York, New York, USA: ACM Press, 2011. http://dx.doi.org/10.1145/2141622.2141646.

5

Zhang, X., R. M. Mersereau, M. Clements, and C. C. Broun. "Visual speech feature extraction for improved speech recognition." In Proceedings of ICASSP '02. IEEE, 2002. http://dx.doi.org/10.1109/icassp.2002.5745022.

6

Zhang, Mersereau, Clements, and Broun. "Visual speech feature extraction for improved speech recognition." In IEEE International Conference on Acoustics Speech and Signal Processing ICASSP-02. IEEE, 2002. http://dx.doi.org/10.1109/icassp.2002.1006162.

7

Benhaim, Eric, Hichem Sahbi, and Guillaume Vittey. "Continuous visual speech recognition for audio speech enhancement." In ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015. http://dx.doi.org/10.1109/icassp.2015.7178370.

8

Silsbee, Peter L., and Alan C. Bovik. "Audio-visual speech recognition for a vowel discrimination task." In Visual Communications '93, edited by Barry G. Haskell and Hsueh-Ming Hang. SPIE, 1993. http://dx.doi.org/10.1117/12.157855.

9

Reikeras, Helge, Ben Herbst, Johan du Preez, and Herman Engelbrecht. "Audio-Visual Speech Recognition using SciPy." In Python in Science Conference. SciPy, 2010. http://dx.doi.org/10.25080/majora-92bf1922-010.

10

Frisky, Aufaclav Zatu Kusuma, Chien-Yao Wang, Andri Santoso, and Jia-Ching Wang. "Lip-based visual speech recognition system." In 2015 International Carnahan Conference on Security Technology (ICCST). IEEE, 2015. http://dx.doi.org/10.1109/ccst.2015.7389703.
