Journal articles on the topic 'Visual speech information'

Consult the top 50 journal articles for your research on the topic 'Visual speech information.'

1. Miller, Rachel M., Kauyumari Sanchez, and Lawrence D. Rosenblum. "Alignment to visual speech information." Attention, Perception, & Psychophysics 72, no. 6 (August 2010): 1614–25. http://dx.doi.org/10.3758/app.72.6.1614.

2. Rosenblum, Lawrence D., Deborah A. Yakel, Naser Baseer, Anjani Panchal, Brynn C. Nodarse, and Ryan P. Niehus. "Visual speech information for face recognition." Perception & Psychophysics 64, no. 2 (February 2002): 220–29. http://dx.doi.org/10.3758/bf03195788.

3. Yakel, Deborah A., and Lawrence D. Rosenblum. "Face identification using visual speech information." Journal of the Acoustical Society of America 100, no. 4 (October 1996): 2570. http://dx.doi.org/10.1121/1.417401.

4. Weinholtz, Chase, and James W. Dias. "Categorical perception of visual speech information." Journal of the Acoustical Society of America 139, no. 4 (April 2016): 2018. http://dx.doi.org/10.1121/1.4949950.

5. Hisanaga, Satoko, Kaoru Sekiyama, Tomohiko Igasaki, and Nobuki Murayama. "Effects of visual information on audio-visual speech processing." Proceedings of the Annual Convention of the Japanese Psychological Association 75 (September 15, 2011): 2AM061. http://dx.doi.org/10.4992/pacjpa.75.0_2am061.

6. Sell, Andrea J., and Michael P. Kaschak. "Does visual speech information affect word segmentation?" Memory & Cognition 37, no. 6 (September 2009): 889–94. http://dx.doi.org/10.3758/mc.37.6.889.

7. Hall, Michael D., Paula M. T. Smeele, and Patricia K. Kuhl. "Integration of auditory and visual speech information." Journal of the Acoustical Society of America 103, no. 5 (May 1998): 2985. http://dx.doi.org/10.1121/1.421677.

8. McGiverin, Rolland. "Speech, Hearing and Visual." Behavioral & Social Sciences Librarian 8, no. 3-4 (April 16, 1990): 73–78. http://dx.doi.org/10.1300/j103v08n03_12.

9. Hollich, George J., Peter W. Jusczyk, and Rochelle S. Newman. "Infants' use of visual information in speech segmentation." Journal of the Acoustical Society of America 110, no. 5 (November 2001): 2703. http://dx.doi.org/10.1121/1.4777318.

10. Tekin, Ender, James Coughlan, and Helen Simon. "Improving speech enhancement algorithms by incorporating visual information." Journal of the Acoustical Society of America 134, no. 5 (November 2013): 4237. http://dx.doi.org/10.1121/1.4831575.

11. Ujiie, Yuta, and Kohske Takahashi. "Weaker McGurk Effect for Rubin’s Vase-Type Speech in People With High Autistic Traits." Multisensory Research 34, no. 6 (April 16, 2021): 663–79. http://dx.doi.org/10.1163/22134808-bja10047.
Abstract:
While visual information from facial speech modulates auditory speech perception, it is less influential on audiovisual speech perception among autistic individuals than among typically developed individuals. In this study, we investigated the relationship between autistic traits (Autism-Spectrum Quotient; AQ) and the influence of visual speech on the recognition of Rubin’s vase-type speech stimuli with degraded facial speech information. Participants were 31 university students (13 males and 18 females; mean age: 19.2, SD: 1.13 years) who reported normal (or corrected-to-normal) hearing and vision. All participants completed three speech recognition tasks (visual, auditory, and audiovisual stimuli) and the AQ–Japanese version. The results showed that accuracies of speech recognition for visual (i.e., lip-reading) and auditory stimuli were not significantly related to participants’ AQ. In contrast, audiovisual speech perception was less susceptible to facial speech perception among individuals with high rather than low autistic traits. The weaker influence of visual information on audiovisual speech perception in autism spectrum disorder (ASD) was robust regardless of the clarity of the visual information, suggesting a difficulty in the process of audiovisual integration rather than in the visual processing of facial speech.

12. Reed, Rebecca K., and Edward T. Auer. "Influence of visual speech information on the identification of foreign accented speech." Journal of the Acoustical Society of America 125, no. 4 (April 2009): 2660. http://dx.doi.org/10.1121/1.4784199.

13. Kim, Jeesun, and Chris Davis. "How visual timing and form information affect speech and non-speech processing." Brain and Language 137 (October 2014): 86–90. http://dx.doi.org/10.1016/j.bandl.2014.07.012.

14. Sams, M. "Audiovisual Speech Perception." Perception 26, no. 1_suppl (August 1997): 347. http://dx.doi.org/10.1068/v970029.
Abstract:
Persons with hearing loss use visual information from articulation to improve their speech perception. Even persons with normal hearing utilise visual information, especially when the stimulus-to-noise ratio is poor. A dramatic demonstration of the role of vision in speech perception is the audiovisual fusion called the ‘McGurk effect’. When the auditory syllable /pa/ is presented in synchrony with the face articulating the syllable /ka/, the subject usually perceives /ta/ or /ka/. The illusory perception is clearly auditory in nature. We recently studied the audiovisual fusion (acoustical /p/, visual /k/) for Finnish (1) syllables, and (2) words. Only 3% of the subjects perceived the syllables according to the acoustical input, ie in 97% of the subjects the perception was influenced by the visual information. For words the percentage of acoustical identifications was 10%. The results demonstrate a very strong influence of visual information of articulation in face-to-face speech perception. Word meaning and sentence context have a negligible influence on the fusion. We have also recorded neuromagnetic responses of the human cortex when the subjects both heard and saw speech. Some subjects showed a distinct response to a ‘McGurk’ stimulus. The response was rather late, emerging about 200 ms from the onset of the auditory stimulus. We suggest that the perisylvian cortex, close to the source area for the auditory 100 ms response (M100), may be activated by the discordant stimuli. The behavioural and neuromagnetic results suggest a precognitive audiovisual speech integration occurring at a relatively early processing level.

15. Plass, John, David Brang, Satoru Suzuki, and Marcia Grabowecky. "Vision perceptually restores auditory spectral dynamics in speech." Proceedings of the National Academy of Sciences 117, no. 29 (July 6, 2020): 16920–27. http://dx.doi.org/10.1073/pnas.2002887117.
Abstract:
Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time–frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech.
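
The core claim here—that formant trajectories can be predicted from the changing shape of the mouth—can be made concrete with a toy regression. The sketch below is not the authors' analysis; it fits a linear map from made-up lip-aperture features to formant-like frequencies on synthetic data, purely to illustrate the kind of cross-modal prediction described.

```python
# Illustrative sketch (synthetic data, not the paper's analysis): predict
# formant-like frequency trajectories from mouth-shape features by least squares.
import numpy as np

rng = np.random.default_rng(0)

T = 500                                      # number of video/audio frames
mouth = rng.uniform(0.0, 1.0, size=(T, 2))   # hypothetical lip aperture (height, width)

# Hypothetical ground truth: formants covary linearly with oral aperture plus noise.
true_W = np.array([[300.0, 900.0],           # contribution of lip height to F1, F2
                   [100.0, 600.0]])          # contribution of lip width  to F1, F2
formants = 500.0 + mouth @ true_W + rng.normal(0.0, 30.0, size=(T, 2))

# Fit a linear map (with intercept) from visual features to formant frequencies.
X = np.hstack([mouth, np.ones((T, 1))])
W, *_ = np.linalg.lstsq(X, formants, rcond=None)

pred = X @ W
r = [np.corrcoef(pred[:, k], formants[:, k])[0, 1] for k in range(2)]
print(f"correlation with F1: {r[0]:.2f}, with F2: {r[1]:.2f}")
```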

16. Karpov, Alexey Anatolyevich. "Assistive Information Technologies based on Audio-Visual Speech Interfaces." SPIIRAS Proceedings 4, no. 27 (March 17, 2014): 114. http://dx.doi.org/10.15622/sp.27.10.

17. Whalen, D. H., Julia Irwin, and Carol A. Fowler. "Audiovisual integration of speech based on minimal visual information." Journal of the Acoustical Society of America 100, no. 4 (October 1996): 2569. http://dx.doi.org/10.1121/1.417395.

18. Gurban, M., and J. P. Thiran. "Information Theoretic Feature Extraction for Audio-Visual Speech Recognition." IEEE Transactions on Signal Processing 57, no. 12 (December 2009): 4765–76. http://dx.doi.org/10.1109/tsp.2009.2026513.

19. Mishra, Sushmit, Thomas Lunner, Stefan Stenfelt, Jerker Rönnberg, and Mary Rudner. "Visual Information Can Hinder Working Memory Processing of Speech." Journal of Speech, Language, and Hearing Research 56, no. 4 (August 2013): 1120–32. http://dx.doi.org/10.1044/1092-4388(2012/12-0033).
Abstract:
Purpose The purpose of the present study was to evaluate the new Cognitive Spare Capacity Test (CSCT), which measures aspects of working memory capacity for heard speech in the audiovisual and auditory-only modalities of presentation. Method In Experiment 1, 20 young adults with normal hearing performed the CSCT and an independent battery of cognitive tests. In the CSCT, they listened to and recalled 2-digit numbers according to instructions inducing executive processing at 2 different memory loads. In Experiment 2, 10 participants performed a less executively demanding free recall task using the same stimuli. Results CSCT performance demonstrated an effect of memory load and was associated with independent measures of executive function and inference making but not with general working memory capacity. Audiovisual presentation was associated with lower CSCT scores but higher free recall performance scores. Conclusions CSCT is an executively challenging test of the ability to process heard speech. It captures cognitive aspects of listening related to sentence comprehension that are quantitatively and qualitatively different from working memory capacity. Visual information provided in the audiovisual modality of presentation can hinder executive processing in working memory of nondegraded speech material.

20. Borrie, Stephanie A. "Visual speech information: A help or hindrance in perceptual processing of dysarthric speech." Journal of the Acoustical Society of America 137, no. 3 (March 2015): 1473–80. http://dx.doi.org/10.1121/1.4913770.

21. Wayne, Rachel V., and Ingrid S. Johnsrude. "The role of visual speech information in supporting perceptual learning of degraded speech." Journal of Experimental Psychology: Applied 18, no. 4 (2012): 419–35. http://dx.doi.org/10.1037/a0031042.

22. Winneke, Axel H., and Natalie A. Phillips. "Brain processes underlying the integration of audio-visual speech and non-speech information." Brain and Cognition 67 (June 2008): 45. http://dx.doi.org/10.1016/j.bandc.2008.02.096.

23. Sánchez-García, Carolina, Sonia Kandel, Christophe Savariaux, Nara Ikumi, and Salvador Soto-Faraco. "Time course of audio–visual phoneme identification: A cross-modal gating study." Seeing and Perceiving 25 (2012): 194. http://dx.doi.org/10.1163/187847612x648233.
Abstract:
When both present, visual and auditory information are combined in order to decode the speech signal. Past research has addressed to what extent visual information contributes to distinguish confusable speech sounds, but usually ignoring the continuous nature of speech perception. Here we tap at the temporal course of the contribution of visual and auditory information during the process of speech perception. To this end, we designed an audio–visual gating task with videos recorded with high speed camera. Participants were asked to identify gradually longer fragments of pseudowords varying in the central consonant. Different Spanish consonant phonemes with different degree of visual and acoustic saliency were included, and tested on visual-only, auditory-only and audio–visual trials. The data showed different patterns of contribution of unimodal and bimodal information during identification, depending on the visual saliency of the presented phonemes. In particular, for phonemes which are clearly more salient in one modality than the other, audio–visual performance equals that of the best unimodal. In phonemes with more balanced saliency, audio–visual performance was better than both unimodal conditions. These results shed new light on the temporal course of audio–visual speech integration.

24. Yordamlı, Arzu, and Doğu Erdener. "Auditory–Visual Speech Integration in Bipolar Disorder: A Preliminary Study." Languages 3, no. 4 (October 17, 2018): 38. http://dx.doi.org/10.3390/languages3040038.
Abstract:
This study aimed to investigate how individuals with bipolar disorder integrate auditory and visual speech information compared to healthy individuals. Furthermore, we wanted to see whether there were any differences between manic and depressive episode bipolar disorder patients with respect to auditory and visual speech integration. It was hypothesized that the bipolar group’s auditory–visual speech integration would be weaker than that of the control group. Further, it was predicted that those in the manic phase of bipolar disorder would integrate visual speech information more robustly than their depressive phase counterparts. To examine these predictions, a McGurk effect paradigm with an identification task was used with typical auditory–visual (AV) speech stimuli. Additionally, auditory-only (AO) and visual-only (VO, lip-reading) speech perceptions were also tested. The dependent variable for the AV stimuli was the amount of visual speech influence. The dependent variables for AO and VO stimuli were accurate modality-based responses. Results showed that the disordered and control groups did not differ in AV speech integration and AO speech perception. However, there was a striking difference in favour of the healthy group with respect to the VO stimuli. The results suggest the need for further research whereby both behavioural and physiological data are collected simultaneously. This will help us understand the full dynamics of how auditory and visual speech information are integrated in people with bipolar disorder.

25. Drijvers, Linda, and Asli Özyürek. "Visual Context Enhanced: The Joint Contribution of Iconic Gestures and Visible Speech to Degraded Speech Comprehension." Journal of Speech, Language, and Hearing Research 60, no. 1 (January 2017): 212–22. http://dx.doi.org/10.1044/2016_jslhr-h-16-0101.
Abstract:
Purpose This study investigated whether and to what extent iconic co-speech gestures contribute to information from visible speech to enhance degraded speech comprehension at different levels of noise-vocoding. Previous studies of the contributions of these 2 visual articulators to speech comprehension have only been performed separately. Method Twenty participants watched videos of an actress uttering an action verb and completed a free-recall task. The videos were presented in 3 speech conditions (2-band noise-vocoding, 6-band noise-vocoding, clear), 3 multimodal conditions (speech + lips blurred, speech + visible speech, speech + visible speech + gesture), and 2 visual-only conditions (visible speech, visible speech + gesture). Results Accuracy levels were higher when both visual articulators were present compared with 1 or none. The enhancement effects of (a) visible speech, (b) gestural information on top of visible speech, and (c) both visible speech and iconic gestures were larger in 6-band than 2-band noise-vocoding or visual-only conditions. Gestural enhancement in 2-band noise-vocoding did not differ from gestural enhancement in visual-only conditions. Conclusions When perceiving degraded speech in a visual context, listeners benefit more from having both visual articulators present compared with 1. This benefit was larger at 6-band than 2-band noise-vocoding, where listeners can benefit from both phonological cues from visible speech and semantic cues from iconic gestures to disambiguate speech.
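
For readers unfamiliar with the degradation used in this study, the sketch below shows a generic n-band noise vocoder of the kind the abstract refers to (2-band versus 6-band). The band edges, filter orders, and synthetic input are my own assumptions, not the study's stimulus-preparation code.

```python
# Illustrative sketch of an n-band noise vocoder (assumed parameters).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(speech, fs, n_bands, lo=100.0, hi=7000.0, env_cut=30.0):
    """Replace the fine structure in each band with envelope-modulated noise."""
    rng = np.random.default_rng(0)
    edges = np.geomspace(lo, hi, n_bands + 1)              # log-spaced band edges
    env_sos = butter(4, env_cut, btype="low", fs=fs, output="sos")
    out = np.zeros_like(speech)
    for k in range(n_bands):
        band_sos = butter(4, [edges[k], edges[k + 1]], btype="bandpass",
                          fs=fs, output="sos")
        band = sosfiltfilt(band_sos, speech)                # band-limited speech
        env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0.0, None)  # slow envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(speech)))
        out += env * carrier                                # envelope-modulated noise
    return out

# Example with a synthetic signal standing in for a recorded sentence.
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
degraded_2band = noise_vocode(speech, fs, n_bands=2)
degraded_6band = noise_vocode(speech, fs, n_bands=6)
```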

26. Rosenblum, Lawrence D. "Speech Perception as a Multimodal Phenomenon." Current Directions in Psychological Science 17, no. 6 (December 2008): 405–9. http://dx.doi.org/10.1111/j.1467-8721.2008.00615.x.
Abstract:
Speech perception is inherently multimodal. Visual speech (lip-reading) information is used by all perceivers and readily integrates with auditory speech. Imaging research suggests that the brain treats auditory and visual speech similarly. These findings have led some researchers to consider that speech perception works by extracting amodal information that takes the same form across modalities. From this perspective, speech integration is a property of the input information itself. Amodal speech information could explain the reported automaticity, immediacy, and completeness of audiovisual speech integration. However, recent findings suggest that speech integration can be influenced by higher cognitive properties such as lexical status and semantic context. Proponents of amodal accounts will need to explain these results.

27. Mishra, Saumya, Anup Kumar Gupta, and Puneet Gupta. "DARE: Deceiving Audio–Visual speech Recognition model." Knowledge-Based Systems 232 (November 2021): 107503. http://dx.doi.org/10.1016/j.knosys.2021.107503.

28. Callan, Daniel E., Jeffery A. Jones, Kevin Munhall, Christian Kroos, Akiko M. Callan, and Eric Vatikiotis-Bateson. "Multisensory Integration Sites Identified by Perception of Spatial Wavelet Filtered Visual Speech Gesture Information." Journal of Cognitive Neuroscience 16, no. 5 (June 2004): 805–16. http://dx.doi.org/10.1162/089892904970771.
Abstract:
Perception of speech is improved when presentation of the audio signal is accompanied by concordant visual speech gesture information. This enhancement is most prevalent when the audio signal is degraded. One potential means by which the brain affords perceptual enhancement is thought to be through the integration of concordant information from multiple sensory channels in a common site of convergence, multisensory integration (MSI) sites. Some studies have identified potential sites in the superior temporal gyrus/sulcus (STG/S) that are responsive to multisensory information from the auditory speech signal and visual speech movement. One limitation of these studies is that they do not control for activity resulting from attentional modulation cued by such things as visual information signaling the onsets and offsets of the acoustic speech signal, as well as activity resulting from MSI of properties of the auditory speech signal with aspects of gross visual motion that are not specific to place of articulation information. This fMRI experiment uses spatial wavelet bandpass filtered Japanese sentences presented with background multispeaker audio noise to discern brain activity reflecting MSI induced by auditory and visual correspondence of place of articulation information that controls for activity resulting from the above-mentioned factors. The experiment consists of a low-frequency (LF) filtered condition containing gross visual motion of the lips, jaw, and head without specific place of articulation information, a midfrequency (MF) filtered condition containing place of articulation information, and an unfiltered (UF) condition. Sites of MSI selectively induced by auditory and visual correspondence of place of articulation information were determined by the presence of activity for both the MF and UF conditions relative to the LF condition. Based on these criteria, sites of MSI were found predominantly in the left middle temporal gyrus (MTG), and the left STG/S (including the auditory cortex). By controlling for additional factors that could also induce greater activity resulting from visual motion information, this study identifies potential MSI sites that we believe are involved with improved speech perception intelligibility.
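
The low-frequency (LF), midfrequency (MF), and unfiltered (UF) conditions rest on spatial-frequency filtering of the video frames. As a rough stand-in only (the study used spatial wavelet bandpass filters; the difference-of-Gaussians shortcut and sigma values below are assumptions), a frame could be split into low- and mid-spatial-frequency versions like this:

```python
# Rough illustration: difference-of-Gaussians stand-in for wavelet bandpass
# filtering of a video frame; sigma values are arbitrary assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def low_and_mid_sf(frame, sigma_low=8.0, sigma_mid=2.0):
    """frame: 2-D grayscale video frame. Returns (low-SF, mid-SF) versions."""
    low = gaussian_filter(frame, sigma_low)        # keeps only coarse structure
    mid = gaussian_filter(frame, sigma_mid) - low  # band between the two scales
    return low, mid

frame = np.random.default_rng(0).uniform(size=(120, 160))  # stands in for a video frame
low_sf, mid_sf = low_and_mid_sf(frame)
```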

29. Hertrich, Ingo, Susanne Dietrich, and Hermann Ackermann. "Cross-modal Interactions during Perception of Audiovisual Speech and Nonspeech Signals: An fMRI Study." Journal of Cognitive Neuroscience 23, no. 1 (January 2011): 221–37. http://dx.doi.org/10.1162/jocn.2010.21421.
Abstract:
During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream—prior to its fusion with auditory phonological features [Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. Time course of early audiovisual interactions during speech and non-speech central-auditory processing: An MEG study. Journal of Cognitive Neuroscience, 21, 259–274, 2009]. Using functional magnetic resonance imaging, the present follow-up study aims to further elucidate the topographic distribution of visual–phonological operations and audiovisual (AV) interactions during speech perception. Ambiguous acoustic syllables—disambiguated to /pa/ or /ta/ by the visual channel (speaking face)—served as test materials, concomitant with various control conditions (nonspeech AV signals, visual-only and acoustic-only speech, and nonspeech stimuli). (i) Visual speech yielded an AV-subadditive activation of primary auditory cortex and the anterior superior temporal gyrus (STG), whereas the posterior STG responded both to speech and nonspeech motion. (ii) The inferior frontal and the fusiform gyrus of the right hemisphere showed a strong phonetic/phonological impact (differential effects of visual /pa/ vs. /ta/) upon hemodynamic activation during presentation of speaking faces. Taken together with the previous MEG data, these results point at a dual-pathway model of visual speech information processing: On the one hand, access to the auditory system via the anterior supratemporal “what” path may give rise to direct activation of “auditory objects.” On the other hand, visual speech information seems to be represented in a right-hemisphere visual working memory, providing a potential basis for later interactions with auditory information such as the McGurk effect.

30. Everdell, Ian T., Heidi Marsh, Micheal D. Yurick, Kevin G. Munhall, and Martin Paré. "Gaze Behaviour in Audiovisual Speech Perception: Asymmetrical Distribution of Face-Directed Fixations." Perception 36, no. 10 (October 2007): 1535–45. http://dx.doi.org/10.1068/p5852.
Abstract:
Speech perception under natural conditions entails integration of auditory and visual information. Understanding how visual and auditory speech information are integrated requires detailed descriptions of the nature and processing of visual speech information. To understand better the process of gathering visual information, we studied the distribution of face-directed fixations of humans performing an audiovisual speech perception task to characterise the degree of asymmetrical viewing and its relationship to speech intelligibility. Participants showed stronger gaze fixation asymmetries while viewing dynamic faces, compared to static faces or face-like objects, especially when gaze was directed to the talkers' eyes. Although speech perception accuracy was significantly enhanced by the viewing of congruent, dynamic faces, we found no correlation between task performance and gaze fixation asymmetry. Most participants preferentially fixated the right side of the faces and their preferences persisted while viewing horizontally mirrored stimuli, different talkers, or static faces. These results suggest that the asymmetrical distributions of gaze fixations reflect the participants' viewing preferences, rather than being a product of asymmetrical faces, but that this behavioural bias does not predict correct audiovisual speech perception.

31. Jesse, Alexandra, Nick Vrignaud, Michael M. Cohen, and Dominic W. Massaro. "The processing of information from multiple sources in simultaneous interpreting." Interpreting. International Journal of Research and Practice in Interpreting 5, no. 2 (December 31, 2000): 95–115. http://dx.doi.org/10.1075/intp.5.2.04jes.
Abstract:
Language processing is influenced by multiple sources of information. We examined whether the performance in simultaneous interpreting would be improved when providing two sources of information, the auditory speech as well as corresponding lip-movements, in comparison to presenting the auditory speech alone. Although there was an improvement in sentence recognition when presented with visible speech, there was no difference in performance between these two presentation conditions when bilinguals simultaneously interpreted from English to German or from English to Spanish. The reason why visual speech did not contribute to performance could be the presentation of the auditory signal without noise (Massaro, 1998). This hypothesis should be tested in the future. Furthermore, it should be investigated if an effect of visible speech can be found for other contexts, when visual information could provide cues for emotions, prosody, or syntax.

32. Jia, Xi Bin, and Mei Xia Zheng. "Video Based Visual Speech Feature Model Construction." Applied Mechanics and Materials 182-183 (June 2012): 1367–71. http://dx.doi.org/10.4028/www.scientific.net/amm.182-183.1367.
Abstract:
This paper presents a solution for constructing a Chinese visual speech feature model based on HMMs. We propose and discuss three representations of visual speech: lip geometrical features, lip motion features, and lip texture features. A model that combines local LBP with global DCT texture information outperforms either feature alone, and combining local LBP with geometrical information likewise outperforms a single feature. Viseme recognition rates show that an HMM, which captures the dynamics of speech, coupled with the combined feature describing global and local texture performs best.
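
One concrete (hypothetical) reading of the combined-feature idea is a per-frame lip descriptor that joins a local LBP histogram with low-frequency global DCT coefficients; sequences of such vectors would then be modelled with an HMM. The sketch below covers only that feature step, with ROI size, LBP parameters, and DCT block size chosen arbitrarily rather than taken from the paper.

```python
# Illustrative sketch (assumed parameters, not the paper's implementation):
# per-frame lip feature joining local LBP texture with global DCT texture.
import numpy as np
from scipy.fft import dctn
from skimage.feature import local_binary_pattern

def lip_frame_features(lip_roi, n_dct=6, lbp_points=8, lbp_radius=1):
    """lip_roi: 2-D grayscale mouth region; returns one feature vector."""
    # Local texture: histogram of uniform LBP codes.
    codes = local_binary_pattern(lip_roi, lbp_points, lbp_radius, method="uniform")
    hist, _ = np.histogram(codes, bins=np.arange(lbp_points + 3), density=True)
    # Global texture: low-frequency 2-D DCT coefficients of the ROI.
    dct_block = dctn(lip_roi.astype(float), norm="ortho")[:n_dct, :n_dct].ravel()
    return np.concatenate([hist, dct_block])

# One random frame stands in for a real mouth ROI here.
frame = np.random.default_rng(0).integers(0, 256, size=(32, 48)).astype(np.uint8)
feat = lip_frame_features(frame)
print(feat.shape)   # (46,) = 10 LBP histogram bins + 36 DCT coefficients
```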

33. Shi, Li Juan, Ping Feng, Jian Zhao, Li Rong Wang, and Na Che. "Study on Dual Mode Fusion Method of Video and Audio." Applied Mechanics and Materials 734 (February 2015): 412–15. http://dx.doi.org/10.4028/www.scientific.net/amm.734.412.
Abstract:
Hearing-impaired students in class often rely on sign language alone and therefore receive less of the classroom information. This paper studies a dual-mode video and audio fusion algorithm that combines lip reading, speech recognition, and information fusion technology. Speech features are first extracted and the speech signal is processed so that text is output in synchrony with the speech. Video features are extracted at the same time, and the voice and video signals are fused, turning spoken information into visual information that hearing-impaired students can receive as text. This improves the speech recognition rate and meets the needs of classroom teaching for hearing-impaired students.
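
One simple way to realise the fusion step described above is early, feature-level fusion: align the audio and video frame rates and concatenate the per-frame vectors before recognition. The sketch below illustrates that idea under placeholder frame rates, dimensions, and random features; it is not the paper's algorithm.

```python
# Illustrative sketch of early audio-video feature fusion (placeholder data).
import numpy as np

def fuse_audio_video(audio_feats, video_feats):
    """audio_feats: (Ta, Da) at e.g. 100 fps; video_feats: (Tv, Dv) at e.g. 25 fps."""
    Ta = audio_feats.shape[0]
    # Map each audio frame onto the video frame covering the same time span.
    idx = np.minimum((np.arange(Ta) * video_feats.shape[0]) // Ta,
                     video_feats.shape[0] - 1)
    return np.hstack([audio_feats, video_feats[idx]])

audio = np.random.default_rng(1).normal(size=(400, 13))   # 4 s of MFCC-like frames
video = np.random.default_rng(2).normal(size=(100, 46))   # 4 s of lip features
fused = fuse_audio_video(audio, video)
print(fused.shape)   # (400, 59): one joint vector per audio frame
```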

34. Dias, James W., and Lawrence D. Rosenblum. "Visual Influences on Interactive Speech Alignment." Perception 40, no. 12 (January 1, 2011): 1457–66. http://dx.doi.org/10.1068/p7071.
Abstract:
Speech alignment describes the unconscious tendency to produce speech that shares characteristics with perceived speech (e.g., Goldinger, 1998, Psychological Review, 105, 251–279). In the present study we evaluated whether seeing a talker enhances alignment over just hearing a talker. Pairs of participants performed an interactive search task which required them to repeatedly utter a series of keywords. Half of the pairs performed the task while hearing each other, while the other half could see and hear each other. Alignment was assessed by naive judges rating the similarity of interlocutors' keywords recorded before, during, and after the interactive task. Results showed that interlocutors aligned more when able to see one another, suggesting that visual information enhances speech alignment.

35. Campbell, Ruth. "The processing of audio-visual speech: empirical and neural bases." Philosophical Transactions of the Royal Society B: Biological Sciences 363, no. 1493 (September 7, 2007): 1001–10. http://dx.doi.org/10.1098/rstb.2007.2155.
Abstract:
In this selective review, I outline a number of ways in which seeing the talker affects auditory perception of speech, including, but not confined to, the McGurk effect. To date, studies suggest that all linguistic levels are susceptible to visual influence, and that two main modes of processing can be described: a complementary mode, whereby vision provides information more efficiently than hearing for some under-specified parts of the speech stream, and a correlated mode, whereby vision partially duplicates information about dynamic articulatory patterning. Cortical correlates of seen speech suggest that at the neurological as well as the perceptual level, auditory processing of speech is affected by vision, so that ‘auditory speech regions’ are activated by seen speech. The processing of natural speech, whether it is heard, seen or heard and seen, activates the perisylvian language regions (left>right). It is highly probable that activation occurs in a specific order. First, superior temporal, then inferior parietal and finally inferior frontal regions (left>right) are activated. There is some differentiation of the visual input stream to the core perisylvian language system, suggesting that complementary seen speech information makes special use of the visual ventral processing stream, while for correlated visual speech, the dorsal processing stream, which is sensitive to visual movement, may be relatively more involved.

36. Metzger, Brian A., John F. Magnotti, Elizabeth Nesbitt, Daniel Yoshor, and Michael S. Beauchamp. "Cross-modal suppression model of speech perception: Visual information drives suppressive interactions between visual and auditory speech in pSTG." Journal of Vision 20, no. 11 (October 20, 2020): 434. http://dx.doi.org/10.1167/jov.20.11.434.

37. Irwin, Julia, Trey Avery, Lawrence Brancazio, Jacqueline Turcios, Kayleigh Ryherd, and Nicole Landi. "Electrophysiological Indices of Audiovisual Speech Perception: Beyond the McGurk Effect and Speech in Noise." Multisensory Research 31, no. 1-2 (2018): 39–56. http://dx.doi.org/10.1163/22134808-00002580.
Abstract:
Visual information on a talker’s face can influence what a listener hears. Commonly used approaches to study this include mismatched audiovisual stimuli (e.g., McGurk type stimuli) or visual speech in auditory noise. In this paper we discuss potential limitations of these approaches and introduce a novel visual phonemic restoration method. This method always presents the same visual stimulus (e.g., /ba/) dubbed with either a matched auditory stimulus (/ba/) or one that has weakened consonantal information and sounds more /a/-like. When this reduced auditory stimulus (or /a/) is dubbed with the visual /ba/, a visual influence will result in effectively ‘restoring’ the weakened auditory cues so that the stimulus is perceived as a /ba/. An oddball design was used in which participants were asked to detect the /a/ among a stream of more frequently occurring /ba/s while viewing either a speaking face or a face with no visual speech. In addition, the same paradigm was presented for a second contrast in which participants detected /pa/ among /ba/s, a contrast which should be unaltered by the presence of visual speech. Behavioral and some ERP findings reflect the expected phonemic restoration for the /ba/ vs. /a/ contrast; specifically, we observed reduced accuracy and P300 response in the presence of visual speech. Further, we report an unexpected finding of reduced accuracy and P300 response for both speech contrasts in the presence of visual speech, suggesting overall modulation of the auditory signal in the presence of visual speech. Consistent with this, we observed a mismatch negativity (MMN) effect for the /ba/ vs. /pa/ contrast only that was larger in the absence of visual speech. We discuss the potential utility of this paradigm for listeners who cannot respond actively, such as infants and individuals with developmental disabilities.

38. Van Engen, Kristin J., Jasmine E. B. Phelps, Rajka Smiljanic, and Bharath Chandrasekaran. "Enhancing Speech Intelligibility: Interactions Among Context, Modality, Speech Style, and Masker." Journal of Speech, Language, and Hearing Research 57, no. 5 (October 2014): 1908–18. http://dx.doi.org/10.1044/jslhr-h-13-0076.
Abstract:
Purpose The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous sentences, conversational and clear speaking styles, auditory-only (AO) and audiovisual (AV) presentation modalities, and 4 different maskers (2-talker babble, 4-talker babble, 8-talker babble, and speech-shaped noise). Results Semantic context, clear speech, and visual input all improved intelligibility but also interacted with one another and with masking condition. Semantic context was beneficial across all maskers in AV conditions but only in speech-shaped noise in AO conditions. Clear speech provided the most benefit for AV speech with semantically anomalous targets. Finally, listeners were better able to take advantage of visual information for meaningful versus anomalous sentences and for clear versus conversational speech. Conclusion Because intelligibility-enhancing cues influence each other and depend on masking condition, multiple maskers and enhancement cues should be used to accurately assess individuals' speech-in-noise perception.

39. Records, Nancy L. "A Measure of the Contribution of a Gesture to the Perception of Speech in Listeners With Aphasia." Journal of Speech, Language, and Hearing Research 37, no. 5 (October 1994): 1086–99. http://dx.doi.org/10.1044/jshr.3705.1086.
Abstract:
The contribution of a visual source of contextual information to speech perception was measured in 12 listeners with aphasia. The three experimental conditions were: Visual-Only (referential gesture), Auditory-Only (computer-edited speech), and Audio-Visual. In a two-alternative, forced-choice task, subjects indicated which picture had been requested. The stimuli were first validated with listeners without brain damage. The listeners with aphasia were subgrouped as having high or low language comprehension based on standardized test scores. Results showed a significantly larger contribution of gestural information to the responses of the lower-comprehension subgroup. The contribution of gesture was significantly correlated with the amount of ambiguity experienced with the auditory-only information. These results show that as the auditory information becomes more ambiguous, individuals with impaired language comprehension make greater use of the visual information. The results support clinical observations that speech information received without visual context is perceived differently than when received with visual context.

40. Helfer, Karen S. "Auditory and Auditory-Visual Perception of Clear and Conversational Speech." Journal of Speech, Language, and Hearing Research 40, no. 2 (April 1997): 432–43. http://dx.doi.org/10.1044/jslhr.4002.432.
Abstract:
Research has shown that speaking in a deliberately clear manner can improve the accuracy of auditory speech recognition. Allowing listeners access to visual speech cues also enhances speech understanding. Whether the nature of information provided by speaking clearly and by using visual speech cues is redundant has not been determined. This study examined how speaking mode (clear vs. conversational) and presentation mode (auditory vs. auditory-visual) influenced the perception of words within nonsense sentences. In Experiment 1, 30 young listeners with normal hearing responded to videotaped stimuli presented audiovisually in the presence of background noise at one of three signal-to-noise ratios. In Experiment 2, 9 participants returned for an additional assessment using auditory-only presentation. Results of these experiments showed significant effects of speaking mode (clear speech was easier to understand than was conversational speech) and presentation mode (auditory-visual presentation led to better performance than did auditory-only presentation). The benefit of clear speech was greater for words occurring in the middle of sentences than for words at either the beginning or end of sentences for both auditory-only and auditory-visual presentation, whereas the greatest benefit from supplying visual cues was for words at the end of sentences spoken both clearly and conversationally. The total benefit from speaking clearly and supplying visual cues was equal to the sum of each of these effects. Overall, the results suggest that speaking clearly and providing visual speech information provide complementary (rather than redundant) information.

41. Taitelbaum-Swead, Riki, and Leah Fostick. "Auditory and visual information in speech perception: A developmental perspective." Clinical Linguistics & Phonetics 30, no. 7 (March 30, 2016): 531–45. http://dx.doi.org/10.3109/02699206.2016.1151938.

42. Yakel, Deborah A., and Lawrence D. Rosenblum. "Time-varying information for vowel identification in visual speech perception." Journal of the Acoustical Society of America 108, no. 5 (November 2000): 2482. http://dx.doi.org/10.1121/1.4743160.

43. Johnson, Jennifer A., and Lawrence D. Rosenblum. "Hemispheric differences in perceiving and integrating dynamic visual speech information." Journal of the Acoustical Society of America 100, no. 4 (October 1996): 2570. http://dx.doi.org/10.1121/1.417400.

44. Ogihara, Akio, Akira Shintani, Naoshi Doi, and Kunio Fukunaga. "HMM Speech Recognition Using Fusion of Visual and Auditory Information." IEEJ Transactions on Electronics, Information and Systems 115, no. 11 (1995): 1317–24. http://dx.doi.org/10.1541/ieejeiss1987.115.11_1317.

45. Keintz, Connie K., Kate Bunton, and Jeannette D. Hoit. "Influence of Visual Information on the Intelligibility of Dysarthric Speech." American Journal of Speech-Language Pathology 16, no. 3 (August 2007): 222–34. http://dx.doi.org/10.1044/1058-0360(2007/027).

46. Yuan, Yi, Andrew Lotto, and Yonghee Oh. "Temporal cues from visual information benefit speech perception in noise." Journal of the Acoustical Society of America 146, no. 4 (October 2019): 3056. http://dx.doi.org/10.1121/1.5137604.

47. Blank, Helen, and Katharina von Kriegstein. "Mechanisms of enhancing visual–speech recognition by prior auditory information." NeuroImage 65 (January 2013): 109–18. http://dx.doi.org/10.1016/j.neuroimage.2012.09.047.

48. Moon, Il-Joon, Mini Jo, Ga-Young Kim, Nicolas Kim, Young-Sang Cho, Sung-Hwa Hong, and Hye-Yoon Seol. "How Does a Face Mask Impact Speech Perception?" Healthcare 10, no. 9 (September 7, 2022): 1709. http://dx.doi.org/10.3390/healthcare10091709.
Abstract:
Face masks are mandatory during the COVID-19 pandemic, leading to attenuation of sound energy and loss of visual cues which are important for communication. This study explores how a face mask affects speech performance for individuals with and without hearing loss. Four video recordings (a female speaker with and without a face mask and a male speaker with and without a face mask) were used to examine individuals’ speech performance. The participants completed a listen-and-repeat task while watching four types of video recordings. Acoustic characteristics of speech signals based on mask type (no mask, surgical, and N95) were also examined. The availability of visual cues was beneficial for speech understanding—both groups showed significant improvements in speech perception when they were able to see the speaker without the mask. However, when the speakers were wearing the mask, no statistical significance was observed between no visual cues and visual cues conditions. Findings of the study demonstrate that provision of visual cues is beneficial for speech perception for individuals with normal hearing and hearing impairment. This study adds value to the importance of the use of communication strategies during the pandemic where visual information is lost due to the face mask.

49. Kubicek, Claudia, Anne Hillairet de Boisferon, Eve Dupierrix, Hélène Lœvenbruck, Judit Gervain, and Gudrun Schwarzer. "Face-scanning behavior to silently-talking faces in 12-month-old infants: The impact of pre-exposed auditory speech." International Journal of Behavioral Development 37, no. 2 (February 25, 2013): 106–10. http://dx.doi.org/10.1177/0165025412473016.
Abstract:
The present eye-tracking study aimed to investigate the impact of auditory speech information on 12-month-olds’ gaze behavior to silently-talking faces. We examined German infants’ face-scanning behavior to side-by-side presentation of a bilingual speaker’s face silently speaking German utterances on one side and French on the other side, before and after auditory familiarization with one of the two languages. The results showed that 12-month-old infants showed no general visual preference for either of the visual speeches, neither before nor after auditory input. But, infants who heard native speech decreased their looking time to the mouth area and focused longer on the eyes compared to their scanning behavior without auditory language input, whereas infants who heard non-native speech increased their visual attention on the mouth region and focused less on the eyes. Thus, it can be assumed that 12-month-olds quickly identified their native language based on auditory speech and guided their visual attention more to the eye region than infants who have listened to non-native speech.

50. McCotter, Maxine V., and Timothy R. Jordan. "The Role of Facial Colour and Luminance in Visual and Audiovisual Speech Perception." Perception 32, no. 8 (August 2003): 921–36. http://dx.doi.org/10.1068/p3316.
Abstract:
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.
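
To make the display types concrete: grey-scale (GS) removes chromatic variation while keeping luminance, and luminance inversion (LI) flips luminance while roughly preserving chromatic content. The sketch below is one possible approximation of these manipulations in YIQ space, not the software used to construct the study's stimuli.

```python
# Illustrative approximation of GS and LI face displays (assumed method).
import colorsys
import numpy as np

def grey_scale(rgb):                     # rgb: float array in [0, 1], shape (H, W, 3)
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return np.repeat(y[..., None], 3, axis=-1)

def luminance_invert(rgb):
    out = np.empty_like(rgb)
    for r_idx, c_idx in np.ndindex(*rgb.shape[:2]):
        y, i, q = colorsys.rgb_to_yiq(*rgb[r_idx, c_idx])
        out[r_idx, c_idx] = colorsys.yiq_to_rgb(1.0 - y, i, q)  # flip luminance only
    return np.clip(out, 0.0, 1.0)        # keep values in [0, 1]

face = np.random.default_rng(0).uniform(size=(8, 8, 3))   # stands in for a face photo
gs, li = grey_scale(face), luminance_invert(face)
```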