Journal articles on the topic "Audiovisual speech processing"

To see the other types of publications on this topic, follow the link: Audiovisual speech processing.

Consult the top 50 journal articles for research on the topic "Audiovisual speech processing".

Next to every work in the bibliography there is an "Add to bibliography" option. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication in PDF format and read an online annotation of the work, if the relevant parameters are available in the metadata.

Browse journal articles from various fields of specialisation and compile your bibliography correctly.

1

Chen, Tsuhan. "Audiovisual speech processing". IEEE Signal Processing Magazine 18, no. 1 (2001): 9–21. http://dx.doi.org/10.1109/79.911195.

2

Vatikiotis-Bateson, Eric, and Takaaki Kuratate. "Overview of audiovisual speech processing". Acoustical Science and Technology 33, no. 3 (2012): 135–41. http://dx.doi.org/10.1250/ast.33.135.

3

Francisco, Ana A., Alexandra Jesse, Margriet A. Groen, and James M. McQueen. "A General Audiovisual Temporal Processing Deficit in Adult Readers With Dyslexia". Journal of Speech, Language, and Hearing Research 60, no. 1 (January 2017): 144–58. http://dx.doi.org/10.1044/2016_jslhr-h-15-0375.

Annotation:
Purpose: Because reading is an audiovisual process, reading impairment may reflect an audiovisual processing deficit. The aim of the present study was to test the existence and scope of such a deficit in adult readers with dyslexia. Method: We tested 39 typical readers and 51 adult readers with dyslexia on their sensitivity to the simultaneity of audiovisual speech and nonspeech stimuli, their time window of audiovisual integration for speech (using incongruent /aCa/ syllables), and their audiovisual perception of phonetic categories. Results: Adult readers with dyslexia showed less sensitivity to audiovisual simultaneity than typical readers for both speech and nonspeech events. We found no differences between readers with dyslexia and typical readers in the temporal window of integration for audiovisual speech or in the audiovisual perception of phonetic categories. Conclusions: The results suggest an audiovisual temporal deficit in dyslexia that is not specific to speech-related events. However, the differences found for audiovisual temporal sensitivity did not translate into a deficit in audiovisual speech perception. Hence, there seems to be a hiatus between simultaneity judgment and perception, suggesting a multisensory system that uses different mechanisms across tasks. Alternatively, it is possible that the audiovisual deficit in dyslexia is only observable when explicit judgments about audiovisual simultaneity are required.
4

Bernstein, Lynne E., Edward T. Auer, Michael Wagner, and Curtis W. Ponton. "Spatiotemporal dynamics of audiovisual speech processing". NeuroImage 39, no. 1 (January 2008): 423–35. http://dx.doi.org/10.1016/j.neuroimage.2007.08.035.

5

Sams, M. "Audiovisual Speech Perception". Perception 26, no. 1_suppl (August 1997): 347. http://dx.doi.org/10.1068/v970029.

Annotation:
Persons with hearing loss use visual information from articulation to improve their speech perception. Even persons with normal hearing utilise visual information, especially when the stimulus-to-noise ratio is poor. A dramatic demonstration of the role of vision in speech perception is the audiovisual fusion called the ‘McGurk effect’. When the auditory syllable /pa/ is presented in synchrony with the face articulating the syllable /ka/, the subject usually perceives /ta/ or /ka/. The illusory perception is clearly auditory in nature. We recently studied the audiovisual fusion (acoustical /p/, visual /k/) for Finnish (1) syllables, and (2) words. Only 3% of the subjects perceived the syllables according to the acoustical input, ie in 97% of the subjects the perception was influenced by the visual information. For words the percentage of acoustical identifications was 10%. The results demonstrate a very strong influence of visual information of articulation in face-to-face speech perception. Word meaning and sentence context have a negligible influence on the fusion. We have also recorded neuromagnetic responses of the human cortex when the subjects both heard and saw speech. Some subjects showed a distinct response to a ‘McGurk’ stimulus. The response was rather late, emerging about 200 ms from the onset of the auditory stimulus. We suggest that the perisylvian cortex, close to the source area for the auditory 100 ms response (M100), may be activated by the discordant stimuli. The behavioural and neuromagnetic results suggest a precognitive audiovisual speech integration occurring at a relatively early processing level.
6

Ojanen, Ville, Riikka Möttönen, Johanna Pekkola, Iiro P. Jääskeläinen, Raimo Joensuu, Taina Autti, and Mikko Sams. "Processing of audiovisual speech in Broca's area". NeuroImage 25, no. 2 (April 2005): 333–38. http://dx.doi.org/10.1016/j.neuroimage.2004.12.001.

7

Stevenson, Ryan A., Nicholas A. Altieri, Sunah Kim, David B. Pisoni, and Thomas W. James. "Neural processing of asynchronous audiovisual speech perception". NeuroImage 49, no. 4 (February 2010): 3308–18. http://dx.doi.org/10.1016/j.neuroimage.2009.12.001.

8

Hamilton, Roy H., Jeffrey T. Shenton, and H. Branch Coslett. "An acquired deficit of audiovisual speech processing". Brain and Language 98, no. 1 (July 2006): 66–73. http://dx.doi.org/10.1016/j.bandl.2006.02.001.

9

Dunham-Carr, Kacie, Jacob I. Feldman, David M. Simon, Sarah R. Edmunds, Alexander Tu, Wayne Kuang, Julie G. Conrad, Pooja Santapuram, Mark T. Wallace, and Tiffany G. Woynaroski. "The Processing of Audiovisual Speech Is Linked with Vocabulary in Autistic and Nonautistic Children: An ERP Study". Brain Sciences 13, no. 7 (July 8, 2023): 1043. http://dx.doi.org/10.3390/brainsci13071043.

Annotation:
Explaining individual differences in vocabulary in autism is critical, as understanding and using words to communicate are key predictors of long-term outcomes for autistic individuals. Differences in audiovisual speech processing may explain variability in vocabulary in autism. The efficiency of audiovisual speech processing can be indexed via amplitude suppression, wherein the amplitude of the event-related potential (ERP) is reduced at the P2 component in response to audiovisual speech compared to auditory-only speech. This study used electroencephalography (EEG) to measure P2 amplitudes in response to auditory-only and audiovisual speech and norm-referenced, standardized assessments to measure vocabulary in 25 autistic and 25 nonautistic children to determine whether amplitude suppression (a) differs or (b) explains variability in vocabulary in autistic and nonautistic children. A series of regression analyses evaluated associations between amplitude suppression and vocabulary scores. Both groups demonstrated P2 amplitude suppression, on average, in response to audiovisual speech relative to auditory-only speech. Between-group differences in mean amplitude suppression were nonsignificant. Individual differences in amplitude suppression were positively associated with expressive vocabulary through receptive vocabulary, as evidenced by a significant indirect effect observed across groups. The results suggest that efficiency of audiovisual speech processing may explain variance in vocabulary in autism.
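The amplitude-suppression index described in this abstract is, at its core, the difference in P2 amplitude between auditory-only and audiovisual averaged responses. The sketch below illustrates that computation; the 150–250 ms window, sampling rate, and toy waveforms are illustrative assumptions rather than parameters from the study.

```python
import numpy as np

def mean_amplitude(erp, times, window):
    """Mean ERP amplitude within a latency window given in seconds."""
    mask = (times >= window[0]) & (times <= window[1])
    return erp[mask].mean()

def p2_suppression(erp_auditory_only, erp_audiovisual, times, window=(0.15, 0.25)):
    """Amplitude suppression at P2: positive values mean the audiovisual response
    is smaller than the auditory-only response within the chosen window."""
    return (mean_amplitude(erp_auditory_only, times, window)
            - mean_amplitude(erp_audiovisual, times, window))

# Toy averaged ERPs sampled at 500 Hz from -0.1 to 0.5 s (assumed values).
fs = 500
times = np.arange(-0.1, 0.5, 1 / fs)
erp_a = 5.0 * np.exp(-((times - 0.2) ** 2) / 0.002)    # auditory-only P2-like peak
erp_av = 3.5 * np.exp(-((times - 0.2) ** 2) / 0.002)   # smaller audiovisual peak
suppression = p2_suppression(erp_a, erp_av, times)      # > 0, i.e., suppression
```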
10

Tomalski, Przemysław. "Developmental Trajectory of Audiovisual Speech Integration in Early Infancy. A Review of Studies Using the McGurk Paradigm". Psychology of Language and Communication 19, no. 2 (October 1, 2015): 77–100. http://dx.doi.org/10.1515/plc-2015-0006.

Annotation:
Apart from their remarkable phonological skills, young infants prior to their first birthday show the ability to match the mouth articulation they see with the speech sounds they hear. They are able to detect the audiovisual conflict of speech and to selectively attend to the articulating mouth depending on audiovisual congruency. Early audiovisual speech processing is an important aspect of language development, related not only to phonological knowledge, but also to language production during subsequent years. This article reviews recent experimental work delineating the complex developmental trajectory of audiovisual mismatch detection. The central issue is the role of age-related changes in visual scanning of audiovisual speech and the corresponding changes in neural signatures of audiovisual speech processing in the second half of the first year of life. This phenomenon is discussed in the context of recent theories of perceptual development and existing data on the neural organisation of the infant ‘social brain’.
11

Ozker, Muge, Inga M. Schepers, John F. Magnotti, Daniel Yoshor, and Michael S. Beauchamp. "A Double Dissociation between Anterior and Posterior Superior Temporal Gyrus for Processing Audiovisual Speech Demonstrated by Electrocorticography". Journal of Cognitive Neuroscience 29, no. 6 (June 2017): 1044–60. http://dx.doi.org/10.1162/jocn_a_01110.

Annotation:
Human speech can be comprehended using only auditory information from the talker's voice. However, comprehension is improved if the talker's face is visible, especially if the auditory information is degraded as occurs in noisy environments or with hearing loss. We explored the neural substrates of audiovisual speech perception using electrocorticography, direct recording of neural activity using electrodes implanted on the cortical surface. We observed a double dissociation in the responses to audiovisual speech with clear and noisy auditory component within the superior temporal gyrus (STG), a region long known to be important for speech perception. Anterior STG showed greater neural activity to audiovisual speech with clear auditory component, whereas posterior STG showed similar or greater neural activity to audiovisual speech in which the speech was replaced with speech-like noise. A distinct border between the two response patterns was observed, demarcated by a landmark corresponding to the posterior margin of Heschl's gyrus. To further investigate the computational roles of both regions, we considered Bayesian models of multisensory integration, which predict that combining the independent sources of information available from different modalities should reduce variability in the neural responses. We tested this prediction by measuring the variability of the neural responses to single audiovisual words. Posterior STG showed smaller variability than anterior STG during presentation of audiovisual speech with noisy auditory component. Taken together, these results suggest that posterior STG but not anterior STG is important for multisensory integration of noisy auditory and visual speech.
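The variance-reduction prediction referred to above is usually derived from maximum-likelihood cue combination; a minimal formulation, assuming independent Gaussian noise in the auditory and visual estimates (the abstract does not spell out the model's equations), is:

```latex
% Standard maximum-likelihood (ideal-observer) cue combination,
% assuming independent Gaussian noise in the auditory (A) and visual (V) channels.
\begin{align}
  \hat{s}_{AV} &= w_A \hat{s}_A + w_V \hat{s}_V,
  \qquad w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2},\quad
         w_V = \frac{1/\sigma_V^2}{1/\sigma_A^2 + 1/\sigma_V^2} \\
  \sigma_{AV}^2 &= \frac{\sigma_A^2\,\sigma_V^2}{\sigma_A^2 + \sigma_V^2}
  \;\le\; \min\!\left(\sigma_A^2, \sigma_V^2\right)
\end{align}
```

The second line is the reduced-variability prediction the authors test against the trial-to-trial variability of neural responses.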
12

Simon, David M., and Mark T. Wallace. "Integration and Temporal Processing of Asynchronous Audiovisual Speech". Journal of Cognitive Neuroscience 30, no. 3 (March 2018): 319–37. http://dx.doi.org/10.1162/jocn_a_01205.

Annotation:
Multisensory integration of visual mouth movements with auditory speech is known to offer substantial perceptual benefits, particularly under challenging (i.e., noisy) acoustic conditions. Previous work characterizing this process has found that ERPs to auditory speech are of shorter latency and smaller magnitude in the presence of visual speech. We sought to determine the dependency of these effects on the temporal relationship between the auditory and visual speech streams using EEG. We found that reductions in ERP latency and suppression of ERP amplitude are maximal when the visual signal precedes the auditory signal by a small interval and that increasing amounts of asynchrony reduce these effects in a continuous manner. Time–frequency analysis revealed that these effects are found primarily in the theta (4–8 Hz) and alpha (8–12 Hz) bands, with a central topography consistent with auditory generators. Theta effects also persisted in the lower portion of the band (3.5–5 Hz), and this late activity was more frontally distributed. Importantly, the magnitude of these late theta oscillations not only differed with the temporal characteristics of the stimuli but also served to predict participants' task performance. Our analysis thus reveals that suppression of single-trial brain responses by visual speech depends strongly on the temporal concordance of the auditory and visual inputs. It further illustrates that processes in the lower theta band, which we suggest as an index of incongruity processing, might serve to reflect the neural correlates of individual differences in multisensory temporal perception.
13

de la Vaux, Steven K., and Dominic W. Massaro. "Audiovisual speech gating: examining information and information processing". Cognitive Processing 5, no. 2 (April 23, 2004): 106–12. http://dx.doi.org/10.1007/s10339-004-0014-2.

14

Alsius, Agnès, Martin Paré, and Kevin G. Munhall. "Forty Years After Hearing Lips and Seeing Voices: the McGurk Effect Revisited". Multisensory Research 31, no. 1-2 (2018): 111–44. http://dx.doi.org/10.1163/22134808-00002565.

Annotation:
Since its discovery 40 years ago, the McGurk illusion has been usually cited as a prototypical paradigmatic case of multisensory binding in humans, and has been extensively used in speech perception studies as a proxy measure for audiovisual integration mechanisms. Despite the well-established practice of using the McGurk illusion as a tool for studying the mechanisms underlying audiovisual speech integration, the magnitude of the illusion varies enormously across studies. Furthermore, the processing of McGurk stimuli differs from congruent audiovisual processing at both phenomenological and neural levels. This questions the suitability of this illusion as a tool to quantify the necessary and sufficient conditions under which audiovisual integration occurs in natural conditions. In this paper, we review some of the practical and theoretical issues related to the use of the McGurk illusion as an experimental paradigm. We believe that, without a richer understanding of the mechanisms involved in the processing of the McGurk effect, experimenters should be really cautious when generalizing data generated by McGurk stimuli to matching audiovisual speech events.
15

Moradi, Shahram, and Jerker Rönnberg. "Perceptual Doping: A Hypothesis on How Early Audiovisual Speech Stimulation Enhances Subsequent Auditory Speech Processing". Brain Sciences 13, no. 4 (April 1, 2023): 601. http://dx.doi.org/10.3390/brainsci13040601.

Annotation:
Face-to-face communication is one of the most common means of communication in daily life. We benefit from both auditory and visual speech signals that lead to better language understanding. People prefer face-to-face communication when access to auditory speech cues is limited because of background noise in the surrounding environment or in the case of hearing impairment. We demonstrated that an early, short period of exposure to audiovisual speech stimuli facilitates subsequent auditory processing of speech stimuli for correct identification, but early auditory exposure does not. We called this effect “perceptual doping” as an early audiovisual speech stimulation dopes or recalibrates auditory phonological and lexical maps in the mental lexicon in a way that results in better processing of auditory speech signals for correct identification. This short opinion paper provides an overview of perceptual doping and how it differs from similar auditory perceptual aftereffects following exposure to audiovisual speech materials, its underlying cognitive mechanism, and its potential usefulness in the aural rehabilitation of people with hearing difficulties.
16

Ujiie, Yuta, and Kohske Takahashi. "Weaker McGurk Effect for Rubin’s Vase-Type Speech in People With High Autistic Traits". Multisensory Research 34, no. 6 (April 16, 2021): 663–79. http://dx.doi.org/10.1163/22134808-bja10047.

Annotation:
Abstract While visual information from facial speech modulates auditory speech perception, it is less influential on audiovisual speech perception among autistic individuals than among typically developed individuals. In this study, we investigated the relationship between autistic traits (Autism-Spectrum Quotient; AQ) and the influence of visual speech on the recognition of Rubin’s vase-type speech stimuli with degraded facial speech information. Participants were 31 university students (13 males and 18 females; mean age: 19.2, SD: 1.13 years) who reported normal (or corrected-to-normal) hearing and vision. All participants completed three speech recognition tasks (visual, auditory, and audiovisual stimuli) and the AQ–Japanese version. The results showed that accuracies of speech recognition for visual (i.e., lip-reading) and auditory stimuli were not significantly related to participants’ AQ. In contrast, audiovisual speech perception was less susceptible to facial speech perception among individuals with high rather than low autistic traits. The weaker influence of visual information on audiovisual speech perception in autism spectrum disorder (ASD) was robust regardless of the clarity of the visual information, suggesting a difficulty in the process of audiovisual integration rather than in the visual processing of facial speech.
17

Drebing, Daniel, Jared Medina, H. Branch Coslett, Jeffrey T. Shenton, and Roy H. Hamilton. "An acquired deficit of intermodal temporal processing for audiovisual speech: A case study". Seeing and Perceiving 25 (2012): 186. http://dx.doi.org/10.1163/187847612x648152.

Annotation:
Integrating sensory information across modalities is necessary for a cohesive experience of the world; disrupting the ability to bind the multisensory stimuli arising from an event leads to a disjointed and confusing percept. We previously reported (Hamilton et al., 2006) a patient, AWF, who suffered an acute neural incident after which he displayed a distinct inability to integrate auditory and visual speech information. While our prior experiments involving AWF suggested that he had a deficit of audiovisual speech processing, they did not explore the hypothesis that his deficits in audiovisual integration are restricted to speech. In order to test this notion, we conducted a series of experiments aimed at testing AWF’s ability to integrate cross-modal information from both speech and non-speech events. AWF was tasked with making temporal order judgments (TOJs) for videos of object noises (such as hands clapping) or speech, wherein the onsets of auditory and visual information were manipulated. Results from the experiments show that while AWF performed worse than controls in his ability to accurately judge even the most salient onset differences for speech videos, he did not differ significantly from controls in his ability to make TOJs for the object videos. These results illustrate the possibility of disruption of intermodal binding for audiovisual speech events with spared binding for real-world, non-speech events.
18

Mishra, Sushmit, Thomas Lunner, Stefan Stenfelt, Jerker Rönnberg, and Mary Rudner. "Visual Information Can Hinder Working Memory Processing of Speech". Journal of Speech, Language, and Hearing Research 56, no. 4 (August 2013): 1120–32. http://dx.doi.org/10.1044/1092-4388(2012/12-0033).

Annotation:
Purpose: The purpose of the present study was to evaluate the new Cognitive Spare Capacity Test (CSCT), which measures aspects of working memory capacity for heard speech in the audiovisual and auditory-only modalities of presentation. Method: In Experiment 1, 20 young adults with normal hearing performed the CSCT and an independent battery of cognitive tests. In the CSCT, they listened to and recalled 2-digit numbers according to instructions inducing executive processing at 2 different memory loads. In Experiment 2, 10 participants performed a less executively demanding free recall task using the same stimuli. Results: CSCT performance demonstrated an effect of memory load and was associated with independent measures of executive function and inference making but not with general working memory capacity. Audiovisual presentation was associated with lower CSCT scores but higher free recall performance scores. Conclusions: CSCT is an executively challenging test of the ability to process heard speech. It captures cognitive aspects of listening related to sentence comprehension that are quantitatively and qualitatively different from working memory capacity. Visual information provided in the audiovisual modality of presentation can hinder executive processing in working memory of nondegraded speech material.
19

Thézé, Raphaël, Anne-Lise Giraud, and Pierre Mégevand. "The phase of cortical oscillations determines the perceptual fate of visual cues in naturalistic audiovisual speech". Science Advances 6, no. 45 (November 2020): eabc6348. http://dx.doi.org/10.1126/sciadv.abc6348.

Annotation:
When we see our interlocutor, our brain seamlessly extracts visual cues from their face and processes them along with the sound of their voice, making speech an intrinsically multimodal signal. Visual cues are especially important in noisy environments, when the auditory signal is less reliable. Neuronal oscillations might be involved in the cortical processing of audiovisual speech by selecting which sensory channel contributes more to perception. To test this, we designed computer-generated naturalistic audiovisual speech stimuli where one mismatched phoneme-viseme pair in a key word of sentences created bistable perception. Neurophysiological recordings (high-density scalp and intracranial electroencephalography) revealed that the precise phase angle of theta-band oscillations in posterior temporal and occipital cortex of the right hemisphere was crucial to select whether the auditory or the visual speech cue drove perception. We demonstrate that the phase of cortical oscillations acts as an instrument for sensory selection in audiovisual speech processing.
20

Hertrich, Ingo, Hermann Ackermann, Klaus Mathiak, and Werner Lutzenberger. "Early stages of audiovisual speech processing—a magnetoencephalography study". Journal of the Acoustical Society of America 121, no. 5 (May 2007): 3044. http://dx.doi.org/10.1121/1.4781737.

21

Harwood, Vanessa, Alisa Baron, Daniel Kleinman, Luca Campanelli, Julia Irwin, and Nicole Landi. "Event-Related Potentials in Assessing Visual Speech Cues in the Broader Autism Phenotype: Evidence from a Phonemic Restoration Paradigm". Brain Sciences 13, no. 7 (June 30, 2023): 1011. http://dx.doi.org/10.3390/brainsci13071011.

Annotation:
Audiovisual speech perception includes the simultaneous processing of auditory and visual speech. Deficits in audiovisual speech perception are reported in autistic individuals; however, less is known regarding audiovisual speech perception within the broader autism phenotype (BAP), which includes individuals with elevated, yet subclinical, levels of autistic traits. We investigate the neural indices of audiovisual speech perception in adults exhibiting a range of autism-like traits using event-related potentials (ERPs) in a phonemic restoration paradigm. In this paradigm, we consider conditions where speech articulators (mouth and jaw) are present (AV condition) and obscured by a pixelated mask (PX condition). These two face conditions were included in both passive (simply viewing a speaking face) and active (participants were required to press a button for a specific consonant–vowel stimulus) experiments. The results revealed an N100 ERP component which was present for all listening contexts and conditions; however, it was attenuated in the active AV condition where participants were able to view the speaker’s face, including the mouth and jaw. The P300 ERP component was present within the active experiment only, and significantly greater within the AV condition compared to the PX condition. This suggests increased neural effort for detecting deviant stimuli when visible articulation was present and visual influence on perception. Finally, the P300 response was negatively correlated with autism-like traits, suggesting that higher autistic traits were associated with generally smaller P300 responses in the active AV and PX conditions. The conclusions support the finding that atypical audiovisual processing may be characteristic of the BAP in adults.
22

Vroomen, Jean, and Jeroen J. Stekelenburg. "Visual Anticipatory Information Modulates Multisensory Interactions of Artificial Audiovisual Stimuli". Journal of Cognitive Neuroscience 22, no. 7 (July 2010): 1583–96. http://dx.doi.org/10.1162/jocn.2009.21308.

Annotation:
The neural activity of speech sound processing (the N1 component of the auditory ERP) can be suppressed if a speech sound is accompanied by concordant lip movements. Here we demonstrate that this audiovisual interaction is neither speech specific nor linked to humanlike actions but can be observed with artificial stimuli if their timing is made predictable. In Experiment 1, a pure tone synchronized with a deformation of a rectangle induced a smaller auditory N1 than auditory-only presentations if the temporal occurrence of this audiovisual event was made predictable by two moving disks that touched the rectangle. Local autoregressive average source estimation indicated that this audiovisual interaction may be related to integrative processing in auditory areas. When the moving disks did not precede the audiovisual stimulus—making the onset unpredictable—there was no N1 reduction. In Experiment 2, the predictability of the leading visual signal was manipulated by introducing a temporal asynchrony between the audiovisual event and the collision of moving disks. Audiovisual events occurred either at the moment, before (too “early”), or after (too “late”) the disks collided on the rectangle. When asynchronies varied from trial to trial—rendering the moving disks unreliable temporal predictors of the audiovisual event—the N1 reduction was abolished. These results demonstrate that the N1 suppression is induced by visual information that both precedes and reliably predicts audiovisual onset, without a necessary link to human action-related neural mechanisms.
23

Ghaneirad, Erfan, Ellyn Saenger, Gregor R. Szycik, Anja Čuš, Laura Möde, Christopher Sinke, Daniel Wiswede, Stefan Bleich, and Anna Borgolte. "Deficient Audiovisual Speech Perception in Schizophrenia: An ERP Study". Brain Sciences 13, no. 6 (June 19, 2023): 970. http://dx.doi.org/10.3390/brainsci13060970.

Annotation:
In everyday verbal communication, auditory speech perception is often disturbed by background noise. Especially in disadvantageous hearing conditions, additional visual articulatory information (e.g., lip movement) can positively contribute to speech comprehension. Patients with schizophrenia (SZs) demonstrate an aberrant ability to integrate visual and auditory sensory input during speech perception. Current findings about underlying neural mechanisms of this deficit are inconsistent. Particularly and despite the importance of early sensory processing in speech perception, very few studies have addressed these processes in SZs. Thus, in the present study, we examined 20 adult subjects with SZ and 21 healthy controls (HCs) while presenting audiovisual spoken words (disyllabic nouns) either superimposed by white noise (−12 dB signal-to-noise ratio) or not. In addition to behavioral data, event-related brain potentials (ERPs) were recorded. Our results demonstrate reduced speech comprehension for SZs compared to HCs under noisy conditions. Moreover, we found altered N1 amplitudes in SZ during speech perception, while P2 amplitudes and the N1-P2 complex were similar to HCs, indicating that there may be disturbances in multimodal speech perception at an early stage of processing, which may be due to deficits in auditory speech perception. Moreover, a positive relationship between fronto-central N1 amplitudes and the positive subscale of the Positive and Negative Syndrome Scale (PANSS) has been observed.
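For readers unfamiliar with the noise manipulation described here, the sketch below shows one common way to superimpose white noise on a speech waveform at a target signal-to-noise ratio such as −12 dB; the placeholder waveform and function names are illustrative, not the authors' stimulus-preparation code.

```python
import numpy as np

def add_white_noise_at_snr(speech, snr_db, rng=None):
    """Return the speech waveform with white noise added at the requested SNR (dB)."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(speech.shape)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10 * log10(p_speech / p_noise_scaled) equals snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example with a placeholder 1-s waveform at 16 kHz, mixed at -12 dB SNR.
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 220 * t)        # stand-in for a recorded word
noisy = add_white_noise_at_snr(speech, snr_db=-12)
```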
24

McCotter, Maxine V., and Timothy R. Jordan. "The Role of Facial Colour and Luminance in Visual and Audiovisual Speech Perception". Perception 32, no. 8 (August 2003): 921–36. http://dx.doi.org/10.1068/p3316.

Annotation:
We conducted four experiments to investigate the role of colour and luminance information in visual and audiovisual speech perception. In experiments 1a (stimuli presented in quiet conditions) and 1b (stimuli presented in auditory noise), face display types comprised naturalistic colour (NC), grey-scale (GS), and luminance inverted (LI) faces. In experiments 2a (quiet) and 2b (noise), face display types comprised NC, colour inverted (CI), LI, and colour and luminance inverted (CLI) faces. Six syllables and twenty-two words were used to produce auditory and visual speech stimuli. Auditory and visual signals were combined to produce congruent and incongruent audiovisual speech stimuli. Experiments 1a and 1b showed that perception of visual speech, and its influence on identifying the auditory components of congruent and incongruent audiovisual speech, was less for LI than for either NC or GS faces, which produced identical results. Experiments 2a and 2b showed that perception of visual speech, and influences on perception of incongruent auditory speech, was less for LI and CLI faces than for NC and CI faces (which produced identical patterns of performance). Our findings for NC and CI faces suggest that colour is not critical for perception of visual and audiovisual speech. The effect of luminance inversion on performance accuracy was relatively small (5%), which suggests that the luminance information preserved in LI faces is important for the processing of visual and audiovisual speech.
25

Tye-Murray, Nancy, Brent P. Spehar, Joel Myerson, Sandra Hale, and Mitchell S. Sommers. "The self-advantage in visual speech processing enhances audiovisual speech recognition in noise". Psychonomic Bulletin & Review 22, no. 4 (November 25, 2014): 1048–53. http://dx.doi.org/10.3758/s13423-014-0774-3.

26

Bernstein, Lynne E., Zhong-Lin Lu, and Jintao Jiang. "Quantified acoustic–optical speech signal incongruity identifies cortical sites of audiovisual speech processing". Brain Research 1242 (November 2008): 172–84. http://dx.doi.org/10.1016/j.brainres.2008.04.018.

27

Crosse, Michael J., and Edmund C. Lalor. "The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech". Journal of Neurophysiology 111, no. 7 (April 1, 2014): 1400–1408. http://dx.doi.org/10.1152/jn.00690.2013.

Annotation:
Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information.
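The forward mapping referred to above is a lagged linear model from the speech envelope to each EEG channel (a temporal response function). A minimal ridge-regression sketch of such a model is shown below; the lag range, regularisation value, and toy data are assumptions for illustration rather than the authors' analysis pipeline.

```python
import numpy as np

def lagged_design(envelope, lags):
    """Build a design matrix whose columns are time-lagged copies of the
    speech envelope (one column per lag, lags given in samples)."""
    n = len(envelope)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[:n - lag]
        else:
            X[:n + lag, j] = envelope[-lag:]
    return X

def fit_forward_trf(envelope, eeg, lags, ridge=1.0):
    """Estimate a forward mapping (temporal response function) from the
    envelope to each EEG channel with ridge regression."""
    X = lagged_design(envelope, lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)          # shape: (n_lags, n_channels)

# Toy example: 60 s of data at 128 Hz, lags spanning roughly 0-400 ms.
fs = 128
rng = np.random.default_rng(0)
envelope = rng.random(60 * fs)
eeg = rng.standard_normal((60 * fs, 32))            # 32 "channels" of noise
lags = np.arange(0, int(0.4 * fs))
trf = fit_forward_trf(envelope, eeg, lags)
peak_latency_ms = 1000 * lags[np.abs(trf).mean(axis=1).argmax()] / fs
```

Fitting the same model separately to audio-only and audiovisual recordings and comparing where the estimated response functions peak is one simple way to quantify the latency difference the abstract describes.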
28

Roa Romero, Yadira, Daniel Senkowski, and Julian Keil. "Early and late beta-band power reflect audiovisual perception in the McGurk illusion". Journal of Neurophysiology 113, no. 7 (April 2015): 2342–50. http://dx.doi.org/10.1152/jn.00783.2014.

Annotation:
The McGurk illusion is a prominent example of audiovisual speech perception and the influence that visual stimuli can have on auditory perception. In this illusion, a visual speech stimulus influences the perception of an incongruent auditory stimulus, resulting in a fused novel percept. In this high-density electroencephalography (EEG) study, we were interested in the neural signatures of the subjective percept of the McGurk illusion as a phenomenon of speech-specific multisensory integration. Therefore, we examined the role of cortical oscillations and event-related responses in the perception of congruent and incongruent audiovisual speech. We compared the cortical activity elicited by objectively congruent syllables with incongruent audiovisual stimuli. Importantly, the latter elicited a subjectively congruent percept: the McGurk illusion. We found that early event-related responses (N1) to audiovisual stimuli were reduced during the perception of the McGurk illusion compared with congruent stimuli. Most interestingly, our study showed a stronger poststimulus suppression of beta-band power (13–30 Hz) at short (0–500 ms) and long (500–800 ms) latencies during the perception of the McGurk illusion compared with congruent stimuli. Our study demonstrates that auditory perception is influenced by visual context and that the subsequent formation of a McGurk illusion requires stronger audiovisual integration even at early processing stages. Our results provide evidence that beta-band suppression at early stages reflects stronger stimulus processing in the McGurk illusion. Moreover, stronger late beta-band suppression in McGurk illusion indicates the resolution of incongruent physical audiovisual input and the formation of a coherent, illusory multisensory percept.
29

Dunham, Kacie, Alisa Zoltowski, Jacob I. Feldman, Samona Davis, Baxter Rogers, Michelle D. Failla, Mark T. Wallace, Carissa J. Cascio, and Tiffany G. Woynaroski. "Neural Correlates of Audiovisual Speech Processing in Autistic and Non-Autistic Youth". Multisensory Research 36, no. 3 (January 19, 2023): 263–88. http://dx.doi.org/10.1163/22134808-bja10093.

Annotation:
Autistic youth demonstrate differences in processing multisensory information, particularly in temporal processing of multisensory speech. Extensive research has identified several key brain regions for multisensory speech processing in non-autistic adults, including the superior temporal sulcus (STS) and insula, but it is unclear to what extent these regions are involved in temporal processing of multisensory speech in autistic youth. As a first step in exploring the neural substrates of multisensory temporal processing in this clinical population, we employed functional magnetic resonance imaging (fMRI) with a simultaneity-judgment audiovisual speech task. Eighteen autistic youth and a comparison group of 20 non-autistic youth matched on chronological age, biological sex, and gender participated. Results extend prior findings from studies of non-autistic adults, with non-autistic youth demonstrating responses in several similar regions as previously implicated in adult temporal processing of multisensory speech. Autistic youth demonstrated responses in fewer of the multisensory regions identified in adult studies; responses were limited to visual and motor cortices. Group responses in the middle temporal gyrus significantly interacted with age; younger autistic individuals showed reduced MTG responses whereas older individuals showed comparable MTG responses relative to non-autistic controls. Across groups, responses in the precuneus covaried with task accuracy, and anterior temporal and insula responses covaried with nonverbal IQ. These preliminary findings suggest possible differences in neural mechanisms of audiovisual processing in autistic youth while highlighting the need to consider participant characteristics in future, larger-scale studies exploring the neural basis of multisensory function in autism.
30

Vakhshiteh, Fatemeh, and Farshad Almasganj. "Exploration of Properly Combined Audiovisual Representation with the Entropy Measure in Audiovisual Speech Recognition". Circuits, Systems, and Signal Processing 38, no. 6 (November 9, 2018): 2523–43. http://dx.doi.org/10.1007/s00034-018-0975-5.

31

Lalonde, Kaylah, and Rachael Frush Holt. "Audiovisual speech integration development at varying levels of perceptual processing". Journal of the Acoustical Society of America 136, no. 4 (October 2014): 2263. http://dx.doi.org/10.1121/1.4900174.

32

Lalonde, Kaylah, and Rachael Frush Holt. "Audiovisual speech perception development at varying levels of perceptual processing". Journal of the Acoustical Society of America 139, no. 4 (April 2016): 1713–23. http://dx.doi.org/10.1121/1.4945590.

33

Zhang, Yang, Bing Cheng, Tess Koerner, Christine Cao, Edward Carney, and Yue Wang. "Cortical processing of audiovisual speech perception in infancy and adulthood". Journal of the Acoustical Society of America 134, no. 5 (November 2013): 4234. http://dx.doi.org/10.1121/1.4831559.

34

Barrós-Loscertales, Alfonso, Noelia Ventura-Campos, Maya Visser, Agnès Alsius, Christophe Pallier, César Ávila Rivera, and Salvador Soto-Faraco. "Neural correlates of audiovisual speech processing in a second language". Brain and Language 126, no. 3 (September 2013): 253–62. http://dx.doi.org/10.1016/j.bandl.2013.05.009.

35

Loh, Marco, Gabriele Schmid, Gustavo Deco, and Wolfram Ziegler. "Audiovisual Matching in Speech and Nonspeech Sounds: A Neurodynamical Model". Journal of Cognitive Neuroscience 22, no. 2 (February 2010): 240–47. http://dx.doi.org/10.1162/jocn.2009.21202.

Annotation:
Audiovisual speech perception provides an opportunity to investigate the mechanisms underlying multimodal processing. By using nonspeech stimuli, it is possible to investigate the degree to which audiovisual processing is specific to the speech domain. It has been shown in a match-to-sample design that matching across modalities is more difficult in the nonspeech domain as compared to the speech domain. We constructed a biophysically realistic neural network model simulating this experimental evidence. We propose that a stronger connection between modalities in speech underlies the behavioral difference between the speech and the nonspeech domain. This could be the result of more extensive experience with speech stimuli. Because the match-to-sample paradigm does not allow us to draw conclusions concerning the integration of auditory and visual information, we also simulated two further conditions based on the same paradigm, which tested the integration of auditory and visual information within a single stimulus. New experimental data for these two conditions support the simulation results and suggest that audiovisual integration of discordant stimuli is stronger in speech than in nonspeech stimuli. According to the simulations, the connection strength between auditory and visual information, on the one hand, determines how well auditory information can be assigned to visual information, and on the other hand, it influences the magnitude of multimodal integration.
36

Tiippana, Kaisa. "Advances in Understanding the Phenomena and Processing in Audiovisual Speech Perception". Brain Sciences 13, no. 9 (September 20, 2023): 1345. http://dx.doi.org/10.3390/brainsci13091345.

37

Hällgren, Mathias, Birgitta Larsby, Björn Lyxell, and Stig Arlinger. "Evaluation of a Cognitive Test Battery in Young and Elderly Normal-Hearing and Hearing-Impaired Persons". Journal of the American Academy of Audiology 12, no. 7 (July 2001): 357–70. http://dx.doi.org/10.1055/s-0042-1745620.

Annotation:
A cognitive test battery sensitive to processes important for speech understanding was developed and investigated. Test stimuli are presented as text or in an auditory or audiovisual modality. The tests investigate phonologic processing and verbal information processing. Four subject groups, young/elderly with normal-hearing and young/elderly with hearing impairment, each including 12 subjects, participated in the study. The only significant effect in the text modality was an age effect in the speed of performance, seen also in the auditory and audiovisual modalities. In the auditory and audiovisual modalities, the effects of hearing status and modality were seen in accuracy parameters. Interactions between hearing status and modality, both in accuracy and in reaction times, show that hearing-impaired subjects have difficulties without visual cues. Performing the test battery in noise made the tasks more difficult, especially in the auditory modality and for the elderly, affecting both accuracy and speed. Test-retest measurements showed learning effects and a modality-dependent variability. The test battery has proven useful in assessing the relative contribution of different input signals and the effects of age, hearing impairment, and visual contribution on functions important for speech processing. Abbreviations: ANOVA = analysis of variance, PTA = pure-tone average, SNR = signal-to-noise ratio, SVIPS = speech and visual information processing system, TIPS = text information processing system
38

Lalonde, Kaylah, and Grace A. Dwyer. "Visual phonemic knowledge and audiovisual speech-in-noise perception in school-age children". Journal of the Acoustical Society of America 153, no. 3_supplement (March 1, 2023): A337. http://dx.doi.org/10.1121/10.0019067.

Annotation:
Our mental representations of speech sounds include information about the visible articulatory gestures that accompany different speech sounds. We call this visual phonemic knowledge. This study examined development of school-age children’s visual phonemic knowledge and their ability to use visual phonemic knowledge to supplement audiovisual speech processing. Sixty-two children (5–16 years) and 18 adults (19–35 years) completed auditory-only, visual-only, and audiovisual tests of consonant-vowel syllable repetition. Auditory-only and audiovisual conditions were presented in steady-state, speech-spectrum noise at individually set SNRs. Consonant confusions were analyzed to define visemes (clusters of phonemes that are visually confusable with one another but visually distinct from other phonemes) evident in adults’ responses to visual-only consonants and to compute the proportion of errors in each participant and modality that were within adults' visemes. Children were less accurate than adults at visual-only consonant identification. However, children as young as 5 years of age demonstrated some visual phonemic knowledge. Comparison of error patterns across conditions indicated that children used visual phonemic knowledge during audiovisual speech-in-noise recognition. Details regarding the order of acquisition of visemes will be discussed.
39

Costa-Giomi, Eugenia. "Mode of Presentation Affects Infants’ Preferential Attention to Singing and Speech". Music Perception 32, no. 2 (December 1, 2014): 160–69. http://dx.doi.org/10.1525/mp.2014.32.2.160.

Annotation:
Almost from birth, infants prefer to attend to human vocalizations associated with speech over many other sounds. However, studies that have focused on infants’ differential attention to speech and singing have failed to show a speech listening bias. The purpose of the study was to investigate infants’ preferential attention to singing and speech presented in audiovisual and auditory mode. Using an infant-controlled preference procedure, 11-month-olds were presented with audiovisual stimuli depicting a woman singing or reciting a song (Experiment 1, audiovisual condition). The results showed that infants attended significantly longer to singing than to speech. In Experiment 2 (visual condition), infants watched the same videos presented with no sound and in Experiment 3 (auditory condition), they listened to the singing and speech stimuli in English and a foreign language. No differences in length of attention to singing and speech were found in either experiment. The results of the study reconcile the seemingly contradicting findings of previous investigations and show that mode of presentation affects infants’ preferential attention to speech and singing. The facilitating effects of facial cues on infants’ processing of speech and singing are discussed.
40

Pons, Ferran, Llorenç Andreu, Monica Sanz-Torrent, Lucía Buil-Legaz, and David J. Lewkowicz. "Perception of audio-visual speech synchrony in Spanish-speaking children with and without specific language impairment". Journal of Child Language 40, no. 3 (July 9, 2012): 687–700. http://dx.doi.org/10.1017/s0305000912000189.

Annotation:
Speech perception involves the integration of auditory and visual articulatory information, and thus requires the perception of temporal synchrony between this information. There is evidence that children with specific language impairment (SLI) have difficulty with auditory speech perception but it is not known if this is also true for the integration of auditory and visual speech. Twenty Spanish-speaking children with SLI, twenty typically developing age-matched Spanish-speaking children, and twenty Spanish-speaking children matched for MLU-w participated in an eye-tracking study to investigate the perception of audiovisual speech synchrony. Results revealed that children with typical language development perceived an audiovisual asynchrony of 666 ms regardless of whether the auditory or visual speech attribute led the other one. Children with SLI only detected the 666 ms asynchrony when the auditory component followed the visual component. None of the groups perceived an audiovisual asynchrony of 366 ms. These results suggest that the difficulty of speech processing by children with SLI would also involve difficulties in integrating auditory and visual aspects of speech perception.
41

Vatakis, Argiro, and Charles Spence. "Assessing audiovisual saliency and visual-information content in the articulation of consonants and vowels on audiovisual temporal perception". Seeing and Perceiving 25 (2012): 29. http://dx.doi.org/10.1163/187847612x646514.

Annotation:
Research has revealed different temporal integration windows between and within different speech-tokens. The limited set of speech-tokens tested to date has not allowed for a proper evaluation of whether such differences are task or stimulus driven. We conducted a series of experiments to investigate how the physical differences associated with speech articulation affect the temporal aspects of audiovisual speech perception. Videos of consonants and vowels uttered by three speakers were presented. Participants made temporal order judgments (TOJs) regarding which speech-stream had been presented first. The sensitivity of participants’ TOJs and the point of subjective simultaneity (PSS) were analyzed as a function of the place, manner of articulation, and voicing for consonants, and the height/backness of the tongue and lip-roundedness for vowels. The results demonstrated that for the case of place of articulation/roundedness, participants were more sensitive to the temporal order of highly-salient speech-signals with smaller visual-leads at the PSS. This was not the case when the manner of articulation/height was evaluated. These findings suggest that the visual-speech signal provides substantial cues to the auditory-signal that modulate the relative processing times required for the perception of the speech-stream. A subsequent experiment explored how the presentation of different sources of visual-information modulated such findings. Videos of three consonants were presented under natural and point-light (PL) viewing conditions revealing parts, or the whole, face. Preliminary analysis revealed no differences in TOJ accuracy under different viewing conditions. However, the PSS data revealed significant differences in viewing conditions depending on the speech token uttered (e.g., larger visual-leads for PL-lip/teeth/tongue-only views).
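Temporal order judgments of the kind described above are typically summarised by fitting a cumulative Gaussian psychometric function, whose mean gives the point of subjective simultaneity (PSS) and whose spread indexes sensitivity. The sketch below illustrates that standard fit on made-up data; it is not the authors' analysis code, and the SOA convention (positive = visual leading) is an assumption for the example.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(soa, pss, sigma):
    """Probability of reporting 'visual stream first' as a function of the
    stimulus onset asynchrony (SOA, ms; positive = visual leading)."""
    return norm.cdf(soa, loc=pss, scale=sigma)

# Made-up TOJ data: proportion of 'visual first' responses at each SOA.
soas = np.array([-300, -200, -100, -50, 0, 50, 100, 200, 300], dtype=float)
p_visual_first = np.array([0.05, 0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.95, 0.98])

(pss, sigma), _ = curve_fit(psychometric, soas, p_visual_first, p0=(0.0, 100.0))
jnd = sigma * norm.ppf(0.75)   # 75% threshold, a common sensitivity measure
```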
42

Paris, Tim, Jeesun Kim, and Christopher Davis. "Updating expectencies about audiovisual associations in speech". Seeing and Perceiving 25 (2012): 164. http://dx.doi.org/10.1163/187847612x647946.

Annotation:
The processing of multisensory information depends on the learned association between sensory cues. In the case of speech there is a well-learned association between the movements of the lips and the subsequent sound. That is, particular lip and mouth movements reliably lead to a specific sound. EEG and MEG studies that have investigated the differences between this ‘congruent’ AV association and other ‘incongruent’ associations have commonly reported ERP differences from 350 ms after sound onset. Using a 256 active electrode EEG system, we tested whether this ‘congruency effect’ would be reduced in the context where most of the trials had an altered audiovisual association (auditory speech paired with mismatched visual lip movements). Participants were presented stimuli over 2 sessions: in one session only 15% were incongruent trials; in the other session, 85% were incongruent trials. We found a congruency effect, showing differences in ERP between congruent and incongruent speech between 350 and 500 ms. Importantly, this effect was reduced within the context of mostly incongruent trials. This reduction in the congruency effect indicates that the way in which AV speech is processed depends on the context it is viewed in. Furthermore, this result suggests that exposure to novel sensory relationships leads to updated expectations regarding the relationship between auditory and visual speech cues.
43

Van der Burg, Erik, and Patrick T. Goodbourn. "Rapid, generalized adaptation to asynchronous audiovisual speech". Proceedings of the Royal Society B: Biological Sciences 282, no. 1804 (April 7, 2015): 20143083. http://dx.doi.org/10.1098/rspb.2014.3083.

Annotation:
The brain is adaptive. The speed of propagation through air, and of low-level sensory processing, differs markedly between auditory and visual stimuli; yet the brain can adapt to compensate for the resulting cross-modal delays. Studies investigating temporal recalibration to audiovisual speech have used prolonged adaptation procedures, suggesting that adaptation is sluggish. Here, we show that adaptation to asynchronous audiovisual speech occurs rapidly. Participants viewed a brief clip of an actor pronouncing a single syllable. The voice was either advanced or delayed relative to the corresponding lip movements, and participants were asked to make a synchrony judgement. Although we did not use an explicit adaptation procedure, we demonstrate rapid recalibration based on a single audiovisual event. We find that the point of subjective simultaneity on each trial is highly contingent upon the modality order of the preceding trial. We find compelling evidence that rapid recalibration generalizes across different stimuli, and different actors. Finally, we demonstrate that rapid recalibration occurs even when auditory and visual events clearly belong to different actors. These results suggest that rapid temporal recalibration to audiovisual speech is primarily mediated by basic temporal factors, rather than higher-order factors such as perceived simultaneity and source identity.
44

Jerger, Susan, Markus F. Damian, Cassandra Karl, and Hervé Abdi. "Developmental Shifts in Detection and Attention for Auditory, Visual, and Audiovisual Speech". Journal of Speech, Language, and Hearing Research 61, no. 12 (December 10, 2018): 3095–112. http://dx.doi.org/10.1044/2018_jslhr-h-17-0343.

Annotation:
Purpose: Successful speech processing depends on our ability to detect and integrate multisensory cues, yet there is minimal research on multisensory speech detection and integration by children. To address this need, we studied the development of speech detection for auditory (A), visual (V), and audiovisual (AV) input. Method: Participants were 115 typically developing children clustered into age groups between 4 and 14 years. Speech detection (quantified by response times [RTs]) was determined for 1 stimulus, /buh/, presented in A, V, and AV modes (articulating vs. static facial conditions). Performance was analyzed not only in terms of traditional mean RTs but also in terms of the faster versus slower RTs (defined by the 1st vs. 3rd quartiles of RT distributions). These time regions were conceptualized respectively as reflecting optimal detection with efficient focused attention versus less optimal detection with inefficient focused attention due to attentional lapses. Results: Mean RTs indicated better detection (a) of multisensory AV speech than A speech only in 4- to 5-year-olds and (b) of A and AV inputs than V input in all age groups. The faster RTs revealed that AV input did not improve detection in any group. The slower RTs indicated that (a) the processing of silent V input was significantly faster for the articulating than static face and (b) AV speech or facial input significantly minimized attentional lapses in all groups except 6- to 7-year-olds (a peaked U-shaped curve). Apparently, the AV benefit observed for mean performance in 4- to 5-year-olds arose from effects of attention. Conclusions: The faster RTs indicated that AV input did not enhance detection in any group, but the slower RTs indicated that AV speech and dynamic V speech (mouthing) significantly minimized attentional lapses and thus did influence performance. Overall, A and AV inputs were detected consistently faster than V input; this result endorsed stimulus-bound auditory processing by these children.
45

Treille, Avril, Coriandre Vilain, Sonia Kandel und Marc Sato. „Electrophysiological evidence for a self-processing advantage during audiovisual speech integration“. Experimental Brain Research 235, Nr. 9 (04.07.2017): 2867–76. http://dx.doi.org/10.1007/s00221-017-5018-0.

46

Hueber, Thomas, Eric Tatulli, Laurent Girin und Jean-Luc Schwartz. „Evaluating the Potential Gain of Auditory and Audiovisual Speech-Predictive Coding Using Deep Learning“. Neural Computation 32, Nr. 3 (März 2020): 596–625. http://dx.doi.org/10.1162/neco_a_01264.

Annotation:
Sensory processing is increasingly conceived in a predictive framework in which neurons would constantly process the error signal resulting from the comparison of expected and observed stimuli. Surprisingly, few data exist on the accuracy of predictions that can be computed in real sensory scenes. Here, we focus on the sensory processing of auditory and audiovisual speech. We propose a set of computational models based on artificial neural networks (mixing deep feedforward and convolutional networks), which are trained to predict future audio observations from present and past audio or audiovisual observations (i.e., including lip movements). Those predictions exploit purely local phonetic regularities with no explicit call to higher linguistic levels. Experiments are conducted on the multispeaker LibriSpeech audio speech database (around 100 hours) and on the NTCD-TIMIT audiovisual speech database (around 7 hours). The predictions appear to be efficient in a short temporal range (25–50 ms), accounting for 50% to 75% of the variance of the incoming stimulus, which could potentially save up to three-quarters of the processing power. Prediction accuracy then decreases quickly and almost vanishes after 250 ms. Adding information on the lips slightly improves predictions, with a 5% to 10% increase in explained variance. Interestingly, the visual gain vanishes more slowly, and the gain is maximum for a delay of 75 ms between image and predicted sound.
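
The prediction-and-scoring idea can be sketched without reproducing the authors' architectures: a small feedforward network (here in PyTorch, as an assumed stand-in for the mixed feedforward/convolutional models) predicts a future audio feature frame from a window of past audio, or audio-plus-lip, features, and prediction quality is summarized as explained variance. Feature dimensions, the prediction horizon, and the data in the sketch are hypothetical.

# Sketch: short-range audio prediction and explained-variance scoring (PyTorch).
# X_past: (n_samples, k_frames * n_features) past audio (or audio+lip) features;
# y_next: (n_samples, n_features) audio frame some tens of milliseconds ahead.
import torch
import torch.nn as nn

def explained_variance(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
    """Fraction of target variance captured by the prediction (1 - residual var / total var)."""
    resid = y_true - y_pred
    return float(1.0 - resid.var() / y_true.var())

def train_predictor(X_past, y_next, epochs=200, lr=1e-3):
    model = nn.Sequential(
        nn.Linear(X_past.shape[1], 128), nn.ReLU(),
        nn.Linear(128, y_next.shape[1]),
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X_past), y_next)
        loss.backward()
        opt.step()
    return model

# Toy example with random features (so explained variance stays near zero here);
# with real spectral features one would compare audio-only vs. audio+lip inputs
# across several prediction delays to estimate the audiovisual gain.
X = torch.randn(1024, 5 * 40)   # 5 past frames x 40 spectral features
y = torch.randn(1024, 40)       # frame to predict
model = train_predictor(X, y)
print("explained variance:", explained_variance(y, model(X).detach()))
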
47

Gijbels, Liesbeth, Jason D. Yeatman, Kaylah Lalonde und Adrian K. C. Lee. „Audiovisual Speech Processing in Relationship to Phonological and Vocabulary Skills in First Graders“. Journal of Speech, Language, and Hearing Research 64, Nr. 12 (13.12.2021): 5022–40. http://dx.doi.org/10.1044/2021_jslhr-21-00196.

Annotation:
Purpose: It is generally accepted that adults use visual cues to improve speech intelligibility in noisy environments, but findings regarding visual speech benefit in children are mixed. We explored factors that contribute to audiovisual (AV) gain in young children's speech understanding. We examined whether there is an AV benefit to speech-in-noise recognition in children in first grade and whether the visual salience of phonemes influences their AV benefit. We explored whether individual differences in AV speech enhancement could be explained by vocabulary knowledge, phonological awareness, or general psychophysical testing performance. Method: Thirty-seven first graders completed online psychophysical experiments. We used an online single-interval, four-alternative forced-choice picture-pointing task with age-appropriate consonant–vowel–consonant words to measure auditory-only, visual-only, and AV word recognition in noise at −2 and −8 dB SNR. We obtained standard measures of vocabulary and phonological awareness and included a general psychophysical test to examine correlations with AV benefits. Results: We observed a significant overall AV gain among children in first grade. This effect was mainly attributable to the benefit at −8 dB SNR for visually distinct targets. Individual differences were not explained by any of the child variables. Boys showed lower auditory-only performance, leading to significantly larger AV gains. Conclusions: This study shows an AV benefit of distinctive visual cues to word recognition in challenging noisy conditions in first graders. The cognitive and linguistic constraints of the task may have minimized the impact of individual differences in vocabulary and phonological awareness on AV benefit. The gender difference should be studied in a larger sample and across a wider age range.
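
As an illustration of the AV-gain measure, one can compute, per child and SNR, the difference between AV and auditory-only proportion correct and then correlate the individual gains with vocabulary or phonological-awareness scores. All column names and the scoring choice below are assumptions for the sketch, not the study's exact procedure.

# Sketch: audiovisual (AV) gain per child and SNR, and its correlation with a child measure.
# Assumes a trials DataFrame with columns: child, snr_db, modality ("A", "V", "AV"),
# correct (0/1), plus a per-child scores table with columns: child, score.
import pandas as pd
from scipy.stats import spearmanr

def av_gain(trials: pd.DataFrame) -> pd.DataFrame:
    """AV minus auditory-only proportion correct for each child at each SNR."""
    acc = (trials.groupby(["child", "snr_db", "modality"])["correct"]
                 .mean().unstack("modality"))
    acc["av_gain"] = acc["AV"] - acc["A"]
    return acc.reset_index()[["child", "snr_db", "av_gain"]]

def correlate_with_score(gains: pd.DataFrame, scores: pd.DataFrame, snr_db: float):
    """Spearman correlation between AV gain at one SNR and a per-child score."""
    merged = gains[gains["snr_db"] == snr_db].merge(scores, on="child")
    return spearmanr(merged["av_gain"], merged["score"])
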
48

Hertrich, Ingo, Susanne Dietrich und Hermann Ackermann. „Cross-modal Interactions during Perception of Audiovisual Speech and Nonspeech Signals: An fMRI Study“. Journal of Cognitive Neuroscience 23, Nr. 1 (Januar 2011): 221–37. http://dx.doi.org/10.1162/jocn.2010.21421.

Annotation:
During speech communication, visual information may interact with the auditory system at various processing stages. Most noteworthy, recent magnetoencephalography (MEG) data provided first evidence for early and preattentive phonetic/phonological encoding of the visual data stream—prior to its fusion with auditory phonological features [Hertrich, I., Mathiak, K., Lutzenberger, W., & Ackermann, H. Time course of early audiovisual interactions during speech and non-speech central-auditory processing: An MEG study. Journal of Cognitive Neuroscience, 21, 259–274, 2009]. Using functional magnetic resonance imaging, the present follow-up study aims to further elucidate the topographic distribution of visual–phonological operations and audiovisual (AV) interactions during speech perception. Ambiguous acoustic syllables—disambiguated to /pa/ or /ta/ by the visual channel (speaking face)—served as test materials, concomitant with various control conditions (nonspeech AV signals, visual-only and acoustic-only speech, and nonspeech stimuli). (i) Visual speech yielded an AV-subadditive activation of primary auditory cortex and the anterior superior temporal gyrus (STG), whereas the posterior STG responded both to speech and nonspeech motion. (ii) The inferior frontal and the fusiform gyrus of the right hemisphere showed a strong phonetic/phonological impact (differential effects of visual /pa/ vs. /ta/) upon hemodynamic activation during presentation of speaking faces. Taken together with the previous MEG data, these results point at a dual-pathway model of visual speech information processing: On the one hand, access to the auditory system via the anterior supratemporal “what” path may give rise to direct activation of “auditory objects.” On the other hand, visual speech information seems to be represented in a right-hemisphere visual working memory, providing a potential basis for later interactions with auditory information such as the McGurk effect.
49

Jansen, Samantha D., Joseph R. Keebler und Alex Chaparro. „Shifts in Maximum Audiovisual Integration with Age“. Multisensory Research 31, Nr. 3-4 (2018): 191–212. http://dx.doi.org/10.1163/22134808-00002599.

Annotation:
Listeners attempting to understand speech in noisy environments rely on visual and auditory processes, typically referred to as audiovisual processing. Noise corrupts the auditory speech signal, and listeners naturally leverage visual cues from the talker's face in an attempt to interpret the degraded auditory signal. Studies of speech intelligibility in noise show that the maximum improvement in speech recognition performance (i.e., maximum visual enhancement, or VEmax), derived from seeing an interlocutor's face, is invariant with age. Several studies have reported that VEmax is typically associated with a signal-to-noise ratio (SNR) of −12 dB; however, few studies have systematically investigated whether the SNR associated with VEmax changes with age. We investigated whether VEmax changes as a function of age, whether the SNR at VEmax changes as a function of age, and what perceptual/cognitive abilities account for or mediate such relationships. We measured VEmax in a nongeriatric adult sample ranging in age from 20 to 59 years. We found that VEmax was age-invariant, replicating earlier studies. No perceptual/cognitive measures predicted VEmax, most likely due to limited variance in VEmax scores. Importantly, we found that the SNR at VEmax shifts toward higher (quieter) SNR levels with increasing age; however, this relationship is partially mediated by working memory capacity: those with larger working memory capacities (WMCs) can identify speech at lower (louder) SNR levels than their age equivalents with smaller WMCs. The current study is the first to report that individual differences in WMC partially mediate the age-related shift in the SNR at VEmax.
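
One common way to express visual enhancement (VE), in the spirit of the measure summarized above, is to normalize the AV-minus-auditory gain by the headroom left in auditory-only performance; the SNR at VEmax is then the SNR where that quantity peaks. The sketch below uses this formulation with hypothetical accuracy curves and is not the authors' exact computation.

# Sketch: visual enhancement VE(SNR) = (AV - A) / (1 - A) and the SNR at its maximum.
# a_only / av are proportion-correct speech scores per SNR; all values are hypothetical.
import numpy as np

snr_db = np.array([-24, -20, -16, -12, -8, -4, 0])
a_only = np.array([0.02, 0.08, 0.20, 0.40, 0.65, 0.85, 0.95])   # auditory-only accuracy
av     = np.array([0.25, 0.40, 0.60, 0.78, 0.88, 0.94, 0.97])   # audiovisual accuracy

ve = (av - a_only) / (1.0 - a_only)      # normalized visual enhancement at each SNR
i_max = int(np.argmax(ve))
print(f"VEmax = {ve[i_max]:.2f} at SNR = {snr_db[i_max]} dB")
# Comparing the SNR at VEmax across age groups (or regressing it on age and working
# memory capacity) is how an age-related shift of the kind reported above would
# show up in this style of analysis.
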
50

Schabus, Dietmar, Michael Pucher und Gregor Hofer. „Joint Audiovisual Hidden Semi-Markov Model-Based Speech Synthesis“. IEEE Journal of Selected Topics in Signal Processing 8, Nr. 2 (April 2014): 336–47. http://dx.doi.org/10.1109/jstsp.2013.2281036.
