Selection of scientific literature on the topic "Audiovisual speech processing"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Browse the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Audiovisual speech processing".
Next to every work in the bibliography, an "Add to bibliography" option is available. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scientific publication as a PDF and read an online annotation of the work, provided the relevant parameters are available in the metadata.
Journal articles on the topic "Audiovisual speech processing":
Chen, Tsuhan. "Audiovisual speech processing". IEEE Signal Processing Magazine 18, No. 1 (2001): 9–21. http://dx.doi.org/10.1109/79.911195.
Vatikiotis-Bateson, Eric, and Takaaki Kuratate. "Overview of audiovisual speech processing". Acoustical Science and Technology 33, No. 3 (2012): 135–41. http://dx.doi.org/10.1250/ast.33.135.
Francisco, Ana A., Alexandra Jesse, Margriet A. Groen, and James M. McQueen. "A General Audiovisual Temporal Processing Deficit in Adult Readers With Dyslexia". Journal of Speech, Language, and Hearing Research 60, No. 1 (January 2017): 144–58. http://dx.doi.org/10.1044/2016_jslhr-h-15-0375.
Bernstein, Lynne E., Edward T. Auer, Michael Wagner, and Curtis W. Ponton. "Spatiotemporal dynamics of audiovisual speech processing". NeuroImage 39, No. 1 (January 2008): 423–35. http://dx.doi.org/10.1016/j.neuroimage.2007.08.035.
Sams, M. "Audiovisual Speech Perception". Perception 26, No. 1_suppl (August 1997): 347. http://dx.doi.org/10.1068/v970029.
Ojanen, Ville, Riikka Möttönen, Johanna Pekkola, Iiro P. Jääskeläinen, Raimo Joensuu, Taina Autti, and Mikko Sams. "Processing of audiovisual speech in Broca's area". NeuroImage 25, No. 2 (April 2005): 333–38. http://dx.doi.org/10.1016/j.neuroimage.2004.12.001.
Stevenson, Ryan A., Nicholas A. Altieri, Sunah Kim, David B. Pisoni, and Thomas W. James. "Neural processing of asynchronous audiovisual speech perception". NeuroImage 49, No. 4 (February 2010): 3308–18. http://dx.doi.org/10.1016/j.neuroimage.2009.12.001.
Hamilton, Roy H., Jeffrey T. Shenton, and H. Branch Coslett. "An acquired deficit of audiovisual speech processing". Brain and Language 98, No. 1 (July 2006): 66–73. http://dx.doi.org/10.1016/j.bandl.2006.02.001.
Dunham-Carr, Kacie, Jacob I. Feldman, David M. Simon, Sarah R. Edmunds, Alexander Tu, Wayne Kuang, Julie G. Conrad, Pooja Santapuram, Mark T. Wallace, and Tiffany G. Woynaroski. "The Processing of Audiovisual Speech Is Linked with Vocabulary in Autistic and Nonautistic Children: An ERP Study". Brain Sciences 13, No. 7 (8 July 2023): 1043. http://dx.doi.org/10.3390/brainsci13071043.
Tomalski, Przemysław. "Developmental Trajectory of Audiovisual Speech Integration in Early Infancy. A Review of Studies Using the McGurk Paradigm". Psychology of Language and Communication 19, No. 2 (1 October 2015): 77–100. http://dx.doi.org/10.1515/plc-2015-0006.
Dissertations on the topic "Audiovisual speech processing":
Morís Fernández, Luis. "Audiovisual speech processing: the role of attention and conflict". Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/385348.
The events that happen around us rarely stimulate a single sensory modality; rather, they usually involve several sensory modalities that provide complementary information. The information from these different modalities is integrated into a single percept through a process known as multisensory integration. This thesis studies how and under what circumstances this process occurs in the context of audiovisual speech. Its results challenge previous accounts that described audiovisual integration as an automatic, low-level process. First, it shows that attentional state is decisive for multisensory integration; specifically, it presents evidence that both the visual and the auditory modality must be attended for integration to occur. Second, it presents evidence for the involvement of high-level processes (i.e., conflict detection and resolution) when the auditory and visual modalities are incongruent, particularly in the case of the McGurk effect.
Copeland, Laura. "Audiovisual processing of affective and linguistic prosody: an event-related fMRI study". Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=111605.
Krause, Hanna. "Audiovisual processing in Schizophrenia: neural responses in audiovisual speech interference and semantic priming". Doctoral thesis, supervised by Andreas K. Engel. Hamburg: Staats- und Universitätsbibliothek Hamburg, 2015. http://d-nb.info/1075858569/34.
Sadok, Samir. "Audiovisual speech representation learning applied to emotion recognition". Electronic thesis or dissertation, CentraleSupélec, 2024. http://www.theses.fr/2024CSUP0003.
Emotions are vital in our daily lives and have become a primary focus of ongoing research. Automatic emotion recognition has gained considerable attention owing to its wide-ranging applications across sectors such as healthcare, education, entertainment, and marketing. This advancement in emotion recognition is pivotal for fostering the development of human-centric artificial intelligence. Supervised emotion recognition systems have improved significantly over traditional machine learning approaches. However, this progress is limited by the complexity and ambiguous nature of emotions: acquiring extensive emotionally labeled datasets is costly, time-intensive, and often impractical. Moreover, the subjective nature of emotions results in biased datasets, which limits the applicability of the learned models in real-world scenarios. Humans, by contrast, learn and conceptualize complex representations from an early age with minimal supervision, demonstrating the effectiveness of leveraging prior experience to adapt to new situations. Unsupervised and self-supervised learning models draw inspiration from this paradigm. They first learn a general representation from unlabeled data, akin to the foundational prior experience in human learning; these representations should satisfy criteria such as invariance, interpretability, and effectiveness. The learned representations are then applied to downstream tasks with limited labeled data, such as emotion recognition, mirroring how humans assimilate new situations. In this thesis, we propose unsupervised and self-supervised representation learning methods designed explicitly for multimodal and sequential data and explore their potential advantages for emotion recognition tasks. The main contributions of this thesis are:
1. Developing generative models via unsupervised or self-supervised learning for audiovisual speech representation learning, incorporating joint temporal and multimodal (audiovisual) modeling.
2. Structuring the latent space to enable disentangled representations, enhancing interpretability by controlling human-interpretable latent factors.
3. Validating the effectiveness of our approaches through both qualitative and quantitative analyses, in particular on the emotion recognition task. Our methods facilitate signal analysis, transformation, and generation.
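As a rough illustration of the kind of model the abstract describes (learning a joint audiovisual latent space with disentangled factors that can be reused downstream), the sketch below shows a toy PyTorch autoencoder that splits its latent code into a shared audiovisual part and modality-specific parts. This is not the thesis implementation; the class name, feature dimensions, layer sizes, and the alignment loss are all illustrative assumptions.

```python
# Minimal sketch (hypothetical): a joint audiovisual autoencoder whose latent
# space is split into a shared "content" code and per-modality private codes.
import torch
import torch.nn as nn

class AudioVisualAutoencoder(nn.Module):
    def __init__(self, audio_dim=80, video_dim=136, shared_dim=32, private_dim=16):
        super().__init__()
        # One encoder per modality (e.g., log-mel frames and facial landmarks).
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU(),
                                       nn.Linear(128, shared_dim + private_dim))
        self.video_enc = nn.Sequential(nn.Linear(video_dim, 128), nn.ReLU(),
                                       nn.Linear(128, shared_dim + private_dim))
        # Decoders reconstruct each modality from the shared code plus its private code.
        self.audio_dec = nn.Sequential(nn.Linear(shared_dim + private_dim, 128), nn.ReLU(),
                                       nn.Linear(128, audio_dim))
        self.video_dec = nn.Sequential(nn.Linear(shared_dim + private_dim, 128), nn.ReLU(),
                                       nn.Linear(128, video_dim))
        self.shared_dim = shared_dim

    def forward(self, audio, video):
        za = self.audio_enc(audio)
        zv = self.video_enc(video)
        shared_a, priv_a = za[:, :self.shared_dim], za[:, self.shared_dim:]
        shared_v, priv_v = zv[:, :self.shared_dim], zv[:, self.shared_dim:]
        # Average the two shared codes so both modalities must agree on the joint content.
        shared = 0.5 * (shared_a + shared_v)
        rec_a = self.audio_dec(torch.cat([shared, priv_a], dim=-1))
        rec_v = self.video_dec(torch.cat([shared, priv_v], dim=-1))
        # Alignment term pushes the per-modality shared codes toward each other.
        align = ((shared_a - shared_v) ** 2).mean()
        return rec_a, rec_v, shared, align

# Toy usage: random tensors standing in for real audiovisual speech features.
model = AudioVisualAutoencoder()
audio, video = torch.randn(4, 80), torch.randn(4, 136)
rec_a, rec_v, shared, align = model(audio, video)
loss = nn.functional.mse_loss(rec_a, audio) + nn.functional.mse_loss(rec_v, video) + align
loss.backward()
```

In such a setup, the shared code would be the compact representation handed to a downstream emotion classifier trained on the limited labeled data.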
Biau, Emmanuel. "Beat gestures and speech processing: when prosody extends to the speaker's hands". Doctoral thesis, Universitat Pompeu Fabra, 2015. http://hdl.handle.net/10803/325429.
Gestures naturally accompany speakers' speech; in this way, auditory prosody also extends to the visual modality through beat gestures, which help the speaker structure the message and emphasize relevant information. The main aim of this thesis was to investigate the perception of beat gestures and the neural activity associated with them. It took a naturalistic approach, combining the presentation of political speeches with neuroimaging techniques (ERPs, EEG, and fMRI) and behavioral measures. Its main findings were, first, that the joint processing of speech and beat gestures engaged language-related areas, suggesting that gestures and speech form part of a single language system. Second, beat gestures modulate the processing of the words they accompany, both at the moment of articulation and at later stages. We conclude that listeners perceive beat gestures as part of visual prosody and use their predictive value to anticipate the acoustic signal of the word they precede, through local attentional processes.
Blomberg, Rina. "Cortical Phase Synchronisation Mediates Natural Face-Speech Perception". Thesis, Linköpings universitet, Institutionen för datavetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-122825.
Girin, Laurent. "Débruitage de parole par un filtrage utilisant l'image du locuteur". Grenoble INPG, 1997. http://www.theses.fr/1997INPG0207.
Teissier, Pascal. "Fusion de capteurs avec contrôle du contexte : application a la reconnaissance de parole dans le bruit". Grenoble INPG, 1999. http://www.theses.fr/1999INPG0023.
Decroix, François-Xavier. "Apprentissage en ligne de signatures audiovisuelles pour la reconnaissance et le suivi de personnes au sein d'un réseau de capteurs ambiants". Thesis, Toulouse 3, 2017. http://www.theses.fr/2017TOU30298/document.
The neOCampus operation, started in 2013 by Paul Sabatier University in Toulouse, aims to create a connected, innovative, intelligent, and sustainable campus by combining the skills of 11 laboratories and several industrial partners. These multidisciplinary skills are brought together to improve the daily comfort of users (students, teachers, administrative staff) and to reduce the ecological footprint of the campus. The intelligence we want to bring to the campus of the future requires giving its buildings a perception of their internal activity: optimizing energy resources requires characterizing users' activities so that a building can adapt to them automatically. Since human activity can be interpreted at multiple levels, our work focuses on extracting people's trajectories, its most elementary component. Characterizing users' activities in terms of movement relies on data extracted from cameras and microphones distributed in a room, forming a sparse network of heterogeneous sensors. From these data, we seek to extract audiovisual signatures and rough localizations of the people transiting through this sensor network. While protecting personal privacy, the signatures must be discriminative, to distinguish one person from another, and compact, to keep computational costs low and enable the building to adapt. Given these constraints, the characteristics we model are the speaker's vocal timbre and their appearance, described as a colorimetric distribution. The scientific contributions of this thesis thus lie at the intersection of speech processing and computer vision, introducing new methods for fusing the audio and visual signatures of individuals. To achieve this fusion, new sound-source localization cues and an audiovisual adaptation of a multi-target tracking method are introduced; these are the main contributions of this work. The thesis is structured in four chapters. The first presents the state of the art on visual re-identification of persons and speaker recognition. Because the acoustic and visual modalities are not correlated, two signatures are computed separately, one for video and one for audio, using existing methods from the literature; the details of this computation are given in chapter 2. The fusion of the signatures is then treated as a matching problem between audio and video observations whose corresponding detections are spatially coherent and compatible; two novel association strategies are introduced in chapter 3. Finally, the spatio-temporal coherence of the bimodal observations is discussed in chapter 4, in a multi-target tracking context.
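The abstract frames fusion as an association problem between audio and video observations whose detections must be spatially coherent. A minimal illustration of that idea (a hypothetical helper, not the code of the thesis) is the optimal-assignment matching below, where the `associate` function, its weights, and the gating threshold are assumptions made for the example.

```python
# Minimal sketch (hypothetical): match audio observations to video observations
# using a cost that mixes spatial distance between rough localizations and
# signature dissimilarity, solved as an optimal one-to-one assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(audio_obs, video_obs, w_space=0.5, w_sig=0.5, max_cost=1.5):
    """Each observation is a dict with a 2-D 'pos' estimate and a (roughly)
    unit-norm 'sig' vector. Returns (audio_index, video_index) pairs whose
    combined cost stays below the gating threshold."""
    cost = np.zeros((len(audio_obs), len(video_obs)))
    for i, a in enumerate(audio_obs):
        for j, v in enumerate(video_obs):
            spatial = np.linalg.norm(np.asarray(a["pos"]) - np.asarray(v["pos"]))
            signature = 1.0 - float(np.dot(a["sig"], v["sig"]))  # cosine dissimilarity
            cost[i, j] = w_space * spatial + w_sig * signature
    rows, cols = linear_sum_assignment(cost)  # Hungarian-style assignment
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]

# Toy usage with two speakers seen by a camera and heard by a microphone array.
audio_obs = [{"pos": [1.0, 2.0], "sig": np.array([1.0, 0.0])},
             {"pos": [4.0, 0.5], "sig": np.array([0.0, 1.0])}]
video_obs = [{"pos": [3.9, 0.6], "sig": np.array([0.1, 0.995])},
             {"pos": [1.1, 2.1], "sig": np.array([0.99, 0.14])}]
print(associate(audio_obs, video_obs))  # -> [(0, 1), (1, 0)]
```

The gating threshold plays the role of the "spatially coherent and compatible" requirement: pairs whose cost exceeds it are left unmatched rather than forced together.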
Robert-Ribes, Jordi. "Modèles d'intégration audiovisuelle de signaux linguistiques : de la perception humaine a la reconnaissance automatique des voyelles". Grenoble INPG, 1995. http://www.theses.fr/1995INPG0032.
Books on the topic "Audiovisual speech processing":
Bailly, Gérard, Pascal Perrier, and Eric Vatikiotis-Bateson, eds. Audiovisual Speech Processing. Cambridge: Cambridge University Press, 2012. http://dx.doi.org/10.1017/cbo9780511843891.
Bailly, G., Eric Vatikiotis-Bateson, and Pascal Perrier. Audiovisual speech processing. Cambridge: Cambridge University Press, 2012.
Randazzo, Melissa. Audiovisual Integration in Apraxia of Speech: EEG Evidence for Processing Differences. [New York, N.Y.?]: [publisher not identified], 2016.
Vatikiotis-Bateson, Eric, Pascal Perrier, and Gérard Bailly. Audiovisual Speech Processing. Cambridge University Press, 2012.
Vatikiotis-Bateson, Eric, Pascal Perrier, and Gérard Bailly. Audiovisual Speech Processing. Cambridge University Press, 2015.
Abel, Andrew, and Amir Hussain. Cognitively Inspired Audiovisual Speech Filtering: Towards an Intelligent, Fuzzy Based, Multimodal, Two-Stage Speech Enhancement System. Springer, 2015.
Abel, Andrew, and Amir Hussain. Cognitively Inspired Audiovisual Speech Filtering: Towards an Intelligent, Fuzzy Based, Multimodal, Two-Stage Speech Enhancement System. Springer International Publishing AG, 2015.
Book chapters on the topic "Audiovisual speech processing":
Riekhakaynen, Elena, and Elena Zatevalova. "Should We Believe Our Eyes or Our Ears? Processing Incongruent Audiovisual Stimuli by Russian Listeners". In Speech and Computer, 604–15. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20980-2_51.
Aleksic, Petar S., Gerasimos Potamianos, and Aggelos K. Katsaggelos. "Audiovisual Speech Processing". In The Essential Guide to Video Processing, 689–737. Elsevier, 2009. http://dx.doi.org/10.1016/b978-0-12-374456-2.00024-4.
Pantic, Maja. "Face for Interface". In Encyclopedia of Multimedia Technology and Networking, Second Edition, 560–67. IGI Global, 2009. http://dx.doi.org/10.4018/978-1-60566-014-1.ch075.
Conference papers on the topic "Audiovisual speech processing":
Vatikiotis-Bateson, E., K. G. Munhall, Y. Kasahara, F. Garcia, and H. Yehia. "Characterizing audiovisual information during speech". In 4th International Conference on Spoken Language Processing (ICSLP 1996). ISCA, 1996. http://dx.doi.org/10.21437/icslp.1996-379.
Petridis, Stavros, and Maja Pantic. "Audiovisual discrimination between laughter and speech". In ICASSP 2008 - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2008. http://dx.doi.org/10.1109/icassp.2008.4518810.
Petridis, Stavros, Themos Stafylakis, Pingchuan Ma, Feipeng Cai, Georgios Tzimiropoulos, and Maja Pantic. "End-to-End Audiovisual Speech Recognition". In ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. http://dx.doi.org/10.1109/icassp.2018.8461326.
Tran, Tam, Soroosh Mariooryad, and Carlos Busso. "Audiovisual corpus to analyze whisper speech". In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6639243.
Katsamanis, Athanassios, George Papandreou, and Petros Maragos. "Audiovisual-to-Articulatory Speech Inversion Using HMMs". In 2007 IEEE 9th Workshop on Multimedia Signal Processing. IEEE, 2007. http://dx.doi.org/10.1109/mmsp.2007.4412915.
Rosenblum, Lawrence D. "The perceptual basis for audiovisual speech integration". In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-424.
Silva, Samuel, and António Teixeira. "An Anthropomorphic Perspective for Audiovisual Speech Synthesis". In 10th International Conference on Bio-inspired Systems and Signal Processing. SCITEPRESS - Science and Technology Publications, 2017. http://dx.doi.org/10.5220/0006150201630172.
Holt, Rebecca, Laurence Bruggeman, and Katherine Demuth. "Audiovisual benefits for speech processing speed among children with hearing loss". In The 15th International Conference on Auditory-Visual Speech Processing. ISCA, 2019. http://dx.doi.org/10.21437/avsp.2019-10.
Matthews, I. A. "Scale based features for audiovisual speech recognition". In IEE Colloquium on Integrated Audio-Visual Processing for Recognition, Synthesis and Communication. IEE, 1996. http://dx.doi.org/10.1049/ic:19961152.
Ma, Pingchuan, Stavros Petridis, and Maja Pantic. "Detecting Adversarial Attacks on Audiovisual Speech Recognition". In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9413661.