Academic literature on the topic 'Audio-visual scene analysis'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Audio-visual scene analysis.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Journal articles on the topic "Audio-visual scene analysis"
Parekh, Sanjeel, Slim Essid, Alexey Ozerov, Ngoc Q. K. Duong, Patrick Perez, and Gael Richard. "Weakly Supervised Representation Learning for Audio-Visual Scene Analysis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 416–28. http://dx.doi.org/10.1109/taslp.2019.2957889.
O’Donovan, Adam, Ramani Duraiswami, Dmitry Zotkin, and Nail Gumerov. "Audio visual scene analysis using spherical arrays and cameras." Journal of the Acoustical Society of America 127, no. 3 (March 2010): 1979. http://dx.doi.org/10.1121/1.3385079.
Ahrens, Axel, and Kasper Duemose Lund. "Auditory spatial analysis in reverberant multi-talker environments with congruent and incongruent audio-visual room information." Journal of the Acoustical Society of America 152, no. 3 (September 2022): 1586–94. http://dx.doi.org/10.1121/10.0013991.
Motlicek, Petr, Stefan Duffner, Danil Korchagin, Hervé Bourlard, Carl Scheffler, Jean-Marc Odobez, Giovanni Del Galdo, Markus Kallinger, and Oliver Thiergart. "Real-Time Audio-Visual Analysis for Multiperson Videoconferencing." Advances in Multimedia 2013 (2013): 1–21. http://dx.doi.org/10.1155/2013/175745.
Gebru, Israel Dejene, Xavier Alameda-Pineda, Florence Forbes, and Radu Horaud. "EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis." IEEE Transactions on Pattern Analysis and Machine Intelligence 38, no. 12 (December 1, 2016): 2402–15. http://dx.doi.org/10.1109/tpami.2016.2522425.
Mulachela, Husen, Aurelius RL Teluma, and Eka Putri Paramita. "Gender Equality Messages in Film Marlina The Murderer In Four Acts." JCommsci - Journal of Media and Communication Science 2, no. 3 (September 13, 2019): 136. http://dx.doi.org/10.29303/jcommsci.v2i3.57.
Xiao, Mei, May Wong, Michelle Umali, and Marc Pomplun. "Using Eye-Tracking to Study Audio-Visual Perceptual Integration." Perception 36, no. 9 (September 2007): 1391–95. http://dx.doi.org/10.1068/p5731.
Nahorna, Olha, Frédéric Berthommier, and Jean-Luc Schwartz. "Audio-visual speech scene analysis: Characterization of the dynamics of unbinding and rebinding the McGurk effect." Journal of the Acoustical Society of America 137, no. 1 (January 2015): 362–77. http://dx.doi.org/10.1121/1.4904536.
Habib, Muhammad Alhada Fuadilah, Asik Putri Ayusari Ratnaningsih, and Michael Jeffri Sinabutar. "Semiotics Analysis of Ahok-Djarot’s Campaign Video on YouTube Social Media for the Second Round of the 2017 DKI Jakarta Gubernatorial Election." Journal of Urban Sociology 4, no. 2 (December 22, 2021): 76. http://dx.doi.org/10.30742/jus.v4i2.1772.
Ramenahalli, Sudarshan. "A Biologically Motivated, Proto-Object-Based Audiovisual Saliency Model." AI 1, no. 4 (November 3, 2020): 487–509. http://dx.doi.org/10.3390/ai1040030.
Full textDissertations / Theses on the topic "Audio-visual scene analysis"
Parekh, Sanjeel. "Learning representations for robust audio-visual scene analysis." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT015/document.
The goal of this thesis is to design algorithms that enable robust detection of objects and events in videos through joint audio-visual analysis. This is motivated by humans’ remarkable ability to meaningfully integrate auditory and visual characteristics for perception in noisy scenarios. To this end, we identify two kinds of natural associations between the modalities in recordings made using a single microphone and camera, namely motion-audio correlation and appearance-audio co-occurrence. For the former, we use audio source separation as the primary application and propose two novel methods within the popular non-negative matrix factorization framework. The central idea is to utilize the temporal correlation between audio and motion for objects/actions where the sound-producing motion is visible. The first proposed method focuses on soft coupling between audio and motion representations capturing temporal variations, while the second is based on cross-modal regression. We segregate several challenging audio mixtures of string instruments into their constituent sources using these approaches. To identify and extract many commonly encountered objects, we leverage appearance-audio co-occurrence in large datasets. This complementary association mechanism is particularly useful for objects where motion-based correlations are not visible or available. The problem is dealt with in a weakly-supervised setting wherein we design a representation learning framework for robust AV event classification, visual object localization, audio event detection and source separation. We extensively test the proposed ideas on publicly available datasets. The experiments demonstrate several intuitive multimodal phenomena that humans utilize on a regular basis for robust scene understanding.
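As a purely illustrative aside on the soft-coupling idea this abstract describes, the sketch below regularizes the temporal activations of a non-negative matrix factorization towards per-source motion activity. It is a minimal sketch under assumed shapes and toy data; the function name, the coupling weight `lam`, and the update rules are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def motion_coupled_nmf(V, M, rank, n_iter=200, lam=0.1, eps=1e-9):
    """Hypothetical sketch: factorize a magnitude spectrogram V (freq x time)
    as W @ H while softly pulling the activations H (rank x time) towards
    non-negative per-source motion activity M (rank x time).
    Assumed objective: ||V - W H||_F^2 + lam * ||H - M||_F^2."""
    n_freq, n_time = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_time)) + eps
    for _ in range(n_iter):
        # Standard multiplicative update for the spectral templates W.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Activation update with the extra motion-coupling term.
        H *= (W.T @ V + lam * M) / (W.T @ W @ H + lam * H + eps)
    return W, H

# Toy usage: recover a soft-mask estimate of one of two motion-guided sources.
rng = np.random.default_rng(1)
V = rng.random((257, 100))   # stand-in magnitude spectrogram
M = rng.random((2, 100))     # stand-in motion activity, one row per source
W, H = motion_coupled_nmf(V, M, rank=2)
source_1 = V * np.outer(W[:, 0], H[0]) / (W @ H + 1e-9)
```

The `lam * M` and `lam * H` terms in the activation update are the soft coupling: a larger `lam` ties each activation row more tightly to its motion track.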
Phillips, Nicola Jane. "Audio-visual scene analysis : attending to music in film." Thesis, University of Cambridge, 2000. https://www.repository.cam.ac.uk/handle/1810/251745.
Alameda-Pineda, Xavier. "Egocentric Audio-Visual Scene Analysis : a machine learning and signal processing approach." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENM024/document.
Over the past two decades, industry has developed several commercial products with audio-visual sensing capabilities. Most of them consist of a video camera with an embedded microphone (mobile phones, tablets, etc.). Others, such as Kinect, include depth sensors and/or small microphone arrays, and some mobile phones are equipped with a stereo camera pair. At the same time, many research-oriented systems became available (e.g., humanoid robots such as NAO). Since all these systems are small in volume, their sensors are close to each other. Therefore, they are not able to capture the global scene, but only one point of view of the ongoing social interplay. We refer to this as "Egocentric Audio-Visual Scene Analysis." This thesis contributes to this field in several respects. Firstly, it provides a publicly available data set targeting applications such as action/gesture recognition, speaker localization, tracking and diarisation, sound source localization, dialogue modelling, etc. This work has since been used both inside and outside the thesis. We also investigated the problem of AV event detection. We showed how the trust placed in one of the modalities (the visual one, to be precise) can be modelled and used to bias the method, leading to a visually-supervised EM algorithm (ViSEM). Afterwards we modified the approach to target audio-visual speaker detection, yielding an online method running on the humanoid robot NAO. In parallel to the work on audio-visual speaker detection, we developed a new approach for audio-visual command recognition. We explored different features and classifiers and confirmed that the use of audio-visual data increases performance compared to audio-only and video-only classifiers. Later, we sought the best method that works with tiny training sets (5-10 samples per class). This matters because real systems need to adapt to and learn new commands from the user, and they must be operational with only a few examples for general public use. Finally, we contributed to the field of sound source localization, in the particular case of non-coplanar microphone arrays. This is interesting because the array geometry can be arbitrary, which opens the door to dynamic microphone arrays that adapt their geometry to particular tasks, and because the design of commercial systems may be subject to constraints for which circular or linear arrays are not suited.
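As context for the last contribution mentioned above (localization with arbitrary array geometry), here is a hedged, textbook-style least-squares sketch of far-field direction-of-arrival estimation for a non-coplanar array. It is not the method developed in the thesis; the function name, sign convention, and toy layout are assumptions.

```python
import numpy as np

def far_field_doa(mic_pos, tdoas, c=343.0):
    """Hypothetical sketch: estimate a far-field direction of arrival for an
    arbitrary (possibly non-coplanar) microphone array. tdoas[i] is the delay
    of microphone i+1 relative to microphone 0, in seconds. Under a plane-wave
    model with unit propagation direction u, tdoa_i ~ (m_i - m_0) . u / c,
    so u can be recovered by least squares (sign conventions vary)."""
    A = mic_pos[1:] - mic_pos[0]      # (N-1, 3) baseline vectors
    b = c * np.asarray(tdoas)         # (N-1,) path-length differences
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u / np.linalg.norm(u)

# Toy usage with a non-coplanar four-microphone (tetrahedron-like) layout.
mics = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]])
true_u = np.array([1.0, 2.0, 0.5]) / np.linalg.norm([1.0, 2.0, 0.5])
tdoas = (mics[1:] - mics[0]) @ true_u / 343.0
print(far_field_doa(mics, tdoas))     # approximately true_u
```

With three or more linearly independent baselines, which a non-coplanar layout guarantees, the least-squares system is well posed in all three dimensions, whereas a circular or linear array cannot resolve the component of the direction orthogonal to its plane or axis.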
Khalidov, Vasil. "Modèles de mélanges conjugués pour la modélisation de la perception visuelle et auditive" [Conjugate mixture models for modelling visual and auditory perception]. Grenoble, 2010. http://www.theses.fr/2010GRENM064.
In this thesis, the modelling of audio-visual perception with a head-like device is considered. The related problems, namely audio-visual calibration, audio-visual object detection, localization and tracking, are addressed. A spatio-temporal approach to the head-like device calibration is proposed, based on probabilistic multimodal trajectory matching. The formalism of conjugate mixture models is introduced, along with a family of efficient optimization algorithms to perform multimodal clustering. One instance of this algorithm family, namely the conjugate expectation maximization (ConjEM) algorithm, is further improved to gain attractive theoretical properties. Multimodal object detection and object number estimation methods are developed, and their theoretical properties are discussed. Finally, the proposed multimodal clustering method is combined with the object detection and object number estimation strategies and with known tracking techniques to perform multimodal multi-object tracking. The performance is demonstrated on simulated data and on a database of realistic audio-visual scenarios (the CAVA database).
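To give an intuition for the multimodal clustering formalism mentioned in this abstract, the sketch below runs a simplified joint EM over audio and visual observations that are assumed to be noisy views of the same latent cluster centres (identity mappings, isotropic noise). This is a deliberate simplification for illustration, not the ConjEM algorithm; all names and the toy data are hypothetical.

```python
import numpy as np

def joint_av_em(X_a, X_v, n_clusters, n_iter=50, seed=0):
    """Hypothetical sketch: joint EM clustering of audio features X_a (Na x d)
    and visual features X_v (Nv x d), both modelled as noisy observations of
    shared latent cluster centres (identity mappings, isotropic noise)."""
    d = X_a.shape[1]
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal((n_clusters, d))
    var_a = var_v = 1.0
    log_pi = np.full(n_clusters, -np.log(n_clusters))

    def resp(X, var):
        # Per-point responsibilities under isotropic Gaussians N(mu_k, var * I).
        logp = (-0.5 * ((X[:, None, :] - mu[None]) ** 2).sum(-1) / var
                - 0.5 * d * np.log(var) + log_pi)
        logp -= logp.max(axis=1, keepdims=True)
        p = np.exp(logp)
        return p / p.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        R_a, R_v = resp(X_a, var_a), resp(X_v, var_v)                 # E-step
        # M-step: each centre fuses evidence from both modalities,
        # weighted by responsibilities and inverse noise variances.
        num = (R_a / var_a).T @ X_a + (R_v / var_v).T @ X_v
        den = (R_a.sum(0) / var_a + R_v.sum(0) / var_v)[:, None]
        mu = num / den
        var_a = (R_a * ((X_a[:, None, :] - mu[None]) ** 2).sum(-1)).sum() / (d * R_a.sum())
        var_v = (R_v * ((X_v[:, None, :] - mu[None]) ** 2).sum(-1)).sum() / (d * R_v.sum())
        log_pi = np.log((R_a.sum(0) + R_v.sum(0)) / (len(X_a) + len(X_v)) + 1e-12)
    return mu, R_a, R_v

# Toy usage: two audio-visual "objects" observed in a shared 2-D feature space.
rng = np.random.default_rng(1)
X_a = np.vstack([rng.normal([0, 0], 0.3, (30, 2)), rng.normal([3, 3], 0.3, (30, 2))])
X_v = np.vstack([rng.normal([0, 0], 0.2, (40, 2)), rng.normal([3, 3], 0.2, (40, 2))])
centres, _, _ = joint_av_em(X_a, X_v, n_clusters=2)
```

The key step is the fused centre update: both modalities contribute to each centre in proportion to their responsibilities and inverse noise variances, which is the basic intuition behind clustering observations that live in different sensor spaces.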
Stauffer, Chris. "Automated Audio-visual Activity Analysis." 2005. http://hdl.handle.net/1721.1/30568.
Book chapters on the topic "Audio-visual scene analysis"
Saraceno, Caterina, and Riccardo Leonardi. "Audio-visual processing for scene change detection." In Image Analysis and Processing, 124–31. Berlin, Heidelberg: Springer Berlin Heidelberg, 1997. http://dx.doi.org/10.1007/3-540-63508-4_114.
Tsekeridou, Sofia, Stelios Krinidis, and Ioannis Pitas. "Scene Change Detection Based on Audio-Visual Analysis and Interaction." In Multi-Image Analysis, 214–25. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-45134-x_16.
Owens, Andrew, and Alexei A. Efros. "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features." In Computer Vision – ECCV 2018, 639–58. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01231-1_39.
Ganesh, Attigodu Chandrashekara, Frédéric Berthommier, and Jean-Luc Schwartz. "Audio Visual Integration with Competing Sources in the Framework of Audio Visual Speech Scene Analysis." In Advances in Experimental Medicine and Biology, 399–408. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-25474-6_42.
Gupta, Vaibhavi, Vinay Detani, Vivek Khokar, and Chiranjoy Chattopadhyay. "C2VNet: A Deep Learning Framework Towards Comic Strip to Audio-Visual Scene Synthesis." In Document Analysis and Recognition – ICDAR 2021, 160–75. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86331-9_11.
Pham, Lam, Alexander Schindler, Mina Schutz, Jasmin Lampert, Sven Schlarb, and Ross King. "Deep Learning Frameworks Applied For Audio-Visual Scene Classification." In Data Science – Analytics and Applications, 39–44. Wiesbaden: Springer Fachmedien Wiesbaden, 2022. http://dx.doi.org/10.1007/978-3-658-36295-9_6.
Full textConference papers on the topic "Audio-visual scene analysis"
Wang, Shanshan, Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. "A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis." In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9415085.
Schwartz, Jean-Luc, Frédéric Berthommier, and Christophe Savariaux. "Audio-visual scene analysis: evidence for a "very-early" integration process in audio-visual speech perception." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA: ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-437.
"ColEnViSon: Color Enhanced Visual Sonifier - A Polyphonic Audio Texture and Salient Scene Analysis." In International Conference on Computer Vision Theory and Applications. SciTePress - Science and Technology Publications, 2009. http://dx.doi.org/10.5220/0001805105660572.
Schott, Gareth, and Raphael Marczak. "Understanding game actions: The development of a post-processing method for audio-visual scene analysis." In 2016 Future Technologies Conference (FTC). IEEE, 2016. http://dx.doi.org/10.1109/ftc.2016.7821657.
Fayek, Haytham M., and Anurag Kumar. "Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/78.
Yang, Ling, and Sheng-Dong Yue. "An Analysis of the Characteristics of Music Creation in Mefistofele." In 2021 International Conference on Education, Humanity and Language, Art. Destech Publications, Inc., 2021. http://dx.doi.org/10.12783/dtssehs/ehla2021/35726.