Dissertations / Theses on the topic 'Acoustic Scene Analysis'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 20 dissertations / theses for your research on the topic 'Acoustic Scene Analysis.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.
Kudo, Hiroaki, Jinji Chen, and Noboru Ohnishi. "Scene Analysis by Clues from the Acoustic Signals." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2004. http://hdl.handle.net/2237/10426.
Ford, Logan H. "Large-scale acoustic scene analysis with deep residual networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123026.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 63-66).
Many of the recent advances in audio event detection, particularly on the AudioSet dataset, have focused on improving performance using the released embeddings produced by a pre-trained model. In this work, we instead study the task of training a multi-label event classifier directly from the audio recordings of AudioSet. Using the audio recordings, we are not only able to reproduce results from prior work, but also to confirm the improvements brought by other proposed additions, such as an attention module. Moreover, by training the embedding network jointly with these additions, we achieve a mean Average Precision (mAP) of 0.392 and an area under the ROC curve (AUC) of 0.971, surpassing the state of the art without transfer learning from a large dataset. We also analyze the output activations of the network and find that the models are able to localize audio events when a finer time resolution is needed. In addition, we use this model to explore multimodal learning, transfer learning, and real-time sound event detection tasks.
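As a rough illustration of clip-level tagging with an attention module, as described in this abstract, here is a minimal PyTorch sketch. The GRU embedding network, layer sizes, and random data are illustrative assumptions, not the thesis architecture; only the 527-label AudioSet output size is taken from the real task.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Weights per-frame predictions before summing over time, so short
    events are not washed out by plain clip-level mean pooling."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.att = nn.Linear(dim, n_classes)   # attention logits per frame
        self.cls = nn.Linear(dim, n_classes)   # classification logits per frame

    def forward(self, x):                      # x: (batch, time, dim)
        w = torch.softmax(self.att(x), dim=1)  # attention weights over time
        p = torch.sigmoid(self.cls(x))         # per-frame class probabilities
        return (w * p).sum(dim=1)              # (batch, n_classes) clip scores

embed = nn.GRU(input_size=64, hidden_size=128, batch_first=True)  # stand-in embedding net
head = AttentionPooling(dim=128, n_classes=527)                   # AudioSet has 527 labels

logmel = torch.randn(8, 1000, 64)              # a batch of log-mel spectrogram clips
frames, _ = embed(logmel)                      # per-frame embeddings
clip_probs = head(frames)                      # multi-label clip probabilities
labels = torch.rand(8, 527).round()            # placeholder multi-hot targets
loss = nn.functional.binary_cross_entropy(clip_probs, labels)
```

Training the embedding network jointly with the additions, as the abstract describes, simply means backpropagating this loss through both modules instead of freezing pre-computed embeddings.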
Teutsch, Heinz. "Wavefield decomposition using microphone arrays and its application to acoustic scene analysis." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=97902806X.
McMullan, Amanda R. "Electroencephalographic measures of auditory perception in dynamic acoustic environments." Thesis, Lethbridge, Alta. : University of Lethbridge, Dept. of Neuroscience, c2013, 2013. http://hdl.handle.net/10133/3354.
Narayanan, Arun. "Computational auditory scene analysis and robust automatic speech recognition." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1401460288.
Full textCarlo, Diego Di. "Echo-aware signal processing for audio scene analysis." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S075.
Most audio signal processing methods regard reverberation, and in particular acoustic echoes, as a nuisance. However, echoes convey important spatial and semantic information about sound sources, and recent echo-aware methods have been proposed to exploit it. This work focuses on two directions. First, we study how to estimate acoustic echoes blindly from microphone recordings. Two approaches are proposed: one leveraging continuous dictionaries, the other using recent deep learning techniques. Second, we extend existing methods in audio scene analysis to their echo-aware forms. The multichannel NMF framework for audio source separation, the SRP-PHAT localization method, and the MVDR beamformer for speech enhancement are all extended to echo-aware versions.
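To make the echo model concrete, here is a toy numpy sketch of the sparse early part of a room impulse response that echo-aware methods estimate; the delays and gains are invented for illustration, and none of this is the thesis code.

```python
import numpy as np

fs = 16000                                   # sample rate (Hz)
delays = np.array([0.0, 0.004, 0.007])       # direct path + two echoes (seconds)
gains = np.array([1.0, 0.6, 0.4])            # attenuation along each path

def echo_filter(delays, gains, fs, length=256):
    """Sparse FIR approximation of the early room impulse response."""
    h = np.zeros(length)
    for d, g in zip(delays, gains):
        h[int(round(d * fs))] += g
    return h

h = echo_filter(delays, gains, fs)
rng = np.random.default_rng(0)
source = rng.standard_normal(fs)             # 1 s of a toy source signal
mic = np.convolve(source, h)[:fs]            # what the microphone records
```

Blind echo estimation then amounts to recovering the (delay, gain) pairs from signals like `mic` alone, which is the step the two proposed approaches address.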
Deleforge, Antoine. "Acoustic Space Mapping : A Machine Learning Approach to Sound Source Separation and Localization." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENM033/document.
In this thesis, we address the long-studied problem of binaural (two-microphone) sound source separation and localization through supervised learning. To achieve this, we develop a new paradigm referred to as acoustic space mapping, at the crossroads of binaural perception, robot hearing, audio signal processing, and machine learning. The proposed approach consists in learning a link between the auditory cues perceived by the system and the emitting sound source's position in another modality of the system, such as the visual space or the motor space. We propose new experimental protocols to automatically gather large training sets that associate such data. The obtained datasets are then used to reveal fundamental intrinsic properties of acoustic spaces, and lead to the development of a general family of probabilistic models for locally-linear high- to low-dimensional space mapping. We show that these models unify several existing regression and dimensionality reduction techniques, while encompassing a large number of new models that generalize previous ones. The properties and inference of these models are thoroughly detailed, and the advantage of the proposed methods over state-of-the-art techniques is established on different space mapping applications, beyond the scope of auditory scene analysis. We then show how the proposed methods can be probabilistically extended to tackle the long-known cocktail party problem, i.e., accurately localizing one or several sound sources emitting at the same time in a real-world environment, and separating the mixed signals. We show that the resulting techniques perform these tasks with unequaled accuracy. This demonstrates the important role of learning and puts forward the acoustic space mapping paradigm as a promising tool for robustly addressing the most challenging problems in computational binaural audition.
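The locally-linear high- to low-dimensional mapping idea can be sketched with off-the-shelf pieces: cluster the high-dimensional cues, fit one affine map per cluster, and blend the local predictions by cluster responsibility. This is a simplified stand-in for the probabilistic models developed in the thesis; the data, dimensions, and number of components below are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
cues = rng.standard_normal((2000, 50))              # high-dimensional auditory cues (toy)
positions = cues[:, :2] @ rng.standard_normal((2, 2)) \
            + 0.1 * rng.standard_normal((2000, 2))  # low-dimensional source positions

K = 8                                               # number of local linear pieces
gmm = GaussianMixture(n_components=K, random_state=0).fit(cues)
labels = gmm.predict(cues)
local_maps = [LinearRegression().fit(cues[labels == k], positions[labels == k])
              for k in range(K)]

def predict_position(x):
    """Blend the K local affine predictions by cluster responsibility."""
    resp = gmm.predict_proba(x)                               # (n, K)
    preds = np.stack([m.predict(x) for m in local_maps], 1)   # (n, K, 2)
    return (resp[:, :, None] * preds).sum(axis=1)             # (n, 2)

print(predict_position(cues[:3]))
```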
Mouterde, Solveig. "Long-range discrimination of individual vocal signatures by a songbird : from propagation constraints to neural substrate." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4012/document.
In communication systems, one of the biggest challenges is that the information encoded by the emitter is always modified before reaching the receiver, who has to process this altered information in order to recover the intended message. In acoustic communication particularly, the transmission of sound through the environment is a major source of signal degradation, caused by attenuation, absorption, and reflections, all of which reduce the signal relative to the background noise. How animals deal with the need to exchange information in spite of constraining conditions has been the subject of many studies, either at the emitter's or at the receiver's level. However, a more integrated approach to auditory scene analysis has seldom been taken, and one is needed to address the complexity of this process. The goal of my research was to use a transversal approach to study how birds adapt to the constraints of long-distance communication by investigating the information coding at the emitter's level, the propagation-induced degradation of the acoustic signal, and the discrimination of this degraded information by the receiver at both the behavioral and neural levels. Taking into account the everyday issues faced by animals in their natural environment, and using stimuli and paradigms that reflect the behavioral relevance of these challenges, has been the cornerstone of my approach. Focusing on the information about individual identity in the distance calls of zebra finches Taeniopygia guttata, I investigated how the individual vocal signature is encoded, degraded, and finally discriminated, from the emitter to the receiver. This study shows that the individual signature of zebra finches is very resistant to propagation-induced degradation, and that the most individualized acoustic parameters vary depending on distance. Testing female birds in operant conditioning experiments, I showed that they are experts at discriminating between the degraded vocal signatures of two males, and that they can improve their ability substantially when they can train over increasing distances. Finally, I showed that this impressive discrimination ability also occurs at the neural level: we found a population of neurons in the avian auditory forebrain that discriminate individual voices at various degrees of propagation-induced degradation without prior familiarization or training. The finding of such high-level auditory processing in the primary auditory cortex opens a new range of investigations at the interface of neural processing and behavior.
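For a sense of the propagation constraint at issue, a textbook toy model (spherical spreading plus a constant excess-attenuation term, not taken from the thesis) shows how quickly the signal-to-noise ratio falls with distance:

```python
import math

def snr_at_distance(d_m, source_db=70.0, ref_m=1.0, alpha_db_per_m=0.1, noise_db=30.0):
    """SNR in dB after 20*log10 spreading loss and linear excess attenuation."""
    received = source_db - 20.0 * math.log10(d_m / ref_m) \
               - alpha_db_per_m * (d_m - ref_m)
    return received - noise_db

for d in (1, 16, 64, 128):                    # illustrative distances in meters
    print(f"{d:4d} m: SNR = {snr_at_distance(d):6.1f} dB")
```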
Teki, S. "Cognitive analysis of complex acoustic scenes." Thesis, University College London (University of London), 2013. http://discovery.ucl.ac.uk/1413017/.
Wang, Yuxuan. "Supervised Speech Separation Using Deep Neural Networks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1426366690.
Chen, Jitong. "On Generalization of Supervised Speech Separation." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492038295603502.
Full textHuet, Moïra-Phoebé. "Voice mixology at a cocktail party : Combining behavioural and neural tracking for speech segregation." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI070.
It is not always easy to follow a conversation in a noisy environment. In order to discriminate two speakers, we have to mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background. In this dissertation, the processes underlying speech segregation are explored through behavioural and neurophysiological experiments. In a preliminary phase, the development of an intelligibility task -- the Long-SWoRD test -- is introduced. This protocol allows participants to benefit from cognitive resources, such as linguistic knowledge, to separate two talkers in a realistic listening environment. The similarity between the two speakers, and thus by extension the difficulty of the task, was controlled by manipulating the acoustic parameters of the target and masker voices. In a second phase, the performance of the participants on this task is evaluated through three behavioural and neurophysiological (EEG) studies. Behavioural results are consistent with the literature and show that the distance between voices, spatialisation cues, and semantic information influence participants' performance. Neurophysiological results, analysed with temporal response functions (TRF), indicate that the neural representations of the two speakers differ according to the difficulty of the listening conditions. In addition, these representations are constructed more quickly when the voices are easily distinguishable. It is often presumed in the literature that participants' attention remains constantly on the same voice. The experimental protocol presented in this work provides the opportunity to retrospectively infer when participants were listening to each voice. Therefore, in a third phase, a combined analysis of this attentional information and the EEG signals is presented. Results show that information about attentional focus can be used to improve the neural representation of the attended voice in situations where the voices are similar.
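The temporal response function (TRF) analysis mentioned above is, at its core, a regularized regression from time-lagged stimulus features to the EEG. Below is a minimal single-channel sketch on simulated data; the sampling rate, lag window, and regularization value are illustrative assumptions.

```python
import numpy as np
from numpy.linalg import solve

fs = 64                                        # EEG sampling rate (Hz), toy value
rng = np.random.default_rng(2)
envelope = rng.standard_normal(60 * fs)        # attended-speech envelope, 60 s
eeg = np.convolve(envelope, [0.0, 0.5, 0.3, 0.1], mode="same") \
      + 0.5 * rng.standard_normal(60 * fs)     # one EEG channel: response + noise

lags = np.arange(0, 16)                        # lags of 0 to ~230 ms at 64 Hz
X = np.stack([np.roll(envelope, int(l)) for l in lags], axis=1)
X[:lags.max()] = 0.0                           # drop wrapped-around samples

lam = 1e2                                      # ridge regularization
trf = solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)

pred = X @ trf                                 # reconstruction from the TRF
r = np.corrcoef(pred, eeg)[0, 1]               # neural-tracking-style correlation
print(f"peak lag: {lags[np.argmax(trf)] / fs * 1000:.0f} ms, tracking r = {r:.2f}")
```

Comparing such correlations for the attended and the ignored voice is the kind of measure used to quantify which speaker the listener's brain is tracking.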
Baque, Mathieu. "Analyse de scène sonore multi-capteurs : un front-end temps-réel pour la manipulation de scène." Thesis, Le Mans, 2017. http://www.theses.fr/2017LEMA1013/document.
The context of this thesis is the development of spatialized audio (5.1 content, Dolby Atmos...) and particularly of 3D audio. Among existing 3D audio formats, Ambisonics and Higher Order Ambisonics (HOA) allow a homogeneous spatial representation of a sound field and support basic manipulations such as rotations and distortions. The aim of the thesis is to provide efficient tools for Ambisonics and HOA sound scene analysis and manipulation. A real-time implementation and robustness to reverberation are the main constraints. The implemented algorithm is based on a frame-by-frame Independent Component Analysis (ICA), which decomposes the sound field into a set of acoustic contributions. A Bayesian classification step is then applied to the extracted components to identify the real sources and the residual reverberation. Directions of arrival of the sources are extracted from the mixing matrix estimated by ICA, according to the Ambisonic formalism, and a real-time cartography of the sound scene is obtained. Performance has been evaluated in different acoustic environments to assess the influence of several parameters, such as the Ambisonic order, the frame length, or the number of sources. Accurate results in terms of source localization and source counting have been obtained for frame lengths of a few hundred milliseconds. The algorithm is used as a pre-processing step for a speech recognition prototype and significantly improves recognition results in far-field conditions and in the presence of noise and interfering sources.
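The frame-by-frame structure of such an algorithm can be sketched for first-order Ambisonics (B-format), where a direction of arrival can be read off each estimated mixing column. This is only an illustration of the structure: the HOA generalization and the Bayesian source/reverberation classifier are omitted, and the audio below is random placeholder data.

```python
import numpy as np
from sklearn.decomposition import FastICA

def doa_from_column(col):
    """For B-format (W, X, Y, Z), the X/Y/Z entries of a mixing column point
    toward the source; return azimuth and elevation in degrees."""
    w, x, y, z = col
    az = np.degrees(np.arctan2(y, x))
    el = np.degrees(np.arctan2(z, np.hypot(x, y)))
    return az, el

rng = np.random.default_rng(3)
frames = rng.laplace(size=(10, 4, 4096))       # 10 frames of 4-channel B-format audio

for t, frame in enumerate(frames):             # frame-by-frame analysis
    ica = FastICA(n_components=4, random_state=0, max_iter=500)
    ica.fit(frame.T)                           # ICA expects (samples, channels)
    for col in ica.mixing_.T:                  # one mixing column per component
        az, el = doa_from_column(col / np.linalg.norm(col))
        # ICA leaves scale/sign ambiguities, and a classification step would
        # still have to decide which components are real sources vs. reverb.
        print(f"frame {t}: candidate source at az {az:6.1f} deg, el {el:5.1f} deg")
```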
Woodruff, John F. "Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332425718.
Full textSundar, Harshavardhan. "Who Spoke What And Where? A Latent Variable Framework For Acoustic Scene Analysis." Thesis, 2016. https://etd.iisc.ac.in/handle/2005/2569.
Full textSundar, Harshavardhan. "Who Spoke What And Where? A Latent Variable Framework For Acoustic Scene Analysis." Thesis, 2016. http://etd.iisc.ernet.in/handle/2005/2569.
Full textAaronson, Neil L. "Speech-on-speech masking in a front-back dimension and analysis of binaural parameters in rooms using MLS methods." Diss., 2008.
Teutsch, Heinz [Verfasser]. "Wavefield decomposition using microphone arrays and its application to acoustic scene analysis / vorgelegt von Heinz Teutsch." 2006. http://d-nb.info/97902806X/34.
Full textPelluri, Sai Gunaranjan. "Joint Spectro-Temporal Analysis of Moving Acoustic Sources." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4279.
Full text"Psychophysical and Neural Correlates of Auditory Attraction and Aversion." Master's thesis, 2014. http://hdl.handle.net/2286/R.I.27518.