Dissertations / Theses on the topic 'Computational auditory scene analysis'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Computational auditory scene analysis.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Ellis, Daniel Patrick Whittlesey. "Prediction-driven computational auditory scene analysis." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/11006.
Includes bibliographical references (p. 173-180). Ph.D. thesis by Daniel P. W. Ellis.
Delmotte, Varinthira Duangudom. "Computational auditory saliency." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45888.
Shao, Yang. "Sequential organization in computational auditory scene analysis." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1190127412.
Brown, Guy Jason. "Computational auditory scene analysis : a representational approach." Thesis, University of Sheffield, 1992. http://etheses.whiterose.ac.uk/2982/.
Srinivasan, Soundararajan. "Integrating computational auditory scene analysis and automatic speech recognition." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1158250036.
Narayanan, Arun. "Computational auditory scene analysis and robust automatic speech recognition." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1401460288.
Unnikrishnan, Harikrishnan. "AUDIO SCENE SEGMENTATION USING A MICROPHONE ARRAY AND AUDITORY FEATURES." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/622.
Nakatani, Tomohiro. "Computational Auditory Scene Analysis Based on Residue-driven Architecture and Its Application to Mixed Speech Recognition." Kyoto University, 2002. http://hdl.handle.net/2433/149754.
Javadi, Ailar. "Bio-inspired noise robust auditory features." Thesis, Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44801.
Melih, Kathy. "Audio Source Separation Using Perceptual Principles for Content-Based Coding and Information Management." Griffith University. School of Information Technology, 2004. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20050114.081327.
Full textMelih, Kathy. "Audio Source Separation Using Perceptual Principles for Content-Based Coding and Information Management." Thesis, Griffith University, 2004. http://hdl.handle.net/10072/366279.
Full textThesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information Technology
Full Text
Trowitzsch, Ivo. "Robust sound event detection in binaural computational auditory scene analysis." Berlin : Technische Universität Berlin, 2020. http://d-nb.info/1210055120/34. Supervisor: Klaus Obermayer; reviewers: Klaus Obermayer, Dorothea Kolossa, Thomas Sikora.
Roman, Nicoleta. "Auditory-based algorithms for sound segregation in multisource and reverberant environments." Ohio State University, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1124370749.
Full textTitle from first page of PDF file. Document formatted into pages; contains i-xxii, xx-xxi, 183 p.; also includes graphics. Includes bibliographical references (p. 171-183). Available online via OhioLINK's ETD Center
Otsuka, Takuma. "Bayesian Microphone Array Processing." Kyoto University, 2014. http://hdl.handle.net/2433/188871.
Doctor of Informatics, Kyoto University, 2014 (doctoral program; degree no. Kou 18412, conferred under Article 4, Paragraph 1 of the Degree Regulations). Graduate School of Informatics, Department of Intelligence Science and Technology. Examining committee: Professor Hiroshi G. Okuno (chief examiner), Professor Tatsuya Kawahara, Associate Professor Marco Cuturi Cameto, Lecturer Kazuyoshi Yoshii.
Jin, Zhaozhang. "Monaural Speech Segregation in Reverberant Environments." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1279141797.
Woodruff, John F. "Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332425718.
Wang, Yuxuan. "Supervised Speech Separation Using Deep Neural Networks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1426366690.
Chen, Jitong. "On Generalization of Supervised Speech Separation." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492038295603502.
Zhao, Yan. "Deep learning methods for reverberant and noisy speech enhancement." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1593462119759348.
Liu, Yuzhou. "Deep CASA for Robust Pitch Tracking and Speaker Separation." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1566179636974186.
Golden, H. L. "Auditory scene analysis in Alzheimer's disease." Thesis, University College London (University of London), 2016. http://discovery.ucl.ac.uk/1474234/.
Yan, Rujiao. "Computational Audiovisual Scene Analysis." Bielefeld : Universitätsbibliothek Bielefeld, 2014. http://d-nb.info/1058945572/34.
Sauvé, Sarah A. "Prediction in polyphony : modelling musical auditory scene analysis." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/46805.
Hutchison, Joanna Lynn. "Boundary extension in the auditory domain." Fort Worth, Tex. : Texas Christian University, 2007. http://etd.tcu.edu/etdfiles/available/etd-07232007-150552/unrestricted/Hutchison.pdf.
Harding, Susan M. "Multi-resolution auditory scene analysis for speech perception : experimental evidence and a model." Thesis, Keele University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.275284.
Atilgan, Huriye. "A visionary approach to listening : determining the role of vision in auditory scene analysis." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/1573694/.
McMullan, Amanda R. "Electroencephalographic measures of auditory perception in dynamic acoustic environments." Thesis, Lethbridge, Alta. : University of Lethbridge, Dept. of Neuroscience, c2013, 2013. http://hdl.handle.net/10133/3354.
Full textx, 90 leaves : col. ill. ; 29 cm
Shirazibeheshti, Amirali. "The effect of sedation on conscious processing : computational analysis of the EEG response to auditory irregularity." Thesis, University of Kent, 2015. https://kar.kent.ac.uk/54467/.
Full textArdam, Nagaraju. "Study of ASA Algorithms." Thesis, Linköpings universitet, Elektroniksystem, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70996.
Full textHearing-Aid
Ravulapalli, Sunil Babu. "Association of Sound to Motion in Video Using Perceptual Organization." Scholar Commons, 2006. http://scholarcommons.usf.edu/etd/3769.
Full textHamid, Muhammad Raffay. "A computational framework for unsupervised analysis of everyday human activities." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24765.
Full textCommittee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Rehg
Swadzba, Agnes. "The robot's vista space : a computational 3D scene analysis." Bielefeld : Universitätsbibliothek Bielefeld, Hochschulschriften, 2011. http://d-nb.info/1012433447/34. AG Angewandte Informatik, Technische Fakultät (Bereich Informatik); Sonderforschungsbereich: Alignment in Communication (DFG SFB 673).
Full textMahapatra, Arun Kiran. "Investigation of noise in hospital emergency departments." Thesis, Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/45842.
Full textDeleforge, Antoine. "Acoustic Space Mapping : A Machine Learning Approach to Sound Source Separation and Localization." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENM033/document.
Full textIn this thesis, we address the long-studied problem of binaural (two microphones) sound source separation and localization through supervised leaning. To achieve this, we develop a new paradigm referred as acoustic space mapping, at the crossroads of binaural perception, robot hearing, audio signal processing and machine learning. The proposed approach consists in learning a link between auditory cues perceived by the system and the emitting sound source position in another modality of the system, such as the visual space or the motor space. We propose new experimental protocols to automatically gather large training sets that associates such data. Obtained datasets are then used to reveal some fundamental intrinsic properties of acoustic spaces and lead to the development of a general family of probabilistic models for locally-linear high- to low-dimensional space mapping. We show that these models unify several existing regression and dimensionality reduction techniques, while encompassing a large number of new models that generalize previous ones. The properties and inference of these models are thoroughly detailed, and the prominent advantage of proposed methods with respect to state-of-the-art techniques is established on different space mapping applications, beyond the scope of auditory scene analysis. We then show how the proposed methods can be probabilistically extended to tackle the long-known cocktail party problem, i.e., accurately localizing one or several sound sources emitting at the same time in a real-word environment, and separate the mixed signals. We show that resulting techniques perform these tasks with an unequaled accuracy. This demonstrates the important role of learning and puts forwards the acoustic space mapping paradigm as a promising tool for robustly addressing the most challenging problems in computational binaural audition
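The locally-linear space mapping idea in this abstract can be illustrated with a toy sketch. The following numpy-only example performs k-nearest-neighbour local affine regression from synthetic binaural-style cues (simulated ITD/ILD-like features, not real measurements) to source azimuth; it is a minimal stand-in under stated assumptions, not the thesis's actual probabilistic model.

```python
# A numpy-only sketch of locally-linear mapping: learn an azimuth estimate
# from toy binaural cues via k-NN local affine regression.
import numpy as np

rng = np.random.default_rng(1)
az = rng.uniform(-np.pi / 2, np.pi / 2, 500)            # training azimuths (rad)
# Toy binaural cues: ITD ~ sin(az), ILD ~ a smooth nonlinear function of az.
X = np.column_stack([np.sin(az), np.tanh(2 * az)]) + 0.01 * rng.normal(size=(500, 2))

def predict(x, k=20):
    """Fit a local affine map on the k nearest training cues, then evaluate it."""
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    A = np.column_stack([X[idx], np.ones(k)])            # affine design matrix
    w, *_ = np.linalg.lstsq(A, az[idx], rcond=None)
    return np.append(x, 1.0) @ w

true_az = 0.3
cue = np.array([np.sin(true_az), np.tanh(2 * true_az)])
print(predict(cue), "vs", true_az)                       # estimate close to 0.3
```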
David, Marion. "Toward sequential segregation of speech sounds based on spatial cues." Thesis, Vaulx-en-Velin, Ecole nationale des travaux publics, 2014. http://www.theses.fr/2014ENTP0013/document.
In a context of competing sound sources, auditory scene analysis aims to draw an accurate and useful representation of the perceived sounds. Solving such a scene consists of grouping sound events which come from the same source and segregating them from the other sounds. This PhD work intended to further our understanding of how the human auditory system processes these complex acoustic environments, with a particular emphasis on the potential influence of spatial cues on perceptual stream segregation. All the studies conducted during this PhD endeavoured to rely on realistic configurations. In a real environment, the diffraction and reflection properties of the room and the head lead to distortions of the sounds depending on the source and receiver positions. This phenomenon is named colouration. Speech-shaped noises, as a first approximation of speech sounds, were used to evaluate the effect of this colouration on stream segregation. The results showed that the slight monaural spectral differences induced by head and room colouration can induce segregation. Moreover, this segregation was enhanced by adding the binaural cues associated with a given position (ITD, ILD). In particular, a second study suggested that the monaural intensity variations across time at each ear were more relevant for stream segregation than the interaural level differences. The results also indicated that the percept of lateralization associated with a given ITD helped segregation when the lateralization was salient enough; the ITD per se could also favour segregation. The natural ability to perceptually solve an auditory scene is relevant for speech intelligibility. The main idea was to replicate the first experiments with speech items instead of frozen noises. A characteristic of running speech is the high degree of acoustical variability used to convey information. Thus, as a first step, we investigated the robustness of stream segregation based on a frequency difference to variability on the same acoustical cue (i.e., frequency). The second step was to evaluate the fundamental frequency difference that enables listeners to separate speech items. Indeed, given the limited effects measured in the first two experiments, it was assumed that spatial cues might be relevant for stream segregation only in interaction with another, stronger cue such as an F0 difference. The results of these preliminary experiments showed, first, that a large spectral variability introduced within pure-tone streams can lead to a complicated percept, presumably consisting of multiple streams. Second, the results suggested that a fundamental frequency difference of between 3 and 5 semitones enables listeners to separate speech items. These experiments provided results that will be used to design the next experiment, investigating how an ambiguous percept could be biased toward segregation by introducing spatial cues.
Devergie, Aymeric. "Interactions audiovisuelles pour l'analyse de scènes auditives" [Audiovisual interactions for auditory scene analysis]. PhD thesis, Université Claude Bernard - Lyon I, 2010. http://tel.archives-ouvertes.fr/tel-00830927.
Huet, Moïra-Phoebé. "Voice mixology at a cocktail party : Combining behavioural and neural tracking for speech segregation." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI070.
It is not always easy to follow a conversation in a noisy environment. In order to discriminate two speakers, we have to mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background. In this dissertation, the processes underlying speech segregation are explored through behavioural and neurophysiological experiments. In a preliminary phase, the development of an intelligibility task, the Long-SWoRD test, is introduced. This protocol allows participants to benefit from cognitive resources, such as linguistic knowledge, to separate two talkers in a realistic listening environment. The similarity between the two speakers, and thus by extension the difficulty of the task, was controlled by manipulating the acoustic parameters of the target and masker voices. In a second phase, the performance of the participants on this task is evaluated through three behavioural and neurophysiological (EEG) studies. Behavioural results are consistent with the literature and show that the distance between voices, spatialisation cues, and semantic information influence participants' performance. Neurophysiological results, analysed with temporal response functions (TRFs), indicate that the neural representations of the two speakers differ according to the difficulty of the listening conditions. In addition, these representations are constructed more quickly when the voices are easily distinguishable. It is often presumed in the literature that participants' attention remains constantly on the same voice. The experimental protocol presented in this work provides the opportunity to retrospectively infer when participants were listening to each voice. Therefore, in a third stage, a combined analysis of this attentional information and the EEG signals is presented. Results show that information about attentional focus can be used to improve the neural representation of the attended voice in situations where the voices are similar.
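The temporal response functions mentioned above can be sketched compactly: regress a simulated EEG channel onto lagged copies of a toy stimulus envelope with ridge regularization and recover the underlying kernel. All signals and parameters below are synthetic stand-ins, not the thesis's data or pipeline.

```python
# A compact TRF sketch: ridge regression of simulated EEG on a lagged
# stimulus-envelope design matrix recovers the response kernel.
import numpy as np

rng = np.random.default_rng(2)
fs, n, n_lags = 100, 6000, 30                  # 100 Hz, 60 s, 300 ms of lags
envelope = np.abs(rng.normal(size=n))          # toy stimulus envelope
kernel = np.hanning(n_lags) * np.sin(np.linspace(0, np.pi, n_lags))
eeg = np.convolve(envelope, kernel)[:n] + rng.normal(scale=2.0, size=n)

# Lagged design matrix: column j holds the envelope delayed by j samples.
X = np.column_stack([np.roll(envelope, j) for j in range(n_lags)])
X[:n_lags] = 0                                 # zero out wrapped-around samples

lam = 1e2                                      # ridge regularization strength
trf = np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ eeg)
print(np.corrcoef(trf, kernel)[0, 1])          # recovered TRF ~ true kernel
```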
Dufour, Jean-Yves. "Contribution algorithmique a la conception d'un systeme integre d'analyse temps-reel de scenes dynamiques : reconnaissance d'objets et analyse de mouvement dans une sequence d'images" [An algorithmic contribution to the design of an integrated system for real-time analysis of dynamic scenes: object recognition and motion analysis in an image sequence]. Caen, 1988. http://www.theses.fr/1988CAEN2034.
Full textMouterde, Solveig. "Long-range discrimination of individual vocal signatures by a songbird : from propagation constraints to neural substrate." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4012/document.
Full textIn communication systems, one of the biggest challenges is that the information encoded by the emitter is always modified before reaching the receiver, who has to process this altered information in order to recover the intended message. In acoustic communication particularly, the transmission of sound through the environment is a major source of signal degradation, caused by attenuation, absorption and reflections, all of which lead to decreases in the signal relative to the background noise. How animals deal with the need for exchanging information in spite of constraining conditions has been the subject of many studies either at the emitter or at the receiver's levels. However, a more integrated research about auditory scene analysis has seldom been used, and is needed to address the complexity of this process. The goal of my research was to use a transversal approach to study how birds adapt to the constraints of long distance communication by investigating the information coding at the emitter's level, the propagation-induced degradation of the acoustic signal, and the discrimination of this degraded information by the receiver at both the behavioral and neural levels. Taking into account the everyday issues faced by animals in their natural environment, and using stimuli and paradigms that reflected the behavioral relevance of these challenges, has been the cornerstone of my approach. Focusing on the information about individual identity in the distance calls of zebra finches Taeniopygia guttata, I investigated how the individual vocal signature is encoded, degraded, and finally discriminated, from the emitter to the receiver. This study shows that the individual signature of zebra finches is very resistant to propagation-induced degradation, and that the most individualized acoustic parameters vary depending on distance. Testing female birds in operant conditioning experiments, I showed that they are experts at discriminating between the degraded vocal signatures of two males, and that they can improve their ability substantially when they can train over increasing distances. Finally, I showed that this impressive discrimination ability also occurs at the neural level: we found a population of neurons in the avian auditory forebrain that discriminate individual voices with various degrees of propagation-induced degradation without prior familiarization or training. The finding of such a high-level auditory processing, in the primary auditory cortex, opens a new range of investigations, at the interface of neural processing and behavior
Camonin, Martine. "Mephisto : un outil de validation de modèles tridimensionnels" [Mephisto: a tool for validating three-dimensional models]. Nancy 1, 1987. http://www.theses.fr/1987NAN10149.
Salamon, Justin J. "Melody extraction from polyphonic music signals." Doctoral thesis, Universitat Pompeu Fabra, 2013. http://hdl.handle.net/10803/123777.
The music industry was one of the first to be completely restructured by advances in digital technology, and today we have access to thousands of songs stored on our mobile devices and millions more through cloud services. Given this immense amount of music at our fingertips, we need new ways to describe, index, search and interact with musical content. This thesis focuses on a technology that opens the door to new applications in this area: the automatic extraction of the melody from a polyphonic music recording. While identifying the melody of a piece is something humans can do relatively well, doing it automatically is highly complex, as it requires combining knowledge of signal processing, acoustics, machine learning and sound perception. This task is known in the research community as "melody extraction," and technically consists of estimating the sequence of pitches corresponding to the predominant melody of a musical piece from an analysis of the audio signal. This thesis presents an innovative melody extraction method based on the tracking and characterization of pitch contours. We show how contour characteristics can be exploited, in combination with rules based on auditory perception, to identify the melody out of all the pitched content of a recording, both heuristically and with automatically learned models. Through an international comparative evaluation initiative, we also verify that the proposed method achieves state-of-the-art results; in fact, it achieves the highest accuracy of all algorithms that have participated in the initiative to date. Furthermore, the thesis demonstrates the utility of our method in various applications for both research and end users, developing a series of systems that exploit the extracted melody for similarity-based music search (version identification and query by humming), musical genre classification, automatic transcription from audio to score, and computational music analysis. The thesis also includes an extensive comparative review of the state of the art in melody extraction and the first existing critical analysis of the evaluation methodology for such algorithms.
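The salience-based approach described in this abstract can be sketched in a few lines: score candidate f0 values by summing STFT magnitude at their harmonics, the core of harmonic-summation salience. The signal and parameters below are toy assumptions; a full system such as the one in the thesis adds pitch-contour tracking and perception-based rules on top.

```python
# A toy harmonic-summation salience sketch for one synthetic frame:
# the f0 candidate whose harmonics carry the most energy wins.
import numpy as np

fs, f0_true = 8000, 220.0
t = np.arange(fs) / fs
x = sum(np.sin(2 * np.pi * f0_true * h * t) / h for h in range(1, 5))

frame = x[:2048] * np.hanning(2048)
mag = np.abs(np.fft.rfft(frame))
freqs = np.fft.rfftfreq(2048, 1 / fs)

def salience(f0, n_harm=5):
    """Sum magnitude at the nearest bins of the first n_harm harmonics."""
    bins = [np.argmin(np.abs(freqs - f0 * h)) for h in range(1, n_harm + 1)]
    return sum(mag[b] for b in bins)

candidates = np.arange(55.0, 1000.0, 1.0)
print(candidates[np.argmax([salience(f) for f in candidates])])  # ~220 Hz
```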
Ellis, Daniel P. W. "Prediction-driven computational auditory scene analysis." Thesis, 1996. https://doi.org/10.7916/D84J0N13.
Yang, Cheng-Jia (楊政家). "Computational Auditory Scene Analysis for Speech Segregation." Thesis, National Kaohsiung First University of Science and Technology, Institute of Computer and Communication Engineering, 2014 (academic year 102). http://ndltd.ncl.edu.tw/handle/19929826462460307835.
Speech separation is one of the most difficult tasks in audio processing. This study uses computational auditory scene analysis (CASA) to perform the separation. The main computational approach simulates the human auditory system: a gammatone filterbank models the cochlea, and the Meddis inner-hair-cell model converts the filtered waveforms into neural (electrical) activity. After the gammatone filterbank and the Meddis hair-cell stage, the input signal is decomposed into time-frequency units, and a classifier labels each unit as noise-dominant or speech-dominant. The speech-dominant units are then collected to resynthesize the speech. The study combines CASA with a support vector machine, using mel-frequency cepstral coefficients and pitch as classification features. The classifier's output is a binary mask, which is further processed with image operations, including noise removal, hole filling, and morphological opening and closing. Experiments compare these operations in terms of noise filtering and speech distortion, and show that image processing can improve the binary mask.
Chou, Kenny. "A biologically inspired approach to the cocktail party problem." Thesis, 2020. https://hdl.handle.net/2144/41043.
Armstrong, Jonathan M. "Machine listening, musicological analysis and the creative process : the production of song-based music influenced by augmented listening techniques." Thesis, 2020. http://hdl.handle.net/1959.7/uws:68395.
Cantu, Marcos Antonio. "Sound source segregation of multiple concurrent talkers via Short-Time Target Cancellation." Thesis, 2018. https://hdl.handle.net/2144/32082.
Gillingham, Susan. "Auditory Search: The Deployment of Attention within a Complex Auditory Scene." Thesis, 2012. http://hdl.handle.net/1807/33220.
Ntalampiras, Stavros. "Ψηφιακή επεξεργασία και αυτόματη κατηγοριοποίηση περιβαλλοντικών ήχων" [Digital processing and automatic classification of environmental sounds]. Thesis, 2010. http://nemertes.lis.upatras.gr/jspui/handle/10889/3705.
The dissertation is outlined as follows. In Chapter 1 we present a general overview of the task of automatic recognition of sound events, discuss the applications of generalized audio signal recognition technology, give a brief description of the state of the art, and state the contribution of the thesis. In Chapter 2 we introduce the reader to the area of non-speech audio processing, surveying current feature extraction methodologies as well as pattern recognition techniques. In Chapter 3 we analyze a novel sound recognition system especially designed for the domain of urban environmental sound events; a hierarchical probabilistic structure was constructed along with a combined set of sound parameters, leading to high accuracy. Chapter 4 is divided into two parts: a) we explore the use of multiresolution analysis for the speech/music discrimination problem, and b) the previously acquired knowledge is used to build a system which combines features of different domains for efficient analysis of online radio signals. In Chapter 5 we experiment exhaustively with a new application of sound recognition technology: space monitoring based on the acoustic modality. We propose a system which detects atypical situations in a metro station environment to assist authorized personnel in the space monitoring task. In Chapter 6 we propose an adaptive framework for acoustic surveillance of potentially hazardous situations in environments with different acoustic properties. We show that the system achieves high performance and can adapt to heterogeneous environments in an unsupervised way. In Chapter 7 we investigate the use of novelty detection for acoustic monitoring of indoor and outdoor spaces; a database of real-world data was recorded, and three probabilistic techniques are proposed. In Chapter 8 we present a novel methodology for generalized sound recognition that leads to high recognition accuracy, exploiting the merits of temporal feature integration and multi-domain descriptors in combination with a state-of-the-art generative classification technique.
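As a minimal sketch of the kind of generalized sound recognition pipeline outlined here, the example below pairs MFCC features with one Gaussian mixture model per sound class. The file names and class labels are hypothetical placeholders, and the thesis's actual systems use richer multi-domain features and hierarchical structures.

```python
# MFCC features plus a generative (GMM) classifier for environmental sounds.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_features(path):
    """Load audio and return per-frame MFCC vectors (frames x coefficients)."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

# One GMM per sound class, trained on that class's frames (hypothetical files).
train = {"traffic": ["traffic1.wav"], "rain": ["rain1.wav"]}
models = {}
for label, files in train.items():
    frames = np.vstack([mfcc_features(f) for f in files])
    models[label] = GaussianMixture(n_components=8, covariance_type="diag").fit(frames)

def classify(path):
    """Pick the class whose GMM gives the highest average log-likelihood."""
    x = mfcc_features(path)
    return max(models, key=lambda label: models[label].score(x))

print(classify("unknown.wav"))  # hypothetical test clip
```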
"Natural Correlations of Spectral Envelope and their Contribution to Auditory Scene Analysis." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.46351.
Full textDissertation/Thesis
Doctoral Dissertation Psychology 2017
Wu, Wei-Che (吳瑋哲). "Enhancement and segregation specific person's speech based on human auditory scene analysis." Thesis, National Yang-Ming University, Institute of Biomedical Engineering, 2007 (academic year 95). http://ndltd.ncl.edu.tw/handle/39372178856500478948.
There are several strategies for separating speech from noise; however, effective methods for the multi-source speech problem remain elusive. The well-studied cocktail party problem concerns analyzing and processing signals that mix several speech sources. Here, we focus on a specific speech source. By using speaker recognition and speech recognition techniques to extract speakers' speech characteristics, coupled with human auditory scene analysis, the characteristics of individual speakers can be determined more precisely and distinguished from one another. These differences can then be used for signal segregation and enhancement for specific speakers. Users such as children can thus enhance the received signal from specific speech sources, such as their parents or a teacher.