Dissertations / Theses on the topic 'Computational auditory scene analysis'

Below are the top 50 dissertations and theses for research on the topic 'Computational auditory scene analysis', with abstracts included where available in the source metadata.

1

Ellis, Daniel Patrick Whittlesey. "Prediction-driven computational auditory scene analysis." Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/11006.

Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996. Includes bibliographical references (p. 173-180).
2

Delmotte, Varinthira Duangudom. "Computational auditory saliency." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45888.

Abstract:
The objective of this dissertation research is to identify sounds that grab a listener's attention; such sounds are considered salient. The focus here is on investigating the role of saliency in the auditory attentional process. In order to identify these salient sounds, we have developed a computational auditory saliency model inspired by our understanding of the human auditory system and auditory perception. By identifying salient sounds we can obtain a better understanding of how sounds are processed by the auditory system, and in particular, the key features contributing to sound salience. Additionally, studying the salience of different auditory stimuli can lead to improvements in the performance of current computational models in several different areas, by making use of the information obtained about what stands out perceptually to observers in a particular scene. Auditory saliency also helps to rapidly sort the information present in a complex auditory scene. Since our resources are finite, not all information can be processed equally. We must, therefore, be able to quickly determine the importance of different objects in a scene. Additionally, an immediate response or decision may be required. In order to respond, the observer needs to know the key elements of the scene. The issue of saliency is closely related to many different areas, including scene analysis. The thesis provides a comprehensive look at auditory saliency. It explores the advantages and limitations of using auditory saliency models through different experiments and presents a general computational auditory saliency model that can be used for various applications.
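As a minimal illustration of the centre-surround style of auditory saliency computation that this line of work builds on, the sketch below computes a spectrogram-based saliency map. This is a generic sketch, not Duangudom's model; the spectrogram settings and Gaussian scales are illustrative assumptions.

```python
# Generic spectrogram-based auditory saliency sketch (center-surround contrast).
# NOT the dissertation's model; window sizes and scales are assumptions.
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import gaussian_filter

def saliency_map(x, fs):
    """Return a time-frequency saliency map and a per-frame salience curve."""
    f, t, S = spectrogram(x, fs=fs, nperseg=512, noverlap=256)
    logS = np.log(S + 1e-10)                      # compressive nonlinearity
    center = gaussian_filter(logS, sigma=1)       # fine scale
    surround = gaussian_filter(logS, sigma=8)     # coarse scale
    contrast = np.maximum(center - surround, 0)   # half-wave rectify
    contrast /= contrast.max() + 1e-10            # normalize to [0, 1]
    return contrast, contrast.sum(axis=0), t

fs = 16000
x = np.random.randn(fs * 2)
x[fs:fs + 800] += 5 * np.sin(2 * np.pi * 1000 * np.arange(800) / fs)  # salient tone burst
smap, curve, t = saliency_map(x, fs)
print("most salient frame at t = %.2f s" % t[curve.argmax()])
```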
3

Shao, Yang. "Sequential organization in computational auditory scene analysis." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1190127412.

4

Brown, Guy Jason. "Computational auditory scene analysis : a representational approach." Thesis, University of Sheffield, 1992. http://etheses.whiterose.ac.uk/2982/.

Abstract:
This thesis addresses the problem of how a listener groups together acoustic components which have arisen from the same environmental event, a phenomenon known as auditory scene analysis. A computational model of auditory scene analysis is presented, which is able to separate speech from a variety of interfering noises. The model consists of four processing stages. Firstly, the auditory periphery is simulated by a bank of bandpass filters and a model of inner hair cell function. In the second stage, physiologically-inspired models of higher auditory organization - auditory maps - are used to provide a rich representational basis for scene analysis. Periodicities in the acoustic input are coded by an autocorrelation map and a cross-correlation map. Information about spectral continuity is extracted by a frequency transition map. The times at which acoustic components start and stop are identified by an onset map and an offset map. In the third stage of processing, information from the periodicity and frequency transition maps is used to characterize the auditory scene as a collection of symbolic auditory objects. Finally, a search strategy identifies objects that have similar properties and groups them together. Specifically, objects are likely to form a group if they have a similar periodicity, onset time or offset time. The model has been evaluated in two ways, using the task of segregating voiced speech from a number of interfering sounds such as random noise, "cocktail party" noise and other speech. Firstly, a waveform can be resynthesized for each group in the auditory scene, so that segregation performance can be assessed by informal listening tests. The resynthesized speech is highly intelligible and fairly natural. Secondly, the linear nature of the resynthesis process allows the signal-to-noise ratio (SNR) to be compared before and after segregation. An improvement in SNR is obtained after segregation for each type of interfering noise. Additionally, the performance of the model is significantly better than that of a conventional frame-based autocorrelation segregation strategy.
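The first two stages Brown describes map directly onto a small amount of signal-processing code. Below is a minimal sketch, assuming a 4th-order IIR gammatone filterbank and half-wave rectification as a crude stand-in for the inner-hair-cell model; channel spacing, frame length and lag range are illustrative choices, not Brown's exact parameters.

```python
# Minimal gammatone-filterbank front end plus autocorrelation ("correlogram")
# map for one frame; the summary ACF peak gives a dominant-periodicity estimate.
import numpy as np
from scipy.signal import gammatone, lfilter

def correlogram_frame(x, fs, n_channels=32, max_lag=320):
    cfs = np.geomspace(100, 3500, n_channels)         # centre frequencies (Hz)
    acf = np.zeros((n_channels, max_lag))
    for c, cf in enumerate(cfs):
        b, a = gammatone(cf, 'iir', fs=fs)            # 4th-order IIR gammatone
        y = np.maximum(lfilter(b, a, x), 0.0)         # crude hair cell: rectify
        for lag in range(max_lag):                    # short-time autocorrelation
            acf[c, lag] = np.dot(y[:len(y) - lag], y[lag:])
    return acf, acf.sum(axis=0)                       # map + summary ACF

fs = 8000
t = np.arange(fs // 4) / fs
frame = np.sign(np.sin(2 * np.pi * 125 * t))          # 125 Hz pulse-like source
acf, summary = correlogram_frame(frame, fs)
lag = summary[20:].argmax() + 20                      # skip the zero-lag peak
print("dominant period ~ %.1f ms (F0 ~ %.0f Hz)" % (1000 * lag / fs, fs / lag))
```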
5

Srinivasan, Soundararajan. "Integrating computational auditory scene analysis and automatic speech recognition." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1158250036.

6

Narayanan, Arun. "Computational auditory scene analysis and robust automatic speech recognition." The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1401460288.

7

Unnikrishnan, Harikrishnan. "AUDIO SCENE SEGMENTATION USING A MICROPHONE ARRAY AND AUDITORY FEATURES." UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/622.

Abstract:
Auditory stream denotes the abstract effect a source creates in the mind of the listener. An auditory scene consists of many streams, which the listener uses to analyze and understand the environment. Computer analyses that attempt to mimic human analysis of a scene must first perform Audio Scene Segmentation (ASS). ASS finds applications in surveillance, automatic speech recognition and human-computer interfaces. Microphone arrays can be employed for extracting streams corresponding to spatially separated sources. However, when a source moves to a new location during a period of silence, such a system loses track of the source. This results in multiple spatially localized streams for the same source. This thesis proposes to identify local streams associated with the same source using auditory features extracted from the beamformed signal. ASS using spatial cues is performed first; auditory features are then extracted, and segments are linked together based on the similarity of the feature vectors. An experiment was carried out with two simultaneous speakers. A classifier is used to classify the localized streams as belonging to one speaker or the other. The best performance was achieved when pitch appended to Gammatone Frequency Cepstral Coefficients (GFCC) was used as the feature vector. An accuracy of 96.2% was achieved.
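The linking step described above can be sketched compactly: compute gammatone-based cepstral features (GFCC) for each localized segment and link segments whose feature vectors are similar. The 32-channel bank, cubic-root compression and cosine-similarity threshold below are common choices assumed for illustration, not necessarily those of the thesis.

```python
# Sketch: GFCC-style features per segment, then link segments by similarity.
import numpy as np
from scipy.signal import gammatone, lfilter
from scipy.fft import dct

def gfcc(x, fs, n_channels=32, n_coeffs=13):
    cfs = np.geomspace(50, fs / 2 * 0.9, n_channels)
    energies = np.empty(n_channels)
    for c, cf in enumerate(cfs):
        b, a = gammatone(cf, 'iir', fs=fs)
        energies[c] = np.mean(lfilter(b, a, x) ** 2)
    compressed = np.cbrt(energies)              # cubic-root loudness compression
    return dct(compressed, norm='ortho')[:n_coeffs]

def same_source(seg_a, seg_b, fs, threshold=0.95):
    fa, fb = gfcc(seg_a, fs), gfcc(seg_b, fs)
    cos = np.dot(fa, fb) / (np.linalg.norm(fa) * np.linalg.norm(fb))
    return cos > threshold                      # link localized streams if similar

fs = 16000
a = np.random.randn(fs)                         # stand-ins for beamformed segments
b = a + 0.1 * np.random.randn(fs)
print(same_source(a, b, fs))
```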
8

Nakatani, Tomohiro. "Computational Auditory Scene Analysis Based on Residue-driven Architecture and Its Application to Mixed Speech Recognition." Kyoto University, 2002. http://hdl.handle.net/2433/149754.

9

Javadi, Ailar. "Bio-inspired noise robust auditory features." Thesis, Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44801.

Abstract:
The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We have provided recommendations to improve speech recognition results depending on the signal-to-noise ratio levels of the input signals. This work has been motivated by noise-robust auditory features (NRAF). In the feature extraction technique, after a signal is filtered using bandpass filters, a spatial derivative step is used to sharpen the results, followed by an envelope detector (rectification and smoothing) and down-sampling for each filter bank before being compressed. DCT is then applied to the results of all filter banks to produce features. The Hidden Markov Model Toolkit (HTK) is used as the recognition back-end to perform speech recognition given the features we have extracted. In this work, we investigate the role of filter types, window size, spatial derivative, rectification types, smoothing, down-sampling and compression, and compare the final results to state-of-the-art Mel-frequency cepstral coefficients (MFCC). A series of conclusions and insights are provided for each step of the process. The goal of this work has not been to outperform MFCCs; however, we have shown that by changing the compression type from log compression to 0.07 root compression we are able to outperform MFCCs for all noisy conditions.
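The headline result, replacing log compression with a 0.07 root, is a one-line change in the feature pipeline. A minimal numerical illustration on made-up filter-bank energies:

```python
# Log vs. 0.07-root compression of filter-bank energies (values are arbitrary).
import numpy as np

energies = np.array([1e-6, 1e-4, 1e-2, 1.0])   # filter-bank outputs, quiet to loud
log_compressed = np.log(energies)               # standard MFCC-style compression
root_compressed = energies ** 0.07              # proposed root compression

# The root keeps low-energy (noise-dominated) channels in a bounded, positive
# range instead of sending them toward -inf, one intuition for its robustness.
print(log_compressed)   # approx. [-13.8  -9.2  -4.6   0.0]
print(root_compressed)  # approx. [ 0.38  0.52  0.72  1.0 ]
```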
10

Melih, Kathy. "Audio Source Separation Using Perceptual Principles for Content-Based Coding and Information Management." Griffith University, School of Information Technology, 2004. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20050114.081327.

Abstract:
The information age has brought with it a dual problem. In the first place, the ready access to mechanisms to capture and store vast amounts of data in all forms (text, audio, image and video), has resulted in a continued demand for ever more efficient means to store and transmit this data. In the second, the rapidly increasing store demands effective means to structure and access the data in an efficient and meaningful manner. In terms of audio data, the first challenge has traditionally been the realm of audio compression research that has focused on statistical, unstructured audio representations that obfuscate the inherent structure and semantic content of the underlying data. This has only served to further complicate the resolution of the second challenge resulting in access mechanisms that are either impractical to implement, too inflexible for general application or too low level for the average user. Thus, an artificial dichotomy has been created from what is in essence a dual problem. The founding motivation of this thesis is that, although the hypermedia model has been identified as the ideal, cognitively justified method for organising data, existing audio data representations and coding models provide little, if any, support for, or resemblance to, this model. It is the contention of the author that any successful attempt to create hyperaudio must resolve this schism, addressing both storage and information management issues simultaneously. In order to achieve this aim, an audio representation must be designed that provides compact data storage while, at the same time, revealing the inherent structure of the underlying data. Thus it is the aim of this thesis to present a representation designed with these factors in mind. Perhaps the most difficult hurdle in the way of achieving the aims of content-based audio coding and information management is that of auditory source separation. The MPEG committee has noted this requirement during the development of its MPEG-7 standard, however, the mechanics of "how" to achieve auditory source separation were left as an open research question. This same committee proposed that MPEG-7 would "support descriptors that can act as handles referring directly to the data, to allow manipulation of the multimedia material." While meta-data tags are a part solution to this problem, these cannot allow manipulation of audio material down to the level of individual sources when several simultaneous sources exist in a recording. In order to achieve this aim, the data themselves must be encoded in such a manner that allows these descriptors to be formed. Thus, content-based coding is obviously required. In the case of audio, this is impossible to achieve without effecting auditory source separation. Auditory source separation is the concern of computational auditory scene analysis (CASA). However, the findings of CASA research have traditionally been restricted to a limited domain. To date, the only real application of CASA research to what could loosely be classified as information management has been in the area of signal enhancement for automatic speech recognition systems. In these systems, a CASA front end serves as a means of separating the target speech from the background "noise". As such, the design of a CASA-based approach, as presented in this thesis, to one of the most significant challenges facing audio information management research represents a significant contribution to the field of information management. 
Thus, this thesis unifies research from three distinct fields in an attempt to resolve some specific and general challenges faced by all three. It describes an audio representation that is based on a sinusoidal model from which low-level auditory primitive elements are extracted. The use of a sinusoidal representation is somewhat contentious with the modern trend in CASA research tending toward more complex approaches in order to resolve issues relating to co-incident partials. However, the choice of a sinusoidal representation has been validated by the demonstration of a method to resolve many of these issues. The majority of the thesis contributes several algorithms to organise the low-level primitives into low-level auditory objects that may form the basis of nodes or link anchor points in a hyperaudio structure. Finally, preliminary investigations in the representation’s suitability for coding and information management tasks are outlined as directions for future research.
11

Melih, Kathy. "Audio Source Separation Using Perceptual Principles for Content-Based Coding and Information Management." Thesis, Griffith University, 2004. http://hdl.handle.net/10072/366279.

Thesis (PhD Doctorate)--School of Information Technology, Griffith University. Abstract as in no. 10 above.
12

Trowitzsch, Ivo. "Robust sound event detection in binaural computational auditory scene analysis." Berlin: Technische Universität Berlin, 2020. http://d-nb.info/1210055120/34. (Supervisor: Klaus Obermayer; reviewers: Klaus Obermayer, Dorothea Kolossa, and Thomas Sikora.)

13

Roman, Nicoleta. "Auditory-based algorithms for sound segregation in multisource and reverberant environments." Connect to resource, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1124370749.

Abstract:
Thesis (Ph.D.)--Ohio State University, 2005. Title from first page of PDF file. Includes bibliographical references (p. 171-183). Available online via OhioLINK's ETD Center.
14

Otsuka, Takuma. "Bayesian Microphone Array Processing." Kyoto University, 2014. http://hdl.handle.net/2433/188871.

Abstract:
Thesis (Doctor of Informatics)--Kyoto University, Graduate School of Informatics, Department of Intelligence Science and Technology, 2014. Committee: Prof. Hiroshi Okuno (chair), Prof. Tatsuya Kawahara, Assoc. Prof. Marco Cuturi Cameto, and Lecturer Kazuyoshi Yoshii.
15

Jin, Zhaozhang. "Monaural Speech Segregation in Reverberant Environments." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1279141797.

16

Woodruff, John F. "Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1332425718.

17

Wang, Yuxuan. "Supervised Speech Separation Using Deep Neural Networks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1426366690.

18

Chen, Jitong. "On Generalization of Supervised Speech Separation." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492038295603502.

19

Zhao, Yan. "Deep learning methods for reverberant and noisy speech enhancement." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1593462119759348.

20

Liu, Yuzhou. "Deep CASA for Robust Pitch Tracking and Speaker Separation." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1566179636974186.

21

Golden, H. L. "Auditory scene analysis in Alzheimer's disease." Thesis, University College London (University of London), 2016. http://discovery.ucl.ac.uk/1474234/.

Abstract:
This thesis explores the behavioural and neuroanatomical picture of Auditory Scene Analysis (ASA) in Alzheimer’s disease (AD). Central auditory dysfunction is an understudied symptom of AD and there has been little connection between the neuropathological profile of the disease, its relationship to generic ASA functions, and real-world listening situations. Utilising novel neuropsychological batteries alongside structural and functional imaging techniques, this thesis aims to bridge this gap through investigations of auditory spatial, speech in noise, and (as a specialised auditory scene) music processing. Spatial location discrimination and motion detection of sounds was impaired in both typical AD and posterior cortical atrophy; this was associated with atrophy in right inferior parietal and posterior medial regions. A functional imaging investigation of auditory spatial processing in typical AD revealed abnormalities in posterior medial cortical areas when sounds were changing in location. Functional imaging of an everyday auditory scenario (hearing one’s own name over background babble) highlighted alteration in a right inferior parietal region. Novel neuropsychological tasks assessing components of musical ‘scenes’ found that global aspects of pitch pattern processing were impaired in both the typical and language variant of AD while local aspects were preserved; both global and local forms of temporal processing were also intact. These patients also exhibited diminished tonality perception and musical stream segregation based on familiar templates. These investigations delineate reduced ASA capacity in a number of components that make up everyday auditory scenes. This has real world implications for both typical AD and its rarer phenotypes. Furthermore, ASA dysfunction may inform us about network breakdown, network function, and sources of phenotypic similarity in AD.
22

Yan, Rujiao. "Computational Audiovisual Scene Analysis." Bielefeld: Universitätsbibliothek Bielefeld, 2014. http://d-nb.info/1058945572/34.

23

Sauvé, Sarah A. "Prediction in polyphony : modelling musical auditory scene analysis." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/46805.

Abstract:
How do we know that a melody is a melody? In other words, how does the human brain extract melody from a polyphonic musical context? This thesis begins with a theoretical presentation of musical auditory scene analysis (ASA) in the context of predictive coding and rule-based approaches and takes methodological and analytical steps to evaluate selected components of a proposed integrated framework for musical ASA, unified by prediction. Predictive coding has been proposed as a grand unifying model of perception, action and cognition and is based on the idea that brains process error to refine models of the world. Existing models of ASA tackle distinct subsets of ASA and are currently unable to integrate all the acoustic and extensive contextual information needed to parse auditory scenes. This thesis proposes a framework capable of integrating all relevant information contributing to the understanding of musical auditory scenes, including auditory features, musical features, attention, expectation and listening experience, and examines a subset of ASA issues - timbre perception in relation to musical training, modelling temporal expectancies, the relative salience of musical parameters and melody extraction - using probabilistic approaches. Using behavioural methods, attention is shown to influence streaming perception based on timbre more than instrumental experience. Using probabilistic methods, information content (IC) for temporal aspects of music as generated by IDyOM (information dynamics of music; Pearce, 2005), are validated and, along with IC for pitch and harmonic aspects of the music, are subsequently linked to perceived complexity but not to salience. Furthermore, based on the hypotheses that a melody is internally coherent and the most complex voice in a piece of polyphonic music, IDyOM has been extended to extract melody from symbolic representations of chorales by J.S. Bach and a selection of string quartets by W.A. Mozart.
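IDyOM's central quantity is information content, IC(x_t) = -log2 P(x_t | context). A toy first-order (bigram) sketch over a pitch sequence conveys the idea; IDyOM itself uses variable-order models and multiple musical viewpoints, so this is only illustrative:

```python
# Toy information-content computation over a melody, bigram model with
# add-one smoothing. Illustrates the quantity IDyOM estimates, not IDyOM itself.
import numpy as np
from collections import Counter

melody = [60, 62, 64, 62, 60, 62, 64, 65, 64, 62, 60]   # MIDI pitches
alphabet = sorted(set(melody))
bigrams = Counter(zip(melody, melody[1:]))
context_counts = Counter(melody[:-1])

def information_content(prev, note):
    # add-one smoothing so unseen continuations keep finite IC
    p = (bigrams[(prev, note)] + 1) / (context_counts[prev] + len(alphabet))
    return -np.log2(p)

for prev, note in zip(melody, melody[1:]):
    print(f"{prev}->{note}: IC = {information_content(prev, note):.2f} bits")
```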
24

Hutchison, Joanna Lynn. "Boundary extension in the auditory domain." Fort Worth, Tex. : Texas Christian University, 2007. http://etd.tcu.edu/etdfiles/available/etd-07232007-150552/unrestricted/Hutchison.pdf.

25

Harding, Susan M. "Multi-resolution auditory scene analysis for speech perception : experimental evidence and a model." Thesis, Keele University, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.275284.

26

Atilgan, Huriye. "A visionary approach to listening : determining the role of vision in auditory scene analysis." Thesis, University College London (University of London), 2017. http://discovery.ucl.ac.uk/1573694/.

Abstract:
To recognize and understand the auditory environment, the listener must first separate sounds that arise from different sources and capture each event. This process is known as auditory scene analysis. The aim of this thesis is to investigate whether and how visual information can influence auditory scene analysis. The thesis consists of four chapters. Firstly, I reviewed the literature to give a clear framework about the impact of visual information on the analysis of complex acoustic environments. In chapter II, I examined psychophysically whether temporal coherence between auditory and visual stimuli was sufficient to promote auditory stream segregation in a mixture. I have found that listeners were better able to report brief deviants in an amplitude modulated target stream when a visual stimulus changed in size in a temporally coherent manner than when the visual stream was coherent with the non-target auditory stream. This work demonstrates that temporal coherence between auditory and visual features can influence the way people analyse an auditory scene. In chapter III, the integration of auditory and visual features in auditory cortex was examined by recording neuronal responses in awake and anaesthetised ferret auditory cortex in response to the modified stimuli used in Chapter II. I demonstrated that temporal coherence between auditory and visual stimuli enhances the neural representation of a sound and influences which sound a neuron represents in a sound mixture. Visual stimuli elicited reliable changes in the phase of the local field potential which provides mechanistic insight into this finding. Together these findings provide evidence that early cross modal integration underlies the behavioural effects in chapter II. Finally, in chapter IV, I investigated whether training can influence the ability of listeners to utilize visual cues for auditory stream analysis and showed that this ability improved by training listeners to detect auditory-visual temporal coherence.
27

McMullan, Amanda R. "Electroencephalographic measures of auditory perception in dynamic acoustic environments." Thesis, Lethbridge, Alta. : University of Lethbridge, Dept. of Neuroscience, c2013, 2013. http://hdl.handle.net/10133/3354.

Abstract:
We are capable of effortlessly parsing a complex scene presented to us. In order to do this, we must segregate objects from each other and from the background. While this process has been extensively studied in vision science, it remains relatively less understood in auditory science. This thesis sought to characterize the neuroelectric correlates of auditory scene analysis using electroencephalography. Chapter 2 determined components evoked by first-order energy boundaries and second-order pitch boundaries. Chapter 3 determined components evoked by first-order and second-order discontinuous motion boundaries. Both of these chapters focused on analysis of event-related potential (ERP) waveforms and time-frequency analysis. In addition, these chapters investigated the contralateral nature of a negative ERP component. These results extend the current knowledge of auditory scene analysis by providing a starting point for discussing and characterizing first-order and second-order boundaries in an auditory scene.
x, 90 leaves : col. ill. ; 29 cm
28

Shirazibeheshti, Amirali. "The effect of sedation on conscious processing : computational analysis of the EEG response to auditory irregularity." Thesis, University of Kent, 2015. https://kar.kent.ac.uk/54467/.

Abstract:
Characterising the relationships between conscious and unconscious processes is one of the most important goals in cognitive neuroscience. Behavioural studies as well as neuroimaging techniques have been employed to understand the nature of conscious perception in the brain. Functional brain imaging and EEG (electroencephalogram) methods allow for detailed exploration of the neural and computational correlates of conscious and unconscious cognition. Using a high-density EEG dataset, recorded from 129 electrodes over the scalp, we studied the neural responses of the brain to auditory stimuli. To this end, we employed an auditory oddball paradigm, called the local-global experiment. Bekinschtein et al. (2009) designed this experiment to explore the neural dynamics in early auditory cortex, associated with the MMN (mismatch negativity) component, generated by the local violation of auditory stimuli. They also investigated a later novelty response, associated with the P3 (a late positive response) component, which was generated by the global violation of auditory stimuli. Their findings suggest that the global response, corresponding to working-memory updating, independently from the local response, is a signature of conscious processing. Our investigations show, however, that the local and global effects are not fully independent of one another. Therefore, we looked for other potential signatures of conscious processing. To do this, we studied 18 healthy participants who had been sedated. Using SPM (Statistical Parametric Mapping), which is a mass univariate approach, we analysed the sedation dataset in an omnibus statistical setting. We found an interaction between the local and global effects. In addition, we investigated the impact of sedation on both the early and late temporal components (i.e. the local and global effects), and their interaction. Beyond the SPM analysis, we also performed single-trial analysis. Unlike SPM analysis, which explores ERPs (the average effect across replications) to assess significance, single-trial analysis looks for variation across replications, from one experimental level to another. More specifically, we looked at amplitude variation and temporal jitter when participants were sedated versus recovered. In cases where the null hypothesis was not rejected (i.e. no significant difference across levels), we calculated Bayes factors to search for evidence in favour of the null hypothesis. With the exception of latency dispersion under dual (global and local) deviance, we could find no evidence for increased variability in single-trial responses under sedation. This suggests the effects of reduced conscious level are systematic and can be summarised as an attenuation of the dependency of (or interaction between) local and global processing.
29

Ardam, Nagaraju. "Study of ASA Algorithms." Thesis, Linköpings universitet, Elektroniksystem, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70996.

Abstract:
Hearing aid devices are used to help people with hearing impairment. The number of people that require hearing aid devices is possibly constant over the years; however, the number of people that now have access to hearing aid devices is increasing rapidly. Hearing aid devices must be small, consume very little power, and be fairly accurate, even though it is normally more important for the user that the hearing aid looks good (is discreet). Once the hearing aid device is prescribed to the user, she/he needs to train and adjust the device to compensate for the individual impairment. Within the framework of this project, we are researching hearing aid devices that can be trained by the hearing-impaired person her-/himself. This project is about finding a suitable noise cancellation algorithm for the hearing aid device. We consider several types of algorithms, such as microphone array signal processing, Independent Component Analysis (ICA) based on two microphones, called Blind Source Separation (BSS), and the DRNPE algorithm. We ran these current, sophisticated and robust algorithms against noise backgrounds such as cocktail noise, street, public places, train and babble situations to test their efficiency. The BSS algorithm performed well in some situations and gave average results in others, whereas the one-microphone algorithm gave steady results in all situations. The output is good enough to listen to the targeted audio. The functionality and performance of the proposed algorithm are evaluated with different non-stationary noise backgrounds. From the performance results it can be concluded that, by using the proposed algorithm, we are able to reduce the noise to a certain level. SNR, system delay, minimum error and audio perception are the vital parameters considered to evaluate the performance of the algorithms. Based on these parameters, an algorithm is suggested for the hearing aid.
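As a concrete illustration of the two-microphone ICA/BSS family the thesis evaluates, here is a minimal FastICA sketch on an instantaneous (non-convolutive) synthetic mixture; real hearing-aid mixtures are convolutive, so this only conveys the principle:

```python
# Two-microphone blind source separation with FastICA on a synthetic
# instantaneous mixture. Sources are recovered up to scale and permutation.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n = 8000
s1 = np.sign(np.sin(2 * np.pi * 3 * np.linspace(0, 1, n)))   # "speech" stand-in
s2 = rng.laplace(size=n)                                      # "babble" stand-in
S = np.c_[s1, s2]
A = np.array([[1.0, 0.6], [0.4, 1.0]])                        # mixing at the two mics
X = S @ A.T                                                   # observed mic signals

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                                  # recovered sources
print(S_hat.shape)                                            # (8000, 2)
```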
30

Ravulapalli, Sunil Babu. "Association of Sound to Motion in Video Using Perceptual Organization." Scholar Commons, 2006. http://scholarcommons.usf.edu/etd/3769.

Abstract:
Technological developments and innovations of the first forty years of the digital era have primarily addressed either the audio or the visual senses. Consequently, designers have primarily focused on the audio or the visual aspects of design. In the perspective of video surveillance, the data under consideration has always been visual. However, in light of new behavioral and physiological studies which established proof of cross-modality in human perception, i.e. humans do not process audio and visual stimuli separately but perceive a scene based on all stimuli available, similar cues are being used to develop a surveillance system which uses both the audio and visual data available. Human beings can easily associate a particular sound with an object in the surroundings. Drawing from such studies, we demonstrate a technique by which we can isolate concurrent audio and video events and associate them based on perceptual grouping principles. Associating sound with an object can form a part of a larger surveillance system by producing a better description of objects. We represent audio in the pitch-time domain and use image processing algorithms such as line detection to isolate significant events. These events are then grouped based on gestalt principles of proximity and similarity which operate in audio. Once auditory events are isolated we can extract their periodicity. In video, we can extract objects by using simple background subtraction. We extract motion and shape periodicities of all the objects by tracking their position or the number of pixels in each frame. By comparing all the periodicities in audio and video using a simple index we can easily associate audio to video. We show results on five scenarios in outdoor settings with different kinds of human activity, such as running and walking, and other moving objects such as balls and cars.
31

Hamid, Muhammad Raffay. "A computational framework for unsupervised analysis of everyday human activities." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/24765.

Abstract:
Thesis (Ph.D.)--Computing, Georgia Institute of Technology, 2009. Committee Chair: Aaron Bobick; Committee Members: Charles Isbell, David Hogg, Irfan Essa, James Rehg.
32

Swadzba, Agnes. "The robot's vista space: a computational 3D scene analysis." Bielefeld: Universitätsbibliothek Bielefeld, 2011. http://d-nb.info/1012433447/34.

33

Mahapatra, Arun Kiran. "Investigation of noise in hospital emergency departments." Thesis, Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/45842.

Abstract:
The hospital sound environment is complex. Emergency Departments (EDs), in particular, have proven to be hectic work environments populated with diverse sound sources. Medical equipment, alarms, and communication events generate noise that can interfere with staff concentration and communication. In this study, sound measurements and analyses were conducted in six hospitals total: three civilian hospitals in Atlanta, Georgia and Dublin, Ohio, as well as three Washington, DC-area hospitals in the Military Health System (MHS). The equivalent, minimum, and maximum sound pressure levels were recorded over twenty-four hours in several locations in each ED, with shorter 15-30 minute measurements performed in other areas. Acoustic descriptors, such as spectral content, level distributions, and speech intelligibility were examined. The perception of these acoustic qualities by hospital staff was also evaluated through subjective surveys. It was found that noise levels in both work areas and patient rooms were excessive. Additionally, speech intelligibility measurements and survey results show that background noise presents a significant obstacle in effective communication between staff members and patients. Compared to previous studies, this study looks at a wider range of acoustic metrics and the corresponding perceptions of staff in order to form a more precise and accurate depiction of the ED sound environment.
34

Deleforge, Antoine. "Acoustic Space Mapping : A Machine Learning Approach to Sound Source Separation and Localization." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENM033/document.

Abstract:
In this thesis, we address the long-studied problem of binaural (two-microphone) sound source separation and localization through supervised learning. To achieve this, we develop a new paradigm referred to as acoustic space mapping, at the crossroads of binaural perception, robot hearing, audio signal processing and machine learning. The proposed approach consists in learning a link between the auditory cues perceived by the system and the position of the emitting sound source in another modality of the system, such as the visual space or the motor space. We propose new experimental protocols to automatically gather large training sets that associate such data. The obtained datasets are then used to reveal some fundamental intrinsic properties of acoustic spaces and lead to the development of a general family of probabilistic models for locally-linear high- to low-dimensional space mapping. We show that these models unify several existing regression and dimensionality reduction techniques, while encompassing a large number of new models that generalize previous ones. The properties and inference of these models are thoroughly detailed, and the prominent advantage of the proposed methods with respect to state-of-the-art techniques is established on different space mapping applications, beyond the scope of auditory scene analysis. We then show how the proposed methods can be probabilistically extended to tackle the long-known cocktail party problem, i.e., accurately localizing one or several sound sources emitting at the same time in a real-world environment, and separating the mixed signals. We show that the resulting techniques perform these tasks with an unequaled accuracy. This demonstrates the important role of learning and puts forward the acoustic space mapping paradigm as a promising tool for robustly addressing the most challenging problems in computational binaural audition.
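As a rough geometric intuition for the locally-linear high- to low-dimensional mapping described above, one can partition the high-dimensional cue space and fit one linear regressor per region. The sketch below is a crude stand-in (k-means plus per-cluster ridge regression), not Deleforge's probabilistic models, and all data are synthetic:

```python
# Crude piecewise-linear mapping: k-means partition of the high-dimensional
# space, one linear regressor per region. Only conveys the geometry of
# locally-linear space mapping; the thesis's models are fully probabilistic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
Y = rng.uniform(-1, 1, size=(2000, 2))            # low-dim positions (e.g. azimuth/elevation)
X = np.c_[np.sin(3 * Y), np.cos(2 * Y), Y @ rng.normal(size=(2, 8))]  # high-dim "cues"

km = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X)
regs = [Ridge(alpha=1e-3).fit(X[km.labels_ == k], Y[km.labels_ == k])
        for k in range(16)]

def predict(x_new):
    k = km.predict(x_new)                          # pick the local region...
    return np.array([regs[ki].predict(x_new[i:i + 1])[0]
                     for i, ki in enumerate(k)])   # ...then apply its linear map

print(np.abs(predict(X[:5]) - Y[:5]).max())        # small error on training points
```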
35

David, Marion. "Toward sequential segregation of speech sounds based on spatial cues." Thesis, Vaulx-en-Velin, Ecole nationale des travaux publics, 2014. http://www.theses.fr/2014ENTP0013/document.

Abstract:
In a context of competing sound sources, auditory scene analysis aims to draw an accurate and useful representation of the perceived sounds. Solving such a scene consists of grouping sound events which come from the same source and segregating them from the other sounds. This PhD work intended to further our understanding of how the human auditory system processes these complex acoustic environments, with a particular emphasis on the potential influence of spatial cues on perceptual stream segregation. All the studies conducted during this PhD endeavoured to rely on realistic configurations. In a real environment, the diffraction and reflection properties of the room and the head lead to distortions of the sounds depending on the source and receiver positions. This phenomenon is named colouration. Speech-shaped noises, as a first approximation of speech sounds, were used to evaluate the effect of this colouration on stream segregation. The results showed that the slight monaural spectral differences induced by head and room colouration can induce segregation. Moreover, this segregation was enhanced by adding the binaural cues associated with a given position (ITD, ILD). In particular, a second study suggested that the monaural intensity variations across time at each ear were more relevant for stream segregation than the interaural level differences. The results also indicated that the percept of lateralization associated with a given ITD helped the segregation when the lateralization was salient enough. Besides, the ITD per se could also favour segregation. The natural ability to perceptually solve an auditory scene is relevant for speech intelligibility. The main idea was to replicate the first experiments with speech items instead of frozen noises. A characteristic of running speech is a high degree of acoustical variability used to convey information. Thus, as a first step, we investigated the robustness of stream segregation based on a frequency difference to variability on the same acoustical cue (i.e., frequency). The second step was to evaluate the fundamental frequency difference that enables speech items to be separated. Indeed, given the limited effects measured in the first two experiments, it was assumed that spatial cues might be relevant for stream segregation only in interaction with another, "stronger" cue such as an F0 difference. The results of these preliminary experiments showed, first, that the introduction of a large spectral variability within pure-tone streams can lead to a complicated percept, presumably consisting of multiple streams. Second, the results suggested that a fundamental frequency difference of between 3 and 5 semitones enables speech items to be separated. These experiments provided results that will be used to design the next experiment, investigating how an ambiguous percept could be biased toward segregation by introducing spatial cues.
36

Devergie, Aymeric. "Interactions audiovisuelles pour l'analyse de scènes auditives." PhD thesis, Université Claude Bernard - Lyon I, 2010. http://tel.archives-ouvertes.fr/tel-00830927.

Abstract:
Perceiving speech in noise is a complex operation for our perceptual system. To analyse such an auditory scene, we deploy auditory segregation mechanisms. We can also read lips to improve our comprehension of speech. The initial hypothesis presented in this thesis is that this visual benefit could rest partly on interactions between visual information and auditory segregation mechanisms. The studies carried out show that when audiovisual coherence is strong, early segregation mechanisms can be reinforced. Late segregation mechanisms, in turn, have been shown to involve attentional processes. These attentional processes could therefore be reinforced by the presentation of a perceptually bound visual cue. It appears that such binding between a stream of vowels and an elementary visual cue is possible, but weaker than when the visual cue carries phonetic content. In conclusion, the results presented in this work suggest that auditory segregation mechanisms can be influenced by a visual cue, provided that the audiovisual coherence is strong, as is the case for speech.
37

Huet, Moïra-Phoebé. "Voice mixology at a cocktail party : Combining behavioural and neural tracking for speech segregation." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI070.

Abstract:
It is not always easy to follow a conversation in a noisy environment. In order to discriminate two speakers, we have to mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background. In this dissertation, the processes underlying speech segregation are explored through behavioural and neurophysiological experiments. In a preliminary phase, the development of an intelligibility task -- the Long-SWoRD test -- is introduced. This protocol allows participants to benefit from cognitive resources, such as linguistic knowledge, to separate two talkers in a realistic listening environment. The similarity between the two speakers, and thus by extension the difficulty of the task, was controlled by manipulating the acoustic parameters of the target and masker voices. In a second phase, the performance of the participants on this task is evaluated through three behavioural and neurophysiological studies (EEG). Behavioural results are consistent with the literature and show that the distance between voices, spatialisation cues, and semantic information influence participants' performance. Neurophysiological results, analysed with temporal response functions (TRF), indicate that the neural representations of the two speakers differ according to the difficulty of listening conditions. In addition, these representations are constructed more quickly when the voices are easily distinguishable. It is often presumed in the literature that participants' attention remains constantly on the same voice. The experimental protocol presented in this work provides the opportunity to retrospectively infer when participants were listening to each voice. Therefore, in a third stage, a combined analysis of this attentional information and EEG signals is presented. Results show that information about attentional focus can be used to improve the neural representation of the attended voice in situations where the voices are similar
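The temporal response function (TRF) analyses mentioned above amount to a regularized linear regression from lagged copies of the speech envelope to the EEG signal. A minimal sketch, with a typical (assumed) lag range and ridge parameter rather than the thesis's exact settings:

```python
# Ridge-regression TRF estimation: regress the neural response onto lagged
# copies of the stimulus envelope; the weight vector is the TRF.
import numpy as np

def estimate_trf(stimulus, response, fs, t_min=0.0, t_max=0.4, lam=1e2):
    lags = np.arange(int(t_min * fs), int(t_max * fs))
    X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
    X[: lags.max()] = 0                       # discard wrap-around samples
    XtX = X.T @ X + lam * np.eye(len(lags))   # ridge-regularized normal equations
    w = np.linalg.solve(XtX, X.T @ response)
    return lags / fs, w                       # TRF as a function of lag (s)

fs = 128
env = np.abs(np.random.randn(fs * 60))        # stand-in for a speech envelope
true_trf = np.exp(-np.arange(20) / 5.0)       # synthetic neural impulse response
eeg = np.convolve(env, true_trf, mode='full')[: len(env)] + np.random.randn(len(env))
t, w = estimate_trf(env, eeg, fs)
print("TRF peak at %.0f ms" % (1000 * t[w.argmax()]))
```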
APA, Harvard, Vancouver, ISO, and other styles
38

Dufour, Jean-Yves. "Contribution algorithmique a la conception d'un systeme integre d'analyse temps-reel de scenes dynamiques : reconnaissance d'objets et analyse de mouvement dans une sequence d'images." Caen, 1988. http://www.theses.fr/1988CAEN2034.

Full text
Abstract:
Two problems in scene analysis were studied: the automatic recognition of objects and the temporal analysis of the images supplied by the sensor. For object recognition, attention focused on the 'decision' aspect of the problem. For the temporal-analysis part, the work was devoted to the design of an image registration module.
APA, Harvard, Vancouver, ISO, and other styles
39

Mouterde, Solveig. "Long-range discrimination of individual vocal signatures by a songbird : from propagation constraints to neural substrate." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4012/document.

Full text
Abstract:
In communication systems, one of the biggest challenges is that the information encoded by the emitter is always modified before reaching the receiver, who must process this altered information in order to recover the intended message. In acoustic communication particularly, the transmission of sound through the environment is a major source of signal degradation, caused by attenuation, absorption and reflections, all of which decrease the signal relative to the background noise. How animals exchange information in spite of these constraining conditions has been the subject of many studies focused on either the emitter or the receiver; a more integrated approach to auditory scene analysis has seldom been taken, yet it is needed to address the complexity of this process. The goal of my research was to use a transversal approach to study how birds adapt to the constraints of long-distance communication, investigating information coding at the emitter's level, the propagation-induced degradation of the acoustic signal, and the discrimination of this degraded information by the receiver at both the behavioral and neural levels. Taking into account the everyday issues faced by animals in their natural environment, and using stimuli and paradigms that reflect the behavioral relevance of these challenges, has been the cornerstone of my approach. Focusing on the information about individual identity in the distance calls of zebra finches (Taeniopygia guttata), I investigated how the individual vocal signature is encoded, degraded, and finally discriminated, from the emitter to the receiver. This study shows that the individual signature of zebra finches is very resistant to propagation-induced degradation, and that the most individualized acoustic parameters vary with distance. Testing female birds in operant conditioning experiments, I showed that they are experts at discriminating between the degraded vocal signatures of two males, and that they improve substantially when they can train over increasing distances. Finally, I showed that this impressive discrimination ability also exists at the neural level: we found a population of neurons in the avian auditory forebrain that discriminates individual voices at various degrees of propagation-induced degradation, without prior familiarization or training. The finding of such high-level auditory processing, in the primary auditory cortex, opens a new range of investigations at the interface of neural processing and behavior.
APA, Harvard, Vancouver, ISO, and other styles
40

Camonin, Martine. "Mephisto : un outil de validation de modèles tridimensionnels." Nancy 1, 1987. http://www.theses.fr/1987NAN10149.

Full text
Abstract:
The system presented here was developed within the framework of a system for interpreting three-dimensional scenes (Trident). The chosen model describes families of generic objects constructed as unions of primitives. The task of the Mephisto system is to decide on the coherence of a model supplied by the user before it is used by Trident. In the chosen representation, a model can be viewed as an AND/OR graph with constraints. A path-search strategy over the AND/OR graph minimizes the average construction cost, based on an estimate of the expected success of that construction.
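The AND/OR-graph formulation lends itself to a simple cost evaluation. The toy function below is a generic illustration of that idea, not a reconstruction of Mephisto: AND nodes sum the costs of their children (a union of primitives), while OR nodes pick the cheapest alternative.

def min_cost(node):
    # node is ('leaf', cost), ('and', children) or ('or', children)
    kind, payload = node
    if kind == 'leaf':
        return payload
    costs = [min_cost(child) for child in payload]
    return sum(costs) if kind == 'and' else min(costs)

# example: an object built either from one primitive (cost 5) or as the
# union of two primitives (costs 2 and 4, i.e. 6 in total)
model = ('or', [('leaf', 5), ('and', [('leaf', 2), ('leaf', 4)])])
print(min_cost(model))  # prints 5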
APA, Harvard, Vancouver, ISO, and other styles
41

Salamon, Justin J. "Melody extraction from polyphonic music signals." Doctoral thesis, Universitat Pompeu Fabra, 2013. http://hdl.handle.net/10803/123777.

Full text
Abstract:
Music was the first mass-market industry to be completely restructured by digital technology, and today we can access thousands of tracks stored locally on our smartphones and millions of tracks through cloud-based music services. Given the vast quantity of music at our fingertips, we now require novel ways of describing, indexing, searching and interacting with musical content. This thesis focuses on a technology that opens the door to a wide range of such applications: automatically estimating the pitch sequence of the melody directly from the audio signal of a polyphonic music recording, also referred to as melody extraction. Whilst identifying the pitch of the melody is something human listeners do quite well, doing it automatically is highly challenging, since it requires combining knowledge of signal processing, acoustics, machine learning and auditory perception. We present a novel method for melody extraction based on the tracking and characterisation of the pitch contours that form the melodic line of a piece, and show how different contour characteristics can be exploited in combination with auditory streaming cues to identify the melody among all the pitch content of a recording, using both heuristic and model-based approaches. The performance of the method is assessed in an international evaluation campaign, where it obtains state-of-the-art results: it achieves the highest mean overall accuracy of any algorithm that has participated in the campaign to date. We demonstrate the applicability of the method for both research and end-user applications by developing systems that exploit the extracted melody pitch sequence for similarity-based music retrieval (version identification and query-by-humming), genre classification, automatic transcription and computational music analysis. The thesis also provides a comprehensive comparative review of the current state of the art in melody extraction and a first-of-its-kind analysis of melody extraction evaluation methodology.
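As a rough illustration of the salience-based front end of such systems, the sketch below computes a harmonic-summation salience over an STFT and picks the strongest f0 candidate per frame. It deliberately omits the contour tracking, voicing detection and contour characterisation that are the actual contributions of the thesis; all parameter values are illustrative.

import numpy as np
from scipy.signal import stft

def harmonic_salience(mag, freqs, f0s, n_harm=5, decay=0.8):
    # sum spectral magnitude at the harmonics of each candidate f0
    S = np.zeros((len(f0s), mag.shape[1]))
    for i, f0 in enumerate(f0s):
        for h in range(1, n_harm + 1):
            if h * f0 > freqs[-1]:   # harmonic above Nyquist: stop
                break
            k = np.argmin(np.abs(freqs - h * f0))  # nearest STFT bin
            S[i] += decay ** (h - 1) * mag[k]
    return S

def naive_melody(x, fs):
    freqs, times, Z = stft(x, fs, nperseg=2048)
    f0s = np.geomspace(55.0, 1760.0, 240)  # candidate pitches, roughly A1-A6
    S = harmonic_salience(np.abs(Z), freqs, f0s)
    return times, f0s[S.argmax(axis=0)]    # strongest candidate per frame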
APA, Harvard, Vancouver, ISO, and other styles
42

Ellis, Daniel P. W. "Prediction-driven computational auditory scene analysis." Thesis, 1996. https://doi.org/10.7916/D84J0N13.

Full text
Abstract:
The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener--the 'auditory scene analysis' of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as 'computational auditory scene analysis'. The dominant approach to this problem has been as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This 'data-driven' approach has some specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot 'infer' the presence of a sound for which direct evidence is hidden by other components. The 'prediction-driven' approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment. In this way, predicted sound events will form part of the scene interpretation as long as they are consistent with the input sound, regardless of whether direct evidence is found. A blackboard-based implementation of this approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft. The system is assessed through experiments that firstly investigate subjects' perception of distinct events in ambient sound examples, and secondly collect quality judgments for sound events resynthesized by the system. Although rated as far from perfect, there was good agreement between the events detected by the model and by the listeners. In addition, the experimental procedure does not depend on special aspects of the algorithm (other than the generation of resyntheses), and is applicable to the assessment and comparison of other models of human auditory organization.
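To make the contrast with the data-driven pipeline concrete, here is a toy reconciliation loop in Python: an internal 'noise cloud' prediction explains each incoming spectrogram frame where it can, only unexplained excess energy spawns a new event hypothesis, and the prediction persists through frames where it may merely be masked. This is a deliberate caricature of the prediction-driven idea, not the blackboard system described in the thesis; thresholds and update rules are invented for illustration.

import numpy as np

def reconcile(frames, new_event_db=6.0, leak=0.95):
    # frames: (T, F) log-magnitude spectrogram, one row per time frame
    prediction = frames[0].copy()  # internal model: a single smooth noise cloud
    events = []
    for t, observed in enumerate(frames):
        excess = observed - prediction
        if excess.max() > new_event_db:
            # energy the model cannot explain: hypothesize a new sound event
            events.append((t, int(excess.argmax())))
            prediction = np.maximum(prediction, observed)  # absorb it
        else:
            # where observation dips below prediction, the predicted sound may
            # simply be masked, so the prediction decays instead of vanishing
            prediction = np.maximum(leak * prediction,
                                    np.minimum(observed, prediction))
    return events  # (frame, frequency-bin) of hypothesized event onsets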
APA, Harvard, Vancouver, ISO, and other styles
43

Yang, Cheng-Jia, and 楊政家. "Computational Auditory Scene Analysis for Speech Segregation." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/19929826462460307835.

Full text
Abstract:
Master's thesis
National Kaohsiung First University of Science and Technology
Institute of Computer and Communication Engineering
Academic year 102 (2013-14)
Speech separation is a difficult task. This study performs separation using computational auditory scene analysis (CASA), whose core computations simulate human hearing: a gammatone filterbank models the cochlea, and the Meddis inner hair cell model converts the filtered waveforms into simulated neural (electrical) signals. The input signal is thereby decomposed into time-frequency units, and a classifier labels each unit as noise or speech; the speech units are then collected to resynthesize the utterance. Classification uses a support vector machine with Mel-frequency cepstral coefficients and pitch as features. The result of classification is a binary mask, which is further processed with image operations: noise removal, hole filling, and morphological opening and closing. Experiments comparing noise filtering and speech distortion with and without these operations show that the image processing improves the binary mask.
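The image-style post-processing of the binary mask can be sketched with standard tools from scipy.ndimage, assuming the mask is a boolean (frequency x time) array; the structuring-element and minimum component sizes below are illustrative choices, not values from the thesis.

import numpy as np
from scipy import ndimage

def clean_mask(mask, min_size=20):
    # mask: boolean (frequency x time) binary mask from the classifier
    m = ndimage.binary_fill_holes(mask)             # fill holes inside regions
    m = ndimage.binary_opening(m, np.ones((2, 2)))  # remove isolated speckles
    m = ndimage.binary_closing(m, np.ones((2, 2)))  # bridge small gaps
    labels, n = ndimage.label(m)                    # connected components
    for i in range(1, n + 1):                       # drop tiny components
        if (labels == i).sum() < min_size:
            m[labels == i] = False
    return m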
APA, Harvard, Vancouver, ISO, and other styles
44

Chou, Kenny. "A biologically inspired approach to the cocktail party problem." Thesis, 2020. https://hdl.handle.net/2144/41043.

Full text
Abstract:
At a cocktail party, one can choose to scan the room for conversations of interest, attend to a specific conversation partner, switch between conversation partners, or not attend to anything at all. The ability of the normal-functioning auditory system to listen flexibly in complex acoustic scenes plays a central role in solving the cocktail party problem (CPP). In contrast, certain demographics (e.g., individuals with hearing impairment or older adults) are unable to solve the CPP, leading to psychological ailments and reduced quality of life. Since the normal auditory system still outperforms machines in solving the CPP, an effective solution may be found by mimicking the normal-functioning auditory system. Spatial hearing likely plays an important role in CPP processing in the auditory system. This thesis details the development of a biologically based approach to the CPP that models specific neural mechanisms underlying spatial tuning in the auditory cortex. First, we modeled bottom-up, stimulus-driven mechanisms using a multi-layer network model of the auditory system. To convert spike trains from the model output into audible waveforms, we designed a novel reconstruction method based on the estimation of time-frequency masks. We showed that our reconstruction method produced sounds with significantly higher intelligibility and quality than previous reconstruction methods. We also evaluated the algorithm's performance in a psychoacoustic study and found that it provided the same benefit to normal-hearing listeners as a current state-of-the-art acoustic beamforming algorithm. Finally, we modeled top-down, attention-driven mechanisms that allow the network to operate flexibly in different regimes, e.g., monitoring the acoustic scene, attending to a specific target, and switching between attended targets. The model explains previous experimental observations and proposes candidate neural mechanisms underlying flexible listening in cocktail-party scenarios. The strategies proposed here would benefit hearing-assistive devices for CPP processing (e.g., hearing aids), whose users would benefit from switching between various modes of listening in different social situations.
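The final masking-and-resynthesis step of such a pipeline can be sketched as follows, assuming the time-frequency mask has already been estimated (in the thesis, from the spike trains of the network model); this is a generic STFT-domain sketch, not the thesis's reconstruction method itself.

import numpy as np
from scipy.signal import stft, istft

def mask_and_resynthesize(mixture, mask, fs, nperseg=512):
    # mask: (freq x time) array of values in [0, 1], assumed already estimated
    f, t, Z = stft(mixture, fs, nperseg=nperseg)
    assert mask.shape == Z.shape, "mask must match the STFT grid"
    _, y = istft(Z * mask, fs, nperseg=nperseg)  # masked STFT back to waveform
    return y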
APA, Harvard, Vancouver, ISO, and other styles
45

Armstrong, Jonathan M. "Machine listening, musicological analysis and the creative process : the production of song-based music influenced by augmented listening techniques." Thesis, 2020. http://hdl.handle.net/1959.7/uws:68395.

Full text
Abstract:
The creative process of making recordings in the popular music sphere is impossible to disconnect from the concept of influence. Whether practitioners are influenced consciously or subconsciously, and push toward, or away from their influences, they are shaped by the music they hear. This research-led practice project augments the influencing factors in the creation of an album of song-based music by foregrounding the listening process. This is approached by conducting an in-depth analysis of a set of tracks using a combined methodology integrating traditional popular music analysis techniques, with music information retrieval (MIR) tools. My methodology explores the novel applicability of these computational tools in a musicological context, with one goal being to show the value of machine listening in popular musicological research and the processes of composition and production. The emerging field of Digital Musicology takes advantage of big data and statistical analysis to allow for large scale observation and comparison of datasets in a way that would be unrealistic for one person to attempt without the aid of machine listening. Setting aside the intention and reception components of musicology, the goal of musical output suggests a feature-based approach is well suited to the task of investigating these methods. Utilising the musical ideas generated through the combined analysis of tracks compiled from the Billboard Alternative, year-end charts of 2011-2015, the songs written and recordings produced for the album Stay Still | Please Hear are a result of allowing a conscious subversion of my usual creative process through the expansion of my field of musical influence. This discursive component shows the development of my combined analysis methodology, highlights the points where creative influence occurred in the arrangement and production of Stay Still | Please Hear, emphasises the value of MIR tools in expanding the scope of musicological analysis, and demonstrates a unique approach to the development of artistic practice from the perspective of a creative practitioner.
APA, Harvard, Vancouver, ISO, and other styles
46

Cantu, Marcos Antonio. "Sound source segregation of multiple concurrent talkers via Short-Time Target Cancellation." Thesis, 2018. https://hdl.handle.net/2144/32082.

Full text
Abstract:
The Short-Time Target Cancellation (STTC) algorithm, developed as part of this dissertation research, is a “Cocktail Party Problem” processor that can boost speech intelligibility for a target talker from a specified “look” direction, while suppressing the intelligibility of competing talkers. The algorithm holds promise for both automatic speech recognition and assistive listening device applications. The STTC algorithm operates on a frame-by-frame basis, leverages the computational efficiency of the Fast Fourier Transform (FFT), and is designed to run in real time. Notably, performance in objective measures of speech intelligibility and sound source segregation is comparable to that of the Ideal Binary Mask (IBM) and Ideal Ratio Mask (IRM). Because the STTC algorithm computes a time-frequency mask that can be applied independently to both the left and right signals, binaural cues for spatial hearing, including Interaural Time Differences (ITDs), Interaural Level Differences (ILDs) and spectral cues, can be preserved in potential hearing aid applications. A minimalist design for a proposed STTC Assistive Listening Device (ALD), consisting of six microphones embedded in the frame of a pair of eyeglasses, is presented and evaluated using virtual room acoustics and both objective and behavioral measures. The results suggest that the proposed STTC ALD can provide a significant speech intelligibility benefit in complex auditory scenes comprised of multiple spatially separated talkers.
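The core cancellation idea can be sketched as: steer the two channels toward the look direction, subtract to cancel the target, and favor time-frequency bins where little energy survives cancellation. The single-delay sketch below is a loose illustration under far-field assumptions with a known interaural delay for the look direction; it is not the published STTC algorithm, and the constants are arbitrary.

import numpy as np
from scipy.signal import stft

def sttc_like_mask(left, right, fs, look_delay=0.0, nperseg=256, beta=2.0):
    # STFTs of the left and right microphone signals
    f, t, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    # steer the right channel toward the look direction, then cancel the target
    steer = np.exp(-2j * np.pi * f * look_delay)[:, None]
    residual = L - steer * R            # target-cancelled: mostly interferers
    mix = 0.5 * (np.abs(L) + np.abs(R))
    # bins where little energy survives cancellation are target-dominated
    ratio = np.abs(residual) / (mix + 1e-12)
    return f, t, np.clip(1.0 - beta * ratio, 0.0, 1.0)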
APA, Harvard, Vancouver, ISO, and other styles
47

Gillingham, Susan. "Auditory Search: The Deployment of Attention within a Complex Auditory Scene." Thesis, 2012. http://hdl.handle.net/1807/33220.

Full text
Abstract:
Current theories of auditory attention are largely based on studies that either present a single auditory stimulus or require the identification and labeling of stimuli presented sequentially. Whether these theories apply in more complex, ecologically valid environments where multiple sound sources are simultaneously active is still unknown. This study examined the pattern of neuromagnetic responses elicited when participants performed a search of an auditory language-based 'scene' for a stimulus matching an imperative target held in working memory. Analysis of the source waveforms revealed left-lateralized patterns of activity that distinguished target-present from target-absent trials. Similar source waveform amplitudes were found whether the target was presented in the left or right hemispace. The results suggest that auditory search for speech sounds engages a left-lateralized process in the superior temporal gyrus.
APA, Harvard, Vancouver, ISO, and other styles
48

Νταλαμπίρας, Σταύρος. "Ψηφιακή επεξεργασία και αυτόματη κατηγοριοποίηση περιβαλλοντικών ήχων." Thesis, 2010. http://nemertes.lis.upatras.gr/jspui/handle/10889/3705.

Full text
Abstract:
The dissertation is organized as follows. Chapter 1 presents a general overview of the automatic recognition of generalized sound events, discusses applications of audio signal recognition technology, gives a brief description of the state of the art, and states the contribution of the thesis. Chapter 2 introduces the reader to non-speech audio processing, covering current feature-extraction methodologies and pattern-recognition techniques. Chapter 3 analyses a novel sound recognition system designed for urban environmental sound events: a hierarchical probabilistic structure is constructed, along with a combined set of sound parameters, leading to high recognition accuracy. Chapter 4 has two parts: (a) the use of multiresolution analysis for the speech/music discrimination problem is explored, and (b) the knowledge acquired is used to build a system that combines features from different domains for efficient analysis of online radio signals. Chapter 5 reports exhaustive experiments on a new application of sound recognition technology, space monitoring based on the acoustic modality: a system is proposed that detects atypical situations in a metro station environment to assist authorized personnel in monitoring the space. Chapter 6 proposes an adaptive framework for acoustic surveillance of potentially hazardous situations in environments with different acoustic properties; the system achieves high performance and can adapt to heterogeneous environments in an unsupervised way. Chapter 7 investigates the use of novelty detection for acoustic monitoring of indoor and outdoor spaces; a database of real-world data was recorded and three probabilistic techniques are proposed. Chapter 8 presents a novel methodology for generalized sound recognition that achieves high recognition accuracy, exploiting temporal feature integration and multi-domain descriptors in combination with a state-of-the-art generative classification technique.
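One common probabilistic formulation of such novelty detection fits a background model to normal ambient sound and flags low-likelihood frames. The sketch below uses a Gaussian mixture model over precomputed features (e.g., MFCCs) via scikit-learn; it is a generic illustration, not necessarily the exact techniques proposed in the thesis, and the percentile threshold is an arbitrary choice.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_background_model(features, n_components=8):
    # features: (n_frames, n_dims) array, e.g. MFCCs of normal ambient sound
    gmm = GaussianMixture(n_components, covariance_type='diag', random_state=0)
    gmm.fit(features)
    # threshold at the 1st percentile of training log-likelihoods
    threshold = np.percentile(gmm.score_samples(features), 1)
    return gmm, threshold

def novel_frames(gmm, threshold, features):
    # frames poorly explained by the background model are flagged as novel
    return gmm.score_samples(features) < threshold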
APA, Harvard, Vancouver, ISO, and other styles
49

"Natural Correlations of Spectral Envelope and their Contribution to Auditory Scene Analysis." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.46351.

Full text
Abstract:
Auditory scene analysis (ASA) is the process through which listeners parse and organize their acoustic environment into relevant auditory objects. ASA functions by exploiting natural regularities in the structure of auditory information. The current study investigates spectral envelope and its contribution to the perception of changes in pitch and loudness. Experiment 1 constructs a perceptual continuum of twelve f0- and intensity-matched vowel phonemes (i.e. a pure timbre manipulation) and reveals spectral envelope as a primary organizational dimension. The extremes of this dimension are i (as in “bee”) and Ʌ (“bun”). Experiment 2 measures the strength of the relationship between produced f0 and the previously observed phonetic-pitch continuum at three different levels of phonemic constraint. Scat performances and, to a lesser extent, recorded interviews were found to exhibit changes in accordance with the natural regularity; specifically, f0 changes were correlated with the phoneme pitch-height continuum. The more constrained case of lyrical singing did not exhibit the natural regularity. Experiment 3 investigates participant ratings of pitch and loudness as stimuli vary in f0, intensity, and the phonetic-pitch continuum. Psychophysical functions derived from the results reveal that moving from i to Ʌ is equivalent to a .38 semitone decrease in f0 and a .75 dB decrease in intensity. Experiment 4 examines the potentially functional aspect of the pitch, loudness, and spectral envelope relationship. Detection thresholds of stimuli in which all three dimensions change congruently (f0 increase, intensity increase, Ʌ to i) or incongruently (no f0 change, intensity increase, i to Ʌ) are compared using an objective version of the method of limits. Congruent changes did not provide a detection benefit over incongruent changes; however, when the contribution of phoneme change was removed, congruent changes did offer a slight detection benefit, as in previous research. While this relationship does not offer a detection benefit at threshold, there is a natural regularity for humans to produce phonemes at higher f0s according to their relative position on the pitch height continuum. Likewise, humans have a bias to detect pitch and loudness changes in phoneme sweeps in accordance with the natural regularity.
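For reference, the semitone-to-ratio conversion behind such equivalence statements is f2 = f1 * 2^(s/12), so a 0.38-semitone decrease multiplies f0 by about 0.978. A one-line check, purely illustrative:

ratio = 2 ** (-0.38 / 12)  # frequency ratio for a 0.38-semitone decrease
print(round(ratio, 3))     # prints 0.978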
APA, Harvard, Vancouver, ISO, and other styles
50

Wu, Wei-Che, and 吳瑋哲. "Enhancement and segregation specific person's speech based on human auditory scene analysis." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/39372178856500478948.

Full text
Abstract:
Master's thesis
National Yang-Ming University
Institute of Biomedical Engineering
Academic year 95 (2006-07)
There are several strategies for separating speech from noise, but effective methods for the problem of multiple speech sources remain elusive. The well-studied cocktail party problem is to analyze and process signals containing multiple speech sources. Here, we focus on one specific source: by using speaker recognition and speech recognition techniques to extract speakers' speech characteristics, coupled with human auditory scene analysis, the characteristics of individual speakers can be determined more precisely and distinguished from one another. These observed differences can then be used to segregate and enhance the signal of a specific speaker, so that users such as children can enhance the received signal from particular sources, such as their parents or a teacher.
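A minimal sketch of the speaker-selection idea, assuming MFCC features have already been extracted: train one Gaussian mixture model per speaker and keep the time frames where the target speaker's model wins the likelihood comparison. Names and parameters are illustrative; the thesis's actual combination of speaker recognition and auditory scene analysis is more elaborate.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(mfccs_by_speaker, n_components=16):
    # mfccs_by_speaker: dict mapping speaker name -> (n_frames, n_dims) MFCCs
    models = {}
    for name, feats in mfccs_by_speaker.items():
        models[name] = GaussianMixture(n_components, covariance_type='diag',
                                       random_state=0).fit(feats)
    return models

def target_frames(models, target, mfccs):
    # keep the frames where the target speaker's model wins the likelihood vote
    names = list(models)
    scores = np.stack([models[n].score_samples(mfccs) for n in names])
    return np.array(names)[scores.argmax(axis=0)] == target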
APA, Harvard, Vancouver, ISO, and other styles