Theses on the topic "Sound recognition"
Consult the top 50 theses for your research on the topic "Sound recognition".
Kawaguchi, Nobuo and Yuya Negishi. "Instant Learning Sound Sensor: Flexible Environmental Sound Recognition System". IEEE, 2007. http://hdl.handle.net/2237/15456.
Chapman, David P. "Playing with sounds : a spatial solution for computer sound synthesis". Thesis, University of Bath, 1996. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307047.
Stäger, Mathias. "Low-power sound-based user activity recognition /". Zürich : ETH, 2006. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=16719.
Medhat, Fady. "Masked conditional neural networks for sound recognition". Thesis, University of York, 2018. http://etheses.whiterose.ac.uk/21594/.
Rodeia, José Pedro dos Santos. "Analysis and recognition of similar environmental sounds". Master's thesis, FCT - UNL, 2009. http://hdl.handle.net/10362/2305.
Humans have the ability to identify sound sources just by hearing a sound. Adapting the same problem to computers is called (automatic) sound recognition. Several sound recognizers have been developed over the years. The accuracy of these recognizers is influenced by the features they use and by the classification method implemented. While there are many approaches to sound feature extraction and sound classification, most have been used to classify sounds with very different characteristics. Here, we implemented a recognizer for similar sounds, i.e., sounds with very similar properties, which makes the recognition process harder. We therefore use both temporal and spectral properties of the sound. These properties are extracted using the Intrinsic Structures Analysis (ISA) method, which relies on Independent Component Analysis and Principal Component Analysis. The classification method is based on the k-Nearest Neighbor algorithm. We show that the features extracted in this way are powerful for sound recognition. We tested our recognizer with several sets of features the ISA method retrieves and achieved strong results. Finally, we conducted a user study comparing human performance at distinguishing similar sounds against our recognizer. The study allowed us to conclude that the sounds are in fact very similar and difficult to distinguish, and that our recognizer identifies them far better than humans do.
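The k-Nearest Neighbor classification step this abstract mentions can be sketched in miniature. The following is a generic k-NN majority vote over synthetic feature vectors; the clusters merely stand in for the thesis's ISA features, which are not reproduced here:

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors nearest to x (Euclidean)."""
    dists = np.linalg.norm(train_X - x, axis=1)       # distance to every training vector
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # most frequent label wins

# Two synthetic clusters of 4-D "sound feature" vectors (placeholders, not ISA output)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 4)), rng.normal(3.0, 0.5, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
print(knn_classify(X, y, np.full(4, 3.1)))  # → 1 (falls in the second cluster)
```

With well-separated features, even this minimal classifier is reliable, which is why feature quality (here, the ISA features) dominates recognizer accuracy.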
Martin, Keith Dana. "Sound-source recognition : a theory and computational model". Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/9468.
Includes bibliographical references (p. 159-172).
The ability of a normal human listener to recognize objects in the environment from only the sounds they produce is extraordinarily robust with regard to characteristics of the acoustic environment and of other competing sound sources. In contrast, computer systems designed to recognize sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Robust listening requires extensive contextual knowledge, but the potential contribution of sound-source recognition to the process of auditory scene analysis has largely been neglected by researchers building computational models of the scene analysis process. This thesis proposes a theory of sound-source recognition, casting recognition as a process of gathering information to enable the listener to make inferences about objects in the environment or to predict their behavior. In order to explore the process, attention is restricted to isolated sounds produced by a small class of sound sources, the non-percussive orchestral musical instruments. Previous research on the perception and production of orchestral instrument sounds is reviewed from a vantage point based on the excitation and resonance structure of the sound-production process, revealing a set of perceptually salient acoustic features. A computer model of the recognition process is developed that is capable of "listening" to a recording of a musical instrument and classifying the instrument as one of 25 possibilities. The model is based on current models of signal processing in the human auditory system. It explicitly extracts salient acoustic features and uses a novel improvisational taxonomic architecture (based on simple statistical pattern-recognition techniques) to classify the sound source. The performance of the model is compared directly to that of skilled human listeners, using both isolated musical tones and excerpts from compact disc recordings as test stimuli. 
The computer model's performance is robust with regard to the variations of reverberation and ambient noise (although not with regard to competing sound sources) in commercial compact disc recordings, and the system performs better than three out of fourteen skilled human listeners on a forced-choice classification task. This work has implications for research in musical timbre, automatic media annotation, human talker identification, and computational auditory scene analysis.
by Keith Dana Martin.
Ph.D.
Hunter, Jane Louise. "Integrated sound synchronisation for computer animation". Thesis, University of Cambridge, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.239569.
Soltani-Farani, A. A. "Sound visualisation as an aid for the deaf : a new approach". Thesis, University of Surrey, 1998. http://epubs.surrey.ac.uk/844112/.
Gillespie, Bradford W. "Strategies for improving audible quality and speech recognition accuracy of reverberant speech /". Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/5930.
Corbet, Remy. "A SOUND FOR RECOGNITION: BLUES MUSIC AND THE AFRICAN AMERICAN COMMUNITY". OpenSIUC, 2011. https://opensiuc.lib.siu.edu/theses/730.
Deigård, Daniel. "The Effect of Acute Background Noise on Recognition Tasks". Thesis, Stockholms universitet, Psykologiska institutionen, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-74415.
Giannoulis, Dimitrios. "Recognition of sound sources and acoustic events in music and environmental audio". Thesis, Queen Mary, University of London, 2014. http://qmro.qmul.ac.uk/xmlui/handle/123456789/9130.
Van der Merwe, Hugo Jacobus. "Bird song recognition with Hidden Markov Models /". Thesis, Link to the online version, 2008. http://hdl.handle.net/10019/914.
Mainwaring, David and Jonathan Österberg. "Sound Pattern Recognition : Evaluation of Independent Component Analysis Algorithms for Separation of Voices". Thesis, KTH, Skolan för teknikvetenskap (SCI), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230746.
In this work we investigated how independent component analysis (ICA) algorithms can be used for the separation of voices when there is a varying number of voices and microphones placed at different positions in a room, better known as the "cocktail party problem". This is done by applying ICA algorithms to audio recordings in which several people talk over one another. We test the ICA algorithms Maximum Likelihood ICA (ML-ICA) and fastICA. Both algorithms give good results when there are at least as many microphones as speakers. The advantage of fastICA over ML-ICA is that its running time is much shorter. A surprising result from both algorithms is that they managed to separate out at least one of the voices when there were more speakers than microphones, which was not an expected outcome.
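The kind of ICA separation this thesis evaluates can be sketched with a toy two-microphone mixture. This is a generic symmetric FastICA with a tanh nonlinearity on invented signals, not the authors' implementation; the mixing matrix and the sine/square "voices" are illustrative assumptions:

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on an (n_signals x n_samples) mixture."""
    n, m = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten: rotate and scale so the mixture has identity covariance
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(d ** -0.5) @ E.T @ X
    W = np.random.default_rng(seed).normal(size=(n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(WZ)Z^T] - diag(E[g'(WZ)]) W
        W = (G @ Z.T) / m - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt  # symmetric decorrelation keeps the rows orthonormal
    return W @ Z    # estimated sources, recovered up to order and sign

# Two synthetic "voices" (sine and square wave) mixed by two "microphones"
t = np.linspace(0, 1, 4000)
S = np.vstack([np.sin(2 * np.pi * 7 * t), np.sign(np.sin(2 * np.pi * 11 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # unknown room mixing matrix
recovered = fastica(A @ S)
```

The determined case sketched here (as many microphones as speakers) is exactly the setting where the thesis reports that both ML-ICA and fastICA perform well.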
Kahl, Stefan. "Identifying Birds by Sound: Large-scale Acoustic Event Recognition for Avian Activity Monitoring". Universitätsverlag Chemnitz, 2019. https://monarch.qucosa.de/id/qucosa%3A36986.
Automated monitoring of bird vocal activity and species diversity can be a revolutionary tool for ornithologists, conservationists, and birdwatchers, helping with the long-term monitoring of critical environmental niches. Deep artificial neural networks have surpassed traditional classifiers in visual recognition and acoustic event classification. However, deep neural networks require expert knowledge to design, train, and test powerful models. With this limitation in mind, and considering the requirements of future applications, an extensive research platform for automated bird activity monitoring was developed: BirdNET. The resulting benchmark system delivers state-of-the-art results across various acoustic domains and has been used to build expert tools and public demonstrators that can help advance the democratization of scientific progress and future conservation efforts.
Guenebaut, Boris. "Automatic Subtitle Generation for Sound in Videos". Thesis, University West, Department of Economics and IT, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hv:diva-1784.
The last ten years have witnessed the emergence of all kinds of video content, and the appearance of websites dedicated to it has increased the importance the public gives to such material. At the same time, some individuals are deaf and cannot understand these videos when no text transcription is available. It is therefore necessary to find solutions that make these media artefacts accessible to most people. Several software packages offer utilities to create subtitles for videos, but all require extensive participation by the user, so a more automated approach is envisaged. This thesis report describes a way to generate standards-compliant subtitles using speech recognition. Three parts are distinguished. The first consists in separating the audio from the video and converting the audio to a suitable format if necessary. The second performs recognition of the speech contained in the audio. The final stage generates a subtitle file from the recognition results of the previous step. Directions of implementation have been proposed for the three distinct modules. The experimental results were not satisfactory enough, and adjustments will have to be made in further work. Decoding parallelization, use of well-trained models, and punctuation insertion are some of the improvements still to be done.
Cowling, Michael. "Non-Speech Environmental Sound Classification System for Autonomous Surveillance". Griffith University. School of Information Technology, 2004. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040428.152425.
Cowling, Michael. "Non-Speech Environmental Sound Classification System for Autonomous Surveillance". Thesis, Griffith University, 2004. http://hdl.handle.net/10072/365386.
Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Information Technology.
Taft, Daniel Adam. "Cochlear implant sound coding with across-frequency delays". Connect to thesis, 2009. http://repository.unimelb.edu.au/10187/5783.
Before incorporating cochlear delays into a cochlear implant processor, a set of suitable delays was determined with a psychoacoustic calibration to pitch perception, since normal cochlear delays are a function of frequency. The first experiment assessed the perception of pitch evoked by electrical stimuli from cochlear implant electrodes. Six cochlear implant users with acoustic hearing in their non-implanted ears were recruited for this, since they were able to compare electric stimuli to acoustic tones. Traveling wave delays were then computed for each subject using the frequencies matched to their electrodes. These were similar across subjects, ranging over 0-6 milliseconds along the electrode array.
The next experiment applied the calibrated delays to the ACE strategy filter outputs before maxima selection. The effects upon speech perception in noise were assessed with cochlear implant users, and a small but significant improvement was observed. A subsequent sensitivity analysis indicated that accurate calibration of the delays might not be necessary after all; instead, a range of across-frequency delays might be similarly beneficial.
A computational investigation was performed next, where a corpus of recorded speech was passed through the ACE cochlear implant sound processing strategy in order to determine how across-frequency delays altered the patterns of stimulation. A range of delay vectors were used in combination with a number of processing parameter sets and noise levels. The results showed that additional stimuli from broadband sounds (such as the glottal pulses of vowels) are selected when frequency bands are desynchronized with across-frequency delays. Background noise contains fewer dominant impulses than a single talker and so is not enhanced in this way.
In the following experiment, speech perception with an ensemble of across-frequency delays was assessed with eight cochlear implant users. Reverse cochlear delays (high frequency delays) were equivalent to conventional cochlear delays. Benefit was diminished for larger delays. Speech recognition scores were at baseline with random delay assignments. An information transmission analysis of speech in quiet indicated that the discrimination of voiced cues was most improved with across-frequency delays. For some subjects, this was seen as improved vowel discrimination based on formant locations and improved transmission of the place of articulation of consonants.
A final study indicated that benefits to speech perception with across-frequency delays are diminished when the number of maxima selected per frame is increased above 8-out-of-22 frequency bands.
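The processing idea tested in this thesis, delaying each frequency channel before "n-of-m" maxima selection, can be sketched roughly as below. The channel count, delay values, and frame structure are illustrative assumptions, not the clinical ACE strategy's parameters:

```python
import numpy as np

def delay_and_select(envelopes, delays, n_maxima=8):
    """Delay each channel's envelope by its own lag (in frames), then keep only
    the n_maxima largest channels per frame, as in an 'n-of-m' strategy."""
    n_ch, n_frames = envelopes.shape
    delayed = np.zeros_like(envelopes)
    for ch, d in enumerate(delays):
        delayed[ch, d:] = envelopes[ch, :n_frames - d]   # shift channel ch by d frames
    out = np.zeros_like(delayed)
    for frame in range(n_frames):
        top = np.argsort(delayed[:, frame])[-n_maxima:]  # the n largest channels
        out[top, frame] = delayed[top, frame]            # all others are dropped
    return out

env = np.abs(np.random.default_rng(0).normal(size=(22, 100)))  # 22-channel envelopes
delays = np.linspace(6, 0, 22).round().astype(int)  # longer lags for low (apical) channels
stim = delay_and_select(env, delays)
```

Desynchronizing the channels this way changes which maxima survive selection, which is how the thesis explains the extra stimuli captured from broadband events such as glottal pulses.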
Sturtivant, Christopher R. "Extraction and recognition of tonal sounds produced by small cetaceans and identification of individuals". Thesis, Loughborough University, 1997. https://dspace.lboro.ac.uk/2134/6761.
Malheiro, Frederico Alberto Santos de Carteado. "Automatic musical instrument recognition for multimedia indexing". Master's thesis, Faculdade de Ciências e Tecnologia, 2011. http://hdl.handle.net/10362/6124.
The subject of automatic indexing of multimedia has been the target of much discussion and study. This interest is due to the exponential growth of multimedia content and the consequent need for methods that automatically catalogue this data. Several projects and areas of study have emerged to address it. The most relevant of these are the MPEG-7 standard, which defines a standardized system for the representation and automatic extraction of information present in the content, and Music Information Retrieval (MIR), which gathers several paradigms and areas of study relating to music. The main approach to this indexing problem relies on analysing data to obtain and identify descriptors that can help define what we intend to recognize (for instance, musical instruments, voice, facial expressions, and so on); this then provides us with information we can use to index the data. This dissertation focuses on audio indexing in music, specifically the recognition of musical instruments from recorded musical notes. Moreover, the developed system and techniques are also tested for the recognition of ambient sounds (such as the sound of running water, cars driving by, and so on). Our approach uses non-negative matrix factorization to extract features from various types of sounds; these are then used to train a classification algorithm capable of identifying new sounds.
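The non-negative matrix factorization step mentioned in this abstract can be illustrated with the standard Lee-Seung multiplicative updates on a toy "spectrogram". This is a generic NMF sketch under invented data, not the dissertation's actual feature extractor:

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Lee-Seung multiplicative updates for V ≈ W @ H (Euclidean cost).
    Columns of W act as spectral templates, rows of H as their activations."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    eps = 1e-9  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update templates
    return W, H

# Toy "spectrogram": 4 frequency bins, 50 frames, built from 2 spectral templates
B = np.array([[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.9]])
A = np.random.default_rng(1).random((2, 50))
V = B @ A
W, H = nmf(V, rank=2)
```

Because the updates are multiplicative, W and H stay non-negative, which is what makes the learned templates interpretable as note or sound spectra.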
El-Feghaly, Edmond M. "The influence of sound spectrum on recognition of temporal pattern of cricket (Teleogryllus oceanicus) song /". Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=60656.
High frequency neurons were suspected to be behind the cessation of responsiveness to stimuli with altered temporal features. This hypothesis predicts that the effect on selectivity of increasing the intensity of the 5 kHz stimulus might be mimicked by adding a high frequency to the stimulus. My results contradict this hypothesis.
The response to a 30 kHz carrier demonstrates a dependency on the duration and pulse repetition rate of the stimulus.
Deily, Joshua Allen. "Mechanisms of call recognition in three sympatric species of Neoconocephalus (Orthoptera: Tettigoniidae) asymmetrical interactions and evolutionary implications /". Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4357.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file, viewed on February 26, 2007. Vita. Includes bibliographical references.
White, Teresa. "The effects of mnemonics on letter recognition and letter sound acquisition of at-risk kindergarten students". College Station, Tex.: Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1100.
Lam, Chi-kan. "Detection of air leaks using pattern recognition techniques and neurofuzzy networks /". Hong Kong : University of Hong Kong, 2000. http://sunzi.lib.hku.hk/hkuto/record.jsp?B21981826.
林智勤 and Chi-kan Lam. "Detection of air leaks using pattern recognition techniques and neurofuzzy networks". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B31222833.
Bajzík, Jakub. "Rozpoznání zvukových událostí pomocí hlubokého učení". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-401993.
Parks, Sherrie L. "The sound of music: The influence of evoked emotion on recognition memory for musical excerpts across the lifespan". OpenSIUC, 2013. https://opensiuc.lib.siu.edu/theses/1143.
Alsharhan, Iman. "Exploiting phonological constraints and automatic identification of speaker classes for Arabic speech recognition". Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/exploiting-phonologicalconstraints-and-automaticidentification-of-speakerclasses-for-arabic-speechrecognition(8d443cae-e9e4-4f40-8884-99e2a01df8e9).html.
Dhakal, Parashar. "Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms". University of Toledo / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278.
Hoffman, Jeffrey Dean. "Using Blind Source Separation and a Compact Microphone Array to Improve the Error Rate of Speech Recognition". PDXScholar, 2016. https://pdxscholar.library.pdx.edu/open_access_etds/3367.
Lareau, Jonathan. "Application of shifted delta cepstral features for GMM language identification /". Electronic version of thesis, 2006. https://ritdml.rit.edu/dspace/handle/1850/2686.
Tindale, Adam. "Classification of snare drum sounds using neural networks". Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81515.
Martí Guerola, Amparo. "Multichannel audio processing for speaker localization, separation and enhancement". Doctoral thesis, Universitat Politècnica de València, 2013. http://hdl.handle.net/10251/33101.
Martí Guerola, A. (2013). Multichannel audio processing for speaker localization, separation and enhancement [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33101
Dodenhoff, Danielle J. "AN ANALYSIS OF ACOUSTIC COMMUNICATION WITHIN THE SOCIAL SYSTEM OF DOWNY WOODPECKERS (PICOIDES PUBESCENS)". The Ohio State University, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=osu1032381559.
Kahl, Stefan [Verfasser], Maximilian [Akademischer Betreuer] Eibl, Maximilian [Gutachter] Eibl, Marc [Gutachter] Ritter and Holger [Akademischer Betreuer] Klinck. "Identifying Birds by Sound: Large-scale Acoustic Event Recognition for Avian Activity Monitoring / Stefan Kahl ; Gutachter: Maximilian Eibl, Marc Ritter ; Maximilian Eibl, Holger Klinck". Chemnitz : Universitätsverlag Chemnitz, 2020. http://d-nb.info/1219664502/34.
Choi, Hyung Keun. "Blind source separation of the audio signals in a real world". Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/14986.
Roman, Nicoleta. "Auditory-based algorithms for sound segregation in multisource and reverberant environments". Connect to resource, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1124370749.
Title from first page of PDF file. Document formatted into pages; contains xxii, 183 p.; also includes graphics. Includes bibliographical references (p. 171-183). Available online via OhioLINK's ETD Center.
Odehnal, Jiří. "Řízení a měření sportovních drilů hlasem/zvuky". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-399705.
Smith, Daniel. "An analysis of blind signal separation for real time application". Access electronically, 2006. http://www.library.uow.edu.au/adt-NWU/public/adt-NWU20070815.152400/index.html.
Unnikrishnan, Harikrishnan. "AUDIO SCENE SEGEMENTATION USING A MICROPHONE ARRAY AND AUDITORY FEATURES". UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/622.
Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition". Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.
ANFLO, FREDRIK. "M8 the Four-legged Robot". Thesis, KTH, Skolan för industriell teknik och management (ITM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279836.
In recent times, robots have become more and more common. They are everywhere: walking, running, swimming, flying, and many of them have much in common with the creatures living on this earth, largely so as to appeal to us more rather than come across as merely cold machines. Continuing along the path evolution has laid before us seems a wise decision in the effort to use our knowledge of science and engineering effectively, with the vision of improving our future. With the goal of simulating a four-legged animal and evaluating the possibilities of interacting with one's surroundings, a four-legged locomotion system was developed together with two types of sound and voice systems. A prototype was constructed to test the problems that arise in the real world and to assess which mode of interaction proves most advantageous. The results indicate that voice commands and voice recognition, rather than detection of sounds from the surroundings, are more practical and robust as a way of interacting with the environment.
Reis, Clovis Ferreira dos. "Sistema Modular para Detecção e Reconhecimento de Disparos de Armas de Fogo". Universidade Federal da Paraíba, 2015. http://tede.biblioteca.ufpb.br:8080/handle/tede/9244.
Urban violence has been increasing in almost every Brazilian state, and to face this threat, police authorities require new technological tools to support their decisions on how and when the few available resources should be employed to combat criminality. In this context, this work presents an embedded computational tool for detecting gunshots automatically. To provide the necessary background, a brief description of impulsive sounds, firearms, and gunshot characteristics is presented first. A modular system is then proposed to detect and recognize the impulsive sounds characteristic of gunshots. Since the system contains several modules, this work focuses on two of them: the module for detecting impulsive sounds and the module for distinguishing a gunshot from any other impulsive sound. For the impulse detection module, three well-known algorithms were analyzed under the same conditions: the fourth derivative of the Root Mean Square (RMS), the Conditional Median Filter (CMF), and the Variance Method (VM). The algorithms were tested on four performance measures: accuracy, precision, sensitivity, and specificity. To determine the most efficient algorithm for detecting impulsive sounds, a cadence test was performed with impulsive sounds without added noise and with constant or increasing noise. After this analysis, the parameters employed in the CMF and VM methods were tested over a wide range of configurations to check for possible optimization. Once the best method was determined, the classification module for recognizing gunshots was implemented. For this, two methods were compared: one based on the signal envelope over time and the other based on the most relevant frequencies obtained from the Fourier transform.
A comparison of the two methods showed that the envelope method achieved 54% accuracy in classifying impulsive sounds, while the frequency analysis achieved 72%.
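One of the detectors compared in this work, the Variance Method, can be sketched as a sliding-window variance test. The window size, threshold, and synthetic "gunshot" below are invented for illustration, not the thesis's actual parameterization:

```python
import numpy as np

def detect_impulses(signal, win=256, threshold=8.0):
    """Flag windows whose variance exceeds `threshold` times the median window
    variance: a simplified variance-method impulse detector."""
    n_win = len(signal) // win
    frames = signal[:n_win * win].reshape(n_win, win)  # non-overlapping windows
    var = frames.var(axis=1)                           # per-window energy spread
    return np.flatnonzero(var > threshold * np.median(var))

# Quiet background noise with one loud, short burst standing in for a gunshot
rng = np.random.default_rng(0)
audio = rng.normal(0.0, 0.01, 16000)
audio[8000:8200] += rng.normal(0.0, 1.0, 200)
print(detect_impulses(audio))  # window indices overlapping the burst
```

Using the median of the window variances as the baseline keeps the threshold robust even when a few impulsive windows inflate the mean, which matters for the constant-noise and increasing-noise cadence tests described above.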
Strowger, Megan E. "Interoceptive sounds and emotion recognition". Thesis, University of the Sciences in Philadelphia, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10294821.
Background: Perception of changes in physiological arousal is theorized to form the basis for which the brain labels emotional states. Interoception is a process by which individuals become aware of physiological sensations. Lowered emotional awareness has been found to be associated with lower interoceptive awareness. Alexithymia is a personality trait associated with lowered emotion recognition ability which affects 10-20% of the university student population in Western countries. Research suggests that being made aware of one's heartbeat may enhance emotional awareness. Objective(s): The present study attempted to enhance emotion recognition abilities directly via an experimental interoceptive manipulation in order to decrease levels of alexithymia. It had three aims: 1) to examine whether exposing individuals to the interoceptive sound of their own heartbeat could elicit changes in their emotion recognition abilities, 2) to examine whether higher emotion recognition abilities as a result of listening to one's own heartbeat differed by alexithymia group, and 3) to examine whether higher interoceptive awareness was associated with higher RME scores during the own-heartbeat sound condition. Methods: 36 participants were recruited from an introductory psychology class at the University of the Sciences in Philadelphia. Participants completed lab-based tests of emotion recognition followed by questionnaires assessing alexithymia and interoceptive abilities. During the lab-based test of emotion recognition, participants were subjected to an interoceptive manipulation by listening to three sounds (in random order): own heartbeat, another person's heartbeat, and footsteps. To test aim 1, a repeated-measures ANOVA examined differences in emotion recognition scores during the various sound conditions (i.e., no sound, own heartbeat, other heartbeat, footsteps).
For evaluating aim 2, a two-way 3 x 4 RM ANOVA tested for differences in RME scores by sound condition when individuals were alexithymic, possibly alexithymic, and not alexithymic. Aim 3 was examined using correlations between the attention to body and emotion awareness subscale scores separately with RME score for own heartbeat. Results: Contrary to predictions, RME performance did not vary according to body sound condition, F (3, 105) = .53, p = .67, η² = .02. A significant interaction was seen between alexithymia category and RME scores during the interoceptive sound conditions, F (6, 99) = 2.27, p = .04, η² = .12. However, post-hoc analyses did not reveal significant differences between specific alexithymia categories and RME scores. A significant positive relationship was seen between RME during own heartbeat and being able to pay attention to the body (r (36) = .34, p = .05, R² = .11). Discussion: Our results suggest that more attention was directed toward facial emotions when subjects listened to their own heartbeat, but this increase did not result in measurable changes in RME performance. Limitations: Although using a within-subjects design potentially increased statistical power, a between-subjects design with random assignment could have eliminated the effects of repeated measurement and condition order. Implications: The most novel of these findings was that individuals paid more attention to the emotional stimuli when hearing their own heartbeat. More research is needed to understand whether the interoceptive sound manipulation may aid in improving other cognitive functions or earlier steps in the emotion process. Future research using other measures of interoception and attention is necessary to confirm the result.
Sehili, Mohamed el Amine. "Reconnaissance des sons de l’environnement dans un contexte domotique". Thesis, Evry, Institut national des télécommunications, 2013. http://www.theses.fr/2013TELE0014/document.
Full text
In many countries around the world, the number of elderly people living alone has been increasing. In the last few years, a significant number of research projects on monitoring elderly people have been launched. Most of them make use of several modalities, such as video streams, sound, and fall detection, in order to monitor the activities of an elderly person, to supply them with a natural way to communicate with their "smart home", and to render assistance in case of an emergency. This work is part of the Industrial Research ANR VERSO project Sweet-Home. The goals of the project are to propose a domotic system that enables natural interaction (using touch and voice commands) between an elderly person and their house, and to provide a higher level of safety through the detection of distress situations. The goal of this work is thus to devise solutions for the recognition of daily-life sounds in a realistic context. Sound recognition runs prior to an Automatic Speech Recognition system, so the speech recognizer's performance relies on the reliability of the speech/non-speech separation. Furthermore, good recognition of a few kinds of sounds, complemented by other sources of information (presence detection, fall detection, etc.), could allow for better monitoring of the person's activities and hence better detection of dangerous situations. We first investigated methods from the field of Speaker Recognition and Verification. As part of this, we experimented with methods based on GMMs and SVMs. In particular, we tested a sequence-discriminant SVM kernel called SVM-GSL (SVM GMM Super Vector Linear kernel). SVM-GSL is a combination of GMM and SVM whose basic idea is to map a sequence of vectors of arbitrary length into one high-dimensional vector, called a super vector, used as input to an SVM.
Experiments were carried out using a locally created sound database (18 sound classes, over 1,000 recordings), then using the Sweet-Home project's corpus. Our daily-sound recognition system was integrated into a more complete system that also performs multi-channel sound detection and speech recognition. These first experiments were all performed using one kind of acoustical coefficients, the MFCC coefficients. Thereafter, we focused on the study of other families of acoustical coefficients. The aim of this study was to assess the usability of other acoustical coefficients for environmental sound recognition, the motivation being to find representations that are simpler and/or more effective than MFCC. Using 15 different families of acoustical coefficients, we also experimented with two approaches to map a sequence of vectors into one vector usable with a linear SVM. The first approach consists of computing a fixed number of statistical coefficients and using them instead of the whole sequence. The second, which is one of the novel contributions of this work, uses a discretization method to find, for each feature within an acoustical vector, the best cut points that associate a given class with one or more intervals of values. The likelihood of the sequence is estimated for each interval, and the obtained likelihood values are used to build a single vector that replaces the sequence of acoustical vectors. The results show that a few families of coefficients are actually more appropriate for the recognition of some sound classes. For most sound classes, the best recognition performance was obtained with one or more families other than MFCC. Moreover, a number of these families are less complex than MFCC: they use a single feature per frame, whereas the MFCC coefficients here contain 16 features per frame.
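The first sequence-to-vector approach described above (replacing a variable-length sequence of acoustical vectors with a fixed set of per-feature statistics) can be sketched as follows. The particular statistics chosen here (mean, standard deviation, min, max) are an illustrative assumption, not necessarily the ones used in the thesis:

```python
import numpy as np

def sequence_to_fixed_vector(frames):
    """Map a (T, d) sequence of acoustical vectors to a fixed-size
    vector of per-feature statistics, usable with a linear SVM.

    frames: (T, d) array, T frames of d features each (T may vary).
    Returns a (4*d,) vector: [means, stds, mins, maxs].
    """
    frames = np.asarray(frames, dtype=float)
    stats = [frames.mean(axis=0),
             frames.std(axis=0),
             frames.min(axis=0),
             frames.max(axis=0)]
    return np.concatenate(stats)

# Two recordings of different lengths map to vectors of the same size
rng = np.random.default_rng(0)
v_short = sequence_to_fixed_vector(rng.normal(size=(50, 16)))   # 50 frames of 16 MFCCs
v_long = sequence_to_fixed_vector(rng.normal(size=(400, 16)))   # 400 frames
# both have shape (64,), regardless of sequence length
```

This fixed-size representation is what makes a linear SVM applicable to recordings of arbitrary duration.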
Lirussi, Igor. "Human-Robot interaction with low computational-power humanoids". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19120/.
Full text
Movin, Andreas and Jonathan Jilg. "Kan datorer höra fåglar?" Thesis, KTH, Skolan för teknikvetenskap (SCI), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254800.
Full text
Sound recognition is made possible through spectral analysis, computed by the fast Fourier transform (FFT), and has in recent years seen major breakthroughs along with the rise of computational power and artificial intelligence. The technology is now used ubiquitously, in particular in the field of bioacoustics for the identification of animal species, an important task for wildlife monitoring. It is still a growing field of science, and the recognition of bird song in particular remains a hard challenge; even state-of-the-art algorithms are far from error-free. In this thesis, simple algorithms to match sounds against a sound database were implemented and assessed. A filtering method was developed to pick out characteristic frequencies at five time frames, which formed the basis for comparison and matching. The sounds used were pre-recorded bird songs (blackbird, nightingale, crow and seagull) as well as human voices (4 young Swedish males) that we recorded. Our findings show success rates typically at 50–70%, the lowest being 30% for the seagull with a small database and the highest being 90% for the blackbird with a large database. The voices were more difficult for the algorithms to distinguish, but they still had an overall success rate between 50% and 80%. Furthermore, increasing the database size did not improve success rates in general. In conclusion, this thesis provides a proof of concept and illustrates both the strengths and shortcomings of the simple algorithms developed. The algorithms gave better success rates than the pure-chance rate of 25%, but there is room for improvement, since the algorithms were easily misled by sounds of the same frequencies. Further research is needed to assess the devised algorithms' ability to identify even more birds and voices.
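The matching scheme described above (dominant frequencies extracted at five time frames and compared against a database) can be sketched as below. The database signatures and the nearest-neighbour distance are illustrative assumptions, not the exact filtering method of the thesis:

```python
import numpy as np

def characteristic_freqs(signal, fs, n_frames=5):
    """Dominant frequency (Hz) in each of n_frames equal time slices."""
    frame_len = len(signal) // n_frames
    peaks = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame))          # magnitude spectrum
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        peaks.append(freqs[np.argmax(spectrum)])       # strongest bin
    return np.array(peaks)

def match(query_peaks, database):
    """Nearest database entry by mean absolute frequency difference."""
    return min(database,
               key=lambda name: np.mean(np.abs(database[name] - query_peaks)))

# Toy example: a pure 440 Hz tone against two hypothetical signatures
fs = 8000
t = np.arange(fs) / fs                       # 1 second of audio
tone = np.sin(2 * np.pi * 440 * t)
db = {"blackbird": np.full(5, 440.0),        # hypothetical signatures
      "seagull": np.full(5, 1200.0)}
peaks = characteristic_freqs(tone, fs)       # -> five values of 440.0
best = match(peaks, db)                      # -> "blackbird"
```

The single-peak-per-frame representation also illustrates the weakness noted in the abstract: two different sounds sharing the same dominant frequencies are indistinguishable under this distance.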
Hrabina, Martin. "VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ". Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-409087.
Full text
Chen, Bo Min and 陳柏旻. "Sound reconstruction based on features for sound recognition". Thesis, 2015. http://ndltd.ncl.edu.tw/handle/22558650799254615024.
Full text
National Tsing Hua University (國立清華大學)
Department of Electrical Engineering (電機工程學系)
2015 (ROC year 104)
Abstract
Sounds play an important role in our life: we communicate with each other and learn what is happening by listening to them. By extracting features from sounds, we can retain the specific information needed to recognize them, and sound transmission becomes possible if sounds can be reconstructed from the transmitted features. In this research, we attempt to reconstruct sounds using features that are typically transmitted for recognition purposes. In this thesis, we take the mel-frequency cepstral coefficients (MFCC), a set of features commonly used for sound recognition, as the basic features for reconstruction. Because MFCC does not encode the detail of sounds, we use the pitch as additional information to improve the completeness of the features. The sound reconstruction is based on a source-filter model, which takes the frequency response reconstructed from the MFCC as the spectral envelope and determines the sound source from the pitch. The critical factors of the reconstructed sound source are the frequency distribution of noise and harmonics, which can be determined from the human speech production mechanism. We then combine the spectral envelope with the sound source to reconstruct sounds through a modified source-filter model. In this thesis, we test our methods by analysis and reconstruction of speech and non-speech materials, and we attempt to find the factors that affect the quality of the reconstructed sounds. We also evaluate the reconstructed sounds by a subjective listening test and by objective perceptual evaluation of audio quality (PEAQ), with grades ranging from 1 (very bad) to 5 (very good). The listening test gives grades of about 3 to 4 for both speech and non-speech reconstruction; PEAQ gives about 2 to 3.5 for non-speech reconstruction, and slightly above 1 for speech reconstruction.
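The first step of the reconstruction described above, recovering a spectral envelope from MFCCs, amounts to inverting the DCT that produced the cepstral coefficients; truncating to the first few coefficients is what makes the recovered envelope smooth. A minimal sketch (the filterbank size and excitation parameters are illustrative assumptions):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is its inverse."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)
    return C

def mfcc_from_logmel(logmel, n_mfcc):
    """Keep the first n_mfcc cepstral coefficients (lossy if n_mfcc < n_mels)."""
    return dct_matrix(len(logmel))[:n_mfcc] @ logmel

def envelope_from_mfcc(mfcc, n_mels):
    """Approximate the log-mel envelope by inverse DCT of the kept
    coefficients; discarded higher coefficients smooth the envelope."""
    return dct_matrix(n_mels)[:len(mfcc)].T @ mfcc

def impulse_train(f0, fs, n):
    """Voiced excitation for the source-filter model: one impulse per pitch period."""
    src = np.zeros(n)
    src[::max(1, int(round(fs / f0)))] = 1.0
    return src

# Round trip: keeping all coefficients recovers the log-mel envelope exactly
n_mels = 26
logmel = np.log(1.0 + np.arange(n_mels, dtype=float))
coeffs = mfcc_from_logmel(logmel, n_mfcc=n_mels)
recovered = envelope_from_mfcc(coeffs, n_mels)
# recovered equals logmel; with n_mfcc = 16 only a smoothed envelope survives
```

In the full model, the envelope would be interpolated onto FFT bins and multiplied with the spectrum of the excitation (an impulse train for voiced frames, noise for unvoiced) before an inverse FFT and overlap-add; the sketch shows only the invertible-transform core.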