Theses on the topic "Sound recognition"
Consult the top 50 theses for your research on the topic "Sound recognition".
Kawaguchi, Nobuo and Yuya Negishi. "Instant Learning Sound Sensor: Flexible Environmental Sound Recognition System". IEEE, 2007. http://hdl.handle.net/2237/15456.
Chapman, David P. "Playing with sounds : a spatial solution for computer sound synthesis". Thesis, University of Bath, 1996. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307047.
Stäger, Mathias. "Low-power sound-based user activity recognition /". Zürich : ETH, 2006. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=16719.
Medhat, Fady. "Masked conditional neural networks for sound recognition". Thesis, University of York, 2018. http://etheses.whiterose.ac.uk/21594/.
Rodeia, José Pedro dos Santos. "Analysis and recognition of similar environmental sounds". Master's thesis, FCT - UNL, 2009. http://hdl.handle.net/10362/2305.
Humans have the ability to identify sound sources just by hearing a sound. Adapting the same problem to computers is called (automatic) sound recognition. Several sound recognizers have been developed over the years. The accuracy of these recognizers is influenced by the features they use and by the classification method implemented. While there are many approaches to sound feature extraction and sound classification, most have been used to classify sounds with very different characteristics. Here, we implemented a recognizer for similar sounds, i.e., sounds with very similar properties, which makes the recognition process harder. We therefore use both temporal and spectral properties of the sound. These properties are extracted using the Intrinsic Structures Analysis (ISA) method, which relies on Independent Component Analysis and Principal Component Analysis. The classification method is based on the k-Nearest Neighbor algorithm. We show that the features extracted in this way are powerful for sound recognition. We tested our recognizer with several sets of features the ISA method retrieves and achieved strong results. Finally, we conducted a user study comparing human performance at distinguishing similar sounds against our recognizer. The study allowed us to conclude that the sounds are in fact very similar and difficult to distinguish, and that our recognizer identifies them far better than humans do.
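The k-Nearest Neighbor classification step this abstract mentions can be sketched in miniature. The following is a generic k-NN majority vote over synthetic feature vectors; the clusters merely stand in for the thesis's ISA features, which are not reproduced here:

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors nearest to x (Euclidean)."""
    dists = np.linalg.norm(train_X - x, axis=1)       # distance to every training vector
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # most frequent label wins

# Two synthetic clusters of 4-D "sound feature" vectors (placeholders, not ISA output)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 4)), rng.normal(3.0, 0.5, (20, 4))])
y = np.array([0] * 20 + [1] * 20)
print(knn_classify(X, y, np.full(4, 3.1)))  # → 1 (falls in the second cluster)
```

With well-separated features, even this minimal classifier is reliable, which is why feature quality (here, the ISA features) dominates recognizer accuracy.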
Martin, Keith Dana. "Sound-source recognition : a theory and computational model". Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/9468.
Includes bibliographical references (p. 159-172).
The ability of a normal human listener to recognize objects in the environment from only the sounds they produce is extraordinarily robust with regard to characteristics of the acoustic environment and of other competing sound sources. In contrast, computer systems designed to recognize sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Robust listening requires extensive contextual knowledge, but the potential contribution of sound-source recognition to the process of auditory scene analysis has largely been neglected by researchers building computational models of the scene analysis process. This thesis proposes a theory of sound-source recognition, casting recognition as a process of gathering information to enable the listener to make inferences about objects in the environment or to predict their behavior. In order to explore the process, attention is restricted to isolated sounds produced by a small class of sound sources, the non-percussive orchestral musical instruments. Previous research on the perception and production of orchestral instrument sounds is reviewed from a vantage point based on the excitation and resonance structure of the sound-production process, revealing a set of perceptually salient acoustic features. A computer model of the recognition process is developed that is capable of "listening" to a recording of a musical instrument and classifying the instrument as one of 25 possibilities. The model is based on current models of signal processing in the human auditory system. It explicitly extracts salient acoustic features and uses a novel improvisational taxonomic architecture (based on simple statistical pattern-recognition techniques) to classify the sound source. The performance of the model is compared directly to that of skilled human listeners, using both isolated musical tones and excerpts from compact disc recordings as test stimuli. 
The computer model's performance is robust with regard to the variations of reverberation and ambient noise (although not with regard to competing sound sources) in commercial compact disc recordings, and the system performs better than three out of fourteen skilled human listeners on a forced-choice classification task. This work has implications for research in musical timbre, automatic media annotation, human talker identification, and computational auditory scene analysis.
by Keith Dana Martin.
Ph.D.
Hunter, Jane Louise. "Integrated sound synchronisation for computer animation". Thesis, University of Cambridge, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.239569.
Soltani-Farani, A. A. "Sound visualisation as an aid for the deaf : a new approach". Thesis, University of Surrey, 1998. http://epubs.surrey.ac.uk/844112/.
Gillespie, Bradford W. "Strategies for improving audible quality and speech recognition accuracy of reverberant speech /". Thesis, Connect to this title online; UW restricted, 2002. http://hdl.handle.net/1773/5930.
Corbet, Remy. "A SOUND FOR RECOGNITION: BLUES MUSIC AND THE AFRICAN AMERICAN COMMUNITY". OpenSIUC, 2011. https://opensiuc.lib.siu.edu/theses/730.
Deigård, Daniel. "The Effect of Acute Background Noise on Recognition Tasks". Thesis, Stockholms universitet, Psykologiska institutionen, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-74415.
Giannoulis, Dimitrios. "Recognition of sound sources and acoustic events in music and environmental audio". Thesis, Queen Mary, University of London, 2014. http://qmro.qmul.ac.uk/xmlui/handle/123456789/9130.
Van der Merwe, Hugo Jacobus. "Bird song recognition with Hidden Markov Models /". Thesis, Link to the online version, 2008. http://hdl.handle.net/10019/914.
Mainwaring, David and Jonathan Österberg. "Sound Pattern Recognition : Evaluation of Independent Component Analysis Algorithms for Separation of Voices". Thesis, KTH, Skolan för teknikvetenskap (SCI), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230746.
In this work we investigated how independent component analysis (ICA) algorithms can be used for the separation of voices when there is a varying number of voices and microphones placed at different positions in a room, better known as the "cocktail party problem". This is done by applying ICA algorithms to audio recordings in which several people talk over one another. We test the ICA algorithms Maximum Likelihood ICA (ML-ICA) and fastICA. Both algorithms give good results when there are at least as many microphones as speakers. The advantage of fastICA over ML-ICA is that its running time is much shorter. A surprising result from both algorithms is that they managed to separate out at least one of the voices when there were more speakers than microphones, which was not an expected outcome.
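The kind of ICA separation this thesis evaluates can be sketched with a toy two-microphone mixture. This is a generic symmetric FastICA with a tanh nonlinearity on invented signals, not the authors' implementation; the mixing matrix and the sine/square "voices" are illustrative assumptions:

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on an (n_signals x n_samples) mixture."""
    n, m = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten: rotate and scale so the mixture has identity covariance
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(d ** -0.5) @ E.T @ X
    W = np.random.default_rng(seed).normal(size=(n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(WZ)Z^T] - diag(E[g'(WZ)]) W
        W = (G @ Z.T) / m - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt  # symmetric decorrelation keeps the rows orthonormal
    return W @ Z    # estimated sources, recovered up to order and sign

# Two synthetic "voices" (sine and square wave) mixed by two "microphones"
t = np.linspace(0, 1, 4000)
S = np.vstack([np.sin(2 * np.pi * 7 * t), np.sign(np.sin(2 * np.pi * 11 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # unknown room mixing matrix
recovered = fastica(A @ S)
```

The determined case sketched here (as many microphones as speakers) is exactly the setting where the thesis reports that both ML-ICA and fastICA perform well.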
Kahl, Stefan. "Identifying Birds by Sound: Large-scale Acoustic Event Recognition for Avian Activity Monitoring". Universitätsverlag Chemnitz, 2019. https://monarch.qucosa.de/id/qucosa%3A36986.
Automated monitoring of bird vocal activity and species diversity can be a revolutionary tool for ornithologists, conservationists, and birdwatchers, helping with the long-term monitoring of critical environmental niches. Deep artificial neural networks have surpassed traditional classifiers in visual recognition and acoustic event classification. However, deep neural networks require expert knowledge to design, train, and test powerful models. With this limitation in mind, and considering the requirements of future applications, an extensive research platform for automated bird activity monitoring was developed: BirdNET. The resulting benchmark system delivers state-of-the-art results across various acoustic domains and has been used to build expert tools and public demonstrators that can help advance the democratization of scientific progress and future conservation efforts.
Guenebaut, Boris. "Automatic Subtitle Generation for Sound in Videos". Thesis, University West, Department of Economics and IT, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hv:diva-1784.
The last ten years have witnessed the emergence of all kinds of video content, and the appearance of websites dedicated to it has increased the importance the public gives to such material. At the same time, some individuals are deaf and cannot understand these videos when no text transcription is available. It is therefore necessary to find solutions that make these media artefacts accessible to most people. Several software packages offer utilities to create subtitles for videos, but all require extensive participation by the user, so a more automated approach is envisaged. This thesis report describes a way to generate standards-compliant subtitles using speech recognition. Three parts are distinguished. The first consists in separating the audio from the video and converting the audio to a suitable format if necessary. The second performs recognition of the speech contained in the audio. The final stage generates a subtitle file from the recognition results of the previous step. Directions of implementation have been proposed for the three distinct modules. The experimental results were not satisfactory enough, and adjustments will have to be made in further work. Decoding parallelization, use of well-trained models, and punctuation insertion are some of the improvements still to be done.
Cowling, Michael. "Non-Speech Environmental Sound Classification System for Autonomous Surveillance". Griffith University. School of Information Technology, 2004. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040428.152425.
Cowling, Michael. "Non-Speech Environmental Sound Classification System for Autonomous Surveillance". Thesis, Griffith University, 2004. http://hdl.handle.net/10072/365386.
Thesis (PhD Doctorate), Doctor of Philosophy (PhD), School of Information Technology.
Taft, Daniel Adam. "Cochlear implant sound coding with across-frequency delays". Connect to thesis, 2009. http://repository.unimelb.edu.au/10187/5783.
Before incorporating cochlear delays into a cochlear implant processor, a set of suitable delays was determined with a psychoacoustic calibration to pitch perception, since normal cochlear delays are a function of frequency. The first experiment assessed the perception of pitch evoked by electrical stimuli from cochlear implant electrodes. Six cochlear implant users with acoustic hearing in their non-implanted ears were recruited for this, since they were able to compare electric stimuli to acoustic tones. Traveling wave delays were then computed for each subject using the frequencies matched to their electrodes. These were similar across subjects, ranging over 0-6 milliseconds along the electrode array.
The next experiment applied the calibrated delays to the ACE strategy filter outputs before maxima selection. The effects upon speech perception in noise were assessed with cochlear implant users, and a small but significant improvement was observed. A subsequent sensitivity analysis indicated that accurate calibration of the delays might not be necessary after all; instead, a range of across-frequency delays might be similarly beneficial.
A computational investigation was performed next, where a corpus of recorded speech was passed through the ACE cochlear implant sound processing strategy in order to determine how across-frequency delays altered the patterns of stimulation. A range of delay vectors were used in combination with a number of processing parameter sets and noise levels. The results showed that additional stimuli from broadband sounds (such as the glottal pulses of vowels) are selected when frequency bands are desynchronized with across-frequency delays. Background noise contains fewer dominant impulses than a single talker and so is not enhanced in this way.
In the following experiment, speech perception with an ensemble of across-frequency delays was assessed with eight cochlear implant users. Reverse cochlear delays (high frequency delays) were equivalent to conventional cochlear delays. Benefit was diminished for larger delays. Speech recognition scores were at baseline with random delay assignments. An information transmission analysis of speech in quiet indicated that the discrimination of voiced cues was most improved with across-frequency delays. For some subjects, this was seen as improved vowel discrimination based on formant locations and improved transmission of the place of articulation of consonants.
A final study indicated that benefits to speech perception with across-frequency delays are diminished when the number of maxima selected per frame is increased above 8-out-of-22 frequency bands.
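The processing idea tested in this thesis, delaying each frequency channel before "n-of-m" maxima selection, can be sketched roughly as below. The channel count, delay values, and frame structure are illustrative assumptions, not the clinical ACE strategy's parameters:

```python
import numpy as np

def delay_and_select(envelopes, delays, n_maxima=8):
    """Delay each channel's envelope by its own lag (in frames), then keep only
    the n_maxima largest channels per frame, as in an 'n-of-m' strategy."""
    n_ch, n_frames = envelopes.shape
    delayed = np.zeros_like(envelopes)
    for ch, d in enumerate(delays):
        delayed[ch, d:] = envelopes[ch, :n_frames - d]   # shift channel ch by d frames
    out = np.zeros_like(delayed)
    for frame in range(n_frames):
        top = np.argsort(delayed[:, frame])[-n_maxima:]  # the n largest channels
        out[top, frame] = delayed[top, frame]            # all others are dropped
    return out

env = np.abs(np.random.default_rng(0).normal(size=(22, 100)))  # 22-channel envelopes
delays = np.linspace(6, 0, 22).round().astype(int)  # longer lags for low (apical) channels
stim = delay_and_select(env, delays)
```

Desynchronizing the channels this way changes which maxima survive selection, which is how the thesis explains the extra stimuli captured from broadband events such as glottal pulses.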
Sturtivant, Christopher R. "Extraction and recognition of tonal sounds produced by small cetaceans and identification of individuals". Thesis, Loughborough University, 1997. https://dspace.lboro.ac.uk/2134/6761.
Malheiro, Frederico Alberto Santos de Carteado. "Automatic musical instrument recognition for multimedia indexing". Master's thesis, Faculdade de Ciências e Tecnologia, 2011. http://hdl.handle.net/10362/6124.
The subject of automatic indexing of multimedia has been the target of much discussion and study. This interest is due to the exponential growth of multimedia content and the consequent need for methods that automatically catalogue this data. Several projects and areas of study have emerged to address it. The most relevant of these are the MPEG-7 standard, which defines a standardized system for the representation and automatic extraction of information present in the content, and Music Information Retrieval (MIR), which gathers several paradigms and areas of study relating to music. The main approach to this indexing problem relies on analysing data to obtain and identify descriptors that can help define what we intend to recognize (for instance, musical instruments, voice, facial expressions, and so on); this then provides us with information we can use to index the data. This dissertation focuses on audio indexing in music, specifically the recognition of musical instruments from recorded musical notes. Moreover, the developed system and techniques are also tested for the recognition of ambient sounds (such as the sound of running water, cars driving by, and so on). Our approach uses non-negative matrix factorization to extract features from various types of sounds; these are then used to train a classification algorithm capable of identifying new sounds.
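The non-negative matrix factorization step mentioned in this abstract can be illustrated with the standard Lee-Seung multiplicative updates on a toy "spectrogram". This is a generic NMF sketch under invented data, not the dissertation's actual feature extractor:

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0):
    """Lee-Seung multiplicative updates for V ≈ W @ H (Euclidean cost).
    Columns of W act as spectral templates, rows of H as their activations."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1
    H = rng.random((rank, V.shape[1])) + 0.1
    eps = 1e-9  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update templates
    return W, H

# Toy "spectrogram": 4 frequency bins, 50 frames, built from 2 spectral templates
B = np.array([[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.1, 0.9]])
A = np.random.default_rng(1).random((2, 50))
V = B @ A
W, H = nmf(V, rank=2)
```

Because the updates are multiplicative, W and H stay non-negative, which is what makes the learned templates interpretable as note or sound spectra.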
El-Feghaly, Edmond M. "The influence of sound spectrum on recognition of temporal pattern of cricket (Teleogryllus oceanicus) song /". Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=60656.
High frequency neurons were suspected to be behind the cessation of responsiveness to stimuli with altered temporal features. This hypothesis predicts that the effect on selectivity of increasing the intensity of the 5 kHz stimulus might be mimicked by adding a high frequency to the stimulus. My results contradict this hypothesis.
The response to a 30 kHz carrier demonstrates a dependency on the duration and pulse repetition rate of the stimulus.
Deily, Joshua Allen. "Mechanisms of call recognition in three sympatric species of Neoconocephalus (Orthoptera: Tettigoniidae) asymmetrical interactions and evolutionary implications /". Diss., Columbia, Mo. : University of Missouri-Columbia, 2006. http://hdl.handle.net/10355/4357.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file, viewed on February 26, 2007. Vita. Includes bibliographical references.
White, Teresa. "The effects of mnemonics on letter recognition and letter sound acquisition of at-risk kindergarten students". College Station, Tex.: Texas A&M University, 2006. http://hdl.handle.net/1969.1/ETD-TAMU-1100.
Lam, Chi-kan. "Detection of air leaks using pattern recognition techniques and neurofuzzy networks /". Hong Kong : University of Hong Kong, 2000. http://sunzi.lib.hku.hk/hkuto/record.jsp?B21981826.
林智勤 and Chi-kan Lam. "Detection of air leaks using pattern recognition techniques and neurofuzzy networks". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2000. http://hub.hku.hk/bib/B31222833.
Bajzík, Jakub. "Rozpoznání zvukových událostí pomocí hlubokého učení". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-401993.
Parks, Sherrie L. "The sound of music: The influence of evoked emotion on recognition memory for musical excerpts across the lifespan". OpenSIUC, 2013. https://opensiuc.lib.siu.edu/theses/1143.
Alsharhan, Iman. "Exploiting phonological constraints and automatic identification of speaker classes for Arabic speech recognition". Thesis, University of Manchester, 2014. https://www.research.manchester.ac.uk/portal/en/theses/exploiting-phonologicalconstraints-and-automaticidentification-of-speakerclasses-for-arabic-speechrecognition(8d443cae-e9e4-4f40-8884-99e2a01df8e9).html.
Dhakal, Parashar. "Novel Architectures for Human Voice and Environmental Sound Recognition using Machine Learning Algorithms". University of Toledo / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1531349806743278.
Hoffman, Jeffrey Dean. "Using Blind Source Separation and a Compact Microphone Array to Improve the Error Rate of Speech Recognition". PDXScholar, 2016. https://pdxscholar.library.pdx.edu/open_access_etds/3367.
Lareau, Jonathan. "Application of shifted delta cepstral features for GMM language identification /". Electronic version of thesis, 2006. https://ritdml.rit.edu/dspace/handle/1850/2686.
Tindale, Adam. "Classification of snare drum sounds using neural networks". Thesis, McGill University, 2004. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=81515.
Martí Guerola, Amparo. "Multichannel audio processing for speaker localization, separation and enhancement". Doctoral thesis, Universitat Politècnica de València, 2013. http://hdl.handle.net/10251/33101.
Martí Guerola, A. (2013). Multichannel audio processing for speaker localization, separation and enhancement [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/33101
Dodenhoff, Danielle J. "AN ANALYSIS OF ACOUSTIC COMMUNICATION WITHIN THE SOCIAL SYSTEM OF DOWNY WOODPECKERS (PICOIDES PUBESCENS)". The Ohio State University, 2002. http://rave.ohiolink.edu/etdc/view?acc_num=osu1032381559.
Kahl, Stefan [Verfasser], Maximilian [Akademischer Betreuer] Eibl, Maximilian [Gutachter] Eibl, Marc [Gutachter] Ritter and Holger [Akademischer Betreuer] Klinck. "Identifying Birds by Sound: Large-scale Acoustic Event Recognition for Avian Activity Monitoring / Stefan Kahl ; Gutachter: Maximilian Eibl, Marc Ritter ; Maximilian Eibl, Holger Klinck". Chemnitz : Universitätsverlag Chemnitz, 2020. http://d-nb.info/1219664502/34.
Choi, Hyung Keun. "Blind source separation of the audio signals in a real world". Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/14986.
Roman, Nicoleta. "Auditory-based algorithms for sound segregation in multisource and reverberant environments". Connect to resource, 2005. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1124370749.
Title from first page of PDF file. Document formatted into pages; contains xxii, 183 p.; also includes graphics. Includes bibliographical references (p. 171-183). Available online via OhioLINK's ETD Center.
Odehnal, Jiří. "Řízení a měření sportovních drilů hlasem/zvuky". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-399705.
Smith, Daniel. "An analysis of blind signal separation for real time application". Access electronically, 2006. http://www.library.uow.edu.au/adt-NWU/public/adt-NWU20070815.152400/index.html.
Unnikrishnan, Harikrishnan. "AUDIO SCENE SEGEMENTATION USING A MICROPHONE ARRAY AND AUDITORY FEATURES". UKnowledge, 2010. http://uknowledge.uky.edu/gradschool_theses/622.
Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition". Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.
ANFLO, FREDRIK. "M8 the Four-legged Robot". Thesis, KTH, Skolan för industriell teknik och management (ITM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279836.
In recent times, robots have become more and more common. They are everywhere: walking, running, swimming, flying, and many of them have much in common with the creatures living on this earth, largely so as to appeal to us more rather than come across as merely cold machines. Continuing along the path evolution has laid before us seems a wise decision in the effort to use our knowledge of science and engineering effectively, with the vision of improving our future. With the goal of simulating a four-legged animal and evaluating the possibilities of interacting with one's surroundings, a four-legged locomotion system was developed together with two types of sound and voice systems. A prototype was constructed to test the problems that arise in the real world and to assess which mode of interaction proves most advantageous. The results indicate that voice commands and voice recognition, rather than detection of sounds from the surroundings, are more practical and robust as a way of interacting with the environment.
Reis, Clovis Ferreira dos. "Sistema Modular para Detecção e Reconhecimento de Disparos de Armas de Fogo". Universidade Federal da Paraíba, 2015. http://tede.biblioteca.ufpb.br:8080/handle/tede/9244.
Urban violence has been increasing in almost every Brazilian state, and to face this threat, police authorities require new technological tools to support their decisions on how and when the few available resources should be employed to combat criminality. In this context, this work presents an embedded computational tool for detecting gunshots automatically. To provide the necessary background, a brief description of impulsive sounds, firearms, and gunshot characteristics is presented first. A modular system is then proposed to detect and recognize the impulsive sounds characteristic of gunshots. Since the system contains several modules, this work focuses on two of them: the module for detecting impulsive sounds and the module for distinguishing a gunshot from any other impulsive sound. For the impulse detection module, three well-known algorithms were analyzed under the same conditions: the fourth derivative of the Root Mean Square (RMS), the Conditional Median Filter (CMF), and the Variance Method (VM). The algorithms were tested on four performance measures: accuracy, precision, sensitivity, and specificity. To determine the most efficient algorithm for detecting impulsive sounds, a cadence test was performed with impulsive sounds without added noise and with constant or increasing noise. After this analysis, the parameters employed in the CMF and VM methods were tested over a wide range of configurations to check for possible optimization. Once the best method was determined, the classification module for recognizing gunshots was implemented. For this, two methods were compared: one based on the signal envelope over time and the other based on the most relevant frequencies obtained from the Fourier transform.
A comparison of the two methods showed that the envelope method achieved 54% accuracy in classifying impulsive sounds, while the frequency analysis achieved 72%.
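One of the detectors compared in this work, the Variance Method, can be sketched as a sliding-window variance test. The window size, threshold, and synthetic "gunshot" below are invented for illustration, not the thesis's actual parameterization:

```python
import numpy as np

def detect_impulses(signal, win=256, threshold=8.0):
    """Flag windows whose variance exceeds `threshold` times the median window
    variance: a simplified variance-method impulse detector."""
    n_win = len(signal) // win
    frames = signal[:n_win * win].reshape(n_win, win)  # non-overlapping windows
    var = frames.var(axis=1)                           # per-window energy spread
    return np.flatnonzero(var > threshold * np.median(var))

# Quiet background noise with one loud, short burst standing in for a gunshot
rng = np.random.default_rng(0)
audio = rng.normal(0.0, 0.01, 16000)
audio[8000:8200] += rng.normal(0.0, 1.0, 200)
print(detect_impulses(audio))  # window indices overlapping the burst
```

Using the median of the window variances as the baseline keeps the threshold robust even when a few impulsive windows inflate the mean, which matters for the constant-noise and increasing-noise cadence tests described above.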
Strowger, Megan E. "Interoceptive sounds and emotion recognition". Thesis, University of the Sciences in Philadelphia, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10294821.
Background: Perception of changes in physiological arousal is theorized to form the basis for which the brain labels emotional states. Interoception is a process by which individuals become aware of physiological sensations. Lowered emotional awareness has been found to be associated with lower interoceptive awareness. Alexithymia is a personality trait associated with lowered emotion recognition ability which affects 10-20% of the university student population in Western countries. Research suggests that being made aware of one's heartbeat may enhance emotional awareness. Objective(s): The present study attempted to enhance emotion recognition abilities directly via an experimental interoceptive manipulation in order to decrease levels of alexithymia. It had three aims: 1) to examine whether exposing individuals to the interoceptive sound of their own heartbeat could elicit changes in their emotion recognition abilities, 2) to examine whether higher emotion recognition abilities as a result of listening to one's own heartbeat differed by alexithymia group, and 3) to examine whether higher interoceptive awareness was associated with higher RME scores during the own-heartbeat sound condition. Methods: 36 participants were recruited from an introductory psychology class at the University of the Sciences in Philadelphia. Participants completed lab-based tests of emotion recognition followed by questionnaires assessing alexithymia and interoceptive abilities. During the lab-based test of emotion recognition, participants were subjected to an interoceptive manipulation by listening to three sounds (in random order): own heartbeat, another person's heartbeat, and footsteps. To test aim 1, a repeated-measures ANOVA examined differences in emotion recognition scores during the various sound conditions (i.e., no sound, own heartbeat, other heartbeat, footsteps).
For evaluating aim 2, a two-way 3 x 4 RM ANOVA tested for differences in RME scores by sound condition when individuals were alexithymic, possibly alexithymic, and not alexithymic. Aim 3 was examined using correlations between the attention to body and emotion awareness subscale scores separately with RME score for own heartbeat. Results: Contrary to predictions, RME performance did not vary according to body sound condition, F (3, 105) = .53, p = .67, η² = .02. A significant interaction was seen between alexithymia category and RME scores during the interoceptive sound conditions, F (6, 99) = 2.27, p = .04, η² = .12. However, post-hoc analyses did not reveal significant differences between specific alexithymia categories and RME scores. A significant positive relationship was seen between RME during own heartbeat and being able to pay attention to the body (r (36) = .34, p = .05, R² = .11). Discussion: Our results suggest that more attention was directed toward facial emotions when subjects listened to their own heartbeat, but this increase did not result in measurable changes in RME performance. Limitations: Although using a within-subjects design potentially increased statistical power, a between-subjects design with random assignment could have eliminated the effects of repeated measurement and condition order. Implications: The most novel of these findings was that individuals paid more attention to the emotional stimuli when hearing their own heartbeat. More research is needed to understand whether the interoceptive sound manipulation may aid in improving other cognitive functions or earlier steps in the emotion process. Future research using other measures of interoception and attention is necessary to confirm the result.
Sehili, Mohamed el Amine. "Reconnaissance des sons de l’environnement dans un contexte domotique". Thesis, Evry, Institut national des télécommunications, 2013. http://www.theses.fr/2013TELE0014/document.
Full text
In many countries around the world, the number of elderly people living alone has been increasing. In the last few years, a significant number of research projects on monitoring elderly people have been launched. Most of them make use of several modalities, such as video streams, sound, and fall detection, in order to monitor the activities of an elderly person, to supply them with a natural way to communicate with their "smart home", and to render assistance in case of an emergency. This work is part of the Industrial Research ANR VERSO project Sweet-Home. The goals of the project are to propose a domotic system that enables natural interaction (using touch and voice commands) between an elderly person and their house, and to provide a higher level of safety through the detection of distress situations. The goal of this work is thus to devise solutions for the recognition of daily-life sounds in a realistic context. Sound recognition runs prior to an Automatic Speech Recognition system, so the speech recognizer's performance relies on the reliability of the speech/non-speech separation. Furthermore, good recognition of a few kinds of sounds, complemented by other sources of information (presence detection, fall detection, etc.), could allow for better monitoring of the person's activities and hence better detection of dangerous situations. We first investigated methods from the field of Speaker Recognition and Verification. As part of this, we experimented with methods based on GMMs and SVMs. In particular, we tested a sequence-discriminant SVM kernel called SVM-GSL (SVM GMM Super Vector Linear kernel). SVM-GSL is a combination of GMM and SVM whose basic idea is to map a sequence of vectors of arbitrary length into one high-dimensional vector, called a super vector, used as input to an SVM.
Experiments were carried out using a locally created sound database (18 sound classes, over 1,000 recordings), then using the Sweet-Home project's corpus. Our daily-sound recognition system was integrated into a more complete system that also performs multi-channel sound detection and speech recognition. These first experiments were all performed using one kind of acoustical coefficients, the MFCC coefficients. Thereafter, we focused on the study of other families of acoustical coefficients. The aim of this study was to assess the usability of other acoustical coefficients for environmental sound recognition, the motivation being to find representations that are simpler and/or more effective than MFCC. Using 15 different families of acoustical coefficients, we also experimented with two approaches to map a sequence of vectors into one vector usable with a linear SVM. The first approach consists of computing a fixed number of statistical coefficients and using them instead of the whole sequence. The second, which is one of the novel contributions of this work, uses a discretization method to find, for each feature within an acoustical vector, the best cut points that associate a given class with one or more intervals of values. The likelihood of the sequence is estimated for each interval, and the obtained likelihood values are used to build a single vector that replaces the sequence of acoustical vectors. The results show that a few families of coefficients are actually more appropriate for the recognition of some sound classes. For most sound classes, the best recognition performance was obtained with one or more families other than MFCC. Moreover, a number of these families are less complex than MFCC: they use a single feature per frame, whereas the MFCC coefficients here contain 16 features per frame.
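The first sequence-to-vector approach described above (replacing a variable-length sequence of acoustical vectors with a fixed set of per-feature statistics) can be sketched as follows. The particular statistics chosen here (mean, standard deviation, min, max) are an illustrative assumption, not necessarily the ones used in the thesis:

```python
import numpy as np

def sequence_to_fixed_vector(frames):
    """Map a (T, d) sequence of acoustical vectors to a fixed-size
    vector of per-feature statistics, usable with a linear SVM.

    frames: (T, d) array, T frames of d features each (T may vary).
    Returns a (4*d,) vector: [means, stds, mins, maxs].
    """
    frames = np.asarray(frames, dtype=float)
    stats = [frames.mean(axis=0),
             frames.std(axis=0),
             frames.min(axis=0),
             frames.max(axis=0)]
    return np.concatenate(stats)

# Two recordings of different lengths map to vectors of the same size
rng = np.random.default_rng(0)
v_short = sequence_to_fixed_vector(rng.normal(size=(50, 16)))   # 50 frames of 16 MFCCs
v_long = sequence_to_fixed_vector(rng.normal(size=(400, 16)))   # 400 frames
# both have shape (64,), regardless of sequence length
```

This fixed-size representation is what makes a linear SVM applicable to recordings of arbitrary duration.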
Lirussi, Igor. "Human-Robot interaction with low computational-power humanoids". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19120/.
Full text
Movin, Andreas and Jonathan Jilg. "Kan datorer höra fåglar?" Thesis, KTH, Skolan för teknikvetenskap (SCI), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254800.
Full text
Sound recognition is made possible through spectral analysis, computed by the fast Fourier transform (FFT), and has in recent years seen major breakthroughs along with the rise of computational power and artificial intelligence. The technology is now used ubiquitously, in particular in the field of bioacoustics for the identification of animal species, an important task for wildlife monitoring. It is still a growing field of science, and the recognition of bird song in particular remains a hard challenge; even state-of-the-art algorithms are far from error-free. In this thesis, simple algorithms to match sounds against a sound database were implemented and assessed. A filtering method was developed to pick out characteristic frequencies at five time frames, which formed the basis for comparison and matching. The sounds used were pre-recorded bird songs (blackbird, nightingale, crow and seagull) as well as human voices (4 young Swedish males) that we recorded. Our findings show success rates typically at 50–70%, the lowest being 30% for the seagull with a small database and the highest being 90% for the blackbird with a large database. The voices were more difficult for the algorithms to distinguish, but they still had an overall success rate between 50% and 80%. Furthermore, increasing the database size did not improve success rates in general. In conclusion, this thesis provides a proof of concept and illustrates both the strengths and shortcomings of the simple algorithms developed. The algorithms gave better success rates than the pure-chance rate of 25%, but there is room for improvement, since the algorithms were easily misled by sounds of the same frequencies. Further research is needed to assess the devised algorithms' ability to identify even more birds and voices.
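The matching scheme described above (dominant frequencies extracted at five time frames and compared against a database) can be sketched as below. The database signatures and the nearest-neighbour distance are illustrative assumptions, not the exact filtering method of the thesis:

```python
import numpy as np

def characteristic_freqs(signal, fs, n_frames=5):
    """Dominant frequency (Hz) in each of n_frames equal time slices."""
    frame_len = len(signal) // n_frames
    peaks = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame))          # magnitude spectrum
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
        peaks.append(freqs[np.argmax(spectrum)])       # strongest bin
    return np.array(peaks)

def match(query_peaks, database):
    """Nearest database entry by mean absolute frequency difference."""
    return min(database,
               key=lambda name: np.mean(np.abs(database[name] - query_peaks)))

# Toy example: a pure 440 Hz tone against two hypothetical signatures
fs = 8000
t = np.arange(fs) / fs                       # 1 second of audio
tone = np.sin(2 * np.pi * 440 * t)
db = {"blackbird": np.full(5, 440.0),        # hypothetical signatures
      "seagull": np.full(5, 1200.0)}
peaks = characteristic_freqs(tone, fs)       # -> five values of 440.0
best = match(peaks, db)                      # -> "blackbird"
```

The single-peak-per-frame representation also illustrates the weakness noted in the abstract: two different sounds sharing the same dominant frequencies are indistinguishable under this distance.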
Hrabina, Martin. "VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ". Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-409087.
Full text
Chen, Bo Min and 陳柏旻. "Sound reconstruction based on features for sound recognition". Thesis, 2015. http://ndltd.ncl.edu.tw/handle/22558650799254615024.
Full text
National Tsing Hua University (國立清華大學)
Department of Electrical Engineering (電機工程學系)
2015 (ROC year 104)
Abstract
Sounds play an important role in our life: we communicate with each other and learn what is happening by listening to them. By extracting features from sounds, we can retain the specific information needed to recognize them, and sound transmission becomes possible if sounds can be reconstructed from the transmitted features. In this research, we attempt to reconstruct sounds using features that are typically transmitted for recognition purposes. In this thesis, we take the mel-frequency cepstral coefficients (MFCC), a set of features commonly used for sound recognition, as the basic features for reconstruction. Because MFCC does not encode the detail of sounds, we use the pitch as additional information to improve the completeness of the features. The sound reconstruction is based on a source-filter model, which takes the frequency response reconstructed from the MFCC as the spectral envelope and determines the sound source from the pitch. The critical factors of the reconstructed sound source are the frequency distribution of noise and harmonics, which can be determined from the human speech production mechanism. We then combine the spectral envelope with the sound source to reconstruct sounds through a modified source-filter model. In this thesis, we test our methods by analysis and reconstruction of speech and non-speech materials, and we attempt to find the factors that affect the quality of the reconstructed sounds. We also evaluate the reconstructed sounds by a subjective listening test and by objective perceptual evaluation of audio quality (PEAQ), with grades ranging from 1 (very bad) to 5 (very good). The listening test gives grades of about 3 to 4 for both speech and non-speech reconstruction; PEAQ gives about 2 to 3.5 for non-speech reconstruction, and slightly above 1 for speech reconstruction.
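The first step of the reconstruction described above, recovering a spectral envelope from MFCCs, amounts to inverting the DCT that produced the cepstral coefficients; truncating to the first few coefficients is what makes the recovered envelope smooth. A minimal sketch (the filterbank size and excitation parameters are illustrative assumptions):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is its inverse."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (m + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)
    return C

def mfcc_from_logmel(logmel, n_mfcc):
    """Keep the first n_mfcc cepstral coefficients (lossy if n_mfcc < n_mels)."""
    return dct_matrix(len(logmel))[:n_mfcc] @ logmel

def envelope_from_mfcc(mfcc, n_mels):
    """Approximate the log-mel envelope by inverse DCT of the kept
    coefficients; discarded higher coefficients smooth the envelope."""
    return dct_matrix(n_mels)[:len(mfcc)].T @ mfcc

def impulse_train(f0, fs, n):
    """Voiced excitation for the source-filter model: one impulse per pitch period."""
    src = np.zeros(n)
    src[::max(1, int(round(fs / f0)))] = 1.0
    return src

# Round trip: keeping all coefficients recovers the log-mel envelope exactly
n_mels = 26
logmel = np.log(1.0 + np.arange(n_mels, dtype=float))
coeffs = mfcc_from_logmel(logmel, n_mfcc=n_mels)
recovered = envelope_from_mfcc(coeffs, n_mels)
# recovered equals logmel; with n_mfcc = 16 only a smoothed envelope survives
```

In the full model, the envelope would be interpolated onto FFT bins and multiplied with the spectrum of the excitation (an impulse train for voiced frames, noise for unvoiced) before an inverse FFT and overlap-add; the sketch shows only the invertible-transform core.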