Dissertations / Theses: 'Influence of audio signal'

1

Saruwatari, Hiroshi. "BLIND SIGNAL SEPARATION OF AUDIO SIGNALS." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2006. http://hdl.handle.net/2237/10406.

2

Scott, Hugh R. R. "Multiresolution techniques for audio signal restoration." Thesis, University of Warwick, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.307347.

3

Ясунова, Масума Пулатівна. "Метод оцінки інтегральної активності ЕЕГ під впливом аудіо сигналів." Bachelor's thesis, КПІ ім. Ігоря Сікорського, 2021. https://ela.kpi.ua/handle/123456789/43674.

Abstract:

The report is 57 pages long and contains 32 illustrations, 14 tables, 5 formulas, and 2 appendices; 36 sources were consulted. The relevance of this work lies in determining how the bioelectric activity of the brain depends on the amplitude-frequency characteristics of a sound signal. Music accompanies everyday life, so it is important to determine what effect it has on the electrical activity of the brain and how indicators of brain activity change while listening to musical signals. Purpose: to determine how effectively audio signals of different amplitude-frequency composition change the integral activity of the brain. To achieve this goal, the following tasks were set: 1. Analyze the amplitude-frequency characteristics of the selected audio signals; 2. Investigate the change in EEG rhythms under the influence of the selected audio signals; 3. Investigate the influence of the frequency characteristics of the audio signal on the integral electrical activity of the brain.

4

Chiu, Leung Kin. "Efficient audio signal processing for embedded systems." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44775.

Abstract:

We investigated two design strategies that would allow us to efficiently process audio signals on embedded systems such as mobile phones and portable electronics. In the first strategy, we exploit properties of the human auditory system to process audio signals. We designed a sound enhancement algorithm to make piezoelectric loudspeakers sound "richer" and "fuller," using a combination of bass extension and dynamic range compression. We also developed an audio energy reduction algorithm for loudspeaker power management by suppressing signal energy below the masking threshold. In the second strategy, we use low-power analog circuits to process the signal before digitizing it. We designed an analog front-end for sound detection and implemented it on a field programmable analog array (FPAA). The sound classifier front-end can be used in a wide range of applications because programmable floating-gate transistors are employed to store classifier weights. Moreover, we incorporated a feature selection algorithm to simplify the analog front-end. A machine learning algorithm AdaBoost is used to select the most relevant features for a particular sound detection application. We also designed the circuits to implement the AdaBoost-based analog classifier.

5

Wellhausen, Jens. "Algorithms for audio signal segmentation and separation /." Aachen : Shaker, 2007. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=016149157&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.

6

Kwong, Mylène. "Détection de transitoires dans un signal audio." Sherbrooke : Université de Sherbrooke, 2004.

7

Moinnereau, Marc-Antoine. "Encodage d'un signal audio dans un électroencéphalogramme." Mémoire, Université de Sherbrooke, 2017. http://hdl.handle.net/11143/10554.

Abstract:

Brain-machine interfaces aim to establish a communication link between the brain and an external system. In this context, electroencephalograms (EEG) have the advantage of being non-invasive. However, the sensory information found in an EEG signal is much less targeted than in a neural signal acquired by an invasive method. Moreover, because the auditory cortex lies within folds of the cortical tissue, the neurons that fire in response to an auditory stimulus are parallel to the cortical surface on which EEGs are recorded. Consequently, the auditory information found in the EEG channel located over the auditory cortex is weak. The main objective of this research project is therefore to study how auditory information is distributed across all EEG channels. To do so, we use two approaches. In the first, we attempt to estimate the underlying cortical activity from the EEG signals using a frequency-band coupling model; indeed, certain frequency bands are good predictors of neuronal firing. However, this approach has not been validated for the auditory system, so we compare the resulting estimate with another estimate obtained from a model specialized for speech-signal encoding that relies on point processes. This model takes into account the intrinsic dynamics of neurons as well as the spectrotemporal properties of the input stimulus. In the second approach, we study the possibility of classifying 3 vowels (a, i, and u) as a function of the number of EEG channels used and their distribution over the scalp. For this, we use a recurrent spiking-neuron reservoir driven at its input by the EEG data.
The results show that auditory information is in fact found in all EEG channels and is not confined to a small number of electrodes. They also show that using all 64 EEG electrodes to classify the 3 vowels yields a classification rate on the order of 80%, but also that as few as 10 electrodes suffice for satisfactory classification and that, moreover, the position of these electrodes on the scalp matters little.

8

Kwong, Mylène. "Détection de transitoires dans un signal audio." Mémoire, Université de Sherbrooke, 2004. http://savoirs.usherbrooke.ca/handle/11143/1254.

Abstract:

The work presented in this master's thesis concerns the detection of transients in an audio signal. The aim is to extract the rhythm and segmentation of a musical signal. The project is part of the broader framework of an automatic music transcription system. The main challenge lies in detecting all the transients contained in the musical signal, including those masked by higher-energy stationary components. The methods we considered are based on time-frequency decompositions of the signal and on a decomposition of the signal into two components, one stationary and one transient. The first algorithm developed decomposes the signal into frequency bands. The envelope signal of each frequency band is analyzed (by studying its derivative) to identify the positions of the transients. Using a Teager operator (which highlights transients in the signal) on real guitar signals raised the rate of correct detection of high-energy transients from 67% to 92% (barely 50% of all transients), without, however, allowing the detection of masked low-energy transients, which are just as important. The method we considered for detecting as many transients as possible uses a preprocessing step that separates the transient information from the rest of the signal. Adaptive frequency-domain filtering removes the harmonic component from the signal before the transients are localized in time. Evaluated on rich real guitar signals, the method detects 85% of the transients.
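
As an illustration of the Teager operator mentioned in this abstract, here is a minimal sketch. It assumes the standard discrete Teager-Kaiser form, psi[n] = x[n]^2 - x[n-1] * x[n+1]; the abstract does not state which variant the thesis uses, and the function name and toy signal are hypothetical.

```python
def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Assumed standard form; the thesis may use a different variant.
    Defined only for interior samples, so the result has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Toy signal (hypothetical): constant level with one isolated spike.
signal = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
energy = teager_kaiser(signal)

# The largest energy value marks the spike; +1 maps the index back to `signal`.
peak_index = max(range(len(energy)), key=lambda i: energy[i]) + 1
```

The operator is zero on the constant stretches and large at the spike, which is why it can serve as a transient highlighter before any thresholding step.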

9

Anderson, David Verl. "Audio signal enhancement using multi-resolution sinusoidal modeling." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15394.

10

Carlo, Diego Di. "Echo-aware signal processing for audio scene analysis." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S075.

Abstract:

Most audio signal processing methods regard reverberation, and in particular acoustic echoes, as a nuisance. However, echoes convey important spatial and semantic information about sound sources, and echo-aware methods that exploit this have recently emerged. In this work we focus on two directions. First, we study how to estimate acoustic echoes blindly from microphone recordings. Two approaches are proposed, one based on the framework of continuous dictionaries and the other on recent deep learning techniques. Then, we focus on extending existing methods in audio scene analysis to their echo-aware forms. The multichannel NMF framework for audio source separation, the SRP-PHAT localization method, and the MVDR beamformer for speech enhancement are all extended to their echo-aware versions. These applications show how a simple echo model can lead to improved performance.

11

Ekström, Mattias. "Acoustic feedback suppression in audio mixer for PA applications." Thesis, Umeå universitet, Institutionen för fysik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-136841.

Abstract:

When a speaker is addressing an audience, a PA system consisting of a microphone and a loudspeaker is often used. If the microphone picks up too much of the loudspeaker energy, acoustic feedback in the form of an unwanted characteristic howling can occur. Limes Audio is a software company that specializes in improving sound quality in digital communications, mainly conference telephony, and has developed a reference product, the Magneto mixer, to demonstrate the capability of its software TrueVoice. The company now wishes to expand the field of usage for the Magneto mixer so that it can also work as a microphone mixer in PA scenarios, or in conference telephony where in-room sound reinforcement is needed, and for this a feedback suppression feature is needed. This master's thesis aims to lay the groundwork for such a feature by surveying the market and the literature in the field and specifying the requirements for a feedback suppression feature. Three methods for suppressing howling feedback are evaluated through simulations and compared in terms of maximum stable gain (MSG) and subjective listening experience. The method that performed best on these criteria was acoustic feedback cancellation with a 5 Hz frequency shift on the loudspeaker signal. This method uses an adaptive filter to model the acoustic feedback path between loudspeaker and microphone and to remove the feedback component from the microphone signal. In the simulations, the method was able to increase the stable gain by approximately 10 dB while maintaining good sound quality.
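
The adaptive-filter idea described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses a normalized LMS (NLMS) update, which the abstract does not name, a hypothetical 4-tap feedback path, and an open-loop white-noise loudspeaker signal. The 5 Hz frequency shift and the closed loop of a real PA system are not modeled.

```python
import random


def nlms_cancel(loudspeaker, microphone, taps=4, mu=0.5, eps=1e-8):
    """Estimate a FIR feedback path with NLMS and subtract its output.

    Returns the learned weights and the residual (microphone minus the
    estimated feedback component). NLMS is an assumed choice of algorithm.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent loudspeaker samples, newest first
    residual = []
    for u, d in zip(loudspeaker, microphone):
        history = [u] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate
        power = sum(h * h for h in history) + eps  # eps avoids division by zero
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return weights, residual


# Hypothetical feedback path and test signal, for illustration only.
random.seed(0)
true_path = [0.6, -0.3, 0.1, 0.05]
loudspeaker = [random.uniform(-1.0, 1.0) for _ in range(2000)]

# Simulated microphone signal: loudspeaker signal filtered by the true path.
microphone = []
past = [0.0] * len(true_path)
for u in loudspeaker:
    past = [u] + past[:-1]
    microphone.append(sum(h * p for h, p in zip(true_path, past)))

weights, residual = nlms_cancel(loudspeaker, microphone)
```

In this noise-free toy setup the learned weights approach `true_path` and the residual decays toward zero. In a closed loop the loudspeaker and microphone signals are correlated, which is one reason decorrelation steps such as a frequency shift are combined with the adaptive filter.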

12

Papadopoulos, Hélène. "Estimation conjointe d'information de contenu musical d'un signal audio." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2010. http://tel.archives-ouvertes.fr/tel-00548952.

Abstract:

In recent years, enormous online music collections have grown steadily. This phenomenon has attracted the attention of many researchers. Indeed, the urgent need to develop tools and methods for interacting with these huge digital music libraries poses complex scientific challenges. The field of music information retrieval (MIR) has thus become very active over the past decade. This general field includes music indexing, to which this thesis belongs and which aims to support the storage, distribution, and browsing of enormous online music collections. The field opens up many prospects for industry and for research related to multimedia activities. In this thesis, we address the problem of automatically extracting content information from a music audio signal. Most existing work approaches this problem by considering musical attributes independently of one another. However, pieces of music are highly structured in terms of harmony and rhythm, and their estimation should take the musical context into account, as a musician does when analyzing a piece of music. We focus on three musical descriptors related to the harmonic, metrical, and tonal structures of a piece of music. More precisely, we seek to estimate its chord progression, downbeats, and key. The originality of our work lies in building a model that estimates these three musical attributes jointly. Our goal is to show that the various musical descriptors are estimated better when their mutual dependencies are taken into account than when they are estimated independently.
In the course of this work, we propose a set of comparison protocols, performance metrics, and new test databases for evaluating the different methods studied. To validate our approach, we also present the results of our participation in international evaluation campaigns. First, we examine several typical representations of the audio signal in order to choose the one best suited to analyzing the harmonic content of a piece of music. We explore several methods for extracting a chromagram from the signal and compare them using an original evaluation protocol and a new database that we annotated. We detail and explain the reasons that led us to choose the representation used in our model. In our model, chords are treated as a central attribute around which the other musical descriptors are organized. We study the problem of automatically estimating the chord sequence of a piece of audio music using _chromas_ as observations of the signal. We propose several methods based on hidden Markov models (HMMs) that make it possible to take into account elements of music theory, the results of cognitive experiments on the perception of key, and the effect of the harmonics of musical notes. The different methods are evaluated and compared for the first time on a large database of popular music pieces. We then present a new approach that simultaneously estimates the chord progression and the downbeats of a music audio signal. To this end, we propose a specific HMM topology that lets us model the dependence of chords on the metrical structure of a piece.
An important contribution is that our model can be used for complex metrical structures that exhibit, for example, the insertion or omission of a beat, or changes in time signature. The proposed model is evaluated on a large number of popular music pieces with varied metrical structures. We compare the results of a semi-automatic model, in which we use manually annotated beat positions, with those obtained by a fully automatic model in which the beat positions are estimated directly from the signal. Finally, we turn to the question of key. We begin with the problem of estimating the main key of a piece of music. We extend the model presented above to one that simultaneously estimates the chord progression, the downbeats, and the main key. The model's performance is evaluated on selected examples of popular music. We then turn to the more complex problem of estimating the local key of a piece of music. We propose to address this problem by combining and extending several existing approaches to main-key estimation. The specificity of our approach is that we consider the dependence of the local key on the harmonic and metrical structures. We evaluate the results of our model on an original database of classical music pieces that we annotated.

13

Wang, Shuai. "Embedding data in an audio signal, using acoustic OFDM." Thesis, Linköpings universitet, Kommunikationssystem, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-71427.

Abstract:

OFDM technology has been extensively used in many radio communication technologies. For example, OFDM is the core technology applied in WiFi, WiMAX and LTE. Its main advantages include high bandwidth utilization, strong noise immunity and the capability to resist frequency selective fading. However, OFDM technology is not only applied in the field of radio communication, but has also been developed greatly in acoustic communication, namely the so-called acoustic OFDM. Thanks to the acoustic OFDM technology, information can be embedded in audio and then transmitted so that the receiver can obtain the required information through certain demodulation mechanisms without severely affecting the audio quality.
This thesis mainly discusses how to embed and transmit information in audio by making use of acoustic OFDM. Based on the theoretical systematic structure, it also designs a simulation system and a measurement system respectively. In these two systems, channel coding, manners of modulation and demodulation, timing synchronization and parameters of the functional components are configured in the most reasonable way in order to achieve relatively strong stability and robustness of the system. Moreover, power control and the compatibility between audio and OFDM signals are also explained and analyzed in this thesis.
Based on the experimental results, the author analyzes the performance of the system and the factors that affect the performance of the system, such as the type of audio, distance between transmitter and receiver, audio output level and so on. According to this analysis, it is proved that the simulation system can work steadily in any audio of wav format and transmit information correctly. However, due to the hardware limitations of the receiver and sender devices, the measurement system is unstable to a certain degree. Finally, this thesis draws conclusions of the research results and points out unsolved problems in the experiments. Eventually, some expectations for this research orientation are stated and relevant suggestions are proposed.
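
To make the modulation and demodulation steps mentioned in this abstract concrete, here is a minimal sketch of one real-valued OFDM symbol. All parameters are hypothetical (16-point transform, 4-sample cyclic prefix, BPSK on subcarriers 1 to 7); the thesis's actual configuration, channel coding, synchronization, and mixing with the host audio are not reproduced. A naive DFT is used so that the example needs only the standard library.

```python
import cmath


def idft(spectrum):
    """Naive inverse DFT (fine for tiny sizes; a real system would use an FFT)."""
    n_fft = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_fft) for k in range(n_fft)) / n_fft
        for n in range(n_fft)
    ]


def dft(samples):
    """Naive forward DFT."""
    n_fft = len(samples)
    return [
        sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft)
    ]


def ofdm_modulate(bits, n_fft=16, cp_len=4):
    """Map bits to BPSK subcarriers and return one real symbol with cyclic prefix."""
    assert len(bits) == n_fft // 2 - 1
    spectrum = [0j] * n_fft
    for k, bit in enumerate(bits, start=1):
        symbol = 1.0 if bit else -1.0
        spectrum[k] = symbol
        spectrum[n_fft - k] = symbol  # mirrored bin keeps the time signal real
    samples = [s.real for s in idft(spectrum)]
    return samples[-cp_len:] + samples  # cyclic prefix: copy of the symbol's tail


def ofdm_demodulate(samples, n_fft=16, cp_len=4):
    """Drop the cyclic prefix, transform back, and decide each bit by sign."""
    spectrum = dft(samples[cp_len:cp_len + n_fft])
    return [1 if spectrum[k].real > 0 else 0 for k in range(1, n_fft // 2)]


bits = [1, 0, 1, 1, 0, 0, 1]
symbol = ofdm_modulate(bits)
recovered = ofdm_demodulate(symbol)
```

Over an ideal channel the recovered bits equal the transmitted ones. The cyclic prefix only pays off once the signal passes through a channel with echoes, which this sketch does not simulate.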

14

Palmer, Duncan. "Position estimation using the Digital Audio Broadcast (DAB) signal." Thesis, University of Nottingham, 2011. http://eprints.nottingham.ac.uk/12456/.

Abstract:

Over the past decades, a number of trends have driven the desire to improve the ability to navigate in all environments. While the Global Positioning System has been the driving factor behind most of these trends, it has limitations that have become more evident over time as the world has increasingly come to rely on navigation. These limitations are mostly due to the low transmission power of the satellites: navigation signals broadcast from space are comparatively weak, especially by the time they have travelled to receivers on the ground. This makes the signals particularly vulnerable to fading in difficult environments such as "urban jungles" and other built-up areas. The low signal-to-noise ratio (SNR) also means that the signals are susceptible to jamming, both hostile and accidental. This motivates the search for alternatives to satellite navigation, including terrestrial systems such as LORAN-C and eLORAN, but there is also significant interest in the exploitation of other non-navigation signals for positioning and navigation purposes. These so-called 'Signals of Opportunity' do not generally require any alterations to existing communications transmission infrastructure, and they use multi-carrier modulation techniques different from those used by navigation systems. This project examines the use of one such signal, the Digital Audio Broadcast (DAB) signal, as a positioning source. The thesis covers the complete research, from initial coverage simulations in the UK, through extensive static testing, to the use of the signal in a dynamic environment, and shows that the Digital Audio Broadcast signal has potential as a terrestrial positioning signal.

15

Gower, Ephraim. "Mathematical Analysis and Audio Applications in Blind Signal Decomposition." Thesis, University of Essex, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.520088.

16

Wellhausen, Jens [Verfasser]. "Algorithms for Audio Signal Segmentation and Separation / Jens Wellhausen." Aachen : Shaker, 2007. http://d-nb.info/1166510050/34.

17

Savard, Patrick-André. "Méthode hybride de modification de durée d'un signal audio." Mémoire, Université de Sherbrooke, 2008. http://savoirs.usherbrooke.ca/handle/11143/1440.

Abstract:

Research on modifying the duration of an audio signal has been active for more than forty years. Today, several applications make use of it: audiobooks, sample-based synthesis, voice over IP, audio-video post-synchronization, and so on. In most cases, the goal of duration modification techniques is to change the rate of the signal while affecting its perceptual characteristics (such as pitch) as little as possible. Several duration modification algorithms have been proposed to try to reach this goal. They have strengths and weaknesses depending on the type of signal processed. When the signal is of a variable type, all the techniques proposed to date exhibit audible artifacts that affect the overall subjective quality of the original signal. An in-depth study of the state of the art shows that duration modification algorithms fall into two families: techniques applied in the time domain and techniques applied in the frequency domain. Depending on the processing domain chosen (time or frequency), the subjective quality obtained is best for a particular type of signal. For example, time-domain techniques are effective on monophonic signals, whereas frequency-domain techniques perform better on polyphonic or noisy signals. This observation establishes that time-domain and frequency-domain techniques are complementary. It motivates the creation of a duration modification algorithm that takes advantage of this complementarity in order to solve the problem of processing signals composed of several types.
This work presents an innovative method that aims to exploit the complementarity observed between time-domain and frequency-domain techniques. The method introduces the following contributions: (1) The choice of two duration modification techniques, based on a study of the quality of the resulting signal for each type of signal processed. (2) A signal classification step, applied frame by frame, that selects the appropriate technique. (3) Provisions that allow transparent transitions between the chosen techniques. (4) The definition of a set of parameters governing the occurrence and frequency of transitions between the techniques. (5) An adaptation of the improved phase vocoder that yields a synthesis signal of fixed length. The resulting duration modification method is characterized by high subjective quality over a wide range of signals.

18

Xiao, Zhongzhe Chen Liming. "Recognition of emotions in audio signals." Ecully : Ecole Centrale de Lyon, 2008. http://bibli.ec-lyon.fr/exl-doc/zxiao.pdf.

19

Huber, Rainer. "Objective assessment of audio quality using an auditory processing model." [S.l. : s.n.], 2003. http://deposit.ddb.de/cgi-bin/dokserv?idn=971182167.

20

De, Campos Teixeira Gomes Leandro. "Tatouage de signaux audio." Paris 5, 2002. http://www.theses.fr/2002PA05S009.

Abstract:

Audio recordings in digital form can easily be reproduced without any distortion using devices available to the general public. The problem of piracy therefore arises today with new urgency. Audio watermarking has been proposed as a potential solution to this problem. It consists of inserting a mark, the watermark, into an audio signal. This mark must not degrade the perceptual quality of the original signal, but it must be detectable and, in general, indelible. A signal containing a watermark, called a "watermarked" signal, carries data that can, for example, identify the owner and describe the rights granted to the user over the signal. This work concerns the development of new audio watermarking methods and the study of applications. In particular, it presents a public-key watermarking system based on cyclostationarity properties: the cyclic frequency of the watermark allows the user to perform detection. . .

21

Amphlett, Robert W. "Multiprocessor techniques for high quality digital audio." Thesis, University of Bristol, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.337273.

22

Balraj, Navaneethakrishnan. "AUTOMATED ACCIDENT DETECTION IN INTERSECTIONS VIA DIGITAL AUDIO SIGNAL PROCESSING." MSSTATE, 2003. http://sun.library.msstate.edu/ETD-db/theses/available/etd-10212003-102715/.

Abstract:

The aim of this thesis is to design a system for automated accident detection in intersections. The input to the system is a three-second audio signal. The system can be operated in two modes: two-class and multi-class. The output of the two-class system is a label of "crash" or "non-crash". In the multi-class system, the output is the label of "crash" or various non-crash incidents including "pile drive", "brake", and "normal-traffic" sounds. The system designed has three main steps in processing the input audio signal. They are: feature extraction, feature optimization and classification. Five different methods of feature extraction are investigated and compared; they are based on the discrete wavelet transform, fast Fourier transform, discrete cosine transform, real cepstrum transform and Mel frequency cepstral transform. Linear discriminant analysis (LDA) is used to optimize the features obtained in the feature extraction stage by linearly combining the features using different weights. Three types of statistical classifiers are investigated and compared: the nearest neighbor, nearest mean, and maximum likelihood methods. Data collected from Jackson, MS and Starkville, MS and the crash signals obtained from Texas Transportation Institute crash test facility are used to train and test the designed system. The results showed that the wavelet based feature extraction method with LDA and maximum likelihood classifier is the optimum design. This wavelet-based system is computationally inexpensive compared to other methods. The system produced classification accuracies of 95% to 100% when the input signal has a signal-to-noise ratio of at least 0 decibels. These results show that the system is capable of effectively classifying "crash" or "non-crash" on a given input audio signal.
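
Of the three classifiers named in this abstract, the nearest-mean method is the simplest to illustrate. The sketch below is a generic nearest-mean classifier on made-up two-dimensional feature vectors; the feature values, function names, and training set are hypothetical and do not come from the thesis, whose features are derived from wavelet, Fourier, or cepstral transforms and optimized with LDA.

```python
def fit_class_means(features, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_nearest_mean(means, x):
    """Return the label whose class mean is closest in squared Euclidean distance."""
    def squared_distance(mean):
        return sum((a - b) ** 2 for a, b in zip(x, mean))
    return min(means, key=lambda label: squared_distance(means[label]))


# Hypothetical 2-D feature vectors standing in for extracted audio features.
train_x = [[0.9, 0.8], [1.0, 0.7], [0.1, 0.2], [0.2, 0.1]]
train_y = ["crash", "crash", "non-crash", "non-crash"]

means = fit_class_means(train_x, train_y)
label = predict_nearest_mean(means, [0.8, 0.9])
```

The same two functions work unchanged for the multi-class mode, since each additional label simply adds another class mean.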

23

Lindström, Fredric. "Digital signal processing methods and algorithms for audio conferencing systems /." Karlskrona : Department of Signal Processing, School of Engineering, Blekinge Institute of Technology, 2007. http://www.bth.se/fou/Forskinfo.nsf/allfirst2/9cc008f2fa400e82c12572bb00331533?OpenDocument.

24

Papadopoulos, Hélène. "Joint estimation of musical content information[ from an audio signal]." Paris 6, 2010. http://www.theses.fr/2010PA066224.

Abstract:

In this thesis, we address the problem of automatically extracting content information from a music audio signal. The originality of our work is that we propose a model that jointly estimates several musical attributes (chords, downbeats, and key). We examine several typical representations of the audio signal in order to choose the one best suited to analyzing its harmonic content. Using chroma features as observations of the signal, we first propose several methods based on hidden Markov models (HMMs) to estimate the chord sequence. These methods make it possible to take into account elements of music theory, the results of cognitive experiments on the perception of key, and the effect of the harmonics of musical notes. We then present a new approach that simultaneously estimates the chord progression and the downbeats. We propose a specific HMM topology that allows us to model the dependence of chords on the metrical structure of a piece. Our model can be used for complex metrical structures. Finally, we turn to the problem of estimating the main key, as well as the more complex problem of estimating the local key. The specificity of our approach is that we consider the dependence of key on the harmonic and metrical structures. We show that the various musical elements are estimated better when their mutual dependencies are taken into account than when they are estimated independently.

25

Marchand, Ugo. "Caractérisation du rythme à partir de l'analyse du signal audio." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066453/document.

Abstract:

This thesis falls within the field of automatic music analysis (Music Information Retrieval). The goal of this research field is to extract meaningful information from music or, in other words, to make a computer understand what music is. The applications are numerous: building music recommendation systems, transcribing a score from the signal, or generating music automatically. In this manuscript we focus on the automatic analysis of rhythm. Our objective is to propose new rhythm descriptions inspired by perceptual and neurological studies. Representing the rhythm of a musical audio signal is a complex problem. Detecting attack positions and note durations, as on a score, is not sufficient: more generally, we have to model the temporal interaction between the different instruments that together establish a rhythm, in a compact, discriminative, and invariant manner. We seek representations that are invariant to some parameters (such as position in time and small variations in tempo or instrumentation) but sensitive to others (such as the rhythm pattern, fine performance parameters, or swing). We study the three fundamental aspects of rhythm description: tempo, deviations, and rhythm patterns.

26

Tegendal, Lukas. "Watermarking in Audio using Deep Learning." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159191.

Abstract:

Watermarking is a technique used to mark ownership of media such as audio or images by embedding a watermark, e.g. copyright information, into the media. A good watermarking method should perform this embedding without affecting the quality of the media. Recent methods for watermarking in images use deep learning to embed and extract the watermark. In this thesis, we investigate watermarking in the audible frequencies of audio using deep learning. More specifically, we try to create a watermarking method for audio that is robust to noise in the carrier and that allows the embedded watermark to be extracted from the audio after it has been played over the air. The proposed method consists of two deep convolutional neural networks trained end-to-end on music with simulated noise. Experiments show that the proposed method successfully creates watermarks robust to simulated noise with moderate quality reductions, but it is not robust to the real-world noise introduced when the audio is played and recorded over the air.

27

Bando, Yoshiaki. "Robust Audio Scene Analysis for Rescue Robots." Kyoto University, 2018. http://hdl.handle.net/2433/232410.

28

Xiao, Zhongzhe. "Recognition of emotions in audio signals." Ecully, Ecole centrale de Lyon, 2008. http://www.theses.fr/2008ECDL0002.

Abstract:

This Ph.D. thesis is dedicated to automatic emotion and mood recognition in audio signals. Audio emotion is high-level semantic information, and its automatic analysis has many potential applications, such as smart human-computer interaction and multimedia indexing. The purpose of this thesis is thus to investigate machine-based audio emotion analysis solutions for both speech and music signals. Our work makes use of a discrete emotional model combined with a dimensional one and relies upon existing studies on the acoustic correlates of emotional speech and music mood. The key contributions are the following. First, we have proposed, as a complement to popular frequency-based and energy-based features, some new audio features, namely harmonic and Zipf features, to better characterize the timbre and prosodic properties of emotional speech. Second, as very few emotional resources exist for either speech or music for machine learning compared with the number of audio features that one can extract, an evidence theory-based feature selection scheme named Embedded Sequential Forward Selection (ESFS) is proposed to deal with the classic "curse of dimensionality" problem and thus with over-fitting. Third, using a manually built hierarchical classifier based on a dimensional emotion model to deal with the fuzzy borders of emotional states, we demonstrated that a hierarchical classification scheme performs better than the single global classifier mostly used in the literature. Furthermore, as there is no universal agreement on the definition of basic emotions and as emotional states are typically application dependent, we also proposed an ESFS-based algorithm for automatically building a hierarchical classification scheme (HCS) that is best adapted to a specific set of application-dependent emotional states.
The HCS divides a complex classification problem into simpler and smaller problems by combining several binary sub-classifiers in the structure of a binary tree in several stages, and gives the emotional state of each audio sample as its result. Finally, to deal with the subjective nature of emotions, we also proposed an evidence theory-based ambiguous classifier that allows labeling with multiple emotions, as humans often do. The effectiveness of all these recognition techniques was evaluated on the Berlin and DES datasets for emotional speech recognition and on a music mood dataset that we collected in our laboratory, as no public dataset exists so far. Keywords: audio signal, emotion classification, music mood analysis, audio features, feature selection, hierarchical classification, ambiguous classification, evidence theory

29

Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Sherbrooke : Université de Sherbrooke, 2001.

30

Khemiri, Houssemeddine. "Approche générique appliquée à l'indexation audio par modélisation non supervisée." Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0055/document.

Abstract:

The amount of available audio data, such as broadcast news archives, radio recordings, music and song collections, podcasts, advertisements, and various internet media, is constantly increasing. Many audio indexing techniques have therefore been proposed to help users browse and retrieve audio documents. Nevertheless, these methods are each developed for a specific type of audio content, which makes them unsuitable for simultaneously processing audio streams in which different types of audio content coexist. In this thesis we report our work on extending the ALISP approach, initially developed for speech, into a generic method for audio indexing, retrieval, and recognition. The particularity of ALISP tools is that no textual transcriptions or manual annotations are needed during the learning step. Any input audio data is transformed into a sequence of arbitrary symbols, which can be used for indexing purposes. The main contribution of this thesis is the exploitation of the ALISP approach as a generic method for audio indexing. The proposed system consists of three steps: unsupervised training to acquire and model the ALISP HMM models, ALISP segmentation of audio data using the ALISP HMM models, and comparison of ALISP symbols using the BLAST algorithm and the Levenshtein distance. The proposed systems are evaluated on the YACAST database and on other publicly available corpora for several audio indexing tasks.
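The Levenshtein distance mentioned in this abstract as one way of comparing symbol sequences can be sketched as follows. This is a minimal, generic dynamic-programming version with unit costs; the symbol names in the example are hypothetical, and the sketch does not reproduce the BLAST-based search or any cost weighting the thesis may use.

```python
def levenshtein(a, b):
    """Edit distance between two symbol sequences.

    Insertions, deletions, and substitutions each cost 1 (an assumption of
    this sketch). Works on strings or on lists of symbol labels.
    """
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # deleting i symbols of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

# Hypothetical symbol transcriptions of two audio segments.
query = ["Ha", "Hb", "Hc", "Hd"]
reference = ["Ha", "Hc", "Hd"]
print(levenshtein(query, reference))
```

A small distance between two transcriptions suggests that the underlying audio segments are similar, which is what makes the measure usable for approximate matching.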

31

Vemulapalli, Smita. "Audio-video based handwritten mathematical content recognition." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45958.

Abstract:

Recognizing handwritten mathematical content is a challenging problem, and more so when such content appears in classroom videos. However, given that in such videos the handwritten text and the accompanying audio refer to the same content, a combination of video- and audio-based recognizers has the potential to significantly improve content recognition accuracy. This dissertation, using a combination of video- and audio-based recognizers, focuses on improving the recognition accuracy associated with handwritten mathematical content in such videos. Our approach uses a video recognizer as the primary recognizer, and a multi-stage assembly, developed as part of this research, facilitates effective combination with an audio recognizer. Specifically, we address the following challenges related to audio-video based handwritten mathematical content recognition: (1) Video Preprocessing - generates a timestamped sequence of segmented characters from the classroom video in the face of occlusions and shadows caused by the instructor, (2) Ambiguity Detection - determines the subset of input characters that may have been incorrectly recognized by the video-based recognizer and forwards this subset for disambiguation, (3) A/V Synchronization - establishes correspondence between the handwritten character and the spoken content, (4) A/V Combination - combines the synchronized outputs from the video- and audio-based recognizers and generates the final recognized character, and (5) Grammar Assisted A/V Based Mathematical Content Recognition - utilizes a base mathematical speech grammar for both character and structure disambiguation. Experiments conducted using videos recorded in a classroom-like environment demonstrate the significant improvements in recognition accuracy that can be achieved using our techniques.

32

Nishibori, Kento, Yoshinori Takeuchi, Tetsuya Matsumoto, Hiroaki Kudo, and Noboru Ohnishi. "An Active Correspondence of Audio-Visual Events by using Motor signal." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2005. http://hdl.handle.net/2237/10376.

33

Hübner, Sebastian Valentin. "Wissensbasierte Modellierung von Audio-Signal-Klassifikatoren zur Bioakustik von Tursiops truncatus." Potsdam Univ.-Verl, 2006. http://d-nb.info/1000230988/34.

34

Fong, W. N. W. "Model-based methods for linear and non-linear audio signal enhancement." Thesis, University of Cambridge, 2003. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.599095.

Abstract:

Owing to the random nature of audio signals, most of the enhancement methodologies reviewed in this work are based explicitly on a Bayesian model-based approach. Of these, the Kalman filter is the most commonly adopted enhancement strategy for a linear and Gaussian restoration problem. To cope with the general non-linear and non-Gaussian case, different filters such as the extended Kalman filter and the Gaussian sum filter have been proposed in the past few decades. As computing power increases, more computationally expensive simulation-based approaches such as Monte Carlo methods have been suggested. The main focus of this work is on sequential estimation of the underlying clean signal and system parameters given some noisy observations under the Monte Carlo framework. This class of method is known as sequential Monte Carlo methods, also commonly referred to as the particle filter. In this work, different strategies have been developed and described to improve on the generic particle filtering/smoothing algorithm. A block-based particle smoother is proposed to reduce the memory required for processing lengthy datasets, such as audio signals. A Rao-Blackwellised particle smoother is developed to improve on the simulation results by reducing the dimension of the sampling space and thus the estimation variance. To cope with the non-linear restoration problem, a non-linear Rao-Blackwellised particle smoother is developed, which marginalises the parameter state, instead of the signal state as suggested earlier. Finally, we propose an efficient implementation of the suggested slow time-varying model under the sequential Monte Carlo framework for on-line joint signal and parameter estimation.

35

Benjelloun, Touimi Abdellatif. "Traitement du signal audio dans le domaine codé : techniques et applications /." Paris : École nationale supérieure des télécommunications, 2001. http://catalogue.bnf.fr/ark:/12148/cb388319544.

36

Lipstreu, William F. "Digital Signal Processing Laboratory Using Real-Time Implementations of Audio Applications." Cleveland, Ohio : Case Western Reserve University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=case1240836810.

37

Benjelloun, Touimi Abdellatif. "Traitement du signal audio dans le domaine code : techniques et applications." Paris, ENST, 2001. http://www.theses.fr/2001ENST0018.

Abstract:

Conventional manipulation of coded audio streams requires a preliminary decoding operation to extract the time-domain signals, followed by re-encoding after processing. To overcome the drawbacks of this method in terms of algorithmic complexity and delay, this thesis proposes processing directly in the coded domain. We focus on perceptual frequency-domain coders, in particular MPEG-1 Layer I-II and the TDAC coder of FTR&D. The application context used for illustration is sound processing for multipoint teleconferencing. The first processing operation addressed is filtering in the subband domain. A generic method was developed that transposes any rational time-domain filter (FIR or IIR) into this domain, for any maximally decimated filter bank with perfect reconstruction. This method was applied to sound spatialization with HRTF filters in the subband domain. The study of summation of coded streams raises several constraints that depend on the coder considered. For the MPEG-1 coder, the main issue is determining the psychoacoustic parameters needed for bit allocation. The proposed algorithm solves this problem by estimating the masking thresholds of the individual signals to be summed and then recombining them. A bit-rate reduction method is also derived from this algorithm. For the TDAC coder, the reduction in complexity of the summation process relies on the inter-signal masking phenomenon, taking into account the particular structure of this coder. It exploits the embedded structure of the vector quantization codebooks that the coder uses. The value of processing in the coded domain was demonstrated by an implementation in a multipoint teleconferencing audio bridge that provides mixing, correction of erased frames, and handling of discontinuous streams.

38

Olivero, Anaik. "Les multiplicateurs temps-fréquence : Applications à l’analyse et la synthèse de signaux sonores et musicaux." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4788/document.

Abstract:

Analysis/transformation/synthesis is a general paradigm in signal processing that aims at manipulating or generating signals for practical applications. This thesis deals with time-frequency representations obtained with Gabor atoms. In this context, the complexity of a sound transformation can be modeled by a Gabor multiplier. Gabor multipliers are linear diagonal operators acting on signals, and are characterized by a complex-valued time-frequency transfer function called the Gabor mask. Gabor multipliers make it possible to formalize the concept of filtering in the time-frequency domain. Because they act by multiplication in the time-frequency domain, they are a priori well suited to producing sound transformations such as timbre transformations. In the first part, this work models the problem of estimating a Gabor mask between two given signals and provides efficient algorithms to solve it. The Gabor multiplier between two signals is not uniquely defined, and the proposed estimation strategies are able to generate Gabor multipliers that produce signals of satisfactory sound quality. In the second part, we show that a Gabor mask contains relevant information, as it can be viewed as a time-frequency representation of the difference in timbre between two given sounds. By averaging the energy contained in a Gabor mask, we obtain a measure of this difference that makes it possible to discriminate between different musical instrument sounds. We also propose strategies to automatically localize the time-frequency regions responsible for such a timbre dissimilarity between musical instrument classes. Finally, we show that Gabor multipliers can be used to construct many sound morphing trajectories between two sounds, which in some situations can be guided by timbre descriptors, and propose an extension

39

Bland, Denise. "Alias-free signal processing of nonuniformly sampled signals." Thesis, University of Westminster, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.322992.

40

Potard, Guillaume. "3D-audio object oriented coding." Access electronically, 2006. http://www.library.uow.edu.au/adt-NWU/public/adt-NWU20061109.111639/index.html.

41

Lee, Hyeon. "Spatial Audio for Bat Biosonar." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/99833.

Abstract:

Research investigating the behavioral and physiological responses of bats to echoes typically includes analysis of acoustic signals from microphones and/or microphone arrays, using time difference of arrival (TDOA) between array elements or the microphones to locate flying bats (azimuth and elevation). This has provided insight into transmission adaptations with respect to target distance, clutter, and interference. Microphones recording transmitted signals and echoes near a stationary bat provide sound pressure as a function of time but no directional information. This dissertation introduces spatial audio techniques to bat biosonar studies as a complement to the current TDOA-based acoustical study methods. This work proposes two feasible methods based on spatial audio techniques that both track bats in flight and pinpoint the directions of echoes received by a bat. A spatial audio/soundfield microphone array is introduced to measure sounds in the sonar frequency range (20-80 kHz) of the big brown bat (Eptesicus fuscus). The custom-built ultrasonic tetrahedral soundfield microphone consists of four capacitive microphones that were calibrated to match magnitude and phase responses using a transfer function approach. Ambisonics, a signal processing technique used in three-dimensional (3D) audio applications, is used for the basic processing and reproduction of the signals measured by the soundfield microphone. Ambisonics provides syntheses and decompositions of a signal containing its directional properties, using the relationship between the spherical harmonics and the directional properties. As the first proposed method, a spatial audio decoding technique called HARPEx (High Angular Resolution Planewave Expansion) was used to build a system providing angle and elevation estimates. HARPEx can estimate the direction of arrival (DOA) for up to two simultaneous sources since it decomposes a signal into two dominant planewaves.
Experiments showed that the estimation system based on HARPEx provides accurate DOA estimates of static or moving sources. It also reconstructed a smooth flight path of a bat by accurately estimating its direction at each snapshot of pulse measurements in time. The performance of the system was also assessed using statistical analyses of simulations. A signal model was built to generate microphone capsule responses to a virtual source emitting an LFM signal (3 ms, two harmonics: 40-22 kHz and 80-44 kHz) at an angle of 30° in the simulations. Medians and RMSEs (root-mean-square errors) of 10,000 simulations for each case represent the accuracy and precision of the estimations, respectively. Results show that a lower d (distance between a capsule and the soundfield microphone center) and/or a higher SNR (signal-to-noise ratio) is required to achieve higher estimator performance. The Cramér-Rao lower bounds (CRLB) of the estimator are also computed for various d and SNR conditions. The CRLB, which applies to TDOA-based methods, does not account for the effects on the estimation system of different incident angles to the capsules and of signal delays between the capsules due to a non-zero d. This shows that the CRLB is not a proper tool for assessing the estimator performance. For the second proposed method, the matched-filter technique is used instead of HARPEx to build another estimation system. The signal processing algorithm based on Ambisonics and the matched-filter approach reproduces a measured signal in various directions and computes matched-filter responses of the reproduced signals as time series. The highest matched-filter response indicates the target(s). This is a sonar-like estimation system that provides information about the target (range, direction, and velocity) using sonar fundamentals.
Experiments using a loudspeaker (emitter) and an artificial or natural target (either stationary or moving) show that the system provides accurate estimates of the target's direction and range. Simulations imitating a situation in which a bat emits a pulse and receives an echo from a target (30°) were also performed. The echo sound level is determined using the sonar equation. The system processed the virtual bat pulse and echo, and accurately estimated the direction, range, and velocity of the target. The simulation results also suggest that an echo level above -3 dB is needed for accurate and precise estimations (below 15% RMSE for all parameters). This work proposes two methods to track bats in flight and/or pinpoint the directions of targets using spatial audio techniques. The suggested methods provide accurate estimates of the direction, range, and/or velocity of a bat based on its pulses or of a target based on echoes. This demonstrates that these methods can be used as key tools to reconstruct bat biosonar. They could also serve as an independent tool or as a complement to TDOA-based methods for bat echolocation studies. The developed methods are also believed to be useful in improving man-made sonar technology.
Doctor of Philosophy
While bats are one of the most intriguing creatures to the general population, they are also a popular subject of study in various disciplines. Their extraordinary ability to navigate and forage irrespective of clutter using echolocation has attracted attention from many scientists and engineers. Research investigating bats typically includes analysis of acoustic signals from microphones and/or microphone arrays. Using the time difference of arrival (TDOA) between the array elements or the microphones is probably the most popular method to locate flying bats (azimuth and elevation). Microphone responses to transmitted signals and echoes near a bat provide sound pressure but no directional information. This dissertation proposes a complement to the current TDOA methods that delivers directional information by introducing spatial audio techniques. This work presents two feasible methods based on spatial audio techniques that can both track bats in flight and pinpoint the directions of echoes received by a bat. An ultrasonic tetrahedral soundfield microphone is introduced as a measurement tool for sounds in the sonar frequency range (20-80 kHz) of the big brown bat (Eptesicus fuscus). Ambisonics, a signal processing technique used in three-dimensional (3D) audio applications, is used for the basic processing of the signals measured by the soundfield microphone. Ambisonics also reproduces a measured signal containing its directional properties. As the first method, a spatial audio decoding technique called HARPEx (High Angular Resolution Planewave Expansion) was used to build a system providing angle and elevation estimates. HARPEx can estimate the directions of arrival (DOA) for up to two simultaneous sound sources. Experiments showed that the estimation system based on HARPEx provides accurate DOA estimates of static or moving sources. The performance of the system was also assessed using statistical analyses of simulations.
Medians and RMSEs (root-mean-square errors) of 10,000 simulations for each simulation case represent the accuracy and precision of the estimations, respectively. Results show that a shorter distance between a capsule and the soundfield microphone center and/or a higher SNR (signal-to-noise ratio) is required to achieve higher performance. For the second method, the matched-filter technique is used to build another estimation system. This is a sonar-like estimation system that provides information about the target (range, direction, and velocity) using matched-filter responses and sonar fundamentals. Experiments using a loudspeaker (emitter) and an artificial or natural target (either stationary or moving) show the system provides accurate estimates of the target's direction and range. Simulations imitating a situation where a bat emits a pulse and receives an echo from a target (30°) were also performed. The system processed the virtual bat pulse and echo, and accurately estimated the direction, range, and velocity of the target. The suggested methods provide accurate estimates of the direction, range, and/or velocity of a bat based on its pulses or of a target based on echoes. This demonstrates these methods can be used as key tools to reconstruct bat biosonar. They would also be an independent tool or a complementary option to TDOA-based methods for bat echolocation studies. The developed methods are also believed to be useful in improving sonar technology.
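
The matched-filter ranging idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the dissertation's Ambisonics-based system: it only correlates a single-channel recording with one LFM harmonic (3 ms, 40-22 kHz, as quoted in the abstract) and converts the delay of the strongest response into a range. The sampling rate, the speed of sound (343 m/s), the assumption that the pulse is emitted at sample 0, and all function names are illustrative assumptions.

```python
import math


def lfm_chirp(f_start, f_end, duration, fs):
    """Linear-frequency-modulated pulse sweeping from f_start to f_end (Hz)."""
    n = int(round(duration * fs))
    sweep_rate = (f_end - f_start) / duration  # Hz per second (negative for a down-sweep)
    return [math.sin(2 * math.pi * (f_start * t + 0.5 * sweep_rate * t * t))
            for t in (i / fs for i in range(n))]


def matched_filter(received, template):
    """Cross-correlation: response[lag] = sum_i received[lag + i] * template[i]."""
    m = len(template)
    return [sum(received[lag + i] * template[i] for i in range(m))
            for lag in range(len(received) - m + 1)]


def estimate_range(received, template, fs, speed_of_sound=343.0):
    """Range of the strongest echo, assuming the pulse was emitted at sample 0."""
    response = matched_filter(received, template)
    best_lag = max(range(len(response)), key=lambda i: abs(response[i]))
    round_trip_time = best_lag / fs
    return 0.5 * speed_of_sound * round_trip_time  # halve: sound travels out and back
```

For example, an echo delayed by 400 samples at a 200 kHz sampling rate corresponds to a 2 ms round trip, that is, a target about 0.343 m away under the assumed speed of sound. Direction and velocity estimation, which the abstract also reports, are outside the scope of this sketch.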

42

Lam, Vicky Yin Hay. "Audio signal compression and modelling using psychoacoustic excitation pattern and loudness models." Thesis, University of Strathclyde, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.248501.

43

Oudre, Laurent. "Reconnaissance d'accords à partir de signaux audio par l'utilisation de gabarits théoriques." Phd thesis, Télécom ParisTech, 2010. http://pastel.archives-ouvertes.fr/pastel-00542840.

Abstract:

This thesis falls within the field of music signal processing and focuses on the automatic transcription of audio signals into chords. Over the past ten years or so, many studies have sought to represent music signals as compactly and meaningfully as possible, for example for indexing or similarity search. Chord transcription is a simple and robust way to extract the harmonic and rhythmic information of songs, and musicians can use it to play the pieces back. We propose two approaches to automatic chord recognition from audio signals, both of which rely solely on theoretical chord templates, that is, on the definition of the chords. In particular, our systems require neither specific knowledge of the harmony of the piece nor training. Our first approach is deterministic and combines theoretical chord templates, measures of fit, and post-processing by filtering. Chroma vectors are first extracted from the music signal and then compared with the chord templates using several measures of fit. The resulting recognition criterion is then filtered to take the temporal aspect of the task into account. The chord finally detected on each frame is the one that minimizes the recognition criterion. This method was presented at an international evaluation (MIREX 2009), where it obtained very creditable results. Our second approach is probabilistic and reuses some elements of our deterministic method. By drawing a parallel between the measures of fit used in the deterministic approach and probability models, a probabilistic framework for chord recognition can be defined. In this framework, the probabilities of each chord in the piece are evaluated with an Expectation-Maximization (EM) algorithm. The result is the detection, for each song, of an adapted chord vocabulary, which yields a better chord transcription. This method is compared with many state-of-the-art systems on several corpora and with several metrics, allowing a complete evaluation of the different aspects of the task.
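
The deterministic pipeline described in this abstract (chroma vectors compared with theoretical chord templates, a filtered recognition criterion, and a per-frame minimum) can be sketched as follows. This is an illustration under stated assumptions, not the author's system: the 24 binary major/minor triad templates, the Euclidean distance as the measure of fit, the median filter, and every function name are choices made here for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_templates():
    """24 binary 12-bin templates: a major and a minor triad on each root (assumed vocabulary)."""
    templates = {}
    for root in range(12):
        for suffix, third in (("", 4), ("m", 3)):
            bins = [0.0] * 12
            for interval in (0, third, 7):
                bins[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = bins
    return templates


def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fit(chroma, template):
    """Assumed measure of fit: Euclidean distance between unit-norm vectors (lower is better)."""
    c, t = normalize(chroma), normalize(template)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, t)))


def median_filter(values, width):
    """Sliding median over time; windows are simply truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = sorted(values[max(0, i - half): i + half + 1])
        out.append(window[len(window) // 2])
    return out


def recognize(chroma_frames, width=3):
    """Return, for each frame, the chord whose filtered criterion is smallest."""
    templates = chord_templates()
    names = list(templates)
    criterion = [[fit(frame, templates[name]) for frame in chroma_frames] for name in names]
    smoothed = [median_filter(row, width) for row in criterion]
    return [names[min(range(len(names)), key=lambda k: smoothed[k][n])]
            for n in range(len(chroma_frames))]
```

Because the criterion is filtered along time before the minimum is taken, a single outlier frame surrounded by frames of another chord is absorbed into its neighbours' label, which is one way of accounting for the temporal aspect the abstract mentions.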

44

Parekh, Sanjeel. "Learning representations for robust audio-visual scene analysis." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT015/document.

Abstract:

L'objectif de cette thèse est de concevoir des algorithmes qui permettent la détection robuste d’objets et d’événements dans des vidéos en s’appuyant sur une analyse conjointe de données audio et visuelle. Ceci est inspiré par la capacité remarquable des humains à intégrer les caractéristiques auditives et visuelles pour améliorer leur compréhension de scénarios bruités. À cette fin, nous nous appuyons sur deux types d'associations naturelles entre les modalités d'enregistrements audiovisuels (réalisés à l'aide d'un seul microphone et d'une seule caméra), à savoir la corrélation mouvement/audio et la co-occurrence apparence/audio. Dans le premier cas, nous utilisons la séparation de sources audio comme application principale et proposons deux nouvelles méthodes dans le cadre classique de la factorisation par matrices non négatives (NMF). L'idée centrale est d'utiliser la corrélation temporelle entre l'audio et le mouvement pour les objets / actions où le mouvement produisant le son est visible. La première méthode proposée met l'accent sur le couplage flexible entre les représentations audio et de mouvement capturant les variations temporelles, tandis que la seconde repose sur la régression intermodale. Nous avons séparé plusieurs mélanges complexes d'instruments à cordes en leurs sources constituantes en utilisant ces approches.Pour identifier et extraire de nombreux objets couramment rencontrés, nous exploitons la co-occurrence apparence/audio dans de grands ensembles de données. Ce mécanisme d'association complémentaire est particulièrement utile pour les objets où les corrélations basées sur le mouvement ne sont ni visibles ni disponibles. 
Le problème est traité dans un contexte faiblement supervisé dans lequel nous proposons un framework d’apprentissage de représentation pour la classification robuste des événements audiovisuels, la localisation des objets visuels, la détection des événements audio et la séparation de sources.Nous avons testé de manière approfondie les idées proposées sur des ensembles de données publics. Ces expériences permettent de faire un lien avec des phénomènes intuitifs et multimodaux que les humains utilisent dans leur processus de compréhension de scènes audiovisuelles
The goal of this thesis is to design algorithms that enable robust detection of objects and events in videos through joint audio-visual analysis. This is motivated by humans' remarkable ability to meaningfully integrate auditory and visual characteristics for perception in noisy scenarios. To this end, we identify two kinds of natural associations between the modalities in recordings made using a single microphone and camera, namely motion-audio correlation and appearance-audio co-occurrence. For the former, we use audio source separation as the primary application and propose two novel methods within the popular non-negative matrix factorization framework. The central idea is to utilize the temporal correlation between audio and motion for objects/actions where the sound-producing motion is visible. The first proposed method focuses on soft coupling between audio and motion representations capturing temporal variations, while the second is based on cross-modal regression. We segregate several challenging audio mixtures of string instruments into their constituent sources using these approaches. To identify and extract many commonly encountered objects, we leverage appearance-audio co-occurrence in large datasets. This complementary association mechanism is particularly useful for objects where motion-based correlations are not visible or available. The problem is dealt with in a weakly-supervised setting wherein we design a representation learning framework for robust AV event classification, visual object localization, audio event detection and source separation. We extensively test the proposed ideas on publicly available datasets. The experiments demonstrate several intuitive multimodal phenomena that humans utilize on a regular basis for robust scene understanding.
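
As background for the non-negative matrix factorization (NMF) framework this abstract builds on, here is a minimal sketch of plain NMF with the standard multiplicative updates for the squared Euclidean error. It does not include the audio-motion coupling or cross-modal regression proposed in the thesis; the random initialisation, the small constant `eps`, and the function names are assumptions made for the example.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (e.g. a magnitude spectrogram) as V ~ W H."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h
```

In a source separation setting, the columns of W are usually read as spectral patterns and the rows of H as their activations over time; grouping components per source is the step where side information such as motion would come in.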

45

Adistambha, Kevin. "Embedded lossless audio coding using linear prediction and cascade coding." Access electronically, 2005. http://www.library.uow.edu.au/adt-NWU/public/adt-NWU20060724.122433/index.html.

46

TAKEDA, Kazuya, Takanori NISHINO, and Kenta NIWA. "Selective Listening Point Audio Based on Blind Signal Separation and Stereophonic Technology." Institute of Electronics, Information and Communication Engineers, 2009. http://hdl.handle.net/2237/15055.

47

Fan, Yun-Hui. "A stereo audio coder with a nearly constant signal-to-noise ratio." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/14788.

48

Alameda-Pineda, Xavier. "Egocentric Audio-Visual Scene Analysis : a machine learning and signal processing approach." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENM024/document.

Abstract:

Depuis les vingt dernières années, l'industrie a développé plusieurs produits commerciaux dotés de capacités auditives et visuelles. La grand majorité de ces produits est composée d'un caméscope et d'un microphone embarqué (téléphones portables, tablettes, etc). D'autres, comme la Kinect, sont équipés de capteurs de profondeur et/ou de petits réseaux de microphones. On trouve également des téléphones portables dotés d'un système de vision stéréo. En même temps, plusieurs systèmes orientés recherche sont apparus (par exemple, le robot humanoïde NAO). Du fait que ces systèmes sont compacts, leurs capteurs sont positionnés près les uns des autres. En conséquence, ils ne peuvent pas capturer la scène complète, mais qu'un point de vue très particulier de l'interaction sociale en cours. On appelle cela "Analyse Égocentrique de Scènes Audio-Visuelles''.Cette thèse contribue à cette thématique de plusieurs façons. D'abord, en fournissant une base de données publique qui cible des applications comme la reconnaissance d'actions et de gestes, localisation et suivi d'interlocuteurs, analyse du tour de parole, localisation de sources auditives, etc. Cette base a été utilisé en dedans et en dehors de cette thèse. Nous avons aussi travaillé le problème de la détection d'événements audio-visuels. Nous avons montré comme la confiance en une des modalités (issue de la vision en l'occurrence), peut être modélisée pour biaiser la méthode, en donnant lieu à un algorithme d'espérance-maximisation visuellement supervisé. Ensuite, nous avons modifié l'approche pour cibler la détection audio-visuelle d'interlocuteurs en utilisant le robot humanoïde NAO. En parallèle aux travaux en détection audio-visuelle d'interlocuteurs, nous avons développé une nouvelle approche pour la reconnaissance audio-visuelle de commandes. 
Nous avons évalué la qualité de plusieurs indices et classeurs, et confirmé que l'utilisation des données auditives et visuelles favorise la reconnaissance, en comparaison aux méthodes qui n'utilisent que l'audio ou que la vidéo. Plus tard, nous avons cherché la meilleure méthode pour des ensembles d'entraînement minuscules (5-10 observations par catégorie). Il s'agit d'un problème intéressant, car les systèmes réels ont besoin de s'adapter très rapidement et d'apprendre de nouvelles commandes. Ces systèmes doivent être opérationnels avec très peu d'échantillons pour l'usage publique. Pour finir, nous avons contribué au champ de la localisation de sources sonores, dans le cas particulier des réseaux coplanaires de microphones. C'est une problématique importante, car la géométrie du réseau est arbitraire et inconnue. En conséquence, cela ouvre la voie pour travailler avec des réseaux de microphones dynamiques, qui peuvent adapter leur géométrie pour mieux répondre à certaines tâches. De plus, la conception des produits commerciaux peut être contrainte de façon que les réseaux linéaires ou circulaires ne sont pas bien adaptés
Over the past two decades, the industry has developed several commercial products with audio-visual sensing capabilities. Most of them consist of a video camera with an embedded microphone (mobile phones, tablets, etc.). Others, such as Kinect, include depth sensors and/or small microphone arrays. There are also some mobile phones equipped with a stereo camera pair. At the same time, many research-oriented systems became available (e.g., humanoid robots such as NAO). Since all these systems are small in volume, their sensors are close to each other. Therefore, they are not able to capture the global scene, but only one point of view of the ongoing social interplay. We refer to this as "Egocentric Audio-Visual Scene Analysis". This thesis contributes to this field in several respects. Firstly, it provides a publicly available data set targeting applications such as action/gesture recognition, speaker localization, tracking and diarisation, sound source localization, dialogue modelling, etc. This data set has since been used both within and outside the thesis. We also investigated the problem of AV event detection. We showed how the trust in one of the modalities (visual, to be precise) can be modeled and used to bias the method, leading to a visually-supervised EM algorithm (ViSEM). Afterwards we modified the approach to target audio-visual speaker detection, yielding an on-line method running on the humanoid robot NAO. In parallel to the work on audio-visual speaker detection, we developed a new approach for audio-visual command recognition. We explored different features and classifiers and confirmed that the use of audio-visual data increases the performance when compared to auditory-only and to video-only classifiers. Later, we sought the best method using tiny training sets (5-10 samples per class). This is interesting because real systems need to adapt and learn new commands from the user.
Such systems need to be operational with only a few examples for general public use. Finally, we contributed to the field of sound source localization, in the particular case of non-coplanar microphone arrays. This is interesting because the geometry of the microphone array can be arbitrary. Consequently, this opens the door to dynamic microphone arrays that would adapt their geometry to fit some particular tasks. It is also relevant because the design of commercial systems may be subject to certain constraints for which circular or linear arrays are not suited.

49

Paraskevas, Ioannis. "Phase as a feature extraction tool for audio classification and signal localisation." Thesis, University of Surrey, 2005. http://epubs.surrey.ac.uk/843856/.

Abstract:

The aim of this research is to demonstrate the significance of signal phase content in time localization issues in synthetic signals and in the extraction of appropriate features from acoustically similar audio recordings (non-synthetic signals) for audio classification purposes. Published work relating to audio classification tends to be focused on the discrimination of audio classes that are dissimilar acoustically. Consequently, a wide range of features, extracted from the audio recordings, has been appropriate for the classification task. In this research, the audio classification application involves audio recordings (digitized through the same pre-processing conditions) that are acoustically similar and hence only a few features are appropriate, due to the similarity amongst the classes. The difficulties in processing the phase spectrum of a signal have probably led previous researchers to avoid its investigation. In this research, the sources of these difficulties are studied and certain methods are employed to overcome them. Subsequently, the phase content of the signal has been found to be useful for various applications. This is demonstrated via audio classification (non-synthetic signals) and time localization (synthetic signals) applications. In summary, the original contributions of this research work are the 'whitened' Hartley spectrum and its short-time analysis, as well as the use of the Hartley phase cepstrum as a time localization tool and the use of phase-related feature vectors for the audio classification application.
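
For readers unfamiliar with the Hartley transform named in this abstract, the sketch below computes the standard discrete Hartley transform (DHT) directly from its definition. It does not implement the thesis's 'whitened' Hartley spectrum or Hartley phase cepstrum, whose definitions are not given in the abstract; the function names are illustrative.

```python
import math


def dht(x):
    """Discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*k*n/N), with cas = cos + sin."""
    n = len(x)
    return [sum(x[i] * (math.cos(2 * math.pi * k * i / n) + math.sin(2 * math.pi * k * i / n))
                for i in range(n))
            for k in range(n)]


def idht(h):
    """The DHT is its own inverse up to a factor 1/N."""
    n = len(h)
    return [value / n for value in dht(h)]
```

A unit impulse at sample d transforms to H[k] = cas(2πkd/N), a real-valued sequence whose oscillation rate across k grows with d. This is the sense in which the position of an event in time is carried by the phase-like structure of the spectrum rather than by its magnitude.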

50

Trinkaus, Trevor R. "Perceptual coding of audio and diverse speech signals." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13883.
