Dissertations / Theses on the topic 'Audio analysis'
Consult the top 50 dissertations / theses for your research on the topic 'Audio analysis.'
CHEMLA, ROMEU SANTOS AXEL CLAUDE ANDRÉ. "MANIFOLD REPRESENTATIONS OF MUSICAL SIGNALS AND GENERATIVE SPACES." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/700444.
Among the diverse research fields within computer music, the synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving with both musical practices and scientific/technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms has earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. In particular, a family of machine learning methods, called generative models, focuses on the generation of original content using features extracted from an existing dataset. Such methods question not only previous approaches to generation, but also the way of integrating these methods into existing creative processes. While these new generative frameworks are progressively being introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work inserts itself in both practices and what can be expected from their combination. Subsequently, we focus more closely on generative models, and on how modern advances in the domain can be exploited to learn complex sound distributions while remaining sufficiently flexible to be integrated into the creative flow of the user. We then propose an inference/generation process, mirroring the analysis/synthesis paradigms that are natural in the audio processing domain, using latent models based on a continuous higher-level space that we use to control the generation. We first provide preliminary results of our method applied to spectral information extracted from several datasets, and evaluate the obtained results both qualitatively and quantitatively. Subsequently, we study how to make these methods more suitable for learning audio data, tackling three different aspects in turn. First, we propose two latent regularization strategies specifically designed for audio, based on signal/symbol translation and on perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, which allow the obtained generative spaces to also model the dynamics of the signal. In a last chapter, we shift from a scientific approach to a more research & creation-oriented point of view: first, we describe the architecture and design of our open-source library, vsacids, which aims to be used by expert and non-expert music makers alike as an integrated creation tool.
Then, we propose a first musical use of our system through the creation of a real-time performance, called aego, based jointly on our framework vsacids and on an explorative agent trained with reinforcement learning during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as on possible further creative applications.
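The inference/generation loop described in this abstract can be illustrated with a toy latent model. Below is a minimal sketch, assuming a variational autoencoder over magnitude-spectrum frames; the layer sizes, the spectral front end, and all names are illustrative assumptions, not the vsacids implementation:

```python
# Toy illustration of the inference ("analysis") / generation ("synthesis")
# idea: a VAE that maps spectral frames to a low-dimensional latent space
# and back. Hypothetical sketch -- not the vsacids library.
import torch
import torch.nn as nn

class SpectralVAE(nn.Module):
    def __init__(self, n_bins=513, n_latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bins, 256), nn.ReLU())
        self.mu = nn.Linear(256, n_latent)        # latent mean
        self.logvar = nn.Linear(256, n_latent)    # latent log-variance
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 256), nn.ReLU(),
            nn.Linear(256, n_bins), nn.Softplus()  # non-negative magnitudes
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.decoder(z), mu, logvar

model = SpectralVAE()
z = torch.randn(1, 16)       # a point in the continuous generative space
frame = model.decoder(z)     # decode it into one synthesized spectral frame
```

Moving continuously through such a latent space gives the kind of high-level control over generation that the abstract describes.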
TERENZI, Alessandro. "Innovative Digital Signal Processing Methodologies for Identification and Analysis of Real Audio Systems." Doctoral thesis, Università Politecnica delle Marche, 2021. http://hdl.handle.net/11566/287822.
Many real-world audio systems exist; each has its own characteristics, but almost all of them share the ability to generate or modify a sound. If a natural or artificial system can be defined as a sound system, then digital signal processing techniques can be applied to study and emulate it. In this thesis, innovative methodologies for digital signal processing applied to real audio systems are discussed. In particular, three different audio systems are considered: vacuum-tube nonlinear audio devices, with particular attention to guitar and hi-fi amplifiers; the room acoustic environment and its effect on sound propagation; and finally the sound emitted by honey bees in a beehive. Regarding the first system, innovative approaches for the identification of Volterra series and Hammerstein models are proposed, in particular an approach to overcome some limitations of Volterra series identification. The application of a sub-band structure to reduce the computational cost and increase the convergence speed of adaptive Hammerstein model identification is proposed as well. Finally, an innovative approach for measuring several distortion parameters from a single measurement, exploiting a generalized Hammerstein model, is presented. For the second system, the results of applying a multi-point equalizer in two different situations are presented. In the first case, it is shown how multi-point equalization can be used not only to compensate the acoustic anomalies of a room, but also to improve the frequency response of vibrating transducers mounted on a rigid surface. The second contribution shows how a sub-band approach can reduce the computational cost and improve the speed of an adaptive algorithm for a multi-point, multi-channel equalizer. Finally, the focus turns to a natural sound system, i.e., a honey bee colony. Here, an innovative acquisition system for honey bee sound monitoring is presented; the approaches developed for sound analysis are then described and applied to the recorded sounds in two different situations, and the results obtained with classification algorithms are reported. In the final part of the work, some minor contributions still related to signal processing applied to real sound systems are presented: an implementation of an active noise control system is discussed, along with two algorithms for digital effects, the former improving the sound performance of compact loudspeakers and the latter generating a stereophonic effect for electric guitars.
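To make the Hammerstein part concrete: a generalized Hammerstein model is a bank of parallel branches, each a static power nonlinearity followed by a linear filter, and when both input and output are observed its filters can be identified by ordinary least squares. The sketch below uses assumed branch and filter orders and is a generic illustration, not the identification algorithms proposed in the thesis:

```python
# Identify a generalized Hammerstein model y = sum_k h_k * (x**k) by linear
# least squares, given observed input x and output y. Orders are assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)

# Simulated "true" system: cubic nonlinearity into a short FIR filter.
h_true = np.array([1.0, 0.5, 0.2])
y = np.convolve(x + 0.3 * x**2 + 0.1 * x**3, h_true)[:len(x)]

K, L = 3, 8  # polynomial order and per-branch FIR length (illustrative)
# Regressor matrix: delayed copies of x**k for each branch k and delay d.
cols = [np.concatenate([np.zeros(d), (x**k)[:len(x) - d]])
        for k in range(1, K + 1) for d in range(L)]
Phi = np.stack(cols, axis=1)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # all branch filters at once
print("residual power:", np.mean((y - Phi @ theta) ** 2))  # ~0 here
```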
Djebbar, Fatiha. "Contributions to Audio Steganography : Algorithms and Robustness Analysis." Thesis, Brest, 2012. http://www.theses.fr/2012BRES0005.
Digital steganography is a young, flourishing science that has emerged as a prominent source of data security. The primary goal of steganography is to reliably send hidden information secretly, not merely to obscure its presence. It exploits the characteristics of digital media files such as image, audio, video, and text, utilizing them as carriers to secretly communicate data. Encryption and watermarking techniques are already used to address concerns related to data security. However, constantly changing attacks on the integrity of digital data require new techniques to break the cycle of malicious attempts and expand the scope of involved applications. The main objective of steganographic systems is to provide secure, undetectable, and imperceptible ways to conceal a high rate of data in a digital medium. Steganography is used under the assumption that it will not be detected if no one is attempting to uncover it. Steganography techniques have found their way into various and versatile applications; some are used for the benefit of people, others maliciously. The threat posed by criminals, hackers, terrorists, and spies using steganography is indeed real. To defeat malicious attempts at secret communication, researchers' work has lately been extended to include a new and parallel research branch that countermeasures steganography techniques, called steganalysis. The main purpose of a steganalysis technique is to detect the presence or absence of a hidden message, not necessarily its successful extraction. Digital speech, in particular, constitutes a prominent carrier for data hiding across novel telecommunication technologies such as covert voice-over-IP, audio conferencing, etc. This thesis investigates digital speech steganography and steganalysis and aims at: (1) presenting an algorithm that meets the high data capacity, undetectability, and imperceptibility requirements of steganographic systems; (2) controlling the distortion induced by the embedding process; (3) presenting new concepts of spectral embedding areas in the Fourier domain, applicable to both magnitude and phase spectra; and (4) introducing a simple yet effective speech steganalysis algorithm based on lossless data compression techniques. The steganographic algorithm's performance is measured by perceptual and statistical evaluation methods, while the steganalysis algorithm's performance is measured by how well the system can distinguish between stego- and cover-audio signals. The results are very promising and show interesting performance trade-offs compared to related methods. Future work is based mainly on strengthening the proposed steganalysis algorithm to detect small hiding capacities, integrating the steganographic algorithm into emerging devices such as the iPhone, and further enhancing its capabilities to ensure hidden-data integrity under severe compression, noise, and channel distortion.
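For readers new to the area, the embed/extract round trip at the heart of any steganographic system can be shown with the classic time-domain LSB scheme below; this is a textbook illustration only, not the Fourier-domain magnitude/phase embedding proposed in the thesis:

```python
# Toy LSB audio steganography: hide one bit in the least significant bit of
# each 16-bit sample. Illustrative textbook scheme, not the thesis method.
import numpy as np

def embed(samples, bits):
    stego = samples.copy()
    stego[:len(bits)] = (stego[:len(bits)] & ~1) | bits  # overwrite the LSBs
    return stego

def extract(stego, n_bits):
    return stego[:n_bits] & 1                            # read the LSBs back

cover = np.random.default_rng(1).integers(-2**15, 2**15, 1000).astype(np.int16)
bits = np.unpackbits(np.frombuffer(b"hidden", dtype=np.uint8)).astype(np.int16)
recovered = extract(embed(cover, bits), len(bits))
assert bytes(np.packbits(recovered.astype(np.uint8))) == b"hidden"
```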
Kafentzis, George. "Adaptive Sinusoidal Models for Speech with Applications in Speech Modifications and Audio Analysis." Thesis, Rennes 1, 2014. http://www.theses.fr/2014REN1S085/document.
Sinusoidal modeling is one of the most widely used parametric methods for speech and audio signal processing. The accurate estimation of sinusoidal parameters (amplitudes, frequencies, and phases) is a critical task for a close representation of the analyzed signal. In this thesis, building on recent advances in sinusoidal analysis, we propose high-resolution adaptive sinusoidal models for speech analysis, synthesis, and modification systems. Our goal is to provide systems that represent speech in a highly accurate and compact way. Inspired by the recently introduced adaptive Quasi-Harmonic Model (aQHM) and adaptive Harmonic Model (aHM), we review the theory of adaptive sinusoidal modeling and propose a model named the extended adaptive Quasi-Harmonic Model (eaQHM), a non-parametric model able to adjust the instantaneous amplitudes and phases of its basis functions to the underlying time-varying characteristics of the speech signal, thus significantly alleviating the so-called local stationarity hypothesis. The eaQHM is shown to outperform the aQHM in the analysis and resynthesis of voiced speech. Based on the eaQHM, a hybrid analysis/synthesis system for speech is presented (eaQHNM), along with a hybrid version of the aHM (aHNM). Moreover, we motivate a full-band representation of speech using the eaQHM, that is, representing all parts of speech as high-resolution AM-FM sinusoids. Experiments show that adaptation and quasi-harmonicity are sufficient to provide transparent quality in unvoiced speech resynthesis. The full-band eaQHM analysis and synthesis system is presented next; it outperforms state-of-the-art systems, hybrid or full-band, in speech reconstruction, providing transparent quality confirmed by objective and subjective evaluations. Regarding applications, the eaQHM and the aHM are applied to speech modifications (time and pitch scaling). The resulting modifications are of high quality and follow very simple rules compared to other state-of-the-art modification systems. The results show that harmonicity is preferred over quasi-harmonicity in speech modifications due to the embedded simplicity of representation. Moreover, the full-band eaQHM is applied to the problem of modeling audio signals, specifically musical instrument sounds. The eaQHM is evaluated and compared to state-of-the-art systems and is shown to outperform them in terms of resynthesis quality, successfully representing the attack, transient, and stationary parts of a musical instrument sound. Finally, another application is suggested, namely the analysis and classification of emotional speech. The eaQHM is applied to the analysis of emotional speech, providing instantaneous parameters as features that can be used in recognition and Vector-Quantization-based classification of the emotional content of speech. Although sinusoidal models are not commonly used in such tasks, the results are promising.
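The basic analysis/resynthesis cycle underlying all sinusoidal models can be sketched for a single quasi-stationary frame: pick spectral peaks, keep (amplitude, frequency, phase) triplets, and resynthesize as a sum of sinusoids. The adaptive models in the thesis go well beyond this, tracking time-varying parameters; the sketch below is only the stationary baseline:

```python
# Bare-bones sinusoidal analysis/resynthesis of one quasi-stationary frame.
import numpy as np

fs = 16000
t = np.arange(1024) / fs
frame = 0.8 * np.sin(2 * np.pi * 250 * t) + 0.3 * np.sin(2 * np.pi * 750 * t)

win = np.hanning(len(frame))
spec = np.fft.rfft(frame * win)
mag = np.abs(spec)

# Local-maximum peak picking above a crude relative threshold.
peaks = [k for k in range(1, len(mag) - 1)
         if mag[k - 1] < mag[k] > mag[k + 1] and mag[k] > 0.1 * mag.max()]

resynth = np.zeros_like(frame)
for k in peaks:
    amp = 2 * mag[k] / win.sum()        # undo window scaling (approximate)
    freq = k * fs / len(frame)          # bin index -> Hz
    phase = np.angle(spec[k])
    resynth += amp * np.cos(2 * np.pi * freq * t + phase)
print("reconstruction SNR (dB):",
      10 * np.log10(np.sum(frame**2) / np.sum((frame - resynth)**2)))
```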
SIMONETTA, FEDERICO. "MUSIC INTERPRETATION ANALYSIS. A MULTIMODAL APPROACH TO SCORE-INFORMED RESYNTHESIS OF PIANO RECORDINGS." Doctoral thesis, Università degli Studi di Milano, 2022. http://hdl.handle.net/2434/918909.
Song, Guanghan. "Effect of sound in videos on gaze : contribution to audio-visual saliency modelling." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENT013/document.
Humans receive large quantities of information from the environment through sight and hearing. To help us react rapidly and appropriately, mechanisms in the brain bias attention towards particular regions, namely the salient regions. This attentional bias is influenced not only by vision, but also by audio-visual interaction. According to the existing literature, visual attention can be studied through eye movements, but little is known about the effect of sound on eye movements in videos. The aim of this thesis is to investigate the influence of sound in videos on eye movement and to propose an audio-visual saliency model that predicts salient regions in videos more accurately. For this purpose, we designed a first audio-visual eye-tracking experiment. We created a database of short video excerpts selected from various films. These excerpts were viewed by participants either with their original soundtrack (AV condition) or without soundtrack (V condition), and we analyzed the difference in eye positions between the two conditions. The results show that there is indeed an effect of sound on eye movement, and that the effect is greater for the on-screen speech class. We then designed a second audio-visual experiment with thirteen classes of sound. By comparing the difference in eye positions between participants in the AV and V conditions, we conclude that the effect of sound depends on the type of sound, and that the classes with human voice (i.e., the speech, singer, human noise and singers classes) have the greatest effect. More precisely, the sound source significantly attracted eye position only when the sound was a human voice. Moreover, participants in the AV condition had a shorter average duration of fixation than in the V condition. Finally, we proposed a preliminary audio-visual saliency model based on the findings of the above experiments. In this model, two fusion strategies of audio and visual information are described: one for the speech sound class, and one for the musical instrument sound class. The audio-visual fusion strategies defined in the model improve its predictive power in the AV condition.
Elfitri, I. "Analysis by synthesis spatial audio coding." Thesis, University of Surrey, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.590657.
Fazekas, György. "Semantic audio analysis utilities and applications." Thesis, Queen Mary, University of London, 2012. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8443.
Steinhour, Jacob B. "The Social and Pedagogical Advantages of Audio Forensics and Restoration Education." Ohio University Honors Tutorial College / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1276014966.
Xiao, Zhongzhe Chen Liming. "Recognition of emotions in audio signals." Ecully : Ecole Centrale de Lyon, 2008. http://bibli.ec-lyon.fr/exl-doc/zxiao.pdf.
Åsén, Rickard. "Game Audio in Audio Games : Towards a Theory on the Roles and Functions of Sound in Audio Games." Thesis, Högskolan Dalarna, Ljud- och musikproduktion, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:du-13588.
Skudelny, Sascha [Verfasser]. "Semantische Analyse von Audio-Logos : vom Audio-Branding-Element zur Metasprachlichen Betrachtung / Sascha Skudelny." Aachen : Shaker, 2012. http://d-nb.info/1069048127/34.
Burka, Zak. "Perceptual audio classification using principal component analysis /." Online version of thesis, 2010. http://hdl.handle.net/1850/12247.
Stammers, Jon. "Audio event classification for urban soundscape analysis." Thesis, University of York, 2011. http://etheses.whiterose.ac.uk/19142/.
Mitianoudis, Nikolaos. "Audio source separation using independent component analysis." Thesis, Queen Mary, University of London, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.406171.
Spina, Michelle S. (Michelle Suzanne). "Analysis and transcription of general audio data." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/86479.
Full textIncludes bibliographical references (p. 141-147).
by Michelle S. Spina.
Ph.D.
Bando, Yoshiaki. "Robust Audio Scene Analysis for Rescue Robots." Kyoto University, 2018. http://hdl.handle.net/2433/232410.
Full textOcchipinti, Cristina. "Analisi di segnali audio mediante funzioni wavelet." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2010. http://amslaurea.unibo.it/1581/.
De Sena, Enzo. "Analysis, design and implementation of multichannel audio systems." Thesis, King's College London (University of London), 2013. https://kclpure.kcl.ac.uk/portal/en/theses/analysis-design-and-implementation-of-multichannel-audio-systems(2667506b-f58e-44f1-858a-bcb67d341720).html.
Xiao, Zhongzhe. "Recognition of emotions in audio signals." Ecully, Ecole centrale de Lyon, 2008. http://www.theses.fr/2008ECDL0002.
This Ph.D. thesis is dedicated to automatic emotion/mood recognition in audio signals. Audio emotion is high-level semantic information, and its automatic analysis has many applications, such as smart human-computer interaction or multimedia indexing. The purpose of this thesis is thus to investigate machine-based audio emotion analysis solutions for both speech and music signals. Our work makes use of a discrete emotional model combined with a dimensional one, and relies upon existing studies on the acoustic correlates of emotional speech and music mood. The key contributions are the following. First, in complement to popular frequency-based and energy-based features, we propose new audio features, namely harmonic and Zipf features, to better characterize the timbre and prosodic properties of emotional speech. Second, as there exist very few emotional resources for either speech or music for machine learning, compared to the number of audio features one can extract, an evidence-theory-based feature selection scheme named Embedded Sequential Forward Selection (ESFS) is proposed to deal with the classic "curse of dimensionality" problem and thus with over-fitting. Third, using a manually built hierarchical classifier based on a dimensional emotion model to deal with the fuzzy borders of emotional states, we demonstrate that a hierarchical classification scheme performs better than the single global classifier mostly used in the literature. Furthermore, as there is no universal agreement on the definition of basic emotions, and as emotional states are typically application dependent, we also propose an ESFS-based algorithm for automatically building a hierarchical classification scheme (HCS) best adapted to a specific set of application-dependent emotional states. The HCS divides a complex classification problem into simpler, smaller problems by combining several binary sub-classifiers in the structure of a binary tree over several stages, and outputs the emotional state of the audio samples. Finally, to deal with the subjective nature of emotions, we also propose an evidence-theory-based ambiguous classifier allowing multiple emotion labels, as humans often assign. The effectiveness of all these recognition techniques was evaluated on the Berlin and DES datasets for emotional speech recognition, and on a music mood dataset collected in our laboratory, as no public dataset exists so far. Keywords: audio signal, emotion classification, music mood analysis, audio features, feature selection, hierarchical classification, ambiguous classification, evidence theory.
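The binary-tree structure of the HCS can be sketched generically: each internal node is a binary classifier that routes a sample towards one of two groups of emotional states, until a leaf (a single state) is reached. The splits and the base learner below are placeholder assumptions, not the ESFS-derived structure of the thesis:

```python
# Generic hierarchical classification scheme (HCS) as a binary tree of
# binary sub-classifiers. Splits and base learner are illustrative only.
import numpy as np
from sklearn.svm import SVC

class Node:
    def __init__(self, left_labels, right_labels, left, right):
        self.left_labels = set(left_labels)
        self.right_labels = set(right_labels)
        self.left, self.right = left, right   # child Node or leaf label
        self.clf = SVC()

    def fit(self, X, y):
        scope = np.isin(y, list(self.left_labels | self.right_labels))
        side = np.isin(y[scope], list(self.right_labels)).astype(int)
        self.clf.fit(X[scope], side)          # binary sub-problem at this node
        for child in (self.left, self.right):
            if isinstance(child, Node):
                child.fit(X, y)
        return self

    def predict_one(self, x):
        child = self.right if self.clf.predict(x[None])[0] else self.left
        return child.predict_one(x) if isinstance(child, Node) else child

# Example: split on arousal first, then on valence within the "active" states.
tree = Node(["sad"], ["angry", "happy"], "sad",
            Node(["angry"], ["happy"], "angry", "happy"))
```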
Nesvadba, Jan. "Segmentation sémantique des contenus audio-visuels." Bordeaux 1, 2007. http://www.theses.fr/2007BOR13456.
Full textParekh, Sanjeel. "Learning representations for robust audio-visual scene analysis." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT015/document.
The goal of this thesis is to design algorithms that enable robust detection of objects and events in videos through joint audio-visual analysis. This is motivated by humans' remarkable ability to meaningfully integrate auditory and visual characteristics for perception in noisy scenarios. To this end, we identify two kinds of natural associations between the modalities in recordings made using a single microphone and camera, namely motion-audio correlation and appearance-audio co-occurrence. For the former, we use audio source separation as the primary application and propose two novel methods within the popular non-negative matrix factorization framework. The central idea is to utilize the temporal correlation between audio and motion for objects/actions where the sound-producing motion is visible. The first proposed method focuses on soft coupling between audio and motion representations capturing temporal variations, while the second is based on cross-modal regression. We segregate several challenging audio mixtures of string instruments into their constituent sources using these approaches. To identify and extract many commonly encountered objects, we leverage appearance-audio co-occurrence in large datasets. This complementary association mechanism is particularly useful for objects where motion-based correlations are not visible or available. The problem is dealt with in a weakly-supervised setting wherein we design a representation learning framework for robust AV event classification, visual object localization, audio event detection and source separation. We extensively test the proposed ideas on publicly available datasets. The experiments demonstrate several intuitive multimodal phenomena that humans utilize on a regular basis for robust scene understanding.
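The NMF backbone that both separation methods build on factorizes a non-negative spectrogram V as W H (spectral templates times temporal activations). A plain, audio-only sketch with the classical multiplicative updates is shown below; the thesis's actual contribution, coupling the activations with visual motion, is not reproduced here:

```python
# Plain NMF of a magnitude spectrogram, V ~ W @ H, with the classical
# multiplicative updates for the Euclidean cost. Audio-only sketch; the
# audio-motion coupling proposed in the thesis is omitted.
import numpy as np

def nmf(V, n_components=4, n_iter=200, eps=1e-9):
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], n_components)) + eps  # spectral templates
    H = rng.random((n_components, V.shape[1])) + eps  # temporal activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # updates preserve
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # non-negativity
    return W, H

V = np.random.default_rng(1).random((257, 100))  # stand-in spectrogram
W, H = nmf(V)
V_source0 = np.outer(W[:, 0], H[0])  # spectrogram estimate for one component
```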
Faircloth, Ryan. "AUDIO AND VIDEO TEMPO ANALYSIS FOR DANCE DETECTION." Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2633.
Full textM.S.E.E.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering MSEE
Kolozali, Sefki. "Automatic ontology generation based on semantic audio analysis." Thesis, Queen Mary, University of London, 2014. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8452.
Full textHainsworth, Stephen Webley. "Techniques for the automated analysis of musical audio." Thesis, University of Cambridge, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.616011.
Full textSolomon, Mary Joanna. "Multivariate Analysis of Korean Pop Music Audio Features." Bowling Green State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1617105874719868.
Di Carlo, Diego. "Echo-aware signal processing for audio scene analysis." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S075.
Most audio signal processing methods regard reverberation, and in particular acoustic echoes, as a nuisance. However, echoes convey important spatial and semantic information about sound sources, and recent echo-aware methods have been proposed on this basis. In this work we focus on two directions. First, we study how to estimate acoustic echoes blindly from microphone recordings. Two approaches are proposed, one leveraging continuous dictionaries, the other using recent deep learning techniques. Then, we focus on extending existing methods in audio scene analysis to their echo-aware forms. The multichannel NMF framework for audio source separation, the SRP-PHAT localization method, and the MVDR beamformer for speech enhancement are all extended to their echo-aware versions.
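SRP-PHAT, one of the methods extended here, is built on the GCC-PHAT correlation between microphone pairs. A minimal two-microphone time-delay estimate is sketched below as the standard baseline; the echo-aware extension itself is not reproduced:

```python
# GCC-PHAT time-delay estimation between two microphones, the building
# block of SRP-PHAT localization. Standard baseline, not the echo-aware
# extension proposed in the thesis.
import numpy as np

def gcc_phat(sig, ref, fs):
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))  # cross-spectrum
    S /= np.abs(S) + 1e-12            # PHAT weighting: keep only the phase
    cc = np.fft.irfft(S, n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2 + 1]))  # center zero lag
    return (np.argmax(np.abs(cc)) - n // 2) / fs

fs = 16000
x = np.random.default_rng(0).standard_normal(fs)   # 1 s source signal
mic1, mic2 = x, np.roll(x, 40)                     # mic2 lags by 40 samples
print(gcc_phat(mic2, mic1, fs))                    # ~40 / 16000 = 2.5 ms
```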
Hartquist, John E. "Real-time Musical Analysis of Polyphonic Guitar Audio." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/808.
Full textNing, Daryl. "Analysis and coding of high quality audio signals." Thesis, Queensland University of Technology, 2003. https://eprints.qut.edu.au/15814/1/Daryl_Ning_Thesis.pdf.
Full textNing, Daryl. "Analysis and Coding of High Quality Audio Signals." Queensland University of Technology, 2003. http://eprints.qut.edu.au/15814/.
Full textRen, Reede. "Audio-visual football video analysis, from structure detection to attention analysis." Thesis, Connect to e-thesis. Move to record for print version, 2008. http://theses.gla.ac.uk/77/.
Full textPh.D. thesis submitted to the Faculty of Information and Mathematical Sciences, Department of Computing Science, University of Glasgow, 2008. Includes bibliographical references. Print version also available.
Alameda-Pineda, Xavier. "Egocentric Audio-Visual Scene Analysis : a machine learning and signal processing approach." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENM024/document.
Over the past two decades, the industry has developed several commercial products with audio-visual sensing capabilities. Most of them consist of a video camera with an embedded microphone (mobile phones, tablets, etc.). Others, such as Kinect, include depth sensors and/or small microphone arrays, and some mobile phones are equipped with a stereo camera pair. At the same time, many research-oriented systems have become available (e.g., humanoid robots such as NAO). Since all these systems are small in volume, their sensors are close to each other. Therefore, they are not able to capture the global scene, only one point of view of the ongoing social interplay. We refer to this as "egocentric audio-visual scene analysis". This thesis contributes to this field in several respects. Firstly, it provides a publicly available data set targeting applications such as action/gesture recognition, speaker localization, tracking and diarisation, sound source localization, dialogue modelling, etc.; this work has since been used both inside and outside the thesis. We also investigated the problem of AV event detection. We showed how the trust in one of the modalities (the visual one, to be precise) can be modeled and used to bias the method, leading to a visually-supervised EM algorithm (ViSEM). We then modified the approach to target audio-visual speaker detection, yielding an on-line method running on the humanoid robot NAO. In parallel to the work on audio-visual speaker detection, we developed a new approach for audio-visual command recognition. We explored different features and classifiers and confirmed that the use of audio-visual data increases performance compared to audio-only and video-only classifiers. Later, we sought the best method using tiny training sets (5-10 samples per class). This is interesting because real systems need to adapt to and learn new commands from the user, and must be operational from a few examples for general public usage. Finally, we contributed to the field of sound source localization, in the particular case of non-coplanar microphone arrays. This is interesting because the array geometry can then be arbitrary, which opens the door to dynamic microphone arrays that adapt their geometry to particular tasks, and because the design of commercial systems may be subject to constraints for which circular or linear arrays are not suited.
Xiong, Xin, and Shuang Li. "Energy Audit of a Building : Energy Audit and Saving Analysis." Thesis, University of Gävle, Department of Technology and Built Environment, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-4617.
The studied residential building is located at the crossing of S. Centralgatan Street and Nedre Akargatan Street in the city of Gavle. It is a six-floor quadrangle building with a yard in the middle. There are 180 apartments of five types in total, and on the first floor there is a kindergarten. The building has district heating and a heat-recovery ventilation system that uses a heat exchanger for reheating.
Several solutions are proposed for reducing the heat loss. In a first step, the heat losses and heat gains were calculated: each parameter in the energy balance equation was extracted and computed, and an energy balance sheet was built. On the heat-loss side, transmission accounts for 1237 MWh, hot tap water for 332 MWh, mechanical ventilation for 1041 MWh, and natural ventilation for 325.7 MWh. On the heat-gain side, district heating supplies 1265.7 MWh, the heat pump 793 MWh, solar radiation 562 MWh, and internal heating 315 MWh. In a second step, after analyzing the heat-loss data, the improvements focus on the transmission and hot tap water parts, because they account for most of the heat loss. In the final step, solutions for optimizing the heating system are discussed.
In conclusion, several solutions are suggested. The total reduction in heat loss after the adjustments is 163 MWh, about 5.6% of the original heat loss; the heat loss of the building is thereby reduced from 2935.7 MWh to 2772.7 MWh.
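The reported figures are internally consistent, as the quick check below shows: both sides of the energy balance sum to the same total, and the stated saving matches the quoted percentage.

```python
# Cross-check of the energy balance figures quoted in the abstract.
loss = {"transmission": 1237, "hot tap water": 332,
        "mechanical ventilation": 1041, "natural ventilation": 325.7}
gain = {"district heating": 1265.7, "heat pump": 793,
        "solar radiation": 562, "internal heating": 315}
assert round(sum(loss.values()), 1) == round(sum(gain.values()), 1) == 2935.7
print(163 / sum(loss.values()))       # ~0.056, i.e. the quoted 5.6 %
print(sum(loss.values()) - 163)       # 2772.7 MWh after the improvements
```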
Gebru, Israel Dejene. "Analyse audio-visuelle dans le cadre des interactions humaines avec les robots." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM020.
In recent years, there has been growing interest in human-robot interaction (HRI), with the aim of enabling robots to interact and communicate naturally with humans. Natural interaction implies that robots not only need to understand speech and non-verbal communication cues such as body gesture, gaze, or facial expressions, but also the dynamics of the social interplay: find people in the environment, distinguish between different people, track them through physical space, parse their actions and activity, estimate their engagement, identify who is speaking and who speaks to whom, etc. All of this requires robots to have multimodal perception skills, so as to meaningfully detect and integrate information from their multiple sensory channels. In this thesis, we focus on the robot's audio-visual sensory inputs, consisting of (multiple) microphones and video cameras. Among the different addressable perception tasks, we explore three: (P1) multiple-speaker localization, (P2) multiple-person location tracking, and (P3) speaker diarization. The majority of existing works in signal processing and computer vision address these problems using audio signals alone, or visual information only. In this thesis, however, we address them via fusion of the audio and visual information gathered by two microphones and one video camera. Our goal is to exploit the complementary nature of the audio and visual modalities in the hope of attaining significant improvements in robustness and performance over systems that use a single modality. Moreover, the three problems are addressed in challenging HRI scenarios, e.g., a robot engaged in a multi-party interaction with a varying number of participants, who may speak at the same time, move around the scene, and turn their heads/faces towards the other participants rather than facing the robot.
Phillips, Nicola Jane. "Audio-visual scene analysis : attending to music in film." Thesis, University of Cambridge, 2000. https://www.repository.cam.ac.uk/handle/1810/251745.
Full textDietrich, Kelly. "Analysis of talker characteristics in audio-visual speech integration." Connect to resource, 2008. http://hdl.handle.net/1811/32149.
Full textGower, Ephraim. "Mathematical Analysis and Audio Applications in Blind Signal Decomposition." Thesis, University of Essex, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.520088.
Full textSwartz, Jordan. "Audit Office Growth Under Analysis." Thesis, The University of Arizona, 2015. http://hdl.handle.net/10150/578932.
Full textBest, Peter J. "Machine-independent audit trail analysis." Thesis, Queensland University of Technology, 1994.
Tore, Gian Maria. "Expérience et figuration au cinéma : pour une sémiotique audio-visuelle." Limoges, 2007. http://www.theses.fr/2007LIMO2007.
Lecomte, Sébastien. "Classification partiellement supervisée par SVM : application à la détection d'événements en surveillance audio." Thesis, Troyes, 2013. http://www.theses.fr/2013TROY0031/document.
This thesis addresses partially supervised Support Vector Machines for novelty detection (One-Class SVM). These have been studied to design abnormal audio event detection for the surveillance of public infrastructure, in particular public transportation systems. In this context, the null hypothesis ("normal" audio signals) is relatively well known, even though the corresponding signals can be notably non-stationary. Conversely, every "abnormal" signal should be detected and, if possible, clustered with similar signals. Thus, a reference system based on a single model of normal signals is presented first, and we then propose to use several concurrent One-Class SVMs to cluster new data. Given the amount of data to process, dedicated solvers have been studied; since the proposed algorithms must run in real time, we have also investigated algorithms with warm-start capabilities. Through the study of these algorithms, we propose a unified framework for One-Class and binary SVMs, with and without bias. The proposed approach has been validated on a database of real signals, and the whole process, applied to the monitoring of a subway station, was presented during the final review of the European project VANAHEIM.
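The One-Class SVM setting described here (fit a model on "normal" data only, then flag whatever falls outside its support) is available off the shelf in scikit-learn. A minimal sketch, with random placeholder features standing in for audio descriptors and illustrative hyperparameters:

```python
# One-Class SVM novelty detection: train on "normal" feature vectors only,
# flag outliers at test time. Features are random placeholders, not the
# audio features used in the thesis; nu and gamma are illustrative.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(500, 20))      # background audio
test = np.vstack([rng.normal(0.0, 1.0, size=(50, 20)),   # more background
                  rng.normal(6.0, 1.0, size=(5, 20))])   # abnormal events

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(normal_train)
pred = clf.predict(test)               # +1 = normal, -1 = novelty
print("flagged as abnormal:", np.where(pred == -1)[0])   # the last 5 rows
```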
Klevhamre, Benny, and Peter Nilsson. "Further Development of an Audio Analyzer." Thesis, Linköping University, Department of Electrical Engineering, 2002. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-1155.
A part of an audio analyzer has been developed and implemented as an application in the hardware description language VHDL. This part has then been programmed into a PLD device on a circuit board used for audio tests on mobile telephones at Flextronics. The application converts data, making it possible to send information between the telephone and different measuring instruments. The application consists of two older applications, one of which has been fully implemented. What remains in the other is to find out why it outputs erroneous data in the form of unwanted noise. The work was stopped when the deadline was reached.
Aissa-El-Bey, Abdeldjalil. "Séparation aveugle de sources audio dans le contexte automobile." Paris, ENST, 2007. http://www.theses.fr/2007ENST0011.
Aïssa-el-Bey, Abdeldjalil. "Séparation aveugle de sources audio dans le contexte automobile /." Paris : École nationale supérieure des télécommunications, 2007. http://catalogue.bnf.fr/ark:/12148/cb41198300d.
Comer, K. Allen. "A wavelet-based technique for reducing noise in audio signals." Thesis, This resource online, 1996. http://scholar.lib.vt.edu/theses/available/etd-06082009-170933/.
Hlísta, Juraj. "Reaktivní audit." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2010. http://www.nusl.cz/ntk/nusl-237105.
Sigtia, Siddharth. "Neural networks for analysing music and environmental audio." Thesis, Queen Mary, University of London, 2017. http://qmro.qmul.ac.uk/xmlui/handle/123456789/24741.
Fourer, Dominique. "Approche informée pour l'analyse du son et de la musique." Thesis, Bordeaux 1, 2013. http://www.theses.fr/2013BOR14973/document.
In the field of audio signal processing, analysis is an essential step that allows interaction with existing signals. Indeed, the quality of transformed or synthesized audio signals depends on the accuracy of the estimated model parameters. However, theoretical limits exist and show that the best accuracy reachable by a classic estimator can be insufficient for the most demanding applications (e.g., active listening of music). The work developed in this thesis revisits well-known audio analysis problems such as spectral analysis, automatic music transcription, and audio source separation using a novel "informed" approach. This approach takes advantage of a specific configuration in which the parameters of the elementary signals that compose a mixture are known before the mixing process. Using the tools proposed in this thesis, minimal side information is computed and transmitted with the mixture signal, which allows any kind of transformation of the mixture signal under a constraint on the resulting quality. When compatibility with existing audio formats is required, the side information is embedded directly into the analyzed audio signal using a watermarking technique. This work describes several theoretical and practical aspects of audio signal processing. We show that a classic estimator combined with sufficient side information can obtain better performance than classic approaches (pure estimation or pure coding).
Guo, Ziyuan. "Objective Audio Quality Assessment Based on Spectro-Temporal Modulation Analysis." Thesis, KTH, Ljud- och bildbehandling, 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-91847.
Full textBoyes, Graham. "Dictionary-based analysis/synthesis and structured representations of musical audio." Thesis, McGill University, 2012. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=106507.
In the representation of musical audio signals, it is common to favour either a signal-type or a symbol-type interpretation, while mid-level, or intermediate, representations are becoming a topic of current interest. In this thesis we investigate the perspective of these intermediate, structured representations. Our research integrates theoretical aspects related to separable sound objects, dictionary-based signal analysis methods, and the design of software conceived within the framework of object-oriented programming. In contrast to the examples available in the literature, our approach to intermediate representations proceeds from the symbolic level towards the signal, rather than the other way around. This methodology is applied not only to the specification of analysis techniques but also to the design of the corresponding software system. Experimental results show that our method is able to reduce the Itakura-Saito distance, a perceptually motivated measure, compared with a generic decomposition method. We also show that our structured representation can be used in practical applications such as visualization, post-processing aggregation, and musical composition.
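Dictionary-based analysis/synthesis of the kind studied here is typically built on greedy sparse decomposition. A minimal matching pursuit over an orthonormal DCT dictionary is sketched below (a real system would use an overcomplete time-frequency dictionary, and the object-level structuring developed in the thesis is not shown):

```python
# Minimal matching pursuit: greedily decompose a signal over a DCT
# dictionary into a sparse sum of atoms, then resynthesize from them.
import numpy as np

N = 256
n = np.arange(N)
D = np.cos(np.pi * (n[:, None] + 0.5) * np.arange(N)[None, :] / N)
D /= np.linalg.norm(D, axis=0)         # unit-norm DCT atoms

rng = np.random.default_rng(0)
signal = 0.7 * D[:, 12] + 0.4 * D[:, 80] + 0.01 * rng.standard_normal(N)

residual, atoms = signal.copy(), []
for _ in range(5):                     # a few greedy iterations
    corr = D.T @ residual
    k = int(np.argmax(np.abs(corr)))   # best-matching atom
    atoms.append((k, corr[k]))
    residual -= corr[k] * D[:, k]      # remove its contribution

approx = sum(c * D[:, k] for k, c in atoms)   # synthesis from the atoms
print("kept atoms:", [k for k, _ in atoms[:2]],
      "residual energy:", float(np.sum(residual**2)))
```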