Dissertations / Theses on the topic 'Audio synthesis'

To see the other types of publications on this topic, follow the link: Audio synthesis.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Audio synthesis.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Chemla-Romeu-Santos, Axel Claude André. "Manifold Representations of Musical Signals and Generative Spaces." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/700444.

Full text
Abstract:
Among the diverse research fields within computer music, synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing both scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving both with musical practices and scientific/technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can also be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms has earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. In particular, a family of machine learning methods, called generative models, focuses on the generation of original content using features extracted from an existing dataset. Such methods not only questioned previous approaches to generation, but also the way of integrating these algorithms into existing creative processes. While these new generative frameworks are progressively being introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work inserts itself into both practices and what can be expected from their combination. Subsequently, we focus more closely on generative models, and on how modern advances in the domain can be exploited to learn complex sound distributions while remaining sufficiently flexible to be integrated into the creative flow of the user. We then propose an inference/generation process, mirroring the analysis/synthesis paradigms that are natural in the audio processing domain, using latent models based on a continuous higher-level space that we use to control the generation. We first provide preliminary results of our method applied to spectral information extracted from several datasets, and evaluate the obtained results both qualitatively and quantitatively. Subsequently, we study how to make these methods more suitable for learning audio data, tackling three different aspects in turn. First, we propose two latent regularization strategies specifically designed for audio, based on signal/symbol translation and on perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, which allow the obtained generative spaces to also model the dynamics of the signal. As a last chapter, we shift from a purely scientific approach to a more research-and-creation-oriented point of view: first, we describe the architecture and the design of our open-source library, vsacids, aiming to be used by expert and non-expert music makers as an integrated creation tool. Then, we propose a first musical use of our system through the creation of a real-time performance, called ægo, based jointly on our framework vsacids and on an explorative agent using reinforcement learning, trained during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as on possible further creative applications.
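To make the inference/generation paradigm concrete, here is a minimal sketch (not the thesis's vsacids code) of a variational autoencoder over spectrogram frames in PyTorch: the encoder infers a point in a continuous latent space, and the decoder generates a spectrum from it. The layer widths, the 513-bin frame size, and the 16-dimensional latent space are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

class SpectralVAE(nn.Module):
    """Toy latent model: encode a magnitude-spectrum frame to a Gaussian
    posterior over latent codes, then decode a spectrum from a sample."""
    def __init__(self, n_bins=513, n_latent=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bins, 256), nn.ReLU())
        self.mu = nn.Linear(256, n_latent)       # posterior mean
        self.logvar = nn.Linear(256, n_latent)   # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                 nn.Linear(256, n_bins), nn.Softplus())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        return self.dec(z), mu, logvar

def elbo_loss(x_hat, x, mu, logvar):
    rec = ((x_hat - x) ** 2).sum(dim=-1).mean()   # reconstruction term
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return rec + kld                              # negative ELBO (up to constants)

x = torch.rand(8, 513)                            # a batch of dummy frames
x_hat, mu, logvar = SpectralVAE()(x)
loss = elbo_loss(x_hat, x, mu, logvar)            # backpropagated during training
```

Sampling latent points and decoding them is then the "generation" direction; encoding real sounds is the "analysis" direction the abstract parallels with classical analysis/synthesis.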
APA, Harvard, Vancouver, ISO, and other styles
2

Lundberg, Anton. "Data-Driven Procedural Audio : Procedural Engine Sounds Using Neural Audio Synthesis." Thesis, KTH, Datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280132.

Full text
Abstract:
The currently dominant approach for rendering audio content in interactive media, such as video games and virtual reality, involves playback of static audio files. This approach is inflexible and requires management of large quantities of audio data. An alternative approach is procedural audio, where sound models are used to generate audio in real time from live inputs. While providing many advantages, procedural audio has yet to find widespread use in commercial productions, partly because the audio produced by many of the proposed models does not meet industry standards. This thesis investigates how procedural audio can be performed using data-driven methods. We do this by specifically investigating how to generate the sound of car engines using neural audio synthesis. Building on a recently published method that integrates digital signal processing with deep learning, called Differentiable Digital Signal Processing (DDSP), our method obtains sound models by training deep neural networks to reconstruct recorded audio examples from interpretable latent features. We propose a method for incorporating engine cycle phase information, as well as a differentiable transient synthesizer. Our results illustrate that DDSP can be used for procedural engine sounds; however, further work is needed before our models can generate engine sounds without undesired artifacts and before they can be used in live real-time applications. We argue that our approach can be useful for procedural audio in more general contexts, and discuss how our method can be applied to other sound sources.
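As a rough illustration of the DDSP-style approach the abstract builds on, the sketch below drives an additive (harmonic oscillator bank) synthesiser from a frame-wise fundamental frequency and per-harmonic amplitudes. The RPM-to-f0 mapping, hop size, and amplitude envelope are invented for the example and are not taken from the thesis.

```python
import numpy as np

def harmonic_synth(f0, harmonic_amps, sr=16000, hop=64):
    """Additive synthesis from frame-wise f0 (Hz) and per-harmonic
    amplitudes, in the spirit of DDSP's harmonic oscillator bank."""
    n_frames, n_harm = harmonic_amps.shape
    n = n_frames * hop
    frames = np.arange(n_frames) * hop
    f0_t = np.interp(np.arange(n), frames, f0)          # upsample controls
    amps_t = np.stack([np.interp(np.arange(n), frames, harmonic_amps[:, k])
                       for k in range(n_harm)], axis=1)
    phase = 2 * np.pi * np.cumsum(f0_t) / sr            # integrate inst. frequency
    k = np.arange(1, n_harm + 1)
    out = (amps_t * np.sin(np.outer(phase, k))).sum(axis=1)
    return out / max(1e-9, np.abs(out).max())

# Example: engine-like tone whose pitch follows an "RPM" ramp
# (assumed 2 firing events per revolution -- purely illustrative).
rpm = np.linspace(900, 3000, 200)
f0 = rpm / 60 * 2
amps = np.exp(-0.3 * np.arange(10))[None, :] * np.ones((200, 10))
audio = harmonic_synth(f0, amps)
```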
APA, Harvard, Vancouver, ISO, and other styles
3

Elfitri, I. "Analysis by synthesis spatial audio coding." Thesis, University of Surrey, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.590657.

Full text
Abstract:
Spatial Audio Coding (SAC) is a technique used to encode multichannel audio signals by extracting the spatial parameters and downmixing the audio signals to a mono or stereo audio signal. Recently, various SAC techniques have been proposed to efficiently encode multichannel audio signals. However, all of them operate in open loop, where the encoder and decoder operate sequentially and independently, and thus lack a mechanism for minimising the decoded audio reconstruction error. This thesis proposes a novel SAC technique that utilises a closed-loop system configuration, termed Analysis by Synthesis (AbS), in order to optimise the downmix signal and the spatial parameters so as to minimise the decoded signal error. In order to show the effect of the AbS optimisation, the Reverse One-To-Two (R-OTT) module, used in MPEG Surround (MPS), must first be applied in the frequency domain to recalculate the downmix and residual signals based on the quantised spatial parameters. The results show that the AbS scheme can minimise the quantisation errors of the spatial parameters. As the full AbS is far too complicated to be applied in practice, a simplified AbS algorithm for finding sub-optimal parameters, based on the adapted R-OTT module, is also proposed. Subjective tests show that the proposed Analysis by Synthesis Spatial Audio Coding (AbS-SAC), encoding 5-channel audio signals at a bitrate of 51.2 kb/s per audio channel, achieves higher Subjective Difference Grade (SDG) scores than the tested Advanced Audio Coding (AAC) technique. Furthermore, an objective test also shows that the proposed AbS-SAC method, operating at bitrates of 40 to 96 kb/s per audio channel, significantly outperforms (in terms of Objective Difference Grade (ODG) scores) the tested AAC multichannel technique.
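The closed-loop idea at the core of analysis by synthesis can be sketched in a few lines: rather than quantising parameters open-loop, the encoder resynthesises the decoded signal for each candidate codeword and keeps the one that minimises the reconstruction error. This toy sketch is not the AbS-SAC implementation; the synthesiser, codebook, and error measure are placeholders.

```python
import numpy as np

def abs_quantise(synth, target, codebook):
    """Closed-loop (analysis-by-synthesis) parameter selection: try each
    candidate parameter set, synthesise the decoded signal, and keep the
    candidate minimising the error. Open-loop coders skip this loop and
    quantise each parameter in isolation."""
    best, best_err = None, np.inf
    for cand in codebook:        # exhaustive here; pruned/simplified in practice
        err = np.mean((synth(cand) - target) ** 2)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Toy usage: pick the quantised gain that best reconstructs a target signal.
target = 0.73 * np.sin(np.linspace(0, 20, 400))
best_gain, err = abs_quantise(lambda g: g * np.sin(np.linspace(0, 20, 400)),
                              target, codebook=np.linspace(0, 1, 17))
```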
APA, Harvard, Vancouver, ISO, and other styles
4

Wood, Steven Gregory. "Objective Test Methods for Waveguide Audio Synthesis." BYU ScholarsArchive, 2007. https://scholarsarchive.byu.edu/etd/853.

Full text
Abstract:
Acoustic physical modeling has emerged as a newer musical synthesis technique. The most common form of physical modeling synthesis in both industry and academia is digital waveguide synthesis. Commercially available for the past thirteen years, physical modeling synthesis has been chosen by the top synthesizer manufacturers for inclusion in their top-of-the-line models. In the area of audio quality testing, the most common tests have traditionally been group listening tests. While these tests are subjective and can be expensive and time-consuming, their results are validated by the groups' adherence to proper quality standards. Research has been conducted to evaluate objective testing procedures in order to find alternative methods for testing audio quality. This research has resulted in various standards approved by the International Telecommunication Union. Tests have proven the reliability of these objective test methods in the areas of telephony as well as various codecs, including MP3. The objective of this research is to determine whether objective test measurements can be used reliably in the area of acoustic physical modeling synthesis, specifically digital waveguide synthesis. Both the Perceptual Audio Quality Measure (PAQM) and Noise-to-Mask Ratio (NMR) objective tests will be performed on the Karplus-Strong form of digital waveguide synthesis. A corresponding listening test based on the Mean Opinion Score (MOS) will also be conducted, and the results from the objective and subjective tests will be compared. The results will show that more research and work needs to be done in this area, as neither the PAQM nor the NMR algorithm sufficiently matched the output of the subjective listening tests. Recommendations will be made for future work.
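For readers unfamiliar with the synthesis method under test, the Karplus-Strong algorithm named in the abstract is the simplest digital waveguide model: a noise burst circulates in a delay line through a mild low-pass filter, producing a plucked-string tone. A minimal sketch, with sample rate and damping chosen arbitrarily:

```python
import numpy as np

def karplus_strong(freq, duration, sr=44100, damping=0.996):
    """Karplus-Strong plucked string: a random excitation circulated
    through a delay line with a two-point averaging (low-pass) filter."""
    n = int(sr * duration)
    delay = int(round(sr / freq))            # delay length sets the pitch
    buf = np.random.uniform(-1, 1, delay)    # initial noise burst
    out = np.empty(n)
    for i in range(n):
        out[i] = buf[i % delay]
        buf[i % delay] = damping * 0.5 * (buf[i % delay] + buf[(i + 1) % delay])
    return out

tone = karplus_strong(440.0, 1.0)            # one second of a 440 Hz pluck
```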
APA, Harvard, Vancouver, ISO, and other styles
5

Ustun, Selen. "Audio browsing of automaton-based hypertext." Thesis, Texas A&M University, 2003. http://hdl.handle.net/1969.1/33.

Full text
Abstract:
With the widespread adoption of hypermedia systems, and the World Wide Web (WWW) in particular, these systems have evolved from simple systems with only textual content to those that incorporate a large content base consisting of a wide variety of document types. Also, with the increase in the number of users, a need has grown for these systems to be accessible to a wider range of users. Consequently, the growth of the systems, along with the number and variety of users, requires new presentation and navigation mechanisms for a wider audience. One of the new presentation methods is the audio-only presentation of hypertext content, and this research proposes a novel solution to this problem for complex and dynamic systems. The hypothesis is that the proposed Audio Browser is an efficient tool for presenting hypertext in audio format, which will prove to be useful for several applications, including browsers for visually impaired and remote users. The Audio Browser provides audio-only browsing of contents in a Petri-net-based hypertext system called context-aware Trellis (caT). It uses a combination of synthesized speech and pre-recorded speech to allow its user to listen to the contents of documents, follow links, and get information about the navigation process. It also has mechanisms for navigating within documents in order to allow users to view contents more quickly.
APA, Harvard, Vancouver, ISO, and other styles
6

Jehan, Tristan. "Perceptual synthesis engine: an audio-driven timbre generator." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/61543.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2001.
Includes bibliographical references (leaves 68-75).
A real-time synthesis engine which models and predicts the timbre of acoustic instruments based on perceptual features extracted from an audio stream is presented. The thesis describes the modeling sequence including the analysis of natural sounds, the inference step that finds the mapping between control and output parameters, the timbre prediction step, and the sound synthesis. The system enables applications such as cross-synthesis, pitch shifting or compression of acoustic instruments, and timbre morphing between instrument families. It is fully implemented in the Max/MSP environment. The Perceptual Synthesis Engine was developed for the Hyperviolin as a novel, generic and perceptually meaningful synthesis technique for non-discretely pitched instruments.
by Tristan Jehan.
S.M.
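A crude way to picture the analysis-to-synthesis data flow described above is a nearest-neighbour lookup from perceptual controls to stored spectral frames. The actual engine learns a predictive timbre model, so everything below (the function names, the two-feature control space) is an illustrative assumption rather than the thesis's method.

```python
import numpy as np

def predict_timbre(pitch, loudness, bank_controls, bank_spectra):
    """Nearest-neighbour stand-in for the inference step: map perceptual
    controls (pitch, loudness) to the closest analysed spectral envelope.
    bank_controls: (N, 2) analysed (pitch, loudness) pairs;
    bank_spectra:  (N, K) matching harmonic-amplitude frames."""
    q = np.array([pitch, loudness])
    i = np.argmin(np.linalg.norm(bank_controls - q, axis=1))
    return bank_spectra[i]

# Toy usage with a two-frame "analysis" bank.
bank_controls = np.array([[220.0, 0.3], [440.0, 0.8]])
bank_spectra = np.array([[1.0, 0.5, 0.2], [1.0, 0.7, 0.4]])
envelope = predict_timbre(330.0, 0.6, bank_controls, bank_spectra)
```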
APA, Harvard, Vancouver, ISO, and other styles
7

Payne, R. G. "Digital techniques for the analysis and synthesis of audio signals." Thesis, Bucks New University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.234706.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Mémoire, Université de Sherbrooke, 2000. http://savoirs.usherbrooke.ca/handle/11143/1078.

Full text
Abstract:
The objectives of our research fall under two main headings: 1) to explore parametric coding techniques based on sinusoidal synthesis and apply them to audio signals (mainly music); 2) to improve the intrinsic quality of these models, particularly with respect to the time/frequency trade-offs inherent in transform coding. Our methodology consisted of running simulations, in C and in MATLAB, of recent sinusoidal synthesis algorithms, drawing in particular on the MSLPC (Multisinusoid LPC) coder of Wen-Whei C., De-Yu W. and Li-Wei W. of the National Chiao-Tung University of Taiwan (5). This thesis contains four chapters. Chapter 1 provides an introduction and sets the context. Chapter 2 gives an overview of parametric coding and the interest of this technique, followed by a presentation of the existing types of parametric coders. Chapter 3 is devoted to describing the various steps involved in designing a sinusoidal synthesis coder with recently developed methods. Chapter 4 presents the design and rigorous implementation of the model, together with the time/frequency trade-off we propose for improving the intrinsic quality of the sinusoidal coder; this chapter also presents an informal evaluation of the model's performance. The thesis ends with a conclusion.
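To sketch what a sinusoidal coder transmits, the snippet below picks the strongest spectral peaks of a windowed frame and resynthesises a sum of sinusoids from them. It is a generic peak-picking illustration, not the MSLPC-based coder of the thesis; the partial count and amplitude normalisation are arbitrary, and phases are discarded.

```python
import numpy as np
from scipy.signal import find_peaks

def sinusoidal_frame(frame, sr, n_partials=20):
    """Pick the strongest spectral peaks of one frame and return the
    (frequency, amplitude) pairs a sinusoidal coder would transmit."""
    win = np.hanning(len(frame))
    spec = np.abs(np.fft.rfft(frame * win))
    peaks, _ = find_peaks(spec)
    top = peaks[np.argsort(spec[peaks])[-n_partials:]]
    freqs = top * sr / len(frame)           # bin index -> Hz
    amps = spec[top] / (win.sum() / 2)      # rough amplitude normalisation
    return freqs, amps

def synthesise(freqs, amps, n, sr):
    """Resynthesis as a plain sum of cosines (phases ignored in this sketch)."""
    t = np.arange(n) / sr
    return sum(a * np.cos(2 * np.pi * f * t) for f, a in zip(freqs, amps))
```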
APA, Harvard, Vancouver, ISO, and other styles
9

Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Sherbrooke : Université de Sherbrooke, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Andreux, Mathieu. "Foveal autoregressive neural time-series modeling." Electronic Thesis or Diss., Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE073.

Full text
Abstract:
This dissertation studies unsupervised time-series modelling. We first focus on the problem of linearly predicting future values of a time series under the assumption of long-range dependencies, which requires taking a large past into account. We introduce a family of causal and foveal wavelets which project past values onto a subspace adapted to the problem, thereby reducing the variance of the associated estimators. We then investigate under which conditions non-linear predictors exhibit better performance than linear ones. Time series which admit a sparse time-frequency representation, such as audio signals, satisfy those requirements, and we propose a prediction algorithm using such a representation. The last problem we tackle is audio time-series synthesis. We propose a new generation method relying on a deep convolutional neural network, with an encoder-decoder architecture, which makes it possible to synthesize new realistic signals. Contrary to state-of-the-art methods, we explicitly use the time-frequency properties of sounds to define an encoder with the scattering transform, while the decoder is trained to solve an inverse problem in an adapted metric.
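To make the "foveal" idea concrete: recent samples are kept at fine resolution while older samples are summarised over exponentially wider windows, so a long past is described by a handful of coefficients before fitting a linear predictor. The sketch below simplifies the thesis's causal foveal wavelets to flat averaging windows; the scales and the toy signal are arbitrary.

```python
import numpy as np

def foveal_features(past, scales=(1, 2, 4, 8, 16)):
    """Summarise a past window: full resolution near the present, coarser
    averages further back, reducing the predictor's parameter count
    (and hence estimator variance) for long histories."""
    feats, pos = [], len(past)
    for s in scales:                 # scales sum to len(past) == 31 here
        feats.append(past[pos - s:pos].mean())
        pos -= s
    return np.array(feats)

# Fit a linear predictor of x[t] from foveal features of x[t-31 .. t-1].
rng = np.random.default_rng(0)
x = np.sin(0.05 * np.arange(4000)) + 0.1 * rng.standard_normal(4000)
H = np.array([foveal_features(x[t - 31:t]) for t in range(31, 3999)])
y = x[31:3999]
w, *_ = np.linalg.lstsq(H, y, rcond=None)
pred = H @ w                         # in-sample one-step predictions
```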
APA, Harvard, Vancouver, ISO, and other styles
11

Boyes, Graham. "Dictionary-based analysis/synthesis and structured representations of musical audio." Thesis, McGill University, 2012. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=106507.

Full text
Abstract:
In the representation of musical audio, it is common to favour either a signal or a symbol interpretation, while mid-level representation is an emerging topic. In this thesis we investigate the perspective of structured, intermediate representations through an integration of theoretical aspects related to separable sound objects, dictionary-based methods of signal analysis, and object-oriented programming. In contrast to examples in the literature that approach an intermediate representation from the signal level, we orient our formulation towards the symbolic level. This methodology is applied to both the specification of analytical techniques and the design of a software framework. Experimental results demonstrate that our method is able to achieve a lower Itakura-Saito distance, a perceptually motivated measure of spectral dissimilarity, when compared to a generic model, and that our structured representation can be applied to visualization as well as agglomerative post-processing.
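For reference, the Itakura-Saito distance used as the evaluation measure has a compact form for discrete power spectra; a small sketch, with an epsilon added for numerical stability:

```python
import numpy as np

def itakura_saito(p, p_hat, eps=1e-12):
    """Itakura-Saito divergence between two power spectra p and p_hat:
    mean of (p/p_hat - log(p/p_hat) - 1), zero iff the spectra match."""
    r = (p + eps) / (p_hat + eps)
    return np.mean(r - np.log(r) - 1.0)
```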
APA, Harvard, Vancouver, ISO, and other styles
12

Rodgers, Tara. "Synthesizing sound: metaphor in audio-technical discourse and synthesis history." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=97090.

Full text
Abstract:
Synthesized sound is ubiquitous in contemporary music and aural environments around the world. Yet, relatively little has been written on its cultural origins and meanings. This dissertation constructs a long history of synthesized sound that examines the century before synthesizers were mass-produced in the 1970s, and attends to ancient and mythic themes that circulate in contemporary audio-technical discourse. Research draws upon archival materials including late-nineteenth and early-twentieth century acoustics texts, and inventors' publications, correspondence, and synthesizer product manuals from the 1940s through the 1970s. As a feminist history of synthesized sound, this project investigates how metaphors in audio-technical discourse are invested with notions of identity and difference. Through analyses of key concepts in the history of synthesized sound, I argue that audio-technical language and representation, which typically stands as neutral, in fact privileges the perspective of an archetypal Western, white, and male subject. I identify two primary metaphors for conceiving electronic sounds that were in use by the early-twentieth century and continue to inform sonic epistemologies: electronic sounds as waves, and electronic sounds as individuals. The wave metaphor, in circulation since ancient times, produces an affective orientation to audio technologies based on a masculinist and colonizing subject position, whereby the generation and control of electronic sound entails the pleasure and danger of navigating and taming unruly waves. The second metaphor took shape over the nineteenth century as sounds, like modern bodies and subjects, came to be understood as individual entities with varying properties to be analyzed and controlled. Notions of sonic individuation and variability emerged in the contexts of Darwinian thought and a cultural fascination with electricity as a kind of animating force. Practices of classifying sounds as individuals, sorted by desirable and undesirable aesthetic variations, were deeply entwined with epistemologies of gender and racial difference in Western philosophy and modern science. Synthesized sound also inherits other histories, including applications of the terms synthesis and synthetic in diverse cultural fields; designs of earlier mechanical and electronic devices; and developments in musical modernism and electronics hobbyist cultures. The long-term and broad perspective on synthesis history adopted in this study aims to challenge received truths in audio-technical discourse and resist the linear and coherent progress narratives often found in histories of technology and new media. This dissertation aims to make important contributions to fields of sound and media studies, which can benefit from feminist contributions generally and elaboration on forms and meanings of synthesis technologies specifically. Also, feminist scholars have extensively theorized visual cultures and technologies, with few extended investigations of sound and audio technologies. This project also aims to open up new directions in a field of feminist sound studies by historicizing notions of identity and difference in audio-technical discourse, and claiming the usefulness of sound to feminist thought.
APA, Harvard, Vancouver, ISO, and other styles
13

Deena, Salil Prashant. "Visual speech synthesis by learning joint probabilistic models of audio and video." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/visual-speech-synthesis-by-learning-joint-probabilistic-models-of-audio-and-video(bdd1a78b-4957-469e-8be4-34e83e676c79).html.

Full text
Abstract:
Visual speech synthesis deals with synthesising facial animation from an audio representation of speech. In the last decade or so, data-driven approaches have gained prominence with the development of machine learning techniques that can learn an audio-visual mapping. Many of these machine learning approaches learn a generative model of speech production using the framework of probabilistic graphical models, through which efficient inference algorithms can be developed for synthesis. In this work, the audio and visual parameters are assumed to be generated from an underlying latent space that captures the shared information between the two modalities. These latent points evolve through time according to a dynamical mapping, and there are mappings from the latent points to the audio and visual spaces respectively. The mappings are modelled using Gaussian processes, which are non-parametric models that can represent a distribution over non-linear functions. The result is a non-linear state-space model. It turns out that the state-space model is not a very accurate generative model of speech production because it assumes a single dynamical model, whereas it is well known that speech involves multiple dynamics (e.g. different syllables) that are generally non-linear. In order to cater for this, the state-space model can be augmented with switching states to represent the multiple dynamics, thus giving a switching state-space model. A key problem is how to infer the switching states so as to model the multiple non-linear dynamics of speech, which we address by learning a variable-order Markov model on a discrete representation of audio speech. Various synthesis methods for predicting visual from audio speech are proposed for both the state-space and switching state-space models. Quantitative evaluation, involving the use of error and correlation metrics between ground truth and synthetic features, is used to evaluate our proposed method in comparison to other probabilistic models previously applied to the problem. Furthermore, qualitative evaluation with human participants has been conducted to evaluate the realism, perceptual characteristics and intelligibility of the synthesised animations. The results are encouraging and demonstrate that by having a joint probabilistic model of audio and visual speech that caters for the non-linearities in audio-visual mapping, realistic visual speech can be synthesised from audio speech.
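The switching-state inference described above rests on a variable-order Markov model over a discrete symbol stream. A minimal back-off n-gram predictor conveys the flavour of such a model; this is a generic sketch, not the thesis implementation, and the maximum order is an arbitrary choice.

```python
from collections import defaultdict

class VariableOrderMarkov:
    """Back-off n-gram predictor over discrete symbols -- a simple stand-in
    for a variable-order Markov model on discretised audio features."""
    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, seq):
        for i in range(len(seq)):
            for k in range(1, self.max_order + 1):
                if i - k >= 0:                      # count symbol after each context
                    self.counts[tuple(seq[i - k:i])][seq[i]] += 1

    def predict(self, context):
        for k in range(self.max_order, 0, -1):      # back off to shorter contexts
            ctx = tuple(context[-k:])
            if ctx in self.counts:
                nxt = self.counts[ctx]
                return max(nxt, key=nxt.get)
        return None

vom = VariableOrderMarkov()
vom.train(list("abcabcabx"))
print(vom.predict(list("ab")))                      # most likely next symbol
```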
APA, Harvard, Vancouver, ISO, and other styles
14

Fernández-Torné, Anna. "Audio description and technologies. Study on the semi-automatisation of the translation and voicing of audio descriptions." Doctoral thesis, Universitat Autònoma de Barcelona, 2016. http://hdl.handle.net/10803/394035.

Full text
Abstract:
This PhD thesis explores the application of technologies to the audio description field with the aim to semi-automatise the process in two ways. On the one hand, text-to-speech is implemented for the voicing of audio description in Catalan and, on the other hand, machine translation with post-editing is applied to English audio descriptions to obtain Catalan AD scripts. In relation to TTS, a selection of available synthetic and natural voices in Catalan (5 masculine ones and 5 feminine ones for each category) is assessed by means of a self-administered questionnaire mainly based on the ITU-T P.85 standard Mean Opinion Score (MOS) scales for the subjective assessment of the quality of synthetic speech. Thus, participants assess the voices taking into account various items (overall impression, accentuation, pronunciation, speech pauses, intonation, naturalness, pleasantness, listening effort, and acceptance). The voices obtaining the best scores for each category are then used to assess the reception of text-to-speech audio descriptions compared to human-voiced audio descriptions by blind and visually impaired persons. Both the quantitative and qualitative data obtained show that the preferential choice of blind and partially sighted persons is the audio description voiced by a human, rather than by a speech synthesis system, since natural voices obtain statistically higher scores than synthetic voices. However, TTS AD is accepted by end users (94% of the participants) as an alternative acceptable solution, and 20% of the respondents actually state that their preferred voice of the four under analysis is a synthetic one. As regards MT, a selection of five free on-line machine translation engines from English into Catalan is evaluated in order to determine which is the most suitable for audio description. Their raw machine translation outputs and the post-editing effort involved are assessed using eight different scores, including human judgments (PE time, PE necessity, PE difficulty, MT output adequacy, MT output fluency and MT output ranking) and automatic metrics (HBLEU and HTER). The results show that there are clear quality differences among the systems assessed and that one of them (Google Translate) is the best rated in six out of the eight evaluation measures used. Once the best performing engine is selected, the effort, both objective and subjective, involved in three scenarios is compared: the effort of creating an audio description from scratch (AD creation), of manually translating an audio description (AD translation), and of post-editing a machine-translated audio description (AD PE). The results show that the objective post-editing effort is lower than that of creating an AD ex novo or manually translating it, although the subjective effort is perceived to be higher for the post-editing task.
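Of the automatic metrics mentioned, HTER measures the edit distance between the raw machine translation and its human post-edited version, normalised by the post-edit length. The sketch below uses plain word-level edit distance; TER proper also counts block shifts, which are omitted here.

```python
def hter(mt_words, postedited_words):
    """Simplified human-targeted TER: word-level Levenshtein distance from
    the raw MT output to its post-edited version, over post-edit length."""
    m, n = len(mt_words), len(postedited_words)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = mt_words[i - 1] != postedited_words[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n] / max(1, n)

print(hter("the cat sat".split(), "a cat sat down".split()))  # 0.5
```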
APA, Harvard, Vancouver, ISO, and other styles
15

Heinrichs, Christian. "Human expressivity in the control and integration of computationally generated audio." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/33924.

Full text
Abstract:
While physics-based synthesis offers a wide range of benefits in the real-time generation of sound for interactive environments, it is difficult to incorporate the nuanced and complex behaviour that enhances sound in a narrative or aesthetic context. The work presented in this thesis explores real-time human performance as a means of stylistically augmenting computational sound models. Transdisciplinary in nature, this thesis builds upon previous work in sound synthesis, film sound theory and physical sound interaction. Two levels on which human performance can enhance the aesthetic value of computational models are investigated: first, the real-time manipulation of an idiosyncratic parameter space to generate unique sound effects, and second, the performance of physical source models in synchrony with moving images. In the former, various mapping techniques were evaluated for controlling a model of a creaking door based on a proposed extension of practical synthesis techniques. In the latter, audio post-production professionals with extensive experience in performing Foley were asked to perform the soundtrack to a physics-based animation using bespoke physical interfaces and synthesis engines. The generated dataset was used to gain insights into stylistic features afforded by performed sound synchronisation, and potential ways of integrating them into an interactive environment such as a game engine. Interacting with practical synthesis models that have been extended to incorporate performability enables rapid generation of unique and expressive sound effects, while maintaining a believable source-sound relationship. Performatively authoring the behaviours of sound models makes it possible to enhance the relationship between sound and image (both stylistically and perceptually) in ways precluded by one-to-one mappings between physics-based parameters. Mediation layers are required in order to facilitate performed behaviour: in the design of the model on one hand, and in the integration of such behaviours into interactive environments on the other. This thesis provides some examples of how such a system could be implemented. Furthermore, some interesting observations are made regarding the design of physical interfaces for performing environmental sound, and the creative exploitation of model constraints.
APA, Harvard, Vancouver, ISO, and other styles
16

Faller, Kenneth John II. "Automated synthesis of a reduced-parameter model for 3D digital audio." FIU Digital Commons, 1996. http://digitalcommons.fiu.edu/etd/3245.

Full text
Abstract:
Head-Related Impulse Responses (HRIRs) are used in signal processing to implement the synthesis of spatialized audio. They represent the modification that sound undergoes from its source to the listener's eardrums. HRIRs are somewhat different for each listener and require expensive specialized equipment for their individual measurement. Therefore, the development of a method to obtain customized HRIRs without specialized equipment is extremely desirable. A customizable representation of HRIRs can be created by modeling them in terms of an appropriate set of time delays and a resonant frequency. Previously, this was achieved manually, by trial and error. In this research an automated algorithm for the definition of the appropriate delays and resonant frequency needed to model an HRIR was developed, implemented and evaluated. This provides an objective, repeatable way to determine the parameters of the HRIR model. The automated process provided an average accuracy of 96.9% in the analysis of 2160 HRIRs.
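The automated parameter definition described above amounts to locating a few dominant delays (plus a resonant frequency) in each measured HRIR. A sketch of the delay-picking step using SciPy peak detection; the threshold and the number of delays are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np
from scipy.signal import find_peaks

def hrir_delays(hrir, n_delays=3):
    """Pick the strongest arrivals in an HRIR as candidate time delays for
    a reduced-parameter (delays + resonance) model of the response."""
    env = np.abs(hrir)
    peaks, props = find_peaks(env, height=0.1 * env.max())  # arbitrary threshold
    order = np.argsort(props["peak_heights"])[::-1][:n_delays]
    delays = np.sort(peaks[order])       # sample indices of dominant arrivals
    gains = hrir[delays]                 # signed amplitude at each delay
    return delays, gains
```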
APA, Harvard, Vancouver, ISO, and other styles
17

Strandberg, Carl. "Mediating Interactions in Games Using Procedurally Implemented Modal Synthesis : Do players prefer and choose objects with interactive synthetic sounds over objects with traditional sample based sounds?" Thesis, Luleå tekniska universitet, Institutionen för konst, kommunikation och lärande, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-68015.

Full text
Abstract:
Procedurally implemented synthetic audio could offer greater interactive potential for audio in games than the currently popular sample-based approach does. At the same time, synthetic audio can reduce the storage requirements that using sample-based audio entails. This study examines these potentials, and looks at one game interaction in depth to learn whether players prefer and choose objects with interactive sounds generated through procedurally implemented modal synthesis over objects with traditionally implemented sample-based sound. An in-game listening test was created in which 20 subjects were asked to throw a ball at a wall 35 times to destroy wall tiles and reveal a message. For each throw they could select one of two balls: one ball had a modal-synthesis sound that varied in pitch with how hard the ball was thrown; the other had a traditionally implemented sample-based sound that did not correspond to how hard it was thrown, instead playing one of four samples at random. The subjects were then asked questions to evaluate how realistic they perceived the two versions to be, which they preferred, and how they perceived the sounds as corresponding to their interactions. The results show that the modal-synthesis version is preferred and perceived as more realistic than the sample-based version, but whether this was a deciding factor in subjects' choices could not be determined.
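The interactive sound under test can be pictured as modal synthesis: a sum of exponentially decaying sinusoids excited by the impact, with gain (and, here, slightly, pitch) scaled by throw velocity. The mode table and the pitch/force coupling below are invented for illustration and are not the study's model.

```python
import numpy as np

def modal_impact(velocity, modes, sr=44100, dur=0.8):
    """Modal synthesis of an impact: decaying sinusoids whose amplitude
    (and, as an assumed coupling, pitch) scale with impact velocity --
    the parametric variation a fixed sample set cannot provide."""
    t = np.arange(int(sr * dur)) / sr
    out = np.zeros_like(t)
    for f, decay, amp in modes:
        f_v = f * (1 + 0.02 * velocity)   # assumed pitch/force coupling
        out += velocity * amp * np.exp(-decay * t) * np.sin(2 * np.pi * f_v * t)
    return out / max(1e-9, np.abs(out).max())

# Three arbitrary modes: (frequency Hz, decay rate 1/s, relative amplitude).
ball_hit = modal_impact(velocity=0.7,
                        modes=[(311, 9, 1.0), (782, 14, 0.5), (1430, 22, 0.25)])
```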
APA, Harvard, Vancouver, ISO, and other styles
18

Oger, Marie. "Model-based techniques for flexible speech and audio coding." Nice, 2007. http://www.theses.fr/2007NICE4109.

Full text
Abstract:
The objective of this thesis is to develop optimal speech and audio coding techniques which are more flexible than the state of the art and can adapt in real time to various constraints (rate, bandwidth, delay). This problem is addressed using several tools: statistical models, high-rate quantization theory, and flexible entropy coding. Firstly, a novel method of flexible coding for linear predictive coding (LPC) coefficients is proposed, using a Karhunen-Loeve transform (KLT) and scalar quantization based on generalized Gaussian modelling. This method has performance equivalent to the LPC quantizer used in AMR-WB, with a lower complexity. Then, two transform audio coding structures are proposed, using either stack-run coding or model-based bit-plane coding. In both cases the coefficients after perceptual weighting and the modified discrete cosine transform (MDCT) are approximated by a generalized Gaussian distribution, and the coding of MDCT coefficients is optimized according to this model. The performance is compared with that of ITU-T G.722.1. The stack-run coder is better than G.722.1 at low bit rates and equivalent at high bit rates; however, its computational complexity is higher, while its memory requirement is low. The bit-plane coder has the advantage of being bit-rate scalable. The generalized Gaussian model is used to initialize the probability tables of an arithmetic coder. The bit-plane coder is worse than the stack-run coder at low bit rates and equivalent at high bit rates; it has a computational complexity close to G.722.1, while its memory requirement is still low.
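Both proposed coders rest on fitting a generalized Gaussian to MDCT coefficients. A classical way to estimate the shape parameter is moment matching on the ratio E|x|/sqrt(E[x^2]); the sketch below uses that estimator, which is not necessarily the exact procedure of the thesis, and assumes the measured ratio falls in the range attainable by a generalized Gaussian.

```python
import numpy as np
from scipy.special import gamma
from scipy.optimize import brentq

def fit_gg_shape(x):
    """Estimate the generalized-Gaussian shape parameter beta by matching
    E|x| / sqrt(E[x^2]), which for a GG equals
    Gamma(2/beta) / sqrt(Gamma(1/beta) * Gamma(3/beta))."""
    r = np.mean(np.abs(x)) / np.sqrt(np.mean(x ** 2))
    f = lambda b: gamma(2 / b) / np.sqrt(gamma(1 / b) * gamma(3 / b)) - r
    return brentq(f, 0.1, 10.0)   # root-find over a plausible shape range

# Sanity check: Laplacian data (a GG with beta = 1) should give beta near 1.
beta = fit_gg_shape(np.random.default_rng(0).laplace(size=10000))
```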
APA, Harvard, Vancouver, ISO, and other styles
19

Mosbruger, Michael C. "Alternative audio solution to enhance immersions in deployable synthetic environments." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03sep%5FMosbruger.pdf.

Full text
Abstract:
Thesis (M.S. in Modeling, Virtual Environments, and Simulation)--Naval Postgraduate School, September 2003.
Thesis advisor(s): Russell D. Shilling, Rudolph P. Darken. Includes bibliographical references (p. 169-172). Also available online.
APA, Harvard, Vancouver, ISO, and other styles
20

Kudumakis, Panos E. "Synthesis and coding of audio signals using wavelet transforms for multimedia applications." Thesis, King's College London (University of London), 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.343479.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Emnett, Keith Jeffrey 1973. "Synthetic News Radio : content filtering and delivery for broadcast audio news." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/61108.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1999.
Includes bibliographical references (p. 58-59).
Synthetic News Radio uses automatic speech recognition and clustered text news stories to automatically find story boundaries in an audio news broadcast, and it creates semantic representations that can match stories of similar content through audio-based queries. Current speech recognition technology cannot by itself produce enough information to accurately characterize news audio; therefore, the clustered text stories represent a knowledge base of relevant news topics that the system can use to combine recognition transcripts of short, intonational phrases into larger, complete news stories. Two interface mechanisms, a graphical desktop application and a touch-tone-driven phone interface, allow quick and efficient browsing of the newly structured news broadcasts. The system creates a personal, synthetic newscast by extracting stories, based on user interests, from multiple hourly newscasts and then reassembling them into a single recording at the end of the day. The system also supports timely delivery of important stories over a LAN or to a wireless audio pager. This thesis describes the design and evaluation of the news segmentation and content matching technology, and evaluates the effectiveness of the interface and delivery mechanisms.
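A toy sketch of the transcript-to-story matching idea, using tf-idf cosine similarity; the story texts, phrase and vectorization choice are illustrative, not the system's actual pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical knowledge base of clustered text news stories
stories = ["senate passes budget bill after weeks of debate",
           "storm brings heavy flooding to coastal towns",
           "home team wins the championship in overtime"]
# Noisy speech-recognition transcript of one intonational phrase
phrase = "flooding hit the coast as the storm moved in"

vec = TfidfVectorizer().fit(stories)
S = vec.transform(stories)        # tf-idf rows are L2-normalised,
q = vec.transform([phrase])       # so a dot product is cosine similarity
sims = (S @ q.T).toarray().ravel()
print("best matching story cluster:", int(np.argmax(sims)))
```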
by Keith Jeffrey Emnett.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
22

Mitchell, Thomas James. "An exploration of evolutionary computation applied to frequency modulation audio synthesis parameter optimisation." Thesis, University of the West of England, Bristol, 2010. http://eprints.uwe.ac.uk/18265/.

Full text
Abstract:
With the ever-increasing complexity of sound synthesisers, there is a growing demand for automated parameter estimation and sound space navigation techniques. This thesis explores the potential for evolutionary computation to automatically map known sound qualities onto the parameters of frequency modulation synthesis. Within this exploration are original contributions in the domain of synthesis parameter estimation and, within the developed system, in evolutionary computation, in the form of the evolutionary algorithms that drive the underlying optimisation process. Based upon the requirement for the parameter estimation system to deliver multiple search space solutions, existing evolutionary algorithmic architectures are augmented to enable niching, while maintaining the strengths of the original algorithms. Two novel evolutionary algorithms are proposed in which cluster analysis is used to identify and maintain species within the evolving populations. A conventional evolution strategy and a cooperative coevolution strategy are defined, with cluster-orientated operators that enable the simultaneous optimisation of multiple search space solutions at distinct optima. A test methodology is developed that allows components of the synthesis matching problem to be identified and isolated, so that the performance of different optimisation techniques can be compared quantitatively. A system is consequently developed that evolves sound matches using conventional frequency modulation synthesis models, and the effectiveness of different evolutionary algorithms is assessed and compared in application to both static and time-varying sound matching problems. Performance of the system is then evaluated by interview with expert listeners. The thesis closes with a reflection on the algorithms and systems which have been developed, discussing possibilities for the future of automated synthesis parameter estimation techniques and how they might be employed.
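For orientation, a minimal Python sketch of evolutionary FM parameter matching with a plain (mu + lambda) evolution strategy and a spectral-distance fitness; this is a conventional baseline of the kind the thesis compares against, not the clustering-based niching algorithms it proposes:

```python
import numpy as np

SR, N = 16000, 8192
t = np.arange(N) / SR

def fm(params):
    """Simple FM: carrier frequency, modulator frequency, modulation index."""
    fc, fmod, index = params
    return np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fmod * t))

def spectral_error(params, target_mag):
    mag = np.abs(np.fft.rfft(fm(params) * np.hanning(N)))
    return np.sum((mag - target_mag) ** 2)

# Target spectrum from a known parameter set (a static matching problem)
target = np.abs(np.fft.rfft(fm((440.0, 220.0, 3.0)) * np.hanning(N)))

rng = np.random.default_rng(0)
pop = rng.uniform([100, 50, 0], [2000, 2000, 10], size=(30, 3))
for gen in range(100):                       # (mu + lambda) evolution strategy
    kids = pop + rng.normal(0, [20, 10, 0.2], size=pop.shape)
    both = np.vstack([pop, kids])
    fit = np.array([spectral_error(p, target) for p in both])
    pop = both[np.argsort(fit)[:30]]         # keep the 30 best
print("best parameters:", pop[0])
```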
APA, Harvard, Vancouver, ISO, and other styles
23

Bailey, Nicholas James. "On the synthesis and processing of high quality audio signals by parallel computers." Thesis, Durham University, 1991. http://etheses.dur.ac.uk/6285/.

Full text
Abstract:
This work concerns the application of new computer architectures to the creation and manipulation of high-quality audio bandwidth signals. The configuration of both the hardware and software in such systems falls under consideration in the three major sections, which present increasing levels of algorithmic concurrency. In the first section, the programs which are described are distributed in identical copies across an array of processing elements; these programs run autonomously, generating data independently, but with control parameters peculiar to each copy: this type of concurrency is referred to as isonomic. The central section presents a structure which distributes tasks across an arbitrary network of processors; the flow of control in such a program is quasi-indeterminate, and controlled on a demand basis by the rate of completion of the slave tasks and their irregular interaction with the master. Whilst that interaction is, in principle, deterministic, it is also data-dependent; the dynamic nature of task allocation demands that no a priori knowledge of the rate of task completion be required. This type of concurrency is called dianomic. Finally, an architecture is described which will support a very high level of algorithmic concurrency. The programs which make efficient use of such a machine are designed not by considering flow of control, but by considering flow of data. Each atomic algorithmic unit is made as simple as possible, which results in the extensive distribution of a program over very many processing elements. Programs designed by considering only the optimum data exchange routes are said to exhibit systolic concurrency. Often neglected in the study of system design are those provisions necessary for practical implementations. It was intended to provide users with useful application programs in fulfilment of this study; the target group is electroacoustic composers, who use digital signal processing techniques in the context of musical composition. Some of the algorithms in use in this field are highly complex, often requiring a quantity of processing for each sample which exceeds that currently available even from very powerful computers. Consequently, applications tend to operate not in 'real time' (where the output of a system responds to its input apparently instantaneously), but by the manipulation of sounds recorded digitally on a mass storage device. The first two sections adopt existing, public-domain software, and seek to increase its speed of execution significantly by parallel techniques, with the minimum compromise of functionality and ease of use. Those chosen are the general-purpose direct synthesis program CSOUND, from M.I.T., and a stand-alone phase vocoder system from the C.D.P. In each case, the desired aim is achieved: to increase speed of execution by two orders of magnitude over the systems currently in use by composers. This requires substantial restructuring of the programs, and careful consideration of the best computer architectures on which they are to run concurrently. The third section examines the rationale behind the use of computers in music, and begins with the implementation of a sophisticated electronic musical instrument capable of a degree of expression at least equal to its acoustic counterparts. It seems that the flexible control of such an instrument demands a greater computing resource than the sound synthesis part.
A machine has been constructed with the intention of enabling the 'gestural capture' of performance information in real-time; the structure of this computer, which has one hundred and sixty high-performance microprocessors running in parallel, is expounded; and the systolic programming techniques required to take advantage of such an array are illustrated in the Occam programming language.
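As a rough modern analogue of the demand-driven ("dianomic") task farm described above, a Python sketch in which a master hands sample blocks to slave processes and collects them in whatever order they finish:

```python
from multiprocessing import Pool

def render_block(job):
    """Slave task: synthesize one block of samples (placeholder for real DSP)."""
    index, params = job
    return index, [p * 0.5 for p in params]   # stand-in computation

if __name__ == "__main__":
    jobs = [(i, [1.0, 2.0, 3.0]) for i in range(64)]
    with Pool(processes=8) as pool:
        # imap_unordered hands out work on demand and yields results in
        # whatever order the slaves finish - the data-dependent,
        # demand-driven flow of control the author calls "dianomic"
        for index, block in pool.imap_unordered(render_block, jobs):
            pass   # the master would reassemble blocks here
```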
APA, Harvard, Vancouver, ISO, and other styles
24

Yong, Louisa Chung-Sze. "An Internet-based audio synthesis resource : a case study in Manchester and Salford." Thesis, University of Salford, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365975.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Bowler, I. "Digital techniques in the storage and processing of audio waveforms for music synthesis." Thesis, Bucks New University, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.373583.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Sini, Aghilas. "Caractérisation et génération de l’expressivité en fonction des styles de parole pour la construction de livres audio." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S026.

Full text
Abstract:
In this thesis, we study the expressivity of read speech with a particular type of data: audiobooks. Audiobooks are audio recordings of literary works made by professionals (actors, singers, professional narrators) or by amateurs. These recordings may be intended for a particular audience (blind or visually impaired people). The availability of this kind of data in large quantities and with good enough quality has attracted the attention of the research community in automatic speech and language processing in general, and of researchers specialized in expressive speech synthesis in particular. We propose in this thesis to study three elementary entities of expressivity that are conveyed by audiobooks: emotion, variations related to discursive changes, and speaker properties. We treat these patterns from a prosodic point of view. The main contributions of this thesis are the construction of a corpus of audiobooks with a large number of recordings partially annotated by an expert, a quantitative study characterizing the emotions in this type of data, the construction of a model based on machine learning techniques for the automatic annotation of discourse types, and finally a vector representation of the prosodic identity of a speaker in the framework of parametric statistical speech synthesis.
APA, Harvard, Vancouver, ISO, and other styles
27

Cobos, Serrano Máximo. "Application of sound source separation methods to advanced spatial audio systems." Doctoral thesis, Universitat Politècnica de València, 2010. http://hdl.handle.net/10251/8969.

Full text
Abstract:
This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. This is due to the fact that WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area; thus, an object-based mix is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field, and its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast and unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the features considered by each of them are related to different localization cues, enabling the separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.
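A simplified Python sketch of the underlying idea of time-frequency masking driven by a localization cue; here the inter-channel level ratio is split at fixed, equally spaced thresholds, whereas the thesis estimates the thresholds by multi-level segmentation:

```python
import numpy as np
from scipy.signal import stft, istft

def separate_by_pan(left, right, n_sources=3, sr=44100):
    """Cluster T-F bins by inter-channel level ratio and mask
    (suitable only for instantaneous, pan-based stereo mixtures)."""
    f, t, L = stft(left, sr, nperseg=1024)
    _, _, R = stft(right, sr, nperseg=1024)
    pan = np.abs(R) / (np.abs(L) + np.abs(R) + 1e-12)   # 0 = left, 1 = right
    edges = np.linspace(0, 1, n_sources + 1)            # fixed thresholds
    sources = []
    for k in range(n_sources):
        mask = (pan >= edges[k]) & (pan < edges[k + 1])
        _, y = istft((L + R) * mask, sr, nperseg=1024)
        sources.append(y)
    return sources
```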
Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
APA, Harvard, Vancouver, ISO, and other styles
28

Polotti, Pietro. "Fractal additive synthesis : spectral modeling of sound for low rate coding of quality audio /." [S.l.] : [s.n.], 2003. http://library.epfl.ch/theses/?nr=2711.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Bürger, Michael [Verfasser]. "On the Analysis and Synthesis of Local Sound Fields for Personal Audio / Michael Bürger." München : Verlag Dr. Hut, 2019. http://d-nb.info/1202169015/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Hansjons, Vegeborn Victor. "LjudMAP: A Visualization Tool for Exploring Audio Collections with Real-Time Concatenative Synthesis Capabilities." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277831.

Full text
Abstract:
This thesis presents the software tool “LjudMAP," which fuses techniques of music informatics and unsupervised machine learning methods to assist in the exploration of audio collections. LjudMAP builds on concepts of the software tool, "Temporally Disassembled Audio," which was developed to enable fast browsing of recorded speech material. LjudMAP is intended instead for analysis and real-time composition of electroacoustic music, and is programmed in a way that can include more audio features. This thesis presents investigations into how LjudMAP can be used for identifying similarities and clusters within audio collections. A key contribution is the coagulation of clusters of sound based on principles of proximity in time and feature space. The thesis also shows how LjudMAP can be used for composition, with several demonstrations provided by one electroacoustic composer with a variety of sound materials. The source code for LjudMAP is available at: https://github.com/victorwegeborn/LjudMAP.
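A minimal sketch of the feature-map idea behind such tools: one feature vector per audio file, embedded in 2-D for display (MFCC statistics and PCA stand in for whatever features and embedding LjudMAP actually uses):

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA

def collection_map(paths, sr=22050):
    """One MFCC-statistics vector per file, reduced to 2-D for a scatter plot."""
    feats = []
    for p in paths:
        y, _ = librosa.load(p, sr=sr)
        m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        feats.append(np.concatenate([m.mean(axis=1), m.std(axis=1)]))
    return PCA(n_components=2).fit_transform(np.array(feats))   # (n_files, 2)
```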
APA, Harvard, Vancouver, ISO, and other styles
31

Giannakis, Konstantinos. "Sound mosaics : a graphical user interface for sound synthesis based on audio-visual associations." Thesis, Middlesex University, 2001. http://eprints.mdx.ac.uk/6634/.

Full text
Abstract:
This thesis presents the design of a Graphical User Interface (GUI) for computer-based sound synthesis to support users in the externalisation of their musical ideas when interacting with the system in order to create and manipulate sound. The approach taken consisted of three research stages. The first stage was the formulation of a novel visualisation framework to display perceptual dimensions of sound in visual terms. This framework was based on the findings of existing related studies and on a series of empirical investigations of the associations between auditory and visual percepts that we performed for the first time in the area of computer-based sound synthesis. The results of our empirical investigations suggested associations between the colour dimensions of brightness and saturation and the auditory dimensions of pitch and loudness respectively, as well as associations between the multidimensional percepts of visual texture and timbre. The second stage of the research involved the design and implementation of Sound Mosaics, a prototype GUI for sound synthesis based on direct manipulation of visual representations that make use of the visualisation framework developed in the first stage. We followed an iterative design approach that involved the design and evaluation of an initial Sound Mosaics prototype. The insights gained during this first iteration assisted us in revising various aspects of the original design and visualisation framework, leading to a revised implementation of Sound Mosaics. The final stage of this research involved an evaluation study of the revised Sound Mosaics prototype that comprised two controlled experiments. First, a comparison experiment with the widely used frequency-domain representations of sound indicated that visual representations created with Sound Mosaics were more comprehensible and intuitive. Comprehensibility was measured as the level of accuracy in a series of sound-image association tasks, while intuitiveness was related to subjects' response times and perceived levels of confidence. Second, we conducted a formative evaluation of Sound Mosaics, in which it was exposed to a number of users with and without a musical background. Three usability factors were measured: effectiveness, efficiency, and subjective satisfaction. Sound Mosaics was demonstrated to perform satisfactorily on all three factors for music subjects, although non-music subjects yielded less satisfactory results, which can primarily be attributed to their unfamiliarity with the task of sound synthesis. Overall, our research has set the necessary groundwork for empirically derived and validated associations between auditory and visual dimensions that can be used in the design of cognitively useful GUIs for computer-based sound synthesis and related areas.
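A toy Python sketch of the reported brightness-to-pitch and saturation-to-loudness associations; the frequency range and mapping curves are illustrative assumptions, and the multidimensional texture-to-timbre mapping is omitted:

```python
import numpy as np

def color_to_sound(hue, saturation, brightness, sr=44100, dur=1.0):
    """Sonify one colour patch: brightness -> pitch, saturation -> loudness.

    The texture -> timbre mapping studied in the thesis is
    multidimensional and is left out of this sketch."""
    t = np.arange(int(sr * dur)) / sr
    pitch = 110.0 * 2.0 ** (4.0 * brightness)   # 4-octave span, illustrative
    return saturation * np.sin(2 * np.pi * pitch * t)

tone = color_to_sound(hue=0.6, saturation=0.8, brightness=0.5)
```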
APA, Harvard, Vancouver, ISO, and other styles
32

Olivero, Anaik. "Les multiplicateurs temps-fréquence : Applications à l’analyse et la synthèse de signaux sonores et musicaux." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4788/document.

Full text
Abstract:
Analysis/transformation/synthesis is a general paradigm in signal processing that aims at manipulating or generating signals for practical applications. This thesis deals with time-frequency representations obtained with Gabor atoms. In this context, the complexity of a sound transformation can be modeled by a Gabor multiplier. Gabor multipliers are linear diagonal operators acting on signals, characterized by a complex-valued time-frequency transfer function called the Gabor mask. Gabor multipliers allow the concept of filtering to be formalized in the time-frequency domain. As they act by multiplication in the time-frequency domain, they are a priori well adapted to producing sound transformations such as timbre transformations. In a first part, this work models the problem of Gabor mask estimation between two given signals and provides algorithms to solve it. The Gabor multiplier between two signals is not uniquely defined, and the proposed estimation strategies are able to generate Gabor multipliers that produce signals with satisfying sound quality. In a second part, we show that a Gabor mask contains relevant information, as it can be viewed as a time-frequency representation of the difference of timbre between two given sounds. By averaging the energy contained in a Gabor mask, we obtain a measure of this difference that allows different musical instrument sounds to be discriminated. We also propose strategies to automatically localize the time-frequency regions responsible for such a timbre dissimilarity between musical instrument classes. Finally, we show that Gabor multipliers can be used to construct a whole range of sound-morphing trajectories between two sounds which, in certain situations, can be guided by timbre descriptors.
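A minimal Python sketch of a Gabor multiplier realized with the STFT: estimate a complex mask between two signals as a regularized pointwise ratio of their coefficients, then apply it; the thesis's estimators are regularized far more carefully than this:

```python
import numpy as np
from scipy.signal import stft, istft

def estimate_mask(x, y, sr=44100, nperseg=1024, eps=1e-3):
    """Naive Gabor-mask estimate between a source x and a target y:
    a regularized pointwise ratio of their Gabor (STFT) coefficients."""
    _, _, X = stft(x, sr, nperseg=nperseg)
    _, _, Y = stft(y, sr, nperseg=nperseg)
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)

def apply_multiplier(x, mask, sr=44100, nperseg=1024):
    """Gabor multiplier: analysis, pointwise multiplication, resynthesis."""
    _, _, X = stft(x, sr, nperseg=nperseg)
    _, z = istft(X * mask, sr, nperseg=nperseg)
    return z
```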
APA, Harvard, Vancouver, ISO, and other styles
33

Liuni, Marco. "Adaptation Automatique de la Résolution pour l'Analyse et la Synthèse du Signal Audio." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00773550.

Full text
Abstract:
In this thesis, we are interested in methods that allow the time-frequency resolution to be varied locally for the analysis and re-synthesis of sound. In time-frequency analysis, adaptivity is the ability to design representations and operators whose characteristics can be modified according to the objects being analysed: the first objective of this work is the formal definition of a mathematical framework capable of generating adaptive methods for sound analysis. The second is to make the adaptation automatic; we establish criteria for locally determining the best time-frequency resolution by optimising appropriate sparsity measures. In order to exploit adaptivity in spectral sound processing, we introduce efficient reconstruction methods based on variable-resolution analyses, designed to preserve and improve current sound manipulation techniques. The main idea is that adaptive algorithms can help simplify the use of sound processing methods that today require a high level of expertise. In particular, the need for detailed manual configuration is a major limitation in mainstream applications of high-quality sound processing (for example, transposition or time stretching). We show examples where automatic management of the time-frequency resolution not only significantly reduces the number of parameters to be set, but also improves the quality of the processing.
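A small Python sketch of the sparsity-driven selection idea: compute the Rényi entropy of the spectrogram for several window sizes and keep the sparsest; the thesis performs this choice locally in time-frequency rather than globally as here:

```python
import numpy as np
from scipy.signal import stft

def renyi_entropy(spec, alpha=3.0):
    """Renyi entropy of a normalized spectrogram; lower means sparser."""
    p = spec / spec.sum()
    return np.log2((p ** alpha).sum()) / (1.0 - alpha)

def best_window(x, sr, sizes=(256, 512, 1024, 2048, 4096)):
    """Pick the analysis window giving the sparsest representation."""
    ent = {n: renyi_entropy(np.abs(stft(x, sr, nperseg=n)[2]) ** 2)
           for n in sizes}
    return min(ent, key=ent.get)
```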
APA, Harvard, Vancouver, ISO, and other styles
34

Lostanlen, Vincent. "Opérateurs convolutionnels dans le plan temps-fréquence." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEE012/document.

Full text
Abstract:
This dissertation addresses audio classification by designing signal representations which satisfy appropriate invariants while preserving inter-class variability. First, we study time-frequency scattering, a representation which extracts modulations at various scales and rates, in a similar way to idealized models of spectrotemporal receptive fields in auditory neuroscience. We report state-of-the-art results in the classification of urban and environmental sounds, thus outperforming short-term audio descriptors and deep convolutional networks. Secondly, we introduce spiral scattering, a representation which combines wavelet convolutions along time, along log-frequency, and across octaves. Spiral scattering follows the geometry of the Shepard pitch spiral, which makes a full turn at every octave. We study voiced sounds with a nonstationary source-filter model where both the source and the filter are transposed through time, and show that spiral scattering disentangles and linearizes these transpositions. Furthermore, spiral scattering reaches state-of-the-art results in musical instrument classification of solo recordings. Aside from audio classification, time-frequency scattering and spiral scattering can be used as summary statistics for audio texture synthesis. We find that, unlike the previously existing temporal scattering transform, time-frequency scattering is able to capture the coherence of spectrotemporal patterns, such as those arising in bioacoustics or speech, up to an integration scale of about 500 ms. Based on this analysis-synthesis framework, an artistic collaboration with composer Florian Hecker has led to the creation of five computer music pieces.
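For readers who want to experiment, the kymatio library (co-developed by the author) exposes 1-D scattering transforms; a minimal sketch, assuming the kymatio NumPy frontend is installed (note this computes the temporal scattering baseline, not the joint time-frequency scattering studied in the thesis):

```python
import numpy as np
from kymatio.numpy import Scattering1D

T = 2 ** 14
x = np.random.randn(T).astype(np.float32)   # stand-in for an audio frame
# J sets the largest scale (about 2**J samples), Q the wavelets per octave
scattering = Scattering1D(J=8, shape=(T,), Q=8)
coeffs = scattering(x)                      # order-0, -1 and -2 coefficients
print(coeffs.shape)
```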
APA, Harvard, Vancouver, ISO, and other styles
35

Musti, Utpala. "Synthèse acoustico-visuelle de la parole par sélection d'unités bimodales." Thesis, Université de Lorraine, 2013. http://www.theses.fr/2013LORR0003.

Full text
Abstract:
This work deals with audio-visual speech synthesis. In the vast literature available on this topic, many approaches divide the task into two synthesis problems: acoustic speech synthesis on the one hand, and the generation of the corresponding facial animation on the other. This does not, however, guarantee perfectly synchronous and coherent audio-visual speech. To overcome this drawback implicitly, we propose a different approach to acoustic-visual speech synthesis based on the selection of naturally synchronous bimodal units. The synthesis is based on the classical unit-selection paradigm, and the main idea behind it is to keep the natural association between the acoustic and visual modalities intact. We describe the audio-visual corpus acquisition technique and database preparation for our system. We present an overview of our system and detail the various aspects of bimodal unit selection that need to be optimized for good synthesis. The main focus of this work is to synthesize the speech dynamics well, rather than a comprehensive talking head. We describe the visual target features that we designed, and subsequently present an algorithm for target feature weighting. This algorithm performs target feature weighting and redundant feature elimination iteratively, based on the comparison of a target-cost-based ranking with a distance calculated from the acoustic and visual speech signals of units in the corpus. Finally, we present the perceptual and subjective evaluation of the final synthesis system. The results show that we have achieved the goal of synthesizing the speech dynamics reasonably well.
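A minimal sketch of the target-cost side of unit selection; the feature vectors, weights and greedy search are illustrative, and a full system such as the one described would add a join cost between consecutive units and a dynamic-programming search:

```python
import numpy as np

def target_cost(candidate, target, weights):
    """Weighted distance between a candidate unit's acoustic/visual
    target features and the requested target features."""
    return float(np.sum(weights * np.abs(candidate - target)))

def select_units(targets, corpus, weights):
    """Target-cost-only greedy selection over a corpus of feature vectors."""
    return [min(corpus, key=lambda u: target_cost(u, t, weights))
            for t in targets]
```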
APA, Harvard, Vancouver, ISO, and other styles
36

Bleda, Pérez Sergio. "Contribuciones a la implementación de sistemas Wave Field Synthesis." Doctoral thesis, Universitat Politècnica de València, 2009. http://hdl.handle.net/10251/6685.

Full text
Abstract:
Among 3D sound reproduction systems, Wave Field Synthesis (WFS) offers a number of advantages over the rest, mainly the great realism and sense of acoustic immersion it provides. Another major advantage is that its useful listening area is very large, exceeding that of the other systems currently available. The theory of WFS was proposed in the late 1980s and early 1990s, but the first prototypes of these systems were not built until the 21st century, and many aspects not covered by the initial theory remain significant challenges today. This thesis studies the implementation of WFS systems, providing practical solutions to their technological limitations, as well as to a series of implementation and real-time operation problems which, although not initially described as physical limitations, must be overcome if the system is to run efficiently. The final goal of this thesis is to contribute solutions towards a fully functional WFS system, which required finding specific and original solutions to a multitude of problems of different kinds. These problems stem on the one hand from the physical limitations of WFS and on the other from the practical implementation of the system. Work was also done on the computational aspects of real-time WFS implementations, which require large processing power to run in real time without dropouts or large latencies. This last aspect is treated rigorously, with a full chapter devoted to its analysis and to proposing efficient and cost-effective solutions.
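For intuition, a Python sketch of the geometric core of WFS rendering, reducing the driving function to per-speaker delays and gains for a virtual point source; a real implementation adds the WFS pre-filter, amplitude tapering and speaker selection:

```python
import numpy as np

C = 343.0   # speed of sound in air, m/s

def wfs_driving(source, speakers, sr=48000):
    """Per-loudspeaker delays (samples) and gains for a virtual point source.

    Delay-and-attenuate only; a full WFS driving function also applies
    the sqrt(jk) pre-filter and a window over the active loudspeakers."""
    d = np.linalg.norm(speakers - source, axis=1)   # source-speaker distances
    delays = np.round(d / C * sr).astype(int)       # reproduce wavefront curvature
    gains = 1.0 / np.sqrt(np.maximum(d, 0.1))       # simplified distance decay
    return delays, gains

# 16-element linear array on the x axis, virtual source 3 m behind it
speakers = np.stack([np.linspace(-2.0, 2.0, 16), np.zeros(16)], axis=1)
delays, gains = wfs_driving(np.array([0.0, -3.0]), speakers)
```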
Bleda Pérez, S. (2009). Contribuciones a la implementación de sistemas Wave Field Synthesis [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/6685
APA, Harvard, Vancouver, ISO, and other styles
37

Jackson, Judith. "Generative Processes for Audification." Oberlin College Honors Theses / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=oberlin1528280288385596.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Vaiapury, Karthikeyan. "Model based 3D vision synthesis and analysis for production audit of installations." Thesis, Queen Mary, University of London, 2013. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8721.

Full text
Abstract:
One of the challenging problems in the aerospace industry is to design an automated 3D vision system that can sense the installation components in an assembly environment and check that certain safety constraints are duly respected. This thesis describes a concept application to aid a safety engineer in auditing a production aircraft against safety-driven installation requirements such as segregation, proximity, orientation and trajectory. The capability is achieved using the following steps. The initial step is to capture images of a product and measure distances between datum points within the product, with or without reference to a planar surface; this gives the safety engineer a means to perform measurements on a set of captured images of the equipment of interest. The next step is to reconstruct a digital model of the fabricated product, using multiple captured images to reposition parts according to the actual build. The safety-related installation constraints are then projected onto the 3D digital reconstruction, respecting the original intent of the constraints as defined in the digital mock-up. The differences between the 3D reconstruction of the actual product and the design-time digital mock-up are identified, and those differences or non-conformances relevant to the safety-driven installation requirements are flagged with reference to the original safety intent. Together, these steps give the safety engineer the ability to overlay a digital reconstruction that is as true to the fabricated product as possible, so that they can see how the product conforms or fails to conform to the safety-driven installation requirements. The work has produced a concept demonstrator that will be further developed in future work to address accuracy, workflow and process efficiency. A new depth-based segmentation technique, GrabcutD, is also proposed as an improvement to Grabcut, an existing graph-cut-based segmentation method. Conventional Grabcut relies only on colour information to achieve segmentation, but in stereo or multiview analysis there is additional information that could be used to improve segmentation; depth-based approaches carry the discriminative power of ascertaining whether an object is nearer or farther. We show the usefulness of the approach when stereo information is available and evaluate it on standard datasets against state-of-the-art results.
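A crude illustration of why depth helps, sketched with OpenCV's standard GrabCut followed by a depth gate; GrabcutD itself folds the depth cue into the graph-cut energy rather than applying it as a post-filter:

```python
import numpy as np
import cv2

def grabcut_with_depth(img, depth, rect, depth_range):
    """Colour GrabCut followed by a depth gate - a crude stand-in for
    GrabcutD, which integrates depth into the graph-cut energy itself.

    img: 8-bit BGR image; depth: per-pixel depth map; rect: (x, y, w, h)."""
    mask = np.zeros(img.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    fg = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))   # colour-based foreground
    near, far = depth_range
    return fg & (depth >= near) & (depth <= far)      # keep plausible depths only
```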
APA, Harvard, Vancouver, ISO, and other styles
39

Somasundaram, Arunachalam. "A facial animation model for expressive audio-visual speech." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1148973645.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Liudkevich, Denis. "Návrh virtuálního síťového kolaborativního zvukového nástroje." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413250.

Full text
Abstract:
The aim of this work was to create an online platform for multi-user sound creation with original sound synthesis tools. The educational context of the application was also taken into account, by hiding the controls of the sound parameters behind subconsciously familiar physical phenomena and by the game-like form of the application. A substantial part of the logic and all of the instruments' graphics are written in the JavaScript programming language using its p5.js library; this code runs on the client side and communicates with a Node.js-based server via a web socket. The audio part runs on another server in the SuperCollider environment; its output is streamed via IceCast, and it communicates with the main server via OSC messages. The application contains three instruments for generating sounds and one effects module. Each instrument is designed for multiple users and requires their cooperation. Acceptable transmission speeds and minimal computational demands were achieved by optimizing the instruments' internal algorithms, the way the graphic content is displayed, and the routing of the individual sound modules. Each instrument has its own specific sound. The instruments are tuned and designed so that users can both achieve interesting sonic results on their own and play their role in the ensemble with others. Methods such as granular synthesis, chaotic oscillators, string instrument modeling and filter combinations are used to generate sound. Great emphasis in the development of the application was placed on the separation of roles, simultaneous control of one instrument by several players, and communication between users through playing the instruments and through text chat. An important part is also a block for displaying descriptive information.
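As a small illustration of the OSC leg of such an architecture, a hypothetical Python client sending a parameter change to a SuperCollider server (sclang listens on UDP port 57120 by default; the address pattern and arguments are invented for the example, not the application's actual protocol):

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 57120)        # default sclang OSC port
client.send_message("/instrument/1/param", [0.42])  # illustrative address/value
```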
APA, Harvard, Vancouver, ISO, and other styles
41

Moulin, Samuel. "Quel son spatialisé pour la vidéo 3D ? : influence d'un rendu Wave Field Synthesis sur l'expérience audio-visuelle 3D." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015PA05H102/document.

Full text
Abstract:
The digital entertainment industry is undergoing a major evolution due to the recent spread of stereoscopic 3D video. It is now possible to experience 3D when watching movies, playing video games, and so on. In this context, the video catches most of the attention, but what about the accompanying audio rendering? Today, the most commonly used sound reproduction technologies are based on lateralization effects (stereophony, 5.1 surround systems). Nevertheless, it is natural to wonder about the need for a new audio technology adapted to this new visual dimension: depth. Several alternative technologies are able to render 3D sound environments (binaural technologies, Ambisonics, Wave Field Synthesis), and using them could potentially improve users' quality of experience: adding audio-visual spatial congruence could increase both the feeling of realism and the sensation of immersion. In order to validate this hypothesis, a 3D audio-visual rendering system was set up in which a stereoscopic 3D visual rendering is coupled with Wave Field Synthesis sound rendering. Three research axes were then studied. 1/ Depth perception using unimodal or bimodal presentations: how well can the audio-visual system render the depth of visual, sound, and audio-visual objects? The experiments conducted show that Wave Field Synthesis can render virtual sound sources perceived at different distances; moreover, visual and audio-visual objects can be localized with higher accuracy than sound-only objects. 2/ Crossmodal integration in the depth dimension: how can the perception of congruence be guaranteed when audio-visual stimuli are spatially misaligned? The extent of the integration window was studied at different visual object distances; in other words, according to the visual stimulus position, we studied where sound objects should be placed to produce the perception of a single, unified audio-visual stimulus. 3/ 3D audio-visual quality of experience: what does sound depth rendering contribute to the 3D audio-visual quality of experience? We first assessed today's quality of experience using sound systems dedicated to the playback of 5.1 soundtracks (5.1 surround system, headphones, soundbar) in combination with 3D videos, and then studied the impact of sound depth rendering using the developed audio-visual system (3D video with Wave Field Synthesis).
APA, Harvard, Vancouver, ISO, and other styles
42

Gibbons, J. A. "Accelerating finite difference models with field programmable gate arrays : application to real-time audio synthesis and acoustic modelling." Thesis, University of York, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.444681.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Poepel, Cornelius. "An investigation of audio signal-driven sound synthesis with a focus on its use for bowed stringed synthesisers." Thesis, University of Birmingham, 2011. http://etheses.bham.ac.uk//id/eprint/1479/.

Full text
Abstract:
This thesis proposes an alternative approach to sound synthesis. It seeks to offer traditional string players a synthesiser which will allow them to make use of their existing skills in performance. A theoretical apparatus reflecting on the constraints of formalisation is developed and used to shed light on construction-related shortcomings in the instrumental developments of related research. Historical aspects and methods of sound synthesis, and the act of musical performance, are addressed with the aim of drawing conclusions for the construction of algorithms and interfaces. The alternative approach creates an openness and responsiveness in the synthesis instrument by using implicit playing parameters without the necessity to define, specify or measure all of them. In order to investigate this approach, several synthesis algorithms are developed, sounds are designed and a selection of them empirically compared to conventionally synthesised sounds. The algorithms are used in collaborative projects with other musicians in order to examine their practical musical value. The results provide evidence that implementations using the approach presented can offer musically significant differences as compared to similarly complex conventional implementations, and that - depending on the disposition of the musician - they can form a valuable contribution to the sound repertoire of performers and composers.
APA, Harvard, Vancouver, ISO, and other styles
44

Andersson, Olliver. "Exploring new interaction possibilities for video game music scores using sample-based granular synthesis." Thesis, Luleå tekniska universitet, Medier, ljudteknik och teater, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-79572.

Full text
Abstract:
For a long time, the function of the musical score has been to support activity in video games, largely by reinforcing the drama and excitement. Rather than leave the score in the background, this project explores the interaction possibilities of an adaptive video game score using real-time modulation of granular synthesis. This study evaluates a vertically re-orchestrated musical score in which elements of the score are played back with granular synthesis. A game level was created where part of the musical score used a single granular synthesis stem whose parameters were controlled by the player. A user experience study was conducted to evaluate the granular synthesis interaction. The results show a wide array of user responses, opinions, impressions and recommendations about how the granular synthesis interaction was musically experienced. Some results show that the granular synthesis stem was regarded as an interactive feature with a direct relationship to the background music; other results show that the interaction went unnoticed. In most cases, the granular synthesis score was experienced as comparable to a more conventional game score, so granular synthesis can be seen as a new interactive tool for the sound designer. The study shows that there is more to be explored regarding musical interactions within games.
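A compact Python sketch of a granular synthesis stem with the kind of parameters (read position, spread, density) that player input could modulate in real time; all values and mappings are illustrative:

```python
import numpy as np

def granular(source, sr=44100, dur=5.0, grain_ms=80, density=40,
             position=0.5, spread=0.1, seed=0):
    """Scatter Hann-enveloped grains read from around a position in the source.

    position/spread (0..1) choose where in the source grains are read from;
    density is grains per second. source must be longer than one grain."""
    rng = np.random.default_rng(seed)
    out = np.zeros(int(sr * dur))
    glen = int(sr * grain_ms / 1000)
    env = np.hanning(glen)
    for onset in rng.uniform(0, dur - grain_ms / 1000, int(density * dur)):
        start = int((position + rng.uniform(-spread, spread))
                    * (len(source) - glen))
        start = int(np.clip(start, 0, len(source) - glen))
        i = int(onset * sr)
        out[i:i + glen] += source[start:start + glen] * env
    return out

stem = granular(np.random.randn(3 * 44100))   # noise stands in for a recording
```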

For contact with the author or requests for video clips, audio or other resources, mail: olliver.andersson@gmail.com

APA, Harvard, Vancouver, ISO, and other styles
45

Meynard, Adrien. "Stationnarités brisées : approches à l'analyse et à la synthèse." Thesis, Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0475.

Full text
Abstract:
Nonstationarity characterizes transient physical phenomena. For example, it may be caused by a speed variation of an accelerating engine. Similarly, because of the Doppler effect, a stationary sound emitted by a moving source is perceived as being nonstationary by a motionless observer. These examples lead us to consider a class of nonstationary signals formed from stationary signals whose stationarity has been broken by a physically relevant deformation operator. After describing the considered deformation models (chapter 1), we present different methods that extend the spectral analysis and synthesis to such signals. The spectral estimation amounts to determining simultaneously the spectrum of the underlying stationary process and the deformation breaking its stationarity. To this end, we consider representations of the signal in which this deformation is characterized by a simple operation. Thus, in chapter 2, we are interested in the analysis of locally deformed signals. The deformation describing these signals is simply expressed as a displacement of the wavelet coefficients in the time-scale domain. We take advantage of this property to develop a method for the estimation of these displacements. Then, we propose an instantaneous spectrum estimation algorithm, named JEFAS. In chapter 3, we extend this spectral analysis to multi-sensor signals where the deformation operator takes a matrix form. This is a doubly nonstationary blind source separation problem. In chapter 4, we propose a synthesis approach to study locally deformed signals. Finally, in chapter 5, we construct a time-frequency representation adapted to the description of locally harmonic signals
APA, Harvard, Vancouver, ISO, and other styles
46

Disch, Sascha [Verfasser]. "Modulation vocoder for analysis, processing and synthesis of audio signals with application to frequency selective pitch transposition / Sascha Disch." Hannover : Technische Informationsbibliothek und Universitätsbibliothek Hannover (TIB), 2011. http://d-nb.info/1014323789/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Tiger, Guillaume. "Synthèse sonore d'ambiances urbaines pour les applications vidéoludiques." Thesis, Paris, CNAM, 2014. http://www.theses.fr/2015CNAM0968/document.

Full text
Abstract:
Following a state of the art detailing the creation and use of sound space in various virtual urban environments (soundmaps, video games, augmented reality), the aim is to determine a design methodology and design techniques for virtual urban sound spaces from the standpoint of immersion, interface and dramaturgy. These developments take place within the Terra Dynamica project, which targets multiple uses of the virtual city (safety and security, surface transport, urban planning, local and citizen services, games). The main objective of the doctorate is to provide concrete computational answers to the following question: how should virtual urban sound spaces be structured, depending on their anticipated use, and with what content? The formalisation of the solutions developed over the course of the doctorate and the creation of the sound content illustrating the project are based on the analysis of scientific data from fields as varied as the psychology of perception, architecture and urban planning, acoustics and aesthetic (musical) research, as well as on the observation and collection of audio-visual data from the urban territory, so as to reflect both the richness of the concept of sound space and the multiplicity of its variations in the context of the virtual city.
In video games and interactive media, the creation of complex sound ambiences is heavily constrained by the available memory and computational resources, so a compromise must be found between the choice of audio material and its processing in order to achieve immersive and credible real-time ambiences. Alternatively, the use of procedural audio techniques, i.e. the generation of audio content from the data provided by the virtual scene, has increased in recent years, and procedural methodologies seem well suited to sonifying complex environments such as virtual cities. In this thesis we focus specifically on the creation of interactive urban sound ambiences. Our analysis of these ambiences is based on soundscape theory and on a state of the art of game-oriented urban interactive applications. We infer that the virtual urban soundscape is made of several perceptual auditory grounds, including a background. As a first contribution, we define the morphological and narrative properties of such a background. We then consider the urban background sound as a texture and propose, as a second contribution, to specify and prototype a granular synthesis tool dedicated to interactive urban sound backgrounds. The synthesizer prototype is implemented in the visual programming language Pure Data. Building on our state of the art, we include an urban ambience recording methodology to feed the granular synthesis. Finally, two validation steps for the prototype are described: integration into the Terra Dynamica virtual city simulation on the one hand, and a perceptual listening comparison test on the other.
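To make the granular approach concrete, here is a minimal sketch of granular texture synthesis: random windowed grains drawn from a source recording are overlap-added at random output positions. It is illustrative only; grain size, density and the stand-in source are assumptions, and the thesis prototype itself is built in Pure Data, not Python.

```python
# Minimal granular-synthesis sketch for background sound textures.
import numpy as np

def granulate(source, out_len, fs=44100, grain_ms=80.0, density=50.0, seed=0):
    """Overlap-add random Hann-windowed grains drawn from `source`."""
    rng = np.random.default_rng(seed)
    grain_len = int(fs * grain_ms / 1000.0)
    window = np.hanning(grain_len)
    out = np.zeros(out_len)
    n_grains = int(density * out_len / fs)  # `density` grains per second of output
    for _ in range(n_grains):
        src_pos = rng.integers(0, len(source) - grain_len)
        dst_pos = rng.integers(0, out_len - grain_len)
        out[dst_pos:dst_pos + grain_len] += window * source[src_pos:src_pos + grain_len]
    return out / max(1.0, np.max(np.abs(out)))  # simple peak normalization

# Usage: turn a short field recording into a longer steady background texture.
recording = np.random.randn(44100 * 2)       # stand-in for an urban recording
texture = granulate(recording, out_len=44100 * 10)
```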
APA, Harvard, Vancouver, ISO, and other styles
48

Bilhanan, Anuleka. "High level synthesis of an image processing algorithm for cancer detection." [Tampa, Fla.] : University of South Florida, 2004. http://purl.fcla.edu/fcla/etd/SFE0000303.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Silva, Marcio José da. "Modelagem de um sistema para auralização musical utilizando Wave Field Synthesis." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/27/27158/tde-18052015-163521/.

Full text
Abstract:
Seeking a practical application of the theory of Wave Field Synthesis (WFS) in music, this research aimed at modeling a sound system capable of creating spatial sound images with this technique. Unlike most other sound projection techniques, which work with a small, localized listening area, WFS makes it possible to project the sound of each source - such as musical instruments and voices - at different points within the listening space, over a region that can cover almost the entire area of that space, depending on the number of installed loudspeakers. A modular code structure for WFS was developed on top of the patch-oriented platform Pure Data (Pd) and the AUDIENCE auralization system developed at USP, and it can be integrated as a tool for interactive sound spatialization. The solution employs dynamic patches and a modular architecture, giving the code flexibility and maintainability, with advantages over other existing software, particularly in installation, operation and in handling a large number of sound sources and loudspeakers. Special loudspeakers, with characteristics that facilitate their use in musical applications, were also developed for this system.
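The core of a WFS renderer is computing, for each loudspeaker, a delay and gain derived from its distance to the virtual source. The sketch below shows this for a linear array and a point source behind it; it is a simplified 2.5D approximation that omits the spectral prefilter and edge tapering used in full WFS implementations, and the array geometry is an illustrative assumption, not the system described in the thesis.

```python
# Simplified WFS driving parameters for a linear loudspeaker array:
# each speaker replays the source signal delayed and attenuated according
# to its distance from a virtual point source behind the array.
import numpy as np

C = 343.0  # speed of sound, m/s

def wfs_delays_gains(speaker_x, source_pos):
    """Per-speaker delay (s) and normalized amplitude for a virtual point source."""
    speakers = np.stack([speaker_x, np.zeros_like(speaker_x)], axis=1)
    r = np.linalg.norm(speakers - np.asarray(source_pos), axis=1)
    delays = r / C              # propagation delay from the virtual source
    gains = 1.0 / np.sqrt(r)    # approximate 2.5D amplitude decay
    return delays, gains / gains.max()

# 16 speakers spaced 20 cm apart; virtual source 2 m behind the array.
speaker_x = np.arange(16) * 0.20
delays, gains = wfs_delays_gains(speaker_x, source_pos=(1.5, -2.0))
```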
APA, Harvard, Vancouver, ISO, and other styles
50

Gonon, Gilles. "Proposition d'un schéma d'analyse/synthèse adaptatif dans le plan temps-fréquence basé sur des critères entropiques : application au codage audio par transformée." Le Mans, 2002. http://cyberdoc.univ-lemans.fr/theses/2002/2002LEMA1004.pdf.

Full text
Abstract:
Adaptive representations contribute to the study and processing of the information carried by signals by allowing a relevant, signal-dependent analysis. This thesis develops a representation that successively applies time and frequency segmentations adapted to the signal, more flexible than existing solutions, and applies it in a high-fidelity transform-based perceptual audio coder. The signal is first segmented in time using a criterion based on a local entropy estimator, which provides an index of entropy variations and enables an automatic segmentation separating transient regions from stationary ones. The resulting time frames are then decomposed into wavelet packets, and a best-basis search adapts the representation in frequency. An extension of the best-basis search is proposed to enlarge the dictionary of available bases beyond the dyadic case. At the end of this analysis, the signal is localized in atoms of the time-frequency plane. A coder with an original architecture incorporating this representation is then presented, together with the details of its implementation. The coder is evaluated by subjective tests comparing compressed sounds with the originals and with the MPEG-1 Layer III standard at a rate of 96 kbit/s. The results show that the adaptive representation scheme is competitive with standard coders, while leaving room for numerous improvements.
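As a loose illustration of the entropy-driven time segmentation idea, the following sketch marks a frame boundary wherever the local spectral entropy jumps, which tends to separate tonal (low-entropy) regions from noisy or transient (high-entropy) ones. It is not the thesis's estimator; the frame size, threshold and test signal are assumptions.

```python
# Toy entropy-driven time segmentation: large changes in local spectral
# entropy between consecutive frames are taken as segment boundaries.
import numpy as np

def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized magnitude spectrum."""
    p = np.abs(np.fft.rfft(frame)) ** 2
    p = p / (p.sum() + 1e-12)
    return -np.sum(p * np.log2(p + 1e-12))

def segment(signal, frame_len=512, jump=1.0):
    frames = signal[: len(signal) // frame_len * frame_len].reshape(-1, frame_len)
    h = np.array([spectral_entropy(f) for f in frames])
    boundaries = np.flatnonzero(np.abs(np.diff(h)) > jump) + 1
    return boundaries * frame_len  # boundary positions in samples

# Usage: a pure tone followed by noise yields a boundary near the junction.
x = np.concatenate([np.sin(2 * np.pi * 440 * np.arange(8192) / 44100),
                    0.5 * np.random.randn(8192)])
print(segment(x))
```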
APA, Harvard, Vancouver, ISO, and other styles
