Dissertations / Theses on the topic 'Audio synthesis'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Audio synthesis.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Chemla-Romeu-Santos, Axel Claude André. "Manifold Representations of Musical Signals and Generative Spaces." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/700444.
Full text
Among the diverse research fields within computer music, the synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving alongside musical practices and scientific and technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. In addition, the recent rise of machine learning algorithms has profoundly questioned scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. In particular, a family of machine learning methods called generative models focuses on the generation of original content using features extracted from an existing dataset. Such methods question not only previous approaches to generation, but also the way these methods can be integrated into existing creative processes. While these new generative frameworks have progressively been introduced in the domain of image generation, the application of such techniques to audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work fits into both practices and what can be expected from their combination.
Subsequently, we focus in more detail on generative models, and on how modern advances in the domain can be exploited to learn complex sound distributions while remaining sufficiently flexible to be integrated into the creative flow of the user. We then propose an inference/generation process, mirroring the analysis/synthesis paradigms that are natural in the audio processing domain, using latent models based on a continuous higher-level space that we use to control the generation. We first provide preliminary results of our method applied to spectral information extracted from several datasets, and evaluate the obtained results both qualitatively and quantitatively. Subsequently, we study how to make these methods more suitable for learning audio data, tackling three different aspects in turn. First, we propose two latent regularization strategies specifically designed for audio, based on signal/symbol translation and on perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, so that the obtained generative spaces also model the dynamics of the signal. In a last chapter, we shift from a scientific approach to a more research & creation-oriented point of view: first, we describe the architecture and design of our open-source library, vsacids, which aims to be used by expert and non-expert music makers as an integrated creation tool. Then, we propose a first musical use of our system through the creation of a real-time performance, called aego, based jointly on our framework vsacids and on an explorative agent trained by reinforcement learning during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as on possible further creative applications.
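The inference/generation process this abstract describes is built on deep latent models (variational autoencoders) over spectral frames. As a loose, purely illustrative stand-in, not the thesis's method, a linear latent space fitted with PCA already shows the analysis-to-latent-to-synthesis round trip; all dimensions and data below are invented:

```python
import numpy as np

def fit_latent_space(S, dim=2):
    """Fit a linear latent space to spectral frames S (n_frames, n_bins) via PCA."""
    mu = S.mean(axis=0)
    _, _, Vt = np.linalg.svd(S - mu, full_matrices=False)
    return mu, Vt[:dim]

def encode(S, mu, V):   # analysis: spectra -> latent coordinates
    return (S - mu) @ V.T

def decode(z, mu, V):   # synthesis: latent coordinates -> spectra
    return z @ V + mu

# toy corpus: spectra that genuinely live on a 2-D manifold
rng = np.random.default_rng(0)
z_true = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 64))
S = z_true @ basis + 5.0

mu, V = fit_latent_space(S, dim=2)
z = encode(S, mu, V)            # each frame is summarized by 2 coordinates
S_hat = decode(z, mu, V)        # moving in z-space "controls the generation"
```

Because the toy data is exactly rank 2, the round trip is lossless; a VAE replaces the two linear maps with neural networks and adds a prior on z.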
Lundberg, Anton. "Data-Driven Procedural Audio : Procedural Engine Sounds Using Neural Audio Synthesis." Thesis, KTH, Datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280132.
Full text
The currently dominant approach to rendering audio in interactive media, such as computer games and virtual reality, involves playback of static audio files. This approach lacks flexibility and requires the management of large amounts of audio data. An alternative approach is procedural audio, in which sound models are controlled so as to generate audio in real time. Despite its many advantages, procedural audio is not yet widely used in commercial productions, partly because the audio generated by many proposed models does not meet industry standards. This thesis investigates how procedural audio can be realised with data-driven methods. We do this by specifically investigating methods for synthesising car engine sounds based on neural audio synthesis. Building on a recently published method that integrates digital signal processing with deep learning, called Differentiable Digital Signal Processing (DDSP), our method can create sound models by training deep neural networks to reconstruct recorded audio examples from interpretable latent predictors. We propose a method for using phase information from engine combustion cycles, as well as a differentiable method for synthesising transients. Our results show that DDSP can be used for procedural engine sounds, but more work is required before our models can generate engine sounds free of unwanted artifacts and before they can be used in real-time applications. We discuss how our approach can be useful for procedural audio in more general contexts, and how our method can be applied to other sound sources.
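The DDSP approach mentioned above drives a differentiable additive synthesizer with network-predicted controls such as a fundamental-frequency contour and per-harmonic amplitudes. The sketch below shows only the synthesizer half, with hand-picked rather than learned controls; the "engine rev" parameters are invented for illustration:

```python
import numpy as np

def harmonic_synth(f0, amps, sr=16000):
    """Additive synthesis: f0 is a per-sample fundamental contour (Hz),
    amps holds relative amplitudes for each integer harmonic."""
    phase = 2 * np.pi * np.cumsum(f0) / sr            # integrate f0 to get phase
    sig = sum(a * np.sin((k + 1) * phase) for k, a in enumerate(amps))
    return sig / np.max(np.abs(sig))                  # peak-normalize

# toy "engine rev": fundamental sweeping 30 -> 90 Hz, 1/k harmonic roll-off
sr = 16000
f0 = np.linspace(30.0, 90.0, 2 * sr)                  # two seconds
amps = 1.0 / np.arange(1.0, 9.0)
y = harmonic_synth(f0, amps, sr)
```

In DDSP proper, `f0` and `amps` are time-varying outputs of a neural network trained by reconstruction, and a filtered-noise branch handles non-harmonic components.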
Elfitri, I. "Analysis by synthesis spatial audio coding." Thesis, University of Surrey, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.590657.
Full text
Wood, Steven Gregory. "Objective Test Methods for Waveguide Audio Synthesis." BYU ScholarsArchive, 2007. https://scholarsarchive.byu.edu/etd/853.
Full textUstun, Selen. "Audio browsing of automaton-based hypertext." Thesis, Texas A&M University, 2003. http://hdl.handle.net/1969.1/33.
Full textJehan, Tristan 1974. "Perceptual synthesis engine : an audio-driven timbre generator." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/61543.
Full textIncludes bibliographical references (leaves 68-75).
A real-time synthesis engine which models and predicts the timbre of acoustic instruments based on perceptual features extracted from an audio stream is presented. The thesis describes the modeling sequence including the analysis of natural sounds, the inference step that finds the mapping between control and output parameters, the timbre prediction step, and the sound synthesis. The system enables applications such as cross-synthesis, pitch shifting or compression of acoustic instruments, and timbre morphing between instrument families. It is fully implemented in the Max/MSP environment. The Perceptual Synthesis Engine was developed for the Hyperviolin as a novel, generic and perceptually meaningful synthesis technique for non-discretely pitched instruments.
by Tristan Jehan.
S.M.
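Jehan's engine above maps perceptual features of an input stream to synthesis parameters. The miniature sketch below only illustrates that analysis-then-resynthesis loop: it estimates pitch (autocorrelation) and loudness (RMS) from one frame and drives a sine oscillator with them. Frame size and search ranges are arbitrary choices, not those of the thesis:

```python
import numpy as np

def analyze_frame(frame, sr):
    """Very rough perceptual features: RMS loudness and an autocorrelation pitch estimate."""
    loudness = np.sqrt(np.mean(frame ** 2))
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = sr // 1000                       # skip lags for pitches above 1 kHz
    lag = lag_min + np.argmax(ac[lag_min:sr // 50])  # search the 50 Hz..1 kHz range
    return sr / lag, loudness

def resynthesize(f0, loudness, n, sr):
    """Drive a sine oscillator so its RMS matches the analyzed loudness."""
    t = np.arange(n) / sr
    return np.sqrt(2) * loudness * np.sin(2 * np.pi * f0 * t)

sr = 16000
t = np.arange(2048) / sr
frame = 0.5 * np.sin(2 * np.pi * 220.0 * t)   # stand-in for one frame of instrument audio
f0, loud = analyze_frame(frame, sr)
y = resynthesize(f0, loud, len(frame), sr)
```

The actual system predicts a full timbre model from such features rather than a bare sinusoid, which is what enables cross-synthesis and timbre morphing.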
Payne, R. G. "Digital techniques for the analysis and synthesis of audio signals." Thesis, Bucks New University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.234706.
Full text
Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Mémoire, Université de Sherbrooke, 2000. http://savoirs.usherbrooke.ca/handle/11143/1078.
Full text
Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Sherbrooke : Université de Sherbrooke, 2001.
Find full text
Andreux, Mathieu. "Foveal autoregressive neural time-series modeling." Electronic Thesis or Diss., Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE073.
Full text
This dissertation studies unsupervised time-series modelling. We first focus on the problem of linearly predicting future values of a time-series under the assumption of long-range dependencies, which requires taking into account a large past. We introduce a family of causal and foveal wavelets which project past values onto a subspace adapted to the problem, thereby reducing the variance of the associated estimators. We then investigate under which conditions non-linear predictors exhibit better performance than linear ones. Time-series which admit a sparse time-frequency representation, such as audio ones, satisfy those requirements, and we propose a prediction algorithm using such a representation. The last problem we tackle is audio time-series synthesis. We propose a new generation method relying on a deep convolutional neural network, with an encoder-decoder architecture, which allows us to synthesize new realistic signals. Contrary to state-of-the-art methods, we explicitly use time-frequency properties of sounds to define an encoder with the scattering transform, while the decoder is trained to solve an inverse problem in an adapted metric.
Boyes, Graham. "Dictionary-based analysis/synthesis and structured representations of musical audio." Thesis, McGill University, 2012. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=106507.
Full text
In the representation of musical audio signals, it is common to favour either a signal-type or a symbol-type interpretation, while mid-level, or intermediate, representations are becoming a topical subject. In this thesis we investigate the perspective of these structured, intermediate representations. Our research integrates theoretical aspects related to separable sound objects, dictionary-based signal analysis methods, and the design of software conceived within the framework of object-oriented programming. Contrary to the examples available in the literature, our approach to intermediate representations starts from the symbolic level and moves towards the signal, rather than the reverse. This methodology is applied not only to the specification of analytical techniques but also to the design of an associated software system. Experimental results show that our method is able to reduce the Itakura-Saito distance, a perceptually motivated distance, in comparison with a generic decomposition method. We also show that our structured representation can be used in practical applications such as visualization, post-processing aggregation, and musical composition.
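Dictionary-based analysis of the kind discussed above is typically carried out with greedy pursuit algorithms. A toy matching pursuit over an orthogonal cosine dictionary, far simpler than the structured, object-level dictionaries of the thesis, recovers a 2-sparse signal exactly; the atoms and signal are invented:

```python
import numpy as np

def matching_pursuit(x, D, n_atoms=10):
    """Greedy sparse decomposition of x over dictionary D (unit-norm atoms in rows)."""
    residual = x.astype(float).copy()
    approx = np.zeros_like(residual)
    for _ in range(n_atoms):
        corr = D @ residual
        k = np.argmax(np.abs(corr))        # best-matching atom
        approx += corr[k] * D[k]
        residual -= corr[k] * D[k]
    return approx, residual

# dictionary of unit-norm cosine atoms (DCT-II rows, mutually orthogonal)
N = 256
ks = np.arange(1, 65)
D = np.cos(np.pi * np.outer(ks, np.arange(N) + 0.5) / N)
D /= np.linalg.norm(D, axis=1, keepdims=True)

x = 3.0 * D[4] - 2.0 * D[20]               # a signal that is 2-sparse in D
approx, residual = matching_pursuit(x, D, n_atoms=2)
```

With overcomplete or structured dictionaries the greedy choice is no longer exact, which is where the thesis's structural constraints and perceptual distances come into play.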
Rodgers, Tara. "Synthesizing sound: metaphor in audio-technical discourse and synthesis history." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=97090.
Full text
Synthesized sound is ubiquitous in contemporary music and in the global sound environment. However, relatively little has been written about its meaning or its cultural origins. This thesis constructs a long history of synthesized sound over the century before the mass introduction of the synthesizer in the 1970s, and attends to the old and mythical themes that surface in contemporary audio-technical discourse. The research draws on archival documents, including late nineteenth- and early twentieth-century acoustics texts, inventors' publications and correspondence, and synthesizer user manuals from the 1940s through the 1970s. As a feminist account of synthesized sound, this project examines how metaphors in audio-technical discourse carry notions of identity and difference. Through the analysis of key concepts in the history of synthesized sound, I argue that the language of audio technology and its representation, which usually passes for neutral, in fact privileges the perspective of the archetypal masculine, white Western subject. I identify two primary metaphors for conceiving of electronic sounds that were in use at the turn of the twentieth century and that continually contribute to an epistemology of sound: electronic sounds as waves, and electronic sounds as individuals. The wave metaphor, in circulation since antiquity, produces an affective relation to audio technologies, typically based on a masculine, colonizing point of view, in which the creation and control of electronic sound entails the pleasure and danger of navigating a surging sea.
The second metaphor took shape over the nineteenth century as sounds, like modern living organisms and subjects, came to be interpreted as individual entities with variable properties that could be subjected to analysis and control. Notions of sonic individuation and variability emerged in the context of Darwinian thought, as a cultural fascination with electricity as a kind of immutable power took shape. Methods of classifying sounds as individuals, sorted according to desirable and undesirable aesthetic variations, were intimately linked to epistemologies of sex and racial difference in Western philosophy and modern science. Electronic sound is also heir to other histories, including the uses of notions such as synthesis and synthetic across various cultural fields, the design of early mechanical and electronic devices, and the evolution of musical modernity together with the development of an electronics hobbyist public. The long-term perspective and broad view of synthesis history adopted in this study aims to contest received truths in audio-technical discourse and to resist the linear, coherent narratives of progress still too often found in histories of technology and new media. This thesis makes a significant contribution to the field of sound and media studies, which can in turn benefit from feminist scholarship in general and from an elaboration of the forms and meanings of sound synthesis technologies in particular. Moreover, while feminist scholars have extensively theorized new technological and visual cultures, few have explored sound and audio technologies.
This project seeks to open new directions for a field of feminist sound studies, taking a historical perspective on notions of identity and difference in audio-technical discourse while asserting the usefulness of sound to feminist thought.
Deena, Salil Prashant. "Visual speech synthesis by learning joint probabilistic models of audio and video." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/visual-speech-synthesis-by-learning-joint-probabilistic-models-of-audio-and-video(bdd1a78b-4957-469e-8be4-34e83e676c79).html.
Full textFernández-Torné, Anna. "Audio description and technologies. Study on the semi-automatisation of the translation and voicing of audio descriptions." Doctoral thesis, Universitat Autònoma de Barcelona, 2016. http://hdl.handle.net/10803/394035.
Full text
This PhD thesis explores the application of technologies to the audio description field with the aim of semi-automatising the process in two ways. On the one hand, text-to-speech is applied to the voicing of audio description in Catalan and, on the other hand, machine translation with post-editing is applied to English audio descriptions to obtain Catalan AD scripts. In relation to TTS, a selection of available synthetic and natural voices in Catalan (5 masculine and 5 feminine for each category) is assessed by means of a self-administered questionnaire mainly based on the ITU-T P.85 standard Mean Opinion Score (MOS) scales for the subjective assessment of the quality of synthetic speech. Thus, participants assess the voices taking into account various items (overall impression, accentuation, pronunciation, speech pauses, intonation, naturalness, pleasantness, listening effort, and acceptance). The voices obtaining the best scores for each category are then used to assess the reception of text-to-speech audio descriptions, compared to human-voiced audio descriptions, by blind and visually impaired persons. Both the quantitative and qualitative data obtained show that the preferential choice of blind and partially sighted persons is the audio description voiced by a human rather than by a speech synthesis system, since natural voices obtain statistically higher scores than synthetic voices. However, TTS AD is accepted by end users (94% of the participants) as an acceptable alternative solution, and 20% of the respondents actually state that their preferred voice of the four under analysis is a synthetic one. As regards MT, a selection of five available free on-line machine translation engines from English into Catalan is evaluated in order to determine which is the most suitable for audio description.
Their raw machine translation outputs and the post-editing effort involved are assessed using eight different scores, including human judgments (PE time, PE necessity, PE difficulty, MT output adequacy, MT output fluency and MT output ranking) and automatic metrics (HBLEU and HTER). The results show that there are clear quality differences among the systems assessed and that one of them (Google Translate) is the best rated in six out of the eight evaluation measures used. Once the best performing engine is selected, the effort, both objective and subjective, involved in three scenarios is compared: the effort of creating an audio description from scratch (AD creation), of manually translating an audio description (AD translation), and of post-editing a machine-translated audio description (AD PE). The results show that the objective post-editing effort is lower than creating an AD ex novo and manually translating it, although the subjective effort is perceived to be higher for the post-editing task.
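HTER, one of the automatic metrics mentioned above, counts the edits needed to turn MT output into its human post-edit, normalized by length. The sketch below is a simplified token-level version (real TER also allows block shifts, omitted here), and the example sentences are invented:

```python
def ter(hypothesis, reference):
    """Token-level edit distance (insertions, deletions, substitutions)
    normalized by reference length: a simplified stand-in for (H)TER."""
    h, r = hypothesis.split(), reference.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):
        d[i][0] = i
    for j in range(len(r) + 1):
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(h)][len(r)] / len(r)

mt_output = "the man walks towards the house slowly"
post_edit = "the man walks slowly towards the house"
score = ter(mt_output, post_edit)   # 2 edits over 7 reference tokens
```

A lower score means less post-editing effort, which is how such metrics complement the human judgments (PE time, PE necessity, PE difficulty) used in the study.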
Heinrichs, Christian. "Human expressivity in the control and integration of computationally generated audio." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/33924.
Full text
Faller, Kenneth John II. "Automated synthesis of a reduced-parameter model for 3D digital audio." FIU Digital Commons, 1996. http://digitalcommons.fiu.edu/etd/3245.
Full text
Strandberg, Carl. "Mediating Interactions in Games Using Procedurally Implemented Modal Synthesis : Do players prefer and choose objects with interactive synthetic sounds over objects with traditional sample based sounds?" Thesis, Luleå tekniska universitet, Institutionen för konst, kommunikation och lärande, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-68015.
Full text
Oger, Marie. "Model-based techniques for flexible speech and audio coding." Nice, 2007. http://www.theses.fr/2007NICE4109.
Full text
The objective of this thesis is to develop optimal speech and audio coding techniques which are more flexible than the state of the art and can adapt in real time to various constraints (rate, bandwidth, delay). This problem is addressed using several tools: statistical models, high-rate quantization theory, and flexible entropy coding. Firstly, a novel method of flexible coding for linear prediction coding (LPC) coefficients is proposed using the Karhunen-Loève transform (KLT) and scalar quantization based on generalized Gaussian modelling. This method has performance equivalent to the LPC quantizer used in AMR-WB, with a lower complexity. Then, two transform audio coding structures are proposed using either stack-run coding or model-based bit plane coding. In both cases the coefficients after perceptual weighting and the modified discrete cosine transform (MDCT) are approximated by a generalized Gaussian distribution, and the coding of the MDCT coefficients is optimized according to this model. The performance is compared with that of ITU-T G.722.1. The stack-run coder is better than G.722.1 at low bit rates and equivalent at high bit rates. However, the computational complexity of the proposed stack-run coder is higher, while its memory requirement is low. The bit plane coder has the advantage of being bit-rate scalable. The generalized Gaussian model is used to initialize the probability tables of an arithmetic coder. The bit plane coder is worse than the stack-run coder at low bit rates and equivalent at high bit rates. It has a computational complexity close to G.722.1, while its memory requirement is still low.
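The high-rate quantization theory invoked above predicts that a uniform scalar quantizer's error variance behaves like step²/12, so each halving of the step buys about 6 dB of SNR. A quick numerical check on Laplacian-distributed coefficients (a generalized Gaussian with shape parameter 1; the data here are synthetic, not actual MDCT coefficients):

```python
import numpy as np

def uniform_quantize(x, step):
    """Mid-tread uniform scalar quantizer."""
    return step * np.round(x / step)

rng = np.random.default_rng(0)
coeffs = rng.laplace(scale=1.0, size=100_000)   # shape-1 generalized Gaussian

snrs = []
for step in (1.0, 0.5, 0.25):
    err = coeffs - uniform_quantize(coeffs, step)
    snrs.append(10 * np.log10(np.mean(coeffs ** 2) / np.mean(err ** 2)))
# each halving of the step should gain roughly 6 dB
```

In the thesis this model goes further: the fitted generalized Gaussian shape drives the bit allocation and the probability tables of the entropy coder, rather than just predicting the error.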
Mosbruger, Michael C. "Alternative audio solution to enhance immersions in deployable synthetic environments." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03sep%5FMosbruger.pdf.
Full text
Thesis advisor(s): Russell D. Shilling, Rudolph P. Darken. Includes bibliographical references (p. 169-172). Also available online.
Kudumakis, Panos E. "Synthesis and coding of audio signals using wavelet transforms for multimedia applications." Thesis, King's College London (University of London), 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.343479.
Full textEmnett, Keith Jeffrey 1973. "Synthetic News Radio : content filtering and delivery for broadcast audio news." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/61108.
Full textIncludes bibliographical references (p. 58-59).
Synthetic News Radio uses automatic speech recognition and clustered text news stories to automatically find story boundaries in an audio news broadcast, and it creates semantic representations that can match stories of similar content through audio-based queries. Current speech recognition technology cannot by itself produce enough information to accurately characterize news audio; therefore, the clustered text stories represent a knowledge base of relevant news topics that the system can use to combine recognition transcripts of short, intonational phrases into larger, complete news stories. Two interface mechanisms, a graphical desktop application and a touch-tone driven phone interface, allow quick and efficient browsing of the newly structured news broadcasts. The system creates a personal, synthetic newscast by extracting stories, based on user interests, from multiple hourly newscasts and then reassembling them into a single recording at the end of the day. The system also supports timely delivery of important stories over a LAN or to a wireless audio pager. This thesis describes the design and evaluation of the news segmentation and content matching technology, and evaluates the effectiveness of the interface and delivery mechanisms.
by Keith Jeffrey Emnett.
S.M.
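Matching a recognition transcript against clustered text stories, as described above, reduces at its simplest to a bag-of-words similarity. A stand-in sketch with invented snippets (the actual system works on full recognition transcripts and clustered newswire):

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

transcript = "storm warning issued for the coast as winds strengthen"
stories = [
    "parliament debates the new budget proposal",
    "coastal storm warning winds expected to strengthen overnight",
    "local team wins championship final",
]
# the clustered story most similar to the (noisy) transcript wins
best = max(stories, key=lambda s: cosine_similarity(transcript, s))
```

This is why errorful recognition output can still be matched reliably: enough content words survive for the nearest cluster to dominate.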
Mitchell, Thomas James. "An exploration of evolutionary computation applied to frequency modulation audio synthesis parameter optimisation." Thesis, University of the West of England, Bristol, 2010. http://eprints.uwe.ac.uk/18265/.
Full text
Bailey, Nicholas James. "On the synthesis and processing of high quality audio signals by parallel computers." Thesis, Durham University, 1991. http://etheses.dur.ac.uk/6285/.
Full textYong, Louisa Chung-Sze. "An Internet-based audio synthesis resource : a case study in Manchester and Salford." Thesis, University of Salford, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365975.
Full textBowler, I. "Digital techniques in the storage and processing of audio waveforms for music synthesis." Thesis, Bucks New University, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.373583.
Full textSini, Aghilas. "Caractérisation et génération de l’expressivité en fonction des styles de parole pour la construction de livres audio." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S026.
Full text
In this thesis, we study the expressivity of read speech with a particular type of data: audiobooks. Audiobooks are audio recordings of literary works made by professionals (actors, singers, professional narrators) or by amateurs. These recordings may be intended for a particular audience (blind or visually impaired people). The availability of this kind of data in large quantities and with sufficiently good quality has attracted the attention of the research community in automatic speech and language processing in general, and of researchers specialized in expressive speech synthesis systems in particular. We propose in this thesis to study three elementary components of expressivity that are conveyed by audiobooks: emotion, variations related to discursive changes, and speaker properties. We treat these patterns from a prosodic point of view. The main contributions of this thesis are: the construction of a corpus of audiobooks comprising a large number of recordings partially annotated by an expert; a quantitative study characterizing the emotions in this type of data; the construction of a model based on machine learning techniques for the automatic annotation of discourse types; and, finally, a vector representation of the prosodic identity of a speaker in the framework of parametric statistical speech synthesis.
Cobos, Serrano Máximo. "Application of sound source separation methods to advanced spatial audio systems." Doctoral thesis, Universitat Politècnica de València, 2010. http://hdl.handle.net/10251/8969.
Full textCobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
Polotti, Pietro. "Fractal additive synthesis : spectral modeling of sound for low rate coding of quality audio /." [S.l.] : [s.n.], 2003. http://library.epfl.ch/theses/?nr=2711.
Full textBürger, Michael [Verfasser]. "On the Analysis and Synthesis of Local Sound Fields for Personal Audio / Michael Bürger." München : Verlag Dr. Hut, 2019. http://d-nb.info/1202169015/34.
Full textHansjons, Vegeborn Victor. "LjudMAP: A Visualization Tool for Exploring Audio Collections with Real-Time Concatenative Synthesis Capabilities." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-277831.
Full text
This thesis presents the software tool "LjudMAP," which combines techniques from music information retrieval with unsupervised machine learning methods to assist in the exploration of audio collections. LjudMAP builds on the concepts found in "Temporally Disassembled Audio," which was developed to enable rapid browsing of speech recordings. LjudMAP is instead intended for the analysis and real-time composition of electroacoustic music, and is programmed in a way that can accommodate additional audio descriptors. The thesis presents investigations into how LjudMAP can be used to identify similarities and clusters of sounds within audio collections. An important contribution is the coagulation of sound clusters based on principles of proximity in time and in feature space. The thesis also shows how LjudMAP can be used for composition, through several demonstrations carried out by an electroacoustic composer working with various sound sources. The source code for LjudMAP is available at: https://github.com/victorwegeborn
Giannakis, Konstantinos. "Sound mosaics : a graphical user interface for sound synthesis based on audio-visual associations." Thesis, Middlesex University, 2001. http://eprints.mdx.ac.uk/6634/.
Full textOlivero, Anaik. "Les multiplicateurs temps-fréquence : Applications à l’analyse et la synthèse de signaux sonores et musicaux." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4788/document.
Full text
Analysis/Transformation/Synthesis is a general paradigm in signal processing that aims at manipulating or generating signals for practical applications. This thesis deals with time-frequency representations obtained with Gabor atoms. In this context, the complexity of a sound transformation can be modeled by a Gabor multiplier. Gabor multipliers are linear diagonal operators acting on signals, and are characterized by a complex-valued time-frequency transfer function called the Gabor mask. Gabor multipliers formalize the concept of filtering in the time-frequency domain. As they act by multiplication in the time-frequency domain, they are "a priori" well adapted to producing sound transformations such as timbre transformations. In a first part, this work models the problem of estimating a Gabor mask between two given signals, and provides algorithms to solve it. The Gabor multiplier between two signals is not uniquely defined, and the proposed estimation strategies are able to generate Gabor multipliers that produce signals of satisfying sound quality. In a second part, we show that a Gabor mask contains relevant information, as it can be viewed as a time-frequency representation of the difference in timbre between two given sounds. By averaging the energy contained in a Gabor mask, we obtain a measure of this difference that makes it possible to discriminate between different musical instrument sounds. We also propose strategies to automatically localize the time-frequency regions responsible for such a timbre dissimilarity between musical instrument classes. Finally, we show that Gabor multipliers can be used to construct many sound-morphing trajectories, and propose an extension
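The diagonal action of a Gabor multiplier can be illustrated with a short sketch: analyse a signal with an STFT (a Gabor frame), multiply the coefficients pointwise by a mask, and resynthesise. The low-pass mask below is an illustrative assumption chosen for clarity; the thesis is concerned with estimating a mask between two given signals, not with hand-designing one.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8000
t = np.arange(fs) / fs
# Two partials: 440 Hz and 2 kHz
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

# Gabor analysis: STFT coefficients on a time-frequency lattice
f, frames, X = stft(x, fs=fs, nperseg=256)

# Gabor mask: in general a complex-valued transfer function;
# here a real low-pass mask keeping everything below 1 kHz
mask = (f < 1000).astype(float)[:, None]

# Diagonal action in the time-frequency domain, then resynthesis
Y = mask * X
_, y = istft(Y, fs=fs, nperseg=256)
# y retains the 440 Hz partial while the 2 kHz partial is removed
```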
Liuni, Marco. "Adaptation Automatique de la Résolution pour l'Analyse et la Synthèse du Signal Audio." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00773550.
Full textLostanlen, Vincent. "Opérateurs convolutionnels dans le plan temps-fréquence." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEE012/document.
Full text
This dissertation addresses audio classification by designing signal representations which satisfy appropriate invariants while preserving inter-class variability. First, we study time-frequency scattering, a representation which extracts modulations at various scales and rates, in a way similar to idealized models of spectrotemporal receptive fields in auditory neuroscience. We report state-of-the-art results in the classification of urban and environmental sounds, thus outperforming short-term audio descriptors and deep convolutional networks. Secondly, we introduce spiral scattering, a representation which combines wavelet convolutions along time, along log-frequency, and across octaves. Spiral scattering follows the geometry of the Shepard pitch spiral, which makes a full turn at every octave. We study voiced sounds with a nonstationary source-filter model in which both the source and the filter are transposed through time, and show that spiral scattering disentangles and linearizes these transpositions. Furthermore, spiral scattering reaches state-of-the-art results in musical instrument classification of solo recordings. Aside from audio classification, time-frequency scattering and spiral scattering can be used as summary statistics for audio texture synthesis. We find that, unlike the previously existing temporal scattering transform, time-frequency scattering is able to capture the coherence of spectrotemporal patterns, such as those arising in bioacoustics or speech, up to an integration scale of about 500 ms. Based on this analysis-synthesis framework, an artistic collaboration with composer Florian Hecker has led to the creation of five computer music
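The first layer of the transforms discussed above can be sketched as a wavelet filterbank followed by complex modulus and time averaging (first-order scattering); time-frequency scattering then applies a second wavelet decomposition across both time and log-frequency, which this sketch omits. The Morlet-like Gaussian filters and all parameter values below are illustrative assumptions.

```python
import numpy as np

fs = 4096
N = fs
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 400 * t) * np.exp(-3 * t)   # damped 400 Hz tone

freqs = np.fft.fftfreq(N, 1.0 / fs)
centers = 100.0 * 2 ** np.arange(5)                # 100..1600 Hz, one octave apart
Q = 4.0                                            # quality factor of the filterbank

S1 = []
for fc in centers:
    # Gaussian band-pass filter (Morlet-like), defined in the Fourier domain
    g = np.exp(-0.5 * ((freqs - fc) / (fc / Q)) ** 2)
    band = np.fft.ifft(np.fft.fft(x) * g)          # wavelet convolution
    S1.append(np.abs(band).mean())                 # modulus + time average
S1 = np.array(S1)                                  # first-order scattering coefficients
```

The averaging makes `S1` invariant to small time shifts of `x`, at the price of discarding modulation structure, which is precisely what the second scattering layer recovers.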
Musti, Utpala. "Synthèse acoustico-visuelle de la parole par sélection d'unités bimodales." Thesis, Université de Lorraine, 2013. http://www.theses.fr/2013LORR0003.
Full text
This work deals with audio-visual speech synthesis. In the vast literature available in this direction, many approaches divide it into two synthesis problems: one is acoustic speech synthesis, the other the generation of the corresponding facial animation. However, this does not guarantee perfectly synchronous and coherent audio-visual speech. To overcome this drawback, we propose a different approach: acoustic-visual speech synthesis by the selection of naturally synchronous bimodal units. The synthesis is based on the classical unit selection paradigm. The main idea behind this synthesis technique is to keep the natural association between the acoustic and visual modalities intact. We describe the audio-visual corpus acquisition technique and the database preparation for our system. We present an overview of our system and detail the various aspects of bimodal unit selection that need to be optimized for good synthesis. The main focus of this work is to synthesize the speech dynamics well, rather than a comprehensive talking head. We describe the visual target features that we designed and subsequently present an algorithm for target feature weighting. This algorithm performs target feature weighting and redundant feature elimination iteratively, based on the comparison of a target-cost-based ranking with a distance calculated from the acoustic and visual speech signals of units in the corpus. Finally, we present the perceptual and subjective evaluation of the final synthesis system. The results show that we have achieved the goal of synthesizing the speech dynamics reasonably well.
Bleda, Pérez Sergio. "Contribuciones a la implementación de sistemas Wave Field Synthesis." Doctoral thesis, Universitat Politècnica de València, 2009. http://hdl.handle.net/10251/6685.
Full text
Bleda Pérez, S. (2009). Contribuciones a la implementación de sistemas Wave Field Synthesis [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/6685
Jackson, Judith. "Generative Processes for Audification." Oberlin College Honors Theses / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=oberlin1528280288385596.
Full text
Vaiapury, Karthikeyan. "Model based 3D vision synthesis and analysis for production audit of installations." Thesis, Queen Mary, University of London, 2013. http://qmro.qmul.ac.uk/xmlui/handle/123456789/8721.
Full text
Somasundaram, Arunachalam. "A facial animation model for expressive audio-visual speech." Columbus, Ohio : Ohio State University, 2006. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1148973645.
Full text
Liudkevich, Denis. "Návrh virtuálního síťového kolaborativního zvukového nástroje." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413250.
Full textMoulin, Samuel. "Quel son spatialisé pour la vidéo 3D ? : influence d'un rendu Wave Field Synthesis sur l'expérience audio-visuelle 3D." Thesis, Sorbonne Paris Cité, 2015. http://www.theses.fr/2015PA05H102/document.
Full text
The digital entertainment industry is undergoing a major evolution due to the recent spread of stereoscopic 3D videos. It is now possible to experience 3D by watching movies, playing video games, and so on. In this context, video catches most of the attention, but what about the accompanying audio rendering? Today, the most commonly used sound reproduction technologies are based on lateralization effects (stereophony, 5.1 surround systems). Nevertheless, it is quite natural to wonder about the need to introduce a new audio technology adapted to this new visual dimension: depth. Many alternative technologies seem able to render 3D sound environments (binaural technologies, ambisonics, Wave Field Synthesis). Using these technologies could potentially improve users' quality of experience: it could enhance the feeling of realism by adding audio-visual spatial congruence, but also the sense of immersion. In order to validate this hypothesis, a 3D audio-visual rendering system was set up. The visual rendering provides stereoscopic 3D images and is coupled with a Wave Field Synthesis sound rendering. Three research axes are then studied: 1/ Depth perception using unimodal or bimodal presentations. How well is the audio-visual system able to render the depth of visual, sound, and audio-visual objects? The conducted experiments show that Wave Field Synthesis can render virtual sound sources perceived at different distances. Moreover, visual and audio-visual objects can be localized with higher accuracy than sound objects. 2/ Crossmodal integration in the depth dimension. How can the perception of congruence be guaranteed when audio-visual stimuli are spatially misaligned? The extent of the integration window was studied at different visual object distances. In other words, according to the visual stimulus position, we studied where sound objects should be placed to provide the perception of a single, unified audio-visual stimulus.
3/ 3D audio-visual quality of experience. What is the contribution of sound depth rendering to the 3D audio-visual quality of experience? We first assessed today's quality of experience using sound systems dedicated to the playback of 5.1 soundtracks (5.1 surround system, headphones, soundbar) in combination with 3D videos. Then, we studied the impact of sound depth rendering using the audio-visual system that was set up (3D videos and Wave Field Synthesis).
Gibbons, J. A. "Accelerating finite difference models with field programmable gate arrays : application to real-time audio synthesis and acoustic modelling." Thesis, University of York, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.444681.
Full text
Poepel, Cornelius. "An investigation of audio signal-driven sound synthesis with a focus on its use for bowed stringed synthesisers." Thesis, University of Birmingham, 2011. http://etheses.bham.ac.uk//id/eprint/1479/.
Full text
Andersson, Olliver. "Exploring new interaction possibilities for video game music scores using sample-based granular synthesis." Thesis, Luleå tekniska universitet, Medier, ljudteknik och teater, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-79572.
Full text
For contact with the author, or to request video clips, audio, or other resources, e-mail: olliver.andersson@gmail.com
Meynard, Adrien. "Stationnarités brisées : approches à l'analyse et à la synthèse." Thesis, Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0475.
Full text
Nonstationarity characterizes transient physical phenomena. For example, it may be caused by the speed variation of an accelerating engine. Similarly, because of the Doppler effect, a stationary sound emitted by a moving source is perceived as nonstationary by a motionless observer. These examples lead us to consider a class of nonstationary signals formed from stationary signals whose stationarity has been broken by a physically relevant deformation operator. After describing the deformation models under consideration (chapter 1), we present different methods that extend spectral analysis and synthesis to such signals. Spectral estimation amounts to determining simultaneously the spectrum of the underlying stationary process and the deformation breaking its stationarity. To this end, we consider representations of the signal in which this deformation is characterized by a simple operation. Thus, in chapter 2, we are interested in the analysis of locally deformed signals. The deformation describing these signals is simply expressed as a displacement of the wavelet coefficients in the time-scale domain. We take advantage of this property to develop a method for the estimation of these displacements, and then propose an instantaneous spectrum estimation algorithm, named JEFAS. In chapter 3, we extend this spectral analysis to multi-sensor signals, where the deformation operator takes a matrix form; this is a doubly nonstationary blind source separation problem. In chapter 4, we propose a synthesis approach to the study of locally deformed signals. Finally, in chapter 5, we construct a time-frequency representation adapted to the description of locally harmonic signals.
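The forward direction of the deformation model described above can be sketched in a few lines: take a stationary signal x and warp its time axis, y(t) = x(γ(t)). The quadratic warp below is an illustrative assumption (for a pure tone it yields a linear frequency glide, a crude stand-in for a Doppler-like situation); the thesis addresses the much harder inverse problem of estimating γ and the underlying spectrum from y alone.

```python
import numpy as np

fs = 8000
t = np.arange(2 * fs) / fs                 # two seconds of samples

def x_stat(u):
    """Stationary signal: a pure 500 Hz tone."""
    return np.sin(2 * np.pi * 500 * u)

gamma = t + 0.05 * t ** 2                  # smooth, strictly increasing time warp
y = x_stat(gamma)                          # stationarity is broken

# Instantaneous frequency of y is 500 * gamma'(t) = 500 * (1 + 0.1 * t),
# gliding from 500 Hz at t = 0 up to 600 Hz at t = 2 s.
inst_freq = 500 * (1 + 0.1 * t)
```

In the wavelet domain such a warp shows up as a smooth displacement of coefficients along the scale axis, which is the property the estimation method exploits.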
Disch, Sascha [Verfasser]. "Modulation vocoder for analysis, processing and synthesis of audio signals with application to frequency selective pitch transposition / Sascha Disch." Hannover : Technische Informationsbibliothek und Universitätsbibliothek Hannover (TIB), 2011. http://d-nb.info/1014323789/34.
Full text
Tiger, Guillaume. "Synthèse sonore d'ambiances urbaines pour les applications vidéoludiques." Thesis, Paris, CNAM, 2014. http://www.theses.fr/2015CNAM0968/document.
Full text
In video games and interactive media, the making of complex sound ambiences relies heavily on the available memory and computational resources, so a compromise is necessary regarding the choice of audio material and its processing in order to achieve immersive and credible real-time ambiences. Alternatively, the use of procedural audio techniques, i.e. the generation of audio content from the data provided by the virtual scene, has increased in recent years. Procedural methodologies seem appropriate for sonifying complex environments such as virtual cities. In this thesis we focus specifically on the creation of interactive urban sound ambiences. Our analysis of these ambiences is based on soundscape theory and on a state of the art of game-oriented urban interactive applications. We infer that the virtual urban soundscape is made of several perceptual auditory grounds, including a background. As a first contribution, we define the morphological and narrative properties of such a background. We then consider the urban background sound as a texture and propose, as a second contribution, to pinpoint, specify and prototype a granular synthesis tool dedicated to interactive urban sound backgrounds. The synthesizer prototype is created using the visual programming language Pure Data. On the basis of our state of the art, we include an urban ambience recording methodology to feed the granular synthesis. Finally, two validation steps regarding the prototype are described: integration into the virtual city simulation Terra Dynamica on the one hand, and a perceptual listening comparison test on the other.
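A minimal granular-synthesis sketch in the spirit of the tool described above (the thesis prototypes it in Pure Data; this NumPy version and all its parameter values are illustrative assumptions): short Hann-windowed grains are read from random positions in a source texture and overlap-added into an output stream, so that an arbitrarily long background can be spun out of a short recording.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000
source = rng.standard_normal(4 * fs)       # stand-in for an urban texture recording
grain_len = 1024                           # grain size in samples (~128 ms)
hop = 256                                  # grain onset spacing (75% overlap)
out = np.zeros(2 * fs)                     # two seconds of synthesized background
win = np.hanning(grain_len)                # grain envelope

for onset in range(0, len(out) - grain_len, hop):
    start = rng.integers(0, len(source) - grain_len)   # random read position
    out[onset:onset + grain_len] += win * source[start:start + grain_len]
```

Randomizing the read position decorrelates successive grains, which is what keeps the resulting texture from sounding like an audible loop.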
Bilhanan, Anuleka. "High level synthesis of an image processing algorithm for cancer detection." [Tampa, Fla.] : University of South Florida, 2004. http://purl.fcla.edu/fcla/etd/SFE0000303.
Full textSilva, Marcio José da. "Modelagem de um sistema para auralização musical utilizando Wave Field Synthesis." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/27/27158/tde-18052015-163521/.
Full text
Seeking a practical application of the theory of Wave Field Synthesis (WFS) in music, research was carried out aimed at modeling a sound system capable of creating spatial sound images with this technique. Unlike most other sound projection techniques, which work with a small, localized listening area, WFS makes it possible to project the sounds of each sound source - such as musical instruments and voices - at different points within the listening space, in a region that can cover almost the entire area of this space, depending on the number of installed speakers. The development of modular, structured code for WFS was based on the patch-oriented platform Pure Data (Pd) and on the AUDIENCE auralization system developed at USP, and it can be integrated as a tool for interactive sound spatialization. The solution employs dynamic patches and a modular architecture, allowing code flexibility and maintainability, with advantages over other existing software, particularly in installation, operation, and the handling of a large number of sound sources and speakers. Special speakers, with features that facilitate use in musical applications, were also developed for this system.
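At its simplest, the WFS rendering described above amounts to feeding each loudspeaker a delayed, attenuated copy of the virtual source's signal, with the delay given by the propagation path and the gain falling off with distance. The geometry and values below are illustrative assumptions, and a full WFS driving function also includes a spectral pre-equalisation filter that this sketch omits.

```python
import numpy as np

c = 343.0                                   # speed of sound (m/s)
fs = 48000
speakers_x = np.linspace(-2.0, 2.0, 16)     # linear array on the line y = 0
src = np.array([0.5, -1.0])                 # virtual source 1 m behind the array

# Source-to-speaker distances, per-speaker delays (in samples) and gains
dist = np.hypot(speakers_x - src[0], -src[1])
delays = np.round((dist - dist.min()) / c * fs).astype(int)
gains = 1.0 / np.sqrt(dist)                 # simplified distance attenuation

signal = np.sin(2 * np.pi * 440 * np.arange(fs // 10) / fs)
feeds = np.zeros((len(speakers_x), len(signal) + int(delays.max())))
for i, (d, g) in enumerate(zip(delays, gains)):
    feeds[i, d:d + len(signal)] = g * signal   # one driving signal per speaker
```

The superposition of these delayed wavefronts reconstructs, inside the listening area, the curved wavefront a real source at `src` would have produced.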
Gonon, Gilles. "Proposition d'un schéma d'analyse/synthèse adaptatif dans le plan temps-fréquence basé sur des critères entropiques : application au codage audio par transformée." Le Mans, 2002. http://cyberdoc.univ-lemans.fr/theses/2002/2002LEMA1004.pdf.
Full text
Adaptive representations contribute to the study and characterization of the information carried by a signal. In this work, we present a new decomposition which uses separate segmentation criteria in time and frequency to improve the adaptivity of the analysis to the signal. This scheme is applied to a transform-based perceptual audio coder. The signal is first temporally segmented using a local entropic criterion. Based on an estimator of the local entropy, the segmentation criterion reflects entropy variations in the signal and makes it possible to separate stationary parts from transient ones. The temporal frames thus defined are filtered in frequency using the wavelet packet decomposition, and the adaptation is performed by means of the best basis search algorithm. An extension of the library of dyadic bases is derived to improve the entropic gain achieved on the signal, and thus the adaptivity of the decomposition. The perceptual audio coder we developed follows an original design in order to include the proposed scheme; the full implementation of the coder is described in the document. The coder is evaluated with subjective tests, performed as absolute and blind comparisons at a rate of 96 kbps. Although many parts of our coder can still be improved, the results show a subjective quality equivalent to that of the tested standard, though not fully transparent with respect to the original sounds.
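The local entropy idea can be sketched as follows: the spectral entropy of short frames is low on narrowband (quasi-stationary) content and high on broadband (transient-like) content, so a jump in the local entropy suggests a segment boundary. The estimator, the threshold-free boundary pick, and the toy test signal below are illustrative assumptions, not the thesis's exact criterion.

```python
import numpy as np

def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalised magnitude spectrum of one frame."""
    p = np.abs(np.fft.rfft(frame)) ** 2
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

fs = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)        # "stationary" second
noise = np.random.default_rng(1).standard_normal(fs)       # "transient-like" second
x = np.concatenate([tone, noise])

frame_len = 512
ent = np.array([spectral_entropy(x[i:i + frame_len])
                for i in range(0, len(x) - frame_len, frame_len)])
boundary = int(np.argmax(np.abs(np.diff(ent)))) + 1        # biggest entropy jump
# boundary lands at the tone-to-noise transition near frame 15
```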