Academic literature on the topic 'Audio synthesis'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Audio synthesis.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Audio synthesis"
Mikulicz, Szymon. "Precise Inter-Device Audio Playback Synchronization for Linux." International Journal of Signal Processing Systems 9, no. 3 (September 2021): 17–21. http://dx.doi.org/10.18178/ijsps.9.3.17-21.
Voitko, Viktoriia, Svitlana Bevz, Sergii Burbelo, and Pavlo Stavytskyi. "Audio Generation Technology of a System of Synthesis and Analysis of Music Compositions." Herald of Khmelnytskyi National University 305, no. 1 (February 23, 2022): 64–67. http://dx.doi.org/10.31891/2307-5732-2022-305-1-64-67.
George, E. Bryan, and Mark J. T. Smith. "Audio analysis/synthesis system." Journal of the Acoustical Society of America 97, no. 3 (March 1995): 2016. http://dx.doi.org/10.1121/1.412041.
Li, Naihan, Yanqing Liu, Yu Wu, Shujie Liu, Sheng Zhao, and Ming Liu. "RobuTrans: A Robust Transformer-Based Text-to-Speech Model." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8228–35. http://dx.doi.org/10.1609/aaai.v34i05.6337.
Wang, Cheng-i, and Shlomo Dubnov. "Guided Music Synthesis with Variable Markov Oracle." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 10, no. 5 (June 29, 2021): 55–62. http://dx.doi.org/10.1609/aiide.v10i5.12767.
Cabrera, Andrés, JoAnn Kuchera-Morin, and Curtis Roads. "The Evolution of Spatial Audio in the AlloSphere." Computer Music Journal 40, no. 4 (December 2016): 47–61. http://dx.doi.org/10.1162/comj_a_00382.
Park, Se Jin, Minsu Kim, Joanna Hong, Jeongsoo Choi, and Yong Man Ro. "SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2062–70. http://dx.doi.org/10.1609/aaai.v36i2.20102.
Kuntz, Matthieu, and Bernhard U. Seeber. "Spatial audio for interactive hearing research." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 2 (February 1, 2023): 5120–27. http://dx.doi.org/10.3397/in_2022_0741.
Loy, D. Gareth. "The Systems Concepts Digital Synthesizer: An Architectural Retrospective." Computer Music Journal 37, no. 3 (September 2013): 49–67. http://dx.doi.org/10.1162/comj_a_00193.
Bessell, David. "Dynamic Convolution Modeling, a Hybrid Synthesis Strategy." Computer Music Journal 37, no. 1 (March 2013): 44–51. http://dx.doi.org/10.1162/comj_a_00159.
Full textDissertations / Theses on the topic "Audio synthesi"
Chemla-Romeu-Santos, Axel Claude André. "Manifold Representations of Musical Signals and Generative Spaces." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/700444.
Among the diverse research fields within computer music, the synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving with musical practices and scientific and technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms has earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. In particular, a family of machine learning methods called generative models focuses on the generation of original content using features extracted from an existing dataset. Such methods question not only previous approaches to generation, but also the way of integrating these methods into existing creative processes. While these new generative frameworks are progressively introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work inserts itself into both practices and what can be expected from their collation. Subsequently, we focus on generative models, and on how modern advances in the domain can be exploited to learn complex sound distributions while remaining sufficiently flexible to be integrated into the creative flow of the user. We then propose an inference/generation process, mirroring the analysis/synthesis paradigms that are natural in the audio processing domain, using latent models based on a continuous higher-level space that we use to control the generation. We first provide preliminary results of our method applied to spectral information extracted from several datasets, and evaluate the obtained results both qualitatively and quantitatively. Subsequently, we study how to make these methods more suitable for learning audio data, tackling three different aspects in turn. First, we propose two latent regularization strategies specifically designed for audio, based on signal/symbol translation and on perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, which allow the obtained generative spaces to also model the dynamics of the signal. In the last chapter, we shift from our scientific approach to a more research-and-creation-oriented point of view: first, we describe the architecture and design of our open-source library, vsacids, which aims to be used by expert and non-expert music makers as an integrated creation tool.
Then, we propose a first musical use of our system through the creation of a real-time performance, called ægo, based jointly on our framework vsacids and an explorative agent trained by reinforcement learning during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as on possible further creative applications.
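As an illustration of the inference/generation loop this abstract describes, the sketch below pairs an encoder and a decoder over magnitude-spectrum frames in the style of a variational autoencoder, so that analysis maps sound into a continuous latent space and synthesis is steered from points in that space. It is a minimal sketch under assumed dimensions; the class name SpectralVAE and all layer sizes are illustrative and not taken from the thesis.

```python
# A minimal sketch, assuming PyTorch: an encoder maps magnitude-spectrum frames
# to a continuous latent space ("analysis"), and a decoder synthesizes spectra
# back from latent coordinates ("generation"). All dimensions are illustrative.
import torch
import torch.nn as nn

class SpectralVAE(nn.Module):
    def __init__(self, n_bins=513, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bins, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_bins), nn.Softplus(),  # keep magnitudes non-negative
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

model = SpectralVAE()
frame = torch.rand(1, 513)        # one magnitude-spectrum frame (analysis input)
recon, mu, logvar = model(frame)  # reconstruction plus latent statistics
z = torch.zeros(1, 16)            # a point chosen directly in the control space
generated = model.decoder(z)      # generation steered from the latent space
```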
Lundberg, Anton. "Data-Driven Procedural Audio: Procedural Engine Sounds Using Neural Audio Synthesis." Thesis, KTH, Datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280132.
The currently dominant approach to rendering audio in interactive media, such as video games and virtual reality, involves playback of static audio files. This approach lacks flexibility and requires the handling of large amounts of audio data. An alternative approach is procedural audio, in which sound models are controlled to generate audio in real time. Despite its many advantages, procedural audio is not yet widely used in commercial productions, partly because the audio generated by many proposed models does not meet industry standards. This thesis investigates how procedural audio can be performed with data-driven methods. We do this by specifically investigating methods for synthesizing car engine sounds based on neural audio synthesis. By building on a recently published method that integrates digital signal processing with deep learning, called Differentiable Digital Signal Processing (DDSP), our method can create sound models by training deep neural networks to reconstruct recorded audio examples from interpretable latent predictors. We propose a method for using phase information from engine combustion cycles, as well as a differentiable method for synthesizing transients. Our results show that DDSP can be used for procedural engine sounds, but more work is required before our models can generate engine sounds without unwanted artifacts and before they can be used in real-time applications. We discuss how our approach can be useful for procedural audio in more general settings, and how our method can be applied to other sound sources.
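The sketch below illustrates the DDSP-style additive synthesis layer such work builds on: an interpretable control (here a fundamental frequency derived from engine RPM) is integrated into phase and drives a bank of harmonic oscillators. It is a minimal sketch, not the thesis's model; the two-firings-per-revolution mapping and the fixed harmonic amplitudes are illustrative assumptions, where the actual method would predict amplitudes with a trained network.

```python
# A minimal sketch of DDSP-style additive synthesis from interpretable controls.
# The RPM-to-f0 mapping and the fixed harmonic amplitudes are assumptions made
# for illustration only.
import numpy as np

def harmonic_synth(f0_hz, harmonic_amps, sr=44100):
    """Additive synthesis from a per-sample f0 contour and per-harmonic amplitudes."""
    phase = 2 * np.pi * np.cumsum(f0_hz / sr)   # integrate f0 into instantaneous phase
    audio = np.zeros(len(f0_hz))
    for k, amp in enumerate(harmonic_amps, start=1):
        partial = amp * np.sin(k * phase)
        partial[k * f0_hz >= sr / 2] = 0.0      # mute partials above the Nyquist limit
        audio += partial
    return audio / len(harmonic_amps)

rpm = np.linspace(900, 3000, 44100)             # one second of rising engine speed
f0 = rpm / 60 * 2                               # assumed: two firing events per revolution
engine = harmonic_synth(f0, harmonic_amps=[1.0, 0.5, 0.3, 0.2])
```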
Elfitri, I. "Analysis by synthesis spatial audio coding." Thesis, University of Surrey, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.590657.
Wood, Steven Gregory. "Objective Test Methods for Waveguide Audio Synthesis." BYU ScholarsArchive, 2007. https://scholarsarchive.byu.edu/etd/853.
Ustun, Selen. "Audio browsing of automaton-based hypertext." Thesis, Texas A&M University, 2003. http://hdl.handle.net/1969.1/33.
Jehan, Tristan. "Perceptual synthesis engine: an audio-driven timbre generator." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/61543.
A real-time synthesis engine which models and predicts the timbre of acoustic instruments based on perceptual features extracted from an audio stream is presented. The thesis describes the modeling sequence including the analysis of natural sounds, the inference step that finds the mapping between control and output parameters, the timbre prediction step, and the sound synthesis. The system enables applications such as cross-synthesis, pitch shifting or compression of acoustic instruments, and timbre morphing between instrument families. It is fully implemented in the Max/MSP environment. The Perceptual Synthesis Engine was developed for the Hyperviolin as a novel, generic and perceptually meaningful synthesis technique for non-discretely pitched instruments.
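As a rough illustration of the analysis half of such an engine, the sketch below extracts three perceptual controls (pitch, loudness, brightness) frame by frame from an audio signal, assuming the librosa library. The specific feature set is an assumption for illustration, not the thesis's actual feature list, and the learned mapping from controls to predicted timbre (the synthesis half) is not shown.

```python
# A rough sketch, assuming librosa: frame-wise perceptual controls that would
# drive a learned timbre model. Feature choices are illustrative only.
import numpy as np
import librosa

sr = 22050
t = np.arange(2 * sr) / sr
y = 0.5 * np.sin(2 * np.pi * 220 * t)  # stand-in for a live instrument stream

pitch = librosa.yin(y, fmin=80, fmax=1000, sr=sr)              # control 1: pitch
loudness = librosa.feature.rms(y=y)[0]                         # control 2: loudness proxy
brightness = librosa.feature.spectral_centroid(y=y, sr=sr)[0]  # control 3: brightness

n = min(len(pitch), len(loudness), len(brightness))
controls = np.vstack([pitch[:n], loudness[:n], brightness[:n]])  # (3, frames) control track
```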
Payne, R. G. "Digital techniques for the analysis and synthesis of audio signals." Thesis, Bucks New University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.234706.
Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Mémoire, Université de Sherbrooke, 2000. http://savoirs.usherbrooke.ca/handle/11143/1078.
Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Sherbrooke: Université de Sherbrooke, 2001.
Andreux, Mathieu. "Foveal autoregressive neural time-series modeling." Electronic Thesis or Diss., Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE073.
This dissertation studies unsupervised time-series modelling. We first focus on the problem of linearly predicting future values of a time-series under the assumption of long-range dependencies, which requires taking into account a large past. We introduce a family of causal and foveal wavelets which project past values onto a subspace adapted to the problem, thereby reducing the variance of the associated estimators. We then investigate under which conditions non-linear predictors exhibit better performance than linear ones. Time-series which admit a sparse time-frequency representation, such as audio, satisfy those requirements, and we propose a prediction algorithm using such a representation. The last problem we tackle is audio time-series synthesis. We propose a new generation method relying on a deep convolutional neural network, with an encoder-decoder architecture, which allows us to synthesize new realistic signals. Contrary to state-of-the-art methods, we explicitly use the time-frequency properties of sounds to define an encoder with the scattering transform, while the decoder is trained to solve an inverse problem in an adapted metric.
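As a toy illustration of the foveal idea, the sketch below summarizes the past at dyadic resolutions (fine near the present, coarse far back) and fits a linear predictor on those few features; the dyadic averages stand in for the causal foveal wavelets of the dissertation, and all names and sizes here are illustrative assumptions.

```python
# A toy sketch: dyadic summaries of the past (a stand-in for causal foveal
# wavelets) feed a low-dimensional, low-variance linear predictor.
import numpy as np

def foveal_features(past, n_scales=6):
    """Means over dyadic windows of the past (sizes 1, 2, 4, ...), finest scale first."""
    feats, end = [], len(past)
    for s in range(n_scales):
        lo, hi = end - 2 ** (s + 1) + 1, end - 2 ** s + 1
        feats.append(past[max(0, lo):hi].mean())
    return np.array(feats)

rng = np.random.default_rng(0)
x = np.sin(np.arange(4000) * 0.01) + 0.1 * rng.standard_normal(4000)  # toy series

X = np.array([foveal_features(x[:t]) for t in range(64, 3999)])  # features of each past
y = x[64:3999]                                                   # value to predict at t
w, *_ = np.linalg.lstsq(X, y, rcond=None)                        # least-squares fit
prediction = foveal_features(x[:3999]) @ w                       # one-step-ahead forecast
```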
Books on the topic "Audio synthesis"
Kunow, Kristian. Rundfunk und Internet: These, Antithese, Synthese? Edited by Arbeitsgemeinschaft der Landesmedienanstalten in der Bundesrepublik Deutschland. Berlin: Vistas, 2013.
Audio system for technical readings. Berlin: Springer, 1998.
Kamajou, François, and Cameroon Ministry of Scientific Research, eds. Audit scientifique de la recherche agricole au Cameroun: Synthèse de l'audit, rapport général. [Yaoundé]: République du Cameroun, Ministère de la recherche scientifique et technique, 1993.
Hornegger, Joachim, ed. Pattern recognition and image processing in C++. Wiesbaden: Vieweg, 1995.
Audio Engineering Society International Conference. Virtual, synthetic and entertainment audio: The proceedings of the AES 22nd international conference, June 15-17, 2002, Espoo, Finland. New York: AES, 2002.
Nakagawa, Seiichi. Speech, hearing and neural network models. Tokyo: Ohmsha, 1995.
Junqua, Jean-Claude. Robustness in automatic speech recognition: Fundamentals and applications. Boston: Kluwer Academic Publishers, 1996.
Beasley, Rick, ed. Voice application development with VoiceXML. Indianapolis, IN: Sams, 2002.
IEEE/RSJ International Conference on Intelligent Robots and Systems (2001: Maui, Hawaii). Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001): Expanding the societal role of robotics in the next millennium, October 29-November 3, 2001, Outrigger Wailea Resort, Maui, Hawaii, USA. Piscataway, NJ: IEEE, 2001.
IEEE Workshop on Automatic Speech Recognition and Understanding (1997: Santa Barbara, Calif.). 1997 IEEE Workshop on Automatic Speech Recognition and Understanding proceedings. Piscataway, NJ: IEEE Signal Processing Society, 1997.
Find full textBook chapters on the topic "Audio synthesi"
Weik, Martin H. "Audio synthesis." In Computer Science and Communications Dictionary, 77. Boston, MA: Springer US, 2000. http://dx.doi.org/10.1007/1-4020-0613-6_1022.
Tarr, Eric. "Introduction to Signal Synthesis." In Hack Audio, 79–101. New York, NY: Routledge, 2018. http://dx.doi.org/10.4324/9781351018463-7.
Bode, Peer D. "Analog Audio Synthesis." In The Routledge Companion to Media Technology and Obsolescence, 148–63. New York, NY: Routledge, 2018. http://dx.doi.org/10.4324/9781315442686-11.
Van Every, Shawn. "Audio Synthesis and Analysis." In Pro Android Media, 179–93. Berkeley, CA: Apress, 2009. http://dx.doi.org/10.1007/978-1-4302-3268-1_8.
Jackson, Wallace. "The Synthesis of Digital Audio: Tone Generation." In Digital Audio Editing Fundamentals, 93–105. Berkeley, CA: Apress, 2015. http://dx.doi.org/10.1007/978-1-4842-1648-4_11.
Jackson, Wallace. "The History of Digital Audio: MIDI and Synthesis." In Digital Audio Editing Fundamentals, 11–17. Berkeley, CA: Apress, 2015. http://dx.doi.org/10.1007/978-1-4842-1648-4_2.
Chen, Jiashu. "3D Audio and Virtual Acoustical Environment Synthesis." In Acoustic Signal Processing for Telecommunication, 283–301. Boston, MA: Springer US, 2000. http://dx.doi.org/10.1007/978-1-4419-8644-3_13.
Záviška, Pavel, Pavel Rajmic, Zdeněk Průša, and Vítězslav Veselý. "Revisiting Synthesis Model in Sparse Audio Declipper." In Latent Variable Analysis and Signal Separation, 429–45. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-93764-9_40.
Bernardes, Gilberto, and Diogo Cocharro. "Dynamic Music Generation, Audio Analysis-Synthesis Methods." In Encyclopedia of Computer Graphics and Games, 1–4. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-319-08234-9_211-1.
Huzaifah, Muhammad, and Lonce Wyse. "Deep Generative Models for Musical Audio Synthesis." In Handbook of Artificial Intelligence for Music, 639–78. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72116-9_22.
Full textConference papers on the topic "Audio synthesi"
Huang, Mincong (Jerry), Samuel Chabot, and Jonas Braasch. "Panoptic Reconstruction of Immersive Virtual Soundscapes Using Human-Scale Panoramic Imagery with Visual Recognition." In ICAD 2021: The 26th International Conference on Auditory Display. International Community for Auditory Display, 2021. http://dx.doi.org/10.21785/icad2021.043.
Fox, K. Michael, Jeremy Stewart, and Rob Hamilton. "madBPM: Musical and Auditory Display for Biological Predictive Modeling." In The 23rd International Conference on Auditory Display. Arlington, Virginia: The International Community for Auditory Display, 2017. http://dx.doi.org/10.21785/icad2017.045.
Chatziioannou, Vasileios. "Digital synthesis of impact sounds." In Proceedings of Audio Mostly 2015. New York, NY: ACM Press, 2015. http://dx.doi.org/10.1145/2814895.2814908.
Lu, Lie, Yi Mao, Wenyin Liu, and Hong-Jiang Zhang. "Audio restoration by constrained audio texture synthesis." In 2003 International Conference on Multimedia and Expo (ICME '03). IEEE, 2003. http://dx.doi.org/10.1109/icme.2003.1221334.
Schimbinschi, Florin, Christian Walder, Sarah M. Erfani, and James Bailey. "SynthNet: Learning to Synthesize Music End-to-End." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/467.
Zhu, Hao, Huaibo Huang, Yi Li, Aihua Zheng, and Ran He. "Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/327.
Skinner, Martha. "Audio and Video Drawings Mapping Temporality." In ACADIA 2006: Synthetic Landscapes. ACADIA, 2006. http://dx.doi.org/10.52842/conf.acadia.2006.178.
Ye, Zhenhui, Zhou Zhao, Yi Ren, and Fei Wu. "SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/620.
Moore, Carl, and William Brent. "Interactive Real-time Concatenative Synthesis in Virtual Reality." In ICAD 2019: The 25th International Conference on Auditory Display. Newcastle upon Tyne, United Kingdom: Department of Computer and Information Sciences, Northumbria University, 2019. http://dx.doi.org/10.21785/icad2019.068.
Full textReports on the topic "Audio synthesi"
Murad, M. Hassan, Stephanie M. Chang, Celia Fiordalisi, Jennifer S. Lin, Timothy J. Wilt, Amy Tsou, Brian Leas, et al. Improving the Utility of Evidence Synthesis for Decision Makers in the Face of Insufficient Evidence. Agency for Healthcare Research and Quality (AHRQ), April 2021. http://dx.doi.org/10.23970/ahrqepcwhitepaperimproving.
Kiianovska, N. M. The development of theory and methods of using cloud-based information and communication technologies in teaching mathematics of engineering students in the United States. Видавничий центр ДВНЗ «Криворізький національний університет», December 2014. http://dx.doi.org/10.31812/0564/1094.
Baluk, Nadia, Natalia Basij, Larysa Buk, and Olha Vovchanska. VR/AR-Technologies: New Content of the New Media. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11074.