Academic literature on the topic 'Audio synthesis'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Audio synthesis.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Audio synthesis"

1

Mikulicz, Szymon. "Precise Inter-Device Audio Playback Synchronization for Linux." International Journal of Signal Processing Systems 9, no. 3 (September 2021): 17–21. http://dx.doi.org/10.18178/ijsps.9.3.17-21.

Full text
Abstract:
Wave Field Synthesis is a sound-production method that uses arrays of closely spaced speakers, which creates a unique challenge for distributed playback systems. Because of clock-frequency drift, playback must be constantly corrected by interpolating and time-shifting the played stream. This paper presents a new approach to network-based audio playback synchronization that makes heavy use of the PTP network time-synchronization protocol and the ALSA Linux audio subsystem. The software does not need any specialized hardware and can precisely estimate how the playback stream should be interpolated from a set of statistical indicators. The evaluation shows that the offset between two devices playing audio with the presented system stays under 10 μs for 99 % of the time, which fully satisfies the requirements of Wave Field Synthesis. The system was compared to other currently available network audio synchronization systems: NetJack2, RAVENNA and Snapcast, all of which showed 10 to 50 times larger inter-device differences than the presented system.
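As a rough illustration of the kind of correction described in this abstract (and not the authors' implementation), the Python sketch below estimates a clock-drift ratio from timestamp pairs against a PTP-style reference by least squares and derives the resampling ratio needed to keep playback aligned; the variable names and the regression estimator are assumptions made for illustration.

```python
# Illustrative sketch only: estimate clock drift between a local audio clock and a
# PTP-disciplined reference, then derive a resampling ratio for playback correction.
# All names and the least-squares estimator are assumptions, not the paper's code.
import numpy as np

def estimate_drift(local_times, reference_times):
    """Least-squares slope of local clock vs. reference clock (1.0 = no drift)."""
    local = np.asarray(local_times, dtype=float)
    ref = np.asarray(reference_times, dtype=float)
    slope, _ = np.polyfit(ref, local, 1)   # local ~= slope * ref + offset
    return slope

def resampling_ratio(drift):
    """Ratio by which the output stream must be resampled to cancel the drift."""
    return 1.0 / drift

# Example: local clock runs 20 ppm fast relative to the PTP reference.
ref_t = np.linspace(0.0, 10.0, 1001)            # seconds of reference time
loc_t = ref_t * (1.0 + 20e-6) + 0.003           # drifting local timestamps
drift = estimate_drift(loc_t, ref_t)
print(f"drift = {drift:.6f}, resample by {resampling_ratio(drift):.6f}")
```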
APA, Harvard, Vancouver, ISO, and other styles
2

Voitko, Viktoriia, Svitlana Bevz, Sergii Burbelo, and Pavlo Stavytskyi. "Audio Generation Technology of a System of Synthesis and Analysis of Music Compositions." Herald of Khmelnytskyi National University 305, no. 1 (February 23, 2022): 64–67. http://dx.doi.org/10.31891/2307-5732-2022-305-1-64-67.

Full text
Abstract:
A system for audio synthesis and analysis of music compositions is considered. It consists of two primary parts: an audio analysis component and a music synthesis component. The audio generation component implements several ways of creating audio sequences. One of them records melodies sung with the voice and transforms them into sequences played with selected musical instruments. In addition, audio input created with a human voice can be used as a seed from which similar music sequences are generated using artificial intelligence. Finally, a manual approach to music generation and editing is available. After the automatic composition-generation mechanisms are used, their results are presented on a two-dimensional plane that represents the dependence of note pitches on time; in this view, the result of audio generation can be adjusted manually or new music sequences can be created. The creation process can be applied iteratively to produce multiple parallel music sequences that are played back as a single audio composition. To implement seed-based audio synthesis, a deep-learning architecture based on a variational autoencoder is used to train a neural network that can reproduce input-like data. With this approach an additional important step must be considered: all input data must be converted from raw audio to spectrograms represented as grayscale images. The result of sound generation is likewise a spectrogram and must therefore be converted back to an output audio format that can be played through speakers. Using spectrograms discards redundant data contained in the raw audio format, which significantly reduces resource consumption and increases overall synthesis speed.
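To make the spectrogram-autoencoder idea concrete, here is a minimal variational-autoencoder sketch in PyTorch operating on flattened magnitude-spectrogram frames; the layer sizes, latent dimension, and loss weighting are illustrative assumptions, not the system described in the paper.

```python
# Minimal VAE sketch for magnitude-spectrogram frames (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrogramVAE(nn.Module):
    def __init__(self, n_bins=513, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_bins, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_bins), nn.Softplus())  # magnitudes >= 0

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    recon = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

# One toy training step on random "spectrogram" frames.
model = SpectrogramVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.rand(32, 513)              # batch of magnitude frames
x_hat, mu, logvar = model(frames)
loss = vae_loss(frames, x_hat, mu, logvar)
loss.backward()
opt.step()
```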
APA, Harvard, Vancouver, ISO, and other styles
3

George, E. Bryan, and Mark J. T. Smith. "Audio analysis/synthesis system." Journal of the Acoustical Society of America 97, no. 3 (March 1995): 2016. http://dx.doi.org/10.1121/1.412041.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Li, Naihan, Yanqing Liu, Yu Wu, Shujie Liu, Sheng Zhao, and Ming Liu. "RobuTrans: A Robust Transformer-Based Text-to-Speech Model." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8228–35. http://dx.doi.org/10.1609/aaai.v34i05.6337.

Full text
Abstract:
Recently, neural network based speech synthesis has achieved outstanding results, and the synthesized audio is of excellent quality and naturalness. However, current neural TTS models suffer from a robustness issue that produces abnormal audio (bad cases), especially for unusual text (unseen context). To build a neural model that can synthesize both natural and stable audio, this paper makes a deep analysis of why previous neural TTS models are not robust, and on that basis proposes RobuTrans (Robust Transformer), a robust neural TTS model based on Transformer. Compared with TransformerTTS, our model first converts input text to linguistic features, including phonemic and prosodic features, and then feeds them to the encoder. In the decoder, the encoder-decoder attention is replaced with a duration-based hard attention mechanism, and the causal self-attention is replaced with a "pseudo non-causal attention" mechanism that models the holistic information of the input. In addition, the position embedding is replaced with a 1-D CNN, since it otherwise constrains the maximum length of the synthesized audio. With these modifications, our model not only fixes the robustness problem but also achieves a MOS (4.36) on par with TransformerTTS (4.37) and Tacotron2 (4.37) on our general set.
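The duration-based hard attention mentioned in the abstract amounts to expanding each phoneme encoding by its predicted duration so the decoder reads a frame-aligned sequence. Below is a hedged sketch of that "length regulation" step; the shapes and names are chosen for illustration and are not taken from the RobuTrans implementation.

```python
# Illustrative "length regulator": expand phoneme-level encodings to frame level
# according to per-phoneme durations, as used by duration-based hard attention.
# Shapes and names are assumptions for illustration, not the RobuTrans code.
import torch

def length_regulate(phoneme_enc, durations):
    """
    phoneme_enc: (num_phonemes, hidden) tensor of encoder outputs.
    durations:   (num_phonemes,) integer tensor of frames per phoneme.
    returns:     (sum(durations), hidden) frame-aligned sequence for the decoder.
    """
    return torch.repeat_interleave(phoneme_enc, durations, dim=0)

enc = torch.randn(4, 8)                          # 4 phonemes, hidden size 8
dur = torch.tensor([3, 5, 2, 4])                 # predicted frame counts
frames = length_regulate(enc, dur)
print(frames.shape)                              # torch.Size([14, 8])
```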
APA, Harvard, Vancouver, ISO, and other styles
5

Wang, Cheng-i., and Shlomo Dubnov. "Guided Music Synthesis with Variable Markov Oracle." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 10, no. 5 (June 29, 2021): 55–62. http://dx.doi.org/10.1609/aiide.v10i5.12767.

Full text
Abstract:
In this work the problem of guided improvisation is formulated and elaborated; a new method, the Variable Markov Oracle, for guided music synthesis is then proposed as a first step toward tackling the guided improvisation problem. The Variable Markov Oracle builds on previous results from the Audio Oracle, a fast method for indexing and recombining repeating sub-clips of an audio signal. The newly proposed Variable Markov Oracle is capable of identifying inherent data-point clusters in an audio signal while simultaneously tracking the sequential relations among clusters. With a target audio signal indexed by the Variable Markov Oracle, a query-matching algorithm is devised to synthesize new music material by recombining the target audio matched to a query audio. This approach makes the query-matching algorithm a solution to the guided music synthesis problem. The query-matching algorithm is efficient and intelligent, since it follows the inherent clusters discovered by the Variable Markov Oracle, creating a query-by-content result that allows numerous applications in concatenative synthesis, machine improvisation, and interactive music systems. Examples of using the Variable Markov Oracle to synthesize new musical material based on given music signals in the style of jazz are shown.
APA, Harvard, Vancouver, ISO, and other styles
6

Cabrera, Andrés, JoAnn Kuchera-Morin, and Curtis Roads. "The Evolution of Spatial Audio in the AlloSphere." Computer Music Journal 40, no. 4 (December 2016): 47–61. http://dx.doi.org/10.1162/comj_a_00382.

Full text
Abstract:
Spatial audio has been at the core of the multimodal experience at the AlloSphere, a unique instrument for data discovery and exploration through interactive immersive display, since its conception. The AlloSphere multichannel spatial audio design has direct roots in the history of electroacoustic spatial audio and is the result of previous activities in spatial audio at the University of California at Santa Barbara. A concise technical description of the AlloSphere, its architectural and acoustic features, its unique 3-D visual projection system, and the current 54.1 Meyer Sound audio infrastructure is presented, with details of the audio software architecture and the immersive sound capabilities it supports. As part of the process of realizing scientific and artistic projects for the AlloSphere, spatial audio research has been conducted, including the use of decorrelation of audio signals to supplement spatialization and tackling the thorny problem of interactive up-mixing through the Sound Element Spatializer and the Zirkonium Chords project. The latter uses the metaphor of geometric spatial chords as a high-level means of spatial up-mixing in performance. Other developments relating to spatial audio are presented, such as Ryan McGee's Spatial Modulation Synthesis, which simultaneously explores the synthesis of space and timbre.
APA, Harvard, Vancouver, ISO, and other styles
7

Park, Se Jin, Minsu Kim, Joanna Hong, Jeongsoo Choi, and Yong Man Ro. "SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2062–70. http://dx.doi.org/10.1609/aaai.v36i2.20102.

Full text
Abstract:
The challenge of talking face generation from speech lies in aligning two different modalities of information, audio and video, such that the mouth region corresponds to the input audio. Previous methods either exploit audio-visual representation learning or leverage intermediate structural information such as landmarks and 3D models. However, they struggle to synthesize fine details of the lips varying at the phoneme level, as they do not sufficiently provide visual information of the lips at the video synthesis step. To overcome this limitation, our work proposes Audio-Lip Memory, which brings in visual information of the mouth region corresponding to input audio and enforces fine-grained audio-visual coherence. It stores lip motion features from sequential ground truth images in the value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time. Therefore, using the retrieved lip motion features as visual hints, it can easily correlate audio with visual dynamics in the synthesis step. By analyzing the memory, we demonstrate that unique lip features are stored in each memory slot at the phoneme level, capturing subtle lip motion based on memory addressing. In addition, we introduce a visual-visual synchronization loss which can enhance lip-syncing performance when used along with the audio-visual synchronization loss in our model. Extensive experiments verify that our method generates high-quality video with mouth shapes that best align with the input audio, outperforming previous state-of-the-art methods.
APA, Harvard, Vancouver, ISO, and other styles
8

Kuntz, Matthieu, and Bernhard U. Seeber. "Spatial audio for interactive hearing research." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 265, no. 2 (February 1, 2023): 5120–27. http://dx.doi.org/10.3397/in_2022_0741.

Full text
Abstract:
The use of sound field synthesis for hearing research has gained popularity due to the ability to auralize a wide range of sound scenes in a controlled and reproducible way. We are interested in reproducing acoustic environments for interactive hearing research, allowing participants to move freely over an extended area in the reproduced sound field. While the physically accurate sound field reproduction using sound field synthesis is limited to the sweet spot, it is unclear how different perceptual measures vary across the reproduction area and how suitable sound field synthesis is to evaluate them. To investigate the viability of listening experiments and provide a database for modelling approaches, measurements of binaural cues were carried out in the Simulated Open Field Environment loudspeaker array. Results show that the binaural cues are reproduced well close to the center, but exhibit more variance than in the corresponding free field case. Off center, lower interaural coherence is observed, which can affect binaural unmasking and speech intelligibility. In this work, we study binaural cues and speech reception thresholds over a wide area in the loudspeaker array to investigate the feasibility of psychoacoustic experiments involving speech understanding.
APA, Harvard, Vancouver, ISO, and other styles
9

Loy, D. Gareth. "The Systems Concepts Digital Synthesizer: An Architectural Retrospective." Computer Music Journal 37, no. 3 (September 2013): 49–67. http://dx.doi.org/10.1162/comj_a_00193.

Full text
Abstract:
In the mid 1970s, specialized hardware for synthesizing digital audio helped computer music research move beyond its early reliance on software synthesis running on slow mainframe computers. This hardware allowed for synthesis of complex musical scores in real time and for dynamic, interactive control of synthesis. Peter Samson developed one such device, the Systems Concepts Digital Synthesizer, for Stanford University's Center for Computer Research in Music and Acoustics. The “Samson Box” addressed the classical problems of digital audio synthesis with an elegance that still rewards study. This article thoroughly examines the principles underlying the Box's design—while considering how it was actually employed by its users—and describes the architecture's advantages and disadvantages. An interview with Samson is included.
APA, Harvard, Vancouver, ISO, and other styles
10

Bessell, David. "Dynamic Convolution Modeling, a Hybrid Synthesis Strategy." Computer Music Journal 37, no. 1 (March 2013): 44–51. http://dx.doi.org/10.1162/comj_a_00159.

Full text
Abstract:
This article outlines a hybrid approach to the synthesis of percussion sounds. The synthesis method described here combines techniques and concepts from physical modeling and convolution to produce audio synthesis of percussive instruments. This synthesis method not only achieves a high degree of realism in comparison with audio samples but also retains some of the flexibility associated with waveguide physical models. When the results are analyzed, the method exhibits some interesting detailed spectral features that have some aspects in common with the behavior of acoustic percussion instruments. In addition to outlining the synthesis process, the article discusses some of the more creative possibilities inherent in this approach, e.g., the use and free combination of excitation and resonance sources from beyond the realms of the purely percussive examples given.
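The core operation this article builds on, convolving an excitation signal with a resonance response, can be sketched in a few lines. The exponentially decaying noise burst and damped-sine "resonance" below are stand-ins chosen for illustration, not the hybrid dynamic model described in the article.

```python
# Illustrative excitation-resonance convolution for a percussive tone.
# The noise-burst excitation and damped-sine resonance are stand-in signals,
# not the dynamic convolution model described in the article.
import numpy as np
from scipy.signal import fftconvolve

sr = 44100
t_exc = np.arange(int(0.01 * sr)) / sr                     # 10 ms excitation
excitation = np.random.randn(t_exc.size) * np.exp(-t_exc / 0.002)

t_res = np.arange(int(1.0 * sr)) / sr                      # 1 s resonance
modes = [(180.0, 1.0, 0.4), (411.0, 0.6, 0.25), (722.0, 0.3, 0.15)]  # (Hz, amp, decay s)
resonance = sum(a * np.sin(2 * np.pi * f * t_res) * np.exp(-t_res / d)
                for f, a, d in modes)

tone = fftconvolve(excitation, resonance)
tone /= np.max(np.abs(tone))                               # normalise to [-1, 1]
```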
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Audio synthesis"

1

Chemla Romeu Santos, Axel Claude André. "Manifold Representations of Musical Signals and Generative Spaces." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/700444.

Full text
Abstract:
Among the diverse research fields within computer music, the synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving with both musical practices and scientific and technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms has earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. In particular, a family of machine learning methods called generative models focuses on the generation of original content using features extracted from an existing dataset. Such methods question not only previous approaches to generation, but also the way of integrating these methods into existing creative processes. While these new generative frameworks are progressively being introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work inserts itself into both practices and what can be expected from their combination. Subsequently, we focus more closely on generative models, and on how modern advances in the domain can be exploited to learn complex sound distributions while remaining sufficiently flexible to be integrated into the creative flow of the user. We then propose an inference/generation process, mirroring the analysis/synthesis paradigms that are natural in the audio processing domain, using latent models based on a continuous higher-level space that we use to control the generation. We first provide preliminary results of our method applied to spectral information extracted from several datasets, and evaluate the obtained results both qualitatively and quantitatively. Subsequently, we study how to make these methods more suitable for learning audio data, tackling three different aspects in turn. First, we propose two latent regularization strategies specifically designed for audio, based on signal/symbol translation and on perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, which allow the obtained generative spaces to also model the dynamics of the signal. In a last chapter, we shift from a purely scientific approach to a more research-and-creation-oriented point of view: first, we describe the architecture and design of our open-source library, vsacids, which aims to be usable by expert and non-expert music makers as an integrated creation tool. Then, we propose a first musical use of our system through the creation of a real-time performance, called ægo, based jointly on our vsacids framework and on an exploratory agent trained with reinforcement learning during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as on possible further creative applications.
APA, Harvard, Vancouver, ISO, and other styles
2

Lundberg, Anton. "Data-Driven Procedural Audio : Procedural Engine Sounds Using Neural Audio Synthesis." Thesis, KTH, Datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280132.

Full text
Abstract:
The currently dominating approach for rendering audio content in interactive media, such as video games and virtual reality, involves playback of static audio files. This approach is inflexible and requires management of large quantities of audio data. An alternative approach is procedural audio, where sound models are used to generate audio in real time from live inputs. While providing many advantages, procedural audio has yet to find widespread use in commercial productions, partly due to the audio produced by many of the proposed models not meeting industry standards. This thesis investigates how procedural audio can be performed using data-driven methods. We do this by specifically investigating how to generate the sound of car engines using neural audio synthesis. Building on a recently published method that integrates digital signal processing with deep learning, called Differentiable Digital Signal Processing (DDSP), our method obtains sound models by training deep neural networks to reconstruct recorded audio examples from interpretable latent features. We propose a method for incorporating engine-cycle phase information, as well as a differentiable transient synthesizer. Our results illustrate that DDSP can be used for procedural engine sounds; however, further work is needed before our models can generate engine sounds without undesired artifacts and before they can be used in live real-time applications. We argue that our approach can be useful for procedural audio in more general contexts, and discuss how our method can be applied to other sound sources.
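As a hedged illustration of the DDSP-style building block this thesis relies on, the sketch below renders a harmonic oscillator bank from frame-wise f0 and harmonic amplitudes by upsampling the controls and summing sinusoids; the frame rate, interpolation scheme, and control shapes are assumptions, not the thesis code.

```python
# Illustrative DDSP-style harmonic synthesizer: frame-wise controls -> audio.
# Frame rate, interpolation and shapes are assumptions, not the thesis implementation.
import numpy as np

def harmonic_synth(f0_frames, harm_amp_frames, sr=16000, hop=64):
    """
    f0_frames:        (n_frames,) fundamental frequency in Hz per frame.
    harm_amp_frames:  (n_frames, n_harmonics) per-harmonic amplitudes per frame.
    Returns mono audio of length n_frames * hop samples.
    """
    n_frames, n_harm = harm_amp_frames.shape
    n_samples = n_frames * hop
    # Upsample controls to audio rate by linear interpolation.
    frame_pos = np.arange(n_frames) * hop
    sample_pos = np.arange(n_samples)
    f0 = np.interp(sample_pos, frame_pos, f0_frames)
    amps = np.stack([np.interp(sample_pos, frame_pos, harm_amp_frames[:, k])
                     for k in range(n_harm)], axis=1)
    phase = 2 * np.pi * np.cumsum(f0) / sr                 # instantaneous phase
    k = np.arange(1, n_harm + 1)
    audio = np.sum(amps * np.sin(np.outer(phase, k)), axis=1)
    return audio / max(1e-9, np.max(np.abs(audio)))

# 1 s of a 110 Hz tone gliding to 220 Hz with 8 decaying harmonics.
frames = 250
f0 = np.linspace(110.0, 220.0, frames)
amps = np.outer(np.ones(frames), 1.0 / np.arange(1, 9))
y = harmonic_synth(f0, amps)
```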
APA, Harvard, Vancouver, ISO, and other styles
3

Elfitri, I. "Analysis by synthesis spatial audio coding." Thesis, University of Surrey, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.590657.

Full text
Abstract:
Spatial Audio Coding (SAC) is a technique used to encode multichannel audio signals by extracting the spatial parameters and downmixing the audio signals to a mono or stereo audio signal. Recently, various SAC techniques have been proposed to efficiently encode multichannel audio signals. However, all of them operate in open loop, where the encoder and decoder operate sequentially and independently and thus lack a mechanism for minimising the decoded audio reconstruction error. This thesis proposes a novel SAC technique that utilises a closed-loop system configuration, termed Analysis by Synthesis (AbS), in order to optimise the downmix signal and the spatial parameters so as to minimise the decoded signal error. In order to show the effect of the AbS optimisation, the Reverse One-To-Two (R-OTT) module used in MPEG Surround (MPS) must first be applied in the frequency domain to recalculate the downmix and residual signals based on the quantised spatial parameters; this shows that the AbS scheme can minimise the quantisation errors of the spatial parameters. As the full AbS is far too complicated to be applied in practice, a simplified AbS algorithm for finding sub-optimal parameters, based on the adapted R-OTT module, is also proposed. Subjective tests show that the proposed Analysis by Synthesis Spatial Audio Coding (AbS-SAC), encoding 5-channel audio signals at a bitrate of 51.2 kb/s per audio channel, achieves higher Subjective Difference Grade (SDG) scores than the tested Advanced Audio Coding (AAC) technique. Furthermore, the objective test also shows that the proposed AbS-SAC method, operating at bitrates of 40 to 96 kb/s per audio channel, significantly outperforms the tested AAC multichannel technique in terms of Objective Difference Grade (ODG) scores.
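To ground the One-To-Two / Reverse One-To-Two idea, here is a hedged sketch of a single R-OTT-style step: two channels are downmixed to one, and a channel level difference (CLD) and inter-channel correlation (ICC) are extracted so the pair can be approximately reconstructed at the decoder. Band splitting and quantisation are omitted and all names are illustrative; this is not the MPEG Surround specification or the thesis code.

```python
# Illustrative R-OTT-style downmix of two channels with broadband spatial parameters.
# Band splitting and quantisation are omitted; names and gain formulas are illustrative.
import numpy as np

def rott_encode(ch1, ch2, eps=1e-12):
    """Downmix two channels and extract CLD (dB) and ICC for one (broadband) band."""
    downmix = 0.5 * (ch1 + ch2)
    e1, e2 = np.sum(ch1**2) + eps, np.sum(ch2**2) + eps
    cld_db = 10.0 * np.log10(e1 / e2)                      # channel level difference
    icc = np.sum(ch1 * ch2) / np.sqrt(e1 * e2)             # inter-channel correlation
    return downmix, cld_db, icc

def ott_decode(downmix, cld_db):
    """Approximate upmix from the downmix using only the CLD (ICC ignored here)."""
    ratio = 10.0 ** (cld_db / 10.0)                        # e1 / e2
    g1 = np.sqrt(2.0 * ratio / (1.0 + ratio))
    g2 = np.sqrt(2.0 / (1.0 + ratio))
    return g1 * downmix, g2 * downmix

left = np.random.randn(1024) * 0.8
right = 0.5 * left + 0.3 * np.random.randn(1024)
dmx, cld, icc = rott_encode(left, right)
l_hat, r_hat = ott_decode(dmx, cld)
```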
APA, Harvard, Vancouver, ISO, and other styles
4

Wood, Steven Gregory. "Objective Test Methods for Waveguide Audio Synthesis." BYU ScholarsArchive, 2007. https://scholarsarchive.byu.edu/etd/853.

Full text
Abstract:
Acoustic Physical Modeling has emerged as a newer musical synthesis technique. The most common form of physical modeling synthesis in both industry and academia is digital waveguide synthesis. Commercially available for the past thirteen years, the top synthesizer manufacturers have chosen to include physical modeling synthesis in their top of the line models. In the area of audio quality testing, the most common tests have traditionally been group listening tests. While these tests are subjective and can be expensive and time-consuming, the results are validated by the groups' proper quality standards. Research has been conducted to evaluate objective testing procedures in order to find alternative methods for testing audio quality. This research has resulted in various standards approved by the International Telecommunication Union. Tests have proven the reliability of these objective test methods in the areas of telephony as well as various codecs, including MP3. The objective of this research is to determine whether objective test measurements can be used reliably in the area of acoustic physical modeling synthesis, specifically digital waveguide synthesis. Both the Perceptual Audio Quality Measure (PAQM) and Noise-To-Mask Ratio (NMR) objective tests will be performed on the Karplus-Strong algorithm form of Digital Waveguide synthesis. A corresponding listening test based on the Mean Opinion Score (MOS) will also be conducted, and the results from the objective and subjective tests will be compared. The results will show that more research and work needs to be done in this area, as neither the PAQM nor NMR algorithms sufficiently matched the output of the subjective listening tests. Recommendations will be made for future work.
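Since the objective and subjective tests described here are run on the Karplus-Strong form of digital waveguide synthesis, a minimal reference implementation helps fix ideas. The sampling rate, decay handling, and two-point averaging filter below follow the textbook formulation and are chosen for illustration rather than taken from the thesis.

```python
# Textbook Karplus-Strong plucked-string synthesis (illustrative, not the thesis code).
import numpy as np

def karplus_strong(freq, duration, sr=44100, decay=0.996):
    delay_len = int(round(sr / freq))                  # waveguide / delay-line length
    buf = np.random.uniform(-1.0, 1.0, delay_len)      # noise-burst excitation
    out = np.empty(int(duration * sr))
    for n in range(out.size):
        out[n] = buf[0]
        # Two-point averaging lowpass in the feedback loop gives the string its decay.
        new_sample = decay * 0.5 * (buf[0] + buf[1 % delay_len])
        buf = np.roll(buf, -1)
        buf[-1] = new_sample
    return out

pluck = karplus_strong(440.0, 1.0)                     # 1 s of an A4 pluck
```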
APA, Harvard, Vancouver, ISO, and other styles
5

Ustun, Selen. "Audio browsing of automaton-based hypertext." Thesis, Texas A&M University, 2003. http://hdl.handle.net/1969.1/33.

Full text
Abstract:
With the widespread adoption of hypermedia systems, and the World Wide Web (WWW) in particular, these systems have evolved from simple systems with only textual content to those that incorporate a large content base consisting of a wide variety of document types. With the increase in the number of users, there has also grown a need for these systems to be accessible to a wider range of users. Consequently, the growth of the systems, along with the number and variety of users, requires new presentation and navigation mechanisms for a wider audience. One of the new presentation methods is the audio-only presentation of hypertext content, and this research proposes a novel solution to this problem for complex and dynamic systems. The hypothesis is that the proposed Audio Browser is an efficient tool for presenting hypertext in audio format, which will prove useful for several applications including browsers for visually impaired and remote users. The Audio Browser provides audio-only browsing of content in a Petri-net-based hypertext system called Context-Aware Trellis (caT). It uses a combination of synthesized and pre-recorded speech to let its user listen to the contents of documents, follow links, and get information about the navigation process. It also has mechanisms for navigating within documents so that users can view content more quickly.
APA, Harvard, Vancouver, ISO, and other styles
6

Jehan, Tristan 1974. "Perceptual synthesis engine : an audio-driven timbre generator." Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/61543.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2001.
Includes bibliographical references (leaves 68-75).
A real-time synthesis engine which models and predicts the timbre of acoustic instruments based on perceptual features extracted from an audio stream is presented. The thesis describes the modeling sequence including the analysis of natural sounds, the inference step that finds the mapping between control and output parameters, the timbre prediction step, and the sound synthesis. The system enables applications such as cross-synthesis, pitch shifting or compression of acoustic instruments, and timbre morphing between instrument families. It is fully implemented in the Max/MSP environment. The Perceptual Synthesis Engine was developed for the Hyperviolin as a novel, generic and perceptually meaningful synthesis technique for non-discretely pitched instruments.
by Tristan Jehan.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
7

Payne, R. G. "Digital techniques for the analysis and synthesis of audio signals." Thesis, Bucks New University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.234706.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Mémoire, Université de Sherbrooke, 2000. http://savoirs.usherbrooke.ca/handle/11143/1078.

Full text
Abstract:
The objectives of our research fall under two main points: 1) explore parametric coding techniques with sinusoidal synthesis and apply them to audio signals (mainly music); 2) improve the intrinsic quality of these models, in particular with respect to the time/frequency trade-offs inherent in transform coding. As methodology, we carried out simulations in C and in MATLAB of recent sinusoidal synthesis algorithms, drawing in particular on the MSLPC (Multisinusoid LPC) coder of Wen-Whei C., De-Yu W. and Li-Wei W. of the National Chiao-Tung University of Taiwan (5). This thesis contains four chapters. Chapter 1 presents an introduction and sets the context. Chapter 2 gives an overview of parametric coding and of the interest of this technique, followed by a presentation of the existing types of parametric coders. Chapter 3 is devoted to describing the different steps in the design of a sinusoidal synthesis coder using recently developed methods. Chapter 4 presents the design and rigorous implementation of the model, as well as our proposed time/frequency trade-off for improving the intrinsic quality of the sinusoidal coder; this chapter also presents an informal evaluation of the performance of our model. We end the thesis with a conclusion.
APA, Harvard, Vancouver, ISO, and other styles
9

Coulibaly, Patrice Yefoungnigui. "Codage audio à bas débit avec synthèse sinusoïdale." Sherbrooke : Université de Sherbrooke, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Andreux, Mathieu. "Foveal autoregressive neural time-series modeling." Electronic Thesis or Diss., Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE073.

Full text
Abstract:
This dissertation studies unsupervised time-series modelling. We first focus on the problem of linearly predicting future values of a time-series under the assumption of long-range dependencies, which requires taking into account a large past. We introduce a family of causal and foveal wavelets which project past values on a subspace adapted to the problem, thereby reducing the variance of the associated estimators. We then investigate under which conditions non-linear predictors exhibit better performance than linear ones. Time-series which admit a sparse time-frequency representation, such as audio ones, satisfy those requirements, and we propose a prediction algorithm using such a representation. The last problem we tackle is audio time-series synthesis. We propose a new generation method relying on a deep convolutional neural network, with an encoder-decoder architecture, which makes it possible to synthesize new realistic signals. Contrary to state-of-the-art methods, we explicitly use the time-frequency properties of sounds to define an encoder with the scattering transform, while the decoder is trained to solve an inverse problem in an adapted metric.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Audio synthesis"

1

Kunow, Kristian. Rundfunk und Internet: These, Antithese, Synthese? Edited by Arbeitsgemeinschaft der Landesmedienanstalten in der Bundesrepublik Deutschland. Berlin: Vistas, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Audio system for technical readings. Berlin: Springer, 1998.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kamajou, François, and Cameroon Ministry of Scientific Research, eds. Audit scientifique de la recherche agricole au Cameroun: Synthèse de l'audit, rapport général. [Yaoundé]: République du Cameroun, Ministère de la recherche scientifique et technique, 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Hornegger, Joachim, ed. Pattern Recognition and Image Processing in C++. Wiesbaden: Vieweg, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Audio Engineering Society. International Conference. Virtual, synthetic and entertainment audio: The proceedings of the AES 22nd international conference 2002, June 15-17, Espoo, Finland. New York: AES, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Nakagawa, Seiichi. Speech, hearing and neural network models. Tokyo: Ohmsha, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Junqua, Jean-Claude. Robustness in automatic speech recognition: Fundamentals and applications. Boston: Kluwer Academic Publishers, 1996.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Beasley, Rick, ed. Voice Application Development with VoiceXML. Indianapolis, Ind.: Sams, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

IEEE/RSJ International Conference on Intelligent Robots and Systems (2001 Maui, Hawaii). Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001): Expanding the societal role of robotics in the next millennium, October 29-November 3, 2001, Outrigger Wailea Resort, Maui, Hawaii, USA. Piscataway, N.J: IEEE, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

IEEE Workshop on Automatic Speech Recognition and Understanding (1997 Santa Barbara, Calif.). 1997 IEEE Workshop on Automatic Speech Recognition and Understanding proceedings. Piscataway, NJ: Published under the sponsorship of the IEEE Signal Processing Society, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Audio synthesis"

1

Weik, Martin H. "audio synthesis." In Computer Science and Communications Dictionary, 77. Boston, MA: Springer US, 2000. http://dx.doi.org/10.1007/1-4020-0613-6_1022.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Tarr, Eric. "Introduction to Signal Synthesis." In Hack Audio, 79–101. Audio Engineering Society Presents series. New York, NY: Routledge, 2018. http://dx.doi.org/10.4324/9781351018463-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Bode, Peer D. "Analog Audio Synthesis." In The Routledge Companion to Media Technology and Obsolescence, 148–63. New York: Routledge/Taylor & Francis Group, 2018. http://dx.doi.org/10.4324/9781315442686-11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Van Every, Shawn. "Audio Synthesis and Analysis." In Pro Android Media, 179–93. Berkeley, CA: Apress, 2009. http://dx.doi.org/10.1007/978-1-4302-3268-1_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Jackson, Wallace. "The Synthesis of Digital Audio: Tone Generation." In Digital Audio Editing Fundamentals, 93–105. Berkeley, CA: Apress, 2015. http://dx.doi.org/10.1007/978-1-4842-1648-4_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Jackson, Wallace. "The History of Digital Audio: MIDI and Synthesis." In Digital Audio Editing Fundamentals, 11–17. Berkeley, CA: Apress, 2015. http://dx.doi.org/10.1007/978-1-4842-1648-4_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chen, Jiashu. "3D Audio and Virtual Acoustical Environment Synthesis." In Acoustic Signal Processing for Telecommunication, 283–301. Boston, MA: Springer US, 2000. http://dx.doi.org/10.1007/978-1-4419-8644-3_13.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Záviška, Pavel, Pavel Rajmic, Zdeněk Průša, and Vítězslav Veselý. "Revisiting Synthesis Model in Sparse Audio Declipper." In Latent Variable Analysis and Signal Separation, 429–45. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-93764-9_40.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Bernardes, Gilberto, and Diogo Cocharro. "Dynamic Music Generation, Audio Analysis-Synthesis Methods." In Encyclopedia of Computer Graphics and Games, 1–4. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-319-08234-9_211-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Huzaifah, Muhammad, and Lonce Wyse. "Deep Generative Models for Musical Audio Synthesis." In Handbook of Artificial Intelligence for Music, 639–78. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72116-9_22.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Audio synthesis"

1

Huang, Mincong (Jerry), Samuel Chabot, and Jonas Braasch. "Panoptic Reconstruction of Immersive Virtual Soundscapes Using Human-Scale Panoramic Imagery with Visual Recognition." In ICAD 2021: The 26th International Conference on Auditory Display. icad.org: International Community for Auditory Display, 2021. http://dx.doi.org/10.21785/icad2021.043.

Full text
Abstract:
This work, situated at Rensselaer’s Collaborative-Research Augmented Immersive Virtual Environment Laboratory (CRAIVE-Lab), uses panoramic image datasets for spatial audio display. A system is developed for the room-centered immersive virtual reality facility to analyze panoramic images on a segment-by-segment basis, using pre-trained neural network models for semantic segmentation and object detection, thereby generating audio objects with respective spatial locations. These audio objects are then mapped with a series of synthetic and recorded audio datasets and populated within a spatial audio environment as virtual sound sources. The resulting audiovisual outcomes are then displayed using the facility’s human-scale panoramic display, as well as the 128-channel loudspeaker array for wave field synthesis (WFS). Performance evaluation indicates effectiveness for real-time enhancements, with potentials for large-scale expansion and rapid deployment in dynamic immersive virtual environments.
APA, Harvard, Vancouver, ISO, and other styles
2

Fox, K. Michael, Jeremy Stewart, and Rob Hamilton. "madBPM: Musical and Auditory Display for Biological Predictive Modeling." In The 23rd International Conference on Auditory Display. Arlington, Virginia: The International Community for Auditory Display, 2017. http://dx.doi.org/10.21785/icad2017.045.

Full text
Abstract:
The modeling of biological data can be carried out using structured sound and musical process in conjunction with integrated visualizations. With a future goal of improving the speed and accuracy of techniques currently in use for the production of synthetic high value chemicals through the greater understanding of data sets, the madBPM project couples real-time audio synthesis and visual rendering with a highly flexible data-ingestion engine. Each component of the madBPM system is modular, allowing for customization of audio, visual and data-based processing.
APA, Harvard, Vancouver, ISO, and other styles
3

Chatziioannou, Vasileios. "Digital synthesis of impact sounds." In the Audio Mostly 2015. New York, New York, USA: ACM Press, 2015. http://dx.doi.org/10.1145/2814895.2814908.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Lie Lu, Yi Mao, Liu Wenyin, and Hong-Jiang Zhang. "Audio restoration by constrained audio texture synthesis." In 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698). IEEE, 2003. http://dx.doi.org/10.1109/icme.2003.1221334.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Schimbinschi, Florin, Christian Walder, Sarah M. Erfani, and James Bailey. "SynthNet: Learning to Synthesize Music End-to-End." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/467.

Full text
Abstract:
We consider the problem of learning a mapping directly from annotated music to waveforms, bypassing traditional single note synthesis. We propose a specific architecture based on WaveNet, a convolutional autoregressive generative model designed for text to speech. We investigate the representations learned by these models on music and conclude that mappings between musical notes and the instrument timbre can be learned directly from the raw audio coupled with the musical score, in binary piano roll format. Our model requires minimal training data (9 minutes), is substantially better in quality and converges 6 times faster in comparison to strong baselines in the form of powerful text to speech models. The quality of the generated waveforms (generation accuracy) is sufficiently high that they are almost identical to the ground truth. Our evaluations are based on both the RMSE of the Constant-Q transform, and mean opinion scores from human subjects. We validate our work using 7 distinct synthetic instrument timbres, real cello music and also provide visualizations and links to all generated audio.
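One of the objective measures mentioned, the RMSE of the Constant-Q transform between generated and reference audio, can be sketched as follows; the hop size, bin count, and use of librosa are assumptions made for illustration, not the paper's exact evaluation script.

```python
# Illustrative CQT-RMSE between a generated and a reference waveform.
# Hop size, bin count and the use of librosa are assumptions, not the paper's script.
import numpy as np
import librosa

def cqt_rmse(reference, generated, sr=16000, hop_length=256, n_bins=84):
    n = min(len(reference), len(generated))
    cqt_ref = np.abs(librosa.cqt(reference[:n], sr=sr, hop_length=hop_length, n_bins=n_bins))
    cqt_gen = np.abs(librosa.cqt(generated[:n], sr=sr, hop_length=hop_length, n_bins=n_bins))
    return float(np.sqrt(np.mean((cqt_ref - cqt_gen) ** 2)))

sr = 16000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 220.0 * t)
generated = np.sin(2 * np.pi * 220.0 * t + 0.05)       # slightly out of phase
print(cqt_rmse(reference, generated, sr=sr))
```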
APA, Harvard, Vancouver, ISO, and other styles
6

Zhu, Hao, Huaibo Huang, Yi Li, Aihua Zheng, and Ran He. "Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/327.

Full text
Abstract:
Talking face generation aims to synthesize a face video with precise lip synchronization as well as a smooth transition of facial motion over the entire video via the given speech clip and facial image. Most existing methods mainly focus on either disentangling the information in a single image or learning temporal information between frames. However, cross-modality coherence between audio and video information has not been well addressed during synthesis. In this paper, we propose a novel arbitrary talking face generation framework by discovering the audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE). In addition, we propose a Dynamic Attention (DA) block by selectively focusing the lip area of the input image during the training stage, to further enhance lip synchronization. Experimental results on benchmark LRW dataset and GRID dataset transcend the state-of-the-art methods on prevalent metrics with robust high-resolution synthesizing on gender and pose variations.
APA, Harvard, Vancouver, ISO, and other styles
7

Skinner, Martha. "Audio and Video Drawings Mapping Temporality." In ACADIA 2006: Synthetic Landscapes. ACADIA, 2006. http://dx.doi.org/10.52842/conf.acadia.2006.178.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Ye, Zhenhui, Zhou Zhao, Yi Ren, and Fei Wu. "SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/620.

Full text
Abstract:
The recent progress in non-autoregressive text-to-speech (NAR-TTS) has made fast and high-quality speech synthesis possible. However, current NAR-TTS models usually use phoneme sequence as input and thus cannot understand the tree-structured syntactic information of the input sequence, which hurts the prosody modeling. To this end, we propose SyntaSpeech, a syntax-aware and light-weight NAR-TTS model, which integrates tree-structured syntactic information into the prosody modeling modules in PortaSpeech. Specifically, 1) We build a syntactic graph based on the dependency tree of the input sentence, then process the text encoding with a syntactic graph encoder to extract the syntactic information. 2) We incorporate the extracted syntactic encoding with PortaSpeech to improve the prosody prediction. 3) We introduce a multi-length discriminator to replace the flow-based post-net in PortaSpeech, which simplifies the training pipeline and improves the inference speed, while keeping the naturalness of the generated audio. Experiments on three datasets not only show that the tree-structured syntactic information grants SyntaSpeech the ability to synthesize better audio with expressive prosody, but also demonstrate the generalization ability of SyntaSpeech to adapt to multiple languages and multi-speaker text-to-speech. Ablation studies demonstrate the necessity of each component in SyntaSpeech. Source code and audio samples are available at https://syntaspeech.github.io.
APA, Harvard, Vancouver, ISO, and other styles
9

Moore, Carl, and William Brent. "Interactive Real-time Concatenative Synthesis in Virtual Reality." In ICAD 2019: The 25th International Conference on Auditory Display. Newcastle upon Tyne, United Kingdom: Department of Computer and Information Sciences, Northumbria University, 2019. http://dx.doi.org/10.21785/icad2019.068.

Full text
Abstract:
This paper presents a new platform for interactive concatenative synthesis designed for virtual reality and proposes further applications for immersive audio tools and instruments. TimbreSpace VR is an extension of William Brent’s TimbreSpace software using the timbreID library for Pure Data. Design and implementation of the application are discussed, as well as its live performance aspects. Finally, future work is laid out for the project, proposing versatile audio manipulation software specifically for XR platforms.
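The sketch below shows the basic loop behind concatenative synthesis in general, not the TimbreSpace VR or timbreID code: segment a corpus into short grains, describe each grain by a simple timbre feature (here, spectral centroid), and resynthesize a target by concatenating the nearest-matching corpus grains. The sample rate, grain size, and feature choice are assumptions for illustration only.

import numpy as np

SR, GRAIN = 44100, 1024

def grains(signal, size=GRAIN):
    """Split a 1-D signal into non-overlapping grains, dropping the remainder."""
    n = len(signal) // size
    return signal[: n * size].reshape(n, size)

def spectral_centroid(grain, sr=SR):
    """Magnitude-weighted mean frequency of a grain's spectrum."""
    spectrum = np.abs(np.fft.rfft(grain))
    freqs = np.fft.rfftfreq(len(grain), d=1.0 / sr)
    return (freqs * spectrum).sum() / (spectrum.sum() + 1e-9)

def concatenative_resynthesis(target, corpus):
    """Rebuild the target grain by grain from the closest-sounding corpus grains."""
    corpus_grains = grains(corpus)
    corpus_feats = np.array([spectral_centroid(g) for g in corpus_grains])
    out = []
    for g in grains(target):
        idx = np.argmin(np.abs(corpus_feats - spectral_centroid(g)))
        out.append(corpus_grains[idx])
    return np.concatenate(out)

# Toy usage with synthetic signals standing in for recorded audio:
t = np.linspace(0, 1, SR, endpoint=False)
corpus = np.sin(2 * np.pi * 220 * t) + 0.3 * np.random.randn(SR)
target = np.sin(2 * np.pi * 440 * t)
rendered = concatenative_resynthesis(target, corpus)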
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Audio synthesi"

1

Murad, M. Hassan, Stephanie M. Chang, Celia Fiordalisi, Jennifer S. Lin, Timothy J. Wilt, Amy Tsou, Brian Leas, et al. Improving the Utility of Evidence Synthesis for Decision Makers in the Face of Insufficient Evidence. Agency for Healthcare Research and Quality (AHRQ), April 2021. http://dx.doi.org/10.23970/ahrqepcwhitepaperimproving.

Full text
Abstract:
Background: Healthcare decision makers strive to operate on the best available evidence. The Agency for Healthcare Research and Quality Evidence-based Practice Center (EPC) Program aims to support healthcare decision makers by producing evidence reviews that rate the strength of evidence. However, the evidence base is often sparse or heterogeneous, or otherwise results in a high degree of uncertainty and insufficient evidence ratings. Objective: To identify and suggest strategies to make insufficient ratings in systematic reviews more actionable. Methods: A workgroup comprising EPC Program members convened throughout 2020. We conducted iterative discussions considering information from three data sources: a literature review for relevant publications and frameworks, a review of a convenience sample of past systematic reviews conducted by the EPCs, and an audit of methods used in past EPC technical briefs. Results: Several themes emerged across the literature review, review of systematic reviews, and review of technical brief methods. In the purposive sample of 43 systematic reviews, the use of the term “insufficient” covered both instances of no evidence and instances of evidence being present but insufficient to estimate an effect. The results of the literature review and review of the EPC Program systematic reviews illustrated the importance of clearly stating the reasons for insufficient evidence. Results of both the literature review and review of systematic reviews highlighted the factors decision makers consider when making decisions when evidence of benefits or harms is insufficient, such as costs, values, preferences, and equity. We identified five strategies for supplementing systematic review findings when evidence on benefit or harms is expected to be or found to be insufficient, including: reconsidering eligible study designs, summarizing indirect evidence, summarizing contextual and implementation evidence, modelling, and incorporating unpublished health system data. Conclusion: Throughout early scoping, protocol development, review conduct, and review presentation, authors should consider five possible strategies to supplement potential insufficient findings of benefit or harms. When there is no evidence available for a specific outcome, reviewers should use a statement such as “no studies” instead of “insufficient.” The main reasons for insufficient evidence rating should be explicitly described.
APA, Harvard, Vancouver, ISO, and other styles
2

Kiianovska, N. M. The development of theory and methods of using cloud-based information and communication technologies in teaching mathematics of engineering students in the United States. Видавничий центр ДВНЗ «Криворізький національний університет», December 2014. http://dx.doi.org/10.31812/0564/1094.

Full text
Abstract:
The purpose of the study is to analyze the development of the theory and methods of ICT usage in teaching higher mathematics to engineering students in the United States. The following tasks were defined: to analyze the sources of the problem, to identify the state of its elaboration, and to identify key trends in the development of the theory and methods of ICT usage in teaching higher mathematics to engineering students in the United States. The object of study is the use of ICT in teaching engineering students. The research methods are: analysis of scientific, educational, technical and historical sources; systematization and classification of scientific statements on the study; specification, comparison, analysis and synthesis; and historical and pedagogical analysis of the sources to establish the chronological limits of ICT usage and its implementation in the educational practice of U.S. technical colleges. The article reviews modern ICT tools used in teaching fundamental subjects to future engineers in the United States and shows the evolution and convergence of ICT learning tools. The experience of «best practices» in using online ICT in higher engineering education in the United States is discussed. Some of these tools are static, while others are interactive or dynamic, giving mathematics learners opportunities to develop visualization skills, explore mathematical concepts, and obtain solutions to self-selected problems. Among the ICT tools are the following: tools to transmit audio and video data, tools to collaborate on projects, and tools to support object-oriented practice. The analysis leads to the following conclusion: using cloud-based tools for learning mathematics has become the leading trend today. Therefore, university professors widely consider implementing tools that lend the process of learning mathematics such properties as mobility, continuity and adaptability.
APA, Harvard, Vancouver, ISO, and other styles
3

Baluk, Nadia, Natalia Basij, Larysa Buk, and Olha Vovchanska. VR/AR-TECHNOLOGIES – NEW CONTENT OF THE NEW MEDIA. Ivan Franko National University of Lviv, February 2021. http://dx.doi.org/10.30970/vjo.2021.49.11074.

Full text
Abstract:
The article analyzes the peculiarities of how media content is shaped and transformed in the convergent dimension of cross-media, taking into account the possibilities of augmented reality. Guided by the principles of objectivity, complexity and reliability in scientific research, a number of general scientific and special methods are used: analysis, synthesis, generalization, monitoring, observation, and problem-thematic, typological and discursive methods. According to the form of information presentation, such types of media content as visual, audio, verbal and combined are defined and characterized. The most important in journalism is verbal content; it carries the main information load. The dynamic development of converged media leads to the dominance of image and video content, and the likelihood that text becomes secondary content increases. Given the market situation, an effective information product is combined content that unites text with images, spreadsheets with video, animation with infographics, etc. An increasing number of new media use applications and website platforms to interact with recipients. The article then determines the peculiarities of the new content of new media that involves augmented reality. Examples of successful interactive communication between recipients, leading news agencies and commercial structures are provided. The conditions for the effective use of VR/AR technologies in the media content of new media and for the involvement of viewers in changing stories with augmented reality are determined. The so-called immersive effect of VR/AR technologies involves the complete immersion of the interested audience in the essence of the event being relayed. This interaction can be achieved through different types of VR video interactivity. One of the most important results of using VR content is the spatio-temporal and emotional immersion of viewers in the plot. The recipient turns from an external observer into an internal one, but their constant participation requires that user preferences be taken into account. Factors such as satisfaction, positive reinforcement, empathy, and value influence viewers' choice of VR/AR content.
APA, Harvard, Vancouver, ISO, and other styles
