Selected scientific literature on the topic "Generative audio models"

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles


Consult the list of current articles, books, theses, conference proceedings, and other scholarly sources relevant to the topic "Generative audio models".

Next to each source in the reference list there is an "Add to bibliography" button. Click it and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read the abstract of the work online, when it is present in the metadata.

Journal articles on the topic "Generative audio models"

1

Evans, Zach, Scott H. Hawley, and Katherine Crowson. "Musical audio samples generated from joint text embeddings." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A178. http://dx.doi.org/10.1121/10.0015956.

Abstract:
The field of machine learning has benefited from the appearance of diffusion-based generative models for images and audio. While text-to-image models have become increasingly prevalent, text-to-audio generative models are currently an active area of research. We present work on generating short samples of musical instrument sounds with a model conditioned on text descriptions and the file-structure labels of large sample libraries. Preliminary findings indicate that the generation of wide-spectrum sounds such as percussion is not difficult, while the generation of harmonic musical sounds presents challenges for audio diffusion models.
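As an editorial illustration of the kind of model this abstract describes, the sketch below shows one training step of a text-conditioned denoising diffusion model in PyTorch. It is a minimal sketch, not the authors' code: `model`, `text_encoder`, and the linear noise schedule are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def diffusion_training_step(model, text_encoder, audio, captions, num_steps=1000):
    """One epsilon-prediction step: corrupt audio at a random timestep and
    train the model to recover the injected noise, given a text embedding."""
    batch = audio.shape[0]
    t = torch.randint(0, num_steps, (batch,), device=audio.device)
    noise = torch.randn_like(audio)
    # Toy linear schedule for the cumulative signal level; real systems tune this.
    alpha_bar = 1.0 - (t.float() + 1.0) / num_steps
    alpha_bar = alpha_bar.view(batch, *([1] * (audio.dim() - 1)))
    noisy_audio = alpha_bar.sqrt() * audio + (1.0 - alpha_bar).sqrt() * noise
    cond = text_encoder(captions)              # text description as conditioning signal
    predicted_noise = model(noisy_audio, t, cond)
    return F.mse_loss(predicted_noise, noise)  # standard denoising objective
```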
2

Wang, Heng, Jianbo Ma, Santiago Pascual, Richard Cartwright, and Weidong Cai. "V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15492–501. http://dx.doi.org/10.1609/aaai.v38i14.29475.

Abstract:
Building artificial intelligence (AI) systems on top of a set of foundation models (FMs) is becoming a new paradigm in AI research. Their representative and generative abilities learnt from vast amounts of data can be easily adapted and transferred to a wide range of downstream tasks without extra training from scratch. However, leveraging FMs in cross-modal generation remains under-researched when audio modality is involved. On the other hand, automatically generating semantically-relevant sound from visual input is an important problem in cross-modal generation studies. To solve this vision-to-audio (V2A) generation problem, existing methods tend to design and build complex systems from scratch using modestly sized datasets. In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM. We first investigate the domain gap between the latent space of the visual CLIP and the auditory CLAP models. Then we propose a simple yet effective mapper mechanism (V2A-Mapper) to bridge the domain gap by translating the visual input between CLIP and CLAP spaces. Conditioned on the translated CLAP embedding, pretrained audio generative FM AudioLDM is adopted to produce high-fidelity and visually-aligned sound. Compared to previous approaches, our method only requires a quick training of the V2A-Mapper. We further analyze and conduct extensive experiments on the choice of the V2A-Mapper and show that a generative mapper is better at fidelity and variability (FD) while a regression mapper is slightly better at relevance (CS). Both objective and subjective evaluation on two V2A datasets demonstrate the superiority of our proposed method compared to current state-of-the-art approaches - trained with 86% fewer parameters but achieving 53% and 19% improvement in FD and CS, respectively. Supplementary materials such as audio samples are provided at our demo website: https://v2a-mapper.github.io/.
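The core mechanism here, translating between two frozen embedding spaces with a small trainable module, can be illustrated compactly. Below is a hedged sketch, not the paper's implementation: the dimensions and the regression objective are assumptions (the paper also studies a generative variant of the mapper).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class V2AMapper(nn.Module):
    """Small MLP that maps a CLIP image embedding into the CLAP audio space."""
    def __init__(self, clip_dim=512, clap_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(clip_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, clap_dim),
        )

    def forward(self, clip_embedding):
        return self.net(clip_embedding)

mapper = V2AMapper()
optimizer = torch.optim.Adam(mapper.parameters(), lr=1e-4)
clip_emb = torch.randn(8, 512)  # placeholder for CLIP(image); frozen in practice
clap_emb = torch.randn(8, 512)  # placeholder for CLAP(audio) of the paired clip
optimizer.zero_grad()
loss = F.mse_loss(mapper(clip_emb), clap_emb)  # regression variant of the mapper
loss.backward()
optimizer.step()
# At inference, the mapped embedding would condition a pretrained audio
# generator such as AudioLDM to produce a sound matching the input image.
```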
3

Sakirin, Tam, and Siddartha Kusuma. "A Survey of Generative Artificial Intelligence Techniques." Babylonian Journal of Artificial Intelligence 2023 (March 10, 2023): 10–14. http://dx.doi.org/10.58496/bjai/2023/003.

Abstract:
Generative artificial intelligence (AI) refers to algorithms capable of creating novel, realistic digital content autonomously. Recently, generative models have attained groundbreaking results in domains like image and audio synthesis, spurring vast interest in the field. This paper surveys the landscape of modern techniques powering the rise of creative AI systems. We structurally examine predominant algorithmic approaches including generative adversarial networks (GANs), variational autoencoders (VAEs), and autoregressive models. Architectural innovations and illustrations of generated outputs are highlighted for major models under each category. We give special attention to generative techniques for constructing realistic images, tracing rapid progress from early GAN samples to modern diffusion models like Stable Diffusion. The paper further reviews generative modeling to create convincing audio, video, and 3D renderings, which introduce critical challenges around fake media detection and data bias. Additionally, we discuss common datasets that have enabled advances in generative modeling. Finally, open questions around evaluation, technique blending, controlling model behaviors, commercial deployment, and ethical considerations are outlined as active areas for future work. This survey presents both long-standing and emerging techniques molding the state and trajectory of generative AI. The key goals are to overview major algorithm families, highlight innovations through example models, synthesize capabilities for multimedia generation, and discuss open problems around data, evaluation, control, and ethics.
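Since the survey centers on families like GANs, a minimal adversarial training step may help fix ideas. This is a generic textbook sketch, assuming a generator `G` and a discriminator `D` that outputs one logit per sample; it is not tied to any specific model in the survey.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, z_dim=128):
    ones = torch.ones(real.size(0), 1)
    zeros = torch.zeros(real.size(0), 1)
    z = torch.randn(real.size(0), z_dim)
    fake = G(z)
    # Discriminator update: score real samples as 1, generated samples as 0.
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update: push generated samples toward the "real" decision.
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```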
4

Broad, Terence, Frederic Fol Leymarie, and Mick Grierson. "Network Bending: Expressive Manipulation of Generative Models in Multiple Domains." Entropy 24, no. 1 (December 24, 2021): 28. http://dx.doi.org/10.3390/e24010028.

Abstract:
This paper presents the network bending framework, a new approach for manipulating and interacting with deep generative models. We present a comprehensive set of deterministic transformations that can be inserted as distinct layers into the computational graph of a trained generative neural network and applied during inference. In addition, we present a novel algorithm for analysing the deep generative model and clustering features based on their spatial activation maps. This allows features to be grouped together based on spatial similarity in an unsupervised fashion. This results in the meaningful manipulation of sets of features that correspond to the generation of a broad array of semantically significant features of the generated results. We outline this framework, demonstrating our results on deep generative models for both image and audio domains. We show how it allows for the direct manipulation of semantically meaningful aspects of the generative process as well as allowing for a broad range of expressive outcomes.
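The mechanism described, inserting a deterministic transformation into a trained network's computational graph at inference time, maps naturally onto PyTorch forward hooks. The toy network below is only a stand-in; in the paper's setting the hook would sit on an intermediate layer of a trained image or audio generator.

```python
import torch
import torch.nn as nn

# Stand-in for a trained generator; network bending leaves the weights untouched.
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))

def bend(module, inputs, output):
    # Any deterministic transform of the activations (scaling, thresholding,
    # rotation of spatial maps, ...). Returning a value replaces the output.
    return output * 1.5

handle = net[0].register_forward_hook(bend)  # "insert" the transform as a layer
bent = net(torch.randn(1, 8))                # inference now passes through the bend
handle.remove()                              # restore the original graph
```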
5

Aldausari, Nuha, Arcot Sowmya, Nadine Marcus, and Gelareh Mohammadi. "Video Generative Adversarial Networks: A Review." ACM Computing Surveys 55, no. 2 (March 31, 2023): 1–25. http://dx.doi.org/10.1145/3487891.

Abstract:
With the increasing interest in the content creation field in multiple sectors such as media, education, and entertainment, there is an increased trend in papers that use AI algorithms to generate content such as images, videos, audio, and text. Generative Adversarial Networks (GANs) are among the promising models that synthesize data samples similar to real data samples. While variations of GAN models in general have been covered to some extent in several survey papers, to the best of our knowledge, this is the first paper that reviews state-of-the-art video GAN models. This paper first categorizes GAN review papers into general GAN review papers, image GAN review papers, and special-field GAN review papers such as anomaly detection, medical imaging, or cybersecurity. The paper then summarizes the main improvements in GANs that were not necessarily applied in the video domain at first but have been adopted in multiple video GAN variations. Then, a comprehensive review of video GAN models is provided under two main divisions based on the existence of a condition. The conditional models are then further classified according to the provided condition into audio, text, video, and image. The paper concludes with the main challenges and limitations of the current video GAN models.
6

Shen, Qiwei, Junjie Xu, Jiahao Mei, Xingjiao Wu, and Daoguo Dong. "EmoStyle: Emotion-Aware Semantic Image Manipulation with Audio Guidance." Applied Sciences 14, no. 8 (April 10, 2024): 3193. http://dx.doi.org/10.3390/app14083193.

Abstract:
With the flourishing development of generative models, image manipulation is receiving increasing attention. Rather than text modality, several elegant designs have delved into leveraging audio to manipulate images. However, existing methodologies mainly focus on image generation conditional on semantic alignment, ignoring the vivid affective information depicted in the audio. We propose an Emotion-aware StyleGAN Manipulator (EmoStyle), a framework where affective information from audio can be explicitly extracted and further utilized during image manipulation. Specifically, we first leverage the multi-modality model ImageBind for initial cross-modal retrieval between images and music, and select the music-related image for further manipulation. Simultaneously, by extracting sentiment polarity from the lyrics of the audio, we generate an emotionally rich auxiliary music branch to accentuate the affective information. We then leverage pre-trained encoders to encode audio and the audio-related image into the same embedding space. With the aligned embeddings, we manipulate the image via a direct latent optimization method. We conduct objective and subjective evaluations on the generated images, and our results show that our framework is capable of generating images with specified human emotions conveyed in the audio.
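The "direct latent optimization" step is the most code-like part of this pipeline. The sketch below uses toy linear stand-ins for the StyleGAN synthesis network and the shared-space encoders, which are assumptions; it only demonstrates nudging a latent until the generated image's embedding aligns with the audio embedding.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the real pretrained components (illustrative only).
G_synthesis = torch.nn.Linear(512, 64)    # pretend StyleGAN synthesis network
image_encoder = torch.nn.Linear(64, 128)  # pretend image encoder in the shared space
audio_emb = torch.randn(1, 128)           # pretend embedding of the guiding audio

w = torch.randn(1, 512, requires_grad=True)  # latent code to manipulate
opt = torch.optim.Adam([w], lr=0.05)
for _ in range(200):
    img = G_synthesis(w)
    # Cosine distance between image and audio embeddings drives the edit.
    loss = 1.0 - F.cosine_similarity(image_encoder(img), audio_emb, dim=-1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
manipulated_latent = w.detach()
```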
7

Andreu, Sergi, and Monica Villanueva Aylagas. "Neural Synthesis of Sound Effects Using Flow-Based Deep Generative Models." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 18, no. 1 (October 11, 2022): 2–9. http://dx.doi.org/10.1609/aiide.v18i1.21941.

Abstract:
Creating variations of sound effects for video games is a time-consuming task that grows with the size and complexity of the games themselves. The process usually comprises recording source material and mixing different layers of sound to create sound effects that are perceived as diverse during gameplay. In this work, we present a method to generate controllable variations of sound effects that can be used in the creative process of sound designers. We adopt WaveFlow, a generative flow model that works directly on raw audio and has proven to perform well for speech synthesis. Using a lower-dimensional mel spectrogram as the conditioner allows both user controllability and a way for the network to generate more diversity. Additionally, it gives the model style transfer capabilities. We evaluate several models in terms of the quality and variability of the generated sounds using both quantitative and subjective evaluations. The results suggest that there is a trade-off between quality and diversity. Nevertheless, our method achieves a quality level similar to that of the training set while generating perceivable variations according to a perceptual study that includes game audio experts.
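A defining property of flow models like WaveFlow is the invertible mapping between audio and latent noise, so controllable variations fall out of re-sampling the latent under a fixed mel-spectrogram conditioner. The sketch below assumes a `flow_model` object with an `inverse` method and a hop size of 256; both are illustrative assumptions, not the paper's API.

```python
import torch

def sample_variations(flow_model, mel, n_variants=5, temperature=0.8):
    """Generate perceptually distinct variants of one sound effect by
    re-sampling latent noise while keeping the mel conditioner fixed."""
    n_samples = mel.shape[-1] * 256  # assumed hop size: 256 audio samples per frame
    variants = []
    for _ in range(n_variants):
        z = torch.randn(1, n_samples) * temperature  # lower temperature = safer variants
        variants.append(flow_model.inverse(z, conditioner=mel))
    return variants
```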
8

Lattner, Stefan, and Javier Nistal. "Stochastic Restoration of Heavily Compressed Musical Audio Using Generative Adversarial Networks." Electronics 10, no. 11 (June 5, 2021): 1349. http://dx.doi.org/10.3390/electronics10111349.

Abstract:
Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals with 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.
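The key design point is that the generator receives a random code alongside the compressed signal, so restoration becomes one-to-many. A minimal sketch of that interface, with the generator itself assumed:

```python
import torch

def restore(generator, compressed_audio, n_samples=3, z_dim=128):
    """Draw several plausible restorations of the same MP3-compressed input;
    a deterministic generator would return the same output every time."""
    restorations = []
    for _ in range(n_samples):
        z = torch.randn(compressed_audio.size(0), z_dim)  # stochastic code
        restorations.append(generator(compressed_audio, z))
    return restorations
```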
9

Yang, Junpeng, and Haoran Zhang. "Development and Challenges of Generative Artificial Intelligence in Education and Art." Highlights in Science, Engineering and Technology 85 (March 13, 2024): 1334–47. http://dx.doi.org/10.54097/vaeav407.

Abstract:
Thanks to the rapid development of generative deep learning models, Artificial Intelligence Generated Content (AIGC) has attracted more and more research attention in recent years, which aims to learn models from massive data to generate relevant content based on input conditions. Different from traditional single-modal generation tasks that focus on content generation for a particular modality, such as image generation, text generation, or semantic generation, AIGC trains a single model that can simultaneously understand language, images, videos, audio, and more. AIGC marks the transition from traditional decision-based artificial intelligence to generative artificial intelligence, which has been widely applied in various fields. Focusing on the key technologies and representative applications of AIGC, this paper identifies several key technical challenges and controversies in the field. These include defects in cross-modal and multimodal generation, issues related to model stability and data consistency, privacy concerns, and questions about whether advanced generative models like ChatGPT can be considered general artificial intelligence (AGI). While this dissertation provides valuable insights into the revolution and challenge of generative AI in art and education, it acknowledges the sensitivity of generated content and the ethical dilemmas it may pose, and ownership rights for AI-generated works and the need for new intellectual property norms are subjects of ongoing discussion. To address the current technical bottlenecks in cross-modal and multimodal generation, future research aims to quantitatively analyze and compare existing models, proposing practical optimization strategies. With the rapid advancement of generative AI, we anticipate a transition from user-generated content (UGC) to artificial intelligence-generated content (AIGC) and, ultimately, a new era of human-computer co-creation with strong interactive potential in the near future.
10

Choi, Ha-Yeong, Sang-Hoon Lee, and Seong-Whan Lee. "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 17862–70. http://dx.doi.org/10.1609/aaai.v38i16.29740.

Abstract:
Diffusion-based generative models have recently exhibited powerful generative performance. However, as many attributes exist in the data distribution and owing to several limitations of sharing the model parameters across all levels of the generation process, it remains challenging to control specific styles for each attribute. To address the above problem, we introduce decoupled denoising diffusion models (DDDMs) with disentangled representations, which can enable effective style transfers for each attribute in generative models. In particular, we apply DDDMs to voice conversion (VC) tasks, tackling the intricate challenge of disentangling and individually transferring each speech attribute, such as linguistic information, intonation, and timbre. First, we use a self-supervised representation to disentangle the speech representation. Subsequently, the DDDMs are applied to resynthesize the speech from the disentangled representations for style transfer with respect to each attribute. Moreover, we also propose the prior mixup for robust voice style transfer, which uses the converted representation of the mixed style as a prior distribution for the diffusion models. The experimental results reveal that our method outperforms publicly available VC models. Furthermore, we show that our method provides robust generative performance even when using a smaller model size. Audio samples are available at https://hayeong0.github.io/DDDM-VC-demo/.
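At a very high level, the decoupled pipeline reads as: disentangle speech into attribute representations, then resynthesize each attribute with its own denoiser, starting from the converted style as the prior. The sketch below is a rough structural outline under that reading; every component is a hypothetical placeholder, not the authors' interfaces.

```python
import torch

def convert_voice(disentangler, denoisers, source_speech, target_style):
    """disentangler: speech -> dict of attribute representations.
    denoisers: one diffusion denoiser per attribute name.
    target_style: dict of converted-style representations (the prior mixup idea)."""
    attributes = disentangler(source_speech)  # e.g. linguistic, intonation, timbre
    resynthesized = []
    for name, representation in attributes.items():
        prior = target_style.get(name, representation)  # converted style as the prior
        resynthesized.append(denoisers[name](representation, prior))
    return torch.cat(resynthesized, dim=-1)  # combine the attribute streams
```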

Theses / dissertations on the topic "Generative audio models"

1

Douwes, Constance. "On the Environmental Impact of Deep Generative Models for Audio". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS074.

Abstract:
In this thesis, we investigate the environmental impact of deep learning models for audio generation and we aim to put computational cost at the core of the evaluation process. In particular, we focus on different types of deep learning models specialized in raw waveform audio synthesis. These models are now a key component of modern audio systems, and their use has increased significantly in recent years. Their flexibility and generalization capabilities make them powerful tools in many contexts, from text-to-speech synthesis to unconditional audio generation. However, these benefits come at the cost of expensive training sessions on large amounts of data, operated on energy-intensive dedicated hardware, which incurs large greenhouse gas emissions. The measures we use as a scientific community to evaluate our work are at the heart of this problem. Currently, deep learning researchers evaluate their work primarily based on improvements in accuracy, log-likelihood, reconstruction, or opinion scores, all of which overshadow the computational cost of generative models. Therefore, we propose a new methodology based on Pareto optimality to help the community better evaluate the significance of their work while bringing the energy footprint, and ultimately carbon emissions, to the same level of interest as sound quality. In the first part of this thesis, we present a comprehensive report on the use of various evaluation measures of deep generative models for audio synthesis tasks. Even though computational efficiency is increasingly discussed, quality measurements are the most commonly used metrics to evaluate deep generative models, while energy consumption is almost never mentioned. Therefore, we address this issue by estimating the carbon cost of training generative models and comparing it to other noteworthy carbon costs to demonstrate that it is far from insignificant. In the second part of this thesis, we propose a large-scale evaluation of pervasive neural vocoders, a class of generative models used for speech generation conditioned on mel-spectrograms. We introduce a multi-objective analysis based on Pareto optimality of both quality from human-based evaluation and energy consumption. Within this framework, we show that lighter models can perform better than more costly models. By proposing to rely on a novel definition of efficiency, we intend to provide practitioners with a decision basis for choosing the best model based on their requirements. In the last part of the thesis, we propose a method to reduce the inference costs of neural vocoders, based on quantized neural networks. We show a significant gain in memory size and give some hints for the future use of these models on embedded hardware. Overall, we provide keys to better understand the impact of deep generative models for audio synthesis as well as a new framework for developing models while accounting for their environmental impact. We hope that this work raises awareness of the need to investigate energy-efficient models while maintaining high perceived audio quality.
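The multi-objective selection idea at the heart of the thesis is easy to make concrete: a model is kept only if no other model is at least as good on both sound quality and energy use, and strictly better on one. Below is a self-contained sketch with made-up numbers; for the inference-cost part, off-the-shelf tools such as PyTorch's dynamic quantization realize the same kind of trade-off (a slightly smaller, cheaper model for little loss in fidelity).

```python
def pareto_front(models):
    """models: list of (name, quality, energy); higher quality is better,
    lower energy is better. Returns the non-dominated subset."""
    front = []
    for name, quality, energy in models:
        dominated = any(
            q >= quality and e <= energy and (q > quality or e < energy)
            for _, q, e in models
        )
        if not dominated:
            front.append((name, quality, energy))
    return front

# Made-up scores: a light model can dominate a big one outright.
candidates = [("big-vocoder", 4.1, 90.0), ("light-vocoder", 4.2, 30.0),
              ("tiny-vocoder", 3.5, 10.0)]
print(pareto_front(candidates))  # big-vocoder is dominated and drops out
```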
2

Caillon, Antoine. "Hierarchical temporal learning for multi-instrument and orchestral audio synthesis". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS115.

Abstract:
Recent advances in deep learning have offered new ways to build models addressing a wide variety of tasks through the optimization of a set of parameters based on minimizing a cost function. Amongst these techniques, probabilistic generative models have yielded impressive advances in text, image and sound generation. However, musical audio signal generation remains a challenging problem. This comes from the complexity of audio signals themselves, since a single second of raw audio spans tens of thousands of individual samples. Modeling musical signals is even more challenging as important information is structured across different time scales, from micro (e.g. timbre, transients, phase) to macro (e.g. genre, tempo, structure) information. Modeling every scale at once would require large architectures, precluding the use of the resulting models in real-time setups for computational complexity reasons. In this thesis, we study how a hierarchical approach to audio modeling can address the musical signal modeling task, while offering different levels of control to the user. Our main hypothesis is that extracting different representation levels of an audio signal allows us to abstract the complexity of lower levels for each modeling stage. This would eventually allow the use of lightweight architectures, each modeling a single audio scale. We start by addressing raw audio modeling by proposing an audio model combining Variational Auto Encoders and Generative Adversarial Networks, yielding high-quality 48 kHz neural audio synthesis while being 20 times faster than real time on CPU. Then, we study how autoregressive models can be used to understand the temporal behavior of the representation yielded by this low-level audio model, using optional additional conditioning signals such as acoustic descriptors or tempo. Finally, we propose a method for using all the proposed models directly on audio streams, allowing their use in real-time applications that we developed during this thesis. We conclude by presenting various creative collaborations led in parallel with this work with several composers and musicians, directly integrating the current state of the proposed technologies inside musical pieces.
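The hierarchy described, a learned autoencoder that abstracts raw audio into a compact latent sequence plus an autoregressive prior over those latents, can be sketched as a two-stage generation loop. The `autoencoder` and `prior` objects and their methods are assumed interfaces for illustration, not the thesis's actual code.

```python
import torch

def generate(autoencoder, prior, n_latent_steps, conditioning=None):
    """Stage 2 samples a latent sequence autoregressively (cheap, low rate);
    stage 1's decoder then expands it back to a raw waveform."""
    context = []
    for _ in range(n_latent_steps):
        nxt = prior.sample_next(context, conditioning)  # e.g. descriptors or tempo
        context.append(nxt)
    z = torch.stack(context, dim=1)  # (batch, time, latent_dim)
    return autoencoder.decode(z)     # lightweight decode to raw audio
```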
3

Nishikimi, Ryo. "Generative, Discriminative, and Hybrid Approaches to Audio-to-Score Automatic Singing Transcription". Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263772.

4

Chemla-Romeu-Santos, Axel Claude André. "Manifold Representations of Musical Signals and Generative Spaces." Doctoral thesis, Università degli Studi di Milano, 2020. http://hdl.handle.net/2434/700444.

Abstract:
Among the diverse research fields within computer music, the synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing both scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving both with musical practices and with scientific and technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can also be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. In particular, a family of machine learning methods, called generative models, focuses on the generation of original content using features extracted from an existing dataset. In that case, such methods not only questioned previous approaches to generation, but also the way of integrating these methods into existing creative processes. While these new generative frameworks are progressively introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work inserts itself in both practices and what can be expected from their collation. Subsequently, we focus further on generative models, and on how modern advances in the domain can be exploited to learn complex sound distributions, while remaining sufficiently flexible to be integrated in the creative flow of the user. We then propose an inference / generation process, mirroring the analysis/synthesis paradigms that are natural in the audio processing domain, using latent models based on a continuous higher-level space that we use to control the generation. We first provide preliminary results of our method applied to spectral information, extracted from several datasets, and evaluate the obtained results both qualitatively and quantitatively. Subsequently, we study how to make these methods more suitable for learning audio data, successively tackling three different aspects. First, we propose two different latent regularization strategies specifically designed for audio, based on signal / symbol translation and on perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, which allow the obtained generative spaces to also model the dynamics of the signal. In the last chapter, we swap our scientific approach for a more research & creation-oriented point of view: first, we describe the architecture and the design of our open-source library, vsacids, intended to be used by expert and non-expert music makers as an integrated creation tool. Then, we propose a first musical use of our system through the creation of a real-time performance, called ægo, based jointly on our framework vsacids and an explorative agent using reinforcement learning, trained during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as on possible further creative applications.
5

Guenebaut, Boris. "Automatic Subtitle Generation for Sound in Videos". Thesis, University West, Department of Economics and IT, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:hv:diva-1784.

Abstract:
The last ten years have witnessed the emergence of all kinds of video content, and the appearance of websites dedicated to this phenomenon has increased the importance the public gives to it. At the same time, some individuals are deaf and cannot understand such videos when no text transcription is available. It is therefore necessary to find solutions that make these media artefacts accessible to most people. Several software packages offer utilities for creating video subtitles, but all require extensive participation by the user. Hence, a more automated approach is envisaged. This thesis report describes a way to generate standards-compliant subtitles using speech recognition. Three parts are distinguished. The first consists in separating the audio from the video and converting the audio into a suitable format if necessary. The second phase performs recognition of the speech contained in the audio. The final stage generates a subtitle file from the recognition results of the previous step. Directions of implementation are proposed for the three distinct modules. The experimental results were not fully satisfactory, and adjustments must be made in future work. Decoding parallelization, use of well-trained models, and punctuation insertion are some of the improvements to be made.
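The three-stage pipeline described (audio extraction, speech recognition, subtitle-file generation) is straightforward to outline. The sketch below is an editorial illustration, not the thesis's implementation: the ffmpeg command and the segment format are assumptions, and any ASR system producing timed segments could fill stage two.

```python
def to_srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp SRT requires."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    millis = int(round((seconds - int(seconds)) * 1000))
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"

def write_srt(segments, path):
    """segments: list of (start_sec, end_sec, text) from any recognizer."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, text) in enumerate(segments, 1):
            f.write(f"{i}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n")
            f.write(f"{text}\n\n")

# Stage 1 (assumed): extract mono 16 kHz audio with ffmpeg, e.g.
#   ffmpeg -i input.mp4 -vn -ac 1 -ar 16000 audio.wav
# Stage 2 (assumed): run a speech recognizer to obtain timed segments.
# Stage 3: serialize the recognized segments as subtitles.
write_srt([(0.0, 2.5, "Hello world.")], "output.srt")
```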

6

Scarlato, Michele. "Sicurezza di rete, analisi del traffico e monitoraggio". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2012. http://amslaurea.unibo.it/3223/.

Abstract:
The work is divided into three macro-areas. The first concerns a theoretical analysis of how intrusions work, of which software is used to carry them out, and of how to protect oneself (using the devices generically known as firewalls). The second macro-area analyzes an intrusion carried out from outside against sensitive servers of a LAN. This analysis is conducted on the files captured by the two network interfaces configured in promiscuous mode on a probe inside the LAN; there are two interfaces so as to attach to two LAN segments with two different subnet masks. The attack is analyzed using various software tools, which effectively defines a third part of the work: the files captured by the two interfaces are examined first with software that analyzes full-content data, such as Wireshark, then with software that analyzes session data, processed with Argus, and finally statistical data, processed with Ntop. The penultimate chapter, before the conclusions, covers the installation of Nagios and its configuration for monitoring, via plugins, the remaining disk space on a remote agent machine and the MySQL and DNS services. Of course, Nagios can be configured to monitor any type of service offered on the network.
7

Mehri, Soroush. "Sequential modeling, generative recurrent neural networks, and their applications to audio." Thesis, 2016. http://hdl.handle.net/1866/18762.


Books on the topic "Generative audio models"

1

Osipov, Vladimir. Control and audit of the activities of a commercial organization: external and internal. INFRA-M Academic Publishing LLC, 2021. http://dx.doi.org/10.12737/1137320.

Abstract:
The textbook reveals the role of control in ensuring the effective operation of a commercial organization, and sets its purpose and objectives. The main directions of external and internal control of the activities of a commercial organization are defined and the characteristics of the functions performed by them are given. The basic principles of external and internal audit are formulated, their purpose is defined, and the procedure for regulatory and legal regulation of audit activities in the Russian Federation is considered. The features of control over the activities of a commercial organization in management accounting are revealed, and the need for its further development in modern business conditions is justified. To consolidate the theoretical material, the practical and methodological support of the discipline is provided. Meets the requirements of the federal state educational standards of higher education of the latest generation. It is intended for students in the bachelor's degree program 38.03.01 "Economics" (profile "Accounting, Analysis and Audit") and teachers of economic specialties, students of the postgraduate education system, and practitioners involved in external and internal control and audit of the activities of commercial organizations.
2

Kazimagomedov, Abdulla, Aida Abdulsalamova, M. Mel'nikov, and N. Gadzhiev. Analysis of the activities of a commercial bank. INFRA-M Academic Publishing LLC, 2022. http://dx.doi.org/10.12737/1831614.

Abstract:
The textbook presents modern ideas about the analysis of the activities of a commercial bank; in particular, it comprehensively discloses the theoretical and practical issues related to the organization of internal control and audit; the analysis of banking operations and services, the customer base, and the creditworthiness of borrowers; banking risks; the regulatory requirements of the Central Bank of the Russian Federation and interest rates; and the financial condition and financial results of a commercial bank. Meets the requirements of the federal state educational standards of higher education of the latest generation. For students of educational institutions of higher education studying in economic specialties at bachelor's and master's levels, graduate students, teachers of economic disciplines, managers and employees working in the banking and financial system, and those interested in improving their qualifications.
3

Kerouac, Jack. Big-Sur: Roman. Sankt-Peterburg: Azbuka, 2013.

4

Colmeiro, José. Peripheral Visions / Global Sounds. Liverpool University Press, 2018. http://dx.doi.org/10.5949/liverpool/9781786940308.001.0001.

Abstract:
Galician audio/visual culture has experienced an unprecedented period of growth following the process of political and cultural devolution in post-Franco Spain. This creative explosion has occurred in a productive dialogue with global currents and with considerable projection beyond the geopolitical boundaries of the nation and the state, but these seismic changes are only beginning to be the subject of attention of cultural and media studies. This book examines contemporary audio/visual production in Galicia as privileged channels through which modern Galician cultural identities have been imagined, constructed and consumed, both at home and abroad. The cultural redefinition of Galicia in the global age is explored through different media texts (popular music, cinema, video) which cross established boundaries and deterritorialise new border zones where tradition and modernity dissolve, generating creative tensions between the urban and the rural, the local and the global, the real and the imagined. The book aims for the deperipheralization and deterritorialization of the Galician cultural map by overcoming long-established hegemonic exclusions, whether based on language, discipline, genre, gender, origins, or territorial demarcation, while aiming to disjoint the center/periphery dichotomy that has relegated Galician culture to the margins. In essence, it is an attempt to resituate Galicia and Galician studies out of the periphery and open them to the world.
5

Aguayo, Angela J. Documentary Resistance. Oxford University Press, 2019. http://dx.doi.org/10.1093/oso/9780190676216.001.0001.

Abstract:
The potential of documentary moving images to foster democratic exchange has been percolating within media production culture for the last century, and now, with mobile cameras at our fingertips and broadcasts circulating through unpredictable social networks, the documentary impulse is coming into its own as a political force of social change. The exploding reach and power of audio and video are multiplying documentary modes of communication. Once considered an outsider media practice, documentary is finding mass appeal in the allure of moving images, collecting participatory audiences that create meaningful challenges to the social order. Documentary is adept at collecting frames of human experience, challenging those insights, and turning these stories into public knowledge that is palpable for audiences. Generating pathways of exchange between unlikely interlocutors, collective identification forged with documentary discourse constitutes a mode of political agency that is directing energy toward acting in the world. Reflecting experiences of life unfolding before the camera, documentary representations help order social relationships that deepen our public connections and generate collective roots. As digital culture creates new pathways through which information can flow, the connections generated from social change documentary constitute an emerging public commons. Considering the deep ideological divisions that are fracturing U.S. democracy, it is of critical significance to understand how communities negotiate power and difference by way of an expanding documentary commons. Investment in the force of documentary resistance helps cultivate an understanding of political life from the margins, where documentary production practices are a form of survival.
6

Kerouac, Jack. Big Sur. Penguin Books, Limited, 2018.

7

Kerouac, Jack. Big Sur. Penguin Books, 2011.

8

Kerouac, Jack. Big Sur. Penguin Classics, 2001.

9

Kerouac, Jack. Big Sur. McGraw-Hill Companies, 1990.

10

Kerouac, Jack. Big Sur. Independently Published, 2021.


Book chapters on the topic "Generative audio models"

1

Huzaifah, Muhammad, and Lonce Wyse. "Deep Generative Models for Musical Audio Synthesis." In Handbook of Artificial Intelligence for Music, 639–78. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72116-9_22.

2

Ye, Sheng, Yu-Hui Wen, Yanan Sun, Ying He, Ziyang Zhang, Yaoyuan Wang, Weihua He, and Yong-Jin Liu. "Audio-Driven Stylized Gesture Generation with Flow-Based Model." In Lecture Notes in Computer Science, 712–28. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20065-6_41.

3

Wyse, Lonce, Purnima Kamath, and Chitralekha Gupta. "Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling." In Artificial Intelligence in Music, Sound, Art and Design, 308–22. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-03789-4_20.

4

Farkas, Michal, and Peter Lacko. "Using Advanced Audio Generating Techniques to Model Electrical Energy Load." In Engineering Applications of Neural Networks, 39–48. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-65172-9_4.

5

Golani, Mati, and Shlomit S. Pinter. "Generating a Process Model from a Process Audit Log." In Lecture Notes in Computer Science, 136–51. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003. http://dx.doi.org/10.1007/3-540-44895-0_10.

6

de Berardinis, Jacopo, Valentina Anita Carriero, Nitisha Jain, Nicolas Lazzari, Albert Meroño-Peñuela, Andrea Poltronieri, and Valentina Presutti. "The Polifonia Ontology Network: Building a Semantic Backbone for Musical Heritage." In The Semantic Web – ISWC 2023, 302–22. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-47243-5_17.

Abstract:
In the music domain, several ontologies have been proposed to annotate musical data, in both symbolic and audio form, and generate semantically rich Music Knowledge Graphs. However, current models lack interoperability and are insufficient for representing music history and the cultural heritage context in which it was generated; risking the propagation of recency and cultural biases to downstream applications. In this article, we propose the Polifonia Ontology Network (PON) for music cultural heritage, centred around four modules: Music Meta (metadata), Representation (content), Source (provenance) and Instrument (cultural objects). We design PON with a strong accent on cultural stakeholder requirements and competency questions (CQs), contributing an NLP-based toolkit to support knowledge engineers in generating, validating, and analysing them; and a novel, high-quality CQ dataset produced as a result. We show current and future use of these resources by internal project pilots, early adopters in the music industry, and opportunities for the Semantic Web and Music Information Retrieval communities.
ABNT, Harvard, Vancouver, APA, and other styles
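The modular knowledge-graph approach described in this abstract can be illustrated with a minimal sketch. The snippet below uses rdflib to assert a few music-metadata triples; the namespace, class, and property names are invented for illustration and are not the actual Polifonia Ontology Network vocabulary.

```python
# Minimal sketch of a music-metadata knowledge graph using rdflib.
# The namespace, class, and property names are illustrative assumptions,
# NOT the actual Polifonia Ontology Network (PON) vocabulary.
from rdflib import Graph, Literal, Namespace, RDF

MUSIC = Namespace("http://example.org/music-meta/")  # hypothetical namespace

g = Graph()
g.bind("music", MUSIC)

work = MUSIC["work/goldberg-variations"]
g.add((work, RDF.type, MUSIC.MusicalWork))
g.add((work, MUSIC.composer, Literal("Johann Sebastian Bach")))
g.add((work, MUSIC.hasProvenance, Literal("1741 first print, Balthasar Schmid")))

print(g.serialize(format="turtle"))
```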
7

Kim, Sang-Kyun, Doo Sun Hwang, Ji-Yeun Kim, and Yang-Seock Seo. "An Effective News Anchorperson Shot Detection Method Based on Adaptive Audio/Visual Model Generation". In Lecture Notes in Computer Science, 276–85. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11526346_31.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
8

Yoshii, Kazuyoshi, and Masataka Goto. "MusicCommentator: Generating Comments Synchronized with Musical Audio Signals by a Joint Probabilistic Model of Acoustic and Textual Features". In Lecture Notes in Computer Science, 85–97. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-04052-8_8.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
9

Renugadevi, R., J. Shobana, K. Arthi, Kalpana A. V., D. Satishkumar, and M. Sivaraja. "Real-Time Applications of Artificial Intelligence Technology in Daily Operations". In Advances in Computational Intelligence and Robotics, 243–57. IGI Global, 2024. http://dx.doi.org/10.4018/979-8-3693-2615-2.ch012.

Full text of the source
Abstract:
Artificial intelligence (AI) is a system endowed with the capability to perceive its surroundings and execute actions aimed at maximizing the probability of accomplishing its objectives. It possesses the capacity to interpret and analyze data in a manner that facilitates learning and adaptation over time. Generative AI pertains to artificial intelligence models specifically designed for the creation of fresh content, spanning written text, audio, images, or videos. Its applications are diverse, ranging from generating stories mimicking a particular author's style to producing realistic images of non-existent individuals, composing music in the manner of renowned composers, or translating textual descriptions into video clips.
ABNT, Harvard, Vancouver, APA, and other styles
10

Carpio de los Pinos, Carmen, and Arturo Galán González. "Facilitating Accessibility: A Study on Innovative Didactic Materials to Generate Emotional Interactions with Pictorial Art". In The Science of Emotional Intelligence. IntechOpen, 2021. http://dx.doi.org/10.5772/intechopen.97796.

Full text of the source
Abstract:
This research was undertaken to establish criteria for the construction of didactic materials to be experienced through touch (using a three-dimensional model) and hearing (through an audio description of the chosen painting), in order to support learning and evoke emotions. Eleven experts examined the didactic tools, in which the scene of the painting had been depicted using white plastic figures produced on a 3D printer. The figures were positioned to correspond accurately with the reference painting, with an explanatory narration supplied as an audio recording. Each of the experts was asked the same open questions in interviews that were audio-recorded and later transcribed. This feedback was analyzed and eleven concerns for consideration were identified: 1. how the figures felt to the touch; 2. modeling and placement of the figures; 3. position of the characters; 4. size; 5. the accuracy of the 3D depiction of the 2D image; 6. perspectives or visual points of view of the scene; 7. sufficient representation of the painting in the model; 8. distribution of visual components within the scene; 9. perceptual appraisals; 10. size of the model; 11. touch of the whole model. The results indicated that the size of the model and the figurines was appropriate for their function. The figurines felt pleasant to handle and adequately conveyed the postures and placement. Suggestions for further improvement included adding more figurines to the model and adding color (omitted in the test model) to achieve an inclusive design.
ABNT, Harvard, Vancouver, APA, and other styles

Conference papers on the topic "Generative audio models"

1

Yang, Hyukryul, Hao Ouyang, Vladlen Koltun, and Qifeng Chen. "Hiding Video in Audio via Reversible Generative Models". In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019. http://dx.doi.org/10.1109/iccv.2019.00119.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
2

Nguyen, Viet-Nhat, Mostafa Sadeghi, Elisa Ricci, and Xavier Alameda-Pineda. "Deep Variational Generative Models for Audio-Visual Speech Separation". In 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2021. http://dx.doi.org/10.1109/mlsp52302.2021.9596406.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
3

Mingliang Gu e Yuguo Xia. "Fusing generative and discriminative models for Chinese dialect identification". In 2008 International Conference on Audio, Language and Image Processing (ICALIP). IEEE, 2008. http://dx.doi.org/10.1109/icalip.2008.4590173.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
4

Shah, Neil, Dharmeshkumar M. Agrawal, and Niranajan Pedanekar. "Adding Crowd Noise to Sports Commentary using Generative Models". In Life Improvement in Quality by Ubiquitous Experiences Workshop. Brazilian Computing Society, 2021. http://dx.doi.org/10.5753/lique.2021.15715.

Full text of the source
Abstract:
Crowd noise forms an integral part of a live sports experience. In the post-COVID era, when live audiences are absent, crowd noise needs to be added to the live commentary. This paper exploits the correlation between the commentary and the crowd noise of a live sports event and presents a method for audio-stylizing sports commentary by generating live, stadium-like sound with neural generative models. We use Generative Adversarial Network (GAN)-based architectures such as Cycle-consistent GANs (Cycle-GANs) and Mel-GANs to generate live, stadium-like sound samples given the live commentary. Due to the unavailability of raw commentary sound samples, we use end-to-end time-domain source separation models (SEGAN and Wave-U-Net) to extract commentary sound from combined recordings of the live sound acquired from YouTube highlights of soccer videos. We present a qualitative and a subjective user evaluation of the similarity of the generated live sound to the reference live sound.
ABNT, Harvard, Vancouver, APA, and other styles
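The generation side of the pipeline described in this abstract can be sketched structurally. The PyTorch snippet below builds a MelGAN-style generator that upsamples a mel spectrogram toward a waveform; the layer widths and upsampling factors are assumptions, and the untrained network is a shape-level sketch, not the authors' model.

```python
# Structural sketch of a MelGAN-style generator: transposed 1-D convolutions
# upsample a mel spectrogram (80 bins) toward a raw waveform. Layer widths and
# strides are illustrative assumptions; the network is untrained.
import torch
import torch.nn as nn

class TinyMelGenerator(nn.Module):
    def __init__(self, n_mels: int = 80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=7, padding=3),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(256, 128, kernel_size=16, stride=8, padding=4),  # 8x upsampling
            nn.LeakyReLU(0.2),
            nn.ConvTranspose1d(128, 64, kernel_size=16, stride=8, padding=4),   # 8x upsampling
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 1, kernel_size=7, padding=3),
            nn.Tanh(),  # constrain samples to [-1, 1]
        )

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        return self.net(mel)

mel = torch.randn(1, 80, 100)        # (batch, mel bins, frames)
waveform = TinyMelGenerator()(mel)   # (1, 1, 6400): 64 samples per frame
print(waveform.shape)
```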
5

Barnett, Julia. "The Ethical Implications of Generative Audio Models: A Systematic Literature Review". In AIES '23: AAAI/ACM Conference on AI, Ethics, and Society. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3600211.3604686.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
6

Ye, Zhenhui, Zhou Zhao, Yi Ren, and Fei Wu. "SyntaSpeech: Syntax-Aware Generative Adversarial Text-to-Speech". In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/620.

Full text of the source
Abstract:
The recent progress in non-autoregressive text-to-speech (NAR-TTS) has made fast and high-quality speech synthesis possible. However, current NAR-TTS models usually use phoneme sequence as input and thus cannot understand the tree-structured syntactic information of the input sequence, which hurts the prosody modeling. To this end, we propose SyntaSpeech, a syntax-aware and light-weight NAR-TTS model, which integrates tree-structured syntactic information into the prosody modeling modules in PortaSpeech. Specifically, 1) We build a syntactic graph based on the dependency tree of the input sentence, then process the text encoding with a syntactic graph encoder to extract the syntactic information. 2) We incorporate the extracted syntactic encoding with PortaSpeech to improve the prosody prediction. 3) We introduce a multi-length discriminator to replace the flow-based post-net in PortaSpeech, which simplifies the training pipeline and improves the inference speed, while keeping the naturalness of the generated audio. Experiments on three datasets not only show that the tree-structured syntactic information grants SyntaSpeech the ability to synthesize better audio with expressive prosody, but also demonstrate the generalization ability of SyntaSpeech to adapt to multiple languages and multi-speaker text-to-speech. Ablation studies demonstrate the necessity of each component in SyntaSpeech. Source code and audio samples are available at https://syntaspeech.github.io.
ABNT, Harvard, Vancouver, APA, and other styles
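The syntactic-graph construction at the core of this abstract can be approximated with off-the-shelf tools. The sketch below derives a word-level graph from spaCy's dependency parse; spaCy, its en_core_web_sm model, and the undirected-graph simplification are assumptions here, and SyntaSpeech's actual graph builder may differ.

```python
# Sketch: build a word graph from a dependency parse, the kind of structure a
# syntactic graph encoder could consume. Requires spaCy and
# `python -m spacy download en_core_web_sm`; this simplification may differ
# from SyntaSpeech's actual graph construction.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

graph = nx.Graph()
for token in doc:
    graph.add_node(token.i, text=token.text)
    if token.head.i != token.i:  # the root points to itself; skip that edge
        graph.add_edge(token.head.i, token.i, dep=token.dep_)

print(sorted(graph.edges(data=True)))
```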
7

Agiomyrgiannakis, Yannis. "B-Spline Pdf: A Generalization of Histograms to Continuous Density Models for Generative Audio Networks". In ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018. http://dx.doi.org/10.1109/icassp.2018.8461399.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
8

Vatanparvar, Korosh, Viswam Nathan, Ebrahim Nemati, Md Mahbubur Rahman, and Jilong Kuang. "Adapting to Noise in Speech Obfuscation by Audio Profiling Using Generative Models for Passive Health Monitoring". In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) in conjunction with the 43rd Annual Conference of the Canadian Medical and Biological Engineering Society. IEEE, 2020. http://dx.doi.org/10.1109/embc44109.2020.9176156.

Full text of the source
ABNT, Harvard, Vancouver, APA, and other styles
9

Schimbinschi, Florin, Christian Walder, Sarah M. Erfani, and James Bailey. "SynthNet: Learning to Synthesize Music End-to-End". In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/467.

Full text of the source
Abstract:
We consider the problem of learning a mapping directly from annotated music to waveforms, bypassing traditional single-note synthesis. We propose a specific architecture based on WaveNet, a convolutional autoregressive generative model designed for text-to-speech. We investigate the representations learned by these models on music and conclude that mappings between musical notes and instrument timbre can be learned directly from the raw audio coupled with the musical score in binary piano-roll format. Our model requires minimal training data (9 minutes), is substantially better in quality, and converges 6 times faster in comparison to strong baselines in the form of powerful text-to-speech models. The quality of the generated waveforms (generation accuracy) is sufficiently high that they are almost identical to the ground truth. Our evaluations are based on both the RMSE of the Constant-Q transform and mean opinion scores from human subjects. We validate our work using 7 distinct synthetic instrument timbres and real cello music, and also provide visualizations and links to all generated audio.
ABNT, Harvard, Vancouver, APA, and other styles
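The objective metric named in this abstract, the RMSE of the Constant-Q transform, is straightforward to reproduce in outline. The sketch below compares two recordings with librosa; the file paths are placeholders, and this is an approximation of the metric rather than the paper's exact evaluation code.

```python
# Sketch of a Constant-Q transform RMSE between a generated and a reference
# recording, approximating the paper's objective metric. File paths are
# placeholders; frame counts are cropped to match before comparison.
import librosa
import numpy as np

def cqt_rmse(path_a: str, path_b: str, sr: int = 22050) -> float:
    y_a, _ = librosa.load(path_a, sr=sr)
    y_b, _ = librosa.load(path_b, sr=sr)
    cqt_a = np.abs(librosa.cqt(y_a, sr=sr))
    cqt_b = np.abs(librosa.cqt(y_b, sr=sr))
    frames = min(cqt_a.shape[1], cqt_b.shape[1])  # align lengths
    diff = cqt_a[:, :frames] - cqt_b[:, :frames]
    return float(np.sqrt(np.mean(diff ** 2)))

print(cqt_rmse("generated.wav", "ground_truth.wav"))
```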
10

Farooq, Ahmed, Jari Kangas, and Roope Raisamo. "TAUCHI-GPT: Leveraging GPT-4 to create a Multimodal Open-Source Research AI tool". In AHFE 2023 Hawaii Edition. AHFE International, 2023. http://dx.doi.org/10.54941/ahfe1004176.

Full text of the source
Abstract:
In the last few years, advances in deep learning and artificial intelligence have made it possible to generate high-quality text, audio, and visual content automatically for a wide range of application areas, including research and education. However, designing and customizing an effective R&D tool capable of providing the necessary tool-specific output and breaking down complex research tasks requires a great deal of expertise and effort, and is often a time-consuming and expensive process. Using existing Generative Pre-trained Transformers (GPT) and foundational models, it is now possible to leverage large language model GPTs already trained on specific datasets to be effective in common research and development workflows. In this paper, we develop and test a customized version of an autonomous pretrained generative transformer, an experimental open-source project built on top of the GPT-4 language model that chains together LLM "thoughts" to autonomously pursue and progress towards specific goals. Our implementation, referred to as TAUCHI-GPT, uses an automated approach to generation that leverages deep learning and output reflection to create high-quality textual, visual, and auditory output and to accomplish common research and development tasks. TAUCHI-GPT is based on the GPT-4 architecture and connects to Stable Diffusion and ElevenLabs to input and output complex multimodal streams through chain prompting. Moreover, using the Google Search API, TAUCHI-GPT can also scrape online repositories to understand, learn, and deconstruct complex research tasks, identify relevant information, and plan appropriate courses of action by implementing a chain of thought (CoT).
ABNT, Harvard, Vancouver, APA, and other styles
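The thought-chaining loop this abstract describes can be approximated in a few lines against the OpenAI chat API. In the sketch below, the goal string, prompt wording, and fixed step count are assumptions; the actual TAUCHI-GPT additionally wires search, image, and speech tools around such a loop.

```python
# Minimal sketch of chaining LLM "thoughts" toward a goal. Prompts, the goal,
# and the fixed loop length are assumptions; the real TAUCHI-GPT also connects
# search, image, and speech backends around this core.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
goal = "Summarize recent work on multimodal research assistants."
thought = "List the sub-tasks needed to reach the goal."

for step in range(3):  # a small, fixed number of reasoning steps
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"You pursue this goal step by step: {goal}"},
            {"role": "user", "content": f"Previous thought: {thought}\nGive the next thought."},
        ],
    )
    thought = response.choices[0].message.content
    print(f"Step {step + 1}: {thought}\n")
```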

Reports of organizations on the topic "Generative audio models"

1

Decleir, Cyril, Mohand-Saïd Hacid, and Jacques Kouloumdjian. A Database Approach for Modeling and Querying Video Data. Aachen University of Technology, 1999. http://dx.doi.org/10.25368/2022.90.

Full text of the source
Abstract:
Indexing video data is essential for providing content-based access. In this paper, we consider how database technology can offer an integrated framework for modeling and querying video data. As many concerns in video (e.g., modeling and querying) are also found in databases, databases provide an interesting angle from which to attack many of the problems. From a video-applications perspective, database systems provide a nice basis for future video systems. More generally, database research will provide solutions to many video issues, even if these are partial or fragmented. From a database perspective, video applications provide beautiful challenges. Next-generation database systems will need to provide support for multimedia data (e.g., image, video, audio). These data types require new techniques for their management (i.e., storing, modeling, querying, etc.), so new solutions are significant. This paper develops a data model and a rule-based query language for video content-based indexing and retrieval. The data model is designed around the object and constraint paradigms. A video sequence is split into a set of fragments. Each fragment can be analyzed to extract the information (symbolic descriptions) of interest, which can be put into a database. This database can then be searched to find information of interest. Two types of information are considered: (1) the entities (objects) of interest in the domain of a video sequence, and (2) the video frames which contain these entities. To represent this information, our data model allows facts as well as objects and constraints. We present a declarative, rule-based, constraint query language that can be used to infer relationships about information represented in the model. The language has a clear declarative and operational semantics. This work is a major revision and consolidation of [12, 13].
ABNT, Harvard, Vancouver, APA, and other styles
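The core of the data model this abstract describes, fragments annotated with the entities they contain plus frame-level queries, can be sketched in plain Python. The class and field names below are illustrative, not the paper's actual schema or its rule-based query language.

```python
# Sketch of the video data model's core idea: a sequence split into fragments,
# each annotated with the entities it contains, plus a simple frame query.
# Class and field names are illustrative, not the paper's schema.
from dataclasses import dataclass, field

@dataclass
class Fragment:
    start_frame: int
    end_frame: int
    entities: set[str] = field(default_factory=set)

@dataclass
class VideoSequence:
    fragments: list[Fragment] = field(default_factory=list)

    def frames_with(self, entity: str) -> list[tuple[int, int]]:
        """Return the (start, end) frame ranges of fragments containing the entity."""
        return [(f.start_frame, f.end_frame)
                for f in self.fragments if entity in f.entities]

video = VideoSequence([
    Fragment(0, 120, {"anchor", "studio"}),
    Fragment(121, 300, {"reporter", "street"}),
    Fragment(301, 450, {"anchor", "guest"}),
])
print(video.frames_with("anchor"))  # [(0, 120), (301, 450)]
```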
2

Vakaliuk, Tetiana A., Valerii V. Kontsedailo, Dmytro S. Antoniuk, Olha V. Korotun, Iryna S. Mintii, and Andrey V. Pikilnyak. Using game simulator Software Inc in the Software Engineering education. [n.p.], February 2020. http://dx.doi.org/10.31812/123456789/3762.

Full text of the source
Abstract:
The article presents the possibilities of using the game simulator Software Inc in the training of future software engineers in higher education. Attention is drawn to some specific settings that need to be taken into account during such training. More and more educational institutions are introducing new teaching methods that require engineering students, in particular future software engineers, to deal with real professional situations in the learning process. The use of modern ICT, including game simulators, in the educational process improves the quality of educational material and enhances the educational effects of innovative pedagogical programs and methods, as it gives teachers additional opportunities for constructing individual educational trajectories for students. The use of ICT also allows a differentiated approach to students with different levels of readiness to study. A feature of any software engineer's work is the need to understand the subject area for which the software is being developed. An important condition for the preparation of a highly qualified specialist is the student's independent fulfillment of scientific research and the generation and implementation of their own idea into a finished commercial product. In the process of research, students gain the knowledge and skills of a future IT specialist and competences in the legal protection of the results of intellectual activity, technological audit, marketing, and product realization in the innovation market. Note that when real-world practice is not possible for students, game simulators that model real software development processes are an alternative.
ABNT, Harvard, Vancouver, APA, and other styles
