Ready-made bibliography on the topic "Neural audio synthesis"
Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles
Table of contents
See the lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Neural audio synthesis".
Next to every work in the bibliography there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a .pdf file and read its abstract online whenever the relevant parameters are provided in the work's metadata.
Journal articles on the topic "Neural audio synthesis"
Li, Dongze, Kang Zhao, Wei Wang, Bo Peng, Yingya Zhang, Jing Dong, and Tieniu Tan. "AE-NeRF: Audio Enhanced Neural Radiance Field for Few Shot Talking Head Synthesis". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 4 (March 24, 2024): 3037–45. http://dx.doi.org/10.1609/aaai.v38i4.28086.
Vyawahare, D. G. "Image to Audio Conversion for Blind People Using Neural Network". International Journal for Research in Applied Science and Engineering Technology 11, no. 12 (December 31, 2023): 1949–57. http://dx.doi.org/10.22214/ijraset.2023.57712.
Kiefer, Chris. "Sample-level sound synthesis with recurrent neural networks and conceptors". PeerJ Computer Science 5 (July 8, 2019): e205. http://dx.doi.org/10.7717/peerj-cs.205.
Liu, Yunyi, and Craig Jin. "Impact on quality and diversity from integrating a reconstruction loss into neural audio synthesis". Journal of the Acoustical Society of America 154, no. 4_supplement (October 1, 2023): A99. http://dx.doi.org/10.1121/10.0022922.
Khandelwal, Karan, Krishiv Pandita, Kshitij Priyankar, Kumar Parakram, and Tejaswini K. "Svara Rachana - Audio Driven Facial Expression Synthesis". International Journal for Research in Applied Science and Engineering Technology 12, no. 5 (May 31, 2024): 2024–29. http://dx.doi.org/10.22214/ijraset.2024.62019.
Voitko, Viktoriia, Svitlana Bevz, Sergii Burbelo, and Pavlo Stavytskyi. "Audio Generation Technology of a System of Synthesis and Analysis of Music Compositions". Herald of Khmelnytskyi National University 305, no. 1 (February 23, 2022): 64–67. http://dx.doi.org/10.31891/2307-5732-2022-305-1-64-67.
Li, Naihan, Yanqing Liu, Yu Wu, Shujie Liu, Sheng Zhao, and Ming Liu. "RobuTrans: A Robust Transformer-Based Text-to-Speech Model". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8228–35. http://dx.doi.org/10.1609/aaai.v34i05.6337.
Hryhorenko, N., N. Larionov, and V. Bredikhin. "Research of the Process of Visual Art Transmission in Music and the Creation of Collections for People with Visual Impairments". Municipal economy of cities 6, no. 180 (December 4, 2023): 2–6. http://dx.doi.org/10.33042/2522-1809-2023-6-180-2-6.
Andreu, Sergi, and Monica Villanueva Aylagas. "Neural Synthesis of Sound Effects Using Flow-Based Deep Generative Models". Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 18, no. 1 (October 11, 2022): 2–9. http://dx.doi.org/10.1609/aiide.v18i1.21941.
Li, Naihan, Shujie Liu, Yanqing Liu, Sheng Zhao, and Ming Liu. "Neural Speech Synthesis with Transformer Network". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6706–13. http://dx.doi.org/10.1609/aaai.v33i01.33016706.
Pełny tekst źródłaRozprawy doktorskie na temat "Neural audio synthesis"
Lundberg, Anton. "Data-Driven Procedural Audio: Procedural Engine Sounds Using Neural Audio Synthesis". Thesis, KTH, Datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280132.
The currently dominant approach to rendering audio in interactive media, such as computer games and virtual reality, involves the playback of static audio files. This approach lacks flexibility and requires managing large amounts of audio data. An alternative approach is procedural audio, in which sound models are controlled to generate audio in real time. Despite its many advantages, procedural audio is not yet widely used in commercial productions, partly because the audio generated by many proposed models does not meet industry standards. This thesis investigates how procedural audio can be performed with data-driven methods. We do this by specifically investigating methods for synthesizing car engine sounds based on neural audio synthesis. Building on a recently published method that integrates digital signal processing with deep learning, called Differentiable Digital Signal Processing (DDSP), our method creates sound models by training deep neural networks to reconstruct recorded audio examples from interpretable latent predictors. We propose a method for using phase information from engine combustion cycles, as well as a differentiable method for synthesizing transients. Our results show that DDSP can be used for procedural engine sounds, but more work is required before our models can generate engine sounds without unwanted artifacts and before they can be used in real-time applications. We discuss how our approach can be useful for procedural audio in more general settings, and how our method can be applied to other sound sources.
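For readers unfamiliar with DDSP, its core building block can be illustrated with a short sketch. The following minimal Python example implements a differentiable-synthesis-style harmonic oscillator bank of the kind DDSP models drive with network-predicted f0 and amplitude envelopes; the parameter values here (a 50 Hz fundamental with a 1/k harmonic rolloff) are invented for illustration and are not taken from the thesis.

import numpy as np

def harmonic_synth(f0, harmonic_amps, sample_rate=16000):
    """Additive synthesis: a sum of sinusoids at integer multiples of f0.

    f0:            fundamental frequency per sample, shape (n_samples,)
    harmonic_amps: amplitude per harmonic per sample, shape (n_samples, n_harmonics)
    """
    n_samples, n_harmonics = harmonic_amps.shape
    # Instantaneous phase: cumulative sum of angular frequency.
    phase = 2 * np.pi * np.cumsum(f0) / sample_rate
    audio = np.zeros(n_samples)
    for k in range(1, n_harmonics + 1):
        # Silence any harmonic that would exceed Nyquist, to avoid aliasing.
        alias_mask = (k * f0) < (sample_rate / 2)
        audio += alias_mask * harmonic_amps[:, k - 1] * np.sin(k * phase)
    return audio

# Example: a 50 Hz engine-like tone with a decaying harmonic spectrum
# (hypothetical values, chosen only to make the sketch runnable).
sr, dur = 16000, 1.0
n = int(sr * dur)
f0 = np.full(n, 50.0)
amps = np.stack([np.full(n, 1.0 / k) for k in range(1, 17)], axis=1)
audio = harmonic_synth(f0, amps, sr)

In the DDSP setting, f0 and the amplitude envelopes are not fixed as above but are predicted per frame by a neural network trained to reconstruct recordings; the synthesizer itself stays differentiable, so gradients flow through it during training.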
Nistal Hurlé, Javier. "Exploring generative adversarial networks for controllable musical audio synthesis". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAT009.
Audio synthesizers are electronic musical instruments that generate artificial sounds under some parametric control. While synthesizers have evolved since they were popularized in the 70s, two fundamental challenges remain unresolved: 1) the development of synthesis systems that respond to semantically intuitive parameters; 2) the design of "universal", source-agnostic synthesis techniques. This thesis researches the use of Generative Adversarial Networks (GANs) for building such systems. The main goal is to research and develop novel tools for music production that afford intuitive and expressive means of sound manipulation, e.g., by controlling parameters that respond to perceptual properties of the sound and other high-level features.

Our first work studies the performance of GANs trained on various common audio signal representations (e.g., waveform, time-frequency representations). These experiments compare different forms of audio data in the context of tonal sound synthesis. Results show that the magnitude with the instantaneous frequency of the phase, as well as the complex-valued Short-Time Fourier Transform, achieve the best results.

Building on this, our second work presents DrumGAN, a controllable adversarial synthesizer of percussive sounds. By conditioning the model on perceptual features describing high-level timbre properties, we demonstrate that intuitive control can be gained over the generation process. This work resulted in a VST plugin that generates full-resolution audio and is compatible with any Digital Audio Workstation (DAW). We showcase extensive musical material produced with DrumGAN by professional artists from Sony ATV.

The scarcity of annotations in musical audio datasets challenges the application of supervised methods to conditional generation settings. Our third contribution employs knowledge distillation to extract such annotations from a pre-trained audio tagging system. DarkGAN is an adversarial synthesizer of tonal sounds that employs the output probabilities of such a system (so-called "soft labels") as conditional information. Results show that DarkGAN can respond moderately to many intuitive attributes, even under out-of-distribution input conditioning.

Applications of GANs to audio synthesis typically learn from fixed-size two-dimensional spectrogram data, treated analogously to "image data" in computer vision; thus, they cannot generate sounds of variable duration. In our fourth paper, we address this limitation by exploiting a self-supervised method for learning discrete features from sequential data. Such features are used as conditional input to provide step-wise, time-dependent information to the model. Global consistency is ensured by fixing the input noise z (characteristic of adversarial settings). Results show that, while models trained on a fixed-size scheme obtain better audio quality and diversity, ours can competently generate audio of any duration.

One interesting direction for research is the generation of audio conditioned on preexisting musical material, e.g., the generation of a drum pattern given the recording of a bass line. Our fifth paper explores a simple pretext task tailored to learning such complex musical relationships. Concretely, we study whether a GAN generator, conditioned on highly compressed MP3 musical audio signals, can generate outputs resembling the original uncompressed audio. Results show that the GAN can improve the quality of the audio signals over the MP3 versions at very high compression rates (16 and 32 kbit/s).

As a direct consequence of applying artificial-intelligence techniques in musical contexts, we ask how AI-based technology can foster innovation in musical practice. We therefore conclude this thesis with a broad perspective on the development of AI tools for music production, informed by theoretical considerations and by reports of real-world AI tool usage by professional artists.
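To make the conditioning mechanism described for DrumGAN and DarkGAN concrete, the following Python sketch (using PyTorch) shows the general pattern of feeding a feature vector to a GAN generator alongside the latent code. The layer sizes, feature count, and the toy MLP layout are illustrative assumptions only; the actual DrumGAN model is a progressive-growing convolutional GAN, not this simplified network.

import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy generator: perceptual features are concatenated to the latent
    code z, so every generated waveform depends on both."""

    def __init__(self, z_dim=128, n_features=7, out_samples=16384):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_features, 256),
            nn.ReLU(),
            nn.Linear(256, 1024),
            nn.ReLU(),
            nn.Linear(1024, out_samples),
            nn.Tanh(),  # waveform samples constrained to [-1, 1]
        )

    def forward(self, z, features):
        return self.net(torch.cat([z, features], dim=-1))

gen = ConditionalGenerator()
z = torch.randn(4, 128)      # fixing z is what gives global consistency
feats = torch.rand(4, 7)     # hypothetical perceptual controls, e.g. brightness
fake_audio = gen(z, feats)   # shape (4, 16384)

The same pattern covers both conditioning variants in the thesis: hand-crafted perceptual features (DrumGAN) or the soft-label output probabilities of a pre-trained tagging system (DarkGAN) simply replace the feature vector.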
Andreux, Mathieu. "Foveal autoregressive neural time-series modeling". Electronic Thesis or Diss., Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE073.
This dissertation studies unsupervised time-series modelling. We first focus on the problem of linearly predicting future values of a time series under the assumption of long-range dependencies, which requires taking a large past into account. We introduce a family of causal, foveal wavelets that project past values onto a subspace adapted to the problem, thereby reducing the variance of the associated estimators. We then investigate under which conditions non-linear predictors perform better than linear ones. Time series that admit a sparse time-frequency representation, such as audio, satisfy those requirements, and we propose a prediction algorithm using such a representation. The last problem we tackle is audio time-series synthesis. We propose a new generation method relying on a deep convolutional neural network with an encoder-decoder architecture, which makes it possible to synthesize new, realistic signals. Contrary to state-of-the-art methods, we explicitly use the time-frequency properties of sounds to define an encoder with the scattering transform, while the decoder is trained to solve an inverse problem in an adapted metric.
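The linear-prediction starting point of the dissertation can be made concrete with a small example: fit, by least squares, a predictor of the next sample from a window of past samples. The window length and the toy signal below are assumptions chosen only for illustration; the dissertation's contribution is the foveal wavelet projection that replaces the raw past window.

import numpy as np

def fit_linear_predictor(x, past=64):
    """Fit weights w minimizing sum over t of (x[t] - w . x[t-past:t])^2."""
    # Design matrix of past windows, and the next-sample targets.
    X = np.stack([x[t - past:t] for t in range(past, len(x))])
    y = x[past:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(x, w, past=64):
    X = np.stack([x[t - past:t] for t in range(past, len(x))])
    return X @ w

# Toy signal with long-range structure: a slow sinusoid plus noise.
rng = np.random.default_rng(0)
t = np.arange(4096)
x = np.sin(2 * np.pi * t / 512) + 0.1 * rng.standard_normal(len(t))
w = fit_linear_predictor(x)
mse = np.mean((x[64:] - predict(x, w)) ** 2)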
Books on the topic "Neural audio synthesis"
Nakagawa, Seiichi. Speech, hearing and neural network models. Tokyo: Ohmsha, 1995.
Shikano, K., and Y. Tohkura. Speech, Hearing and Neural Network Models (Biomedical and Health Research). IOS Press, 1995.
Book chapters on the topic "Neural audio synthesis"
Eppe, Manfred, Tayfun Alpay, and Stefan Wermter. "Towards End-to-End Raw Audio Music Synthesis". In Artificial Neural Networks and Machine Learning – ICANN 2018, 137–46. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01424-7_14.
Tarjano, Carlos, and Valdecy Pereira. "Neuro-Spectral Audio Synthesis: Exploiting Characteristics of the Discrete Fourier Transform in the Real-Time Simulation of Musical Instruments Using Parallel Neural Networks". In Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series, 362–75. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-30490-4_30.
Singh, Harman, Parminder Singh, and Manjot Kaur Gill. "Statistical Parametric Speech Synthesis for Punjabi Language using Deep Neural Network". In SCRS Conference Proceedings on Intelligent Systems, 431–41. Soft Computing Research Society, 2021. http://dx.doi.org/10.52458/978-93-91842-08-6-41.
Tits, Noé, Kevin El Haddad, and Thierry Dutoit. "The Theory behind Controllable Expressive Speech Synthesis: A Cross-Disciplinary Approach". In Human 4.0 - From Biology to Cybernetic. IntechOpen, 2021. http://dx.doi.org/10.5772/intechopen.89849.
Min, Zeping, Qian Ge, and Zhong Li. "CAMP: A Unified Data Solution for Mandarin Speech Recognition Tasks". In Advances in Transdisciplinary Engineering. IOS Press, 2023. http://dx.doi.org/10.3233/atde230552.
Pełny tekst źródłaStreszczenia konferencji na temat "Neural audio synthesis"
Pons, Jordi, Santiago Pascual, Giulio Cengarle, and Joan Serra. "Upsampling Artifacts in Neural Audio Synthesis". In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021. http://dx.doi.org/10.1109/icassp39728.2021.9414913.
Yang, Zih-Syuan, and Jason Hockman. "A Plugin for Neural Audio Synthesis of Impact Sound Effects". In AM '23: Audio Mostly 2023. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3616195.3616221.
Ezzerg, Abdelhamid, Adam Gabrys, Bartosz Putrycz, Daniel Korzekwa, Daniel Saez-Trigueros, David McHardy, Kamil Pokora, Jakub Lachowicz, Jaime Lorenzo-Trueba, and Viacheslav Klimkov. "Enhancing audio quality for expressive Neural Text-to-Speech". In 11th ISCA Speech Synthesis Workshop (SSW 11). ISCA, 2021. http://dx.doi.org/10.21437/ssw.2021-14.
Shimba, Taiki, Ryuhei Sakurai, Hirotake Yamazoe, and Joo-Ho Lee. "Talking heads synthesis from audio with deep neural networks". In 2015 IEEE/SICE International Symposium on System Integration (SII). IEEE, 2015. http://dx.doi.org/10.1109/sii.2015.7404961.
Ramos, Vania Miriam Ortiz, and Sukhan Lee. "Synthesis of Disparate Audio Species via Recurrent Neural Embedding". In 2023 IEEE International Symposium on Multimedia (ISM). IEEE, 2023. http://dx.doi.org/10.1109/ism59092.2023.00036.
Antognini, Joseph M., Matt Hoffman, and Ron J. Weiss. "Audio Texture Synthesis with Random Neural Networks: Improving Diversity and Quality". In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8682598.
Guo, Yudong, Keyu Chen, Sen Liang, Yong-Jin Liu, Hujun Bao, and Juyong Zhang. "AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis". In 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2021. http://dx.doi.org/10.1109/iccv48922.2021.00573.
Huang, Mincong (Jerry), Samuel Chabot, and Jonas Braasch. "Panoptic Reconstruction of Immersive Virtual Soundscapes Using Human-Scale Panoramic Imagery with Visual Recognition". In ICAD 2021: The 26th International Conference on Auditory Display. International Community for Auditory Display, 2021. http://dx.doi.org/10.21785/icad2021.043.
Kazakova, Sophia A., Anastasia A. Zorkina, Armen M. Kocharyan, Aleksei N. Svischev, and Sergey V. Rybin. "Expressive Audio Data Augmentation Based on Speech Synthesis and Deep Neural Networks". In 2023 International Conference on Quality Management, Transport and Information Security, Information Technologies (IT&QM&IS). IEEE, 2023. http://dx.doi.org/10.1109/itqmtis58985.2023.10346366.
Liu, Yunyi, and Craig Jin. "Impact on quality and diversity from integrating a reconstruction loss into neural audio synthesis". In 185th Meeting of the Acoustical Society of America. ASA, 2023. http://dx.doi.org/10.1121/2.0001871.