A selection of academic literature on the topic "Zero-Shot Style Transfer"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Select a type of source:

Consult the lists of current articles, books, theses, reports, and other academic sources on the topic "Zero-Shot Style Transfer".

An "Add to bibliography" option is available next to every work in the list. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the publication as a PDF and read its online abstract, provided the relevant parameters are available in the metadata.

Journal articles on the topic "Zero-Shot Style Transfer"

1

Zhang, Yu, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, and Zhou Zhao. "StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19597–605. http://dx.doi.org/10.1609/aaai.v38i17.29932.

Abstract:
Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expressiveness. Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two critical approaches for enhanced effectiveness: 1) the Residual Style Adaptor (RSA) which employs a residual quantization module to capture diverse style characteristics in singing voices, and 2) the Uncertainty Modeling Layer Normalization (UMLN) to perturb the style attributes within the content representation during the training phase and thus improve the model generalization. Our extensive evaluations in zero-shot style transfer undeniably establish that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples. Access to singing voice samples can be found at https://stylesinger.github.io/.
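To make the Uncertainty Modeling Layer Normalization idea more concrete, here is a minimal, hypothetical PyTorch sketch of perturbing learned style statistics inside a layer normalization during training; the class name, noise model, and dimensions are our assumptions and do not reproduce the StyleSinger implementation.

```python
import torch
import torch.nn as nn

class UncertainStyleLayerNorm(nn.Module):
    """Illustrative sketch (not the authors' code): a layer norm whose learned
    scale/shift are perturbed with Gaussian noise during training, so the
    content branch cannot rely on a fixed style signature."""

    def __init__(self, dim: int, noise_std: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.gamma = nn.Parameter(torch.ones(dim))   # learned style scale
        self.beta = nn.Parameter(torch.zeros(dim))   # learned style shift
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm(x)
        gamma, beta = self.gamma, self.beta
        if self.training:
            # Perturb the style statistics to simulate unseen styles.
            gamma = gamma * (1.0 + self.noise_std * torch.randn_like(gamma))
            beta = beta + self.noise_std * torch.randn_like(beta)
        return gamma * x + beta
```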
2

Xi, Jier, Xiufen Ye, and Chuanlong Li. "Sonar Image Target Detection Based on Style Transfer Learning and Random Shape of Noise under Zero Shot Target". Remote Sensing 14, no. 24 (December 10, 2022): 6260. http://dx.doi.org/10.3390/rs14246260.

Abstract:
With the development of sonar technology, sonar images have been widely used to detect targets. However, sonar images pose many challenges for object detection. For example, detectable targets in sonar data are sparser than in optical images, real underwater scanning experiments are complicated, and the image styles produced by different types of sonar equipment are inconsistent, which makes them difficult to use with sonar object detection and recognition algorithms. To address these problems, we propose a novel sonar image object-detection method based on style learning and random noise of various shapes. Sonar-style target sample images are generated through style transfer, which augments the insufficient sonar object images. By introducing noise of various shapes, including points, lines, and rectangles, the problems of mud and sand obstruction and mutilated targets in real environments are mitigated, and the limited poses of sonar image targets are enriched by fusing multiple poses of optical image targets. In addition, a feature-enhancement method is proposed to address the loss of key features when applying style transfer directly to optical images. Experimental results show that our method achieves better precision.
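The random-shape noise augmentation described above can be illustrated with a small NumPy sketch; the function name, shape choices, and intensity model are assumptions of ours, not the authors' code.

```python
import numpy as np

def add_random_shape_noise(img: np.ndarray, n_shapes: int = 5, rng=None) -> np.ndarray:
    """Hypothetical augmentation in the spirit of the paper: overlay random
    points, line segments, and rectangles to mimic occlusion by mud and sand."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    h, w = out.shape[:2]
    for _ in range(n_shapes):
        kind = rng.choice(["point", "line", "rect"])
        val = float(rng.uniform(out.min(), out.max()))       # noise intensity
        if kind == "point":
            out[rng.integers(0, h), rng.integers(0, w)] = val
        elif kind == "line":                                  # horizontal segment
            y = rng.integers(0, h)
            x0, x1 = sorted(rng.integers(0, w, size=2))
            out[y, x0:x1 + 1] = val
        else:                                                 # filled rectangle
            y0, y1 = sorted(rng.integers(0, h, size=2))
            x0, x1 = sorted(rng.integers(0, w, size=2))
            out[y0:y1 + 1, x0:x1 + 1] = val
    return out
```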
3

Wang, Wenjing, Jizheng Xu, Li Zhang, Yue Wang, and Jiaying Liu. "Consistent Video Style Transfer via Compound Regularization". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12233–40. http://dx.doi.org/10.1609/aaai.v34i07.6905.

Abstract:
Recently, neural style transfer has drawn much attention and significant progress has been made, especially for image style transfer. However, flexible and consistent style transfer for videos remains a challenging problem. Existing training strategies, which either use a significant amount of video data with optical flow or introduce single-frame regularizers, have limited performance on real videos. In this paper, we propose a novel interpretation of temporal consistency, based on which we analyze the drawbacks of existing training strategies and then derive a new compound regularization. Experimental results show that the proposed regularization can better balance spatial and temporal performance, which supports our modeling. Combined with the new cost formula, we design a zero-shot video style transfer framework. Moreover, for better feature migration, we introduce a new module to dynamically adjust inter-channel distributions. Quantitative and qualitative results demonstrate the superiority of our method over other state-of-the-art style transfer methods. Our project is publicly available at: https://daooshee.github.io/CompoundVST/.
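As a rough illustration of the temporal-consistency idea (not the paper's compound regularization, whose exact form is not reproduced here), one can penalize changes in the stylized output where the input frames barely change; the weighting scheme below is an assumption.

```python
import torch

def temporal_consistency_loss(out_t: torch.Tensor, out_tp1: torch.Tensor,
                              in_t: torch.Tensor, in_tp1: torch.Tensor,
                              sigma: float = 0.1) -> torch.Tensor:
    """Toy regularizer: all tensors are (B, C, H, W) consecutive input/output frames.
    Output changes are penalized more strongly where the input is nearly static."""
    input_change = (in_tp1 - in_t).abs().mean(dim=1, keepdim=True)    # (B, 1, H, W)
    weight = torch.exp(-input_change / sigma)       # ~1 where the input is static
    output_change = (out_tp1 - out_t).abs().mean(dim=1, keepdim=True)
    return (weight * output_change).mean()
```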
4

Park, Jangkyoung, Ammar Ul Hassan, and Jaeyoung Choi. "CCFont: Component-Based Chinese Font Generation Model Using Generative Adversarial Networks (GANs)". Applied Sciences 12, no. 16 (August 10, 2022): 8005. http://dx.doi.org/10.3390/app12168005.

Abstract:
Font generation using deep learning has made considerable progress using image style transfer, but the automatic conversion/generation of Chinese characters remains a difficult task owing to the complex character shapes and the large number of Chinese characters. Most known Chinese character generation models convert images of the character shape itself; however, it is difficult to reproduce complex Chinese characters this way. Recent methods have exploited character compositionality by separating up to three or four components to improve the quality of generated characters, but it is still difficult to generate high-quality results for complex Chinese characters with many components. In this study, we proposed the CCFont model (a component-based Chinese font generation model using generative adversarial networks (GANs)) that automatically generates all Chinese characters from Chinese character components (up to 17 components). The CCFont model generates all Chinese characters in various styles using the components of Chinese characters based on a conditional GAN. Because local style information is acquired from the components, it is more accurate and less information is lost than when global information is obtained from the image of the entire character, reducing style-conversion failures and yielding high-quality results. Additionally, the CCFont model generates high-quality results without any additional training (zero-shot font generation) for first-seen characters and styles. For example, the CCFont model, trained with only traditional Chinese (TC) characters, generates high-quality results for languages that can be divided into components, such as Korean and Thai, as well as for simplified Chinese (SC) characters that are only seen during inference. CCFont can be adopted as a multilingual font-generation model applicable to all languages that can be divided into components. To the best of our knowledge, the proposed method is the first zero-shot multilingual font-generation model based on components. Qualitative and quantitative experiments were conducted to demonstrate the effectiveness of the proposed method.
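The component-conditioned generation described above can be sketched roughly as below; the component vocabulary, layer sizes, and decoder are placeholders of ours and do not reflect the actual CCFont architecture.

```python
import torch
import torch.nn as nn

class ComponentConditionedGenerator(nn.Module):
    """Minimal sketch (sizes and layers are assumptions, not CCFont's):
    condition a glyph generator on a padded sequence of component IDs plus a style code."""

    def __init__(self, n_components: int = 500, comp_dim: int = 64,
                 max_components: int = 17, style_dim: int = 128, img_size: int = 64):
        super().__init__()
        self.comp_embed = nn.Embedding(n_components + 1, comp_dim, padding_idx=0)
        self.decode = nn.Sequential(
            nn.Linear(max_components * comp_dim + style_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, img_size * img_size),
            nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, comp_ids: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # comp_ids: (B, max_components) padded with 0; style: (B, style_dim)
        c = self.comp_embed(comp_ids).flatten(1)          # (B, max_components * comp_dim)
        img = self.decode(torch.cat([c, style], dim=1))
        return img.view(-1, 1, self.img_size, self.img_size)
```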
5

Azizah, Kurniawati, and Wisnu Jatmiko. "Transfer Learning, Style Control, and Speaker Reconstruction Loss for Zero-Shot Multilingual Multi-Speaker Text-to-Speech on Low-Resource Languages". IEEE Access 10 (2022): 5895–911. http://dx.doi.org/10.1109/access.2022.3141200.

6

Bai, Zhongyu, Hongli Xu, Qichuan Ding, and Xiangyue Zhang. "Side-Scan Sonar Image Classification with Zero-Shot and Style Transfer". IEEE Transactions on Instrumentation and Measurement, 2024, 1. http://dx.doi.org/10.1109/tim.2024.3352693.

7

Fares, Mireille, Catherine Pelachaud, and Nicolas Obin. "Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding". Frontiers in Artificial Intelligence 6 (June 12, 2023). http://dx.doi.org/10.3389/frai.2023.1142997.

Abstract:
Modeling virtual agents with behavior style is one factor in personalizing human-agent interaction. We propose an efficient yet effective machine learning approach to synthesize gestures driven by prosodic features and text in the style of different speakers, including those unseen during training. Our model performs zero-shot multimodal style transfer driven by multimodal data from the PATS database, which contains videos of various speakers. We view style as being pervasive; while speaking, it colors the expressivity of communicative behaviors, while speech content is carried by multimodal signals and text. This disentanglement of content and style allows us to directly infer the style embedding even of a speaker whose data are not part of the training phase, without requiring any further training or fine-tuning. The first goal of our model is to generate the gestures of a source speaker based on the content of two input modalities: Mel spectrogram and text semantics. The second goal is to condition the source speaker's predicted gestures on the multimodal behavior style embedding of a target speaker. The third goal is to allow zero-shot style transfer for speakers unseen during training without re-training the model. Our system consists of two main components: (1) a speaker style encoder network that learns to generate a fixed-dimensional speaker style embedding from a target speaker's multimodal data (Mel spectrogram, pose, and text), and (2) a sequence-to-sequence synthesis network that synthesizes gestures based on the content of the source speaker's input modalities (text and Mel spectrogram) and conditioned on the speaker style embedding. We show that our model is able to synthesize gestures of a source speaker given the two input modalities and to transfer the knowledge of target-speaker style variability learned by the speaker style encoder to the gesture generation task in a zero-shot setup, indicating that the model has learned a high-quality speaker representation. We conduct objective and subjective evaluations to validate our approach and compare it with baselines.
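The two-component design (a style encoder producing a fixed-dimensional embedding that conditions a sequence-to-sequence generator) can be sketched roughly as follows; module names, recurrent layers, and dimensions are assumptions of ours, not the authors' network.

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Sketch only (dimensions assumed): pool a target speaker's multimodal
    features into a fixed-dimensional style embedding, so an unseen speaker's
    style can be inferred without retraining (zero-shot)."""

    def __init__(self, in_dim: int = 256, style_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(in_dim, style_dim, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, T, in_dim) concatenated mel / pose / text features
        _, h = self.rnn(feats)
        return h[-1]                        # (B, style_dim) style embedding


class GestureDecoder(nn.Module):
    """Sketch: generate pose frames from content features conditioned on style."""

    def __init__(self, content_dim: int = 256, style_dim: int = 128, pose_dim: int = 48):
        super().__init__()
        self.rnn = nn.GRU(content_dim + style_dim, 256, batch_first=True)
        self.out = nn.Linear(256, pose_dim)

    def forward(self, content: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # content: (B, T, content_dim); style: (B, style_dim) broadcast over time
        style_seq = style.unsqueeze(1).expand(-1, content.size(1), -1)
        h, _ = self.rnn(torch.cat([content, style_seq], dim=-1))
        return self.out(h)                  # (B, T, pose_dim) predicted gestures
```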
8

Xu, Hongli, Zhongyu Bai, Xiangyue Zhang, and Qichuan Ding. "MFSANet: Zero-Shot Side-Scan Sonar Image Recognition Based on Style Transfer". IEEE Geoscience and Remote Sensing Letters, 2023, 1. http://dx.doi.org/10.1109/lgrs.2023.3318051.

9

Zhang, Qing, Jing Zhang, Xiangdong Su, Feilong Bao, and Guanglai Gao. "Contour detection network for zero-shot sketch-based image retrieval". Complex & Intelligent Systems, June 2, 2023. http://dx.doi.org/10.1007/s40747-023-01096-2.

Abstract:
Zero-shot sketch-based image retrieval (ZS-SBIR) is a challenging task that involves searching for natural images related to a given hand-drawn sketch under the zero-shot setting. Previous approaches projected image and sketch features into a low-dimensional common space for retrieval and used semantic features to transfer knowledge from seen to unseen classes. However, aligning multimodal features when projecting them into a common space is not effective enough, since the styles and contents of sketches and natural images differ and they are not in one-to-one correspondence. To solve this problem, we propose a novel three-branch joint training network with a contour detection network (called CDNNet) for the ZS-SBIR task, which uses contour maps as a bridge to align sketches and natural images and alleviate the domain gap. Specifically, we use semantic metrics to constrain the relationship between contour images and natural images and between contour images and sketches, so that natural image and sketch features can be aligned in the common space. Meanwhile, we further employ second-order attention to capture target subject information and increase the performance of the retrieval descriptors. In addition, we use a teacher model and a word embedding method to transfer knowledge from the seen to the unseen classes. Extensive experiments on two large-scale datasets demonstrate that our proposed approach outperforms state-of-the-art CNN-based models: it improves mAP by 2.6% on the Sketchy dataset and 1.2% on the TU-Berlin dataset.
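Once sketches and natural images are embedded in the common space, retrieval itself reduces to a nearest-neighbor search by cosine similarity; below is a minimal sketch of that step only (the encoders, which are the paper's actual contribution, are omitted and the function name is ours).

```python
import torch
import torch.nn.functional as F

def retrieve(sketch_emb: torch.Tensor, image_embs: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Rank gallery images by cosine similarity to a query sketch embedding.
    sketch_emb: (D,) query embedding; image_embs: (N, D) gallery embeddings."""
    q = F.normalize(sketch_emb, dim=-1)
    g = F.normalize(image_embs, dim=-1)
    scores = g @ q                                   # cosine similarities, shape (N,)
    return torch.topk(scores, k=min(k, g.size(0))).indices
```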

Dissertations on the topic "Zero-Shot Style Transfer"

1

Lakew, Surafel Melaku. "Multilingual Neural Machine Translation for Low Resource Languages". Doctoral thesis, Università degli studi di Trento, 2020. http://hdl.handle.net/11572/257906.

Abstract:
Machine Translation (MT) is the task of mapping a source language to a target language. The recent introduction of neural MT (NMT) has shown promising results for high-resource languages but performs poorly in low-resource language (LRL) settings. Furthermore, the vast majority of the 7,000+ languages around the world do not have parallel data, creating a zero-resource language (ZRL) scenario. In this thesis, we present our approach to improving NMT for LRLs and ZRLs, leveraging multilingual NMT modeling (M-NMT), an approach that allows a single NMT model to translate across multiple source and target languages. The thesis i) analyzes the effectiveness of M-NMT for LRL and ZRL translation tasks across two NMT architectures (recurrent and Transformer), ii) presents a self-learning approach for improving the zero-shot translation directions of ZRLs, iii) proposes a dynamic transfer-learning approach from a pre-trained (parent) model to an LRL (child) model by tailoring to the vocabulary entries of the latter, iv) extends M-NMT to translate from a source language into specific language varieties (e.g., dialects), and finally v) proposes an approach that can control the verbosity of an NMT model's output. Our experimental findings show the effectiveness of the proposed approaches in improving NMT for LRLs and ZRLs.
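A common way to obtain zero-shot translation directions in a single multilingual NMT model is to prepend a target-language tag to each source sentence; the sketch below illustrates that general convention and is not necessarily the exact preprocessing used in the thesis.

```python
def tag_for_target_language(source_sentence: str, target_lang: str) -> str:
    """Illustrative multilingual-NMT convention: prepend a target-language token
    so one model can translate into many languages, enabling zero-shot directions."""
    return f"<2{target_lang}> {source_sentence}"

# e.g. tag_for_target_language("Guten Morgen", "it") -> "<2it> Guten Morgen"
```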
2

Fares, Mireille. "Multimodal Expressive Gesturing With Style". Electronic thesis or dissertation, Sorbonne université, 2023. http://www.theses.fr/2023SORUS017.

Abstract:
The generation of expressive gestures allows Embodied Conversational Agents (ECAs) to articulate speech intent and content in a human-like fashion. The central theme of the thesis is to leverage and control the ECAs' behavioral expressivity by modelling the complex multimodal behavior that humans employ during communication. The driving forces of the thesis are twofold: (1) to exploit speech prosody, visual prosody, and language with the aim of synthesizing expressive and human-like behaviors for ECAs; (2) to control the style of the synthesized gestures such that they can be generated with the style of any speaker. With these motivations in mind, we first propose a semantically aware and speech-driven facial and head gesture synthesis model trained on the TEDx corpus which we collected. We then propose ZS-MSTM 1.0, an approach to synthesize stylized upper-body gestures, driven by the content of a source speaker's speech and corresponding to the style of any target speaker, seen or unseen by our model. It is trained on the PATS corpus, which includes multimodal data of speakers with different behavioral styles. ZS-MSTM 1.0 is not limited to PATS speakers and can generate gestures in the style of any new speaker without further training or fine-tuning, rendering our approach zero-shot. Behavioral style is modelled from the speakers' multimodal data (language, body gestures, and speech) and is independent of the speaker's identity. We additionally propose ZS-MSTM 2.0 to generate stylized facial gestures in addition to the upper-body gestures. We train ZS-MSTM 2.0 on an extension of the PATS corpus that includes dialogue acts and 2D facial landmarks.

Book chapters on the topic "Zero-Shot Style Transfer"

1

Huang, Yaoxiong, Mengchao He, Lianwen Jin, and Yongpan Wang. "RD-GAN: Few/Zero-Shot Chinese Character Style Transfer via Radical Decomposition and Rendering". In Computer Vision – ECCV 2020, 156–72. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58539-6_10.


Conference papers on the topic "Zero-Shot Style Transfer"

1

Lee, Sang-Hoon, Ha-Yeong Choi, Hyung-Seok Oh, and Seong-Whan Lee. "HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer". In INTERSPEECH 2023. ISCA: ISCA, 2023. http://dx.doi.org/10.21437/interspeech.2023-1608.

2

Tang, Hao, Songhua Liu, Tianwei Lin, Shaoli Huang, Fu Li, Dongliang He, and Xinchao Wang. "Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer". In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023. http://dx.doi.org/10.1109/cvpr52729.2023.01758.

3

Izumi, Kota, and Keiji Yanai. "Zero-Shot Font Style Transfer with a Differentiable Renderer". In MMAsia '22: ACM Multimedia Asia. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3551626.3564961.

4

Sun, Haochen, Lei Wu, Xiang Li, and Xiangxu Meng. "Style-woven Attention Network for Zero-shot Ink Wash Painting Style Transfer". In ICMR '22: International Conference on Multimedia Retrieval. New York, NY, USA: ACM, 2022. http://dx.doi.org/10.1145/3512527.3531391.

5

Liu, Kunhao, Fangneng Zhan, Yiwen Chen, Jiahui Zhang, Yingchen Yu, Abdulmotaleb El Saddik, Shijian Lu, and Eric Xing. "StyleRF: Zero-Shot 3D Style Transfer of Neural Radiance Fields". In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023. http://dx.doi.org/10.1109/cvpr52729.2023.00806.

6

Fares, Mireille, Catherine Pelachaud, and Nicolas Obin. "Zero-Shot Style Transfer for Multimodal Data-Driven Gesture Synthesis". In 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG). IEEE, 2023. http://dx.doi.org/10.1109/fg57933.2023.10042658.

7

Sheng, Lu, Ziyi Lin, Jing Shao, and Xiaogang Wang. "Avatar-Net: Multi-scale Zero-Shot Style Transfer by Feature Decoration". In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018. http://dx.doi.org/10.1109/cvpr.2018.00860.

8

Yang, Serin, Hyunmin Hwang, and Jong Chul Ye. "Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer". In 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023. http://dx.doi.org/10.1109/iccv51070.2023.02091.

9

Song, Kun, Yi Ren, Yi Lei, Chunfeng Wang, Kun Wei, Lei Xie, Xiang Yin, and Zejun Ma. "StyleS2ST: Zero-shot Style Transfer for Direct Speech-to-speech Translation". In INTERSPEECH 2023. ISCA: ISCA, 2023. http://dx.doi.org/10.21437/interspeech.2023-648.

10

Chen, Liyang, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, and Sheng Zhao. "VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer". In 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). IEEE, 2023. http://dx.doi.org/10.1109/iccvw60793.2023.00320.
