Journal articles on the topic 'Zero-shot style transfer'

Consult the top 50 journal articles for your research on the topic 'Zero-shot style transfer.'

1

Zhang, Yu, Rongjie Huang, Ruiqi Li, JinZheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, and Zhou Zhao. "StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 19597–605. http://dx.doi.org/10.1609/aaai.v38i17.29932.

Abstract:
Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expressiveness. Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two critical approaches for enhanced effectiveness: 1) the Residual Style Adaptor (RSA) which employs a residual quantization module to capture diverse style characteristics in singing voices, and 2) the Uncertainty Modeling Layer Normalization (UMLN) to perturb the style attributes within the content representation during the training phase and thus improve the model generalization. Our extensive evaluations in zero-shot style transfer undeniably establish that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples. Access to singing voice samples can be found at https://stylesinger.github.io/.
2

Xi, Jier, Xiufen Ye, and Chuanlong Li. "Sonar Image Target Detection Based on Style Transfer Learning and Random Shape of Noise under Zero Shot Target." Remote Sensing 14, no. 24 (December 10, 2022): 6260. http://dx.doi.org/10.3390/rs14246260.

Abstract:
With the development of sonar technology, sonar images have been widely used to detect targets. However, there are many challenges for sonar images in terms of object detection. For example, the detectable targets in the sonar data are more sparse than those in optical images, the real underwater scanning experiment is complicated, and the sonar image styles produced by different types of sonar equipment due to their different characteristics are inconsistent, which makes it difficult to use them for sonar object detection and recognition algorithms. In order to solve these problems, we propose a novel sonar image object-detection method based on style learning and random noise with various shapes. Sonar style target sample images are generated through style transfer, which enhances insufficient sonar objects image. By introducing various noise shapes, which included points, lines, and rectangles, the problems of mud and sand obstruction and a mutilated target in the real environment are solved, and the single poses of the sonar image target is improved by fusing multiple poses of optical image target. In the meantime, a method of feature enhancement is proposed to solve the issue of missing key features when using style transfer on optical images directly. The experimental results show that our method achieves better precision.
3

Wang, Wenjing, Jizheng Xu, Li Zhang, Yue Wang, and Jiaying Liu. "Consistent Video Style Transfer via Compound Regularization." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12233–40. http://dx.doi.org/10.1609/aaai.v34i07.6905.

Abstract:
Recently, neural style transfer has drawn much attention and significant progress has been made, especially for image style transfer. However, flexible and consistent style transfer for videos remains a challenging problem. Existing training strategies, either using a significant amount of video data with optical flows or introducing single-frame regularizers, have limited performance on real videos. In this paper, we propose a novel interpretation of temporal consistency, based on which we analyze the drawbacks of existing training strategies and then derive a new compound regularization. Experimental results show that the proposed regularization can better balance the spatial and temporal performance, which supports our modeling. Combining with the new cost formula, we design a zero-shot video style transfer framework. Moreover, for better feature migration, we introduce a new module to dynamically adjust inter-channel distributions. Quantitative and qualitative results demonstrate the superiority of our method over other state-of-the-art style transfer methods. Our project is publicly available at: https://daooshee.github.io/CompoundVST/.
4

Park, Jangkyoung, Ammar Ul Hassan, and Jaeyoung Choi. "CCFont: Component-Based Chinese Font Generation Model Using Generative Adversarial Networks (GANs)." Applied Sciences 12, no. 16 (August 10, 2022): 8005. http://dx.doi.org/10.3390/app12168005.

Abstract:
Font generation using deep learning has made considerable progress using image style transfer, but the automatic conversion/generation of Chinese characters still remains a difficult task owing to the complex character shape and large number of Chinese characters. Most known Chinese character generation models use the image conversion method of the Chinese character shape itself; however, it is difficult to reproduce complex Chinese characters. Recent methods have utilized character compositionality by separating up to three or four components to improve the quality of generated characters, but it is still difficult to generate high-quality results for complex Chinese characters with many components. In this study, we proposed the CCFont model (component-based Chinese font generation model using generative adversarial networks (GANs)) that automatically generates all Chinese characters using Chinese character components (up to 17 components). The CCFont model generates all Chinese characters in various styles using the components of Chinese characters based on conditional GAN. By acquiring local style information from the components, the information is more accurate and there is less information loss than when global information is obtained from the image of the entire character, reducing the failure of style conversion and improving quality to produce high-quality results. Additionally, the CCFont model generates high-quality results without any additional training (zero-shot font generation without any additional training) for the first-seen characters and styles. For example, the CCFont model, which was trained with only traditional Chinese (TC) characters, generates high-quality results for languages that can be divided into components, such as Korean and Thai, as well as simplified Chinese (SC) characters that are only seen during inference. 
CCFont can be adopted as a multi-lingual font-generation model that can be applied to all languages, which can be divided into components. To the best of our knowledge, the proposed method is the first to generate a zero-shot multilingual generation model using components. Qualitative and quantitative experiments were conducted to demonstrate the effectiveness of the proposed method.
5

Azizah, Kurniawati, and Wisnu Jatmiko. "Transfer Learning, Style Control, and Speaker Reconstruction Loss for Zero-Shot Multilingual Multi-Speaker Text-to-Speech on Low-Resource Languages." IEEE Access 10 (2022): 5895–911. http://dx.doi.org/10.1109/access.2022.3141200.

6

Yang, Zhenhua, Dezhi Peng, Yuxin Kong, Yuyi Zhang, Cong Yao, and Lianwen Jin. "FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 6603–11. http://dx.doi.org/10.1609/aaai.v38i7.28482.

Abstract:
Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory performance, they still struggle with complex characters and large style variations. To address these issues, we propose FontDiffuser, a diffusion-based image-to-image one-shot font generation method, which innovatively models the font imitation task as a noise-to-denoise paradigm. In our method, we introduce a Multi-scale Content Aggregation (MCA) block, which effectively combines global and local content cues across different scales, leading to enhanced preservation of intricate strokes of complex characters. Moreover, to better manage the large variations in style transfer, we propose a Style Contrastive Refinement (SCR) module, which is a novel structure for style representation learning. It utilizes a style extractor to disentangle styles from images, subsequently supervising the diffusion model via a meticulously designed style contrastive loss. Extensive experiments demonstrate FontDiffuser's state-of-the-art performance in generating diverse characters and styles. It consistently excels on complex characters and large style changes compared to previous methods. The code is available at https://github.com/yeungchenwa/FontDiffuser.
7

Cheng, Jikang, Zhen Han, Zhongyuan Wang, and Liang Chen. "“One-Shot” Super-Resolution via Backward Style Transfer for Fast High-Resolution Style Transfer." IEEE Signal Processing Letters 28 (2021): 1485–89. http://dx.doi.org/10.1109/lsp.2021.3098230.

8

Yu, Yong. "Few Shot POP Chinese Font Style Transfer using CycleGAN." Journal of Physics: Conference Series 2171, no. 1 (January 1, 2022): 012031. http://dx.doi.org/10.1088/1742-6596/2171/1/012031.

Abstract:
The new style design of Chinese fonts is an arduous task, because there are many types of commonly used Chinese characters and the composition of Chinese characters is complicated. Therefore, the style transfer of Chinese characters based on GAN has become a research hotspot in the past two years. This line of research is dedicated to using a small number of artificially designed new style fonts and learning the mapping from the source font style domain to the target style domain. However, such methods have two problems: 1. The performance on pop (point of purchase) fonts with exaggerated and random style is not satisfying. 2. Plentiful manually designed fonts are still required. In order to solve the above problems, we propose a few-shot font style transfer model based on CycleGAN. It uses meta-knowledge to reduce the use of manually designed fonts and enables each character to fully learn the knowledge contained in all new style fonts to achieve satisfying pop font style transfer effect. We also construct a dataset based on commonly used 3500 Chinese characters and verify the effectiveness of our model.
9

Zhu, Anna, Xiongbo Lu, Xiang Bai, Seiichi Uchida, Brian Kenji Iwana, and Shengwu Xiong. "Few-Shot Text Style Transfer via Deep Feature Similarity." IEEE Transactions on Image Processing 29 (2020): 6932–46. http://dx.doi.org/10.1109/tip.2020.2995062.

10

Feng, Wancheng, Yingchao Liu, Jiaming Pei, Wenxuan Liu, Chunpeng Tian, and Lukun Wang. "Local Consistency Guidance: Personalized Stylization Method of Face Video (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (March 24, 2024): 23486–87. http://dx.doi.org/10.1609/aaai.v38i21.30440.

Abstract:
Face video stylization aims to convert real face videos into specified reference styles. While one-shot methods perform well in single-image stylization, ensuring continuity between frames and retaining the original facial expressions present challenges in video stylization. To address these issues, our approach employs a personalized diffusion model with pixel-level control. We propose a Local Consistency Guidance (LCG) strategy, composed of local-cross attention and local style transfer, to ensure temporal consistency. This framework enables the synthesis of high-quality stylized face videos with excellent temporal continuity.
11

Cifka, Ondrej, Umut Simsekli, and Gael Richard. "Groove2Groove: One-Shot Music Style Transfer With Supervision From Synthetic Data." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2638–50. http://dx.doi.org/10.1109/taslp.2020.3019642.

12

Wu, Xinyi, Zhenyao Wu, Yuhang Lu, Lili Ju, and Song Wang. "Style Mixing and Patchwise Prototypical Matching for One-Shot Unsupervised Domain Adaptive Semantic Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2740–49. http://dx.doi.org/10.1609/aaai.v36i3.20177.

Abstract:
In this paper, we tackle the problem of one-shot unsupervised domain adaptation (OSUDA) for semantic segmentation where the segmentors only see one unlabeled target image during training. In this case, traditional unsupervised domain adaptation models usually fail since they cannot adapt to the target domain with over-fitting to one (or few) target samples. To address this problem, existing OSUDA methods usually integrate a style-transfer module to perform domain randomization based on the unlabeled target sample, with which multiple domains around the target sample can be explored during training. However, such a style-transfer module relies on an additional set of images as style reference for pre-training and also increases the memory demand for domain adaptation. Here we propose a new OSUDA method that can effectively relieve such computational burden. Specifically, we integrate several style-mixing layers into the segmentor which play the role of style-transfer module to stylize the source images without introducing any learned parameters. Moreover, we propose a patchwise prototypical matching (PPM) method to weighted consider the importance of source pixels during the supervised training to relieve the negative adaptation. Experimental results show that our method achieves new state-of-the-art performance on two commonly used benchmarks for domain adaptive semantic segmentation under the one-shot setting and is more efficient than all comparison approaches.
13

Ibrahim, Bekkouch Imad Eddine, Victoria Eyharabide, Valérie Le Page, and Frédéric Billiet. "Few-Shot Object Detection: Application to Medieval Musicological Studies." Journal of Imaging 8, no. 2 (January 19, 2022): 18. http://dx.doi.org/10.3390/jimaging8020018.

Abstract:
Detecting objects with a small representation in images is a challenging task, especially when the style of the images is very different from recent photos, which is the case for cultural heritage datasets. This problem is commonly known as few-shot object detection and is still a new field of research. This article presents a simple and effective method for black box few-shot object detection that works with all the current state-of-the-art object detection models. We also present a new dataset called MMSD for medieval musicological studies that contains five classes and 693 samples, manually annotated by a group of musicology experts. Due to the significant diversity of styles and considerable disparities between the artistic representations of the objects, our dataset is more challenging than the current standards. We evaluate our method on YOLOv4 (m/s), (Mask/Faster) RCNN, and ViT/Swin-t. We present two methods of benchmarking these models based on the overall data size and the worst-case scenario for object detection. The experimental results show that our method always improves object detector results compared to traditional transfer learning, regardless of the underlying architecture.
14

Yang, Ze, Yali Wang, Xianyu Chen, Jianzhuang Liu, and Yu Qiao. "Context-Transformer: Tackling Object Confusion for Few-Shot Detection." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 12653–60. http://dx.doi.org/10.1609/aaai.v34i07.6957.

Abstract:
Few-shot object detection is a challenging but realistic scenario, where only a few annotated training images are available for training detectors. A popular approach to handle this problem is transfer learning, i.e., fine-tuning a detector pretrained on a source-domain benchmark. However, such transferred detector often fails to recognize new objects in the target domain, due to low data diversity of training samples. To tackle this problem, we propose a novel Context-Transformer within a concise deep transfer framework. Specifically, Context-Transformer can effectively leverage source-domain object knowledge as guidance, and automatically exploit contexts from only a few training images in the target domain. Subsequently, it can adaptively integrate these relational clues to enhance the discriminative power of detector, in order to reduce object confusion in few-shot scenarios. Moreover, Context-Transformer is flexibly embedded in the popular SSD-style detectors, which makes it a plug-and-play module for end-to-end few-shot learning. Finally, we evaluate Context-Transformer on the challenging settings of few-shot detection and incremental few-shot detection. The experimental results show that, our framework outperforms the recent state-of-the-art approaches.
15

Weng, Shao-En, Hong-Han Shuai, and Wen-Huang Cheng. "Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 13718–26. http://dx.doi.org/10.1609/aaai.v37i11.26607.

Abstract:
Often a face has a voice. Appearance sometimes has a strong relationship with one's voice. In this work, we study how a face can be converted to a voice, which is a face-based voice conversion. Since there is no clean dataset that contains face and speech, voice conversion faces difficult learning and low-quality problems caused by background noise or echo. Too much redundant information for face-to-voice also causes synthesis of a general style of speech. Furthermore, previous work tried to disentangle speech with bottleneck adjustment. However, it is hard to decide on the size of the bottleneck. Therefore, we propose a bottleneck-free strategy for speech disentanglement. To avoid synthesizing the general style of speech, we utilize framewise facial embedding. It applied adversarial learning with a multi-scale discriminator for the model to achieve better quality. In addition, the self-attention module is added to focus on content-related features for in-the-wild data. Quantitative experiments show that our method outperforms previous work.
16

Dou, Zi-Yi, and Nanyun Peng. "Zero-Shot Commonsense Question Answering with Cloze Translation and Consistency Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10572–80. http://dx.doi.org/10.1609/aaai.v36i10.21301.

Abstract:
Commonsense question answering (CQA) aims to test if models can answer questions regarding commonsense knowledge that everyone knows. Prior works that incorporate external knowledge bases have shown promising results, but knowledge bases are expensive to construct and are often limited to a fixed set of relations. In this paper, we instead focus on better utilizing the implicit knowledge stored in pre-trained language models. While researchers have found that the knowledge embedded in pre-trained language models can be extracted by having them fill in the blanks of carefully designed prompts for relation extraction and text classification, it remains unclear if we can adopt this paradigm in CQA where the inputs and outputs take much more flexible forms. To this end, we investigate four translation methods that can translate natural questions into cloze-style sentences to better solicit commonsense knowledge from language models, including a syntactic-based model, an unsupervised neural model, and two supervised neural models. In addition, to combine the different translation methods, we propose to encourage consistency among model predictions on different translated questions with unlabeled data. We demonstrate the effectiveness of our methods on three CQA datasets in zero-shot settings. We show that our methods are complementary to a knowledge base improved model, and combining them can lead to state-of-the-art zero-shot performance. Analyses also reveal distinct characteristics of the different cloze translation methods and provide insights on why combining them can lead to great improvements. Code/dataset is available at https://github.com/PlusLabNLP/zero_shot_cqa.
17

Men, Yifang, Yuan Yao, Miaomiao Cui, Zhouhui Lian, and Xuansong Xie. "DCT-net." ACM Transactions on Graphics 41, no. 4 (July 2022): 1–9. http://dx.doi.org/10.1145/3528223.3530159.

Abstract:
This paper introduces DCT-Net, a novel image translation architecture for few-shot portrait stylization. Given limited style exemplars (~100), the new architecture can produce high-quality style transfer results with advanced ability to synthesize high-fidelity contents and strong generality to handle complicated scenes (e.g., occlusions and accessories). Moreover, it enables full-body image translation via one elegant evaluation network trained by partial observations (i.e., stylized heads). Few-shot learning based style transfer is challenging since the learned model can easily become overfitted in the target domain, due to the biased distribution formed by only a few training examples. This paper aims to handle the challenge by adopting the key idea of "calibration first, translation later" and exploring the augmented global structure with locally-focused translation. Specifically, the proposed DCT-Net consists of three modules: a content adapter borrowing the powerful prior from source photos to calibrate the content distribution of target samples; a geometry expansion module using affine transformations to release spatially semantic constraints; and a texture translation module leveraging samples produced by the calibrated distribution to learn a fine-grained conversion. Experimental results demonstrate the proposed method's superiority over the state of the art in head stylization and its effectiveness on full image translation with adaptive deformations. Our code is publicly available at https://github.com/menyifang/DCT-Net.
18

Wang, Xin, Jiawei Wu, Da Zhang, Yu Su, and William Yang Wang. "Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8965–72. http://dx.doi.org/10.1609/aaai.v33i01.33018965.

Abstract:
Although promising results have been achieved in video captioning, existing models are limited to the fixed inventory of activities in the training corpus, and do not generalize to open vocabulary scenarios. Here we introduce a novel task, zero-shot video captioning, that aims at describing out-of-domain videos of unseen activities. Videos of different activities usually require different captioning strategies in many aspects, e.g., word selection, semantic construction, and style expression, which poses a great challenge to depict novel activities without paired training data. But meanwhile, similar activities share some of those aspects in common. Therefore, we propose a principled Topic-Aware Mixture of Experts (TAMoE) model for zero-shot video captioning, which learns to compose different experts based on different topic embeddings, implicitly transferring the knowledge learned from seen activities to unseen ones. Besides, we leverage external topic-related text corpus to construct the topic embedding for each activity, which embodies the most relevant semantic vectors within the topic. Empirical results not only validate the effectiveness of our method in utilizing semantic knowledge for video captioning, but also show its strong generalization ability when describing novel activities.
19

Zhang, Chao, Hongbin Dong, and Baosong Deng. "Improving Pre-Training and Fine-Tuning for Few-Shot SAR Automatic Target Recognition." Remote Sensing 15, no. 6 (March 22, 2023): 1709. http://dx.doi.org/10.3390/rs15061709.

Abstract:
SAR-ATR (synthetic aperture radar-automatic target recognition) is a hot topic in remote sensing. This work suggests a few-shot target recognition approach (FTL) based on the concept of transfer learning to accomplish accurate target recognition of SAR images in a few-shot scenario since the classic SAR ATR method has significant data reliance. At the same time, the strategy introduces a model distillation method to improve the model’s performance further. This method is composed of three parts. First, the data engine, which uses the style conversion model and optical image data to generate image data similar to SAR style and realize cross-domain conversion, can effectively solve the problem of insufficient training data of the SAR image classification model. Second is model training, which uses SAR image data sets to pre-train the model. Here, we introduce the deep Brownian distance covariance (Deep BDC) pooling layer to optimize the image feature representation so that the model can learn the image representation by measuring the difference between the joint feature function of the embedded feature and the edge product. Third, model fine-tuning, which freezes the model structure, except the classifier, and fine-tunes it by using a small amount of novel data. The knowledge distillation approach is also introduced simultaneously to train the model repeatedly, sharpen the knowledge, and enhance model performance. According to experimental results on the MSTAR benchmark dataset, the proposed method is demonstrably better than the SOTA method in the few-shot SAR ATR issue. The recognition accuracy is about 80% in the case of 10-way 10-shot.
20

Pham Ngoc, Phuong, Chung Tran Quang, and Mai Luong Chi. "ADAPT-TTS: HIGH-QUALITY ZERO-SHOT MULTI-SPEAKER TEXT-TO-SPEECH ADAPTIVE-BASED FOR VIETNAMESE." Journal of Computer Science and Cybernetics 39, no. 2 (June 12, 2023): 159–73. http://dx.doi.org/10.15625/1813-9663/18136.

Abstract:
Current adaptive-based speech synthesis techniques are based on two main streams: 1. Fine-tuning the model using small amounts of adaptive data, and 2. Conditionally training the entire model through a speaker embedding of the target speaker. However, both of these methods require adaptive data to appear during training, which makes the training cost to generate new voices quite expensive. In addition, the traditional TTS model uses a simple loss function to reproduce the acoustic features. However, this optimization is based on incorrect distribution assumptions leading to noisy composite audio results. We introduce the Adapt-TTS model that allows high-quality audio synthesis from a small adaptive sample without training to solve these problems. Key recommendations: 1. The Extracting Mel-vector (EMV) architecture allows for a better representation of speaker characteristics and speech style; 2. An improved zero-shot model with a denoising diffusion model (Mel-spectrogram denoiser) component allows for new voice synthesis without training with better quality (less noise). The evaluation results have proven the model's effectiveness: when only needing a single utterance (1-3 seconds) of the reference speaker, the synthesis system gave high-quality synthesis results and achieved high similarity.
21

Yao, Mingshuai, Yabo Zhang, Xianhui Lin, Xiaoming Li, and Wangmeng Zuo. "VQ-FONT: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 15 (March 24, 2024): 16407–15. http://dx.doi.org/10.1609/aaai.v38i15.29577.

Abstract:
Few-shot font generation is challenging, as it needs to capture the fine-grained stroke styles from a limited set of reference glyphs, and then transfer to other characters, which are expected to have similar styles. However, due to the diversity and complexity of Chinese font styles, the synthesized glyphs of existing methods usually exhibit visible artifacts, such as missing details and distorted strokes. In this paper, we propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement. Specifically, we pre-train a VQGAN to encapsulate font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes. Furthermore, our VQ-Font leverages the inherent design of Chinese characters, where structure components such as radicals and character components are combined in specific arrangements, to recalibrate fine-grained styles based on references. This process improves the matching and fusion of styles at the structure level. Both modules collaborate to enhance the fidelity of the generated fonts. Experiments on a collected font dataset show that our VQ-Font outperforms the competing methods both quantitatively and qualitatively, especially in generating challenging styles. Our code is available at https://github.com/Yaomingshuai/VQ-Font.
22

V, Sandeep Kumar, Hari Kishore R, Guru Prasadh M, and Divakar R. "ONE SHOT FACE STYLIZATION USING GANS." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 07, no. 10 (October 1, 2023): 1–11. http://dx.doi.org/10.55041/ijsrem26061.

Abstract:
One-shot face stylization is an interesting and challenging subject in computer vision and deep learning. This work deals with the art of manipulating a target face using a reference image as inspiration, which requires controlling facial recognition while specifying important style characteristics. This project has attracted a lot of interest due to its potential applications in digital art, entertainment, and personal products. In this abstract, we examine the important features of a one-shot face stylization. Deep neural networks, especially generative adversarial networks (GANs), are widely used in the process to generate customized facial images. These networks are trained on data structures that combine the target and reference faces, with the reference image acting as a strategic identifier. The success of a one-shot facial lies in the meticulous execution of the fading process, which strikes a balance between preserving identity and improving technique. These disadvantages typically include a combination of manpower retention, strategic formation, emotional quality, and enemy training. In conclusion, advances in this field have the potential to transform creative expression and personalization across industries from digital art and animation to virtual avatars and social media filters. Key Words: Facial recognition, generative adversarial networks, virtual avatar, image to image transfer.
23

Wang, Suzhen, Lincheng Li, Yu Ding, and Xin Yu. "One-Shot Talking Face Generation from Single-Speaker Audio-Visual Correlation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2531–39. http://dx.doi.org/10.1609/aaai.v36i3.20154.

Abstract:
Audio-driven one-shot talking face generation methods are usually trained on video resources of various persons. However, their created videos often suffer unnatural mouth shapes and asynchronous lips because those methods struggle to learn a consistent speech style from different speakers. We observe that it would be much easier to learn a consistent speech style from a specific speaker, which leads to authentic mouth movements. Hence, we propose a novel one-shot talking face generation framework by exploring consistent correlations between audio and visual motions from a specific speaker and then transferring audio-driven motion fields to a reference image. Specifically, we develop an Audio-Visual Correlation Transformer (AVCT) that aims to infer talking motions represented by keypoint based dense motion fields from an input audio. In particular, considering audio may come from different identities in deployment, we incorporate phonemes to represent audio signals. In this manner, our AVCT can inherently generalize to audio spoken by other identities. Moreover, as face keypoints are used to represent speakers, AVCT is agnostic against appearances of the training speaker, and thus allows us to manipulate face images of different identities readily. Considering different face shapes lead to different motions, a motion field transfer module is exploited to reduce the audio-driven dense motion field gap between the training identity and the one-shot reference. Once we obtained the dense motion field of the reference image, we employ an image renderer to generate its talking face videos from an audio clip. Thanks to our learned consistent speaking style, our method generates authentic mouth shapes and vivid movements. Extensive experiments demonstrate that our synthesized videos outperform the state-of-the-art in terms of visual quality and lip-sync.
24

Guo, Wei, Yuqi Zhang, De Ma, and Qian Zheng. "Learning to Manipulate Artistic Images." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 1994–2002. http://dx.doi.org/10.1609/aaai.v38i3.27970.

Abstract:
Recent advancement in computer vision has significantly lowered the barriers to artistic creation. Exemplar-based image translation methods have attracted much attention due to flexibility and controllability. However, these methods hold assumptions regarding semantics or require semantic information as the input, while accurate semantics is not easy to obtain in artistic images. Besides, these methods suffer from cross-domain artifacts due to training data prior and generate imprecise structure due to feature compression in the spatial domain. In this paper, we propose an arbitrary Style Image Manipulation Network (SIM-Net), which leverages semantic-free information as guidance and a region transportation strategy in a self-supervised manner for image generation. Our method balances computational efficiency and high resolution to a certain extent. Moreover, our method facilitates zero-shot style image manipulation. Both qualitative and quantitative experiments demonstrate the superiority of our method over state-of-the-art methods. Code is available at https://github.com/SnailForce/SIM-Net.
25

Lee, Suhyeon, Junhyuk Hyun, Hongje Seong, and Euntai Kim. "Unsupervised Domain Adaptation for Semantic Segmentation by Content Transfer." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 8306–15. http://dx.doi.org/10.1609/aaai.v35i9.17010.

Abstract:
In this paper, we tackle the unsupervised domain adaptation (UDA) for semantic segmentation, which aims to segment the unlabeled real data using labeled synthetic data. The main problem of UDA for semantic segmentation relies on reducing the domain gap between the real image and synthetic image. To solve this problem, we focused on separating information in an image into content and style. Here, only the content has cues for semantic segmentation, and the style makes the domain gap. Thus, precise separation of content and style in an image leads to effect as supervision of real data even when learning with synthetic data. To make the best of this effect, we propose a zero-style loss. Even though we perfectly extract content for semantic segmentation in the real domain, another main challenge, the class imbalance problem, still exists in UDA for semantic segmentation. We address this problem by transferring the contents of tail classes from synthetic to real domain. Experimental results show that the proposed method achieves the state-of-the-art performance in semantic segmentation on the major two UDA settings.
26

Song, Yun-Zhu, Yi-Syuan Chen, Lu Wang, and Hong-Han Shuai. "General then Personal: Decoupling and Pre-training for Personalized Headline Generation." Transactions of the Association for Computational Linguistics 11 (2023): 1588–607. http://dx.doi.org/10.1162/tacl_a_00621.

Abstract:
Personalized Headline Generation aims to generate unique headlines tailored to users’ browsing history. In this task, understanding user preferences from click history and incorporating them into headline generation pose challenges. Existing approaches typically rely on predefined styles as control codes, but personal style lacks explicit definition or enumeration, making it difficult to leverage traditional techniques. To tackle these challenges, we propose General Then Personal (GTP), a novel framework comprising user modeling, headline generation, and customization. We train the framework using tailored designs that emphasize two central ideas: (a) task decoupling and (b) model pre-training. With the decoupling mechanism separating the task into generation and customization, two mechanisms, i.e., information self-boosting and mask user modeling, are further introduced to facilitate the training and text control. Additionally, we introduce a new evaluation metric to address existing limitations. Extensive experiments conducted on the PENS dataset, considering both zero-shot and few-shot scenarios, demonstrate that GTP outperforms state-of-the-art methods. Furthermore, ablation studies and analysis emphasize the significance of decoupling and pre-training. Finally, the human evaluation validates the effectiveness of our approaches.
27

Gong, Rui, Dengxin Dai, Yuhua Chen, Wen Li, Danda Pani Paudel, and Luc Van Gool. "Analogical Image Translation for Fog Generation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1433–41. http://dx.doi.org/10.1609/aaai.v35i2.16233.

Abstract:
Image-to-image translation is to map images from a given style to another given style. While exceptionally successful, current methods assume the availability of training images in both source and target domains, which does not always hold in practice. Inspired by humans' reasoning capability of analogy, we propose analogical image translation (AIT) that exploit the concept of gist, for the first time. Given images of two styles in the source domain: A and A', along with images B of the first style in the target domain, learn a model to translate B to B' in the target domain, such that A:A' :: B:B'. AIT is especially useful for translation scenarios in which training data of one style is hard to obtain but training data of the same two styles in another domain is available. For instance, in the case from normal conditions to extreme, rare conditions, obtaining real training images for the latter case is challenging. However, obtaining synthetic data for both cases is relatively easy. In this work, we aim at adding adverse weather effects, more specifically fog, to images taken in clear weather. To circumvent the challenge of collecting real foggy images, AIT learns the gist of translating synthetic clear-weather to foggy images, followed by adding fog effects onto real clear-weather images, without ever seeing any real foggy image. AIT achieves zero-shot image translation capability, whose effectiveness and benefit are demonstrated by the downstream task of semantic foggy scene understanding.
28

Wang, Dingmin, Qiuyuan Huang, Matthew Jackson, and Jianfeng Gao. "Retrieve What You Need: A Mutual Learning Framework for Open-domain Question Answering." Transactions of the Association for Computational Linguistics 12 (2024): 247–63. http://dx.doi.org/10.1162/tacl_a_00646.

Abstract:
An open-domain question answering (QA) system usually follows a retrieve-then-read paradigm, in which a retriever is used to retrieve relevant passages from a large corpus, and then a reader generates answers based on the retrieved passages and the original question. In this paper, we propose a simple and novel mutual learning framework to improve the performance of retrieve-then-read-style models via an intermediate module named the knowledge selector, which we train with reinforcement learning. The key benefits of our proposed intermediate module are: 1) no requirement for additional annotated question-passage pairs; 2) improvements in both retrieval and QA performance, as well as computational efficiency, compared to prior competitive retrieve-then-read models; 3) with no finetuning, improvement in the zero-shot performance of large-scale pre-trained language models, e.g., ChatGPT, by encapsulating the input with relevant knowledge without violating the input length constraint.
29

Kumar, Vinod K. C., Thamer A. Altaim, Shenbaga Sundaram Subramanian, Shadi Abdelbaset Alkhob, Pradeep Reddy, M. B. S. Anusha, Naresh Bhaskar Raj, P. Senthi, and Riziq Allah Mustafa Gaowgzeh. "Effect of lower body, core and upper body kinematic chain exercise protocol on throwing performance among university shot put athletes: A pilot study." Fizjoterapia Polska 23, no. 3 (July 31, 2023): 108–15. http://dx.doi.org/10.56984/8zg143r1m.

Abstract:
A coordinated sequence of movements is required to generate maximum power and velocity in shot put. Kinematic chains emphasize the interactions between various body segments during a movement. They suggest that force production and transfer are optimized by coordinating multiple joints and muscle groups. In previous research, the kinematic chain has been attributed to shot put performance. Few studies have examined the effects of a comprehensive kinematic chain exercise protocol on throwing performance among shot put athletes, particularly at universities. This pilot study investigated the effect of a lower body, core, and upper body kinematic chain exercise protocol on university shot put athletes' throwing performance. A total of twenty-four young athletes specializing in shot put, with an average age of 19.87 years and a standard deviation of 1.31 years, were randomly assigned to two groups, the experimental group and the control group. The experimental group, consisting of 12 participants, underwent an 8-week kinematic chain training program alongside their regular training sessions. The control group, also consisting of 12 participants, only participated in their regular training sessions without any additional intervention. Pre- and post-training assessments were conducted to measure shot put throwing performance, preference for throwing style, and the participants' satisfaction with the exercise protocol, using a questionnaire. The athletes who took part in the kinematic chain program demonstrated a significant improvement in throwing distance compared to the control group (p = 0.01). Additionally, the athletes in the experimental group reported higher levels of satisfaction with the exercise protocol (p = 0.005). These findings indicate that incorporating an 8-week lower body, core, and upper body kinematic chain exercise protocol into regular training sessions can lead to more pronounced improvements in sport-specific throwing performance among young shot put athletes.
30

Fittall, A. M., and R. G. Cowley. "THE HV11 3-D SEISMIC SURVEY: SKUA – SWIFT AREA GEOLOGY REVEALED?" APPEA Journal 32, no. 1 (1992): 159. http://dx.doi.org/10.1071/aj91013.

Abstract:
The 4630 km of HV11 3-D seismic survey data, shot over the Skua and Swift fault blocks in Timor Sea licence AC/L4, reveals details of Tithonian faulting not evident previously. The HV11 survey provided 10 times the data density of previous coverage and significantly improved data quality through the recording of lower frequencies and use of accurate navigation systems and high resolution processing parameters. Tithonian faulting is revealed as a series of northeast-trending en echelon faults overprinting a deeper, north-northeastern, possibly latest Triassic, trend which defines the major fault block boundaries. Transfer of fault throw between en echelon segments appears to be by strike ramps with no evidence for cross-cutting transfer faults. Skua Field fault geometries preclude Upper Jurassic, right lateral strike-slip tectonics. Semi-regional fault trends also have an en echelon style with transfer of fault throw by strike ramps. Escarpments developed along the Tithonian faults are also evident on the HV11 data. The direction of Tithonian extension is interpreted to be oblique to the deeper fault trend, giving rise to the en echelon Tithonian fault style. Each en echelon segment appears to control a hydrocarbon accumulation, which may be due to fault-independent drape over palaeotopographic relief. En echelon Miocene faulting, incisement and depositional mounding in the Puffin Formation are also detailed by the HV11 seismic data. The HV11 survey demonstrates the value of acquisition of 3-D seismic data as an exploration tool in an area of complex and subtle structural geology.
31

Zaitsu, Wataru, Mingzhe Jin, Shunichi Ishihara, Satoru Tsuge, and Mitsuyuki Inaba. "Can we spot fake public comments generated by ChatGPT(-3.5, -4)?: Japanese stylometric analysis expose emulation created by one-shot learning." PLOS ONE 19, no. 3 (March 13, 2024): e0299031. http://dx.doi.org/10.1371/journal.pone.0299031.

Abstract:
Public comments are an important source of civic opinion when the government establishes rules. However, recent AI can easily generate large quantities of disinformation, including fake public comments. We attempted to distinguish between human public comments and ChatGPT-generated public comments (including those in which ChatGPT emulated human comments) using Japanese stylometric analysis. Study 1 conducted multidimensional scaling (MDS) to compare 500 texts of five classes: human public comments, public comments generated by GPT-3.5 and GPT-4 when presented only with the titles of human public comments (i.e., zero-shot learning, GPTzero), and public comments generated by GPT-3.5 and GPT-4 when presented with sentences of human public comments and instructed to emulate them (i.e., one-shot learning, GPTone). The MDS results showed that the Japanese stylometric features of the public comments were completely different from those of the GPTzero-generated texts. Moreover, GPTone-generated public comments were closer to those of humans than those generated by GPTzero. Study 2 examined the performance of the random forest (RF) classifier in distinguishing three classes (human, GPTzero, and GPTone texts). RF classifiers showed the best precision for the human public comments of approximately 90%, and the best precision for the fake public comments generated by GPT (GPTzero and GPTone) was 99.5% by focusing on integrated next writing style features: phrase patterns, parts-of-speech (POS) bigram and trigram, and function words. Therefore, the current study concluded that we could discriminate between GPT-generated fake public comments and those written by humans at the present time.
32

Bao, Yuyan, Guannan Wei, Oliver Bračevac, Yuxuan Jiang, Qiyang He, and Tiark Rompf. "Reachability types: tracking aliasing and separation in higher-order functional programs." Proceedings of the ACM on Programming Languages 5, OOPSLA (October 20, 2021): 1–32. http://dx.doi.org/10.1145/3485516.

Abstract:
Ownership type systems, based on the idea of enforcing unique access paths, have been primarily focused on objects and top-level classes. However, existing models do not as readily reflect the finer aspects of nested lexical scopes, capturing, or escaping closures in higher-order functional programming patterns, which are increasingly adopted even in mainstream object-oriented languages. We present a new type system, λ*, which enables expressive ownership-style reasoning across higher-order functions. It tracks sharing and separation through reachability sets, and layers additional mechanisms for selectively enforcing uniqueness on top of it. Based on reachability sets, we extend the type system with an expressive flow-sensitive effect system, which enables flavors of move semantics and ownership transfer. In addition, we present several case studies and extensions, including applications to capabilities for algebraic effects, one-shot continuations, and safe parallelization.
33

Moyse Ferreira, Lucy. "Colour, movement and modernity in Sonia Delaunay’s (1926) fashion film." Journal of Visual Culture 19, no. 3 (December 2020): 391–404. http://dx.doi.org/10.1177/1470412920965997.

Abstract:
Sonia Delaunay is best known for her abstract and colourful style which is manifested across her artwork, fashion, textile and interior designs alike. In 1926, this culminated in a fashion film, titled ‘L’Elégance’. Shot using the Keller-Dorian colour process, the film features a succession of Delaunay’s simultaneous fashion and textile designs. This article explores the implications and origins of the film, considering technological, cultural and social factors. It focuses on the themes of colour and movement that were essential both to the film and Delaunay’s philosophy at large and were strongly related by Delaunay and her contemporaries to modernity and women’s liberation. The author positions Delaunay’s earlier work as a form of proto-cinema and demonstrates how her film uniquely transfers aesthetic and cultural themes. She explains why fashion film was a necessary step for Delaunay and how she employed notions of temporality to further fashion film as an avant-garde form.
34

Su, Kun, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, et al. "V2Meow: Meowing to the Visual Beat via Video-to-Music Generation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4952–60. http://dx.doi.org/10.1609/aaai.v38i5.28299.

Abstract:
Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of video-acoustic signatures has been confined to specific visual scenarios. In contrast, our research confronts the challenge of learning globally aligned signatures between video and music directly from paired music and videos, without explicitly modeling domain-specific rhythmic or semantic relationships. We propose V2Meow, a video-to-music generation system capable of producing high-quality music audio for a diverse range of video input types using a multi-stage autoregressive model. Trained on 5k hours of music audio clips paired with video frames mined from in-the-wild music videos, V2Meow is competitive with previous domain-specific models when evaluated in a zero-shot manner. It synthesizes high-fidelity music audio waveforms solely by conditioning on pre-trained general-purpose visual features extracted from video frames, with optional style control via text prompts. Through both qualitative and quantitative evaluations, we demonstrate that our model outperforms various existing music generation systems in terms of visual-audio correspondence and audio quality. Music samples are available at tinyurl.com/v2meow.
35

Li, Yundong, Yi Liu, Han Dong, Wei Hu, and Chen Lin. "Intrusion detection of railway clearance from infrared images using generative adversarial networks." Journal of Intelligent & Fuzzy Systems 40, no. 3 (March 2, 2021): 3931–43. http://dx.doi.org/10.3233/jifs-192141.

Abstract:
The intrusion detection of railway clearance is crucial for avoiding railway accidents caused by the invasion of abnormal objects, such as pedestrians, falling rocks, and animals. However, detecting intrusions using deep learning methods from infrared images captured at night remains a challenging task because of the lack of sufficient training samples. To address this issue, a transfer strategy that migrates daytime RGB images to the nighttime style of infrared images is proposed in this study. The proposed method consists of two stages. In the first stage, a data generation model is trained on the basis of generative adversarial networks using RGB images and a small number of infrared images, and then, synthetic samples are generated using a well-trained model. In the second stage, a single shot multibox detector (SSD) model is trained using synthetic data and utilized to detect abnormal objects from infrared images at nighttime. To validate the effectiveness of the proposed method, two groups of experiments, namely, railway and non-railway scenes, are conducted. Experimental results demonstrate the effectiveness of the proposed method, and an improvement of 17.8% is achieved for object detection at nighttime.
36

DAS, JAYITA, SYED M. ALAM, and SANJUKTA BHANJA. "RECENT TRENDS IN SPINTRONICS-BASED NANOMAGNETIC LOGIC." SPIN 04, no. 03 (September 2014): 1450004. http://dx.doi.org/10.1142/s2010324714500040.

Abstract:
With the growing concerns of standby power in sub-100-nm CMOS technologies, alternative computing techniques and memory technologies are explored. Spin transfer torque magnetoresistive RAM (STT-MRAM) is one such nonvolatile memory relying on magnetic tunnel junctions (MTJs) to store information. It uses spin transfer torque to write information and magnetoresistance to read information. In 2012, Everspin Technologies, Inc. commercialized the first 64Mbit Spin Torque MRAM. On the computing end, nanomagnetic logic (NML) is a promising technique with zero leakage and high data retention. In 2000, Cowburn and Welland first demonstrated its potential in logic and information propagation through magnetostatic interaction in a chain of single domain circular nanomagnetic dots of Supermalloy (Ni80Fe14Mo5X1, X is other metals). In 2006, Imre et al. demonstrated wires and majority gates followed by coplanar cross wire systems demonstration in 2010 by Pulecio et al. Since 2004 researchers have also investigated the potential of MTJs in logic. More recently with dipolar coupling between MTJs demonstrated in 2012, logic-in-memory architecture with STT-MRAM have been investigated. The architecture borrows the computing concept from NML and read and write style from MRAM. The architecture can switch its operation between logic and memory modes with clock as classifier. Further through logic partitioning between MTJ and CMOS plane, a significant performance boost has been observed in basic computing blocks within the architecture. In this work, we have explored the developments in NML, in MTJs and more recent developments in hybrid MTJ/CMOS logic-in-memory architecture and its unique logic partitioning capability.
37

Lee, Shih-Hsiung, and Hung-Chun Chen. "U-SSD: Improved SSD Based on U-Net Architecture for End-to-End Table Detection in Document Images." Applied Sciences 11, no. 23 (December 2, 2021): 11446. http://dx.doi.org/10.3390/app112311446.

Abstract:
Tables are an important element in a document and can express more information with fewer words. Due to the different arrangements of tables and texts, as well as the variety of layouts, table detection is a challenge in the field of document analysis. Nowadays, as Optical Character Recognition technology has gradually matured, it can help us to obtain text information quickly, and the ability to accurately detect table structures can improve the efficiency of obtaining text content. The process of document digitization is influenced by the editor’s style on the table layout. In addition, many industries rely on a large number of people to process data, which is expensive; thus, the industry is adopting artificial intelligence and Robotic Process Automation to handle simple and complicated routine text digitization work. Therefore, this paper proposes an end-to-end table detection model, U-SSD, based on deep-learning object detection: it takes the Single Shot MultiBox Detector (SSD) as the basic model architecture, improves it with U-Net, and adds dilated convolution to enhance the feature learning capability of the network. The experiment in this study uses a dataset of accident claim documents, provided by a Taiwanese law firm, to conduct table detection. The experimental results show that the proposed method is effective. In addition, the results of the evaluation on the open datasets TableBank, Github, and ICDAR13 show that SSD-based network architectures can achieve good performance.
38

Shinde, Sandesh. "Human Motion Imitation using Generative Adversarial Networks." International Journal for Research in Applied Science and Engineering Technology 10, no. 4 (April 30, 2022): 218–21. http://dx.doi.org/10.22214/ijraset.2022.41041.

Abstract:
Within a unified framework, we handle human image synthesis, including human motion imitation, appearance transfer, and new view synthesis. It indicates that after the model has been trained, it can do all of these jobs. To estimate the human body structure, existing task-specific techniques mostly employ 2D key-points (position). However, they can only represent location data and have no ability to define the person's unique shape or simulate limb rotations. To untangle the position and form, we suggest using a 3D body mesh recovery module in this study. It may define the customized body form as well as model joint placement and rotation. We present a Liquid Warping GAN technique that propagates source information in both image and feature spaces to the synthesized reference in order to retain source information such as texture, style, colour, and face identity. A denoising convolutional auto-encoder extracts the source characteristics in order to accurately characterize the source identity. In addition, our approach allows for more flexible warping from many sources. A one/few-shot adversarial learning is used to increase the generalization capacity of the unseen source pictures. In particular, it begins by putting a model through a rigorous training process. The model is then fine-tuned in a self-supervised manner by using one/few unseen images to create high-resolution (512x512 and 1024x1024) outputs. In addition, we created the imitation dataset to assess human motion imitation and unique view synthesis. Extensive testing has shown that our approaches work better in retaining facial identification, form consistency, and outfit details. Keywords: Human Image Synthesis, Motion Imitation, Novel View Synthesis, Generative Adversarial Network
39

Vorobyova, Anna E., Inna E. Fedyunina, and Ekaterina A. Vinogradova. "LINGUISTIC AND MENTAL ASPECTS OF THE TRANSLATION OF OFFICIAL TEXTS (BASED ON THE OFFICIAL DOCUMENTS OF PRESS SERVICES AND NEWS PORTALS)." Sovremennye issledovaniya sotsialnykh problem 14, no. 4 (December 29, 2022): 372–87. http://dx.doi.org/10.12731/2077-1770-2022-14-4-372-387.

Abstract:
Background. The translation process is to be considered not merely as a cross-language, but as a cross-cultural interaction as well, since two language systems and two cultures operate simultaneously. The translator’s mission is to adequately transfer the source code of information into the translation language and adapt the national and cultural features of the original text. This involves mental processing of the underlying message implications and presuppositions. Strategies and methods of translation of official documents are determined not only by their linguistic characteristics, but also by their national-specific features. The above-mentioned aspects contribute to the background of the study. Purpose. The article analyzes the lexical and syntactic features of the translation of official business texts through the prism of the national cultural code, and reveals the influence of mental features of various linguistic and cultural communities on the linguistic, structural and style-forming characteristics of business texts. Materials and methods. The empirical material for the study is represented by 70 official documents of international organizations in English (UNESCO, The Minsk Declaration, U.S. Mission Russia, etc.) and their corresponding Russian translations. The documents are published in open access on the Internet on the official websites of press services and news portals. In the course of analyzing the sources of actual material, general scientific methods of analysis and synthesis, cognitive analysis, and content analysis were used. Results. The results of the study proved that the national-cultural specificity has a direct impact on the linguistic, structural and style-forming characteristics of official business texts. To ensure the adequacy of a translation in the recipient language, cognitive processing of the source code through the prism of the semantic space of the translator’s culture is required. Recurrent methods of translating official documents are the techniques of generalization, specification, addition, zero translation and integration. Conclusion. Official business texts are marked by certain functional characteristics that must be taken into consideration in the process of translation to ensure its adequacy and communicative effect.
40

Haruna, Yunusa, Shiyin Qin, and Mesmin J. Mbyamm Kiki. "An Improved Approach to Detection of Rice Leaf Disease with GAN-Based Data Augmentation Pipeline." Applied Sciences 13, no. 3 (January 19, 2023): 1346. http://dx.doi.org/10.3390/app13031346.

Abstract:
The lack of large balanced datasets in the agricultural field is a glaring problem for researchers and developers to design and train optimal deep learning models. This paper shows that using synthetic data augmentation outperforms the standard methods on object detection models and can be crucially important when datasets are few or imbalanced. The purpose of this study was to synthesize rice leaf disease data using a Style-Generative Adversarial Network Adaptive Discriminator Augmentation (SG2-ADA) and the variance of the Laplacian filter to improve the performance of Faster-Region-Based Convolutional Neural Network (faster-RCNN) and Single Shot Detector (SSD) in detecting the major diseases affecting rice. We collected a few unbalanced raw samples of rice leaf diseases images grouped into four diseases namely; bacterial blight (BB), tungro (TG), brown-spot (BS), and rice-blast (RB) with 1584, 1308, 1440, and 1600 images, respectively. We then train StyleGAN2-ADA for 250 epochs whilst using the variance of the Laplacian filter to discard blurry and poorly generated images. The synthesized images were used for augmenting faster-RCNN and SSD models in detecting rice leaf diseases. The StyleGAN2-ADA model achieved a Fréchet Inception Distance (FID) score of 26.67, Kernel Inception Distance (KID) score of 0.08, Precision of 0.49, and Recall of 0.14. In addition, we attained a mean average precision (mAP) of 0.93 and 0.91 for faster-RCNN and SSD, respectively. The learning curves of loss over 250 epochs are 0.03 and 0.04 for Faster-RCNN and SSD, respectively. In comparison to the standard data augmentation, we achieved a t-test p-value of 9.1×10−4 and 8.3×10−5. Hence, the proposed data augmentation pipeline to improve faster-RCNN and SSD models in detecting rice leaf diseases is significant. 
Our data augmentation approach is helpful to researchers and developers who face small, imbalanced datasets and can also be adopted in other fields that face the same problem.
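The blur-filtering step mentioned in this abstract (discarding generated images by the variance of the Laplacian) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the 4-neighbour Laplacian kernel, and the threshold value are assumptions chosen for illustration, and the paper's actual threshold is not given in the abstract.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour discrete Laplacian over interior pixels.

    Low values indicate few sharp edges, which is a common proxy for blur.
    """
    g = gray.astype(np.float64)
    # Sum of the four neighbours minus four times the centre pixel.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def keep_sharp(images, threshold):
    """Keep only images whose Laplacian variance reaches the (assumed) threshold."""
    return [img for img in images if laplacian_variance(img) >= threshold]


# Toy data: a featureless image and a high-contrast checkerboard.
flat = np.full((8, 8), 0.5)
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(np.float64)
sharp_only = keep_sharp([flat, checker], threshold=1.0)
```

In practice the threshold would be tuned on the generated images themselves, since the scale of the Laplacian variance depends on image resolution and intensity range.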
41

Ali, Abdullah Kadhlm, Ahmed Qassem Mohammed, and Qasim Selah Mahdi. "Experimental study of a natural draft hybrid (wet/dry) cooling tower with a splash fill type." AIMS Energy 10, no. 4 (2022): 648–64. http://dx.doi.org/10.3934/energy.2022031.

Abstract:
Cooling towers have such a significant influence on work and efficiency that researchers and designers are working tirelessly to enhance their performance. A prototype design for a natural draft hybrid (wet/dry) cooling tower has been created, relying on geometrical, dynamic, and thermodynamic similarities. Based on Iraqi weather, experiments have been conducted using splash fill (150 mm) in summer (hot and dry) weather conditions. This study investigated heat transfer mechanisms of both air and water in a natural draft hybrid cooling tower (NDHCT) model, both directly (wet section) and indirectly (dry section). The tower is filled with splash-style packing, and the warm water is spread throughout the building using sprayer nozzles. The influences of water flow rates, fill thickness, and air velocity on the cooling range, approach, cooling capacity, thermal efficiency of the cooling tower, water evaporation loss into the air stream and water loss percentage were explored in this study. The experiments were carried out with four different water flow rates, ranging from 7.5 to 12 litres per minute (Lpm), and eight different air velocities, all while keeping a constant inlet water temperature and zero crosswind (0 m/s). Data has been gathered, and performance variables have been determined. The findings demonstrate that the cooling tower's efficacy increases when the water flow rate is low, and the cooling range increases with increasing air velocity and decreases with increasing water flow rate; for a 7.5 Lpm water flow rate and a 2.4 m/s air velocity, it reached 19.5 ℃. The cooling capacity increased to 23.2 kW for a water flow rate of 12 Lpm and an air velocity of 2.4 m/s.
42

Augustyn, Dariusz R., Łukasz Wyciślik, and Mateusz Sojka. "The FaaS-Based Cloud Agnostic Architecture of Medical Services—Polish Case Study." Applied Sciences 12, no. 15 (August 8, 2022): 7954. http://dx.doi.org/10.3390/app12157954.

Abstract:
In this paper, the authors, based on a case study of the Polish healthcare IT system being deployed to the cloud, show the possibilities for limiting the computing resources consumption of rarely used services. The architecture of today’s developed application systems is often based on the architectural style of microservices, where individual groups of services are deployed independently of each other. This is also the case with the system under discussion. Most often, the nature of the workload of each group of services is different, which creates some challenges but also provides opportunities to make optimizations in the consumption of computing resources, thus lowering the environmental footprint and at the same time gaining measurable financial benefits. Unlike other scaling methods, such as those based on MDP and reinforcement learning in particular, which focus on system load prediction, in this paper, the authors propose a reactive approach in which any, even unpredictable, change in system load may result in a change (autoscaling) in the number of instances of computing processes so as to adapt the system to the current demand for computing resources as soon as possible. The authors’ main motivation for undertaking the study is to observe the growing interest in implementing FaaS technology in systems deployed to production in many fields, but with relatively little adoption in the healthcare field. Thus, as part of the research conducted here, the authors propose a solution for infrequently used services enabling the so-called scale-to-zero feature using the FaaS model implemented by the Fission tool. This solution is at the same time compatible with the cloud-agnostic approach which in turn helps avoid so-called cloud computing vendor lock-in. 
Using the example of the system in question, quantitative experimental results showing the savings achieved are presented, proving the justification for this novel implementation in the field of healthcare IT systems.
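The reactive scaling idea described in this abstract (instance count follows the current load and drops to zero for idle services) can be sketched as a small decision function. This is a hypothetical illustration only: the function name, its parameters, and the idle-timeout rule are assumptions, and it does not reproduce the authors' implementation or the Fission API.

```python
import math


def desired_instances(pending_requests: int,
                      per_instance_capacity: int,
                      idle_seconds: float,
                      idle_timeout: float,
                      current_instances: int,
                      max_instances: int) -> int:
    """Return the instance count a reactive autoscaler would target right now.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    if pending_requests == 0:
        # No load: release everything once the service has been idle long
        # enough (scale-to-zero); otherwise keep at most one warm instance.
        if idle_seconds >= idle_timeout:
            return 0
        return min(current_instances, 1)
    # Load present: react to the observed demand, without any prediction.
    needed = math.ceil(pending_requests / per_instance_capacity)
    return min(needed, max_instances)


# A rarely used service that has been idle for two minutes is scaled to zero.
idle_target = desired_instances(0, 10, idle_seconds=120, idle_timeout=60,
                                current_instances=1, max_instances=5)
# A sudden burst of 25 requests on a cold service asks for three instances.
burst_target = desired_instances(25, 10, idle_seconds=0, idle_timeout=60,
                                 current_instances=0, max_instances=5)
```

The sketch reacts only to the load observed at the moment of the call, which is the contrast the abstract draws with prediction-based methods such as MDP or reinforcement learning.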
43

Ekejiuba, Azunna IB. "Paradigm Shift in Protective Barrier Covering Implements for the Endemic Phase of Corona Virus and Routine Airborne Pollutants: The Game Changer Approach - Phase Three Category." Epidemiology International Journal 7, no. 4 (2023): 1–58. http://dx.doi.org/10.23880/eij-16000266.

Abstract:
This article presented a possible protective solution to the diverse health problems associated with human beings inhaling routine anthropogenic airborne particulates and micro-organisms by introducing some regular user friendly barrier covering implements i.e. narrows it down to zero pollutant inhaled for each person’s health care protection via the introduction of cosmetic-style barrier coverings, for the individual sense organs (i.e. nose-mouth-eye). Comprehensively, the atmosphere air is a mixture of several gases, consisting of three main components (78% nitrogen, 21% oxygen, and 1% argon), water vapor, trace gases such as the noble gases (neon, helium, krypton, and xenon); greenhouse gases (carbon dioxide, methane, nitrous oxide, and ozone); and the other gases such as hydrogen, iodine, carbon monoxide, ammonia, nitrogen dioxide, and sulfur dioxide, etc. Furthermore, particulate matter (PM) a mixture of solid particles and liquid droplets, such as dust, dirt, soot (a.k.a. black carbon), smoke, and smog-causing pollutants such as oxides of nitrogen (NOx), oxides of sulfur (SOx), are regularly being released into the atmosphere by human activities (anthropogenic sources). Along with volatile organic compounds (VOCs) i.e. chemical gases released from solid and liquid chemical products such as detergents, pesticides, printer supplies, adhesives, furniture, electronics, paints (and many other products), gasoline vapors, power plants and automobile exhaust, re-occurring wildfires and bush burning in different parts of the world (e.g. Canada, Brazil, California, etc.). Specifically, the June 2023 Canadian wildfire, whose smoke drifted into the northeast United States, and then temporarily made New York City “the most polluted city on the planet”, plus, the occasional air borne viruses and bacteria diseases (particles and respiratory droplets), during pandemics e.g. 
influenza, corona virus disease 19 (COVID-19), the common respiratory syncytial virus (RSV, a seasonal virus characterized by variable epidemiology, depending on geographic area and climate) that shares many symptoms with corona virus, etc. Most notably, the July 22, 2023 Erika Edwards report on “tripledemic” quoted Dr Mandy Cohen (director of the Centers for Disease Control and Prevention), as saying that the American people are expecting to have three bugs out there, “three viruses: COVID, of course, flu and RSV”. This means that many Americans will be urged to get three different vaccinations this fall: COVID, RSV and the annual flu shot. “But that will be a challenge for the health care system, (said Dr. William Schaffner, an infectious diseases expert and professor of preventive medicine at Vanderbilt University Medical Center), at a time when there’s already vaccine fatigue”. The pollutants and greenhouse gases (GHGs: CO2, CH4, N2O, O3, etc.) not only contribute to climate change (e.g. global warming, the emphasis in my first and second articles) but are also the major air, water, and soil pollutants that already afflict many cities/countries globally today. Air pollutants with the strongest evidence for public health concern include particulate matter (PM), ozone (O3), nitrogen dioxide (NO2) and sulfur dioxide (SO2).
44

Saunders, John. "Editorial." International Sports Studies 43, no. 1 (November 9, 2021): 1–6. http://dx.doi.org/10.30819/iss.43-1.01.

Abstract:
It was the Canadian philosopher Marshall McLuhan who first introduced the term ‘global village’ into the lexicon, almost fifty years ago. He was referring to the phenomenon of global interconnectedness of which we are all too aware today. At that time, we were witnessing the world just opening up. In 1946, British Airways had commenced a twice weekly service from London to New York. The flight involved one or two touch downs en-route and took a scheduled 19 hours and 45 minutes. By the time McLuhan had published his book “Understanding media; the extensions of man”, there were regular services by jet around the globe. London to Sydney was travelled in just under 35 hours. Moving forward to a time immediately pre-covid, there were over 30 non-stop flights a day in each direction between London and New York. The travel time from London to Sydney had been cut by a third, to slightly under 22 hours, with just one touchdown en-route. The world has well and truly ‘opened up’. No place is unreachable by regular services. But that is just one part of the picture. In 1962, the very first live television pictures were transmitted across the Atlantic, via satellite. It was a time when sports’ fans would tune in besides a crackling radio set to hear commentary of their favourite game relayed from the other side of the world. Today of course, not only can we watch a live telecast of the Olympic Games in the comfort of our own homes wherever the games are being held, but we can pick up a telephone and talk face to face with friends and relatives in real time, wherever they may be in the world. To today’s generation – generation Z – this does not seem in the least bit remarkable. Indeed, they have been nicknamed ‘the connected generation’ precisely because such a degree of human interconnectedness no longer seems worth commenting on. The media technology and the transport advances that underpin this level of connectedness, have become taken for granted assumptions to them. 
This is why the global events of 2020 and the associated public health related reactions, have proved to be so remarkable to them. It is mass travel and the closeness and variety of human contact in day-to-day interactions, that have provided the breeding ground for the pandemic. Consequently, moving around and sharing close proximity with many strangers, have been the activities that have had to be curbed, as the initial primary means to manage the spread of the virus. This has caused hardship to many, either through the loss of a job and the associated income or, the lengthy enforced separation from family and friends – for the many who find themselves living and working far removed from their original home. McLuhan’s powerful metaphor was ahead of its time. His thoughts were centred around media and electronic communications well prior to the notion of a ‘physical’ pandemic, which today has provided an equally potent image of how all of our fortunes have become intertwined, no matter where we sit in the world. Yet it is this event which seems paradoxically to have for the first time forced us to consider more closely the path of progress pursued over the last half century. It is as if we are experiencing for the first time the unleashing of powerful and competing forces, which are both centripetal and centrifugal. On the one hand we are in a world where we have a World Health Organisation. This is a body which has acted as a global force, first declaring the pandemic and subsequently acting in response to it as a part of its brief for international public health. It has brought the world’s scientists and global health professionals together to accelerate the research and development process and develop new norms and standards to contain the spread of the coronavirus pandemic and help care for those affected. 
At the same time, we have been witnessing nations retreating from each other and closing their borders in order to restrict the interaction of their citizens with those from other nations around the world. We have perceived that danger and risk are increased by international travel and human to human interaction. As a result, increasingly communication has been carried out from the safety and comfort of one’s own home, with electronic media taking the place of personal interaction in the real world. The change to the media dominated world, foreseen by McLuhan a half century ago, has been hastened and consolidated by the threats posed by Covid 19. Real time interactions can be conducted more safely and more economically by means of the global reach of the internet and the ever-enhanced technologies that are being offered to facilitate that. Yet at a geopolitical level prior to Covid 19, the processes of globalism and nationalism were already being recognised as competing forces. In many countries, tensions have emerged between those who are benefitting from the opportunities presented by the development of free trade between countries and those who are invested in more traditional ventures, set in their own nations and communities. The emerging beneficiaries have become characterised as the global elites. Their demographic profile is one associated with youth, education and progressive social ideas. However, they are counter-balanced by those who, rather than opportunities, have experienced threats from the disruptions and turbulence around them. Among the ideas challenged, have been the expected certainties of employment, social values and the security with which many grew up. Industries which have been the lifeblood of their communities are facing extinction and even the security of housing and a roof over the heads of self and family may be under threat. 
In such circumstances, some people may see waves of new immigrants, technology, and changing social values as being tides which need to be turned back. Their profile is characterised by a demographic less equipped to face such changes - the more mature, less well educated and less mobile. Yet this tension appears to be creating something more than just the latest version of the generational divide. The recent clashes between Republicans and Democrats in the US have provided a very potent example of these societal stresses. The US has itself exported some of these arenas of conflict to the rest of the world. Black lives Matter and #Me too, are social movements with their foundation in the US which have found their way far beyond the immediate contexts which gave them birth. In the different national settings where these various tensions have emerged, they have been characterised through labels such as left and right, progressive and traditional, the ‘haves’ versus the ‘have nots’ etc. Yet common to all of this growing competitiveness between ideologies and values is a common thread. The common thread lies in the notion of competition itself. It finds itself expressed most potently in the spread and adoption of ideas based on what has been termed the neoliberal values of the free market. These values have become ingrained in the language and concepts we employ every day. Thus, everything has a price and ultimately the price can be represented by a dollar value. We see this process of commodification around us on a daily basis. Sports studies’ scholars have long drawn attention to its continuing growth in the world of sport, especially in situations when it overwhelms the human characteristics of the athletes who are at the very heart of sport. When the dollar value of the athlete and their performance becomes more important than the individual and the game, then we find ourselves at the heart of some of the core problems reported today. 
It is at the point where sport changes from an experience, where the athletes develop themselves and become more complete persons experiencing positive and enriching interactions with fellow athletes, to an environment where young athletes experience stress and mental and physical ill health as result of their experiences. Those who are supremely talented (and lucky?) are rewarded with fabulous riches. Others can find themselves cast out on the scrap heap as a result of an unfair selection process or just the misfortune of injury. Sport as always, has proved to be a mirror of life in reflecting this process in the world at large, highlighting the heights that can be climbed by the fortunate as well as the depths that can be plumbed by the ill-fated. Advocates of the free-market approach will point to the opportunities it can offer. Figures can show that in a period of capitalist organised economies, there has been an unprecedented reduction in the amount of poverty in the world. Despite rapid growth in populations, there has been some extraordinary progress in lifting people out of extreme poverty. Between 1990 and 2010, the numbers in poverty fell by half as a share of the total population in developing countries, from 43% to 21%—a reduction of almost 1 billion people (The Economist Leader, June 1st, 2013). Nonetheless the critics of capitalism will continue to point to an increasing gap between the haves and don’t haves and specifically a decline in the ‘middle classes’, which have for so long provided the backbone of stable democratic societies. This delicate balance between retreating into our own boundaries as a means to manage the pandemic and resuming open borders to prevent economic damage to those whose businesses and employment depend upon the continuing movement of people and goods, is one which is being agonised over at this time in liberal democratic societies around the world. 
The experience of the pandemic has varied between countries, not solely because of the strategies adopted by politicians, but also because of the current health systems and varying social and economic conditions of life in different parts of the world. For many of us, the crises and social disturbances noted above have been played out on our television screens and websites. Increasingly it seems that we have been consuming our life experiences in a world dominated by our screens and sheltered from the real messiness of life. Meanwhile, in those countries with a choice, the debate has been between public health concerns and economic health concerns. Some have argued that the two are not totally independent of each other, while others have argued that the extent to which they are seen as interrelated lies in the extent to which life’s values have themselves become commodified. Others have pointed to the mental health problems experienced by people of all ages as a result of being confined for long periods of time within limited spaces and experiencing few chances to meet with others outside their immediate household. Still others have experienced different conditions – such as the chance to work from home in a comfortable environment and be freed from the drudgery of commuting in crowded traffic or public transport. So, at a national/communal level as well as at an individual level, this international crisis has exposed people to different decisions. It has offered, for many, a chance to recalibrate their lives. Those who have the resources, are leaving the confines of the big capital cities and seeking a healthier and less turbulent existence in quieter urban centres. For those of us in what can be loosely termed ‘an information industry’, today’s work practices are already an age away from what they were in pre-pandemic times. Yet again, a clear split is evident. The notion of ‘essential industries’ has been reclassified. 
The delivery of goods, the facilitation of necessary purchase such as food; these and other tasks have acquired a new significance which has enhanced the value of those who deliver these services. However, for those whose tasks can be handled via the internet or offloaded to other anonymous beings a readjustment of a different kind is occurring. So to the future - for those who have suffered ill-health and lost loved ones, the pandemic only reinforces the human priority. Health and well-being trumps economic health and wealth where choices can be made. The closeness of human contact has been reinforced by the tales of families who have been deprived of the touch of their loved ones, many of whom still don’t know when that opportunity will be offered again. When writing our editorial, a year ago, I little expected to be still pursuing a Covid related theme today. Yet where once we were expecting to look back on this time as a minor hiccough, with normal service being resumed sometime last year, it has not turned out to be that way. Rather, it seems that we have been offered a major reset opportunity in the way in which we continue to progress our future as humans. The question is, will we be bold enough to see the opportunity and embrace a healthier more equitable more locally responsible lifestyle or, will we revert to a style of ‘progress’ where powerful countries, organisations and individuals continue to amass increased amounts of wealth and influence and become increasingly less responsive to the needs of individuals in the throng below. Of course, any retreat from globalisation as it has evolved to date, will involve disruption of a different kind, which will inevitably lead to pain for some. It seems inevitable that any change and consequent progress is going to involve winners and losers. Already airline companies and the travel industry are putting pressure on governments to “get back to normal” i.e. where things were previously. 
Yet, in the shadow of widespread support for climate activism and the extinction rebellion movement, reports have emerged that since the lockdowns air pollution has dropped dramatically around the world – a finding that clearly offers benefits to all our population. In a similar vein the impossibility of overseas air travel in Australia has resulted in a major increase in local tourism, where more inhabitants are discovering the pleasures of their own nation. The transfer of their tourist and holiday dollars from overseas to local tourist providers has produced at one level a traditional zero-sum outcome, but it has also been accompanied by a growing appreciation of local citizens for the wonders of their own land and understanding of the lives of their fellow citizens as well as massive savings in foregone air travel. Continuing to define life in terms of competition for limited resources will inevitably result in an ever-continuing run of zero-sum games. Looking beyond the prism of competition and personal reward has the potential to add to what Michael Sandel (2020) has termed ‘the common good’. Does the possibility of a reset, offer the opportunity to recalibrate our views of effort and reward to go beyond a dollar value and include this important dimension? How has sport been experiencing the pandemic and are there chances for a reset here? An opinion piece from Peter Horton in this edition, has highlighted the growing disconnect of professional sport at the highest level from the communities that gave them birth. Is this just another example of the outcome of unrestrained commodification? Professional sport has suffered in the pandemic with the cancelling of fixtures and the enforced absence of crowds. Yet it has shown remarkable resilience. Sport science staff may have been reduced alongside all the auxiliary workers who go to make up the total support staff on match days and other times. Crowds have been absent, but the game has gone on. 
Players have still been able to play and receive the support they have become used to from trainers, physiotherapists and analysts, although for the moment there may be fewer of them. Fans have had to rely on electronic media to watch their favourites in action– but perhaps that has just encouraged the continuing spread of support now possible through technology which is no longer dependent on personal attendance through the turnstile. Perhaps for those committed to the watching of live sport in the outdoors, this might offer a chance for more attention to be paid to sport at local and community levels. Might the local villagers be encouraged to interrelate with their hometown heroes, rather than the million-dollar entertainers brought in from afar by the big city clubs? To return to the village analogy and the tensions between global and local, could it be that the social structure of the village has become maladapted to the reality of globalisation? If we wish to retain the traditional values of village life, is returning to our village a necessary strategy? If, however we see that today the benefits and advantages lie in functioning as one single global community, then perhaps we need to do some serious thinking as to how that community can function more effectively for all of its members and not just its ‘elites’. As indicated earlier, sport has always been a reflection of our society. Whichever way our communities decide to progress, sport will have a place at their heart and sport scholars will have a place in critically reflecting the nature of the society we are building. It is on such a note that I am pleased to introduce the content of volume 43:1 to you. We start with a reminder from Hoyoon Jung of the importance of considering the richness provided by a deep analysis of context, when attempting to evaluate and compare outcomes for similar events. 
He examines the concept of nation building through sport, an outcome that has been frequently attributed to the conduct of successful events. In particular, he examines this outcome in the context of the experiences of South Africa and Brazil as hosts of world sporting events. The mega sporting event that both shared was the FIFA World Cup, in 2010 and 2014 respectively. Additional information could be gained by looking backwards to the 1995 Rugby World Cup in the case of South Africa and forward to the 2016 Olympics with regard to Brazil. Differentiating the settings in terms of timing as well as in the makeup of the respective local cultures, has led Jung to conclude that a successful outcome for nation building proved possible in the case of South Africa. However, different settings, both economically and socially, made it impossible for Brazil to replicate the South African experience. From a globally oriented perspective to a more local one, our second paper by Rafal Gotowski and Marta Anna Zurawak examines the growth and development, with regard to both participation and performance, of a more localised activity in Poland - the Nordic walking marathon. Their analysis showed that this is a locally relevant activity that is meeting the health-related exercise needs of an increasing number of people in the middle and later years, including women. It is proving particularly beneficial as an activity due to its ability to offer a high level of intensity while reducing the impact - particularly on the knees. The article by Petr Vlček, Richard Bailey, Jana Vašíčková and Claude Scheuer is also concerned with health promoting physical activity. Their focus however is on how the necessary habit of regular and relevant physical activity is currently being introduced to the younger generation in European schools through the various physical education curricula. 
They conclude that physical education lessons, as they are currently being conducted, are not providing the needed 50% minimum threshold of moderate to vigorous physical activity. They go further, to suggest that in reality, depending on the physical education curriculum to provide the necessary quantum of activity within the child’s week, is going to be a flawed vision, given the instructional and other objectives they are also expected to achieve. They suggest implementing instead an ‘Active Schools’ concept, where the PE lessons are augmented by other school-based contexts within a whole school programme of health enhancing physical activity for children. Finally, we step back to the global and international context and the current Pandemic. Eric Burhaein, Nevzt Demirci, Carla Cristina Vieira Lourenco, Zsolt Nemeth and Diajeng Tyas Pinru Phytanza have collaborated as a concerned group of physical educators to provide an important international position statement which addresses the role which structured and systematic physical activity should assume in the current crisis. This edition then concludes with two brief contributions. The first is an opinion piece by Peter Horton which provides a professional and scholarly reaction to the recent attempt by a group of European football club owners to challenge the global football community and establish a self-governing and exclusive European Super League. It is an event that has created great alarm and consternation in the world of football. Horton reflects the outrage expressed by that community and concludes: While recognising the benefits accruing from well managed professionalism, the essential conflict between the values of sport and the values of market capitalism will continue to simmer below the surface wherever sport is commodified rather than practised for more ‘intrinsic’ reasons. We conclude however on a more celebratory note. 
We are pleased to acknowledge the recognition achieved by one of the members of our International Review Board. The career and achievements of Professor John Wang – a local ‘scholar’- have been recognised in his being appointed as the foundation E.W. Barker Professor in Physical Education and Sport at the Nanyang Technological University. This is a well-deserved honour and one that reflects the growing stature of the Singapore Physical Education and Sports Science community within the world of International Sport Studies. John Saunders Brisbane, June 2021
45

Fares, Mireille, Catherine Pelachaud, and Nicolas Obin. "Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding." Frontiers in Artificial Intelligence 6 (June 12, 2023). http://dx.doi.org/10.3389/frai.2023.1142997.

Abstract:
Modeling virtual agents with behavior style is one factor for personalizing human-agent interaction. We propose an efficient yet effective machine learning approach to synthesize gestures driven by prosodic features and text in the style of different speakers, including those unseen during training. Our model performs zero-shot multimodal style transfer driven by multimodal data from the PATS database, which contains videos of various speakers. We view style as being pervasive; while speaking, it colors the expressivity of communicative behaviors, while speech content is carried by multimodal signals and text. This disentanglement scheme of content and style allows us to directly infer the style embedding even of a speaker whose data are not part of the training phase, without requiring any further training or fine-tuning. The first goal of our model is to generate the gestures of a source speaker based on the content of two input modalities: mel spectrogram and text semantics. The second goal is to condition the source speaker's predicted gestures on the multimodal behavior style embedding of a target speaker. The third goal is to allow zero-shot style transfer of speakers unseen during training without re-training the model. Our system consists of two main components: (1) a speaker style encoder network that learns to generate a fixed-dimensional speaker style embedding from a target speaker's multimodal data (mel spectrogram, pose, and text) and (2) a sequence-to-sequence synthesis network that synthesizes gestures based on the content of the input modalities (text and mel spectrogram) of a source speaker, conditioned on the speaker style embedding. Our evaluation shows that the model is able to synthesize gestures of a source speaker given the two input modalities and transfer the knowledge of target speaker style variability learned by the speaker style encoder to the gesture generation task in a zero-shot setup, indicating that the model has learned a high-quality speaker representation. We conduct objective and subjective evaluations to validate our approach and compare it with baselines.
46

Bai, Zhongyu, Hongli Xu, Qichuan Ding, and Xiangyue Zhang. "Side-Scan Sonar Image Classification with Zero-Shot and Style Transfer." IEEE Transactions on Instrumentation and Measurement, 2024, 1. http://dx.doi.org/10.1109/tim.2024.3352693.

47

Xu, Hongli, Zhongyu Bai, Xiangyue Zhang, and Qichuan Ding. "MFSANet: Zero-Shot Side-Scan Sonar Image Recognition Based on Style Transfer." IEEE Geoscience and Remote Sensing Letters, 2023, 1. http://dx.doi.org/10.1109/lgrs.2023.3318051.

48

Zhang, Kejun, Rui Zhang, Yonglin Wu, Yifei Li, Yonggen Ling, Bolin Wang, Lingyun Sun, and Yingming Li. "Few-shot font style transfer with multiple style encoders." Science China Information Sciences 65, no. 6 (April 22, 2022). http://dx.doi.org/10.1007/s11432-021-3435-8.

49

Zhang, Qing, Jing Zhang, Xiangdong Su, Feilong Bao, and Guanglai Gao. "Contour detection network for zero-shot sketch-based image retrieval." Complex & Intelligent Systems, June 2, 2023. http://dx.doi.org/10.1007/s40747-023-01096-2.

Abstract:
Zero-shot sketch-based image retrieval (ZS-SBIR) is a challenging task that involves searching natural images related to a given hand-drawn sketch under the zero-shot scene. The previous approach projected image and sketch features into a low-dimensional common space for retrieval, and used semantic features to transfer the knowledge of seen to unseen classes. However, it is not effective enough to align multimodal features when projecting them into a common space, since the styles and contents of sketches and natural images are different and they are not in one-to-one correspondence. To solve this problem, we propose a novel three-branch joint training network with contour detection network (called CDNNet) for the ZS-SBIR task, which uses contour maps as a bridge to align sketches and natural images to alleviate the domain gap. Specifically, we use semantic metrics to constrain the relationship between contour images and natural images and between contour images and sketches, so that natural image and sketch features can be aligned in the common space. Meanwhile, we further employ second-order attention to capture target subject information to increase the performance of retrieval descriptors. In addition, we use a teacher model and word embedding method to transfer the knowledge of the seen to the unseen classes. Extensive experiments on two large-scale datasets demonstrate that our proposed approach outperforms state-of-the-art CNN-based models: it improves by 2.6% on the Sketchy and 1.2% on TU-Berlin datasets in terms of mAP.
50

Li, Yumei, Guangfeng Lin, Menglan He, Dan Yuan, and Kaiyang Liao. "Layer similarity guiding few-shot Chinese style transfer." Visual Computer, June 7, 2023. http://dx.doi.org/10.1007/s00371-023-02915-w.
