Journal articles on the topic 'Non-autoregressive Machine Translation'

To see the other types of publications on this topic, follow the link: Non-autoregressive Machine Translation.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 31 journal articles for your research on the topic 'Non-autoregressive Machine Translation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Wang, Yiren, Fei Tian, Di He, Tao Qin, ChengXiang Zhai, and Tie-Yan Liu. "Non-Autoregressive Machine Translation with Auxiliary Regularization." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5377–84. http://dx.doi.org/10.1609/aaai.v33i01.33015377.

Abstract:
As a new neural machine translation approach, Non-Autoregressive Machine Translation (NAT) has attracted attention recently due to its high efficiency in inference. However, the high efficiency has come at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source side information via the decoder hidden states). In this paper, we propose to address these two problems by improving the quality of decoder hidden representations via two auxiliary regularization terms in the training process of an NAT model. First, to make the hidden states more distinguishable, we regularize the similarity between consecutive hidden states based on the corresponding target tokens. Second, to force the hidden states to contain all the information in the source sentence, we leverage the dual nature of translation tasks (e.g., English to German and German to English) and minimize a backward reconstruction error to ensure that the hidden states of the NAT decoder are able to recover the source side sentence. Extensive experiments conducted on several benchmark datasets show that both regularization strategies are effective and can alleviate the issues of repeated translations and incomplete translations in NAT models. The accuracy of NAT models is therefore improved significantly over the state-of-the-art NAT models with even better efficiency for inference.
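To make the first auxiliary term concrete, here is a minimal PyTorch-style sketch of a similarity regularizer over consecutive decoder hidden states, written only from the description in the abstract; the function name, tensor shapes, and exact penalty form are our assumptions, not code from the paper (the backward-reconstruction term is not shown).

```python
import torch.nn.functional as F

def similarity_regularizer(hidden, target_ids):
    """Encourage consecutive NAT decoder states to be similar when the
    corresponding reference tokens are identical and dissimilar otherwise.

    hidden:     (batch, tgt_len, d_model) decoder hidden states
    target_ids: (batch, tgt_len) reference token ids
    """
    cos = F.cosine_similarity(hidden[:, :-1], hidden[:, 1:], dim=-1)
    same = (target_ids[:, :-1] == target_ids[:, 1:]).float()
    # pull identical-token states together, push different-token states apart
    return (same * (1.0 - cos) + (1.0 - same) * F.relu(cos)).mean()
```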
2

Wang, Shuheng, Shumin Shi, Heyan Huang, and Wei Zhang. "Improving Non-Autoregressive Machine Translation via Autoregressive Training." Journal of Physics: Conference Series 2031, no. 1 (September 1, 2021): 012045. http://dx.doi.org/10.1088/1742-6596/2031/1/012045.

Abstract:
In recent years, non-autoregressive machine translation has attracted many researchers' attention. Non-autoregressive translation (NAT) achieves faster decoding speed at the cost of translation accuracy compared with autoregressive translation (AT). Since NAT and AT models have similar architectures, a natural idea is to use the AT task to assist the NAT task. Previous works use curriculum learning or distillation to improve the performance of the NAT model. However, they are complex to follow and difficult to integrate into new work. So in this paper, to keep things simple, we introduce a multi-task framework to improve the performance of the NAT task. Specifically, we use a fully shared encoder-decoder network to train the NAT task and the AT task simultaneously. To evaluate the performance of our model, we conduct experiments on several benchmark tasks, including WMT14 EN-DE, WMT16 EN-RO and IWSLT14 DE-EN. The experimental results demonstrate that our model achieves improvements while remaining simple.
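As a rough illustration of the fully shared multi-task setup described above, the sketch below applies one update to a single set of parameters using both objectives; `forward_at` and `forward_nat` are hypothetical hooks standing in for a teacher-forced causal pass and a parallel-decoding pass, and the weighting is an assumption.

```python
def joint_training_step(model, batch, optimizer, at_weight=0.5):
    """One multi-task update over a fully shared encoder-decoder:
    the same parameters receive gradients from the NAT loss and the AT loss."""
    loss_nat = model.forward_nat(batch.src, batch.tgt)  # parallel (non-autoregressive) loss
    loss_at = model.forward_at(batch.src, batch.tgt)    # teacher-forced autoregressive loss
    loss = loss_nat + at_weight * loss_at
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```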
3

Shao, Chenze, Jinchao Zhang, Jie Zhou, and Yang Feng. "Rephrasing the Reference for Non-autoregressive Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 13538–46. http://dx.doi.org/10.1609/aaai.v37i11.26587.

Abstract:
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem: there may exist multiple possible translations of a source sentence, so the reference sentence may be inappropriate for training when the NAT output is closer to other translations. In response to this problem, we introduce a rephraser to provide a better training target for NAT by rephrasing the reference sentence according to the NAT output. As we train NAT based on the rephraser output rather than the reference sentence, the rephraser output should fit well with the NAT output and not deviate too far from the reference, which can be quantified as reward functions and optimized by reinforcement learning. Experiments on major WMT benchmarks and NAT baselines show that our approach consistently improves the translation quality of NAT. Specifically, our best variant achieves comparable performance to the autoregressive Transformer, while being 14.7 times more efficient in inference.
4

Ran, Qiu, Yankai Lin, Peng Li, and Jie Zhou. "Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 15 (May 18, 2021): 13727–35. http://dx.doi.org/10.1609/aaai.v35i15.17618.

Abstract:
Non-autoregressive neural machine translation (NAT) generates each target word in parallel and has achieved promising inference acceleration. However, existing NAT models still have a big gap in translation quality compared to autoregressive neural machine translation models due to the multimodality problem: the target words may come from multiple feasible translations. To address this problem, we propose a novel NAT framework, ReorderNAT, which explicitly models the reordering information to guide the decoding of NAT. Specifically, ReorderNAT utilizes deterministic and non-deterministic decoding strategies that leverage reordering information as a proxy for the final translation to encourage the decoder to choose words belonging to the same translation. Experimental results on various widely-used datasets show that our proposed model achieves better performance compared to most existing NAT models, and even achieves comparable translation quality to autoregressive translation models with a significant speedup.
5

Wang, Shuheng, Shumin Shi, and Heyan Huang. "Enhanced encoder for non-autoregressive machine translation." Machine Translation 35, no. 4 (November 16, 2021): 595–609. http://dx.doi.org/10.1007/s10590-021-09285-x.

6

Shao, Chenze, Jinchao Zhang, Yang Feng, Fandong Meng, and Jie Zhou. "Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 01 (April 3, 2020): 198–205. http://dx.doi.org/10.1609/aaai.v34i01.5351.

Abstract:
Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup through generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot model the target-side sequential dependency properly, leading to its weak correlation with the translation quality. As a result, NAT tends to generate disfluent translations with over-translation and under-translation errors. In this paper, we propose to train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The bag-of-ngrams training objective is differentiable and can be efficiently calculated, which encourages NAT to capture the target-side sequential dependency and correlates well with the translation quality. We validate our approach on three translation tasks and show that our approach largely outperforms the NAT baseline by about 5.0 BLEU scores on WMT14 En↔De and about 2.5 BLEU scores on WMT16 En↔Ro.
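The quantity being minimized can be pictured with a discrete bag-of-n-grams distance; the paper optimizes a differentiable expectation of this under the NAT output distribution, so the snippet below only illustrates the target, with names of our choosing.

```python
from collections import Counter

def bag_of_ngrams(tokens, n=2):
    """Multiset of n-grams contained in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bon_difference(hyp_tokens, ref_tokens, n=2):
    """L1 distance between the hypothesis and reference bags of n-grams."""
    hyp_bag, ref_bag = bag_of_ngrams(hyp_tokens, n), bag_of_ngrams(ref_tokens, n)
    return sum(abs(hyp_bag[k] - ref_bag[k]) for k in set(hyp_bag) | set(ref_bag))

# bon_difference("a b b c".split(), "a b c".split())  ->  1  (the extra bigram "b b")
```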
7

Li, Feng, Jingxian Chen, and Xuejun Zhang. "A Survey of Non-Autoregressive Neural Machine Translation." Electronics 12, no. 13 (July 6, 2023): 2980. http://dx.doi.org/10.3390/electronics12132980.

Abstract:
Non-autoregressive neural machine translation (NAMT) has received increasing attention recently by virtue of its promising acceleration paradigm for fast decoding. However, these splendid speedup gains come at the cost of accuracy in comparison to its autoregressive counterpart. To close this performance gap, many studies have been conducted to achieve a better quality and speed trade-off. In this paper, we survey the NAMT domain from two new perspectives, i.e., target dependency management and the arrangement of training strategies. Proposed approaches are elaborated at length, covering five model categories. We then collect extensive experimental data and present graphs for quantitative evaluation and qualitative comparison according to the reported translation performance. Based on that, a comprehensive performance analysis is provided. Further inspection is conducted into two salient problems: target sentence length prediction and sequence-level knowledge distillation. A cumulative reinvestigation of translation quality and speedup shows that non-autoregressive decoding may not run as fast as it seems and has yet to genuinely surpass autoregressive decoding in accuracy. We finally outline potential work from both internal and external perspectives and call for more practical and well-grounded studies in the future.
8

Liu, Min, Yu Bao, Chengqi Zhao, and Shujian Huang. "Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 13246–54. http://dx.doi.org/10.1609/aaai.v37i11.26555.

Abstract:
Benefiting from sequence-level knowledge distillation, the Non-Autoregressive Transformer (NAT) achieves great success in neural machine translation tasks. However, existing knowledge distillation has side effects, such as propagating errors from the teacher to NAT students, which may limit further improvements of NAT models and are rarely discussed in existing research. In this paper, we introduce selective knowledge distillation, which uses an NAT evaluator to select NAT-friendly targets that are of high quality and easy to learn. In addition, we introduce a simple yet effective progressive distillation method to boost NAT performance. Experimental results on multiple WMT language directions and several representative NAT models show that our approach can realize a flexible trade-off between the quality and complexity of training data for NAT models, achieving strong performances. Further analysis shows that distilling only 5% of the raw translations can help an NAT outperform its counterpart trained on raw data by about 2.4 BLEU.
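One way to read the selection step is as a filter over distilled sentence pairs; the sketch below keeps the pairs an NAT evaluator scores highest and falls back to the raw references for the rest. The `nat_scorer` callable and the top-k selection rule are our assumptions, not the paper's exact criterion.

```python
def select_training_pairs(raw_pairs, distilled_pairs, nat_scorer, keep_ratio=0.5):
    """raw_pairs / distilled_pairs: aligned lists of (source, target) tuples.
    nat_scorer(source, target) returns e.g. the NAT model's length-normalized
    log-likelihood, used as a proxy for how 'NAT-friendly' a target is."""
    ranked = sorted(range(len(distilled_pairs)),
                    key=lambda i: nat_scorer(*distilled_pairs[i]), reverse=True)
    keep = set(ranked[:int(len(ranked) * keep_ratio)])
    return [distilled_pairs[i] if i in keep else raw_pairs[i]
            for i in range(len(distilled_pairs))]
```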
9

Du, Quan, Kai Feng, Chen Xu, Tong Xiao, and Jingbo Zhu. "Non-autoregressive neural machine translation with auxiliary representation fusion." Journal of Intelligent & Fuzzy Systems 41, no. 6 (December 16, 2021): 7229–39. http://dx.doi.org/10.3233/jifs-211105.

Abstract:
Recently, many efforts have been devoted to speeding up neural machine translation models. Among them, the non-autoregressive translation (NAT) model is promising because it removes the sequential dependence on the previously generated tokens and parallelizes the generation process of the entire sequence. On the other hand, the autoregressive translation (AT) model in general achieves a higher translation accuracy than the NAT counterpart. Therefore, a natural idea is to fuse the AT and NAT models to seek a trade-off between inference speed and translation quality. This paper proposes an ARF-NAT model (NAT with auxiliary representation fusion) to introduce the merit of a shallow AT model to an NAT model. Three functions are designed to fuse the auxiliary representation into the decoder of the NAT model. Experimental results show that ARF-NAT outperforms the NAT baseline by 5.26 BLEU scores on the WMT’14 German-English task with a significant speedup (7.58 times) over several strong AT baselines.
10

Xinlu, Zhang, Wu Hongguan, Ma Beijiao, and Zhai Zhengang. "Research on Low Resource Neural Machine Translation Based on Non-autoregressive Model." Journal of Physics: Conference Series 2171, no. 1 (January 1, 2022): 012045. http://dx.doi.org/10.1088/1742-6596/2171/1/012045.

Abstract:
The autoregressive model cannot make full use of context information because of its single direction of generation, and it cannot perform parallel computation in decoding, which limits the efficiency of translation generation. Therefore, we explore a non-autoregressive translation generation method based on insertion and deletion for low-resource languages, which decomposes translation generation into three steps: deletion, insertion and generation. In this way, the translation can be edited dynamically during the iterative updating process. At the same time, each step can be computed in parallel, which improves decoding efficiency. To reduce the complexity of the data sets used for non-autoregressive model training, we apply sequence-level knowledge distillation to the Uyghur-Chinese training data. Experiments on Uyghur-Chinese and English-Romanian distilled data sets and standard data sets verify the effectiveness of the non-autoregressive method.
11

Huang, Chenyang, Hao Zhou, Osmar R. Zaïane, Lili Mou, and Lei Li. "Non-autoregressive Translation with Layer-Wise Prediction and Deep Supervision." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10776–84. http://dx.doi.org/10.1609/aaai.v36i10.21323.

Abstract:
How do we perform efficient inference while retaining high translation quality? Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient. Recent non-autoregressive translation models speed up the inference, but their quality is still inferior. In this work, we propose DSLP, a highly efficient and high-performance model for machine translation. The key insight is to train a non-autoregressive Transformer with Deep Supervision and feed additional Layer-wise Predictions. We conducted extensive experiments on four translation tasks (both directions of WMT'14 EN-DE and WMT'16 EN-RO). Results show that our approach consistently improves the BLEU scores compared with respective base models. Specifically, our best variant outperforms the autoregressive model on three translation tasks, while being 14.8 times more efficient in inference.
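The deep-supervision idea can be summarized as attaching a prediction loss to every decoder layer instead of only the last one; a minimal PyTorch-style sketch under that reading follows, with the averaging and padding handling being our assumptions.

```python
import torch
import torch.nn.functional as F

def deep_supervision_loss(layer_logits, targets, pad_id=1):
    """layer_logits: list of (batch, tgt_len, vocab) tensors, one per decoder layer.
    targets:      (batch, tgt_len) reference token ids."""
    losses = [F.cross_entropy(logits.transpose(1, 2), targets, ignore_index=pad_id)
              for logits in layer_logits]
    return torch.stack(losses).mean()
```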
12

Guo, Junliang, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, and Tie-Yan Liu. "Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7839–46. http://dx.doi.org/10.1609/aaai.v34i05.6289.

Abstract:
Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and both of them share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifically, we design a curriculum in the fine-tuning process to progressively switch the training from autoregressive generation to non-autoregressive generation. Experiments on four benchmark translation datasets show that the proposed method achieves good improvement (more than 1 BLEU score) over previous NAT baselines in terms of translation accuracy, and greatly speeds up (more than 10 times) the inference process compared with AT baselines.
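A curriculum that "progressively switches" between the two regimes can be pictured as a simple schedule over training steps; the linear ramp and sampling rule below are our assumptions, not the paper's exact curriculum.

```python
import random

def training_mode(step, total_steps, warmup_frac=0.3):
    """Return 'at' early in fine-tuning and 'nat' increasingly often later on."""
    progress = step / max(total_steps, 1)
    p_nat = min(1.0, max(0.0, (progress - warmup_frac) / (1.0 - warmup_frac)))
    return "nat" if random.random() < p_nat else "at"
```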
13

Guo, Junliang, Xu Tan, Di He, Tao Qin, Linli Xu, and Tie-Yan Liu. "Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3723–30. http://dx.doi.org/10.1609/aaai.v33i01.33013723.

Abstract:
Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens from the inputs of the decoder, achieve significant inference speedup but at the cost of inferior accuracy compared to autoregressive translation (AT) models. Previous work shows that the quality of the inputs of the decoder is important and largely impacts the model accuracy. In this paper, we propose two methods to enhance the decoder inputs so as to improve NAT models. The first one directly leverages a phrase table generated by conventional SMT approaches to translate source tokens to target tokens, which are then fed into the decoder as inputs. The second one transforms source-side word embeddings to target-side word embeddings through sentence-level alignment and word-level adversary learning, and then feeds the transformed word embeddings into the decoder as inputs. Experimental results show that our method largely outperforms the NAT baseline (Gu et al. 2017) by 5.11 BLEU scores on the WMT14 English-German task and 4.72 BLEU scores on the WMT16 English-Romanian task.
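The first method, feeding phrase-table translations to the decoder, can be pictured with a word-level lookup; the real system uses an SMT phrase table rather than the toy dictionary assumed here.

```python
def build_decoder_input(src_tokens, phrase_table, unk="<unk>"):
    """Map each source token to a rough target-language token and feed the
    result to the NAT decoder as its input sequence."""
    return [phrase_table.get(tok, unk) for tok in src_tokens]

# build_decoder_input(["das", "Haus"], {"das": "the", "Haus": "house"})
# -> ["the", "house"]
```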
14

Shu, Raphael, Jason Lee, Hideki Nakayama, and Kyunghyun Cho. "Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference Using a Delta Posterior." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8846–53. http://dx.doi.org/10.1609/aaai.v34i05.6413.

Abstract:
Although neural machine translation models have reached high translation quality, their autoregressive nature makes inference difficult to parallelize and leads to high translation latency. Inspired by recent refinement-based approaches, we propose LaNMT, a latent-variable non-autoregressive model with continuous latent variables and a deterministic inference procedure. In contrast to existing approaches, we use a deterministic inference algorithm to find the target sequence that maximizes the lower bound on the log-probability. During inference, the translation length automatically adapts itself. Our experiments show that the lower bound can be greatly increased by running the inference algorithm, resulting in significantly improved translation quality. Our proposed model closes the performance gap between non-autoregressive and autoregressive approaches on the ASPEC Ja-En dataset with 8.6x faster decoding. On the WMT'14 En-De dataset, our model narrows the gap with the autoregressive baseline to 2.0 BLEU points with a 12.5x speedup. By decoding multiple initial latent variables in parallel and rescoring with a teacher model, the proposed model further brings the gap down to 1.0 BLEU point on the WMT'14 En-De task with a 6.8x speedup.
15

Wang, Shuheng, Heyan Huang, and Shumin Shi. "Improving Non-Autoregressive Machine Translation Using Sentence-Level Semantic Agreement." Applied Sciences 12, no. 10 (May 16, 2022): 5003. http://dx.doi.org/10.3390/app12105003.

Abstract:
The inference stage can be accelerated significantly using a Non-Autoregressive Transformer (NAT). However, the training objective used in the NAT model still aims to minimize the loss between the generated words and the golden words in the reference. Since the dependencies between the target words are lacking, this word-level training objective can easily cause semantic inconsistency between the generated and source sentences. To alleviate this issue, we propose a new method, Sentence-Level Semantic Agreement (SLSA), to obtain consistency between the source and generated sentences. Specifically, we utilize contrastive learning to pull the sentence representations of the source and generated sentences closer together. In addition, to strengthen the capability of the encoder, we also integrate an agreement module into the encoder to obtain a better representation of the source sentence. The experiments are conducted on three translation datasets: the WMT 2014 EN → DE task, the WMT 2016 EN → RO task, and the IWSLT 2014 DE → EN task, and the improvement in the NAT model's performance demonstrates the effectiveness of our proposed method.
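The contrastive part of SLSA can be approximated with a standard InfoNCE objective over pooled sentence vectors, pulling each source representation toward the representation of its own generated sentence; the sketch below is a generic stand-in with shapes and temperature of our choosing.

```python
import torch
import torch.nn.functional as F

def sentence_agreement_loss(src_repr, gen_repr, temperature=0.1):
    """src_repr, gen_repr: (batch, d) pooled sentence representations."""
    src = F.normalize(src_repr, dim=-1)
    gen = F.normalize(gen_repr, dim=-1)
    logits = src @ gen.t() / temperature                   # (batch, batch) similarities
    labels = torch.arange(src.size(0), device=src.device)  # matching pairs on the diagonal
    return F.cross_entropy(logits, labels)
```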
16

Guo, Pei, Yisheng Xiao, Juntao Li, and Min Zhang. "RenewNAT: Renewing Potential Translation for Non-autoregressive Transformer." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 12854–62. http://dx.doi.org/10.1609/aaai.v37i11.26511.

Abstract:
Non-autoregressive neural machine translation (NAT) models are proposed to accelerate the inference process while maintaining relatively high performance. However, existing NAT models find it difficult to achieve the desired efficiency-quality trade-off. On the one hand, fully NAT models with efficient inference perform worse than their autoregressive counterparts. On the other hand, iterative NAT models can achieve comparable performance, but at the cost of diminishing their speed advantage. In this paper, we propose RenewNAT, a flexible framework with high efficiency and effectiveness, to incorporate the merits of fully and iterative NAT models. RenewNAT first generates the potential translation results and then renews them in a single pass. It can achieve significant performance improvements at the same expense as traditional NAT models (without introducing additional model parameters and decoding latency). Experimental results on various translation benchmarks (e.g., 4 WMT) show that our framework consistently improves the performance of strong fully NAT methods (e.g., GLAT and DSLP) without additional speed overhead.
17

Weng, Rongxiang, Heng Yu, Weihua Luo, and Min Zhang. "Deep Fusing Pre-trained Models into Neural Machine Translation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11468–76. http://dx.doi.org/10.1609/aaai.v36i10.21399.

Abstract:
Pre-training and fine-tuning have become the de facto paradigm in many natural language processing (NLP) tasks. However, compared to other NLP tasks, neural machine translation (NMT) aims to generate target language sentences through the contextual representation of the source language counterparts. This characteristic means the optimization objective of NMT is far from that of universal pre-trained models (PTMs), so the standard pre-training and fine-tuning procedure does not work well in NMT. In this paper, we propose a novel framework to deeply fuse the pre-trained representation into NMT, fully exploring the potential of PTMs in NMT. Specifically, we directly replace the randomly initialized Transformer encoder with a pre-trained encoder and propose a layer-wise coordination structure to coordinate PTM and NMT decoder learning. Then, we introduce a partitioned multi-task learning method to fine-tune the pre-trained parameters, reducing the gap between PTM and NMT by progressively learning the task-specific representation. Experimental results show that our approach achieves considerable improvements on the WMT14 En2De, WMT14 En2Fr, and WMT16 Ro2En translation benchmarks and outperforms previous work in both autoregressive and non-autoregressive NMT models.
18

Huang, Fei, Pei Ke, and Minlie Huang. "Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation." Transactions of the Association for Computational Linguistics 11 (2023): 941–59. http://dx.doi.org/10.1162/tacl_a_00582.

Abstract:
Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 score on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with 17 times speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.
19

Xiao, Yisheng, Ruiyang Xu, Lijun Wu, Juntao Li, Tao Qin, Tie-Yan Liu, and Min Zhang. "AMOM: Adaptive Masking over Masking for Conditional Masked Language Model." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 13789–97. http://dx.doi.org/10.1609/aaai.v37i11.26615.

Abstract:
Transformer-based autoregressive (AR) methods have achieved appealing performance for varied sequence-to-sequence generation tasks, e.g., neural machine translation, summarization, and code generation, but suffer from low inference efficiency. To speed up the inference stage, many non-autoregressive (NAR) strategies have been proposed in the past few years. Among them, the conditional masked language model (CMLM) is one of the most versatile frameworks, as it can support many different sequence generation scenarios and achieve very competitive performance on these tasks. In this paper, we further introduce a simple yet effective adaptive masking over masking strategy to enhance the refinement capability of the decoder and make the encoder optimization easier. Experiments on 3 different tasks (neural machine translation, summarization, and code generation) with 15 datasets in total confirm that our proposed simple method achieves significant performance improvement over the strong CMLM model. Surprisingly, our proposed model yields state-of-the-art performance on neural machine translation (34.62 BLEU on WMT16 EN to RO, 34.82 BLEU on WMT16 RO to EN, and 34.84 BLEU on IWSLT De to En) and even better performance than the AR Transformer on 7 benchmark datasets with at least 2.2x speedup. Our code is available at GitHub.
20

Dong, Qianqian, Feng Wang, Zhen Yang, Wei Chen, Shuang Xu, and Bo Xu. "Adapting Translation Models for Transcript Disfluency Detection." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6351–58. http://dx.doi.org/10.1609/aaai.v33i01.33016351.

Abstract:
Transcript disfluency detection (TDD) is an important component of real-time speech translation systems and has attracted increasing interest in recent years. This paper presents our study on adapting neural machine translation (NMT) models for TDD. We propose a general training framework for rapidly adapting NMT models to the TDD task. In this framework, the main structure of the model is implemented similarly to the NMT model. Additionally, several extended modules and training techniques which are independent of the NMT model are proposed to improve the performance, such as constrained decoding, denoising autoencoder initialization and a TDD-specific training objective. With the proposed training framework, we achieve significant improvement. However, it is too slow in decoding to be practical. To build a feasible and production-ready solution for TDD, we propose a fast non-autoregressive TDD model following the non-autoregressive NMT models that have emerged recently. Although we do not assume a specific NMT architecture, we build our TDD model on the basis of the Transformer, which is the state-of-the-art NMT model. We conduct extensive experiments on the publicly available Switchboard dataset and an in-house Chinese dataset. Experimental results show that the proposed model significantly outperforms previous state-of-the-art models.
21

Xu, Weijia, and Marine Carpuat. "EDITOR: An Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints." Transactions of the Association for Computational Linguistics 9 (2021): 311–28. http://dx.doi.org/10.1162/tacl_a_00368.

Abstract:
Abstract We introduce an Edit-Based TransfOrmer with Repositioning (EDITOR), which makes sequence generation flexible by seamlessly allowing users to specify preferences in output lexical choice. Building on recent models for non-autoregressive sequence generation (Gu et al., 2019), EDITOR generates new sequences by iteratively editing hypotheses. It relies on a novel reposition operation designed to disentangle lexical choice from word positioning decisions, while enabling efficient oracles for imitation learning and parallel edits at decoding time. Empirically, EDITOR uses soft lexical constraints more effectively than the Levenshtein Transformer (Gu et al., 2019) while speeding up decoding dramatically compared to constrained beam search (Post and Vilar, 2018). EDITOR also achieves comparable or better translation quality with faster decoding speed than the Levenshtein Transformer on standard Romanian-English, English-German, and English-Japanese machine translation tasks.
22

S, Tarun. "Bridging Languages through Images: A Multilingual Text-to-Image Synthesis Approach." International Journal of Scientific Research in Engineering and Management 08, no. 05 (May 11, 2024): 1–5. http://dx.doi.org/10.55041/ijsrem33773.

Abstract:
This research investigates the challenges posed by the predominant focus on English-language text-to-image generation (TTI), which stems from the lack of annotated image caption data in other languages. The resulting inequitable access to TTI technology in non-English-speaking regions motivates research into multilingual TTI (mTTI) and the potential of neural machine translation (NMT) to facilitate its development. The study presents two main contributions. Firstly, a systematic empirical study employing a multilingual multi-modal encoder evaluates standard cross-lingual NLP methods applied to mTTI, including TRANSLATE TRAIN, TRANSLATE TEST, and ZERO-SHOT TRANSFER. Secondly, a novel parameter-efficient approach called Ensemble Adapter (ENSAD) is introduced, leveraging multilingual text knowledge within the mTTI framework to bridge the language gap and enhance mTTI performance. Additionally, the research addresses challenges associated with transformer-based TTI models, such as slow generation and complexity for high-resolution images. It proposes hierarchical transformers and local parallel autoregressive generation techniques to overcome these limitations. A 6B-parameter transformer pretrained with a cross-modal general language model (CogLM) and fine-tuned for fast super-resolution results in a new text-to-image system, denoted as It, which demonstrates competitive performance compared to the state-of-the-art DALL-E-2. Furthermore, It supports interactive text-guided editing on images, offering a versatile and efficient solution for text-to-image generation. Keywords: Text-to-image generation, Multilingual TTI (mTTI), Neural machine translation (NMT), Cross-lingual NLP, Ensemble Adapter (ENSAD), Hierarchical transformers, Super-resolution, Transformer-based models, Cross-modal general language model (CogLM).
23

Welleck, Sean, and Kyunghyun Cho. "MLE-Guided Parameter Search for Task Loss Minimization in Neural Sequence Modeling." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 16 (May 18, 2021): 14032–40. http://dx.doi.org/10.1609/aaai.v35i16.17652.

Abstract:
Neural autoregressive sequence models are used to generate sequences in a variety of natural language processing (NLP) tasks, where they are evaluated according to sequence-level task losses. These models are typically trained with maximum likelihood estimation, which ignores the task loss, yet empirically performs well as a surrogate objective. Typical approaches to directly optimizing the task loss such as policy gradient and minimum risk training are based around sampling in the sequence space to obtain candidate update directions that are scored based on the loss of a single sequence. In this paper, we develop an alternative method based on random search in the parameter space that leverages access to the maximum likelihood gradient. We propose maximum likelihood guided parameter search (MGS), which samples from a distribution over update directions that is a mixture of random search around the current parameters and around the maximum likelihood gradient, with each direction weighted by its improvement in the task loss. MGS shifts sampling to the parameter space, and scores candidates using losses that are pooled from multiple sequences. Our experiments show that MGS is capable of optimizing sequence-level losses, with substantial reductions in repetition and non-termination in sequence completion, and similar improvements to those of minimum risk training in machine translation.
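A toy version of the parameter-space search described above: candidate update directions are drawn around zero (plain random search) and around the maximum-likelihood gradient, and are weighted by how much each improves the task loss. The flattened-parameter interface, candidate count, and softmax weighting are our assumptions.

```python
import torch

def mgs_step(params, mle_grad, task_loss_fn, n_candidates=4, sigma=0.01, lr=0.1):
    """params, mle_grad: flat parameter / gradient tensors.
    task_loss_fn(p): sequence-level task loss evaluated at parameter vector p."""
    base_loss = task_loss_fn(params)
    directions, gains = [], []
    for i in range(n_candidates):
        center = -lr * mle_grad if i % 2 == 0 else torch.zeros_like(params)
        d = center + sigma * torch.randn_like(params)
        directions.append(d)
        gains.append(base_loss - task_loss_fn(params + d))  # task-loss improvement
    weights = torch.softmax(torch.tensor(gains), dim=0)
    update = sum(w * d for w, d in zip(weights, directions))
    return params + update
```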
24

Liu, Chuanming, and Jingqi Yu. "Uncertainty-aware non-autoregressive neural machine translation." Computer Speech & Language, August 2022, 101444. http://dx.doi.org/10.1016/j.csl.2022.101444.

25

Shao, Chenze, Yang Feng, Jinchao Zhang, Fandong Meng, and Jie Zhou. "Sequence-Level Training for Non-Autoregressive Neural Machine Translation." Computational Linguistics, September 6, 2021, 1–36. http://dx.doi.org/10.1162/coli_a_00421.

Abstract:
In recent years, Neural Machine Translation (NMT) has achieved notable results in various translation tasks. However, the word-by-word generation manner determined by the autoregressive mechanism leads to high translation latency of NMT and restricts its low-latency applications. Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup by generating target words independently and simultaneously. Nevertheless, NAT still takes the word-level cross-entropy loss as the training objective, which is not optimal because the output of NAT cannot be properly evaluated due to the multimodality problem. In this article, we propose using sequence-level training objectives to train NAT models, which evaluate the NAT outputs as a whole and correlate well with the real translation quality. Firstly, we propose training NAT models to optimize sequence-level evaluation metrics (e.g., BLEU) based on several novel reinforcement algorithms customized for NAT, which outperforms the conventional method by reducing the variance of gradient estimation. Secondly, we introduce a novel training objective for NAT models, which aims to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The BoN training objective is differentiable and can be calculated efficiently without doing any approximations. Finally, we apply a three-stage training strategy to combine these two methods to train the NAT model. We validate our approach on four translation tasks (WMT14 En↔De, WMT16 En↔Ro), which shows that our approach largely outperforms NAT baselines and achieves remarkable performance on all translation tasks. The source code is available at https://github.com/ictnlp/Seq-NAT.
26

Wang, Shuheng, Heyan Huang, and Shumin Shi. "Incorporating history and future into non-autoregressive machine translation." Computer Speech & Language, July 2022, 101439. http://dx.doi.org/10.1016/j.csl.2022.101439.

27

Sheshadri, Shailashree K., and Deepa Gupta. "KasNAT: Non-autoregressive machine translation for Kashmiri to English using knowledge distillation." Journal of Intelligent & Fuzzy Systems, April 26, 2024, 1–15. http://dx.doi.org/10.3233/jifs-219383.

Abstract:
Non-Autoregressive Machine Translation (NAT) represents a groundbreaking advancement in Machine Translation, enabling the simultaneous prediction of output tokens and significantly boosting translation speeds compared to traditional auto-regressive (AR) models. Recent NAT models have adeptly balanced translation quality and speed, surpassing their AR counterparts. The widely employed Knowledge Distillation (KD) technique in NAT involves generating training data from pre-trained AR models, enhancing NAT model performance. While KD has consistently proven its empirical effectiveness and substantial accuracy gains in NAT models, its potential within Indic languages has yet to be explored. This study pioneers the evaluation of NAT model performance for Indic languages, focusing mainly on Kashmiri to English translation. Our exploration encompasses varying encoder and decoder layers and fine-tuning hyper-parameters, shedding light on the vital role KD plays in facilitating NAT models to capture variations in output data effectively. Our NAT models, enhanced with KD, exhibit sacreBLEU scores ranging from 16.20 to 22.20. The Insertion Transformer reaches a SacreBLEU of 22.93, approaching AR model performance.
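Sequence-level knowledge distillation as used here boils down to replacing each reference with the output of a trained autoregressive teacher and training the NAT student on those pairs; a minimal pipeline sketch follows, where `translate_with_teacher` is a hypothetical wrapper around the AR model.

```python
def build_distilled_corpus(src_sentences, translate_with_teacher):
    """Return (source, teacher-translation) pairs for NAT training."""
    return [(src, translate_with_teacher(src)) for src in src_sentences]

# distilled = build_distilled_corpus(kashmiri_sentences, translate_with_teacher)
# nat_model.train_on(distilled)   # hypothetical training call
```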
28

Xie, Pan, Zexian Li, Zheng Zhao, Jiaqi Liu, and Xiaohui Hu. "MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 1–10. http://dx.doi.org/10.1109/taslp.2022.3221043.

29

Wang, Shuheng, Shumin Shi, and Heyan Huang. "Alleviating repetitive tokens in non-autoregressive machine translation with unlikelihood training." Soft Computing, January 3, 2024. http://dx.doi.org/10.1007/s00500-023-09490-1.

30

Xiao, Yisheng, Lijun Wu, Junliang Guo, Juntao Li, Min Zhang, Tao Qin, and Tie-Yan Liu. "A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 1–20. http://dx.doi.org/10.1109/tpami.2023.3277122.

31

Lim, Yeon-Soo, Eun-Ju Park, Hyun-Je Song, and Seong-Bae Park. "A Non-Autoregressive Neural Machine Translation Model with Iterative Length Update of Target Sentence." IEEE Access, 2022, 1. http://dx.doi.org/10.1109/access.2022.3169419.
