Bei Li
2020
Does Multi-Encoder Help? A Case Study on Context-Aware Neural Machine Translation
Bei Li | Hui Liu | Ziyang Wang | Yufan Jiang | Tong Xiao | Jingbo Zhu | Tongran Liu | Changliang Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
In encoder-decoder neural models, multiple encoders are generally used to represent the contextual information in addition to the individual sentence. In this paper, we investigate multi-encoder approaches in document-level neural machine translation (NMT). Surprisingly, we find that the context encoder not only encodes the surrounding sentences but also behaves as a noise generator. This makes us rethink the real benefits of multi-encoder approaches in context-aware translation: some of the improvements come from robust training. We compare several methods that introduce noise and/or well-tuned dropout setups into the training of these encoders. Experimental results show that noisy training plays an important role in multi-encoder-based NMT, especially when the training data is small. Also, we establish a new state-of-the-art on the IWSLT Fr-En task by careful use of noise generation and dropout methods.
The NiuTrans System for WNGT 2020 Efficiency Task
Chi Hu | Bei Li | Yinqiao Li | Ye Lin | Yanyang Li | Chenglong Wang | Tong Xiao | Jingbo Zhu
Proceedings of the Fourth Workshop on Neural Generation and Translation
This paper describes the submissions of the NiuTrans Team to the WNGT 2020 Efficiency Shared Task. We focus on the efficient implementation of deep Transformer models (Wang et al., 2019; Li et al., 2019) using NiuTensor, a flexible toolkit for NLP tasks. We explore the combination of a deep encoder and a shallow decoder in Transformer models via model compression and knowledge distillation. Neural machine translation decoding also benefits from FP16 inference, attention caching, dynamic batching, and batch pruning. Our systems achieve promising results in both translation quality and efficiency, e.g., our fastest system can translate more than 40,000 tokens per second with an RTX 2080 Ti while maintaining 42.9 BLEU on newstest2018.
Co-authors
- Tong Xiao 2
- Jingbo Zhu 2
- Hui Liu 1
- Ziyang Wang 1
- Yufan Jiang 1