arxiv:1807.05353

Recurrent Stacking of Layers for Compact Neural Machine Translation Models

Published on Jul 14, 2018
Authors: Raj Dabre, Atsushi Fujita

AI-generated summary

Recurrent stacking of a single layer in neural machine translation achieves translation quality comparable to that of traditional multi-layer architectures while using far fewer parameters, with additional gains from back-translation on pseudo-parallel corpora.

Abstract

In neural machine translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder. The addition of each new layer improves translation quality, but it also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all the layers, thereby leading to a recurrently stacked NMT model. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times is comparable to that of a model that stacks 6 separate layers. We also show that using pseudo-parallel corpora obtained by back-translation leads to further significant improvements in translation quality.
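The parameter-sharing idea is easy to see in code. The sketch below is not the authors' implementation: PyTorch, the Transformer layer type, the class names, and the hyperparameters are all illustrative assumptions. It contrasts a recurrently stacked encoder, in which one layer's parameters are reused 6 times, with a conventional encoder of 6 independently parameterized layers; both produce outputs of the same shape, but the shared-layer model holds roughly one sixth of the parameters.

```python
# Minimal sketch (not the paper's code) of recurrent stacking: one shared
# encoder layer applied `depth` times versus `depth` separate layers.
import torch
import torch.nn as nn


class RecurrentlyStackedEncoder(nn.Module):
    """Applies a single shared Transformer encoder layer `depth` times."""

    def __init__(self, d_model=512, nhead=8, depth=6):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):
        # The same parameters are reused at every "layer" of the stack.
        for _ in range(self.depth):
            x = self.shared_layer(x)
        return x


class ConventionalEncoder(nn.Module):
    """Stacks `depth` independently parameterized Transformer encoder layers."""

    def __init__(self, d_model=512, nhead=8, depth=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def count_params(module):
    return sum(p.numel() for p in module.parameters())


if __name__ == "__main__":
    recurrent = RecurrentlyStackedEncoder()
    conventional = ConventionalEncoder()
    x = torch.randn(2, 10, 512)  # (batch, sequence length, d_model)
    assert recurrent(x).shape == conventional(x).shape
    # The recurrently stacked encoder has roughly 1/6 of the parameters.
    print(count_params(recurrent), count_params(conventional))
```

Since the abstract describes sharing parameters across all layers, a decoder-side stack would follow the same pattern, reusing one decoder layer's parameters at every depth step.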
