arxiv:2401.12413

How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual Translation via Tiny Multi-Parallel Data

Published on Jan 22, 2024

Authors:

Abstract

Zero-shot translation in multilingual machine translation achieves significant improvements through minimal multi-parallel data fine-tuning while maintaining English-centric translation quality.

AI-generated summary

Zero-shot translation aims to translate between language pairs not seen during training in Multilingual Machine Translation (MMT) and is largely considered an open problem. A common, albeit resource-consuming, solution is to add as many related translation directions as possible to the training corpus. In this paper, we show that for an English-centric model, surprisingly large zero-shot improvements can be achieved by simply fine-tuning with a very small amount of multi-parallel data. For example, on the EC30 dataset, we obtain up to +21.7 ChrF non-English overall improvements (870 directions) by using only 100 multi-parallel samples while preserving English-centric translation quality. When investigating the size effect of fine-tuning data and its transfer capabilities, we found that already a small, randomly sampled set of fine-tuning directions is sufficient to achieve comparable improvements. The resulting non-English performance is close to the complete translation upper bound. Even in a minimal setting -- fine-tuning with only one single sample -- the well-known off-target issue is almost completely resolved, explaining parts -- but not all -- of the observed improvements in translation quality.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2401.12413

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2401.12413 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2401.12413 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2401.12413 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.