In an April 22, 2024 paper, Maxim Enis and Mark Hopkins from Williams College showcased the superior machine translation (MT) capabilities of Claude 3 Opus, a large language model (LLM) released by Anthropic in March 2024.

Specifically, they revealed Claude’s “state-of-the-art” MT ability across various language pairs, including low- and very low-resource language pairs, indicating a potential closing of the performance gap between high- and low-resource languages.

Enis and Hopkins conducted experiments involving English and 36 other languages, comprising 15 high-resource, 17 low-resource, and 4 very low-resource languages. Each language underwent evaluation in both the eng->xxx and xxx->eng directions. 

Their findings revealed Claude’s superiority over strong baselines such as NLLB and Google Translate in 25% of the evaluated languages. However, performance differed by direction: Claude translated better into English than out of English. Claude “still struggles with out-of-English translation,” they noted.

This indicates that supervised baselines still have an advantage over LLMs when English is not the target language of the translation task, according to the researchers.

Promise as Low-Resource Translator

Moreover, Claude exhibited “remarkable resource efficiency” when translating into English. Resource efficiency refers to the extent to which the performance of a multilingual translation engine depends on the resource level (e.g., high, low, very low) of the language pair.

They noted that “Claude may be the first LLM to demonstrate resource efficiency” in MT, showcasing “particular promise as a low-resource translator,” compared to strong NMT baselines.

Enis and Hopkins benchmarked Claude’s performance on several datasets, including FLORES-200 and newly created ones that are “verifiably unseen” by Claude. The researchers found signs of data contamination on the FLORES-200 dataset in both translation directions, underscoring “the importance of developing a machine translation benchmark for LLMs with unseen source and target sentences.”

Future Era of LLM-Powered Machine Translation

Despite the promising findings, Enis and Hopkins highlighted the limitations posed by the costs and inference time of LLMs, hindering their broader applicability in MT tasks. “Although LLMs may achieve state-of-the-art results in certain translation directions, the cost, time, and energy use of computational inference limits their applicability as translators,” they said.

To address this challenge, they explored knowledge distillation, a technique that transfers the knowledge and expertise of a complex model (the teacher model) into a smaller one (the student model). They proposed that the translation abilities of LLMs, Claude in this case, can be leveraged to advance the state of the art in traditional neural machine translation (NMT).

Enis and Hopkins generated a Yoruba-English parallel corpus for knowledge distillation by translating sentences and documents using Claude 3 Opus. The synthetic data produced was used to train smaller models. These models exhibited performance on par with or surpassing strong baselines like NLLB-54B and Google Translate.
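The data-generation step described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the function names `build_synthetic_corpus` and `placeholder_teacher` are hypothetical, the deduplication and empty-output filtering are assumptions added for illustration, and the teacher here is a stand-in that does not call any real model or API. The idea is simply that a teacher model translates source sentences, and the resulting sentence pairs become training data for a smaller student model.

```python
from typing import Callable, Iterable, List, Tuple


def build_synthetic_corpus(
    source_sentences: Iterable[str],
    teacher_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each unique source sentence with the teacher's translation.

    `teacher_translate` is any callable that maps a source sentence to a
    translation (assumption: in practice it would wrap a call to the LLM).
    """
    corpus: List[Tuple[str, str]] = []
    seen = set()
    for sentence in source_sentences:
        text = sentence.strip()
        if not text or text in seen:
            continue  # skip blank lines and exact duplicates
        seen.add(text)
        translation = teacher_translate(text).strip()
        if translation:  # drop pairs where the teacher returned nothing
            corpus.append((text, translation))
    return corpus


def placeholder_teacher(sentence: str) -> str:
    # Hypothetical stand-in for the teacher model: it only tags the input
    # and performs no real translation.
    return f"[teacher translation of: {sentence}]"


if __name__ == "__main__":
    pairs = build_synthetic_corpus(["a", " a ", "", "b"], placeholder_teacher)
    for source, target in pairs:
        print(source, "->", target)
```

The resulting list of (source, target) pairs is what a smaller NMT model would then be trained on; the training step itself is outside the scope of this sketch.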

Enis and Hopkins suggested that distillation techniques could be applied productively to LLMs to create compact NMT models that surpass the current state-of-the-art. They also believe that further refinements and optimizations of these methods could result in even better performance. Moreover, their approach can be applied to many more language pairs, whether currently supported by translation systems or not.

Concluding, they emphasized that their findings “point toward a future era of LLM-powered machine translation.”
