Large language models (LLMs) have significantly improved AI translation for high-resource languages, but performance remains uneven for low-resource languages (LRLs).
In a March 6, 2025 paper, researchers Armel Zebaze, Benoît Sagot, and Rachel Bawden from Inria, the French National Institute for Research in Digital Science and Technology, introduced Compositional Translation (CompTra), an LLM-based approach designed to improve translation quality for LRLs.
CompTra prompts LLMs to break down sentences into simpler phrases, translate each separately using in-context examples, and then use these phrase-translation pairs to guide the translation of the original sentence.
The researchers explained that “LLMs are more effective at handling short phrases” and that simpler phrases make it easier to find relevant in-context examples. By breaking complex sentences into simpler phrases, CompTra makes the translation task more manageable for LLMs and puts a limited pool of in-context examples to more effective use.
Compositionality
CompTra first breaks the input sentence down into simpler, independent phrases, each of which captures part of the sentence’s meaning while using its words in their original context. For each decomposed phrase, the system retrieves relevant in-context examples (typically four) from a selection pool.
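To illustrate, here is a minimal sketch of this first stage in Python. The prompt wording, the generic `chat` callable, and the word-overlap retriever are illustrative assumptions, not the authors’ implementation; the paper uses its own decomposition prompts and similarity-based retrieval.

```python
# Sketch of CompTra's decomposition and retrieval stages (illustrative only).

DECOMPOSE_PROMPT = """Break the following sentence into short, independent \
phrases that keep each word in its original context. Return one phrase per line.

Sentence: {sentence}
Phrases:"""

def decompose(sentence: str, chat) -> list[str]:
    """Ask the LLM for simpler phrases; `chat` is any text-in/text-out callable."""
    reply = chat(DECOMPOSE_PROMPT.format(sentence=sentence))
    return [line.lstrip("- ").strip() for line in reply.splitlines() if line.strip()]

def retrieve_examples(phrase: str, pool: list[tuple[str, str]], k: int = 4):
    """Rank (source, target) pairs from the selection pool by word overlap
    with the phrase, a simple stand-in for the paper's similarity retrieval."""
    words = set(phrase.lower().split())
    ranked = sorted(pool, key=lambda pair: -len(words & set(pair[0].lower().split())))
    return ranked[:k]
```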
With these examples in place, the LLM translates each phrase individually in a few-shot setting, leveraging the retrieved examples as references. This process results in translated outputs for each of the decomposed phrases.
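In code, this phrase-level few-shot step could look like the following sketch, where the prompt template and the target-language label are assumptions:

```python
def translate_phrase(phrase: str, examples: list[tuple[str, str]],
                     tgt_lang: str, chat) -> str:
    """Translate one decomposed phrase, using the retrieved (source, target)
    pairs as few-shot demonstrations. The prompt format is illustrative."""
    demos = "\n".join(f"English: {src}\n{tgt_lang}: {tgt}" for src, tgt in examples)
    prompt = f"{demos}\nEnglish: {phrase}\n{tgt_lang}:"
    return chat(prompt).strip()
```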
To maintain accuracy, a language identification step is applied to verify that all translations align with the intended target language. Any outputs that fail this check are discarded.
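A filter of this kind can be sketched with an off-the-shelf language identifier; the `langid` package below is a stand-in for whatever identification model the authors actually used.

```python
import langid  # pip install langid; a stand-in language identifier

def keep_valid(pairs: list[tuple[str, str]], tgt_code: str) -> list[tuple[str, str]]:
    """Keep only (phrase, translation) pairs whose translation is detected as
    the target language (ISO 639-1 code, e.g. "sw" for Swahili)."""
    return [(p, t) for p, t in pairs if langid.classify(t)[0] == tgt_code]
```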
In the final stage, the LLM is prompted once again to translate the original sentence, this time with the validated phrase-translation pairs supplied as additional context. These pairs guide the model toward a final translation that integrates the phrase-level insights, yielding a more accurate and contextually appropriate result.
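Putting it together, the final prompt can be sketched as follows; again, the exact template is an assumption rather than the authors’ wording:

```python
def compositional_translate(sentence: str, pairs: list[tuple[str, str]],
                            tgt_lang: str, chat) -> str:
    """Translate the full sentence with the validated phrase-translation
    pairs supplied as guiding context."""
    hints = "\n".join(f"English: {p}\n{tgt_lang}: {t}" for p, t in pairs)
    prompt = (f"Use the following phrase translations as guidance.\n{hints}\n\n"
              f"Now translate the full sentence.\nEnglish: {sentence}\n{tgt_lang}:")
    return chat(prompt).strip()
```

Chaining the sketches gives the full pipeline: decompose the sentence, retrieve examples per phrase, translate each phrase, filter by language ID, and prompt once more for the final output.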
The researchers tested CompTra on translations from English into 15 different LRLs, using datasets including FLORES 200, NTREX 128, and TICO-19. They used several LLMs including LLaMA 3.1 (8B, 70B), Gemma 2 (9B, 27B), and Command-R+.
They found that CompTra consistently outperformed standard similarity-based few-shot translation, with gains measured using XCOMET and MetricX evaluation metrics. Gains were particularly evident when working with smaller selection pools or out-of-domain data.
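For readers who want to run this kind of scoring themselves, XCOMET is available through Unbabel’s `comet` package; the checkpoint name, the Hugging Face gating note, and the toy Swahili example below are assumptions about a typical setup, not the paper’s exact evaluation pipeline.

```python
# pip install unbabel-comet; the XCOMET-XL checkpoint is gated on Hugging Face.
from comet import download_model, load_from_checkpoint

model = load_from_checkpoint(download_model("Unbabel/XCOMET-XL"))
data = [{"src": "The committee approved the budget.",  # English source
         "mt":  "Kamati iliidhinisha bajeti.",         # hypothetical system output
         "ref": "Kamati iliidhinisha bajeti hiyo."}]   # hypothetical reference
print(model.predict(data, batch_size=8, gpus=0).system_score)  # gpus=0: CPU
```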
The researchers wrote that “applying compositionality to perform MT [machine translation] will hopefully inspire further work on reasoning-based approaches to MT.”
The code and outputs are publicly available, encouraging further experimentation and adoption.