On February 13, 2025, researchers Itai Mondshine, Tzuf Paz-Argaman, and Reut Tsarfaty from Bar-Ilan University suggested that translating only specific components of a prompt can improve the performance of multilingual large language models (LLMs) across various natural language processing (NLP) tasks.

This research builds on prior work by Google, Alibaba, the Pune Institute of Computer Technology, and others, which explored various strategies to improve multilingual LLM performance. 

These earlier strategies include full pre-translation (translating entire non-English prompts into English before inference), direct inference (prompting the model directly in the non-English source language), and translating only input sentences from low-resource into high-resource languages.

In contrast, “selective pre-translation” translates only specific components of a prompt based on the task and language characteristics.

The researchers explained that a prompt consists of four parts: instruction, context, examples, and output, each of which may or may not be translated. The instruction provides guidance to the model, explaining the task. The context represents the data on which the model operates. Examples are optional demonstrations of input-output pairs used for in-context learning. The output refers to the expected response format, either in English or the source language.
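The four-part decomposition above can be made concrete with a short sketch. This is an illustrative reconstruction, not the authors' code: the class, field names, and the `build_prompt` helper are hypothetical, and `mt` is a stand-in for any machine-translation call.

```python
from dataclasses import dataclass

@dataclass
class PromptConfig:
    """Hypothetical container for the four prompt components described above."""
    instruction: str   # task guidance for the model
    context: str       # the data the model operates on
    examples: list     # optional input-output demonstrations for in-context learning
    output_lang: str   # expected response language, e.g. "en" or the source language

def build_prompt(cfg: PromptConfig, translate: dict, mt=lambda s: s) -> str:
    """Assemble a prompt, translating only the components flagged in `translate`.

    `mt` stands in for a machine-translation function (identity by default).
    """
    instruction = mt(cfg.instruction) if translate.get("instruction") else cfg.instruction
    context = mt(cfg.context) if translate.get("context") else cfg.context
    examples = [mt(e) if translate.get("examples") else e for e in cfg.examples]
    parts = [instruction, *examples, context, f"Answer in: {cfg.output_lang}"]
    return "\n".join(parts)
```

Under this framing, full pre-translation sets every flag to true, direct inference sets none, and selective pre-translation toggles each component independently.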

According to the researchers, selective pre-translation is “a more surgical approach,” but its use has been “sporadic” and lacks a systematic research foundation. They emphasized that the optimal selective pre-translation strategy for diverse multilingual settings and tasks remains unclear.

“In this work, we aim to uncover the optimal setup for selective pretranslation by systematically assessing its use,” they said, describing this study as the “first systematic evaluation,” to their knowledge.

What to Translate

The researchers tested selective pre-translation — comparing it to both full pre-translation and direct inference — across 35 languages and four NLP tasks: question answering (QA), natural language inference (NLI), named entity recognition (NER), and abstractive summarization.

Their experiments involved several LLMs, including GPT-3.5-turbo, Mixtral-8x7B, Gemini-1.0-pro, and BLOOMZ-7b1-mt.

They found that for extractive tasks like QA and NER, keeping the context in the source language led to better performance for both high- and low-resource languages. Additionally, prompts that included examples in the source language outperformed those with English examples, especially for low-resource languages.

The choice of instruction language had little impact, with similar performance observed whether instructions were given in English or in the source language. The output should generally remain in the source language, except for NER in low-resource languages, where an English output improved results.

For generative tasks such as summarization, pre-translating instructions, context, and examples into English improved performance. Furthermore, generating output in English consistently led to better performance, particularly in low-resource settings.
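The task-specific recommendations above can be summarized as a lookup table. The structure below is an illustrative paraphrase of the reported findings, not the authors' code; the function name and keys are assumptions.

```python
# Illustrative summary of the article's reported findings per task.
# "source" = keep in the source language; "en" = pre-translate into English;
# "either" = instruction language had little impact on performance.
SELECTIVE_PRETRANSLATION = {
    # Extractive tasks: source-language context and examples worked best.
    "qa":  {"instruction": "either", "context": "source", "examples": "source", "output": "source"},
    "ner": {"instruction": "either", "context": "source", "examples": "source", "output": "source"},
    # Generative tasks: translating components into English helped.
    "summarization": {"instruction": "en", "context": "en", "examples": "en", "output": "en"},
}

def recommended_setup(task: str, low_resource: bool = False) -> dict:
    """Return a per-component translation setup for a task (hypothetical helper)."""
    setup = dict(SELECTIVE_PRETRANSLATION[task])
    if task == "ner" and low_resource:
        # Exception noted in the article: English output improved NER
        # results for low-resource languages.
        setup["output"] = "en"
    return setup
```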

“While it is fine in such generative tasks to instruct the model to generate outputs in the source language for high-resource languages, it appears better to generate in English in the low-resource case,” the researchers noted.

Poor Prompt Translation

Translation quality also played a crucial role in model performance. The researchers found that while poor machine translation (MT) of an entire prompt can degrade results, selective pre-translation helps mitigate these risks.

“Higher translation quality goes hand in hand with improved task performance,” they said, highlighting that “selective pre-translation can mitigate the negative effects of poor translation quality by strategically choosing which prompt components to translate.”

The researchers concluded that “selective pre-translation can outperform both full pre-translation and direct inference, particularly for languages considered low-resource.”


