In a January 28, 2025, paper, Rajen Chatterjee and Sarthak Garg from Apple, along with Zilu Tang from Boston University, presented a framework for mitigating translation hallucinations in large language models (LLMs).
According to the researchers, “this is among the first works to demonstrate how to mitigate translation hallucination in LLMs.”
They explained that LLM-based systems are particularly susceptible to translation hallucinations, “which can lead to misunderstandings in conversations, potentially damaging relationships and undermining user trust in the system.”
Additionally, they noted that the presence of certain features in the source sentences — such as quotes, URLs/online handles, or words/phrases in all capital letters — significantly increases the likelihood of hallucinations.
Unlike previous efforts, their approach aims to mitigate the issue at the training stage rather than relying on “post-hoc corrections,” which can add complexity and latency to production systems.
The researchers explained that post-hoc mitigation strategies first detect whether a translation contains hallucination, and if so, generate and present a mitigated translation to the user.
However, in practical scenarios, this has several drawbacks. First, it requires an additional hallucination detector in production. Second, running that detector on every translation increases cost and latency. Finally, re-running inference when hallucinations are detected is often much slower than regular inference.
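To illustrate the overhead the researchers describe, a post-hoc pipeline can be sketched roughly as below. This is not the paper's code: `translate` and `detect_hallucination` are hypothetical stand-ins for an LLM translation call and a hallucination detector model.

```python
# Rough illustration of a post-hoc mitigation loop (not the paper's code).
# `translate` and `detect_hallucination` are hypothetical stand-ins for an
# LLM translation call and a hallucination detector model.

def translate(source: str, temperature: float = 0.0) -> str:
    """Call the baseline LLM to translate an English source sentence."""
    raise NotImplementedError  # placeholder for a real model call


def detect_hallucination(source: str, translation: str) -> bool:
    """Return True if the detector flags the translation as hallucinated."""
    raise NotImplementedError  # placeholder for a detector model


def post_hoc_translate(source: str, max_retries: int = 3) -> str:
    """Detect-then-regenerate: every request pays for detection, and flagged
    requests pay for one or more additional inference passes."""
    translation = translate(source)
    for _ in range(max_retries):
        if not detect_hallucination(source, translation):
            return translation
        # Re-run inference (e.g., with sampling) in the hope of a clean output.
        translation = translate(source, temperature=0.8)
    return translation
```

The extra cost is structural: the detector runs on every request, and any flagged translation triggers at least one more pass through the model.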
“While all these approaches mitigate hallucinations during or after inference, our approach takes an orthogonal path by addressing the issue directly within the model itself,” they said.
Easily Scalable, Highly Effective
To tackle this, they created a system where an LLM could learn from its mistakes by training on examples of good and bad translations, ultimately leading to fewer hallucinations in the translations it produces. Their method involves:
- gathering a large set of English sentences,
- generating translations using a baseline LLM,
- detecting hallucinations using a hallucination detector model,
- creating a preference optimization dataset by categorizing the translations into two groups: preferred (which were accurate) and dispreferred (which contained hallucinations),
- fine-tuning the LLM using the hallucination-focused preference dataset, training the model to prioritize preferred translations over dispreferred ones and thereby avoid generating hallucinations, as sketched below.
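The dataset-construction step can be made concrete with a minimal Python sketch. Here, `sample_translations` and `detect_hallucination` are hypothetical stand-ins, and the prompt format and pairing strategy are assumptions rather than details from the paper.

```python
# Minimal sketch of building the hallucination-focused preference dataset
# described above. `sample_translations` and `detect_hallucination` are
# hypothetical stand-ins, and the prompt format and pairing strategy are
# assumptions, not details from the paper.

from typing import Dict, List


def sample_translations(source: str, n: int = 4) -> List[str]:
    """Sample n candidate translations from the baseline LLM."""
    raise NotImplementedError  # placeholder for a real model call


def detect_hallucination(source: str, translation: str) -> bool:
    """Return True if the detector flags the translation as hallucinated."""
    raise NotImplementedError  # placeholder for a detector model


def build_preference_dataset(sources: List[str]) -> List[Dict[str, str]]:
    """Pair one accurate (preferred) and one hallucinated (dispreferred)
    candidate per source sentence, where both kinds of output exist."""
    records = []
    for src in sources:
        candidates = sample_translations(src)
        flags = [(t, detect_hallucination(src, t)) for t in candidates]
        clean = [t for t, bad in flags if not bad]
        hallucinated = [t for t, bad in flags if bad]
        if clean and hallucinated:
            records.append({
                "prompt": f"Translate into German: {src}",  # assumed prompt format
                "chosen": clean[0],           # preferred: accurate translation
                "rejected": hallucinated[0],  # dispreferred: hallucinated translation
            })
    return records
```

The resulting prompt/chosen/rejected records are in the format typically consumed by preference-optimization trainers (for example, TRL's DPOTrainer), which would then fine-tune the LLM to favor the accurate translations.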
The researchers claim that their approach reduced hallucinations by 96% across five language pairs (English to Czech, German, Icelandic, Russian, and Chinese). In a zero-shot setting with unseen languages (French, Italian, and Spanish), hallucinations dropped by 89%.
Additionally, they emphasized that their method maintained overall translation quality — performing nearly as well as post-hoc mitigation approaches — but without the added complexity of running hallucination detection at inference time.
“Our approach requires no additional human-annotated data, is easily scalable across many language pairs, and is highly effective,” they said.
While the research focuses on English-to-X translations, the researchers suggest that their framework could be extended to other language directions with further development.