In an era when large language models (LLMs) are reshaping AI translation, two recent studies propose strategies for tackling document-level machine translation (MT).
One, from the University of Zurich, introduces a method that treats document-level translation as a conversation. Instead of translating a whole document in one go or splitting it into isolated segments, the researchers propose breaking the document into segments (paragraphs) and translating them one at a time while carrying forward the conversation’s context.
This means each new segment is translated with the benefit of all previous translations, much like following a conversation where earlier exchanges set the context.
According to the researchers, this approach leverages the “natural ability” of LLMs to “remember” what has been translated already, preventing important details from being dropped and ensuring a more coherent translation from start to finish.
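As a rough illustration (not the authors’ code), the loop below sketches this multi-turn setup. It assumes a hypothetical `llm_chat(messages)` helper that sends a chat history to any chat-capable LLM and returns the assistant’s reply:

```python
# Sketch of segment-by-segment translation within one running chat session.
# `llm_chat(messages)` is a hypothetical helper: it takes a list of
# {"role": ..., "content": ...} messages and returns the model's reply text.

def translate_document_multiturn(segments, src_lang="German", tgt_lang="English"):
    messages = [{
        "role": "system",
        "content": (f"You are a translator. Translate each {src_lang} paragraph "
                    f"the user sends into {tgt_lang}. Reply with the translation only."),
    }]
    translations = []
    for segment in segments:
        # Append the new source paragraph to the running conversation, so the
        # model also sees every earlier paragraph and its own earlier translations.
        messages.append({"role": "user", "content": segment})
        translation = llm_chat(messages)
        messages.append({"role": "assistant", "content": translation})
        translations.append(translation)
    return translations
```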
The Big Picture
A particularly interesting twist in their approach is what they call the “source-primed” variant. Here, before translating any segment, the model reads the entire source document to get a sense of the overall topic and style. Each segment translation then benefits from this “big picture” context, resulting in output that is more consistent and coherent.
The researchers explained that this is especially important for the initial segments, which would otherwise be translated with little or no context.
“This approach has the advantage of providing information about the document’s topic and style, which might help the model generate appropriate tense and formality levels from the start,” they said.
Additionally, they emphasized that “this methodology is simple and effective, does not require additional training, and can be applied to any LLMs that support chat mode.”
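The same loop can be “primed” with the full source text before the first segment is translated. The sketch below, again using the hypothetical `llm_chat` helper and illustrative prompts rather than the paper’s own, shows one way to do this:

```python
# Sketch of the source-primed variant: the whole source document is shown to
# the model up front, then segments are translated turn by turn as before.
# `llm_chat(messages)` is the same hypothetical chat helper as above.

def translate_source_primed(segments, src_lang="German", tgt_lang="English"):
    full_document = "\n\n".join(segments)
    messages = [
        {"role": "system",
         "content": f"You are a translator working from {src_lang} into {tgt_lang}."},
        # Priming turn: the full document conveys topic, tense, and formality
        # before the first segment is translated.
        {"role": "user",
         "content": ("Here is the full source document for context; "
                     "do not translate it yet:\n\n" + full_document)},
        {"role": "assistant",
         "content": "Understood. Send the paragraphs one at a time."},
    ]
    translations = []
    for segment in segments:
        messages.append({"role": "user",
                         "content": f"Translate this paragraph:\n\n{segment}"})
        translation = llm_chat(messages)
        messages.append({"role": "assistant", "content": translation})
        translations.append(translation)
    return translations
```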
Evaluation on the WMT24 general track shows that this strategy outperforms both single-turn and traditional segment-level methods.
Mimicking Translators’ Behavior
Meanwhile, researchers from Huawei and Soochow University take a different approach — one that mimics the way professional translators work. Their multi-knowledge fusion approach adds extra layers of context to the translation process.
In practice, the approach augments the translation process with two additional layers of knowledge: a summary of the entire document and a dedicated translation of key entity terms.
First, the LLM is prompted to generate a brief summary of the full document to capture its main ideas. Next, it extracts and translates key entity words — such as names, places, and events — that are critical for maintaining lexical and terminological consistency. With these additional pieces of knowledge in hand, the model produces several candidate translations for each segment. A final reranking process then selects the best translation.
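A rough sketch of such a pipeline is shown below; `llm_complete(prompt)` and `quality_score(source, candidate)` are hypothetical stand-ins (for an LLM call and, say, a quality-estimation model), not the authors’ actual prompts or reranker:

```python
# Sketch of multi-knowledge fusion: a document summary and an entity glossary
# are generated first, folded into each translation prompt, and a simple
# reranking step keeps the best of several candidates.
# `llm_complete(prompt)` and `quality_score(source, candidate)` are hypothetical.

def translate_multi_knowledge(segments, n_candidates=3):
    document = "\n\n".join(segments)

    # Knowledge 1: a short summary of the whole document.
    summary = llm_complete(
        f"Summarize this document in two sentences:\n\n{document}")

    # Knowledge 2: key entities (names, places, events) with fixed translations.
    glossary = llm_complete(
        "List the key entities (names, places, events) in this document with "
        f"their English translations, one per line:\n\n{document}")

    translations = []
    for segment in segments:
        prompt = (
            f"Document summary:\n{summary}\n\n"
            f"Entity glossary:\n{glossary}\n\n"
            "Using the summary and glossary for consistency, translate the "
            f"following paragraph into English:\n\n{segment}"
        )
        # Sample several candidates and keep the highest-scoring one.
        candidates = [llm_complete(prompt) for _ in range(n_candidates)]
        best = max(candidates, key=lambda c: quality_score(segment, c))
        translations.append(best)
    return translations
```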
“We propose an enhanced approach by incorporating multiple sources of knowledge, including both the document summarization and entity translation, to enhance the performance of LLM-based document-level MT,” the researchers said.
Experiments showed that this extra context can improve translation quality — boosting COMET scores by up to 0.8 points in some cases. “Our multi-knowledge-fusion approach significantly enhances LLM performance on document-level MT,” they said.
Emerging Research Area with Significant Potential
Both approaches underscore that effective document-level MT involves more than stitching together sentence-level outputs: it requires capturing the full narrative.
As LLMs continue to evolve, both approaches offer ways to overcome the challenges inherent in document-level MT.
In conclusion, the Huawei and Soochow University researchers noted that “the adaptation of LLMs for document-level MT is an emerging research area with significant potential.”
Authors:
University of Zurich paper — Hanxu Hu, Jannis Vamvas, and Rico Sennrich;
Huawei and Soochow University paper — Bin Liu, Xinglin Lyu, Junhui Li, Daimeng Wei, Min Zhang, Shimin Tao, and Hao Yang