In a March 14, 2025 paper, researchers from Alibaba’s MarcoPolo Team explored the translation capabilities of large reasoning models (LRMs) such as OpenAI’s o1 and o3, DeepSeek’s R1, Anthropic’s Claude 3.7 Sonnet, and xAI’s Grok 3, positioning them as “the next evolution” in translation beyond neural machine translation (NMT) and large language models (LLMs).
They explained that, unlike traditional LLMs, LRMs introduce reasoning capabilities that allow them to dynamically infer meaning beyond the text.
Referring to LRMs as “multilingual cognitive agents,” they claimed these models are “reframing” AI translation as a “dynamic reasoning task” rather than a simple text-to-text mapping.
The researchers investigated various translation scenarios and found that LRMs outperform existing LLMs in complex translation tasks such as stylized and document-level translation.
New Possibilities for Translation
They explained that LRMs first analyze the source text’s style and intent before generating a translation. This reasoning-driven approach enables them to capture stylistic nuances more effectively than LLMs.
However, it also introduces the risk of “over-localization” — where the model adapts too closely to the target language’s stylistic norms at the expense of fidelity to the source text.
Additionally, LRMs use reasoning to unify context across paragraphs, a capability that significantly improves document-level translation. The researchers noted that LRMs enhance terminology consistency, pronoun resolution, tone adaptation, and logical coherence across longer texts.
“By enabling translation systems to dynamically reason about context, culture, and intent, LRMs open up new possibilities for translation,” they said.
Promise in Multimodal Translation
Beyond text translation, LRMs also show promise in multimodal translation, integrating textual and non-textual inputs such as images, according to the researchers.
Unlike LLMs, which primarily rely on pattern recognition, LRMs infer relationships between different modalities, allowing them to gain better contextual understanding and resolve ambiguities.
However, the researchers acknowledged that challenges remain, particularly in processing highly domain-specific visual content or sign language.
Another distinguishing feature of LRMs is their self-reflection capability, which allows them to identify and correct translation errors during inference. This makes them more robust in handling noisy, incomplete, or ambiguous inputs compared to standard LLMs.
Inference Inefficiency
Although “LRMs represent a significant advancement over traditional MT systems and even LLMs,” according to the researchers, inference efficiency remains a major challenge.
Their reliance on chain-of-thought reasoning improves translation quality but increases computational overhead and latency. “This inefficiency poses a barrier to real-time applications of LRMs,” they noted.
While the study highlights LRMs as “a significant step forward” in AI translation, the researchers concluded that “their full potential has yet to be realized.”
Authors: Sinuo Liu, Chenyang Lyu, Minghao Wu, Longyue Wang, Weihua Luo, and Kaifu Zhang