In a May 19, 2025 paper, Huawei researchers proposed a new method to combine the strengths of traditional AI translation systems and large language models (LLMs), aiming to improve translation quality while reducing the latency and computational cost typically associated with LLMs.

They noted that while traditional AI translation systems and LLMs perform similarly on straightforward content, LLMs often outperform traditional systems on complex, idiomatic, or informal texts. 

Huawei’s method seeks to capitalize on this by using LLMs only when they are likely to outperform traditional AI translation systems.

They explained that previous approaches mainly use quality estimation (QE) to assess output quality and then call on an LLM when the quality is deemed insufficient. However, Huawei argued that this approach can be inefficient, as it involves running both systems for each sentence without confirming whether the LLM output is actually better.

Instead, the researchers propose a new approach that relies solely on features of the source sentence to decide whether to use traditional AI translation systems or LLMs. “Our method directly decides whether to call the LLM or NMT based on the input source text,” they said.

While they acknowledged this approach was “challenging,” they found that just two indicators — sentence complexity and translation domain — were sufficient to make a “sound decision.”


Complementary Strengths

At the core of their method is a trained classifier that analyzes each source sentence and predicts whether the LLM is likely to produce a better translation than traditional AI translation systems. This decider operates without needing to generate or compare translations beforehand. During inference, it routes each sentence to the most suitable system, using the LLM only when its output is likely superior.
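A source-side router of this kind can be sketched in a few lines. The sketch below is purely illustrative and not from the Huawei paper: the feature functions, keyword lists, and the complexity threshold are all hypothetical stand-ins for the trained classifier the researchers describe, which learns from sentence complexity and domain signals.

```python
# Illustrative sketch of source-side routing between NMT and an LLM.
# All feature definitions and thresholds here are assumptions, not the
# paper's actual classifier, which is trained rather than hand-written.

def sentence_complexity(text: str) -> float:
    """Crude complexity proxy: average word length times sentence length."""
    words = text.split()
    if not words:
        return 0.0
    avg_len = sum(len(w) for w in words) / len(words)
    return avg_len * len(words)

def detect_domain(text: str) -> str:
    """Toy domain detector based on keyword lookup (illustrative only)."""
    technical = {"api", "voltage", "firmware", "compile"}
    literary = {"moon", "heart", "whisper", "dream"}
    tokens = {w.lower().strip(".,") for w in text.split()}
    if tokens & technical:
        return "technical"
    if tokens & literary:
        return "literary"
    return "general"

def route(text: str, complexity_threshold: float = 60.0) -> str:
    """Decide which system to call, using only source-side features."""
    domain = detect_domain(text)
    if domain == "literary":
        return "llm"  # LLMs tend to win on idiomatic or literary text
    if domain == "technical":
        return "nmt"  # traditional NMT handles technical text well
    # For general text, fall back to a complexity heuristic
    return "llm" if sentence_complexity(text) > complexity_threshold else "nmt"
```

The key property mirrored here is that `route` never generates a translation: the decision is made from the input text alone, so the LLM is invoked only for the sentences routed to it.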

This selective strategy resulted in an average LLM usage rate of around 25% across Chinese–English, English–Chinese, Japanese–English, and German–English test sets, while matching or outperforming full-LLM translation quality, according to the researchers. On literary test sets, where LLMs excel, it relied more heavily on the LLM. On technical content, where traditional systems perform better, it routed most sentences to the traditional system.

The Huawei researchers emphasized that this method delivers benefits only when the two systems have complementary strengths. “If the NMT performs equally as LLM,” they said, “we see no improvement after integration.”

They also noted that the method generalized well to new domains and even worked effectively with fine-tuned models. The researchers plan to release the test sets to support further research.

Authors: Zhanglin Wu, Daimeng Wei, Xiaoyu Chen, Hengchao Shang, Jiaxin Guo, Zongyao Li, Yuanchang Luo, Jinlong Yang, Zhiqiang Rao, and Hao Yang


