Two recent research papers — one from a team at Alibaba and Xiamen University, the other from Meta AI — proposed different ways to improve simultaneous AI translation (SiMT): by adapting large language models (LLMs) for real-time use, and by extending translation systems with a lightweight streaming module.
The Alibaba and Xiamen University researchers introduced a method called EAST (Efficient and Adaptive Simultaneous Translation), which enables LLMs to handle streaming input.
To achieve this, the researchers created specially structured training data where source and target text segments are “interleaved” with special markers indicating when the model should “read” more input or “write” part of the translation. This structure helps the LLM learn to translate incrementally, without needing to see the full sentence in advance.
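The idea of interleaving source and target segments with read/write markers can be sketched as follows. This is an illustrative reconstruction, not the paper's actual data format: the marker tokens, segmentation, and the `interleave` helper are assumptions made for the example.

```python
# Hypothetical sketch of EAST-style interleaved training data.
# The <READ>/<WRITE> marker tokens and the segment boundaries are
# illustrative assumptions, not the paper's exact format.

READ, WRITE = "<READ>", "<WRITE>"

def interleave(src_segments, tgt_segments):
    """Alternate source chunks (input the model 'reads') with target
    chunks (output the model 'writes') into one training sequence,
    so the LLM learns to translate incrementally."""
    parts = []
    for src, tgt in zip(src_segments, tgt_segments):
        parts.append(f"{READ} {src}")
        parts.append(f"{WRITE} {tgt}")
    return " ".join(parts)

sample = interleave(
    ["Der Vertrag", "wurde gestern", "unterzeichnet ."],
    ["The contract", "was signed", "yesterday ."],
)
print(sample)
```

Trained on sequences like this, the model learns to emit a partial translation after each source chunk instead of waiting for the full sentence.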
“To the best of our knowledge, this is the first efficient and adaptive LLM-based SiMT method,” the researchers said.
In experiments across eight language pairs, EAST achieved state-of-the-art performance, while maintaining strong results in non-streaming (offline) scenarios. “EAST not only excels in high-quality simultaneous translation but also ensures that the offline translation capabilities are not compromised,” they noted.
The method also performed well on input longer than a single sentence, which the researchers say “highlights its suitability for streaming translation in real-world scenarios.”
The researchers also found that just 10,000 examples per language pair were “sufficient” to train the LLM effectively, suggesting that LLM-based SiMT may be feasible even in lower-resource settings.
However, they acknowledged that “the proposed method assumes an idealized setting where the input is clean and fluent.” They noted that real-world simultaneous translation often involves disfluent or noisy input — especially in speech-to-text use cases — and that EAST has not yet been tested under such conditions.
Moreover, the work focused strictly on text-to-text scenarios and does not extend to simultaneous speech translation, a separate challenge still under active research.
The team has also made code and training data publicly available to encourage further research and development.
Meta Adds Streaming to Existing Systems
Meta AI’s approach takes a different route — adapting existing translation models for real-time use without retraining them.
Their method, AliBaStr-MT (Alignment-Based Streaming Machine Translation), builds on a pre-trained translation model and adds a small module that helps the model decide when to “read” more input and when to “write” the translation.
This read/write module is trained using the model’s own attention patterns — specifically, how it aligns source and target words — to determine which input tokens are necessary for each output token. This avoids the need for manually labeled data or retraining the full model.
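A read/write policy driven by attention can be illustrated with a minimal sketch. This is not Meta's actual module: the decision rule, the threshold value, and the `read_or_write` function are assumptions; it only shows the general principle that attention mass over already-read source tokens can signal when it is safe to emit the next output token.

```python
# Illustrative sketch of an alignment-based read/write decision,
# assuming access to a translation model's attention weights over
# the source for the next target token. The threshold and the
# decision rule are assumptions for illustration.

def read_or_write(attn_weights, num_read, threshold=0.9):
    """attn_weights: one weight per source token (sums to 1.0) for
    the next target token. num_read: source tokens consumed so far.
    WRITE when enough attention mass falls on tokens already read;
    otherwise READ more input first."""
    mass_on_read = sum(attn_weights[:num_read])
    return "WRITE" if mass_on_read >= threshold else "READ"

# Attention concentrated on the two already-read tokens -> safe to WRITE
print(read_or_write([0.60, 0.35, 0.04, 0.01], num_read=2))  # WRITE
# Next token depends heavily on unread input -> READ more first
print(read_or_write([0.10, 0.20, 0.60, 0.10], num_read=2))  # READ
```

Because the policy only consumes signals the translation model already produces, it can be trained on top of a frozen model without manually labeled alignment data.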
What makes the method especially practical is that it doesn’t interfere with the core translation model, allowing teams to keep using their non-streaming models where needed, while enabling real-time capabilities with minimal extra training and computing overhead.
Both methods respond to the growing demand for real-time, low-latency translation. Alibaba’s EAST shows that LLMs can be trained for SiMT with relatively little data, while Meta’s approach demonstrates that existing systems can operate in streaming mode with minimal changes.
Authors:
Alibaba and Xiamen University paper — Biao Fu, Minpeng Liao, Kai Fan, Chengxi Li, Liang Zhang, Yidong Chen, and Xiaodong Shi
Meta AI paper — Zeeshan Ahmed, Frank Seide, Zhe Liu, Rastislav Rabatin, Jachym Kolar, Niko Moritz, Ruiming Xie, Simone Merello, and Christian Fuegen