In a June 12, 2024 paper, researchers from Tencent AI and the Harbin Institute of Technology introduced TasTe, a method for teaching large language models (LLMs) to translate through self-reflection.

The key idea is to enable LLMs to generate preliminary translations (i.e., drafts), self-evaluate their own translations, and make refinements based on the evaluation.

The researchers explained that LLMs have shown exceptional performance across various natural language processing tasks, including machine translation (MT). However, their translations still do not match the quality of supervised neural machine translation (NMT) systems.

To address this, the authors proposed the TasTe framework (translating through self-reflection), which improves the translation capabilities of LLMs by incorporating a self-reflection process. 

This process consists of two stages. In the first stage, LLMs are prompted to generate preliminary translations (i.e., drafts) while simultaneously making quality predictions for these translations. The quality predictions can be in the form of labels like “good,” “medium,” and “bad” or scores ranging from 0 to 100. This self-assessment step allows the models to evaluate the quality of their own outputs.
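The first stage can be sketched as a single prompt that asks the model for both a draft and a self-assessed quality label, whose output is then parsed. This is a minimal illustration, not the paper's actual prompt or released code; the prompt wording, the `|||` output separator, and the function names are assumptions.

```python
# Sketch of TasTe stage 1: prompt the model for a draft translation plus a
# self-assessed quality label ("good" / "medium" / "bad"), then parse the
# combined output. The prompt format here is illustrative, not the paper's.
def stage1_prompt(src: str, src_lang: str, tgt_lang: str) -> str:
    return (
        f"Translate the following {src_lang} sentence into {tgt_lang}, "
        f"then rate your own translation as good, medium, or bad.\n"
        f"Source: {src}\n"
        f"Output format: <draft> ||| <label>"
    )

def parse_stage1(output: str) -> tuple[str, str]:
    # Split on the last separator so "|||" inside the draft cannot break parsing.
    draft, label = output.rsplit("|||", 1)
    return draft.strip(), label.strip().lower()

# Example with a stubbed model response:
draft, label = parse_stage1("Das ist ein Test. ||| medium")
print(draft, label)  # -> Das ist ein Test. medium
```

In the real framework the model is fine-tuned to emit this draft-plus-assessment format, rather than relying on prompting alone.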


In the second stage, LLMs refine these preliminary translations based on the quality predictions in the first stage to produce final translations. According to Xuebo Liu, Assistant Professor at Harbin Institute of Technology, speaking to Slator, low-quality drafts with severe errors undergo extensive modifications, medium-quality drafts with minor errors receive moderate adjustments, and high-quality drafts with minimal or no errors require little to no changes. “By equipping models to tailor their modifications to the draft quality, we effectively rectify conspicuous errors and prevent the misguidance of error propagation that could otherwise compromise originally accurate translations, thereby safeguarding the overall translation quality,” he added.
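The second stage described above can be sketched as a refinement step conditioned on the stage-1 quality label: a "good" draft passes through largely untouched, while lower-rated drafts are sent back to the model for revision. The `complete` callable stands in for any LLM call; the prompt text and function names are assumptions, not the authors' implementation.

```python
# Sketch of TasTe stage 2: revise a draft in light of its self-assessed
# quality label. "good" drafts are kept as-is; "medium" and "bad" drafts
# are returned to the model for correction.
def stage2_prompt(src: str, draft: str, label: str, tgt_lang: str) -> str:
    return (
        f"The following {tgt_lang} draft translation was rated '{label}'.\n"
        f"Source: {src}\n"
        f"Draft: {draft}\n"
        f"Revise the draft into a final translation, fixing any errors."
    )

def refine(src: str, draft: str, label: str, tgt_lang: str, complete) -> str:
    if label == "good":  # minimal or no changes needed
        return draft
    return complete(stage2_prompt(src, draft, label, tgt_lang))

# Stubbed model call that "fixes" the draft, for illustration only:
final = refine("This is a test.", "Das ist Test.", "medium", "German",
               lambda prompt: "Das ist ein Test.")
print(final)  # -> Das ist ein Test.
```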

This entire process can be seen as a form of self-reflection, mirroring the common “try-evaluate-improve” approach humans use when handling complex tasks to execute them more effectively, Liu said.

Automatic Post-Editing Tool

The researchers evaluated TasTe in four language directions (German ↔ English and Chinese ↔ English) using the WMT22 benchmark. They found that TasTe outperformed existing methods by effectively utilizing the self-assessment to enhance translation quality.

Additionally, they tested if this approach could be used to evaluate translations generated by other systems and refine them as an automatic post-editing (APE) tool. They found that “TasTe can not only serve as an effective inference framework for a single LLM but also as an APE tool to enhance translations generated by other translation systems.”
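The APE use described above amounts to treating another system's output as the draft: the LLM first judges it, then revises only when the judgment warrants it. The sketch below is a hypothetical illustration of that idea; the prompts and the `complete` callable are assumptions, not the released tooling.

```python
# Sketch of the two-stage idea used as an automatic post-editing (APE) tool:
# an external MT system's output plays the role of the stage-1 draft.
def ape(src: str, mt_output: str, complete) -> str:
    # Step 1: ask the LLM to judge the external translation.
    label = complete(
        f"Rate this translation of '{src}' as good/medium/bad: {mt_output}"
    )
    if label.strip().lower() == "good":
        return mt_output  # keep a good translation untouched
    # Step 2: ask the LLM to revise it.
    return complete(f"Improve this translation of '{src}': {mt_output}")

# Stubbed model for illustration: rates drafts "medium", then revises them.
def stub(prompt: str) -> str:
    return "medium" if prompt.startswith("Rate") else "improved translation"

print(ape("Hello.", "Hallo Welt.", stub))  # -> improved translation
```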

The authors provide their code and datasets for further research on GitHub.

Authors: Yutong Wang, Jiali Zeng, Xuebo Liu, Fandong Meng, Jie Zhou, Min Zhang


