
In a September 17, 2024 paper, researchers from Sony and Carnegie Mellon University presented a method that allows speech translation (ST) systems to support more language pairs without the need for expensive and time-consuming retraining on new datasets.

Traditionally, ST systems can only translate language pairs they have been specifically trained on, the researchers explained. When it comes to adding new languages, the standard approach requires retraining the model using both existing and new datasets.

The researchers proposed a solution that “merges” a model trained on the new language pairs with an existing one, eliminating the need for retraining. Model merging directly manipulates the parameters of pre-trained models to create new models capable of performing additional tasks without extensive additional training.

At the heart of this approach is the concept of representing each language as a “task vector” — the set of parameter changes produced by fine-tuning the model on a specific language. By combining these task vectors arithmetically, the researchers can mix and match capabilities from different pre-trained models, effectively merging knowledge from the existing model and the newly trained one and significantly expanding the system’s language coverage.
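As a rough illustration of the idea, here is a minimal sketch of task-vector arithmetic over PyTorch-style state dicts. The function names, the scaling factor alpha, and the usage shown in the comments are illustrative assumptions, not the authors’ actual implementation.

import torch

def task_vector(finetuned, base):
    # A task vector is the element-wise difference between a model
    # fine-tuned on one language pair and the shared base model.
    return {name: finetuned[name] - base[name] for name in base}

def merge(base, task_vectors, alpha=1.0):
    # Adding scaled task vectors back onto the base parameters yields
    # a single model carrying the capabilities of each source model.
    merged = {name: param.clone() for name, param in base.items()}
    for vector in task_vectors:
        for name in merged:
            merged[name] += alpha * vector[name]
    return merged

# Hypothetical usage: fold a newly trained en->ja ST model into an
# existing en->de one without retraining on the combined data.
# base_sd = base_model.state_dict()
# v_de = task_vector(st_en_de.state_dict(), base_sd)
# v_ja = task_vector(st_en_ja.state_dict(), base_sd)
# merged_model.load_state_dict(merge(base_sd, [v_de, v_ja], alpha=0.5))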

The researchers emphasized that by training an ST model solely on the new language pair and integrating it with an existing pre-trained ST model, they can substantially reduce training costs.


However, they also noted a challenge: applied directly, this method can lead to language confusion, where the system translates into the wrong target language. To address this, the researchers introduced a language control (LC) model, which ensures that the merged system follows instructions and translates into the correct target language.
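To make “language confusion” concrete: a simple way to quantify it is to run a language identifier over the system’s outputs and count how often the detected language differs from the requested target. The sketch below assumes a hypothetical identify_language helper (any off-the-shelf language-ID tool could fill that role); it is an evaluation aid, not the paper’s LC model.

def language_confusion_rate(outputs, target_lang, identify_language):
    # Fraction of translations that came out in the wrong language.
    # identify_language is assumed to map a string to a language code
    # such as "de" or "ja".
    wrong = sum(1 for text in outputs if identify_language(text) != target_lang)
    return wrong / len(outputs)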

Unlocking Unsupported Languages

The researchers tested their method on two popular datasets, MuST-C and CoVoST-2, and reported notable performance improvements, with BLEU score gains of up to 4.66 on MuST-C and 4.92 on CoVoST-2.

Additionally, the researchers showed that their method can add new language pairs even when no ST training data or pre-trained ST models exist for those languages. They first build a new ST system using knowledge from existing machine translation (MT) systems, then merge it with an existing ST model, enabling support for previously unsupported languages.

Authors: Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, and Shinji Watanabe
