In May 2024, researchers emphasized the crucial role that emotions play in human communication and introduced a new dataset designed to enhance speech-to-text and speech-to-speech translation by integrating emotional context into the translation process.

In July 2024, Alibaba incorporated speech emotion recognition (SER) into its FunAudioLLM to retain original emotions in AI-powered interpreting.

Building on this, an August 6, 2024, paper by Charles Brazier and Jean-Luc Rouas from the University of Bordeaux demonstrated how to integrate emotional context into large language models (LLMs) to condition translation and improve quality.

They argue that “conditioning the translation with a specific emotion would use a suitable vocabulary in the translation.”

This research builds on the authors’ previous work, which was the first to explore combining machine translation (MT) models with emotion information. Their earlier study demonstrated that adding emotion-related data to input sentences could enhance translation quality. In this latest study, Brazier and Rouas take the concept further by replacing the MT model used in their prior work with a fine-tuned LLM.

They introduced a pipeline in which emotional dimensions (arousal, dominance, and valence) are embedded into LLM prompts. They utilized a SER model to extract these dimensions from audio recordings, which were then incorporated into the LLM’s input prompts to guide the translation process.
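The paper’s exact prompt template is not reproduced in the article, so the wording below is an illustrative assumption. As a minimal sketch, SER-derived dimension scores could be prepended to the translation instruction like this:

```python
# Minimal sketch of an emotion-conditioned translation prompt. The template
# wording is an assumption; the dimension scores would come from a SER model
# run on the sentence's audio (the values below are made up).

def build_prompt(source: str, arousal: float, dominance: float, valence: float) -> str:
    """Embed SER-estimated emotional dimensions into an LLM translation prompt."""
    return (
        "Translate the following English sentence into French. "
        f"Emotional context: arousal={arousal:.2f}, "
        f"dominance={dominance:.2f}, valence={valence:.2f}.\n"
        f"English: {source}\nFrench:"
    )

print(build_prompt("I can't believe we won!", arousal=0.82, dominance=0.55, valence=0.91))
```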


Notable Improvements

To test this approach, they fine-tuned five LLMs for English-to-French translation and identified the best-performing model, Unbabel’s TowerBase-7B-v0.1, for further experimentation. For each input sentence, the SER model analyzed the corresponding audio to automatically estimate its emotional dimensions, which were then included in the translation prompts.
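The article does not detail the authors’ fine-tuning or decoding setup. As a rough illustration, here is a sketch of querying the publicly available TowerBase-7B-v0.1 checkpoint with and without emotion information in the prompt; the prompt wording, generation settings, and dimension scores are placeholder assumptions, not the authors’ exact configuration:

```python
# Rough sketch: prompting Unbabel/TowerBase-7B-v0.1 (Hugging Face) with and
# without emotion information. Prompt template, decoding settings, and
# dimension scores are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Unbabel/TowerBase-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def translate(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    # Strip the prompt tokens and return only the generated continuation.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

plain = ("Translate the following English sentence into French.\n"
         "English: I can't believe we won!\nFrench:")
with_emotion = ("Translate the following English sentence into French. "
                "Emotional context: arousal=0.82, dominance=0.55, valence=0.91.\n"
                "English: I can't believe we won!\nFrench:")

print(translate(plain))
print(translate(with_emotion))
```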

Brazier and Rouas compared translation performance with and without the inclusion of emotional dimensions as extra information added to each input prompt.

According to the authors, integrating emotional data into the translation process resulted in “notable improvements” in BLEU and COMET scores over the baseline without emotion integration, especially when arousal was considered.
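The article does not specify which BLEU and COMET variants the authors used. Assuming the standard sacrebleu and Unbabel COMET packages, a system comparison could be scored as follows (the sentences below are placeholder data, not from the study):

```python
# Sketch of scoring a system's outputs with BLEU (sacrebleu) and COMET
# (Unbabel's comet package). The data is a made-up placeholder; in the
# study, both systems' outputs would be scored and compared.
import sacrebleu
from comet import download_model, load_from_checkpoint

sources    = ["I can't believe we won!"]
references = ["Je n'arrive pas à croire que nous ayons gagné !"]
hypotheses = ["Je ne peux pas croire que nous avons gagné !"]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")

comet_model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
data = [{"src": s, "mt": h, "ref": r}
        for s, h, r in zip(sources, hypotheses, references)]
print(f"COMET: {comet_model.predict(data, batch_size=8, gpus=0).system_score:.3f}")
```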

TowerBase-7B-v0.1 showed the largest performance gains when emotional context was included, suggesting that this information can lead to more accurate and contextually appropriate translations, particularly in scenarios where emotion plays a central role.

“Incorporating emotion information into the translation process appears to enhance translation quality,” said Brazier and Rouas. They also plan to extend their method to speech translation.


