In a July 29, 2024 paper, researchers from Apple and the University of Southern California introduced a new approach to addressing gender bias in machine translation (MT) systems.

As the researchers explained, traditional MT systems often default to the gender forms that are statistically most prevalent in the training data, which can lead to translations that misrepresent the intended meaning and reinforce societal stereotypes. Context sometimes indicates the appropriate gender, but many sentences lack sufficient contextual clues, which leads to incorrect gender assignments in translation, they added.

To tackle this issue, the researchers developed a method that identifies gender ambiguity in source texts and offers multiple translation alternatives, covering every combination of masculine and feminine forms for the ambiguous entities.

“Our work advocates and proposes a solution for enabling users to choose from all equally correct translation alternatives,” the researchers said.

For instance, the sentence “The secretary was angry with the boss” contains two entities, secretary and boss, and could yield four grammatically correct translations in Spanish, depending on the gender assigned to each role.
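The arithmetic behind that example can be sketched in a few lines of Python: each ambiguous entity can be masculine or feminine, so two entities give 2 × 2 = 4 combinations. This is only an illustration, not the researchers' code. The `FORMS` table, the function names, and the Spanish renderings are hand-written assumptions rather than output from the paper's system.

```python
from itertools import product

# Hypothetical, hand-written Spanish fragments for each ambiguous entity.
# "M" = masculine, "F" = feminine. Not taken from the paper or its datasets.
FORMS = {
    "secretary": {"M": "El secretario estaba enojado", "F": "La secretaria estaba enojada"},
    "boss": {"M": "con el jefe", "F": "con la jefa"},
}


def gender_combinations(entities):
    """Return every masculine/feminine assignment for the given entities."""
    return [dict(zip(entities, genders)) for genders in product("MF", repeat=len(entities))]


def render(assignment):
    """Build one Spanish sentence from a gender assignment."""
    subject = FORMS["secretary"][assignment["secretary"]]
    obj = FORMS["boss"][assignment["boss"]]
    return f"{subject} {obj}."


combos = gender_combinations(["secretary", "boss"])
translations = [render(c) for c in combos]

for sentence in translations:
    print(sentence)
```

With n ambiguous entities the number of alternatives grows as 2 to the power n, which is one reason a per-entity choice is easier to present than a flat list of whole sentences.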

The researchers emphasized that offering multiple translation alternatives that reflect all valid gender choices is a “reasonable approach.”

Unlike existing methods that operate at the sentence level, this new approach functions at the entity level, allowing for a more nuanced handling of gender-specific references. 

The process begins by analyzing the source sentence to identify entities (such as nouns or pronouns) with ambiguous gender references. Once these are identified, two separate translations are created: one using masculine forms and the other using feminine forms. The final step integrates these translations into a single output that maintains the grammatical integrity of the target language.
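To make the final merging step more concrete, here is a minimal sketch that assumes the masculine and feminine translations are already available as plain strings. It uses a token-level diff from Python's standard `difflib` module as a stand-in. This is a swapped-in technique chosen for illustration, not the merging procedure from the paper, and the function name `merge_alternatives` is hypothetical.

```python
from difflib import SequenceMatcher


def merge_alternatives(masculine: str, feminine: str) -> list:
    """Merge two translations into one sequence.

    Shared tokens are kept as plain strings. Spans that differ become
    (masculine_span, feminine_span) tuples that a user interface could
    present as a choice.
    """
    m_tokens, f_tokens = masculine.split(), feminine.split()
    merged = []
    matcher = SequenceMatcher(a=m_tokens, b=f_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Tokens that do not depend on gender are copied once.
            merged.extend(m_tokens[i1:i2])
        else:
            # Gender-dependent span: keep both variants side by side.
            merged.append((" ".join(m_tokens[i1:i2]), " ".join(f_tokens[j1:j2])))
    return merged


# Hand-written example translations (assumptions, not system output).
result = merge_alternatives(
    "El secretario estaba enojado con el jefe.",
    "La secretaria estaba enojada con la jefa.",
)
print(result)
```

A plain diff like this only finds where the two translations differ. It does not know which differing span belongs to which source entity (here, that “enojado/enojada” follows the secretary's gender), so a real entity-level system would need additional information linking spans to entities.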

To generate these translations, fine-tuned MT models or large language models (LLMs) can be employed.

Seamless Integration with MT Models

The researchers highlighted that, when combined with a proper user interface, their approach allows translators to select the correct gender for each entity. “Our key technical contribution is a novel semi-supervised solution for generating alternatives that integrates seamlessly with standard MT models,” they explained.

This solution not only facilitates new translation interfaces with precise gender control but also aids human translators by automatically identifying ambiguities and suggesting alternative translations, they added.

To encourage further research, the researchers open-sourced training and test datasets for five language pairs: English > German, Spanish, French, Portuguese, Russian, and Italian.

Looking ahead, they plan to explore other genderless source languages, such as Chinese, Korean, and Japanese, and the unique challenges they present. They also aim to extend their approach to include non-binary and gender-neutral forms.

Authors: Sarthak Garg, Mozhdeh Gheini, Clara Emmanuel, Tatiana Likhomanenko, Qin Gao, and Matthias Paulik


