Since its first conference in 1994, the Association for Machine Translation in the Americas (AMTA), the North American branch of the International Association for Machine Translation, has aimed to bring together machine translation (MT) researchers and buyers to talk shop and explore the challenges and opportunities of the field.

In October 2024, AMTA hosted its biennial conference in Chicago, drawing more than 100 in-person attendees (including Slator) plus over 100 virtual participants for what organizers billed as a special occasion.

“This is a propitious year for a conference on MT and related cross-lingual technologies, as we will be marking the 30th anniversary of the first AMTA conference and the 70th anniversary of the first public demonstration of machine translation,” AMTA shared on a conference webpage.

The celebratory mood seemed fitting, given that over the past two years MT has enjoyed a more general spotlight, beyond the R&D labs and specialized startups where the technology has historically gotten its due.

In a somewhat ironic twist, large language models (LLMs), such as OpenAI’s ChatGPT, have propelled MT into the mainstream spotlight once again. AMTA leaned into any possible tension throughout the conference, with multiple sessions weighing the advantages and disadvantages of dedicated MT models against LLMs.

“Even if we now see a push toward LLMs we shouldn’t be too scared about it,” keynote speaker Philipp Koehn, a pioneer in the MT world, reassured attendees during a panel discussion. “We’re the ones who know the problem [of MT] and know how to work on the data and on models.”

The Presidents Panel discussion, moderated by current AMTA President Jay Marciano, featured Koehn and past AMTA presidents Alon Lavie, now VP of AI Research at Phrase, and Steve Richardson, a computer science professor at Brigham Young University. Perhaps unsurprisingly, all disagreed with the proposition that “[a]utomated translation is a solved problem.”

“It’s kind of abstract and naive. What do you mean by solved?” asked Lavie, who served as AMTA President from 2008-2012. “We’re going to see the focus go from generating acceptable, understandable translations to generating the desired translation over the next few years — a major shift.”

“It’s really task-specific. Big companies translate hundreds of billions of words every day, but there’s all kinds of applications for MT. For some it’s OK, it does the job,” said Richardson, AMTA President from 2018-2022, adding that some major systems, such as Google, translate up to almost 200 languages, but “that’s just a drop in the bucket.”

The Elephant in the Room?

Kirti R. Vashee tempered expectations of LLMs saving the day for all of MT’s challenges, suggesting that 2023 was a year of overhype followed by disappointment, and citing a Gartner prediction that 30% of GenAI projects will be abandoned after proof of concept (POC) by the end of 2025.

Some of the greatest obstacles to the widespread adoption of LLMs for MT include cost, a scarcity of high-quality data for AI training and improvement, and a severe shortage of human expertise with the technology.

In Vashee’s view, the emerging LLM future will require smaller, easy-to-adapt models in order to reach a critical mass; some countries, such as the UAE, India, and South Africa, are already developing such models. Much work remains to be done, however, to address latency, size, and cost issues more generally.

Intento’s Daria Sinitsyna presented an LLM-powered automatic assessment of MT quality, with the goal of signing off on a “perfect” automatic translation that requires no human touch. Her group found that zero-shot prompting was the fastest and least expensive approach, comparable in quality to chain-of-thought (CoT) prompting, with a reviewer agent proving key to achieving higher-quality results. GPT-4o showed a good balance between raising fewer false alarms and identifying important issues.
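As a rough, hypothetical illustration of the pattern Sinitsyna described (not Intento’s actual pipeline), the sketch below runs a zero-shot quality check and then a reviewer pass over its output. It assumes the OpenAI Python client and the GPT-4o model mentioned above; the prompts and function names are invented for the example.

```python
# Hypothetical sketch, not Intento's pipeline: zero-shot MT quality check
# plus a "reviewer" pass that filters false alarms. Assumes the OpenAI
# Python client and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def zero_shot_assessment(source: str, translation: str, model: str = "gpt-4o") -> str:
    """Single zero-shot prompt: flag errors or declare the translation perfect."""
    prompt = (
        "You are an MT quality evaluator. List any accuracy, omission, or fluency "
        f"errors, or reply PERFECT.\n\nSource: {source}\nTranslation: {translation}"
    )
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

def reviewer_pass(source: str, translation: str, assessment: str, model: str = "gpt-4o") -> str:
    """Second call that double-checks the first verdict before sign-off."""
    prompt = (
        "Review this MT quality assessment. Drop false alarms, keep only real issues, "
        f"or confirm PERFECT.\n\nSource: {source}\nTranslation: {translation}\n"
        f"Assessment: {assessment}"
    )
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content
```

In this pattern, only segments that survive both passes with a PERFECT verdict would be signed off without human review.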

In his keynote, Koehn addressed, more generally, the question on the minds of many attendees (and observers): Is it better to build an MT model from scratch or use an LLM because “they can do everything”?

LLMs are a tempting option thanks to the massive amounts of training data they are built on, which include document-level text, compared with the sentence-level data at the heart of most MT models. According to Koehn, however, accidental parallel data (i.e., bilingual text) in the training mix is what makes LLMs capable of translation; training an LLM without parallel data would be, he mused, an “interesting effort.”

Beyond adapting LLMs to produce MT, Koehn noted several methods to overcome the high costs associated with LLMs. These include storing data in smaller chunks, which uses less memory and thus requires less expensive graphics cards, and knowledge distillation, in which a smaller “student” model learns from a “teacher” LLM.
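Koehn described distillation only in broad strokes; purely as a generic illustration of the idea (not his specific recipe), the sketch below shows the standard soft-target distillation loss in PyTorch, in which a small “student” model is trained to match the softened output distribution of a larger “teacher.” The temperature value and tensor shapes are arbitrary.

```python
# Generic knowledge-distillation loss (soft targets), not Koehn's specific method.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the student's and the teacher's softened
    output distributions, scaled by T^2 (Hinton et al.'s formulation)."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy example: a batch of 4 positions over a 10-token vocabulary.
teacher_logits = torch.randn(4, 10)                      # frozen teacher output
student_logits = torch.randn(4, 10, requires_grad=True)  # trainable student output
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```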

Another limit thus far has been the imbalance of English training data relative to all other languages, coupled with English-only benchmarks, stunting LLMs’ language coverage. Some of Koehn’s most recent work has explored “unlearning” as a method for preventing harmful content generation in languages other than English, a vulnerability that appears to stem from the dearth of training data in those languages.

Human Role in Domain and Content Expertise

For a conference whose target audience is not individual translators, human linguists and subject matter experts figured into many presentations.

“I think humans feel they will be most helpful as experts in domain or content,” Lavie opined during the Presidents Panel, in response to a question about the future of humans in the translation workflow. “Whoever is paying them is not going to be able to trust AI systems to do the right thing 100% of the time, but […] at least in environments where we’re processing a significant amount of content, AI will do 90% and humans will have to provide expertise for the remaining 10%.”

Alan K. Melby proposed standardized labels for translation output, defined by one key distinction: whether or not the content has been verified by a qualified professional translator (emphasis on qualified professional).

The output of human translators who are not qualified is no longer clearly distinguishable from raw MT, Melby explained, adding that “a few years ago” the distinction might have been much clearer. Today, by contrast, just saying a human was involved does not necessarily help.

On the other hand, Michel Simard, of the National Research Council of Canada, made the case for labeling MT output as AI-generated content.

His argument: for more than 60 years, MT has been at the center of AI research, and it continues to fit most standard definitions of AI. MT techniques and, increasingly, MT systems are the same as those behind other AI applications (for example, text generation apps).

Simard agreed, however, that content reviewed by a qualified professional should be labeled as such, adding that a label could also be modified to state that “some” content may have been AI-translated or AI-generated if it includes a mix of raw MT output, pure human translation, and/or machine translation post-editing (MTPE).
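As a purely hypothetical illustration of how such a label might be carried as structured metadata (this is not a published standard; the field names and values are invented for the example):

```python
# Hypothetical translation-provenance label; field names are illustrative only.
from dataclasses import dataclass
from typing import Literal

@dataclass
class TranslationLabel:
    workflow: Literal["raw_mt", "mtpe", "human", "mixed"]  # how the text was produced
    verified_by_qualified_professional: bool                # Melby's key distinction
    ai_involvement: Literal["none", "some", "all"]          # Simard's disclosure point

# Example: mixed content, partly AI-translated, not yet professionally verified.
label = TranslationLabel(
    workflow="mixed",
    verified_by_qualified_professional=False,
    ai_involvement="some",
)
```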

Konstantine Boukhvalov, Director of ManpowerGroup Public Sector, pointed out during his presentation on human language technology for the public sector that rather than pitting humans against a tool (e.g., GenAI), it makes more sense to compare humans using a tool with humans working without it.

His business has augmented workflows for public sector clients so that GenAI does the heavy lifting (e.g., MTPE or transcription), followed by a human polishing the final product (e.g., by fact-checking). Often, the modified human role requires technical training, including a basic understanding of GenAI and prompt engineering. Based on his own experience, Boukhvalov does not expect these augmented workflows to result in a reduced workforce.
