Large language models (LLMs) are quickly replacing neural machine translation (NMT) models, as Unbabel CTO João Graça noted in a recent Slator podcast, but in certain niche fields NMT is holding out.

Following a December 2023 study from Logrus Global, Ocean Translations, and the University of Manchester, which found that fine-tuning small-sized language models in the clinical domain produces significantly better translations than LLMs, a new study was published on July 26, 2024.

In this latest study, Bunyamin Keles, Murat Gunay, and Serdar Caglar from AI Amplified, an AI research and infrastructure company specializing in training AI models, further explored the power of tailored NMT models in medical translation. Specifically, the AI Amplified team developed small NMT models tailored for medical texts, using the MarianMT base model.

Diverging from the December 2023 study, they incorporated LLMs in the loop to create synthetic training data. “We’ve observed that LLMs are particularly effective at generating synthetic data, which has proven invaluable for training our models,” Murat Gunay told Slator. Their models were trained on both synthetic and real medical data sourced from scientific articles, clinical documents, and other medical texts, and are available for six language pairs, each combining English with German, Turkish, French, Romanian, Spanish, or Portuguese.

The authors argue that their LLMs-in-the-loop approach, combined with fine-tuning on high-quality, domain-specific data, enables these specialized NMT models to outperform general-purpose models and even some of the leading LLMs.

They pointed out that models with more parameters do not necessarily yield better quality scores, stressing that the quality of the data and the fine-tuning process are often more important than model size alone. “LLMs may not necessarily be better [than NMT], and […] the quality of the data set and training is also essential,” they emphasized.

Small Specialized Models Outperform LLMs

The authors compared the translation quality of their models against Google Translate, DeepL, and GPT-4-Turbo across all language pairs. For the English-to-German medical translation model, they extended the comparison to include Claude-3.

Their models outperformed Google Translate, DeepL, and GPT-4-Turbo across multiple automatic evaluation metrics, including BLEU, METEOR, ROUGE, and BERTScore, as well as in evaluations by ChatGPT and Claude AI acting as “impartial judges.” They opted for automatic and LLM-based evaluations over human evaluation “to mitigate time and cost constraints” while still obtaining “valuable insights into translation quality.”

“Analysis […] demonstrates that our models achieve highly satisfactory and statistically significant results,” they said, though they remain committed to continually improving their datasets and models to achieve even higher performance scores.

The authors also highlighted the need for “more shared open-source benchmark test data” and, in an effort to standardize evaluation in this domain, introduced a new medical translation test dataset.

Their models are available for testing on their website, where users can explore demo translations and witness the models’ capabilities firsthand.


Zero-Error Medical Translation

The authors’ primary objective was to achieve “zero-error translation of medical texts,” recognizing the risks that mistranslations can pose in healthcare settings. “A mistranslation between patient and physician can jeopardize patient safety,” they said.

Despite the availability of some medical translation models in various languages, they pointed out that “there is still a great need for medical text translation models,” given the “continued demand for high-end translation services” in the medical field.

They also stressed that medical translation is “crucial” for bridging communication gaps, underscoring the “indispensable” role of machine translation in healthcare.

Designed for use by healthcare professionals and various stakeholders, these models aim to “significantly contribute to the global health community,” paving the way for “improved knowledge dissemination and better healthcare outcomes.”

“This research […] paves the way for future healthcare-related AI developments,” the authors concluded.
