Google, Unbabel Expand Key AI Translation Benchmark to 55 Languages with WMT24++ – slator.com

Researchers from Google and Unbabel have unveiled WMT24++, a major expansion of the WMT24 machine translation (MT) benchmark that extends its coverage from 9 to 55 languages and dialects. The dataset now includes human-written reference translations and post-edits for 46 additional languages, as well as new post-edits of the references for 8 of the original […]
Google Expands Low-Resource AI Translation with SMOL Dataset – slator.com

On February 17, 2025, Google released SMOL (Set of Maximal Overall Leverage), a professionally translated dataset aimed at improving machine translation (MT) for 115 low-resource languages (LRLs). SMOL consists of two components: SMOLSENT, a collection of 863 English sentences translated into 81 languages, and SMOLDOC, a dataset of 584 English documents translated into […]
Meta’s BOUQuET Benchmark Brings Linguistic Diversity to AI Translation Evaluation – slator.com

On February 6, 2025, Meta unveiled BOUQuET, a comprehensive dataset and benchmarking initiative aimed at improving multilingual machine translation (MT) evaluation. This development aligns with Meta’s ongoing efforts to source diverse AI translation data through collaborative partnerships. The researchers noted that existing datasets and benchmarks often fall short due to their English-centric focus, narrow range […]