The landscape of large language models (LLMs) is changing fast. The pace of releases is accelerating, along with performance and capability advances. 

Models are becoming more multimodal and are covering a wider range of languages.

The Slator 2024 Language Industry Market Report examines how quickly LLMs are changing and considers the implications for language industry players that build applications on top of these “base” models.

The report includes a one-page timeline of key LLM releases from 2021 to 2024. 

In this article, we home in on the last six months with a recap of ten notable LLM releases that have moved the needle in terms of language AI.

GPT-4o from OpenAI — May 2024

OpenAI describes the newest version of its large language model as a “step towards much more natural human-computer interaction.”

GPT-4o accepts any combination of text, audio, image, and video as input and generates any combination of text, audio, and image outputs. Its response time to audio inputs — reportedly as low as 232 milliseconds — makes it relevant for low-latency applications like live captioning and real-time speech-to-speech. Furthermore, it is already being integrated as the LLM of choice in translation management platforms like Phrase.

OpenAI also cites a range of new and improved capabilities, such as the interpretation of emotions through facial expressions. See Slator’s coverage of the original release here.

Llama 3 from Meta — April 2024

You have likely already encountered Meta’s Llama 3 in the wild. The model — described by Meta as “the most capable openly available LLM to date” — is behind the Meta AI assistant that is now embedded in Facebook, Instagram, and WhatsApp.

Llama 3 is optimized for dialogue use cases (i.e., AI assistants) and is English-centric. Meta calls non-English use cases “out-of-scope” but does, however, allow for fine-tuning on languages beyond English if this refinement falls within the terms of its license agreement.

A research paper for Llama 3 has been promised for “the coming months.”

Gemma from Google — February 2024

Gemma is — compared to its predecessor, Google Gemini — quite small. The model’s weights are available in two sizes: 2B and 7B.

Lightweight models are faster, cheaper, and easier to use. Google hopes this, along with making Gemma open, will encourage researchers and developers to build more on the company’s AI models.

Gemma shares technical and infrastructure components with Gemini and Google states that Gemma surpasses significantly larger models on key benchmarks. Potential applications include “conversational translators” and “multilingual writing assistants.”


Slator 2024 Language Industry Market Report — Language AI Edition

The 140-page flagship report features in-depth market analysis, language AI opportunities, survey results, and much more.

Gemini v.1.5 from Google — February 2024

Gemini is Google’s “largest and most capable AI model widely available today.” Originally launched in December 2023, Gemini was Google’s attempt to regain the upper hand in AI, a year on from the launch of ChatGPT by Microsoft-backed OpenAI.

The original release claimed “state-of-the-art performance on a range of multimodal benchmarks,” including automatic speech recognition (ASR) and automatic speech translation.

The 1.5 version offers a larger context window (useful, for example, for achieving more contextually relevant machine translation) and has “enhanced performance” on tasks such as analyzing, classifying, and summarizing large amounts of content.

Aya from Cohere — February 2024

A number of specialized AI startups have moved into the LLM development space alongside OpenAI and big tech. These include AI labs such as Anthropic, Stability AI, and Mistral AI, as well as AI platform Cohere.

Cohere’s Aya model aims to extend AI capabilities beyond English to achieve massive multilinguality. The model is an open-source, massively multilingual LLM covering 101 different languages. 

Marzieh Fadaee, Senior Research Scientist at Cohere, told SlatorCon Remote in March 2024 that, with Aya, Cohere has also created one of the largest datasets for instruction fine-tuning of multilingual models. It is, Fadaee said, a resource that is “particularly valuable for languages with limited representation.”

CroissantLLM from Unbabel — February 2024

CroissantLLM is a French-English LLM from CentraleSupélec, Carnegie Mellon University, and Unbabel.

The open-source model was developed to address the lack of models where English is not the dominant training language. “Our end goal is to have a model less skewed towards English performance or cultural biases,” the researchers said.

CroissantLLM is designed to be very lightweight, with the goal of encouraging widespread adoption and a reduction of cost and deployment challenges. 

Mixtral 8x7B from Mistral AI — December 2023

Another open-source model, Mixtral 8x7B handles English, French, Italian, German, and Spanish. 

On release, Mistral reported that Mixtral 8x7B was the “best model overall regarding cost / performance trade-offs” and said it exceeded GPT-3.5 on most benchmarks.

In December 2023, researchers from the ADAPT Centre used Mistral 7B (an earlier model) to show how fine-tuning can enhance the real-time, adaptive machine translation capabilities of a general-purpose LLM.

Translatotron 3 from Google — December 2023

Translatotron 3 is another step forward in the fast-moving field of direct speech-to-speech translation. Google calls it the “first fully unsupervised end-to-end model for direct speech-to-speech translation.”

The third version of Translatotron improves on previous versions in a few ways, most notably in its unsupervised S2ST architecture. 

According to a post from Google researchers, “this method opens the door not only to translation between more language pairs but also towards translation of the non-textual speech attributes such as pauses, speaking rates, and speaker identity.”

The system was also able to “learn” direct speech-to-speech translation from monolingual data alone.

SeamlessM4T v2 from Meta — November 2023

Exemplifying the current trend toward more multimodal models, SeamlessM4T is a suite of AI models for both speech and text translation. The model can convert across modes — speech into text, speech into speech, and text into speech — and across languages, with text-to-text translation for up to 100 languages.

On release, Meta Product Lead Jeff Wang said on X, “We just made speech translation a whole lot better!”

SeamlessExpressive from Meta — November 2023

SeamlessExpressive (a component of SeamlessM4T v2) offers speech-to-speech translation in English, Spanish, German, French, Italian, and Chinese. The original speaker’s pitch, pace and tone are retained in the translated speech.

A further, novel approach to speech mapping was put forward in early June 2024 by Meta AI researchers. A new model — SeamlessExpressiveLM — was instructed to work in sequence, first translating semantic content, and then transferring the speaker’s vocal style. This sequential approach was evaluated in Spanish and Hungarian into English speech translations, with measurable vocal style improvements.

For a more in-depth analysis of the changing AI model landscape in 2024 and its implications for players in the language industry, obtain a copy of Slator’s 2024 Language Industry Market Report — Language AI Edition.
