Large language models (LLMs) have changed the game for machine translation (MT). LLMs vary in architecture, ranging from decoder-only designs to encoder-decoder frameworks.
Encoder-decoder models, such as Google’s T5 and Meta’s BART, consist of two distinct components: an encoder and a decoder. The encoder processes the input (e.g., a sentence or document) and transforms it into numerical representations that capture the meaning of the words and the relationships between them.
This transformation is important because it allows the model to “understand” the input. The decoder then uses the encoder’s representations to generate an output, such as a translation of the input sentence into another language or a summary of a document.
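As a minimal sketch of this flow, the snippet below runs a small, openly available encoder-decoder checkpoint (t5-small) through Hugging Face Transformers; the checkpoint and prompt are illustrative choices, not taken from any study discussed here.

```python
# Minimal encoder-decoder sketch using the openly available t5-small checkpoint
# via Hugging Face Transformers (illustrative; not a model from the studies below).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The encoder turns the input sentence into numerical representations.
inputs = tokenizer(
    "translate English to German: The weather is nice today.",
    return_tensors="pt",
)

# The decoder generates the translation token by token, attending to the
# encoder's representations at every step.
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```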
As Sebastian Raschka, an ML and AI researcher, explained, encoder-decoder models “are particularly good at tasks where there is a complex mapping between the input and output sequences and where it is crucial to capture the relationships between the elements in both sequences” — such as translating from one language to another or summarizing long texts.
In contrast, decoder-only models, like OpenAI’s GPT family, Google’s PaLM, or Meta’s Llama, consist solely of a decoder component. These models generate output by predicting the next word or character in a sequence from the words or characters that came before it, without the need for a separate encoding step.
While they may not capture complex input structures or relationships as well as encoder-decoder models, they are highly capable of generating fluent text. This makes them particularly good at text generation tasks — like completing a sentence or generating a story based on a prompt.
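A comparable sketch for the decoder-only case, using the openly available GPT-2 checkpoint as a stand-in (again an illustrative choice, not a model from the studies discussed here), shows the prompt going straight to the decoder, which predicts one token at a time:

```python
# Minimal decoder-only sketch using the openly available GPT-2 checkpoint
# as a stand-in (illustrative; not a model from the studies discussed here).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# No separate encoder: the prompt is fed directly to the decoder stack, which
# predicts the next token conditioned on everything that came before it.
inputs = tokenizer("Once upon a time, in a quiet village,", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```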
Strengths and Weaknesses
Researchers have explored the strengths and weaknesses of these architectures. A study published on September 12, 2024, evaluated encoder-decoder and decoder-only models in multilingual MT tasks, focusing on Indian regional languages such as Telugu, Tamil, and Malayalam. In this study, mT5, known for its “robust multilingual capability”, was used as the encoder-decoder example, while Llama 2 served as the decoder-only counterpart.
The results showed that encoder-decoder models generally outperformed their decoder-only counterparts in translation quality and contextual understanding. However, decoder-only models demonstrated significant advantages in computational efficiency and fluency.
This led the researchers to conclude that both architectures have distinct strengths, offering insights into how different model types perform in the evolving landscape of MT.
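The study’s evaluation pipeline is not reproduced here; as an illustration of how such quality comparisons are typically scored, corpus-level BLEU via the sacrebleu library looks like this (the example sentences are invented):

```python
# Illustration only: invented sentences, not data or scripts from the study.
# Translation quality comparisons are commonly scored with corpus-level BLEU.
import sacrebleu

references = [[
    "The cat is sitting on the mat.",
    "He arrived late yesterday evening.",
]]
system_a = ["The cat sits on the mat.", "He arrived late yesterday evening."]
system_b = ["A cat on mat is.", "Yesterday evening he arriving was late."]

print("System A BLEU:", round(sacrebleu.corpus_bleu(system_a, references).score, 2))
print("System B BLEU:", round(sacrebleu.corpus_bleu(system_b, references).score, 2))
```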
The study’s primary goal was “to advance the field of machine translation, contributing valuable insights into the effectiveness of different model architectures,” according to the researchers.
Yet, other studies suggest that decoder-only models, when properly fine-tuned, can match or even surpass state-of-the-art encoder-decoder systems.
Research from 2023 and 2024 highlighted the advantages of the decoder-only structure over the encoder-decoder one. Researchers pointed out that without a separate encoder, decoder-only models are easier to train, since they can efficiently process large datasets by directly concatenating documents. Additionally, their unsupervised pre-training approach allows them to leverage readily available monolingual training data, unlike encoder-decoder models, which require paired (source-target) text inputs.
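To make the contrast concrete, here is a toy sketch (assumed data, not code from either paper) of how training data is typically assembled for each setup: decoder-only models can simply concatenate raw documents into one stream, while encoder-decoder MT training needs aligned source-target pairs.

```python
# Toy illustration only (assumed data, not from the cited papers): decoder-only
# pre-training concatenates monolingual documents into a single token stream,
# while encoder-decoder MT training needs aligned source-target pairs.
documents = [
    "First document about any topic.",
    "Second, unrelated document.",
    "A third document in the same language.",
]

# Decoder-only: join everything with a document separator and train the model
# to predict each next token in the resulting stream.
causal_lm_corpus = " <|endoftext|> ".join(documents)

# Encoder-decoder MT: every training example pairs a source sentence with its
# human translation, which is far scarcer than raw monolingual text.
parallel_pairs = [
    {"source": "The weather is nice today.", "target": "Das Wetter ist heute schön."},
    {"source": "I like reading books.", "target": "Ich lese gerne Bücher."},
]

print(causal_lm_corpus)
print(parallel_pairs[0])
```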
The researchers of the latter study, published on September 23, 2024, concluded that “the flexibility and the simpler training setup of decoders should make them both more suitable and efficient for most real world applications,” with the decoder-only architecture being “more appropriate to answer the ever-growing demand for iterative, interactive and machine assisted translation workflow.”