In a February 20, 2025 paper, researchers Danni Liu and Jan Niehues from the Karlsruhe Institute of Technology proposed a way to improve how large language models (LLMs) perform across different languages.
They explained that LLMs like Llama 3 and Qwen 2.5 show strong performance in tasks like machine translation (MT) but often struggle with low-resource languages due to limited available data. Current fine-tuning processes do not effectively bridge the performance gaps across diverse languages, making it difficult for models to generalize beyond high-resource settings.
The researchers focus on leveraging the middle layers of LLMs to enable better cross-lingual transfer across multiple tasks, including MT.
LLMs consist of multiple layers. The early (or bottom) layers handle basic patterns like individual words, while the final (or top) layers focus on producing a response. The middle layers play a key role in capturing the deeper meaning of sentences and how different words relate to each other.
Liu and Niehues found that these middle layers “exhibit the strongest potential for cross-lingual alignment,” meaning they help ensure that words and phrases with similar meanings are represented in a comparable way across languages. Strengthening this alignment helps the model transfer knowledge between languages more effectively.
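This pattern can be probed directly. The short Python sketch below, which is illustrative rather than the authors' own code, extracts one embedding per layer for an English sentence and its German translation and compares them layer by layer; the model checkpoint, mean pooling over tokens, and the example sentence pair are assumptions made for demonstration.

```python
# Illustrative sketch (not the authors' code): probe how similar a decoder LLM's
# per-layer representations are for a pair of parallel sentences.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # assumed placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def layer_embeddings(text: str) -> torch.Tensor:
    """Return one mean-pooled embedding per layer, shape (num_layers + 1, hidden)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: one (1, seq_len, hidden) tensor for the embedding layer and each block
    return torch.stack([h[0].mean(dim=0) for h in outputs.hidden_states])

en = layer_embeddings("The weather is nice today.")
de = layer_embeddings("Das Wetter ist heute schön.")

# Cosine similarity per layer; per the paper's analysis, middle layers
# tend to show the strongest cross-lingual alignment.
sims = torch.nn.functional.cosine_similarity(en, de, dim=-1)
for layer, sim in enumerate(sims.tolist()):
    print(f"layer {layer:2d}: similarity {sim:.3f}")
```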
By extracting embeddings (i.e., representations of text in vector form) from the model’s middle layers and adjusting them so that equivalent concepts are closer together across languages, the researchers aim to improve the model’s ability to understand and generate text in multiple languages.
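One common way to express such an objective, sketched below rather than taken from the paper, is a contrastive loss that treats each sentence's translation as its positive match within a batch; the InfoNCE-style formulation and the temperature value are illustrative assumptions, not necessarily the loss Liu and Niehues use.

```python
# Hedged sketch of an alignment objective: pull middle-layer embeddings of
# parallel sentences together and push non-parallel pairs apart.
import torch
import torch.nn.functional as F

def alignment_loss(src_emb: torch.Tensor, tgt_emb: torch.Tensor,
                   temperature: float = 0.05) -> torch.Tensor:
    """src_emb, tgt_emb: (batch, hidden) middle-layer embeddings of parallel sentences.
    Row i of src_emb and row i of tgt_emb are translations of each other."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(src.size(0), device=src.device)
    # Symmetric cross-entropy: each sentence should best match its own translation.
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
```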
Alternating Training Strategy
Rather than relying solely on task-specific fine-tuning, they introduce an “alternating training strategy” that switches between task-specific fine-tuning (e.g., for translation) and alignment training. Specifically, an additional step — middle-layer alignment — is integrated into the fine-tuning process to ensure that the representations learned in one language are more transferable to others.
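In code, the schedule could look roughly like the sketch below, which reuses the alignment_loss function above; the 50/50 alternation, the fixed middle-layer index, and the data loader interfaces are assumptions for illustration, not the authors' exact training setup.

```python
# Minimal sketch of an alternating schedule: task batches (e.g., translation
# examples formatted as prompts) and alignment batches are interleaved.
import itertools
import torch

MIDDLE_LAYER = 16  # assumed; roughly the middle of a 32-layer model

def train(model, task_loader, align_loader, optimizer, steps: int):
    task_iter, align_iter = itertools.cycle(task_loader), itertools.cycle(align_loader)
    for step in range(steps):
        optimizer.zero_grad()
        if step % 2 == 0:
            # Task-specific step: standard next-token loss on translation examples.
            batch = next(task_iter)  # dict of input_ids, attention_mask, ...
            loss = model(**batch, labels=batch["input_ids"]).loss
        else:
            # Alignment step: pull middle-layer embeddings of parallel sentences together.
            src, tgt = next(align_iter)  # tokenized source and target sides of a parallel batch
            src_h = model(**src, output_hidden_states=True).hidden_states[MIDDLE_LAYER]
            tgt_h = model(**tgt, output_hidden_states=True).hidden_states[MIDDLE_LAYER]
            loss = alignment_loss(src_h.mean(dim=1), tgt_h.mean(dim=1))
        loss.backward()
        optimizer.step()
```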
Tests showed that this method improved translation accuracy and overall performance across both high-resource and low-resource languages. Liu and Niehues noted that the models were also able to generalize to languages not included in the initial alignment training.
One significant advantage of this method is its modular nature: “task-specific and alignment modules trained separately can be combined post-hoc to improve transfer performance” without requiring full model retraining. This makes it possible to equip existing models with enhanced multilingual capabilities while avoiding the high computational cost of retraining from scratch.
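Assuming the modules take the form of LoRA adapters, that post-hoc combination could be realized along the lines of the sketch below, which uses Hugging Face's PEFT library; the base model, paths, adapter names, and equal weighting are placeholders, and the paper's actual module format may differ.

```python
# Hedged sketch: combine a task-specific adapter and an alignment adapter
# that were trained separately, without retraining the base model.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # assumed base model

# Load the task-specific adapter (e.g., trained on translation data) ...
model = PeftModel.from_pretrained(base, "path/to/task_adapter", adapter_name="task")
# ... then add the separately trained alignment adapter.
model.load_adapter("path/to/alignment_adapter", adapter_name="align")

# Merge the two adapters into a single combined adapter, leaving the base weights untouched.
model.add_weighted_adapter(adapters=["task", "align"], weights=[1.0, 1.0],
                           adapter_name="combined", combination_type="linear")
model.set_adapter("combined")
```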
Additionally, this approach is faster and more cost-effective since “a few hundreds of parallel sentences as alignment data are sufficient.”
The researchers have made the code available on GitHub, allowing others to implement and test their approach.