LLMs are now outperforming traditional machine translation tools with remarkable consistency, which raises the question: which is the best LLM for translation?
🔖 Bookmark this article
Our product team continuously researches large language models (LLMs) for translation and shares what it finds, so we’ll update this article regularly. Alternatively, sign up for our newsletter to stay up to date on the latest translation and localization trends.
Although it’s still too early to say for certain, initial testing by the product team at Lokalise reveals differences in translation accuracy between two of the most popular LLMs and three traditional machine translation tools.
When we talk about translation quality, it’s important to note that it’s inherently subjective. Two linguists might evaluate the same translation quite differently based on their personal preferences and interpretations.
So, we designed this experiment to reduce the effect of individual subjectivity, enabling us to draw conclusions we’re confident about.
The results: LLMs vs traditional machine translation
Our research team put five leading translation engines to the test:
- LLMs: Claude Sonnet 3.5 and GPT-4o
- Traditional machine translation: Google Translate, DeepL, and Microsoft Translator
There was a clear winner: Claude Sonnet 3.5 ranked #1 across all languages tested, demonstrating superior performance not just among LLMs but against all translation engines in the study.
How we tested LLMs for translation
To ensure scientific rigor, we used the Bradley-Terry model, one of the most respected statistical methods for ranking items based on pairwise comparisons. The model estimates a strength score for each engine by maximum likelihood, which allowed us to establish a clear hierarchy of translation quality.
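For readers who want the math, here is the standard textbook form of the Bradley-Terry model. The notation is ours, not taken from the study: p_i is the positive strength score of engine i, and w_ij is the number of times annotators preferred engine i over engine j. Fitting the model means choosing the strengths that maximize the log-likelihood on the right.

```latex
P(i \text{ beats } j) = \frac{p_i}{p_i + p_j},
\qquad
\ell(p) = \sum_{i \neq j} w_{ij} \left[ \ln p_i - \ln (p_i + p_j) \right]
```

Only the ratios between strengths matter, so they are usually normalized, for example to sum to 1.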
The language pairs we tested
We used all five engines to translate from English into three languages:
- English to German
- English to Polish
- English to Russian
Then we asked human annotators to evaluate translation quality through pairwise comparisons of translations from different engines. For every source text, native speakers compared the variants from different engines and picked the best one.
For each language pair, multiple human annotators carried out more than 600 pairwise comparisons.
📝 Sidenote: Human evaluators often find relative judgments (“A is better than B”) easier to make than absolute quality scores, which is why pairwise comparison works well here.
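To make the ranking step concrete, here is a minimal sketch of how a list of pairwise judgments can be turned into a Bradley-Terry ranking. It is an illustration under stated assumptions, not the code used in the study: the function name `fit_bradley_terry`, the engine labels, and the toy judgments are all hypothetical, and the fit uses the common minorization-maximization (MM) update rather than whatever solver the research team ran.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, max_iter=1000, tol=1e-10):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper: uses the standard MM update and returns a dict
    of strengths normalized to sum to 1.
    """
    wins = defaultdict(int)         # total wins per engine
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    engines = sorted({e for pair in pair_counts for e in pair})
    if not engines:
        return {}
    strengths = {e: 1.0 / len(engines) for e in engines}

    for _ in range(max_iter):
        updated = {}
        for i in engines:
            # Sum n_ij / (p_i + p_j) over every opponent j that i was compared with.
            denom = 0.0
            for j in engines:
                n_ij = pair_counts.get(frozenset((i, j)), 0)
                pair_strength = strengths[i] + strengths[j]
                if j != i and n_ij and pair_strength > 0:
                    denom += n_ij / pair_strength
            updated[i] = wins[i] / denom if denom else strengths[i]
        total = sum(updated.values())
        updated = {e: s / total for e, s in updated.items()}
        change = max(abs(updated[e] - strengths[e]) for e in engines)
        strengths = updated
        if change < tol:
            break
    return strengths


# Made-up toy judgments for illustration only; these are NOT the study's data.
toy_judgments = (
    [("engine_a", "engine_b")] * 8 + [("engine_b", "engine_a")] * 2
    + [("engine_a", "engine_c")] * 9 + [("engine_c", "engine_a")] * 1
    + [("engine_b", "engine_c")] * 6 + [("engine_c", "engine_b")] * 4
)

strengths = fit_bradley_terry(toy_judgments)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

Each judgment is stored as a (winner, loser) pair, so adding another annotator or another engine only means appending more pairs. An engine that never wins a comparison ends up with a strength of zero in this sketch, which is one reason real analyses often add regularization or use a dedicated library.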
Why LLMs outperform traditional translation tools
LLMs like Claude Sonnet 3.5 and GPT-4o bring several advantages to translation tasks that traditional machine translation tools don’t:
- Contextual understanding: Unlike traditional translation systems, LLMs grasp the broader context of text, enabling more natural-sounding outputs
- Cultural nuance: These models can better preserve idioms, cultural references, and tone across languages
- Adaptability: LLMs demonstrate greater flexibility when handling specialized terminology or uncommon language patterns
As large language models evolve and new models are released, the gap between LLM and traditional machine translation quality is likely to keep widening.
The time has come for machine translation to move over and Large Language Model translation to fill its bionic boots!
How to choose the right LLM for translation
🗒️ Note: A newer version of Claude, Claude Sonnet 3.7, has already been released. It wasn’t tested in this study.
While Claude Sonnet 3.5 emerged as the clear winner in our testing, the best solution may depend on factors such as:
- Language pairs required
- Content type (technical documentation, marketing copy, legal text)
- Integration capabilities (e.g. with a translation and localization platform)
- Budget constraints
With large language models constantly evolving and new versions being released, it’s hard to keep track of which translation model best fits your needs.
That’s where Lokalise AI comes in. We pick the best engine for you based on your language pair and content type, so all you need to do is click ‘translate with AI’.
Lokalise also integrates with 60+ modern tools, so you can plug the best LLM translation into your workflow in a matter of minutes.
When it comes to budget, customers have reported savings of up to 80% by using Lokalise AI instead of going the traditional linguistic route.
Stay tuned for more LLM translation research and insights
If you’re looking for the highest-quality automated translations, LLM-based solutions currently lead the field.
Remember, though, that LLMs need context for translation, just as humans do. That said, LLM-powered translations beat traditional machine translations even without it: with no context provided, LLM-powered translations were rated “good” 78% of the time.
For even higher accuracy, why not look for a translation platform that already integrates with LLMs like Claude, so you can manage and automate the entire translation process in one place?