Top LLMs for translation, tested by Lokalise

LLMs are now outperforming traditional machine translation tools with remarkable consistency, which raises the question: which is the best LLM for translation?

🔖 Bookmark this article

Our product team continuously researches Large Language Models (LLMs) for translation and shares the insights, so we’ll update this article regularly. Alternatively, sign up for our newsletter to stay up-to-date on the latest translation and localization trends.

Although it’s still early to say for certain, initial testing from the product team at Lokalise reveals differences in translation accuracy between two of the most popular LLMs, as well as traditional machine translation tools.

When we talk about translation quality, it’s important to note that it’s inherently subjective. Two linguists might evaluate the same translation quite differently based on their personal preferences and interpretations.

So, we designed this experiment to minimize subjectivity, enabling us to draw conclusions we’re confident about.

The results: LLMs vs traditional machine translation

Our research team put five leading translation engines to the test:

LLMs: Claude Sonnet 3.5 and GPT-4o
Traditional machine translation: Google Translate, DeepL, and Microsoft Translator

There was a clear winner: Claude Sonnet 3.5 ranked #1 across all languages tested, demonstrating superior performance not just among LLMs but against all translation engines in the study.

Curious to learn how AI translation works? Read the linked jargon-free guide.

How we tested LLMs for translation

To ensure scientific rigor, we used the Bradley-Terry model, a well-established statistical method for ranking items based on pairwise comparisons. The model assigns each engine a strength score, estimated by maximum likelihood, which allowed us to establish a clear hierarchy of translation quality.
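For readers who want to see the mechanics, here is a minimal sketch of how a Bradley-Terry ranking can be fitted from (winner, loser) pairs. It is not the code used in the study: the function name `fit_bradley_terry`, the iterative update scheme (a standard minorization-maximization routine), and the toy data with engines "A", "B", and "C" are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iterations=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Under the model, P(i beats j) = p[i] / (p[i] + p[j]).
    Assumes every engine wins at least once; otherwise its strength
    collapses to zero and the estimate is not well defined.
    """
    wins = defaultdict(float)         # total wins per engine
    pair_counts = defaultdict(float)  # comparisons per unordered pair
    engines = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        engines.update((winner, loser))

    strengths = {e: 1.0 for e in engines}  # start from equal strengths
    for _ in range(iterations):
        updated = {}
        for i in engines:
            # MM update: wins divided by the expected "exposure" of engine i
            denom = sum(
                pair_counts[frozenset((i, j))] / (strengths[i] + strengths[j])
                for j in engines
                if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else strengths[i]
        total = sum(updated.values())
        updated = {e: s / total for e, s in updated.items()}  # normalize to 1
        delta = max(abs(updated[e] - strengths[e]) for e in engines)
        strengths = updated
        if delta < tol:
            break
    return strengths


# Toy data, invented purely for illustration (not results from the study).
toy_comparisons = (
    [("A", "B")] * 8 + [("B", "A")] * 2
    + [("A", "C")] * 7 + [("C", "A")] * 3
    + [("B", "C")] * 6 + [("C", "B")] * 4
)
toy_strengths = fit_bradley_terry(toy_comparisons)
toy_ranking = sorted(toy_strengths, key=toy_strengths.get, reverse=True)
print(toy_ranking)
```

The strengths are normalized to sum to 1, so only their relative sizes matter: an engine with twice the strength of another is expected to win a head-to-head comparison about two times out of three.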

The language pairs we tested

We used each engine to translate from English into three languages:

English to German
English to Polish
English to Russian

Then we asked human annotators to evaluate translation quality through pairwise comparisons of translations from different engines. That means, for every translation, native speakers compared the variants from different engines and highlighted the best one.

600+ pairwise comparisons were carried out by multiple human annotators for each language pair.
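To make the setup concrete, here is a rough sketch of how pairwise judgments could be organized and tallied. The exact annotation protocol is not described here, so treat the record layout `(engine_a, engine_b, preferred)` and the helper names `engine_pairs` and `win_rates` as hypothetical. The only facts taken from the study are the five engine names.

```python
from collections import Counter
from itertools import combinations

# The five engines named in the study.
ENGINES = [
    "Claude Sonnet 3.5",
    "GPT-4o",
    "Google Translate",
    "DeepL",
    "Microsoft Translator",
]


def engine_pairs(engines):
    """Return every unordered pair of engines that could be compared."""
    return list(combinations(engines, 2))


def win_rates(judgments):
    """Compute each engine's share of wins over the comparisons it appeared in.

    judgments: iterable of (engine_a, engine_b, preferred) tuples, where
    `preferred` is whichever of the two the annotator picked.
    """
    wins = Counter()
    appearances = Counter()
    for engine_a, engine_b, preferred in judgments:
        if preferred not in (engine_a, engine_b):
            raise ValueError("preferred must be one of the two compared engines")
        appearances[engine_a] += 1
        appearances[engine_b] += 1
        wins[preferred] += 1
    return {engine: wins[engine] / appearances[engine] for engine in appearances}


# Five engines give ten distinct head-to-head matchups.
print(len(engine_pairs(ENGINES)))

# Placeholder judgments with made-up engine labels (not data from the study).
sample_judgments = [("x", "y", "x"), ("x", "y", "y"), ("x", "z", "x")]
print(win_rates(sample_judgments))
```

Raw win rates like these are a useful sanity check, but they do not account for which opponents an engine faced. That is the gap a model such as Bradley-Terry is designed to fill.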

Curious to learn more? Discover the difference between NLP and LLMs.

📝 Sidenote: Pairwise comparison works well because human evaluators often find it easier to make relative judgments (“A is better than B”) than to assign absolute scores.

Why LLMs outperform traditional translation tools

LLMs like Claude Sonnet 3.5 and GPT-4o bring several advantages to translation tasks that traditional machine translation tools don’t:

Contextual understanding: Unlike traditional translation systems, LLMs grasp the broader context of text, enabling more natural-sounding outputs
Cultural nuance: These models can better preserve idioms, cultural references, and tone across languages
Adaptability: LLMs demonstrate greater flexibility when handling specialized terminology or uncommon language patterns

As LLMs evolve and new models are released, the gap between LLM and traditional machine translation quality is likely to widen. Today, LLM code translation is increasingly common, which speaks volumes about how far the technology has come.

The time has come for machine translation to move over and Large Language Model translation to fill its bionic boots!

📚 Further reading: Can LLM translate text accurately?

How to choose the right LLM for translation

🗒️ Note: A newer version of Claude, Claude Sonnet 3.7, has already been released. It wasn’t tested in this study.

While Claude Sonnet 3.5 emerged as the clear winner in our testing, the best solution may depend on factors such as:

Language pairs required
Content type (technical documentation, marketing copy, legal text)
Integration capabilities (e.g. with a translation and localization platform)
Budget constraints

With Large Language Models constantly evolving and new versions being released, it’s hard to keep track of which translation model best fits your needs.

That’s where Lokalise AI comes in. We pick the best engine for you based on your language pair and content type, so all you need to do is click ‘translate with AI’.

Lokalise also integrates with 60+ modern tools, so you can plug the best LLM translation into your workflow in a matter of minutes.

When it comes to budget, customers have registered savings of up to 80% using Lokalise AI instead of going the traditional linguistic route.

Stay tuned for more LLM translation research and insights

If you’re looking for the highest quality translations, LLM-based solutions currently lead the field of automated translation technology.

Remember, though, that LLMs need context for translation, just as humans do. That said, LLM-powered translations beat traditional machine translations even without context: in that setting, they were rated “good” 78% of the time.

For even higher accuracy, why not look for a translation platform that already integrates with LLMs like Claude, so you can manage and automate the entire translation process in one place?

