To paraphrase Captain Kirk, AI has boldly gone where no machine has gone before: it has finally reached human-level translation quality.
But this doesn’t mean we’re done with AI translation. Far from it. We’re only just getting started, and the possibilities are both endless and exciting.
After decades of clunky, error-prone, and literal-sounding machine translation, we’ve reached a pivotal moment where automated translation matches human quality.
When fed the right context, most AI translations are ready to publish without edits, while the remaining few generally need only light post-editing.
Companies across industries are also signaling that AI translation quality is now at a level they can rely on, with many shifting to AI-first translation workflows.
📝 In this article, we share findings from testing our own AI translation capabilities, so the quality data discussed relates to Lokalise’s AI translation system.
From ‘good enough’ to ‘human-level’ AI translation quality
Just a couple of years ago, relying on AI for high-quality localization felt nearly impossible. When GPT was released, the translation community met this new wave of LLM-powered AI translation tools with skepticism, considering them no more accurate than traditional machine translation (MT) tools like Google Translate and DeepL.
Our 2023 survey of 13,000 Lokalise customers reflected this perception:
70.3% of respondents believed that machine translation tools often failed to capture nuances and cultural references in text.
Today, it’s a different story, and the reason is the evolution of large language models like Claude and ChatGPT, which have genuinely changed the game.
They understand context and cultural nuance while capturing brand voice and following grammar rules. They also learn from historical translations to maintain consistency, and they reason about language much as humans do, offering multiple high-quality translation options instantly.
In a 2024 Lokalise blind-comparison study using over 600 pairwise human evaluations, LLMs achieved ‘good’ ratings in 56–80% of translations across several tested language pairs.
Claude 3.5 Sonnet ranked #1 overall and was preferred in 78% of cases, while DeepL (though still strong) didn’t win any language pair against the best LLMs.
🚀 Claude 4 is now available for AI translation inside Lokalise. This model significantly improves translation quality for rarer and lower-resource language pairs, based on our internal language support tests.
This aligns with international benchmarks: at the 2024 Conference on Machine Translation (WMT24), Claude 3.5 Sonnet was the top performer, winning 9 of the 11 tested language pairs, with GPT-4 also ranking highly. DeepL was included as a strong traditional baseline but no longer leads, as LLM-based systems now set the standard.
The accuracy of Lokalise’s AI translation system is remarkable:
- Across major language pairs, AI translation acceptance rates consistently exceed 80%
- In controlled studies with native speakers, AI translations are deemed acceptable 78% of the time in blind evaluations
- Custom AI models now exceed 90% acceptance rates — on par with human translation
- Some organizations report acceptance rates as high as 98%
It comes as no surprise that teams are now shifting to LLM-powered tools, drawn by their flexibility and contextual capabilities.
We’re delivering human-like translation quality, and seeing it work is amazing. We’re finally at a point where we can deliver high-quality translation at an absolute fraction of the cost, which is transformational.
Adam Soltys, Lead Product Manager at Lokalise
AI translation quality all depends on context
Context is king in the era of large language models. Unlike traditional machine translation tools, LLMs can consume contextual information to deliver more accurate, culturally appropriate translations.
We’ve categorized context into three groups:
1. Historical translations: Previous translations you’ve completed for similar purposes provide valuable context. For example, translations for marketing content establish brand voice consistency, while UI content translations ensure interface terminology remains uniform across the product experience.
2. Linguistic assets: Glossaries and style guides ensure AI translation respects established terminology and brand voice. These assets help LLMs understand not just what to translate, but how to translate it according to specific organizational standards and industry requirements.
3. Text-specific context: This contextual layer is particularly crucial for shorter translations where meaning might be ambiguous. When translating a CTA button for a website, providing context like ‘this is a purchase button in an automotive e-commerce site’ or including a screenshot showing the button’s placement dramatically improves translation accuracy (the sketch after this list shows how these context types can come together).
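To make these three context types concrete, here’s a minimal sketch of how historical translations, linguistic assets, and text-specific context might be assembled into a single prompt for an LLM-based translator. The helper function, field names, and example data are purely illustrative assumptions, not Lokalise’s actual implementation.

```python
# Hypothetical sketch: assembling the three context types into one LLM prompt.
# None of these names come from Lokalise's API; they are illustrative only.

def build_translation_prompt(
    source_text: str,
    target_language: str,
    historical_translations: list[tuple[str, str]],  # (source, approved target) pairs
    glossary: dict[str, str],                        # enforced term mappings
    style_guide: str,                                # brand voice / tone rules
    text_context: str,                               # where and how the string is used
) -> str:
    """Combine curated context into a single prompt for an LLM translator."""
    history = "\n".join(f'- "{src}" -> "{tgt}"' for src, tgt in historical_translations)
    terms = "\n".join(f"- Always translate '{s}' as '{t}'" for s, t in glossary.items())
    return (
        f"Translate the following UI text into {target_language}.\n"
        f"Usage context: {text_context}\n"
        f"Style guide: {style_guide}\n"
        f"Glossary (must be respected):\n{terms}\n"
        f"Previously approved translations for reference:\n{history}\n"
        f'Text to translate: "{source_text}"\n'
        "Return only the translation."
    )


if __name__ == "__main__":
    prompt = build_translation_prompt(
        source_text="Buy now",
        target_language="German",
        historical_translations=[("Add to cart", "In den Warenkorb")],
        glossary={"cart": "Warenkorb"},
        style_guide="Informal 'du' form, concise, action-oriented.",
        text_context="Purchase button on an automotive e-commerce product page.",
    )
    print(prompt)  # The prompt would then be sent to an LLM such as Claude or GPT.
```

The point of the sketch is the curation step: only approved historical translations and accurate, text-specific descriptions make it into the prompt, which is exactly where quality beats quantity.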
Remember: Quality over quantity
The key insight is that context quality matters more than context quantity. Providing large volumes of poor-quality context can lead to poor translations. High-quality past translations, accurate descriptions, and relevant screenshots are essential, but they must be carefully curated.
The level of context used explains why some organizations achieve 98% AI translation acceptance rates while others struggle with basic quality.
How we define and measure translation quality
Translation quality remains inherently subjective, varying based on context, audience, purpose, and the evaluator. However, our data reveals clear patterns showing that AI translation consistently meets human-level quality.
We gathered this evidence through two complementary methodologies:
- User testing at scale
The most honest measure of translation quality is customer behavior. Do they actually accept and use AI translations, or do they consistently require extensive editing?
Our analysis of ~40,000 monthly AI translation suggestions across live customer projects reveals an 84% acceptance rate.
These are product managers, content managers, and translators making real decisions about whether AI translations meet their publication standards. When they accept AI suggestions without edits, it signals the translation meets their quality standards for deployment.
Result: 84% acceptance rate on average
- Expert validation
To validate behavioral data, we conducted rigorous evaluations with professional native speakers across three major languages.
Seven reviewers performed 615 pairwise comparisons in blind conditions, evaluating AI translations against human references.
The result: 78% acceptance in expert evaluation — slightly lower than user acceptance, but remarkably consistent.
Our analysis of over 1,000 data points across 30+ major language pairs shows consistent performance above 65% acceptance rates, with most commercially important pairs performing above 75%.
| Language pair | Number of texts evaluated | Acceptance rate |
| --- | --- | --- |
| English → Spanish | 79,000 | 83% |
| English → French | 65,000 | 84% |
| English → German | 37,000 | 87% |
| English → Italian | 36,000 | 81% |
| English → Portuguese | 36,000 | 88% |
| English → Polish | 25,000 | 88% |
| English → Chinese | 25,000 | 74% |
| English → Russian | 22,000 | 79% |
| English → Arabic | 21,000 | 79% |
| English → Indonesian | 20,000 | 83% |
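For reference, acceptance rate here simply means the share of AI suggestions accepted without edits. Below is a minimal sketch of how per-language-pair rates like those in the table could be computed from suggestion logs; the log format and field names are assumptions for illustration, not Lokalise’s internal schema.

```python
# Illustrative only: compute acceptance rates per language pair from suggestion logs.
from collections import defaultdict

def acceptance_rates(suggestions: list[dict]) -> dict[str, float]:
    """Each suggestion is e.g. {"pair": "en->es", "accepted": True}."""
    totals: dict[str, int] = defaultdict(int)
    accepted: dict[str, int] = defaultdict(int)
    for s in suggestions:
        totals[s["pair"]] += 1
        if s["accepted"]:
            accepted[s["pair"]] += 1
    return {pair: accepted[pair] / totals[pair] for pair in totals}

if __name__ == "__main__":
    log = [
        {"pair": "en->es", "accepted": True},
        {"pair": "en->es", "accepted": True},
        {"pair": "en->es", "accepted": False},
        {"pair": "en->de", "accepted": True},
    ]
    for pair, rate in acceptance_rates(log).items():
        print(f"{pair}: {rate:.0%}")  # en->es: 67%, en->de: 100%
```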
Is AI ready for compliance-heavy industries?
Perhaps the most surprising development is AI translation’s rapid adoption in compliance-heavy industries. According to our 2025 Localization Trends Report, finance and healthcare sectors showed dramatic increases in AI translation usage between 2021 and 2024, signaling confidence in AI translation quality even for sensitive content.

AI translation usage in the finance sector: 700% increase
In finance, including banks, insurance, and diversified financials, AI translation usage rose by 700% between 2023 and 2024, while human-only translation decreased by 47% (from 67.7% to 35.8%).
This reflects growing confidence in AI’s ability to handle financial communications accurately.
💰 Real story of what’s at stake:
A comma in the wrong place in a sales contract cost American defense and aerospace manufacturer Lockheed $70 million. The misplaced comma shifted a decimal point by one place in the formula that adjusted the sales price for inflation; in Europe, commas are used instead of periods to mark decimal points.
Despite these risks, financial institutions are increasingly trusting AI for translation, a testament to the quality improvements we’ve seen.
Current AI translation capabilities in finance:
- Excellent: Earnings summaries, market updates, customer communications
- Improving: Regulatory communications with human oversight
- Still requires human validation: Legal contracts, investor relations, compliance filings
AI translation in the healthcare sector: From zero to high adoption
Healthcare’s translation transformation is even more striking: from zero machine-assisted translation in 2021 to significant adoption by the end of 2024, with a 27% drop in human translation in 2024 (from 58.9% to 43.1%).
In healthcare, mistranslation can be life-threatening. A misinterpreted dosage instruction or symptom description poses genuine patient safety risks.
🧑‍⚕️ Real story of what’s at stake:
A misinterpretation of the Spanish word ‘intoxicado’ led to a misdiagnosis. The patient said he felt ‘intoxicado’ (nauseated) before collapsing, but a paramedic understood it as ‘intoxicated’, and the patient waited more than 36 hours before anyone properly examined him. The delay resulted in the rupture of a brain aneurysm.
Yet the 90% accuracy achieved by custom-trained AI models is driving AI translation adoption, particularly for routine healthcare communications.
Current AI capabilities in healthcare:
- Excellent: Appointment confirmations, general health information, procedural explanations
- Improving with oversight: Patient education materials, clinical summaries
- Requires human validation: Patient consent forms, complex clinical documentation, drug labeling
The emerging consensus points toward hybrid approaches: AI handles initial translation of routine content while human medical translators focus on high-risk, complex communications.
The rise of hybrid workflows and AI quality assurance
For compliance-heavy industries, the preferred approach combines AI and human translation, routing only translations below defined quality thresholds to human reviewers. This approach cuts quality assurance costs by up to 80% while maintaining quality standards, proving more effective than either AI-only or human-only approaches.
What’s new? Quality scoring and automatic routing to humans for review
Advanced AI translation platforms now incorporate built-in quality scoring using frameworks like MQM (Multidimensional Quality Metrics), the recognized standard for evaluating AI translation quality.
Here’s how automated quality assessment works:
AI translation quality at Lokalise is assessed on accuracy, fluency, terminology, and style.
- Each translation receives a 0-100 quality score
- Minor issues: -5 points
- Major issues: -25 points
- Critical issues: -75 points
- Scores below a defined threshold (e.g., 80) are automatically routed to human review, while higher scores can be published automatically
Quality thresholds in practice:
🟢 High score (≥80): Auto-approved or lightly reviewed
🔴 Low score (<80): Flagged for human review
This smart routing system ensures quality while maximizing efficiency, allowing human translators to focus their expertise where it’s most needed.
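As an illustration of the scoring and routing logic described above, here’s a simplified sketch. The penalty values and the 80-point threshold mirror the numbers in this section, while the issue categories, function names, and routing labels are assumptions rather than Lokalise’s production implementation.

```python
# Simplified sketch of MQM-style scoring and threshold routing, as described above.
# Penalty values and the threshold mirror the article; everything else is assumed.

PENALTIES = {"minor": 5, "major": 25, "critical": 75}
REVIEW_THRESHOLD = 80  # scores below this go to a human reviewer

def quality_score(issues: list[str]) -> int:
    """Start from 100 and subtract a penalty per detected issue, floored at 0."""
    score = 100 - sum(PENALTIES[severity] for severity in issues)
    return max(score, 0)

def route(issues: list[str]) -> str:
    """Auto-approve high-scoring translations, flag the rest for human review."""
    return "auto_approve" if quality_score(issues) >= REVIEW_THRESHOLD else "human_review"

if __name__ == "__main__":
    print(quality_score([]))           # 100
    print(route(["minor", "minor"]))   # score 90  -> auto_approve
    print(route(["major"]))            # score 75  -> human_review
    print(route(["critical"]))         # score 25  -> human_review
```

In practice, the issue list would come from an automated MQM-style evaluation pass over each translation rather than being supplied by hand; only the flagged segments ever reach a human reviewer.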
Why does AI translation quality matter more than ever?
Deadlines are tighter, content volumes are expected to grow 5x in the next three years, and stakeholders expect faster delivery at lower costs with fewer hands on deck. Meanwhile, the pressure to prove ROI is at an all-time high.
At the same time, customer expectations have shifted. Global audiences now expect seamless, culturally appropriate communication that feels native to their market. Poor translation quality doesn’t just impact customer experience — it damages brand credibility and customer trust.

This elevated expectation makes AI translation quality improvements particularly valuable. But it’s not merely about having advanced AI translation; it’s about knowing how to centralize AI in your workflows and recognizing that value comes from intelligent integration combined with human expertise.
Success depends on it. Organizations that adopt the best-performing LLMs (GPT, Claude, and emerging models) and customize them for localization through smart routing and context augmentation will deliver consistent, high-quality multilingual experiences without proportional increases in localization costs or time-to-market delays.
The human-level quality of AI translation represents one of the most significant developments in global business communication.
Now it’s up to organizations to master the strategic integration of human-level AI translation to dramatically drive international growth and customer engagement.
Ready to explore what’s next in AI translation? Tune into our AI series to discover the emerging trends and technologies that will shape the future of global communication.