The ICU message format is widely used across translation software and i18n libraries to structure source messages clearly. If you’ve ever engaged in software localization for a project, you’ve likely encountered it.
The syntax is intuitive, using curly braces for placeholders and arguments, but it can get confusing since different tools often support different subsets of the format. Understanding the role of software internationalization is essential for leveraging the full potential of the ICU message format in your projects.
In this guide, we’ll break down the ICU format and explore how to use it effectively for localization, with practical examples to make it all click.
What is ICU?
ICU stands for International Components for Unicode. According to the official docs, it’s a set of libraries providing tools for globalizing software systems.
Originally designed for C/C++ and Java, ICU has expanded to other languages like JavaScript. It’s known for being portable and delivering consistent results across different platforms.
Many i18n frameworks rely on the ICU message format to handle translations, and we’ll dive into some of them in this article. Beyond basic translations, ICU also supports advanced features like plural rules and selection logic, making it ideal for complex localization needs. Efficient translation management can optimize the use of the ICU format, ensuring accurate and timely updates.
Modules in ICU library suite
The ICU library offers a range of modules to support different internationalization needs. We won’t dive into every single one, but here’s a breakdown of the key components you’ll likely use when developing i18n software:
Strings, properties, and CharacterIterator
This core module provides Unicode support for:
- Strings: Directly supported by the ICU API.
- Properties: Includes C definitions, functions, and some macros.
- String iteration: Lets you navigate forward and backward through Unicode characters, returning either the characters themselves or their index values.
Conversion basics
Used to convert text between different encoding types. It handles transformations between Unicode and non-Unicode encodings. ICU’s converter API supports all major encodings and offers advanced features like fast text conversion and customizable callbacks to manage invalid or unmapped sequences.
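To make the idea concrete, here is a rough sketch of encoding conversion in JavaScript. It does not call ICU's converter API directly; it uses the standard TextDecoder and TextEncoder globals as an analogy, and it assumes a Node.js build with full ICU data so that windows-1252 is available. The sample bytes are made up for illustration.

```javascript
// Assumption: Node.js with full ICU data (the default in official builds),
// so legacy encodings such as windows-1252 are available to TextDecoder.

// Bytes for "café" in windows-1252, where é is the single byte 0xE9.
const legacyBytes = Uint8Array.of(0x63, 0x61, 0x66, 0xe9);
const decoded = new TextDecoder('windows-1252').decode(legacyBytes);
console.log(decoded); // café

// Re-encode the Unicode string as UTF-8; é now takes two bytes (0xC3 0xA9).
const utf8Bytes = new TextEncoder().encode(decoded);
console.log(utf8Bytes.length); // 5

// Invalid input: by default the decoder substitutes U+FFFD,
// while { fatal: true } makes it throw instead.
const invalid = Uint8Array.of(0xff);
const replaced = new TextDecoder('utf-8').decode(invalid);
let threw = false;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(invalid);
} catch (err) {
  threw = true;
}
console.log(replaced === '\ufffd', threw); // true true
```

The replace-or-throw choice is a much simpler cousin of the customizable callbacks mentioned above.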
Locales and resources
This module handles everything related to locales—which represent a group of users with similar language and cultural expectations. Each locale can contain multiple attributes like language, script, country code, and more.
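As a quick illustration of those attributes, the sketch below parses a locale tag with the standard Intl.Locale API (available in Node.js and modern browsers, not ICU4C/ICU4J itself). The tag zh-Hant-TW is just an example value.

```javascript
// Parse a BCP 47 locale tag into its individual attributes.
const locale = new Intl.Locale('zh-Hant-TW');

const localeParts = {
  language: locale.language, // 'zh'
  script: locale.script,     // 'Hant'
  region: locale.region,     // 'TW'
};
console.log(localeParts);

// maximize() fills in the likely script and region for a bare language code.
const likely = new Intl.Locale('en').maximize().toString();
console.log(likely); // en-Latn-US
```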
Date/Time services
ICU uses a scalar value called UDate to represent dates and times, independent of time zones or calendar systems. The module includes four main classes: Calendar, GregorianCalendar, TimeZone, and SimpleTimeZone.
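JavaScript follows the same idea: a date is a single number of milliseconds since the Unix epoch, and the time zone only matters when you format it. The sketch below uses the standard Date and Intl.DateTimeFormat APIs rather than ICU's classes; formatIn is a hypothetical helper name.

```javascript
// One instant, stored as milliseconds since the Unix epoch (UTC).
const instant = Date.UTC(2019, 0, 1); // 2019-01-01T00:00:00Z
console.log(instant); // 1546300800000

// Hypothetical helper: render the same instant as a calendar date in a given time zone.
function formatIn(timeZone) {
  return new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).format(instant);
}

console.log(formatIn('Asia/Tokyo'));       // 01/01/2019
console.log(formatIn('America/New_York')); // 12/31/2018
```

The number never changes; only its presentation does.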
Formatting and parsing
This handles most of the formatting work required for localization. It supports currency, date, and number formatting, along with text display and complex message formats like pluralization and selection rules.
Libraries that support the ICU message format
The ICU message format is part of the formatting and parsing module in the ICU library. It’s a powerful and flexible module that supports various formatting methods across many programming languages.
Due to the importance of localizing software, a lot of i18n libraries have adopted the ICU message format. Here’s a quick rundown of some key libraries that use it:
- C/C++ — ICU4C (version 68.1) is the full implementation of ICU in C/C++.
- Java — ICU4J (version 68.1) is the Java version of the ICU library.
- JavaScript — There’s no native ICU implementation, but you can use third-party libraries like Angular’s i18n module, Globalize, and react-i18n.
- PHP — Symfony is a popular PHP framework whose Translation component has built-in support for the ICU message format.
- Python — PyICU is a Python wrapper that provides ICU functionality.
For managing translations in larger i18n projects, tools like Lokalise are super useful. Lokalise is an all-in-one translation management platform that fully supports the ICU format. It’s got features for handling filenames, language codes, collaborative access for translators, and customizing the process for uploading and downloading translations.
Practical usage of the ICU message format
Let’s dive into using the ICU message format and see some practical examples. For this section, we’ll use YAML files to store translations and JavaScript to implement our scenarios.
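We’ll be using the format() function from the i18next library. i18next’s formatting callback follows a format(value, format, locale) signature; in the snippets below, format(key, values) is shorthand for looking up the message stored under key, filling in values, and returning the formatted message string.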
Normal text translation
First, let’s translate some simple text between different locales. We’ll create a YAML file for each locale (English, Chinese, and Arabic) to store the translated messages.
Here’s how the content looks in each file:
messages_en.yaml
"Welcome_to_the_tutorial": "Welcome to the tutorial"
messages_ar.yaml
"Welcome_to_the_tutorial": "مرحبًا بك في البرنامج التعليمي"
messages_zh.yaml
"Welcome_to_the_tutorial": "欢迎使用本教程"
With the translation files set up, we can create a basic formatter file in JavaScript:
formatter.js
format('Welcome_to_the_tutorial')
Pretty straightforward, right? Now, let’s move on to handling pluralization with the ICU format in the next section.
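If you’re curious what a key lookup boils down to, here is a minimal sketch. It is not how i18next is implemented: catalogs and lookup are hypothetical names, and the in-memory object stands in for the parsed YAML files.

```javascript
// Hypothetical in-memory catalogs standing in for the parsed YAML files.
const catalogs = {
  en: { Welcome_to_the_tutorial: 'Welcome to the tutorial' },
  ar: { Welcome_to_the_tutorial: 'مرحبًا بك في البرنامج التعليمي' },
  zh: { Welcome_to_the_tutorial: '欢迎使用本教程' },
};

// Hypothetical helper: look up a key in the requested locale,
// fall back to English, and finally to the key itself.
function lookup(key, locale = 'en') {
  const catalog = catalogs[locale] || {};
  return catalog[key] ?? catalogs.en[key] ?? key;
}

console.log(lookup('Welcome_to_the_tutorial', 'zh')); // 欢迎使用本教程
console.log(lookup('Welcome_to_the_tutorial', 'fr')); // Welcome to the tutorial
console.log(lookup('missing_key', 'en'));             // missing_key
```

Falling back to the source language keeps the UI readable while a translation is still missing.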
Pluralization
Pluralization is a key feature of the ICU message format, making it easy to handle different text forms based on numeric values. This is powered by the CLDR (Common Locale Data Repository), which defines plural rules for different languages to ensure correct text forms for each target language.
For example, let’s say we want to display a sentence like:
“I bought one book” or “I bought <number of books> books,” depending on the count. First, we’ll create YAML files for each locale with the appropriate CLDR-based plural rules.
Translation files:
messages_en.yaml
booksCount: >
  I bought {n, plural, one {# book} other {# books}}
messages_ar.yaml
booksCount: >
  اشتريت {n, plural, one {# الكتاب} other {# الكتب}}
messages_zh.yaml
booksCount: >
  我买了 {n, plural, one {# 书} other {# 图书}}
Now that we have the translation files set up, we can use the format() function to pass in the count value:
formatter.js
format('booksCount', { n: 3 });
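For English, this outputs: “I bought 3 books”. For Chinese, CLDR defines only the other category, so the one branch is never selected and the message displays: “我买了 3 图书”. Arabic, by contrast, has six categories (zero, one, two, few, many, and other); 3 falls under few, and because no few branch is provided here, ICU falls back to other.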
When uploading translation files with ICU plurals to Lokalise, make sure to enable the Detect ICU plurals option in the upload settings.
This ensures that the plural keys are correctly recognized and you can provide translations for each form.
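To see which CLDR category a number lands in, you can ask the standard Intl.PluralRules API. The sketch below also includes a tiny, hypothetical pluralize helper that mimics the core of a plural argument (pick the branch, fall back to other, replace #); a real ICU parser handles more, such as exact matches like =0 and offsets.

```javascript
// Intl.PluralRules exposes the CLDR plural category for a number.
const categoryOf = (locale, n) => new Intl.PluralRules(locale).select(n);

console.log(categoryOf('en', 1)); // one
console.log(categoryOf('en', 3)); // other
console.log(categoryOf('zh', 1)); // other (Chinese has a single category)
console.log(categoryOf('ar', 3)); // few

// Hypothetical helper: choose the branch for the category, fall back to
// "other" when that branch is missing, and replace # with the number.
function pluralize(locale, n, branches) {
  const branch = branches[categoryOf(locale, n)] ?? branches.other;
  return branch.replace('#', new Intl.NumberFormat(locale).format(n));
}

const enBooks = { one: '# book', other: '# books' };
console.log('I bought ' + pluralize('en', 1, enBooks)); // I bought 1 book
console.log('I bought ' + pluralize('en', 3, enBooks)); // I bought 3 books
```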
Interpolation
The ICU message format makes it easy to handle dynamic text using interpolation. For example, let’s say we want to display the sentence:
“When I left home, my age was <the_age>,” where <the_age> is a variable value.
To implement this, we’ll create one JSON file per locale (messages_en.json, messages_ar.json, and messages_zh.json) and update our formatter.js file to use the format() function.
Translation files:
messages_en.json
{ "left_home_age": "When I left home, my age was {age}." }
messages_ar.json
{ "left_home_age": "عندما غادرت المنزل كان عمري, {age}" }
messages_zh.json
{ "left_home_age": "当我离开家时,我的年龄是, {age}" }
Now let’s add the format function in formatter.js:
format('left_home_age', { age: 21 });
This function will output:
- English message: “When I left home, my age was 21.”
- Chinese message: “当我离开家时,我的年龄是, 21”
See how easy it is to handle interpolation with the ICU message format? It’s all about substituting values into the message strings using the correct argument names.
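Under the hood, simple interpolation is little more than a search and replace on {name} placeholders. Here is a minimal sketch; interpolate is a hypothetical helper, not a library function, and it only handles plain arguments, leaving plural and select arguments to a real ICU parser.

```javascript
// Hypothetical helper: replace each {name} placeholder with values[name].
// Placeholders without a matching value are left untouched.
function interpolate(message, values) {
  return message.replace(/\{(\w+)\}/g, (placeholder, name) =>
    name in values ? String(values[name]) : placeholder
  );
}

console.log(interpolate('When I left home, my age was {age}.', { age: 21 }));
// When I left home, my age was 21.

console.log(interpolate('Hello, {name}!', {}));
// Hello, {name}!
```

Leaving unknown placeholders visible makes a missing argument easy to spot during testing.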
In the next section, we’ll cover conditional selection for customizing messages based on different scenarios.
Conditional selection using select
The ICU message format also supports conditional text selection, making it easy to handle scenarios like gender-based pronouns. Let’s say we want to display the following text:
“Hello, Your friend <friend’s name> is now online. <She/He/They> added a new image to the system.”
Here, we need to show the correct pronoun based on the friend’s gender. No worries: ICU’s select arguments are built just for this.
Translation files:
messages_en.yaml
friend_add_image: >
  Hello, Your friend {friend} is now online. {gender, select, female {She} male {He} other {They}} added a new image to the system.
messages_ar.yaml
friend_add_image: >
  مرحبًا ، صديقك {friend} متصل الآن. {gender, select, female {هي} male {هو} other {أنهم}} أضاف صورة جديدة إلى النظام.
messages_zh.yaml
friend_add_image: >
  您好,您的朋友{friend}现在在线。 {gender, select, female {她} male {他} other {他们}} 向系统添加了新映像。
With these translations in place, let’s update our formatter.js file:
format('friend_add_image', { friend: 'Ann', gender: 'female' });
If the selected locale is English, the output will be:
“Hello, Your friend Ann is now online. She added a new image to the system.”
Simple, right? This same pattern works for any content that needs conditional rendering based on variables like gender, role, or status. Using select arguments ensures your message strings are flexible and adapt to the correct context across different target languages.
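Conceptually, a select argument is a keyed lookup with a mandatory other fallback. The sketch below shows that logic in isolation; select is a hypothetical helper name, and the branches object mirrors the English message above.

```javascript
// Hypothetical helper: return the branch matching the value,
// or the "other" branch when there is no exact match.
function select(value, branches) {
  return Object.prototype.hasOwnProperty.call(branches, value)
    ? branches[value]
    : branches.other;
}

const pronouns = { female: 'She', male: 'He', other: 'They' };

console.log(select('female', pronouns));  // She
console.log(select('unknown', pronouns)); // They

const pronoun = select('female', pronouns);
const sentence = `Hello, Your friend Ann is now online. ${pronoun} added a new image to the system.`;
console.log(sentence);
// Hello, Your friend Ann is now online. She added a new image to the system.
```

Because other always exists, an unexpected value never leaves a hole in the sentence.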
Number formatting
The ICU message format supports number formatting for two main use cases: currency and percentage formatting. Let’s see how to set these up in an i18n application.
For example, suppose you need to display the sentence:
“They could achieve a 70% success rate in the project.”
We’ll use ICU’s number syntax to format this as a percentage.
Translation files:
messages_en.yaml
success_rate: They could achieve {n, number, percent} success rate in the project.
messages_zh.yaml
success_rate: 他们可以在项目中获得{n, number, percent}成功率。
messages_ar.yaml
success_rate: يمكنهم تحقيق {n, number, percent} معدل نجاح في المشروع.
Next, update your formatter.js file like this:
format('success_rate', { n: 0.7 });
For English, this outputs: “They could achieve 70% success rate in the project.”
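Note that the value is passed as a fraction (0.7), not as 70. You can check the same behavior with the standard Intl.NumberFormat API, which is a reasonable stand-in for what a percent argument does; the percent helper below is a hypothetical name.

```javascript
// Hypothetical helper wrapping the standard Intl.NumberFormat percent style.
const percent = (locale, n) =>
  new Intl.NumberFormat(locale, { style: 'percent' }).format(n);

console.log(percent('en-US', 0.7)); // 70%
console.log(percent('en-US', 70));  // 7,000% (the input is treated as a fraction)

// Other locales may use different digits, symbols, or spacing.
console.log(percent('de-DE', 0.7));
console.log(percent('ar-EG', 0.7));
```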
Currency formatting
You can also format currency values using the same syntax by specifying a currency type. Different libraries handle this slightly differently. Here’s a quick rundown of how you can set it up depending on the library:
- ICU4C (C++): Use the NumberFormat::setCurrency() method.
- ICU4C (C API): Set the currency code via unum_setTextAttribute() with the UNUM_CURRENCY_CODE attribute.
- ICU4J (Java): Use NumberFormat.setCurrency() for currency formatting.
This approach ensures you get the correct default format for each target language based on locale-specific settings.
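In JavaScript, a comparable result can be sketched with the standard Intl.NumberFormat API, where the currency code is passed as an option. This is an illustration rather than the ICU4C or ICU4J API, and money is a hypothetical helper name.

```javascript
// Hypothetical helper: format an amount in a given currency for a given locale.
const money = (locale, currency, amount) =>
  new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);

console.log(money('en-US', 'USD', 1234.5)); // $1,234.50
console.log(money('en-US', 'JPY', 1234.5)); // ¥1,235 (yen has no minor unit)

// The same amount and currency, laid out by German conventions.
console.log(money('de-DE', 'EUR', 1234.5));
```

The locale decides separators and symbol placement, while the currency decides the symbol and the number of decimals.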
Date/Time formatting
ICU provides four predefined date formats: short, medium, long, and full. If you want to display a date like:
“I entered university on 19/02/2017,” you just need to choose the right date format in your translation files.
Here’s how to set it up:
Translation files:
messages_en.yaml
enter_university: I entered university on {uni_date, date, short}
messages_zh.yaml
enter_university: 我从上大学 {uni_date, date, short}
messages_ar.yaml
enter_university: دخلت الجامعة {uni_date, date, short}
Next, update your formatter.js file like this:
format('enter_university', { uni_date: new Date('2019-01-01') });
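For English (en-US), this would display as:
“I entered university on 1/1/19” (the short format uses a two-digit year; the day shown can shift with the runtime’s time zone, because new Date('2019-01-01') is parsed as midnight UTC).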
That’s how easily ICU can handle date formatting according to your application’s needs.
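To compare the four styles side by side, here is a small sketch using the standard Intl.DateTimeFormat API and its dateStyle option. The time zone is pinned to UTC so the output does not depend on where the code runs; styles is just an illustrative variable name.

```javascript
// Fixed instant: 2019-01-01T00:00:00Z.
const date = new Date(Date.UTC(2019, 0, 1));

// Format the same date with each predefined style for en-US.
const styles = {};
for (const dateStyle of ['short', 'medium', 'long', 'full']) {
  styles[dateStyle] = new Intl.DateTimeFormat('en-US', {
    dateStyle,
    timeZone: 'UTC',
  }).format(date);
}

console.log(styles.short);  // 1/1/19
console.log(styles.medium); // Jan 1, 2019
console.log(styles.long);   // January 1, 2019
console.log(styles.full);   // Tuesday, January 1, 2019
```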
Summary
In this article, we explored how to use the ICU message format to localize software applications. The ICU format simplifies localization by offering support for:
- Basic text translations
- Pluralization
- Conditional text (using select)
- Interpolation
- Number formatting (percentages and currencies)
- Date formatting
With its powerful features and wide adoption, ICU is a solid choice for handling complex localization requirements across multiple target languages.