A complete guide to ICU message format

The ICU format is a widely used message format in numerous translation software systems and i18n libraries. It provides a clear view of the expected data in the source messages. If you have ever localized a software project, you have most likely used the ICU message format. Even though the syntax of this format is intuitive, there are several complications that occur because different i18n applications use different ICU format subsets.

{
    "message": "Simple ICU Messages",
    "text": "Just write text without any variables"
}

So, in this article, we are going to discuss the ICU format and work to understand how we can use it in localization.

What is ICU?

First, let’s take a look at what ICU means. According to the official documentation, ICU stands for International Components for Unicode. It is a set of C/C++ and Java libraries that provides Unicode and globalization support for software systems. ICU is portable and gives the same results on all platforms, and between its C/C++ and Java implementations.

Although ICU support was initially limited to C/C++ and Java, it has since expanded to other programming languages, including JavaScript. Many i18n frameworks support the ICU message format, and we will talk about those frameworks in this article.

Besides message formatting, ICU provides lots of other useful functionality, such as advanced pluralization and selection rules.

Modules in ICU library suite

There are a number of modules in ICU that provide different and useful functionalities. You can get a clear view of all the ICU modules if you read the official documentation. For now, let’s discuss some of the most important components of ICU that will be useful in developing i18n software systems.

Strings, Properties, and CharacterIterator

This is one of the basic modules in the ICU library. It provides Unicode support for:

  • Strings, which are directly supported by ICU API functions.
  • Properties, which are exposed through C definitions and functions, as well as some macros.
  • String iteration, which allows you to iterate forward and backward over Unicode characters and return the Unicode characters or their corresponding index values.
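
To make the string iteration idea concrete, here is a rough JavaScript analogue. It is not the ICU CharacterIterator API, only a sketch built on the language's own code point iteration, and the helper name codePoints is made up for this example. It returns each Unicode character together with its index, and the result can be walked forward or backward.

```javascript
// Hypothetical helper: list each Unicode character with its UTF-16 index.
function codePoints(text) {
  const result = [];
  let index = 0;
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const char of text) {
    result.push({ index, char });
    index += char.length; // 1 for BMP characters, 2 for surrogate pairs
  }
  return result;
}

// "a", an emoji outside the BMP (U+1F600), then "b".
const forward = codePoints('a\u{1F600}b');
const backward = [...forward].reverse();

console.log(forward.map((entry) => entry.index));  // [ 0, 1, 3 ]
console.log(backward.map((entry) => entry.char));  // b, the emoji, then a
```

Note that the emoji occupies two UTF-16 code units, which is why the index jumps from 1 to 3.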

Conversion basics

This module is used to convert text from one encoding to another. In brief, it converts content from a non-Unicode encoding (code page) to Unicode and back. ICU’s converter API supports all major encodings. Other than this, there are some additional important features, such as fast text conversion and callbacks for handling and substituting invalid or unmapped byte sequences.
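
The same ideas can be tried out in Node.js without ICU's converter API. The sketch below uses only Node.js built-ins (TextEncoder, TextDecoder, and Buffer), so treat it as an illustration of encoding conversion and substitution in general, not of how ICU implements them.

```javascript
// "café" written with a precomposed é (U+00E9).
const text = 'caf\u00e9';

// Unicode string -> bytes in two different encodings.
const utf8Bytes = new TextEncoder().encode(text); // é takes two bytes in UTF-8
const latin1Bytes = Buffer.from(text, 'latin1');  // é takes one byte (0xE9) in Latin-1

// Bytes -> Unicode string again.
const roundTrip = latin1Bytes.toString('latin1');

// 0xFF is never valid in UTF-8; by default the decoder substitutes U+FFFD for it.
const repaired = new TextDecoder('utf-8').decode(Uint8Array.of(0x63, 0xff));

console.log(utf8Bytes.length, latin1Bytes.length); // 5 4
console.log(roundTrip === text);                   // true
console.log(repaired === 'c\uFFFD');               // true
```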

Locales and resources

This module handles the functionality related to the concept of a locale. A locale identifies a specific user community: a group of users who have similar culture and language expectations. A locale identifier is made up of one or more ordered pieces of information, such as a language code, script code, country code, and variant code. You can find all the valid script codes in the official docs.
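
To see these ordered pieces in practice, the short sketch below uses JavaScript's built-in Intl.Locale. It is a stand-in for ICU's own Locale classes: it parses a BCP 47 tag written with hyphens, whereas ICU writes the same locale with underscores (zh_Hans_CN).

```javascript
// Simplified Chinese as used in China: language, script, and country code.
const locale = new Intl.Locale('zh-Hans-CN');

console.log(locale.language); // zh
console.log(locale.script);   // Hans
console.log(locale.region);   // CN
```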

Date/Time services

Dates and times are represented with UDate values in ICU. A UDate is a scalar value that represents a point in time independently of the time zone or the calendar system. Its value is the number of milliseconds that have passed since a fixed point in time called the epoch (midnight UTC on January 1, 1970). ICU has four classes related to time and date: Calendar, GregorianCalendar, TimeZone, and SimpleTimeZone.
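
JavaScript's Date object happens to use the same representation (milliseconds since midnight UTC on January 1, 1970), so it works as a quick illustration of a scalar time value. The snippet is only an analogy and does not call ICU.

```javascript
// Zero milliseconds after the epoch is the epoch itself.
const epoch = new Date(0);
console.log(epoch.toISOString()); // 1970-01-01T00:00:00.000Z

// Exactly one day later, expressed as a plain number of milliseconds.
const oneDayMs = 24 * 60 * 60 * 1000;
const nextDay = new Date(oneDayMs);
console.log(nextDay.getTime());     // 86400000
console.log(nextDay.toISOString()); // 1970-01-02T00:00:00.000Z
```

The number itself carries no time zone or calendar. Those only come into play when the value is formatted for display.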

Formatting and parsing

This module handles the majority of the formatting work that is required in i18n software systems. For example, it handles currency, date, and number formatting in order to make the localization process easier. It also covers text display and message formats, including pluralization and selection.

ICU message format

The ICU message format is a part of the formatting and parsing module in the ICU library. It is a powerful and flexible module supporting a number of formatting methods in numerous programming languages. 

A large number of libraries that support i18n have been developed due to the importance of localizing and internationalizing software systems. The majority of these libraries support the ICU messaging format; here is a list of libraries that do so:

  1. C/C++: ICU4C (68.1) is a complete implementation of ICU in C/C++.
  2. Java: ICU4J (68.1) is a complete implementation of ICU in Java.
  3. JavaScript: JavaScript does not have a first-party implementation of ICU, but there are plenty of third-party libraries, such as Angular’s built-in i18n module, Globalize, react-i18n, and others.
  4. PHP: Symfony is a third-party framework that supports the ICU format in PHP.
  5. Python: PyICU is a Python wrapper around ICU that supports the ICU format.

Other than these libraries, there are some other important software tools you should know about if you are developing an i18n application. One of those tools is Lokalise. It is an all-in-one software localization and translation management platform. Lokalise fully supports the ICU format. You can find detailed descriptions on how Lokalise handles filenames and language codes in uploaded files in the uploading translations article, and also how it helps with downloading translation data to your PC and customizing the process in the downloading translations article.

Practical usage of the ICU message format

In this section, we will look at how to work with the ICU message format and its syntax. In the following examples, we will store our translations in YAML files and implement each scenario in JavaScript. For this implementation, we will use the format() function found in the i18next library. i18next accepts a formatting function with a format(value, format, locale) signature that returns a string. In the snippets below, format(key, values) is used as shorthand for looking up the message stored under key for the active locale and formatting it with the given values.

Normal text translation

Let’s see how we can translate plain text between locales. For this, we create a number of files: a YAML file per locale to hold the translation content and a JavaScript file for the format() call.

We will have four locales (English, Arabic, Spanish, and Chinese) in this tutorial, so we need four YAML files to hold the message contents. The translation content for each locale is as follows:

messages_en.yaml

"Welcome_to_the_tutorial": "Welcome to the tutorial"

messages_ar.yaml

"Welcome_to_the_tutorial": "مرحبًا بك في البرنامج التعليمي"

messages_es.yaml

"Welcome_to_the_tutorial": "Bienvenida al tutorial"

messages_zh.yaml

"Welcome_to_the_tutorial": "欢迎使用本教程"

Now we are done with our translation files. Let’s create a file to hold our JavaScript format() function, like so:

formatter.js

format('Welcome_to_the_tutorial');
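
If you are wondering what happens behind such a call, the lookup step can be sketched as follows. This is a minimal illustration, not i18next's implementation: the messages object stands in for the four YAML files, and the helper name lookupMessage and its fallback behavior are assumptions made for this example.

```javascript
// Hypothetical in-memory stand-in for the four YAML files above.
const messages = {
  en: { Welcome_to_the_tutorial: 'Welcome to the tutorial' },
  ar: { Welcome_to_the_tutorial: 'مرحبًا بك في البرنامج التعليمي' },
  es: { Welcome_to_the_tutorial: 'Bienvenida al tutorial' },
  zh: { Welcome_to_the_tutorial: '欢迎使用本教程' },
};

// Hypothetical helper: return the message for a locale, falling back to
// English for unknown locales and to the key itself for unknown keys.
function lookupMessage(key, locale) {
  const table = messages[locale] || messages.en;
  return key in table ? table[key] : key;
}

console.log(lookupMessage('Welcome_to_the_tutorial', 'es')); // Bienvenida al tutorial
console.log(lookupMessage('Welcome_to_the_tutorial', 'fr')); // Welcome to the tutorial
```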

Okay – super easy. Right? Now let’s see how we can handle pluralization with the ICU message format in the next subsection.

Pluralization

Pluralization is one of the most important features provided by ICU message formatting. In this part, let’s take a look at how we can perform pluralization with ICU.

Assume that we want to display either I bought 1 book or I bought <number of books> books, depending on the number of books passed in. For this purpose, let’s first create all the message files.

messages_en.yaml

booksCount: >
    I bought {n, plural, 
         one {# book}
         other {# books}}

messages_ar.yaml

booksCount: >
    اشتريت {n, plural, 
         one {# الكتاب}
         other {# الكتب}}

messages_es.yaml

booksCount: >
    Yo compré {n, plural, 
         one {# libro}
         other {# libros}}

messages_zh.yaml

booksCount: >
    我买了 {n, plural, 
         one {# 书}
         other {# 图书}}

Okay, we have created our translation files for pluralization. Next, we need to send this through the format() function. To do this, we’ll have a simple format function in the formatter.js file, as shown here:

format('booksCount', { n: 3 });

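This call outputs I bought 3 books if the selected language is English. If the selected language is Chinese, the output will be “我买了 3 图书”: Chinese has only the other plural category, so the one branch is never selected there. Arabic, by contrast, distinguishes six categories (zero, one, two, few, many, and other), so a complete Arabic translation would supply a branch for each of them; any category without a branch falls back to other.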
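
You can check which plural category a number falls into with JavaScript's built-in Intl.PluralRules. The sketch below also includes a simplified branch picker. The name pickPluralBranch is hypothetical, and unlike a real ICU implementation it inserts the number as-is instead of formatting it for the locale.

```javascript
// Print the plural category of a few sample numbers for each tutorial locale.
const samples = [0, 1, 2, 3, 11, 100];
for (const locale of ['en', 'es', 'ar', 'zh']) {
  const rules = new Intl.PluralRules(locale);
  console.log(locale, samples.map((n) => `${n}:${rules.select(n)}`).join(' '));
}

// Hypothetical helper: choose the branch for n, falling back to "other",
// then replace the first "#" with the number.
function pickPluralBranch(locale, n, branches) {
  const category = new Intl.PluralRules(locale).select(n);
  const template = branches[category] || branches.other;
  return template.replace('#', String(n));
}

const english = { one: '# book', other: '# books' };
console.log(pickPluralBranch('en', 1, english)); // 1 book
console.log(pickPluralBranch('en', 3, english)); // 3 books
```

This assumes a runtime with full locale data, which is the default in current Node.js releases.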

Finally, please note that if you are uploading translation files with ICU plurals to Lokalise, you have to enable the Detect ICU plurals option in the upload settings.

In this case, the key will be recognized as plural and you can provide translations for the different forms.

Interpolation

Text interpolation is supported by the ICU message format to deal with dynamic text easily. Suppose you need to display the sentence When I left home my age was <the_age>, where the age is supplied at runtime.

For this, we are going to create one JSON file per locale (messages_en.json, messages_ar.json, messages_es.json, and messages_zh.json) and update the formatter.js file to add the formatting call, like so:

messages_en.json

{
    "left_home_age": "When I left home my age was {age}"
}

messages_ar.json

{
    "left_home_age": "عندما غادرت المنزل كان عمري {age}"
}

messages_es.json

{
    "left_home_age": "Cuando salí de casa mi edad era {age}"
}

messages_zh.json

{
    "left_home_age": "当我离开家时,我的年龄是 {age}"
}

So, we are done with our translation files for interpolation. Let’s see how we can add the format() function in the formatter.js file:

formatter.js

format('left_home_age', { age: 21 });

That’s how easy it is to handle interpolation in the ICU message format. If the language you have selected is English, the output will be When I left home my age was 21. Likewise, if the language you have selected is Spanish, the output will be Cuando salí de casa mi edad era 21.
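
Conceptually, simple interpolation boils down to replacing each {name} placeholder with the matching value. The sketch below shows that idea with a regular expression. The helper name interpolate is made up, and it deliberately ignores typed arguments such as plural, select, number, and date, which real ICU implementations parse properly.

```javascript
// Hypothetical helper: replace simple {name} placeholders with values.
// Placeholders without a matching value are left untouched.
function interpolate(template, values) {
  return template.replace(/\{(\w+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : placeholder
  );
}

console.log(interpolate('When I left home my age was {age}', { age: 21 }));
// When I left home my age was 21
```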

Next, we are going to discuss how we can conditionally select suitable words, for example based on gender.

Switch with select

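The syntax of select is similar to that of plural: the argument name is followed by the select keyword and a list of branches, one of which is chosen by an exact match on the argument’s value, with other as the required fallback. Creating translation files that switch on a condition is quite straightforward with the ICU message format. Imagine a situation in which we need to display the following text in the interface: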

Hello, Your friend <friend’s name> is now online. <She/He/They> added a new image to the system.

To display this message correctly, you have to take the friend’s gender into account. Don’t worry: the ICU message format handles this kind of switch with select.

Let’s update our four message translation YAML files as follows:

messages_en.yaml

friend_add_image: > 
    Hello, Your friend {friend} is now online. 
    {gender, select,
    female {She}
    male {He}
    other {They}}
    added a new image to the system.

messages_es.yaml

friend_add_image: > 
    Hola, tu amiga {friend} está ahora en línea. 
    {gender, select,
    female {ella}
    male {él}
    other {ellas}}
    agregó una nueva imagen al sistema.

messages_ar.yaml

friend_add_image: > 
    مرحبًا ، صديقك {friend} متصل الآن.
    {gender, select,
    female {هي}
    male {هو}
    other {أنهم}}
    أضاف صورة جديدة إلى النظام.

messages_zh.yaml

friend_add_image: > 
    您好,您的朋友{friend}现在在线。
    {gender, select,
    female {她}
    male {他}
    other {他们}}
    向系统添加了新映像。

At this point, we have completed our translation files for the switch with select. Next, we’ll update the formatter.js file as shown here:

format('friend_add_image', { friend: 'Ann', gender: 'female' });

If you select the English locale, the output will be Hello, Your friend Ann is now online. She added a new image to the system. Simple, right?

Similarly, you can easily create any content you may need to conditionally render and display using ICU message formatting.
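
To show what select does under the hood, here is a small sketch in plain JavaScript. It is not how any particular library implements it. The helper names selectBranch and friendAddedImage are hypothetical, and the English message is hard-coded to keep the example short.

```javascript
// Hypothetical helper: pick the branch that exactly matches the value,
// falling back to the "other" branch.
function selectBranch(value, branches) {
  return Object.prototype.hasOwnProperty.call(branches, value)
    ? branches[value]
    : branches.other;
}

// Hypothetical helper: build the English message from the example above.
function friendAddedImage(friend, gender) {
  const pronoun = selectBranch(gender, { female: 'She', male: 'He', other: 'They' });
  return `Hello, Your friend ${friend} is now online. ${pronoun} added a new image to the system.`;
}

console.log(friendAddedImage('Ann', 'female'));
// Hello, Your friend Ann is now online. She added a new image to the system.
```

Any value without a branch of its own, such as an unknown or unspecified gender, ends up in the other branch.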

Number formatting

Next, ICU message formatting supports number formatting. Two commonly used styles are currency formatting and percentage formatting. Let’s see how to use ICU message formatting to handle number formatting in an i18n software application.

Imagine you need to display text saying: They could achieve 70% success rate in the project. To implement this percentage formatting, we need to update the message translation YAML files as follows.

messages_en.yaml

success_rate: They could achieve {n, number, percent} success rate in the project.

messages_zh.yaml

success_rate: 他们可以在项目中获得{n, number, percent}成功率。

messages_ar.yaml

success_rate: يمكنهم تحقيق {n, number, percent} معدل نجاح في المشروع.

messages_es.yaml

success_rate: Podrían lograr una {n, number, percent} tasa de éxito en el proyecto.

The next step is to update the formatter.js file as follows.

format('success_rate', { n: 0.7 });

Number formatting is as simple as that. If you select the English locale, the output will be They could achieve 70% success rate in the project.
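
If you want to see the percent style on its own, JavaScript's built-in Intl.NumberFormat offers a comparable option. The sketch below is an analogue of the {n, number, percent} argument, not part of any ICU message library. Only the English result is spelled out, since spacing and digits differ between locales.

```javascript
const rate = 0.7;

// English: the fraction 0.7 is rendered as a percentage.
const english = new Intl.NumberFormat('en', { style: 'percent' }).format(rate);
console.log(`They could achieve ${english} success rate in the project.`);
// They could achieve 70% success rate in the project.

// Other locales may use different spacing, digits, or percent signs.
for (const locale of ['es', 'ar', 'zh']) {
  console.log(locale, new Intl.NumberFormat(locale, { style: 'percent' }).format(rate));
}
```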

You can also handle currency formats with the ICU messaging format. For this, we can find different implementations in different libraries. Some of the important currency-handling methods are mentioned below:

  • ICU4C (C++): the NumberFormat::setCurrency() method.
  • ICU4C (C API): setting the currency code via the unum_setTextAttribute() function.
  • ICU4J (Java): the NumberFormat.setCurrency() method.

You can try these according to the platform you are using to implement localization in your application.
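
In JavaScript, a comparable result can be sketched with the built-in Intl.NumberFormat currency style. This is an analogue of the ICU methods listed above, not a binding to them, and the helper name formatCurrency is made up for this example.

```javascript
// Hypothetical helper: format an amount in the given locale and currency.
function formatCurrency(amount, locale, currency) {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}

console.log(formatCurrency(1234.5, 'en-US', 'USD')); // $1,234.50

// Symbol position, separators, and spacing depend on the locale.
console.log(formatCurrency(1234.5, 'es-ES', 'EUR'));
console.log(formatCurrency(1234.5, 'zh-CN', 'CNY'));
```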

Date/Time formatting

In ICU, there are four predefined date formats: short, medium, long, and full. If you want to display content with a date, such as I entered university on 19/02/2017, you need to pay attention to the date format that you are going to use. This is quick and easy in the ICU message format. Let’s see how to do that now.

messages_en.yaml

enter_university: I entered university on {uni_date, date, short}

messages_zh.yaml

enter_university: 我从上大学 {uni_date, date, short}

messages_es.yaml

enter_university: Entré a la universidad el {uni_date, date, short}

messages_ar.yaml

enter_university: دخلت الجامعة {uni_date, date, short}

Then we need to update the formatter.js file as follows:

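// Months are zero-based: this is 1 January 2019 in local time ('2019-01-01' would be parsed as UTC).
format('enter_university', { uni_date: new Date(2019, 0, 1) });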

Cool – so that is how we can easily format dates according to the application and its usage.
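
To compare the four predefined styles side by side, you can use the dateStyle option of JavaScript's built-in Intl.DateTimeFormat, which mirrors ICU's short, medium, long, and full. In this sketch, the helper name formatDate is hypothetical, and the time zone is pinned to UTC so that the result does not depend on where the code runs.

```javascript
// 1 January 2019, midnight UTC.
const uniDate = new Date(Date.UTC(2019, 0, 1));

// Hypothetical helper: format a date with one of the four predefined styles.
function formatDate(date, locale, dateStyle) {
  return new Intl.DateTimeFormat(locale, { dateStyle, timeZone: 'UTC' }).format(date);
}

for (const style of ['short', 'medium', 'long', 'full']) {
  console.log(style, formatDate(uniDate, 'en-US', style));
}
// The short style prints 1/1/19 for en-US.
```

Swap en-US for es, ar, or zh to see how the order of day, month, and year changes between locales.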

Summary

So, in this article, we discussed how we can use ICU message formatting to localize our software applications. ICU message formatting makes the localization process straightforward, quick, and efficient by providing basic text translation support, interpolation, pluralization, conditional switches, and date and number formatting. Since this message format is used in many popular frameworks, it is worth gaining hands-on experience and a clear knowledge of its syntax and implementation.
