1.2 The fun part: extracting strings
In any case, internationalizing your application is a manual process: you extract all the translatable text out of your source files and place it in a linguistic file, which collects all the translatable text of your application.
Then you set up a mechanism to retrieve the translated text, which lets the application display the text from the linguistic file that matches the requested language.
Will that slow down my application? Usually not. Templates are usually cached in memory. In the rare case they are not, the content of the linguistic file is likely to be loaded in memory, and the cost of looking up a translation is negligible on today’s hardware.
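At its simplest, the retrieval mechanism is a lookup in a table keyed by language and by string identifier, which is why its cost is negligible. The sketch below illustrates the idea with a plain Ruby hash standing in for the linguistic file; the `TRANSLATIONS` constant, the `t` helper and the fallback to English are assumptions for illustration, not the API of any particular framework.

```ruby
# Hypothetical in-memory stand-in for a linguistic file:
# one hash per language, mapping a translation key to its text.
TRANSLATIONS = {
  en: { "greeting" => "Hello, good to see you." },
  fr: { "greeting" => "Bonjour, ravi de vous voir." }
}.freeze

# Return the text for `key` in the requested language.
# Falls back to English when the language or the key is missing
# (raises KeyError if English lacks the key too).
def t(key, locale: :en)
  TRANSLATIONS.fetch(locale, {}).fetch(key) { TRANSLATIONS[:en].fetch(key) }
end

puts t("greeting", locale: :fr) # prints "Bonjour, ravi de vous voir."
puts t("greeting", locale: :de) # no German entry: prints the English text
```

Real frameworks add features such as nested keys and interpolation, but each lookup remains essentially a hash access like this one.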
1.3 Best practices for writing code and copy for i18n
Translators are not programmers, and writing good copy for i18n is not always easy for developers. Here are a few rules for writing copy that is easier to internationalize.
1.3.1 Extract simple, complete sentences
Try to extract complete sentences that will make sense to a translator.
Original text: “Hello, good to see you.”
GOOD: Hello, good to see you
# Perfect, the sentence makes sense
BAD: Hello,
good to see you
# Two separate strings: seen on its own, either one gives the translator little context.
1.3.2 Include variables if your internationalization framework supports interpolations
Most internationalization frameworks let you interpolate variables. Choose your variable names appropriately, so translators can understand the segment in context.
Original text: Hello #{user.name}, good to see you.
GOOD: Hello %{user_name}, good to see you
# Perfect, the sentence makes sense
BAD: Hello
good to see you
# Context can be difficult to understand.
# You don’t necessarily greet people “Hello username” in this order in other languages.
SO-SO: Hello %{var}, good to see you
# Context is a bit difficult to understand.
# Give the variable a better name.
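Interpolation lets each translation place the variable wherever its grammar requires. Ruby’s built-in `format` understands the same `%{name}` placeholder syntax, so it is enough for a sketch; an i18n framework performs this substitution for you. The `GREETINGS` hash and `greet` helper are hypothetical, and the French and Japanese strings are illustrative translations (the Japanese one is a shortened greeting that puts the name first).

```ruby
# Illustrative translations of the same segment. Each language
# puts %{user_name} where its own word order requires.
GREETINGS = {
  en: "Hello %{user_name}, good to see you",
  fr: "Bonjour %{user_name}, ravi de vous voir",
  ja: "%{user_name}さん、こんにちは"
}.freeze

# Kernel#format replaces each %{name} with the matching hash value.
def greet(locale, user_name)
  format(GREETINGS.fetch(locale), user_name: user_name)
end

puts greet(:en, "Ana") # prints "Hello Ana, good to see you"
puts greet(:ja, "Ana") # prints "Anaさん、こんにちは"
```

Concatenating "Hello " with the name would fix the word order in code; with a named placeholder, the order belongs to the translator.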
1.3.3 Don’t include untranslatable content
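If something is not translatable, don’t include it in a linguistic file. For example, this template expression only displays a date value; there is nothing in it to translate: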
{$starttime|displaydatenotz:'hour'}
1.3.4 Avoid including code in text
Original text: <a href="urlblah" title="Hello!" onClick="$('thingy').toggle();" style="color: pink; font-size: 12px">Hello there!</a>
GOOD: Hello!
Hello there!
# Perfect, the sentence makes sense, no code.
BAD: <a href="urlblah" title="Hello!" onClick="$('thingy').toggle();" style="color: pink; font-size: 12px">Hello there!</a>
# Too much code! Translators will make mistakes in the code,
# which will break the feature.
Sometimes you have to, though:
Original text: Hello <strong>#{user.name}</strong>!
GOOD: Hello <strong>%{user_name}</strong>!
# Perfect, the sentence makes sense.
# There is some code, but this is acceptable.
BAD: Hello
# Context can be difficult to understand.
# You don’t necessarily greet people “Hello username” in this order
# in other languages.
BAD: Hello <strong>%{user_name}
# As a rule of thumb, if you include some code in the translatable text,
# include it all the way.
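When markup has to stay in the translatable text, the interpolated value is the part that needs care: it typically comes from the user, so it should be HTML-escaped before insertion, while the markup from the linguistic file is kept as is. A minimal sketch using the standard library’s `ERB::Util.html_escape`; the `WELCOME` constant and `welcome_html` helper are hypothetical.

```ruby
require "erb"

# The whole tag pair stays inside the translatable string.
WELCOME = "Hello <strong>%{user_name}</strong>!"

# Escape the user-supplied value, not the markup from the linguistic file.
def welcome_html(user_name)
  format(WELCOME, user_name: ERB::Util.html_escape(user_name))
end

puts welcome_html("Ana")        # prints "Hello <strong>Ana</strong>!"
puts welcome_html("<b>Eve</b>") # prints "Hello <strong>&lt;b&gt;Eve&lt;/b&gt;</strong>!"
```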
1.3.5 Do repeat yourself
Remember, translators are not programmers! Translatable text is not the place to write clever code. In my experience, it’s better to repeat yourself than to write a clever sentence that translators will have trouble translating.
Example:
Hi {if $to}{$to->getLinkText()}
{else}there{/if},
{if $user}{$user->name} {else}{$fromname} ({$fromemail})
was on our awesome website and {/if}{if $attending}thinks you should come along to
{if !$eventname}see{/if}{else}thinks you might be interested in
{if !$eventname}seeing{else}going to{/if}{/if}
{if $eventname}{$eventname}{else}{$headliner->getLinkText()}{/if} at
{$venue->getLinkText()} on {$startdate|displaydate:'shortdmy'}.
This kind of code needs a complete rewrite. It’s best to write conditionals, each containing one full sentence.
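Here is a sketch of such a rewrite, simplified to two of the conditions above (whether the recipient has a name, and whether the sender is attending). The conditionals only choose a key, and each key holds one complete sentence with named variables. The keys, the `TEMPLATES` hash and the `invitation` helper are hypothetical, and the other branches of the original template are left out.

```ruby
# One complete, translatable sentence per case (hypothetical keys).
TEMPLATES = {
  "invite.attending.named" =>
    "Hi %{to}, %{from} thinks you should come along to %{event} at %{venue} on %{date}.",
  "invite.attending.anonymous" =>
    "Hi there, %{from} thinks you should come along to %{event} at %{venue} on %{date}.",
  "invite.interested.named" =>
    "Hi %{to}, %{from} thinks you might be interested in %{event} at %{venue} on %{date}.",
  "invite.interested.anonymous" =>
    "Hi there, %{from} thinks you might be interested in %{event} at %{venue} on %{date}."
}.freeze

# The conditionals pick a key; they never assemble sentence fragments.
def invitation(attending:, to: nil, **vars)
  mode = attending ? "attending" : "interested"
  who  = to ? "named" : "anonymous"
  format(TEMPLATES.fetch("invite.#{mode}.#{who}"), to: to, **vars)
end

puts invitation(attending: true, to: "Ana", from: "Bob",
                event: "the concert", venue: "the Town Hall", date: "12/05/2024")
# prints "Hi Ana, Bob thinks you should come along to the concert at the Town Hall on 12/05/2024."
```

There is more text to translate, but every segment is a sentence a translator can read and reorder freely.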
1.3.6 Use pluralization (if your framework supports it)
Pluralization changes the form of a word depending on the number it refers to.
Let’s consider this example: Less than #{@count} minutes.
It displays:
Less than 0 minutes
if @count equals 0, which is correct.
Less than 3 minutes
if @count equals 3, which is correct too.
Less than 1 minutes
if @count equals 1, which is incorrect English.
Developers usually get around this by writing a conditional around the sentence, which creates two strings for the translators:
if @count equals 1
  display “Less than 1 minute”
else
  display “Less than %{count} minutes”
The problem is that pluralization differs from one language to another. In French, for instance, the forms are:
Moins de 0 minute
if @count == 0
Moins de 1 minute
if @count == 1
Moins de 3 minutes
if @count == 3
So the conditional works in English but not in French, where 0 also takes the singular.
Even worse: languages such as Polish or Arabic have 4 and 6 plural forms respectively, and languages such as Chinese or Japanese don’t inflect nouns for number at all!
Thankfully, Unicode publishes and maintains the CLDR (Common Locale Data Repository), a database listing the plural rules of most languages. Good internationalization frameworks also take care of this complexity for you, so use them.
In your application, you will extract the text like so:
t("less_than_x_minutes", :count => @count)
And your English linguistic file will look like this:
less_than_x_minutes:
  one: "less than %{count} minute"
  other: "less than %{count} minutes"
Which translates to French as:
less_than_x_minutes:
  one: "moins de %{count} minute"
  other: "moins de %{count} minutes"
In Russian:
less_than_x_minutes:
  one: "меньше %{count} минуты"
  few: "меньше %{count} минут"
  many: "меньше %{count} минут"
  other: "меньше %{count} минуты"
In Chinese:
less_than_x_minutes:
  other: "不到 %{count} 分钟"
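To see what the framework does with such keys, here is a simplified sketch: each language has a rule mapping a count to a CLDR plural category, and the category selects the entry. The rules are a hand-written subset of the CLDR rules, restricted to non-negative integers (fractions, and the French category for very large round numbers, are left out); `PLURAL_RULES`, `FORMS_RU` and `pluralize` are hypothetical names, not a framework’s API.

```ruby
# Simplified, integer-only subset of the CLDR plural rules.
PLURAL_RULES = {
  en: ->(n) { n == 1 ? :one : :other },
  fr: ->(n) { n <= 1 ? :one : :other }, # 0 and 1 both take the singular
  ru: lambda { |n|
    if n % 10 == 1 && n % 100 != 11 then :one
    elsif (2..4).cover?(n % 10) && !(12..14).cover?(n % 100) then :few
    else :many # for integers, every remaining case is "many"
    end
  },
  zh: ->(_n) { :other } # a single form
}.freeze

# Russian forms for "less than N minutes".
FORMS_RU = {
  one: "меньше %{count} минуты",
  few: "меньше %{count} минут",
  many: "меньше %{count} минут",
  other: "меньше %{count} минуты"
}.freeze

# Pick the category for this language, then interpolate the count.
def pluralize(locale, forms, count)
  category = PLURAL_RULES.fetch(locale).call(count)
  format(forms.fetch(category), count: count)
end

[1, 3, 5, 11, 21].each { |n| puts pluralize(:ru, FORMS_RU, n) }
```

Note how 21 takes the same form as 1 while 11 does not: this is exactly the kind of rule a single `if @count equals 1` cannot express.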
1.4 Internationalizing database content
Internationalizing database content is tricky. How you handle its translation depends on the nature of the content itself.
Is your database content static, for instance a list of countries or a list of languages? Then I recommend identifying each entry with a unique identifier (such as a country code or a language code) and translating this content in a linguistic file.
In my web application, for instance, I have a database containing all the languages in the world. Each language is identified by an ISO language code, which I use to get the name of the language from a linguistic file.
Linguistic file:
languages:
  en: English
  fr: French
  es: Spanish
  da: Danish
The advantages are multiple. First, this content is standardized: it comes, once again, from Unicode’s CLDR, which means it’s likely to be correct and won’t offend anyone. Second, this data is already translated in the CLDR repository, so I don’t have to translate it myself (this repository contains translated names of languages, scripts and territories in a YAML format). Lastly, we’re dealing with the real world, and this content is subject to change: new countries appear, and others disappear or get renamed. When that happens, it’s just a matter of updating the linguistic file from the source.
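A minimal sketch of the lookup: the database row stores only the language code, and the display name comes from the linguistic file for the visitor’s language. The `LANGUAGE_NAMES` hash stands in for per-language linguistic files, and `language_name` is a hypothetical helper.

```ruby
# Hypothetical per-language tables of language names keyed by
# ISO 639-1 code, mirroring the "languages" section of a linguistic file.
LANGUAGE_NAMES = {
  en: { "en" => "English", "fr" => "French", "es" => "Spanish", "da" => "Danish" },
  fr: { "en" => "anglais", "fr" => "français", "es" => "espagnol", "da" => "danois" }
}.freeze

# The database only stores the code; the name is resolved at display time.
def language_name(code, locale)
  LANGUAGE_NAMES.fetch(locale).fetch(code)
end

puts language_name("da", :fr) # prints "danois"
puts language_name("es", :en) # prints "Spanish"
```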
If your database content is more dynamic, for instance a list of products in a shop, then store the translations in the database.
If you only need to translate one table, add one column per language to that table to hold the translations. If you need to translate several tables, create a dedicated table containing the translations for all of them.
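The dedicated-table option can be sketched as one generic translations table in which each row identifies the translated table, the record, the column and the language. The sketch models that table in memory with a Ruby `Struct`; in a real application it would be a database table indexed on those identifying columns. The names `Translation`, `TRANSLATIONS_TABLE` and `translated`, and the fallback to a default language, are assumptions for illustration.

```ruby
# One row per (table, record, column, language), as in a dedicated
# translations table shared by every translated table.
Translation = Struct.new(:table_name, :record_id, :column, :locale, :value)

TRANSLATIONS_TABLE = [
  Translation.new("products", 42, "name", "en", "Red umbrella"),
  Translation.new("products", 42, "name", "fr", "Parapluie rouge"),
  Translation.new("categories", 7, "title", "en", "Rain gear")
].freeze

# Find the translated value, falling back to the default language;
# returns nil when neither translation exists.
def translated(table_name, record_id, column, locale, default_locale: "en")
  lookup = lambda do |loc|
    TRANSLATIONS_TABLE.find do |row|
      row.table_name == table_name && row.record_id == record_id &&
        row.column == column && row.locale == loc
    end
  end
  (lookup.call(locale) || lookup.call(default_locale))&.value
end

puts translated("products", 42, "name", "fr")   # prints "Parapluie rouge"
puts translated("categories", 7, "title", "fr") # no French row: prints "Rain gear"
```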
1.5 Internationalizing JavaScript
If you develop a web application containing JavaScript, you’re probably wondering how to internationalize the text contained in the JavaScript code.
The good news is that there are several JavaScript libraries to help you with this task, for instance i18n-js, i18n-node or i18next. They all work by extracting the translatable text into a separate file containing a JSON object.
If you don’t have much text to translate, a leaner solution is to extract the translatable text into a file handled by your server-side i18n framework, and have your server dump these translations as a JSON object inside your HTML page, like so:
window.I18n = {"translations":{"key1":"text1.","key2":"text2."}}
You can then read your translations in JavaScript with window.I18n["translations"]["key1"].
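On the server, producing that line amounts to serializing the translations with a JSON library and wrapping them in a `<script>` tag. A minimal sketch in Ruby with the standard `json` library; `i18n_script_tag` is a hypothetical helper, and your web framework may already provide its own helpers for safely embedding JSON in HTML.

```ruby
require "json"

# Serialize the translations the JavaScript code needs into a <script>
# tag. "</" is rewritten as "<\/" so a translation containing "</script>"
# cannot close the tag early; both JSON and JavaScript read "\/" as "/".
def i18n_script_tag(translations)
  json = JSON.generate({ "translations" => translations }).gsub("</") { '<\/' }
  "<script>window.I18n = #{json};</script>"
end

puts i18n_script_tag({ "key1" => "text1.", "key2" => "text2." })
# prints <script>window.I18n = {"translations":{"key1":"text1.","key2":"text2."}};</script>
```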