Improvement on Web Translate It: better YAML importer

By Edouard on January 14, 2010

Today I launched a new parser for the YAML file importer on Web Translate It. The previous parser was buggy and inefficient.

The default Ruby on Rails i18n backend, called “Simple”, uses YAML files to store translations.

A YAML file looks like this:

fr:
  some_key:
    key: value
    # a comment
    hello: bonjour

The YAML parser used until today was home-made. It went through the YAML file line by line, extracting keys, values and comments.
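To give an idea of what a line-by-line approach involves, here is a minimal sketch in Ruby. It is not the original Web Translate It parser: the `naive_scan` method, its regular expression, and the choice to attach a comment to the key that follows it are all assumptions made for illustration.

```ruby
# Hypothetical line-by-line scanner, for illustration only.
# It ignores nesting, quoting and multi-line values, which is exactly
# the kind of edge case a scanner like this struggles with.
def naive_scan(text)
  entries = []
  pending_comment = nil

  text.each_line do |line|
    stripped = line.strip
    next if stripped.empty?

    if stripped.start_with?("#")
      # Remember the comment so it can be attached to the next key (assumption).
      pending_comment = stripped.sub(/\A#\s*/, "")
    elsif (match = stripped.match(/\A([^:]+):\s*(.*)\z/))
      entries << { key: match[1], value: match[2], comment: pending_comment }
      pending_comment = nil
    end
  end

  entries
end

sample = <<~YAML
  fr:
    some_key:
      key: value
      # a comment
      hello: bonjour
YAML

entries = naive_scan(sample)
last = entries.last
puts "#{last[:key]} = #{last[:value]} (comment: #{last[:comment]})"
```

A scanner like this keeps comments, but it has to re-implement every YAML rule by hand.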

Really good YAML parsers exist for pretty much every language, but they don’t allow me to extract comments and display them on the translation interface, which was a deal-breaker when I implemented this feature.

My parser worked fine with really simple YAML files. In practice, however, many customers had problems importing their YAML files because my parser did not support some edge cases.

As Web Translate It grows, more customers need to import YAML files. As a consequence, a few more import jobs failed for a customer last night and today. I am really sorry about that.

Being able to actually import YAML files is obviously more important than being able to extract comments. On top of that, my parser had become incredibly complex, and I felt I would not be able to fix the issue in a simple way or to maintain such a hullabaloo.

So I switched to the vanilla Ruby YAML parser, which not only parses files really well, but also parses them really fast. The only downside is that it is no longer possible to extract and import comments from the YAML file.
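As a rough illustration of that trade-off, here is a minimal sketch using Ruby's standard yaml library on the sample file shown earlier. The variable names are only for illustration, and this is not the actual importer code.

```ruby
require "yaml"

source = <<~YAML
  fr:
    some_key:
      key: value
      # a comment
      hello: bonjour
YAML

data = YAML.load(source)

# Keys and values come back as nested hashes...
puts data["fr"]["some_key"]["hello"]

# ...but the comment never reaches the resulting data structure.
puts data.inspect.include?("a comment")
```

The second line printed is false: the parser skips comments while reading the file, so there is nothing left to import.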

A note of caution with YAML

Be cautious when writing copy in your YAML file, because everything you write in the file is interpreted by the parser (by your application as well as by Web Translate It). Words like true, false, yes and no are interpreted as the booleans true or false by YAML parsers such as Ruby's.

Consider this YAML file:

en:
  date:
    true: foo
    false: bar
    yes: baz
    no: boo

It will be interpreted as:

en:
  date:
    true: foo
    false: bar

Because yes == true and no == false in YAML, the parser considers the pairs yes: baz and no: boo to be duplicates of true: foo and false: bar respectively. If you must use yes and no as keys, wrap them in quotes, like so:

en:
  date:
    true: foo
    false: bar
    "yes": baz
    "no": boo
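The effect on keys can be checked with a short sketch using Ruby's standard yaml library. The variable names are illustrative. Which of the duplicate values is kept may depend on the parser and its version, so the sketch only looks at the keys.

```ruby
require "yaml"

unquoted = YAML.load(<<~YAML)
  en:
    date:
      true: foo
      false: bar
      yes: baz
      no: boo
YAML

quoted = YAML.load(<<~YAML)
  en:
    date:
      true: foo
      false: bar
      "yes": baz
      "no": boo
YAML

# Unquoted: yes and no collapse into the boolean keys true and false.
p unquoted["en"]["date"].keys

# Quoted: "yes" and "no" stay strings, so all four keys survive.
p quoted["en"]["date"].keys
```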

The same goes for values. This:

en:
  date:
    a_key: yes
    b_key: no

will be interpreted as:

en:
  date:
    a_key: true
    b_key: false

So wrap them in quotes:

en:
  date:
    a_key: "yes"
    b_key: "no"
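The same check works for values. This is a minimal sketch with Ruby's standard yaml library, and the variable names are illustrative.

```ruby
require "yaml"

unquoted = YAML.load(<<~YAML)
  a_key: yes
  b_key: no
YAML

quoted = YAML.load(<<~YAML)
  a_key: "yes"
  b_key: "no"
YAML

# Unquoted values are turned into booleans.
p unquoted["a_key"], unquoted["b_key"]

# Quoted values stay the strings you wrote.
p quoted["a_key"], quoted["b_key"]
```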

Thank you for your patience with this issue, and thank you for using Web Translate It.