Week-end project: public language and territory database

Posted by Edouard on December 13, 2009

Web Translate It has a rather large database of languages, territories and scripts that is used internally.

It is used to display language lists, and the importers rely on it to figure out plural forms. I thought some of this data could be interesting and valuable to other people.

This week-end I decided to expose this data to everyone. Here you go:

It is pretty cool, I have been browsing through the links for an hour now :)

You could probably find this information on Wikipedia, but the difference here is that it is structured and to the point. For example, we know the relationships between a language, a territory and a script.

French for example is spoken in Belgium, Canada, Switzerland, France, Luxembourg, Monaco and Senegal.

Or, Mongolian is spoken in and is written using the Cyrillic and Mongolian scripts.
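To give an idea of what "structured" means here, this is a minimal Python sketch of how such relationships could be modelled. The `Language` class and the helper functions are hypothetical names made up for illustration, not the actual Web Translate It schema, and the data only contains the two examples quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class Language:
    # Hypothetical record: a language linked to its territories and scripts.
    name: str
    territories: list[str] = field(default_factory=list)
    scripts: list[str] = field(default_factory=list)


# Only the facts stated in this post; fields not mentioned are left empty.
LANGUAGES = {
    "French": Language(
        "French",
        territories=["Belgium", "Canada", "Switzerland", "France",
                     "Luxembourg", "Monaco", "Senegal"],
    ),
    "Mongolian": Language("Mongolian", scripts=["Cyrillic", "Mongolian"]),
}


def languages_spoken_in(territory: str) -> list[str]:
    """Return the names of the languages linked to the given territory."""
    return sorted(l.name for l in LANGUAGES.values() if territory in l.territories)


def languages_written_in(script: str) -> list[str]:
    """Return the names of the languages linked to the given script."""
    return sorted(l.name for l in LANGUAGES.values() if script in l.scripts)
```

Because the links go both ways, the same data answers "where is French spoken?" and "which languages are spoken in Canada?" without any extra bookkeeping.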

The other interesting data is the plural form rules and their code. When you translate a plural string from one language to another, Web Translate It automatically creates the plural rule for the target language, using the right plural forms.

For example, here is the plural rule for Russian, and here another one for Polish.
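To illustrate what a plural rule looks like in code, here is a small Python sketch of the Russian and Polish rules as they are commonly published for gettext. This is an assumption about the form of the rule, not necessarily the exact code stored in the Web Translate It database, and the function names are made up. Each function returns the index of the plural form to use for a count `n` (both languages have three forms).

```python
def plural_index_russian(n: int) -> int:
    """Index (0, 1 or 2) of the Russian plural form for n, gettext-style."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ... but not 11
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1  # 2-4, 22-24, ... but not 12-14
    return 2      # everything else: 0, 5-20, 25-30, ...


def plural_index_polish(n: int) -> int:
    """Index (0, 1 or 2) of the Polish plural form for n, gettext-style."""
    if n == 1:
        return 0  # only exactly 1
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1  # 2-4, 22-24, ... but not 12-14
    return 2      # everything else, including 21
```

The two rules look almost identical, but they differ on numbers such as 21: Russian uses the first form, Polish the third. That is the kind of detail an importer has to get right.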

Making this data available to the public could also let people report any mistakes they find in it.

I don’t have plans to make this data editable by users, like a wiki. This data is critical to some Web Translate It features, so I’d rather be the sole maintainer.