Word and Character Count in WebTranslateIt

Word Count

Word count is the number of words in a document or passage of text. Word count is commonly used by translators to determine the price of a translation job. When counting words for a translation job, the word count is based on the source language.

It is therefore important to define what counts as a word and which words do not count toward the total.

With WebTranslateIt, HTML tags do not count toward the word count, since you can click to paste them. Translatable attributes in HTML tags (alt, summary, placeholder, standby, abbr, content, title and label attributes) are extracted and included in the word count.

We use an XML parser to find the attributes, so if a tag repeats the same attribute, as in <a href="https://webtranslateit.com" title="WebTranslateIt" title="A translation website">WebTranslateIt</a>, only the first occurrence of that attribute counts toward the word count.
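The attribute handling described above can be sketched as follows. This is only an illustration, not WebTranslateIt's implementation: it uses Python's standard `html.parser` as a stand-in for the XML parser, and the names `TranslatableTextExtractor` and `extract_translatable` are hypothetical.

```python
from html.parser import HTMLParser

# Attribute names the documentation lists as translatable.
TRANSLATABLE_ATTRS = {"alt", "summary", "placeholder", "standby",
                      "abbr", "content", "title", "label"}


class TranslatableTextExtractor(HTMLParser):
    """Collects text nodes and translatable attribute values; tags are dropped."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.attr_values = []

    def handle_starttag(self, tag, attrs):
        seen = set()
        for name, value in attrs:
            if name not in TRANSLATABLE_ATTRS or name in seen:
                continue  # not translatable, or a repeated attribute
            seen.add(name)  # only the first occurrence per tag is kept
            if value:
                self.attr_values.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def extract_translatable(html):
    """Return (visible text, list of translatable attribute values)."""
    parser = TranslatableTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.text_parts), parser.attr_values


text, attrs = extract_translatable(
    '<a href="https://webtranslateit.com" title="WebTranslateIt" '
    'title="A translation website">WebTranslateIt</a>'
)
print(text)   # WebTranslateIt
print(attrs)  # ['WebTranslateIt']
```

The visible text and the kept attribute values would then both be passed to the word counter, while the tag itself and the `href` value are ignored.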

Variable placeholders count as one word.
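As a rough illustration of this rule, the sketch below counts a placeholder as a single word. It assumes placeholders use the `%{name}` syntax and that words are separated by spaces or punctuation; the function `count_words` is hypothetical, and this approach does not work for languages such as Chinese, Japanese or Thai, which need dictionary-based segmentation.

```python
import re

# Assumption: placeholders look like %{name}. A token is either a whole
# placeholder or a run of word characters (with inner apostrophes allowed).
TOKEN = re.compile(r"%\{\w+\}|\w+(?:['’]\w+)*")


def count_words(text):
    """Count words, treating each %{...} placeholder as one word."""
    return len(TOKEN.findall(text))


print(count_words("There are %{count} posts"))  # 4
print(count_words("Hello, how are you?"))       # 4
```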

WebTranslateIt word count is language-aware, conforms to the latest Unicode Standard and has built-in, dictionary-based support for text in languages such as Chinese, Japanese or Thai. We’re currently using ICU v.70.1.

Examples

| Sentence | Language | Word count |
| --- | --- | --- |
| Hello, how are you? | English | 4 words |
| こんにちは元気ですか | Japanese | 4 words |
| Welcome to <a href="https://webtranslateit.com" title="Welcome back!">WebTranslateIt</a> | English | 5 words |
| There are %{count} posts | English | 4 words |

Character Count

In some language pairs, character count is used by translators to determine the price of a translation job. Counting characters by bytes was widely used in the past, but it gives incorrect results for some languages and for emojis, for instance. We think it is more correct to count characters by graphemes.

A grapheme is a sequence of one or more code points that is displayed as a single graphical unit and that a reader recognizes as a single element of the writing system. For instance, a and ä are each one grapheme, even though ä can be encoded either as a single code point or as two (a followed by a combining diaeresis).
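To make the difference concrete, the sketch below compares three ways of measuring the same string. It is an approximation under stated assumptions: bytes are counted in UTF-8, and `count_graphemes_approx` (a hypothetical helper) simply skips combining marks instead of applying the full Unicode segmentation rules that ICU implements, so it will miscount things like emoji joined by zero-width joiners.

```python
import unicodedata


def count_utf8_bytes(text):
    """Number of bytes when the text is encoded as UTF-8."""
    return len(text.encode("utf-8"))


def count_code_points(text):
    """Number of Unicode code points (what len() returns in Python)."""
    return len(text)


def count_graphemes_approx(text):
    """Approximate grapheme count: ignore code points that are combining
    marks (general categories Mn, Mc, Me), since they attach to a base."""
    return sum(1 for ch in text
               if not unicodedata.category(ch).startswith("M"))


samples = {
    "a": "a",
    "ä (one code point)": "\u00e4",
    "ä (a + combining diaeresis)": "a\u0308",
    "🤔": "\U0001F914",
    "की": "\u0915\u0940",
}
for label, s in samples.items():
    print(label, count_utf8_bytes(s), count_code_points(s),
          count_graphemes_approx(s))
```

Every sample is one grapheme, while the byte and code point counts vary with the script and with how the text happens to be encoded.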

With WebTranslateIt, HTML tags do not count toward the character count, but translatable attributes in HTML tags (alt, summary, placeholder, standby, abbr, content, title and label attributes) are extracted and included in the character count. Variable placeholders are included in the character count.

Strings containing several successive whitespace characters or similar (spaces, \n, \r, \t) are collapsed into one whitespace character for the character count.

For instance, this string:

Hello how


are you?

counts as 18 characters, because each run of whitespace is counted as a single character.
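A minimal sketch of this rule, assuming that "whitespace or similar" corresponds to what Python's `\s` pattern matches (the exact set of characters WebTranslateIt collapses is not specified here, and `collapse_whitespace` is a hypothetical helper):

```python
import re


def collapse_whitespace(text):
    """Replace every run of whitespace characters with a single space."""
    return re.sub(r"\s+", " ", text)


sample = "Hello how\n\n\nare you?"
collapsed = collapse_whitespace(sample)
print(collapsed)       # Hello how are you?
print(len(collapsed))  # 18
```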

Examples

| Character | Count using bytes (UTF-8) | Count using graphemes |
| --- | --- | --- |
| A | 1 byte | 1 grapheme |
| 🤔 | 4 bytes | 1 grapheme |
| की | 6 bytes | 1 grapheme |
