More improvements to the word and character counter

By Edouard on November 2, 2022

We’ve released another update to our word and character counter. It previously relied on a Ruby implementation of ICU, which had a bug: counting words or characters failed for strings containing both Japanese katakana and Latin characters.

We’ve replaced that library with one that uses a Ruby binding to ICU4C, the C/C++ implementation of ICU. The counter should now be more reliable, and it is 60% faster than the previous implementation (which was itself 16 times faster than the original implementation based on system calls).
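To illustrate the kind of mixed-script input involved, here is a minimal sketch of a character count check on strings that combine katakana and Latin text. It is not our production code: the `character_count` helper is hypothetical, and it uses Ruby's built-in grapheme cluster segmentation rather than the ICU binding, so it only covers character counting, not word counting.

```ruby
# Hypothetical helper for illustration only; not the counter's actual code.
# Counts user-perceived characters (extended grapheme clusters) using
# Ruby's built-in Unicode support instead of ICU.
def character_count(text)
  text.each_grapheme_cluster.count
end

# Mixed katakana and Latin input: 4 katakana + 1 space + 3 Latin letters.
puts character_count("カタカナ ABC") # => 8

# A katakana letter followed by a combining voiced sound mark (U+3099)
# is a single grapheme cluster, so it counts as one character.
puts character_count("カ\u3099") # => 1
```

Counting grapheme clusters rather than code points keeps the result stable whether the input uses precomposed or decomposed forms. Word counting for Japanese text is harder, since words are not separated by spaces, which is one reason to rely on a library such as ICU for it.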

We’re currently using ICU 66, but plan to upgrade to ICU 70 shortly.