Multilingual rule-based approach to number expansion: Framework, extensions and application
- Cite this article as:
- Moberg, M. & Pärssinen, K. Int J Speech Technol (2007) 9: 29. doi:10.1007/s10772-006-9002-5
The language development of a multilingual text-to-speech system requires contributions from linguists and native speakers of each language. Text normalization, including number expansion, is one of the language-specific processing steps. Most available solutions do not support inflections and are not simple enough to be practical for non-technical developers. This paper presents a novel solution for expressing number expansion rules. The rule framework is fast, easy to use without a technical background, and truly multilingual, supporting gender-specific inflections of numerals. The rules require only a small amount of memory and are conveniently stored as software-independent language data. The same rule framework can be extended to carry out other text-normalization tasks, including processing of context-dependent abbreviations and interpretation of formatted text such as date and time expressions. The framework has been successfully used to create number, unit and time conversion rules for 42 languages. The created rules supported cardinal numbers from 0 to 999999 and 13 units such as m, km, h and min. Professional translators without a technical background generated the rules for most of the languages. The average numbers of rule lines for the number, unit and time rules were 87, 49 and 13, respectively. The average development time for a full rule set was seven hours per language. The most complex rule sets were for Slavonic languages, whereas the simplest ones were for Sino-Tibetan languages.
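The abstract does not show the rule notation itself. As a rough illustration of the general idea of keeping number and unit expansion rules as language data, separate from a generic engine, here is a minimal Python sketch. The `Rule` format, the English `LEXICON`, `RULES` and `UNITS` tables, and the functions `expand` and `expand_measure` are all hypothetical; they are not the rule syntax described in the paper, and the sketch does not model gender-specific inflection.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical rule line: applies when lo <= n <= hi."""
    lo: int
    hi: int
    divisor: int   # split n into quotient and remainder by this value
    template: str  # head text: {q} = expanded quotient, {qd} = word for quotient * divisor
    joiner: str    # inserted before the expanded remainder when it is non-zero


# Hypothetical English language data: words for irregular numbers plus rule lines.
LEXICON = {
    0: "zero", 1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
    11: "eleven", 12: "twelve", 13: "thirteen", 14: "fourteen",
    15: "fifteen", 16: "sixteen", 17: "seventeen", 18: "eighteen",
    19: "nineteen", 20: "twenty", 30: "thirty", 40: "forty",
    50: "fifty", 60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety",
}
RULES = [
    Rule(21, 99, 10, "{qd}", "-"),
    Rule(100, 999, 100, "{q} hundred", " "),
    Rule(1000, 999_999, 1000, "{q} thousand", " "),
]
# Hypothetical unit table: abbreviation -> (singular, plural).
UNITS = {
    "m": ("meter", "meters"),
    "km": ("kilometer", "kilometers"),
    "h": ("hour", "hours"),
    "min": ("minute", "minutes"),
}


def expand(n, lexicon=LEXICON, rules=RULES):
    """Expand a cardinal number to words using only the supplied language data."""
    if n in lexicon:
        return lexicon[n]
    for rule in rules:
        if rule.lo <= n <= rule.hi:
            q, r = divmod(n, rule.divisor)
            fields = {}
            # Expand only the placeholders the template uses; eagerly expanding
            # {qd} for n = 100 under the hundred rule would recurse forever.
            if "{q}" in rule.template:
                fields["q"] = expand(q, lexicon, rules)
            if "{qd}" in rule.template:
                fields["qd"] = expand(q * rule.divisor, lexicon, rules)
            head = rule.template.format(**fields)
            return head + (rule.joiner + expand(r, lexicon, rules) if r else "")
    raise ValueError(f"no rule covers {n}")


def expand_measure(text, units=UNITS):
    """Expand a number followed by a unit abbreviation, e.g. '5 km'."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", text.strip())
    if match is None or match.group(2) not in units:
        raise ValueError(f"cannot expand {text!r}")
    n = int(match.group(1))
    singular, plural = units[match.group(2)]
    return f"{expand(n)} {singular if n == 1 else plural}"


print(expand(21))               # twenty-one
print(expand(999_999))          # nine hundred ninety-nine thousand nine hundred ninety-nine
print(expand_measure("5 km"))   # five kilometers
print(expand_measure("1 h"))    # one hour
```

Under this assumed design, supporting another language means writing a new lexicon, rule list and unit table while the engine code stays unchanged, which loosely mirrors the abstract's point that the rules are stored as software-independent language data.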