Abstract
This paper describes finite state transducers used in Russian speech synthesis to expand numbers, acronyms and graphic abbreviations into full-word numerals and phrases. The transducers cover cardinal and ordinal numbers and also convert phone numbers, dates, codes, etc. To the author's knowledge, the project is the first open-source text normalization system for Russian.
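The abstract only names the expansion task, so the following minimal sketch illustrates what turning a cardinal number into Russian words involves. It is a plain Python function, not the finite state transducers the paper describes. It covers only 0 to 999 999 in the nominative case, and the names `cardinal_ru`, `_triple` and `_thousand_form` are hypothetical. It does show one morphological detail that any Russian number expander has to handle: the thousands group takes feminine forms (одна, две), and the choice between тысяча, тысячи and тысяч depends on the last two digits.

```python
# Minimal sketch (hypothetical helper names): spell out a non-negative integer
# below one million as a Russian cardinal numeral in the nominative case.
# This is an ordinary function, NOT a finite state transducer, and it ignores
# case and gender agreement with a following noun.

UNITS_MASC = ["", "один", "два", "три", "четыре", "пять", "шесть", "семь", "восемь", "девять"]
UNITS_FEM = ["", "одна", "две", "три", "четыре", "пять", "шесть", "семь", "восемь", "девять"]
TEENS = ["десять", "одиннадцать", "двенадцать", "тринадцать", "четырнадцать",
         "пятнадцать", "шестнадцать", "семнадцать", "восемнадцать", "девятнадцать"]
TENS = ["", "", "двадцать", "тридцать", "сорок", "пятьдесят",
        "шестьдесят", "семьдесят", "восемьдесят", "девяносто"]
HUNDREDS = ["", "сто", "двести", "триста", "четыреста", "пятьсот",
            "шестьсот", "семьсот", "восемьсот", "девятьсот"]


def _triple(n: int, feminine: bool) -> list[str]:
    """Spell out 1..999; `feminine` selects одна/две (used for thousands)."""
    words = []
    h, rest = divmod(n, 100)
    if h:
        words.append(HUNDREDS[h])
    if 10 <= rest <= 19:
        words.append(TEENS[rest - 10])  # 10..19 are single words
    else:
        t, u = divmod(rest, 10)
        if t:
            words.append(TENS[t])
        if u:
            words.append((UNITS_FEM if feminine else UNITS_MASC)[u])
    return words


def _thousand_form(n: int) -> str:
    """Pick тысяча / тысячи / тысяч from the last two digits of n."""
    last_two, last = n % 100, n % 10
    if 11 <= last_two <= 14:
        return "тысяч"
    if last == 1:
        return "тысяча"
    if 2 <= last <= 4:
        return "тысячи"
    return "тысяч"


def cardinal_ru(n: int) -> str:
    """Expand 0..999999 into Russian words (nominative case only)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only covers 0..999999")
    if n == 0:
        return "ноль"
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        words += _triple(thousands, feminine=True)
        words.append(_thousand_form(thousands))
    if rest:
        words += _triple(rest, feminine=False)
    return " ".join(words)


if __name__ == "__main__":
    print(cardinal_ru(2015))  # prints "две тысячи пятнадцать"
```

Ordinals, oblique cases and agreement with the counted noun are left out here. These are the parts where the morphological coverage the keywords point to becomes necessary.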
Keywords
- Preprocessing
- Text-to-speech
- Morphology
- Numeral
- Abbreviation
- Acronym
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Lukanin, A. (2015). Normalization of Non-standard Words with Finite State Transducers for Russian Speech Synthesis. In: Khachay, M., Konstantinova, N., Panchenko, A., Ignatov, D., Labunets, V. (eds) Analysis of Images, Social Networks and Texts. AIST 2015. Communications in Computer and Information Science, vol 542. Springer, Cham. https://doi.org/10.1007/978-3-319-26123-2_4
DOI: https://doi.org/10.1007/978-3-319-26123-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26122-5
Online ISBN: 978-3-319-26123-2
eBook Packages: Computer Science, Computer Science (R0)