Abstract
This article describes the problems encountered in designing and implementing automatic speech recognition systems for Hungarian, proposes practical solutions to them, and evaluates these solutions on publicly available databases. It introduces a rule-based system for modeling phonological rules both within words and at word boundaries, and the notion of stochastic morphological analysis for addressing the vocabulary size problem. Finally, it describes the implementation of the proposed methods in the FlexiVoice speech engine and summarizes the results of the experimental evaluation on isolated and connected digit recognition, on a 2000-word Hungarian city name recognition task, and on inflected word recognition tasks.
Cite this article
Szarvas, M., Fegyó, T., Mihajlik, P. et al. Automatic Recognition of Hungarian: Theory And Practice. International Journal of Speech Technology 3, 237–251 (2000). https://doi.org/10.1023/A:1026515132762
DOI: https://doi.org/10.1023/A:1026515132762