Predictive Text Entry for Agglutinative Languages Using Unsupervised Morphological Segmentation
Systems for predictive text entry on ambiguous keyboards typically rely on dictionaries with word frequencies, which are used to suggest the most likely words matching the user's input. This approach is insufficient for agglutinative languages, where morphological phenomena increase the rate of out-of-vocabulary words. We propose a text entry method that circumvents the out-of-vocabulary problem by replacing the dictionary with a Markov chain on morph sequences, combined with a third-order hidden Markov model (HMM) that maps key sequences to letter sequences and with phonological constraints for pruning suggestion lists. We evaluate our method by constructing text entry systems for Finnish and Turkish and comparing them with published text entry systems and with the text entry systems of three commercially available mobile phones. Measured by the keystrokes per character (KPC) ratio, our systems achieve superior results. For training, we use corpora segmented using unsupervised morphological segmentation.
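To make the key-to-letter mapping concrete, the following is a minimal sketch of decoding an ambiguous key sequence into its most probable letter sequence with Viterbi search. It is not the method of the paper: it assumes a first-order letter bigram model instead of the third-order HMM, and it omits the morph-level Markov chain and the phonological constraints. The names `KEYPAD`, `train_bigrams`, and `decode` are hypothetical, the keypad covers only the letters a to z (no ä or ö), and the training words are toy data.

```python
import math
from collections import defaultdict

# Standard 12-key phone letter assignment (a-z only; an assumption of this sketch).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
ALPHABET = "".join(KEYPAD.values())


def train_bigrams(words, alpha=0.1):
    """Return a function logp(prev, ch) with add-alpha smoothed letter bigrams.

    The symbol '^' marks the start of a word.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for word in words:
        prev = "^"
        for ch in word:
            counts[prev][ch] += 1.0
            prev = ch

    def logp(prev, ch):
        total = sum(counts[prev].values()) + alpha * len(ALPHABET)
        return math.log((counts[prev][ch] + alpha) / total)

    return logp


def decode(keys, logp):
    """Viterbi search: most probable letter string for a key sequence."""
    # best maps the last letter to (log probability, best string ending in it).
    best = {"^": (0.0, "")}
    for key in keys:
        nxt = {}
        for ch in KEYPAD[key]:
            nxt[ch] = max(
                ((score + logp(prev, ch), text + ch)
                 for prev, (score, text) in best.items()),
                key=lambda cand: cand[0],
            )
        best = nxt
    return max(best.values(), key=lambda cand: cand[0])[1]
```

After training on a handful of word forms, `decode("8256", logp)` returns the four-letter string that the toy model scores highest. A higher-order model, as described in the abstract, would condition each letter on more of the preceding context at the cost of a larger search state.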
Keywords: Mobile Phone, Hidden Markov Model, Word Form, Training Corpus, Letter Sequence