Predictive Text Entry for Agglutinative Languages Using Unsupervised Morphological Segmentation

  • Miikka Silfverberg
  • Krister Lindén
  • Mirka Hyvärinen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7182)


Systems for predictive text entry on ambiguous keyboards typically rely on dictionaries with word frequencies, which are used to suggest the most likely words matching the user's input. This approach is insufficient for agglutinative languages, where morphological phenomena increase the rate of out-of-vocabulary words. We propose a text entry method that circumvents the out-of-vocabulary problem by replacing the dictionary with a Markov chain on morph sequences, combined with a third-order hidden Markov model (HMM) that maps key sequences to letter sequences and with phonological constraints for pruning suggestion lists. We evaluate the method by constructing text entry systems for Finnish and Turkish and comparing them with published text entry systems and with the text entry systems of three commercially available mobile phones. Measured by the keystrokes per character ratio (KPC) [8], our systems achieve superior results. For training, we use corpora segmented with unsupervised morphological segmentation.
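
To make the dictionary baseline and the out-of-vocabulary problem concrete, the following is a minimal sketch of frequency-ranked dictionary disambiguation on a 12-key keypad, together with a simplified keystroke count. It is not the method proposed in the paper. The keypad layout, the function names (`to_keys`, `build_index`, `keystrokes`, `kpc`), the toy word frequencies, and the counting rule (one key press per letter, one "next" press per higher-ranked candidate, one press for the trailing space) are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed letter layout of a standard 12-key phone keypad (a-z only).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def to_keys(word):
    """Map a lowercase a-z word to the digit sequence typed for it."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)


def build_index(freqs):
    """Group dictionary words by key sequence, most frequent first."""
    index = defaultdict(list)
    for word, _ in sorted(freqs.items(), key=lambda kv: -kv[1]):
        index[to_keys(word)].append(word)
    return index


def keystrokes(word, index):
    """Simplified keystroke count for one word (hypothetical counting rule).

    One press per letter, one 'next' press per higher-ranked candidate,
    one press for the trailing space. Returns None if the word is not in
    the dictionary, i.e. the baseline cannot suggest it at all.
    """
    candidates = index.get(to_keys(word), [])
    if word not in candidates:
        return None
    return len(word) + candidates.index(word) + 1


def kpc(words, index):
    """Keystrokes per character over in-vocabulary words (spaces counted)."""
    counts = [keystrokes(w, index) for w in words]
    if None in counts:
        raise ValueError("undefined in this sketch for out-of-vocabulary words")
    return sum(counts) / sum(len(w) + 1 for w in words)


if __name__ == "__main__":
    # Toy frequencies, invented for this example.
    index = build_index({"good": 40, "home": 30, "talo": 50})
    print(keystrokes("good", index), keystrokes("home", index))
    print(keystrokes("talossani", index))  # inflected form missing from the dictionary
    print(kpc(["good", "home"], index))
```

In this toy setup "good" and "home" share the key sequence 4663, so the less frequent word costs one extra press, while the inflected Finnish form "talossani" is not listed and cannot be suggested. A model over morph sequences, as described in the abstract, is meant to avoid this kind of failure.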


Keywords: Mobile Phone; Hidden Markov Model; Word Form; Training Corpus; Letter Sequence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



  1. Brants, T.: TnT - a statistical part-of-speech tagger. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 224–231. ACL, Seattle (2000)
  2. Creutz, M., Hirsimäki, T., Kurimo, M., Puurula, A., Pylkkönen, J., Siivola, V., Varjokallio, M., Arisoy, E., Saraçlar, M., Stolcke, A.: Morph-based speech recognition and modeling of out-of-vocabulary words across languages. ACM Transactions on Speech and Language Processing 5(1) (2009)
  3. Creutz, M., Lagus, K.: Unsupervised models for morpheme segmentation and morphology learning. ACM Transactions on Speech and Language Processing 4(1) (2007)
  4. Grover, D., King, M., Kushler, C.A.: Reduced keyboard disambiguating computer. Patent US 5818437 (1998)
  5. Klarlund, N.: Word n-grams for cluster keyboards. In: TextEntry 2003: Proceedings of the 2003 EACL Workshop on Language Modeling for Text Entry Methods, pp. 51–58. ACL, Stroudsburg (2003)
  6. Koskenniemi, K.: Two-level Morphology: A General Computational Model for Word-Form Recognition and Production. Ph.D. thesis, University of Helsinki (1983)
  7. Lindén, K., Axelson, E., Hardwick, S., Silfverberg, M., Pirinen, T.: HFST – Framework for Compiling and Applying Morphologies. In: Mahlow, C., Piotrowski, M. (eds.) SFCM 2011. CCIS, vol. 100, pp. 67–85. Springer, Heidelberg (2011)
  8. MacKenzie, I.S.: KSPC (Keystrokes per Character) as a Characteristic of Text Entry Techniques. In: Paternò, F. (ed.) Mobile HCI 2002. LNCS, vol. 2411, pp. 195–210. Springer, Heidelberg (2002)
  9. MacKenzie, I.S., Kober, H., Smith, D., Jones, T., Skepner, E.: LetterWise: Prefix-based disambiguation for mobile text input. In: Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, pp. 111–120. ACM, Orlando (2001)
  10. MacKenzie, I.S., Soukoreff, R.W.: Text entry for mobile computing: Models and methods, theory and practice. Human-Computer Interaction 17(2), 147–198 (2002)
  11. Silfverberg, M., Hyvärinen, M., Pirinen, T.: Improving predictive entry of Finnish text messages using IRC logs. In: Jassem, K., Fuglewicz, P., Piasecki, M., Przepiórkowski, A. (eds.) Proceedings of the Computational Linguistics-Applications Conference, Jachranka (2011)
  12. Silfverberg, M., Lindén, K.: Combining statistical models for POS tagging using finite-state calculus. In: Pedersen, B.S., Nešpore, G., Skadiņa, I. (eds.) 18th Nordic Conference on Computational Linguistics, pp. 183–190 (2011)
  13. Tantuğ, A.C.: A probabilistic mobile text entry system for agglutinative languages. IEEE Transactions on Consumer Electronics 56(4), 1018–1024 (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Miikka Silfverberg (1)
  • Krister Lindén (1)
  • Mirka Hyvärinen (1)
  1. Department of Modern Languages, University of Helsinki, Helsinki, Finland
