Allomorfessor: Towards Unsupervised Morpheme Analysis

  • Oskar Kohonen
  • Sami Virpioja
  • Mikaela Klami
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5706)


We extend the unsupervised morpheme segmentation method Morfessor Baseline to account for the linguistic phenomenon of allomorphy, where one morpheme has several different surface forms. Our method discovers common base forms for allomorphs from an unannotated corpus. We evaluate the method by participating in the Morpho Challenge 2008 competition, where inferred analyses are compared against a linguistic gold standard. Although our competition entry achieves high precision, its recall is low, and its F-measure scores are therefore low as well. We show, however, that a small change to the model gives state-of-the-art results.
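The link between high precision, low recall, and a low F-measure can be illustrated with a minimal sketch. It assumes the usual balanced F-measure (the harmonic mean of precision and recall). The function name `f_measure` and the scores below are hypothetical illustrations, not the paper's code or results.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores chosen for illustration only (not from the paper).
high_precision_low_recall = f_measure(0.90, 0.10)
balanced = f_measure(0.50, 0.50)

print(round(high_precision_low_recall, 2))  # 0.18
print(round(balanced, 2))                   # 0.5
```

Because the harmonic mean is dominated by the smaller of its two inputs, a system with very high precision still scores poorly when its recall is low.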


Keywords (machine-generated, not supplied by the authors): Word Form, Edit Distance, Edit Operation, Word Lexicon, Approximate String Match





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Oskar Kohonen (1)
  • Sami Virpioja (1)
  • Mikaela Klami (1)

  1. Adaptive Informatics Research Centre, Helsinki University of Technology, Finland
