Unsupervised Morpheme Analysis with Allomorfessor

  • Sami Virpioja
  • Oskar Kohonen
  • Krista Lagus
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6241)

Abstract

Allomorfessor extends the unsupervised morpheme segmentation method Morfessor to account for allomorphy, the linguistic phenomenon in which one morpheme has several different surface forms. The method discovers common base forms for allomorphs from an unannotated corpus by finding small modifications, called mutations, that connect a base form to its surface forms. Using maximum a posteriori (MAP) estimation, the model decides the number and types of mutations needed for the particular language. In the Morpho Challenge 2009 evaluations, the effect of the mutations was found to be rather small. However, Allomorfessor performed well overall, achieving the best results for English in the linguistic evaluation and placing in the top three in the application evaluations for all languages.
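
To make the idea of a mutation concrete, the following is a minimal sketch of how a small modification can turn a shared base form into the stem that appears in a surface form. The abstract does not define the operation set, so the encoding here (`("del", x)` and `("sub", x, y)`, each applied to the last occurrence of `x`) and the function names `apply_mutation` and `surface_form` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical encoding of one mutation operation (an assumption, not from the paper):
#   ("del", "e")      -> delete the last occurrence of "e"
#   ("sub", "y", "i") -> replace the last occurrence of "y" with "i"
Op = Tuple[str, ...]


def apply_mutation(base: str, ops: List[Op]) -> str:
    """Apply each operation in order to the last occurrence of its target."""
    form = base
    for op in ops:
        kind, target = op[0], op[1]
        idx = form.rfind(target)
        if idx == -1:
            raise ValueError(f"target {target!r} not found in {form!r}")
        if kind == "del":
            replacement = ""
        elif kind == "sub":
            replacement = op[2]
        else:
            raise ValueError(f"unknown operation kind {kind!r}")
        form = form[:idx] + replacement + form[idx + len(target):]
    return form


def surface_form(base: str, ops: List[Op], suffix: str) -> str:
    """Build a surface form as: mutated base form followed by a suffix."""
    return apply_mutation(base, ops) + suffix


if __name__ == "__main__":
    # Two English allomorphs that share a base form once a mutation is allowed.
    print(surface_form("pretty", [("sub", "y", "i")], "er"))
    print(surface_form("believe", [("del", "e")], "ing"))
```

With an empty operation list the base form is left unchanged, which corresponds to plain concatenative segmentation. In this sketch the mutations are supplied by hand; in the method described above they are discovered from the corpus, with MAP estimation weighing the cost of each additional mutation against how well the model explains the data.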

Keywords

Machine Translation, Mean Average Precision, Word Form, Viterbi Algorithm

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Sami Virpioja (1)
  • Oskar Kohonen (1)
  • Krista Lagus (1)
  1. Adaptive Informatics Research Centre, Aalto University School of Science and Technology, Finland
