Unsupervised Word Categorization Using Self-Organizing Maps and Automatically Extracted Morphs

  • Mikaela Klami
  • Krista Lagus
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)


Automatic creation of syntactic and semantic word categorizations is a challenging problem for highly inflecting languages because of severe data sparsity. Moreover, the study of colloquial language resources requires fully corpus-based tools. We present a completely automated approach for producing word categorizations for morphologically rich languages. A Self-Organizing Map (SOM) is used to cluster words based on the morphological properties of their context words. These properties are extracted using an automated morphological segmentation algorithm called Morfessor. Our experiments on a colloquial Finnish corpus of stories told by young children show that using unsupervised morphs as features leads to clearly improved clusterings compared with using whole context words as features.
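The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `TOY_MORPHS` table is a hand-written stand-in for Morfessor output, the three-sentence corpus is invented, and the map size, learning rate, and neighbourhood schedule are arbitrary assumptions. Each word is described by the morphs of its immediate left and right neighbours, and a small SOM is trained on those vectors.

```python
import math
import random
from collections import defaultdict

# Hypothetical segmentations standing in for Morfessor output (assumption).
TOY_MORPHS = {
    "talossa": ["talo", "ssa"], "autossa": ["auto", "ssa"],
    "juoksi": ["juoks", "i"], "nukkui": ["nukku", "i"],
}

def segment(word):
    """Return the morphs of a word; unknown words stay whole."""
    return TOY_MORPHS.get(word, [word])

def context_features(sentences, segment_fn):
    """Count the morphs of each word's immediate left (L) and right (R) neighbour."""
    counts = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for i, word in enumerate(sent):
            for offset, side in ((-1, "L"), (1, "R")):
                j = i + offset
                if 0 <= j < len(sent):
                    for morph in segment_fn(sent[j]):
                        counts[word][side + ":" + morph] += 1.0
    return counts

def to_vectors(counts):
    """Turn the sparse counts into unit-length dense vectors."""
    feats = sorted({f for c in counts.values() for f in c})
    vectors = {}
    for word, c in counts.items():
        vec = [c.get(f, 0.0) for f in feats]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors[word] = [v / norm for v in vec]
    return feats, vectors

def best_match_unit(units, x):
    """Grid position of the unit whose weight vector is closest to x."""
    return min(units, key=lambda pos: sum((a - b) ** 2 for a, b in zip(units[pos], x)))

def train_som(data, rows, cols, epochs=50, seed=0):
    """Online SOM training with a Gaussian neighbourhood that shrinks over time."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                        # decaying learning rate
        radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5
        for x in rng.sample(data, len(data)):          # shuffled pass over the data
            bmu = best_match_unit(units, x)
            for pos, w in units.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return units

if __name__ == "__main__":
    corpus = [  # invented toy sentences, not taken from the paper's corpus
        ["koira", "juoksi", "talossa"],
        ["kissa", "nukkui", "autossa"],
        ["koira", "nukkui", "talossa"],
    ]
    _, vectors = to_vectors(context_features(corpus, segment))
    som = train_som(list(vectors.values()), rows=2, cols=2)
    for word, vec in sorted(vectors.items()):
        print(word, best_match_unit(som, vec))
```

Replacing `segment` with a function that returns the whole word gives the whole-context-word baseline the abstract compares against. With morph features, inflected neighbours such as "talossa" and "autossa" share the feature "ssa", which is one way morph features can reduce sparsity.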


Keywords: Word Form; Text Corpus; Word Categorization; Context Word; Best Match Unit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Mikaela Klami (1)
  • Krista Lagus (1)
  1. Adaptive Informatics Research Centre, Helsinki University of Technology, TKK, Finland
