International Workshop on Systems and Frameworks for Computational Morphology

Systems and Frameworks for Computational Morphology pp 94-103

Dsolve—Morphological Segmentation for German Using Conditional Random Fields

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 537)

Abstract

We describe Dsolve, a system that segments morphologically complex German words into their constituent morphs. Our approach treats morphological segmentation as a classification task in which the locations and types of morph boundaries are predicted by a Conditional Random Field model trained on manually annotated data. Predicting morph-boundary types in addition to their locations distinguishes Dsolve from similar approaches previously suggested in the literature. We show that predicting boundary types yields a (somewhat counter-intuitive) performance gain over the simpler task of predicting only boundary locations.
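To make the classification framing concrete, the sketch below shows one way a segmented word could be turned into per-character labels of the kind a sequence model such as a CRF is trained to predict. It is an illustration only, not Dsolve's actual encoding: the label symbols ("0" for no boundary, "+" for a compound boundary, "~" for a suffix boundary), the function names, and the example word are all assumptions made here. The helper locations_only collapses the typed labels into the simpler location-only task that the abstract compares against.

```python
from typing import List, Optional, Tuple

# Hypothetical label inventory (an assumption, not taken from the paper):
# "0" = no morph boundary after this character,
# "+" = compound boundary, "~" = suffix boundary.
NO_BOUNDARY = "0"

Morphs = List[Tuple[str, Optional[str]]]


def encode(morphs: Morphs) -> Tuple[str, List[str]]:
    """Map (morph, type of the boundary that follows it) pairs to a word
    and one label per character. Morphs are assumed to be non-empty."""
    chars: List[str] = []
    labels: List[str] = []
    for morph, boundary in morphs:
        for i, ch in enumerate(morph):
            chars.append(ch)
            is_last = i == len(morph) - 1
            labels.append(boundary if (is_last and boundary) else NO_BOUNDARY)
    return "".join(chars), labels


def decode(word: str, labels: List[str]) -> Morphs:
    """Rebuild the typed segmentation from a word and its label sequence."""
    segments: Morphs = []
    current = ""
    for ch, label in zip(word, labels):
        current += ch
        if label != NO_BOUNDARY:
            segments.append((current, label))
            current = ""
    if current:
        segments.append((current, None))
    return segments


def locations_only(labels: List[str]) -> List[str]:
    """Collapse typed boundary labels to a binary boundary / no-boundary scheme."""
    return ["1" if label != NO_BOUNDARY else NO_BOUNDARY for label in labels]


# Illustrative segmentation: haus + freund ~ schaft
EXAMPLE: Morphs = [("haus", "+"), ("freund", "~"), ("schaft", None)]

if __name__ == "__main__":
    word, labels = encode(EXAMPLE)
    print(word)
    print("".join(labels))
    print("".join(locations_only(labels)))
```

In this framing, a trained model would predict the label sequence from the characters alone, and decode would recover the morphs and their boundary types from that prediction.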

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Berlin-Brandenburg Academy of Sciences and Humanities, Berlin, Germany
