Dsolve—Morphological Segmentation for German Using Conditional Random Fields

Part of the Communications in Computer and Information Science book series (CCIS, volume 537)

Abstract

We describe Dsolve, a system for the segmentation of morphologically complex German words into their constituent morphs. Our approach treats morphological segmentation as a classification task, in which the locations and types of morph boundaries are predicted by a Conditional Random Field model trained from manually annotated data. The prediction of morph-boundary types in addition to their locations distinguishes Dsolve from similar approaches previously suggested in the literature. We show that the use of boundary types provides a (somewhat counter-intuitive) performance boost with respect to the simpler task of predicting only segment locations.
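To illustrate the classification framing described in the abstract, the following is a minimal sketch of how a segmented word might be encoded as one label per character, once with boundary types and once with boundary locations only. The label symbols (`0`, `~`, `+`, `1`), the function names, and the example word are assumptions made for illustration; they are not taken from the paper, and no CRF training is shown.

```python
from typing import List, Tuple

# Hypothetical label inventory (an assumption, not the paper's):
# "0" = no boundary after this character,
# "~" = boundary after a prefix, "+" = boundary before a suffix.
NO_BOUNDARY = "0"


def encode(morphs: List[str], boundary_types: List[str]) -> List[Tuple[str, str]]:
    """Pair each character with the label of the boundary that follows it."""
    if len(boundary_types) != len(morphs) - 1:
        raise ValueError("need exactly one boundary type between adjacent morphs")
    labelled = []
    for i, morph in enumerate(morphs):
        for j, ch in enumerate(morph):
            ends_morph = j == len(morph) - 1
            word_internal = i < len(morphs) - 1
            label = boundary_types[i] if (ends_morph and word_internal) else NO_BOUNDARY
            labelled.append((ch, label))
    return labelled


def locations_only(labelled: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse all boundary types into a single 'boundary here' label."""
    return [(ch, NO_BOUNDARY if lab == NO_BOUNDARY else "1") for ch, lab in labelled]


def decode(labelled: List[Tuple[str, str]]) -> List[str]:
    """Recover the morph sequence from a per-character labelling."""
    morphs, current = [], ""
    for ch, lab in labelled:
        current += ch
        if lab != NO_BOUNDARY:
            morphs.append(current)
            current = ""
    if current:
        morphs.append(current)
    return morphs


# Illustrative word: un + glück + lich
typed = encode(["un", "glück", "lich"], ["~", "+"])
typed_labels = "".join(lab for _, lab in typed)
plain_labels = "".join(lab for _, lab in locations_only(typed))
```

In this sketch, a sequence labeller trained on the typed encoding must choose among more classes than one trained on the location-only encoding, which is why the performance gain reported in the abstract is described as somewhat counter-intuitive.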

Keywords

  • Classification Task
  • Model Order
  • Conditional Random Field
  • Boundary Type
  • Boundary Detection

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

  1. Although, as correctly noted in [23], any class-string c which maximizes P(c, o) will also maximize P(c|o) if the observation string o is held fixed.

  2. Note that our use of “model order” in this paper refers only to the context window size used to define the feature function inventory, and is unrelated to the order of linear-chain feature dependencies in the underlying CRF models.

  3. http://www.bbaw.de.

  4. http://kaskade.dwds.de/~moocow/gramophone/de-dlexdb.data.txt.

  5. http://www.cis.hut.fi/projects/morpho/morfessorflatcat.shtml; FlatCat models were trained with perplexity threshold 10.0 using annotated corpus data in semi-supervised mode.
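The observation in note 1 follows from the definition of conditional probability. A short derivation, assuming the observation string o has non-zero probability and writing P(c, o) for the joint probability of class-string and observation string:

```latex
\begin{align*}
P(c \mid o) &= \frac{P(c, o)}{P(o)}, \qquad P(o) > 0 \\
\arg\max_{c} P(c \mid o)
  &= \arg\max_{c} \frac{P(c, o)}{P(o)}
   = \arg\max_{c} P(c, o)
\end{align*}
```

The last step holds because P(o) does not depend on c: dividing by a positive constant does not change which c attains the maximum.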

References

  1. Beesley, K.R., Karttunen, L.: Finite State Morphology. CSLI, Stanford (2003)

  2. Chang, J.Z., Chang, J.S.: Word root finder: a morphological segmentor based on CRF. In: Proceedings of COLING 2012: Demonstration Papers, pp. 51–58 (2012)

  3. Creutz, M., Lagus, K.: Unsupervised discovery of morphemes. In: Proceedings of the ACL 2002 Workshop on Morphological and Phonological Learning, pp. 21–30 (2002)

  4. Creutz, M., Lagus, K.: Unsupervised models for morpheme segmentation and morphology learning. ACM Trans. Speech Lang. Process. 4(1), 3:1–3:34 (2007)

  5. Creutz, M., Lindén, K.: Morpheme segmentation gold standards for Finnish and English. Technical report A77, Helsinki University of Technology (2004)

  6. Daelemans, W.: Grafon: a grapheme-to-phoneme conversion system for Dutch. In: Proceedings of COLING 1988, pp. 133–138 (1988)

  7. Déjean, H.: Morphemes as necessary concept for structures discovery from untagged corpora. In: Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning, pp. 295–298 (1998)

  8. Frakes, W.B.: Stemming algorithms. In: Frakes, W.B., Baeza-Yates, R. (eds.) Information Retrieval, pp. 131–160. Prentice-Hall, Upper Saddle River (1992)

  9. Geyken, A., Hanneforth, T.: TAGH: a complete morphology for German based on weighted finite state automata. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds.) FSMNLP 2005. LNCS (LNAI), vol. 4002, pp. 55–66. Springer, Heidelberg (2006)

  10. Goldsmith, J.: Unsupervised learning of the morphology of a natural language. Comput. Linguist. 27(2), 153–198 (2001)

  11. Green, S., DeNero, J.: A class-based agreement model for generating accurately inflected translations. In: Proceedings of ACL 2012, pp. 146–155 (2012)

  12. Haapalainen, M., Ari, M.: GERTWOL und morphologische Disambiguierung für das Deutsche. In: Proceedings of the 10th Nordic Conference of Computational Linguistics. University of Helsinki, Department of General Linguistics (1995)

  13. Harris, Z.: From phoneme to morpheme. Language 31, 190–222 (1955)

  14. Klenk, U., Langer, H.: Morphological segmentation without a lexicon. Literary Linguist. Comput. 4(4), 247–253 (1989)

  15. Kohonen, O., Virpioja, S., Lagus, K.: Semi-supervised learning of concatenative morphology. In: Proceedings of SIGMORPHON 2010, pp. 78–86 (2010)

  16. Kurimo, M., Virpioja, S., Turunen, V., Lagus, K.: Morpho challenge competition 2005–2010: evaluations and results. In: Proceedings of SIGMORPHON 2010, pp. 87–95 (2010)

  17. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289. Morgan Kaufmann (2001)

  18. Lavergne, T., Cappé, O., Yvon, F.: Practical very large scale CRFs. In: Proceedings of ACL 2010, pp. 504–513 (2010)

  19. Müller, C., Gurevych, I.: Semantically enhanced term frequency. In: Gurrin, C., He, Y., Kazai, G., Kruschwitz, U., Little, S., Roelleke, T., Rüger, S., van Rijsbergen, K. (eds.) ECIR 2010. LNCS, vol. 5993, pp. 598–601. Springer, Heidelberg (2010)

  20. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer, Berlin (1999)

  21. Pfeifer, W.: Etymologisches Wörterbuch des Deutschen, 2nd edn. Akademie-Verlag, Berlin (1993)

  22. Porter, M.F.: An algorithm for suffix stripping. Electron. Libr. Inf. Syst. 14(3), 130–137 (1980)

  23. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–285 (1989)

  24. Reichel, U.D., Weilhammer, K.: Automated morphological segmentation and evaluation. In: Proceedings of LREC, pp. 503–506 (2004)

  25. van Rijsbergen, C.J.: Information Retrieval. Butterworth-Heinemann, Newton (1979)

  26. Ruokolainen, T., Kohonen, O., Virpioja, S., Kurimo, M.: Supervised morphological segmentation in a low-resource learning setting using conditional random fields. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pp. 29–37 (2013)

  27. Ruokolainen, T., Kohonen, O., Virpioja, S., Kurimo, M.: Painless semi-supervised morphological segmentation using conditional random fields. In: Proceedings of EACL 2014, pp. 84–89 (2014)

  28. Schmid, H., Fitschen, A., Heid, U.: SMOR: a German computational morphology covering derivation, composition and inflection. In: Proceedings of LREC (2004)

  29. Selkirk, E.O.: On the nature of phonological representation. In: Myers, T., Laver, J., Anderson, J. (eds.) The Cognitive Representation of Speech, pp. 379–388. North-Holland Publishing Company, Dordrecht (1981)

  30. Tseng, H., Chang, P., Andrew, G., Jurafsky, D., Manning, C.: A conditional random field word segmenter for SIGHAN bakeoff 2005. In: Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing (2005)

  31. Wallach, H.M.: Conditional random fields: an introduction. Technical report MS-CIS-04-21, University of Pennsylvania, Department of Computer and Information Science (2004)

Corresponding author

Correspondence to Kay-Michael Würzner.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Würzner, KM., Jurish, B. (2015). Dsolve—Morphological Segmentation for German Using Conditional Random Fields. In: Mahlow, C., Piotrowski, M. (eds) Systems and Frameworks for Computational Morphology. SFCM 2015. Communications in Computer and Information Science, vol 537. Springer, Cham. https://doi.org/10.1007/978-3-319-23980-4_6

  • DOI: https://doi.org/10.1007/978-3-319-23980-4_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23978-1

  • Online ISBN: 978-3-319-23980-4

  • eBook Packages: Computer Science (R0)