Unsupervised Morphological Segmentation Using Neural Word Embeddings

Part of the Lecture Notes in Computer Science book series (LNAI, volume 9918)

Abstract

We present a fully unsupervised method for morphological segmentation. Unlike many morphological segmentation systems, our method is based on semantic features rather than orthographic features. To capture word meanings, word embeddings are obtained from a two-level neural network [11]. We compute the semantic similarity between words using these neural word embeddings, and this similarity forms our baseline segmentation model. We then model morphotactics with a bigram language model whose maximum likelihood estimates are computed from the initial segmentations produced by the baseline. Results show that using semantic features improves morphological segmentation, especially in agglutinating languages such as Turkish. Our method shows competitive performance compared to other unsupervised morphological segmentation systems.
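The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the right-to-left stripping strategy, the similarity threshold of 0.5, the minimum stem length, the function names, and the two-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a trained embedding model such as the one in [11].

```python
import math
from collections import Counter

# Hypothetical toy embeddings (made-up values, not trained vectors).
TOY_VECTORS = {
    "ev": [1.0, 0.1],       # "house"
    "evler": [0.9, 0.2],    # "houses"
    "evlerde": [0.8, 0.4],  # "in the houses"
    "evren": [-0.2, 1.0],   # "universe", unrelated to "ev"
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment(word, vectors, threshold=0.5, min_stem=2):
    """Strip characters from the right; place a boundary whenever the
    shorter prefix is a known word that is semantically similar to the
    most recently accepted form (assumed baseline strategy)."""
    morphs = []
    current, end = word, len(word)
    for i in range(len(word) - 1, min_stem - 1, -1):
        prefix = word[:i]
        if (prefix in vectors and current in vectors
                and cosine(vectors[prefix], vectors[current]) >= threshold):
            morphs.insert(0, word[i:end])  # the suffix that was stripped
            current, end = prefix, i
    morphs.insert(0, word[:end])           # whatever remains is the stem
    return morphs


def bigram_mle(segmentations):
    """Maximum likelihood bigram estimates P(b | a) over morph sequences,
    with <s> and </s> as word-boundary markers."""
    history, bigrams = Counter(), Counter()
    for morphs in segmentations:
        seq = ["<s>"] + morphs + ["</s>"]
        for a, b in zip(seq, seq[1:]):
            history[a] += 1
            bigrams[(a, b)] += 1
    return {pair: count / history[pair[0]] for pair, count in bigrams.items()}


if __name__ == "__main__":
    initial = [segment(w, TOY_VECTORS) for w in ("evlerde", "evren")]
    print(initial)
    print(bigram_mle(initial))
```

With these toy vectors, "evlerde" is split because each shorter form stays close in the embedding space, while "evren" is left whole because its vector is dissimilar to that of "ev" despite the shared prefix. That is the kind of case where a semantic criterion can differ from a purely orthographic one.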

Keywords

  • Morphology
  • Semantics
  • Neural representation of speech and language
  • Morphological segmentation
  • Unsupervised learning
  • Word embeddings


References

  1. Can, B., Manandhar, S.: Clustering morphological paradigms using syntactic categories. In: Peters, C., Di Nunzio, G.M., Kurimo, M., Mandl, T., Mostefa, D., Peñas, A., Roda, G. (eds.) CLEF 2009. LNCS, vol. 6241, pp. 641–648. Springer, Heidelberg (2010)

  2. Clark, A.: Inducing syntactic categories by context distribution clustering. In: Proceedings of 2nd Workshop on Learning Language in Logic and 4th Conference on Computational Natural Language Learning, CoNLL 2000, vol. 7, pp. 91–94. Association for Computational Linguistics, Stroudsburg (2000)

  3. Creutz, M., Lagus, K.: Inducing the morphological lexicon of a natural language from unannotated text. In: Proceedings of International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR 2005), pp. 106–113 (2005)

  4. Creutz, M., Lagus, K.: Unsupervised morpheme segmentation and morphology induction from text corpora using Morfessor 1.0. Technical report A81 (2005)

  5. Creutz, M., Lagus, K.: Unsupervised models for morpheme segmentation and morphology learning. ACM Trans. Speech Lang. Process. 4, 3:1–3:34 (2007)

  6. Goldwater, S., Griffiths, T.L., Johnson, M.: Interpolating between types and tokens by estimating power-law generators. In: Advances in Neural Information Processing Systems, vol. 18, p. 459 (2006)

  7. Hankamer, J.: Finite state morphology and left to right phonology. In: Proceedings of 5th West Coast Conference on Formal Linguistics, January 1986

  8. Kurimo, M., Lagus, K., Virpioja, S., Turunen, V.T.: Morpho Challenge 2010, June 2011. http://research.ics.tkk.fi/events/morphochallenge2010/. Accessed 4 Jul 2016

  9. Lee, Y.K., Haghighi, A., Barzilay, R.: Modeling syntactic context improves morphological segmentation. In: Proceedings of 15th Conference on Computational Natural Language Learning, CoNLL 2011, pp. 1–9. Association for Computational Linguistics, Stroudsburg (2011)

  10. Lignos, C.: Learning from unseen data. In: Kurimo, M., Virpioja, S., Turunen, V., Lagus, K. (eds.) Proceedings of Morpho Challenge 2010 Workshop, pp. 35–38. Aalto University, Espoo (2010)

  11. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013). CoRR abs/1301.3781

  12. Narasimhan, K., Barzilay, R., Jaakkola, T.S.: An unsupervised method for uncovering morphological chains. Trans. Assoc. Comput. Linguist. (TACL) 3, 157–167 (2015)

  13. Nicolas, L., Farré, J., Molinero, M.A.: Unsupervised learning of concatenative morphology based on frequency-related form occurrence. In: Kurimo, M., Virpioja, S., Turunen, V., Lagus, K. (eds.) Proceedings of Morpho Challenge 2010 Workshop, pp. 39–43. Aalto University, Espoo (2010)

  14. Schone, P., Jurafsky, D.: Knowledge-free induction of inflectional morphologies. In: Proceedings of 2nd Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, NAACL 2001, pp. 1–9. Association for Computational Linguistics, Stroudsburg (2001)

  15. Soricut, R., Och, F.: Unsupervised morphology induction using word embeddings. In: Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL, pp. 1627–1637 (2015)

  16. Sproat, R.W.: Morphology and Computation. MIT Press, Cambridge (1992)

  17. Team, D.D.: Deeplearning4j: Open-Source Distributed Deep Learning for the JVM, Apache Software Foundation License 2.0, May 2016. http://deeplearning4j.org/

Acknowledgements

This research was supported by TUBITAK (The Scientific and Technological Research Council of Turkey) grant number 115E464.

Author information

Correspondence to Ahmet Üstün or Burcu Can.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Üstün, A., Can, B. (2016). Unsupervised Morphological Segmentation Using Neural Word Embeddings. In: Král, P., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2016. Lecture Notes in Computer Science, vol 9918. Springer, Cham. https://doi.org/10.1007/978-3-319-45925-7_4

  • DOI: https://doi.org/10.1007/978-3-319-45925-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45924-0

  • Online ISBN: 978-3-319-45925-7

  • eBook Packages: Computer Science, Computer Science (R0)