Abstract
Morphological segmentation is the task of splitting words into morphemes, the smallest meaning-bearing units of a language. It is a fundamental task in natural language processing, especially for morphologically rich languages. In this paper, we treat morphological segmentation as a character-level sequence-to-sequence learning problem and propose an attention-based neural network model to solve it. Our method uses a bidirectional long short-term memory (BiLSTM) network as the encoder, which increases the amount of input information available to the network and effectively captures both past and future context. In addition, the decoder uses an attention mechanism so that the model can focus on the relevant context of the current character being tagged. We conduct experiments on several languages, including Turkish, Finnish, and English. Experimental results show that our model achieves results better than or comparable to those of existing morphological segmentation methods.
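The abstract frames segmentation as labeling each character of a word. As a minimal sketch of that framing, assuming a B/M/E/S boundary tag set (a common choice for character-level segmentation; the tag set actually used in the paper is not stated here), a segmented word can be converted to one tag per character and back. The function names `segments_to_tags` and `tags_to_segments` are hypothetical, not taken from the paper.

```python
from typing import List

# Assumed tag set (hypothetical, may differ from the paper's):
#   B = first character of a multi-character morpheme
#   M = inner character of a multi-character morpheme
#   E = last character of a multi-character morpheme
#   S = a single-character morpheme


def segments_to_tags(morphemes: List[str]) -> List[str]:
    """Turn a segmented word, e.g. ["talo", "i", "ssa"], into one tag per character."""
    tags: List[str] = []
    for morph in morphemes:
        if len(morph) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(morph) - 2))  # empty for two-character morphemes
            tags.append("E")
    return tags


def tags_to_segments(word: str, tags: List[str]) -> List[str]:
    """Invert segments_to_tags: cut the word after every E or S tag."""
    if len(word) != len(tags):
        raise ValueError("need exactly one tag per character")
    segments: List[str] = []
    start = 0
    for i, tag in enumerate(tags):
        if tag in ("E", "S"):
            segments.append(word[start : i + 1])
            start = i + 1
    if start < len(word):  # keep an unterminated final morpheme from a noisy tagger
        segments.append(word[start:])
    return segments


if __name__ == "__main__":
    word_segments = ["talo", "i", "ssa"]  # Finnish "taloissa", 'in the houses'
    word = "".join(word_segments)
    tags = segments_to_tags(word_segments)
    print(list(zip(word, tags)))
    print(tags_to_segments(word, tags))  # ['talo', 'i', 'ssa']
```

Because the mapping is invertible, a model under this assumption only has to predict one tag per character; the segmentation is then recovered deterministically from the predicted tags.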
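The encoder and attention ideas in the abstract can be illustrated with plain vectors. This is a sketch under stated assumptions: the LSTM recurrences are omitted and the forward and backward hidden states are simply given as inputs; each encoder state is the concatenation of the two directions; and the attention score is a plain dot product (Luong-style), whereas the paper's scoring function is not specified here and may, for example, be additive (Bahdanau-style). The helper names `concat_bidirectional` and `attend` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def concat_bidirectional(forward: List[Vector], backward: List[Vector]) -> List[Vector]:
    """Build h_j = [fwd_j ; bwd_j] so each position sees past and future context.

    `backward[j]` is assumed to be the right-to-left state already aligned to position j.
    """
    if len(forward) != len(backward):
        raise ValueError("both directions must cover the same positions")
    return [f + b for f, b in zip(forward, backward)]  # list concatenation per position


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def attend(decoder_state: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention: weights over character positions and the context vector."""
    dim = len(encoder_states[0])
    if len(decoder_state) != dim:
        raise ValueError("decoder state must match the encoder state size")
    weights = softmax([dot(decoder_state, h) for h in encoder_states])
    # Context vector: attention-weighted sum of the encoder states.
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy one-dimensional states per direction for a 3-character word (made-up numbers).
    fwd = [[0.1], [0.5], [0.9]]
    bwd = [[0.8], [0.4], [0.0]]
    states = concat_bidirectional(fwd, bwd)  # each h_j is now 2-dimensional
    weights, context = attend([1.0, 0.0], states)
    print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

At each decoding step, the weights say which character positions the model attends to when tagging the current character, and the context vector would be fed to the decoder alongside its own state.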
Cite this article
Zhu, S. A Neural Attention Based Model for Morphological Segmentation. Wireless Pers Commun 102, 2527–2534 (2018). https://doi.org/10.1007/s11277-018-5274-8