Mongolian Word Segmentation Based on Three Character Level Seq2Seq Models

  • Na Liu
  • Xiangdong Su
  • Guanglai Gao
  • Feilong Bao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11305)


Mongolian word segmentation splits Mongolian words into roots and suffixes. It plays an important role in Mongolian-related natural language processing tasks. To improve performance and avoid the tedious rule-making and large-scale corpus statistics required by earlier methods, this work adopts a Seq2Seq framework for Mongolian word segmentation. Since each Mongolian word consists of several sequential characters, we cast Mongolian word segmentation as a character-level Seq2Seq task and propose three models that approach the segmentation goal from three different perspectives: (1) a translation model, (2) a true and pseudo mapping model, and (3) a binary choice model. The three models differ mainly in their output sequences and in the architectures of the RNNs used for segmentation. We employ an improved beam search to optimize the second segmentation model and boost the segmentation process. All the models are trained on a limited dataset, and the second model achieves state-of-the-art accuracy.
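As a rough illustration of how one segmentation can yield different character-level output sequences, the following minimal sketch builds two possible targets for the same word: a translation-style target (the characters with a boundary symbol inserted) and a binary per-character label sequence. The abstract does not specify the exact encodings used by the three models, so the boundary symbol `SEP`, the labeling convention (1 marks a character followed by a boundary), and all function names here are assumptions chosen for illustration. The input `"abcde"` is a placeholder string, not a real Mongolian word.

```python
from typing import List

SEP = "-"  # hypothetical boundary symbol; not taken from the paper


def to_translation_target(segments: List[str]) -> List[str]:
    """Translation-style target: the word's characters with a boundary
    symbol between consecutive segments (assumed encoding)."""
    return list(SEP.join(segments))


def to_binary_labels(segments: List[str]) -> List[int]:
    """Binary target: one label per input character, 1 if a boundary
    follows that character, else 0 (assumed encoding).
    Segments are assumed to be non-empty."""
    labels: List[int] = []
    for i, seg in enumerate(segments):
        labels.extend([0] * len(seg))
        if i < len(segments) - 1:
            labels[-1] = 1  # last character of a non-final segment
    return labels


def from_binary_labels(word: str, labels: List[int]) -> List[str]:
    """Recover the segments of a word from its binary labels."""
    if len(word) != len(labels):
        raise ValueError("word and labels must have the same length")
    segments: List[str] = []
    current = ""
    for ch, lab in zip(word, labels):
        current += ch
        if lab == 1:
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments


# Placeholder word "abcde" split as root "abc" + suffix "de".
print(to_translation_target(["abc", "de"]))
print(to_binary_labels(["abc", "de"]))
print(from_binary_labels("abcde", [0, 0, 1, 0, 0]))
```

Under these assumptions, the binary target always has the same length as the input word, while the translation-style target grows by one symbol per boundary. This is one way in which the choice of output sequence can change what the decoder has to learn.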


Keywords: Mongolian, Word segmentation, Seq2Seq, Limited search strategy, LSTM



This work was funded by the National Natural Science Foundation of China (Grant No. 61762069), the Natural Science Foundation of Inner Mongolia Autonomous Region (Grant No. 2017BS0601), and the Research Program of Science and Technology at Universities of Inner Mongolia Autonomous Region (Grant No. NJZY18237).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Na Liu (1, 2, 3)
  • Xiangdong Su (1, 2), email author
  • Guanglai Gao (1, 2)
  • Feilong Bao (1, 2)
  1. College of Computer Science, Inner Mongolia University, Hohhot, China
  2. Inner Mongolia Key Laboratory of Mongolian Information Processing Technology, Hohhot, China
  3. Department of Science, Hetao University, Bayannur, China
