An End-to-End Scalable Iterative Sequence Tagging with Multi-Task Learning

  • Lin Gui
  • Jiachen Du
  • Zhishan Zhao
  • Yulan He
  • Ruifeng Xu (email author)
  • Chuang Fan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11109)


Multi-task learning (MTL) models, which pool examples arising from several tasks, have achieved remarkable results in language processing. However, multi-task learning is not always more effective than single-task methods in sequence tagging. One possible reason is that existing methods for multi-task sequence tagging often rely on lower-layer parameter sharing to connect different tasks. The lack of interaction between tasks limits the performance improvement. In this paper, we propose a novel multi-task learning architecture that iteratively and explicitly utilizes the prediction results of each task. We train our model on part-of-speech (POS) tagging, chunking, and named entity recognition (NER) simultaneously. Experimental results show that, without any task-specific features, our model achieves state-of-the-art performance on both chunking and NER.
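The idea of feeding each task's predictions back into every task over several iterations can be illustrated with a toy sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: the label-set sizes, the random linear heads standing in for trained networks, the uniform initial predictions, and the function names (`make_head`, `apply_head`, `iterative_tagging`) are hypothetical and are not the authors' implementation.

```python
import math
import random

# Hypothetical label-set sizes for the three tasks (assumed, not from the paper).
TASKS = {"pos": 4, "chunk": 3, "ner": 5}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_head(rng, in_dim, n_labels):
    """Random linear layer standing in for a trained task-specific tagger."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
            for _ in range(n_labels)]


def apply_head(weights, vec):
    """Score every label for one token and normalise to a distribution."""
    return softmax([sum(w * x for w, x in zip(row, vec)) for row in weights])


def iterative_tagging(token_feats, heads, n_iters=3):
    """Re-tag the sentence n_iters times; each pass sees the previous
    pass's label distributions for all tasks alongside the token features."""
    n = len(token_feats)
    # Iteration 0: no information yet, so start from uniform distributions.
    preds = {t: [[1.0 / k] * k for _ in range(n)] for t, k in TASKS.items()}
    for _ in range(n_iters):
        new_preds = {}
        for task in TASKS:
            out = []
            for i, feats in enumerate(token_feats):
                # Explicit interaction: append every task's current prediction.
                vec = list(feats)
                for other in TASKS:
                    vec.extend(preds[other][i])
                out.append(apply_head(heads[task], vec))
            new_preds[task] = out
        preds = new_preds
    return preds


rng = random.Random(0)
feat_dim = 6  # assumed size of the shared token representation
in_dim = feat_dim + sum(TASKS.values())
heads = {t: make_head(rng, in_dim, k) for t, k in TASKS.items()}
sentence = [[rng.uniform(-1, 1) for _ in range(feat_dim)] for _ in range(5)]

preds = iterative_tagging(sentence, heads)
# Pick the highest-probability label index per token and task.
tags = {t: [max(range(len(p)), key=p.__getitem__) for p in preds[t]]
        for t in TASKS}
```

In this sketch the only coupling between tasks is the concatenated prediction vector, in contrast to the lower-layer parameter sharing that the abstract identifies as the usual connection between tasks. A real model would replace the random heads with trained sequence encoders.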


Multi-task learning, Interactions, Sequence tagging



This work was supported by the National Natural Science Foundation of China (U1636103, 61632011), Shenzhen Foundational Research Funding (20170307150024907), and the Key Technologies Research and Development Program of Shenzhen (JSGG20170817140856618).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Lin Gui (1, 2)
  • Jiachen Du (1)
  • Zhishan Zhao (3)
  • Yulan He (2)
  • Ruifeng Xu (1), email author
  • Chuang Fan (1)

  1. Harbin Institute of Technology (Shenzhen), Shenzhen, China
  2. Aston University, Birmingham, UK
  3. Baidu Inc., Beijing, China
