A Hybrid Method for Named Entity Recognition on Tweet Streams

  • Van Cuong Tran
  • Dinh Tuyen Hoang
  • Ngoc Thanh Nguyen
  • Dosam Hwang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10191)

Abstract

Information extraction from microblogs has recently attracted attention from researchers in knowledge discovery and data mining, owing to the short nature of microblog posts. Data annotation is one of the main obstacles to applying machine learning approaches to these sources. Active learning (AL) and semi-supervised learning (SSL) are two distinct approaches to reducing annotation costs: SSL exploits high-confidence samples, whereas AL queries the most informative ones. Because the two are complementary, they can produce better results when applied jointly. This paper proposes a combination of AL and SSL that reduces the labeling effort for named entity recognition (NER) on tweet streams by using both machine-labeled and manually labeled data. The AL query algorithms select the most informative samples, which are then labeled by a human annotator. In addition, a Conditional Random Field (CRF) is used as the underlying model to select high-confidence samples. Experimental results on a tweet dataset demonstrate that the proposed method achieves promising results in reducing human labeling effort and can significantly improve the performance of NER systems.
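The abstract does not spell out the exact selection rules, so the following is only a minimal sketch of how one round of such an AL + SSL loop could be organized. It assumes a least-confidence query strategy for the AL part and a fixed confidence threshold for the SSL part; the names `Prediction`, `split_batch`, `hybrid_round`, `query_budget` and `confidence_threshold` are hypothetical, and the CRF itself is hidden behind a `predict` callable rather than tied to any particular library.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Tokens = List[str]
Labels = List[str]


@dataclass
class Prediction:
    tokens: Tokens
    labels: Labels      # best label sequence proposed by the model (e.g. a CRF's Viterbi path)
    confidence: float   # model's confidence in that sequence, assumed to lie in [0, 1]


def split_batch(
    predictions: Sequence[Prediction],
    query_budget: int,
    confidence_threshold: float,
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split one batch of predictions into AL queries and SSL pseudo-labels.

    AL part (assumed least-confidence sampling): the `query_budget` least
    confident tweets are sent to a human annotator.
    SSL part (assumed fixed threshold): among the remaining tweets, those whose
    confidence reaches `confidence_threshold` keep their machine labels;
    everything else is left unlabeled for now.
    """
    order = sorted(range(len(predictions)), key=lambda i: predictions[i].confidence)
    queried = set(order[:query_budget])
    to_annotate = [predictions[i] for i in order[:query_budget]]
    machine_labeled = [
        p for i, p in enumerate(predictions)
        if i not in queried and p.confidence >= confidence_threshold
    ]
    return to_annotate, machine_labeled


def hybrid_round(
    train_set: List[Tuple[Tokens, Labels]],
    batch: Sequence[Tokens],
    predict: Callable[[Tokens], Prediction],
    annotate: Callable[[Tokens], Labels],
    query_budget: int,
    confidence_threshold: float,
) -> List[Tuple[Tokens, Labels]]:
    """Process one batch of the tweet stream and grow the training set.

    `predict` wraps the current model (hypothetically a trained CRF) and
    `annotate` stands in for the human annotator. Retraining the model on the
    returned training set is left to the caller.
    """
    predictions = [predict(tokens) for tokens in batch]
    to_annotate, machine_labeled = split_batch(predictions, query_budget, confidence_threshold)
    for p in to_annotate:
        train_set.append((p.tokens, annotate(p.tokens)))  # manually labeled data
    for p in machine_labeled:
        train_set.append((p.tokens, p.labels))            # machine-labeled data
    return train_set
```

On a stream, `hybrid_round` would be called once per incoming batch, with the CRF retrained on the enlarged training set after each call. The budget and threshold here are illustrative parameters, not values reported in the paper.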

Keywords

Named entity recognition, Active learning, Semi-supervised learning, Hybrid method, Tweet streams

Notes

Acknowledgment

This work was supported by the BK21+ program of the National Research Foundation (NRF) of Korea.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Van Cuong Tran (1)
  • Dinh Tuyen Hoang (1)
  • Ngoc Thanh Nguyen (2)
  • Dosam Hwang (1)

  1. Department of Computer Engineering, Yeungnam University, Gyeongsan, South Korea
  2. Faculty of Computer Science and Management, Wrocław University of Science and Technology, Wrocław, Poland
