
A Hybrid Method for Named Entity Recognition on Tweet Streams

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10191)

Abstract

Information extraction from microblogs has recently attracted researchers in knowledge discovery and data mining, owing to the short nature of such texts. Annotating data is one of the significant obstacles to applying machine learning approaches to these sources. Active learning (AL) and semi-supervised learning (SSL) are two distinct approaches to reducing annotation costs: SSL exploits high-confidence samples, whereas AL queries the most informative samples. The two can therefore produce better results when applied jointly. This paper proposes a combination of AL and SSL that reduces the labeling effort for named entity recognition (NER) on tweet streams by using both machine-labeled and manually labeled data. The AL query algorithms select the most informative samples to be labeled by a human annotator. In addition, a Conditional Random Field (CRF) is chosen as the underlying model for selecting high-confidence samples. Experimental results on a tweet dataset demonstrate that the proposed method achieves promising results in reducing the human labeling effort and that it can significantly improve the performance of NER systems.
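The division of labor described in the abstract can be illustrated with a minimal sketch of one selection round. This is not the authors' implementation: the confidence threshold `high`, the query budget `n_queries`, the least-confidence query rule, and the `toy_predict` stand-in for a trained CRF are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Callable, List, Sequence, Tuple

Tweet = Sequence[str]   # a tokenized tweet
Labels = List[str]      # one entity tag per token


def split_by_confidence(
    pool: List[Tweet],
    predict: Callable[[Tweet], Tuple[Labels, float]],
    high: float = 0.9,      # hypothetical SSL confidence threshold
    n_queries: int = 2,     # hypothetical per-round annotation budget
):
    """Partition an unlabeled pool for one combined AL + SSL round.

    Returns (machine_labeled, to_annotate, remaining).
    """
    scored = [(tweet, *predict(tweet)) for tweet in pool]

    # SSL side: keep the model's own labels where it is confident.
    machine_labeled = [(t, y) for t, y, c in scored if c >= high]

    # AL side: send the least confident samples to the human annotator.
    uncertain = sorted((s for s in scored if s[2] < high), key=lambda s: s[2])
    to_annotate = [t for t, _, _ in uncertain[:n_queries]]
    remaining = [t for t, _, _ in uncertain[n_queries:]]
    return machine_labeled, to_annotate, remaining


def toy_predict(tweet: Tweet) -> Tuple[Labels, float]:
    """Hypothetical stand-in for a CRF tagger.

    Confidence is simply the fraction of tokens found in a tiny lexicon.
    """
    known = {"obama": "B-PER", "visits": "O", "in": "O", "london": "B-LOC"}
    labels = [known.get(tok.lower(), "O") for tok in tweet]
    confidence = sum(tok.lower() in known for tok in tweet) / len(tweet)
    return labels, confidence


pool = [
    ["Obama", "visits", "London"],
    ["lol", "gr8", "nite"],
    ["in", "Ldn", "2nite"],
    ["Obama", "in", "NYC"],
]
machine, queries, rest = split_by_confidence(pool, toy_predict)
```

In a full loop, the manually labeled `queries` and the machine-labeled `machine` samples would both be added to the training set, the model retrained, and the round repeated on `rest` and newly arriving tweets. With a real CRF, the confidence would typically come from the sequence probability or token marginals rather than a lexicon lookup.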

Notes

  1. https://about.twitter.com/company

  2. http://nlp.stanford.edu/software/CRF-NER.shtml

  3. https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html

Acknowledgment

This work was supported by the BK21+ program of the National Research Foundation (NRF) of Korea.

Corresponding author

Correspondence to Dosam Hwang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Tran, V.C., Hoang, D.T., Nguyen, N.T., Hwang, D. (2017). A Hybrid Method for Named Entity Recognition on Tweet Streams. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10191. Springer, Cham. https://doi.org/10.1007/978-3-319-54472-4_25

  • DOI: https://doi.org/10.1007/978-3-319-54472-4_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54471-7

  • Online ISBN: 978-3-319-54472-4

  • eBook Packages: Computer Science, Computer Science (R0)
