ACIIDS 2017: Intelligent Information and Database Systems, pp. 258–268
A Hybrid Method for Named Entity Recognition on Tweet Streams
Abstract
Information extraction from microblogs has recently attracted researchers in knowledge discovery and data mining, owing to the short nature of such texts. Annotating data is one of the main obstacles to applying machine learning approaches to these sources. Active learning (AL) and semi-supervised learning (SSL) are two distinct approaches to reducing annotation costs: SSL exploits high-confidence samples, whereas AL queries the most informative samples. Because the two are complementary, they can produce better results when applied jointly. This paper proposes a combination of AL and SSL that reduces the labeling effort for named entity recognition (NER) on tweet streams by using both machine-labeled and manually labeled data. The AL query algorithms select the most informative samples to be labeled by a human annotator, while a conditional random field (CRF) is chosen as the underlying model to select high-confidence samples. Experimental results on a tweet dataset demonstrate that the proposed method achieves promising results in reducing human labeling effort and that it can significantly improve the performance of NER systems.
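The division of labor described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the query algorithms, the confidence threshold, or the batch size. The sketch assumes least-confidence sampling as the AL query strategy, and the names `split_by_confidence`, `toy_predict`, `ssl_threshold`, and `al_budget` are hypothetical. A lookup table with made-up confidence scores stands in for the CRF.

```python
from typing import Callable, List, Sequence, Tuple

Sample = str
Labels = List[str]


def split_by_confidence(
    samples: Sequence[Sample],
    predict: Callable[[Sample], Tuple[Labels, float]],
    ssl_threshold: float = 0.9,  # assumed cutoff, not taken from the paper
    al_budget: int = 2,          # assumed number of human queries per batch
) -> Tuple[List[Tuple[Sample, Labels]], List[Sample]]:
    """Partition one batch into machine-labeled and human-queried samples."""
    scored = [(s, *predict(s)) for s in samples]
    # SSL side: keep the model's own labels when it is confident enough.
    machine_labeled = [(s, y) for s, y, p in scored if p >= ssl_threshold]
    # AL side: least-confidence sampling over the remaining samples
    # (one common query strategy; the paper's strategies may differ).
    uncertain = sorted(
        (t for t in scored if t[2] < ssl_threshold), key=lambda t: t[2]
    )
    to_annotate = [s for s, _, _ in uncertain[:al_budget]]
    return machine_labeled, to_annotate


# Toy stand-in for a trained CRF: fixed labels and made-up confidences.
TOY = {
    "Obama visits Seoul": (["B-PER", "O", "B-LOC"], 0.95),
    "gr8 show 2nite": (["O", "O", "O"], 0.40),
    "apple drops new fone": (["O", "O", "O", "O"], 0.55),
    "c u at the game": (["O", "O", "O", "O", "O"], 0.70),
}


def toy_predict(tweet: Sample) -> Tuple[Labels, float]:
    return TOY[tweet]


machine_labeled, to_annotate = split_by_confidence(list(TOY), toy_predict)
print(machine_labeled)
print(to_annotate)
```

In a full loop, the machine-labeled pairs and the human-annotated answers would both be added to the training set before the model is retrained on the next batch of the stream.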
Keywords
Named entity recognition; Active learning; Semi-supervised learning; Hybrid method; Tweet streams
Acknowledgment
This work was supported by the BK21+ program of the National Research Foundation (NRF) of Korea.