Abstract
Named entity recognition (NER) is the task of detecting named entities in documents and categorizing them into predefined classes such as Person (PER), Location (LOC), and Organization (ORG). Many approaches have been proposed for this problem, both for formal texts such as news or authorized web content and for short texts such as posts on online social networks. However, those approaches target texts written in languages other than Vietnamese. In this paper, we propose a method for NER in Vietnamese tweets. Because tweets are noisy, irregular, and short, and contain acronyms and spelling errors, NER on them is a challenging task. Our method first normalizes tweets and then applies a learning model that recognizes named entities using six different types of features. We built a training set of more than 40,000 named entities and a test set of 2,446 named entities to evaluate our system. The experimental results show that our system achieves encouraging performance, with an F1 score of 82.3%.
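For readers unfamiliar with the reported metric, the following is a minimal sketch of how an entity-level F1 score can be computed from BIO-tagged sequences. The abstract does not describe the paper's evaluation protocol, so the exact-match criterion (same span and same type), the BIO tagging scheme, and the helper names `extract_entities` and `f1_score` are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

# An entity is (start index, end index exclusive, type), e.g. (0, 2, "PER").
Entity = Tuple[int, int, str]


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    entities: Set[Entity] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


def f1_score(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1: harmonic mean of precision and recall."""
    g, p = extract_entities(gold), extract_entities(pred)
    true_positives = len(g & p)
    precision = true_positives / len(p) if p else 0.0
    recall = true_positives / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC"]
    pred = ["B-PER", "I-PER", "O", "B-ORG"]
    # One of two predicted entities matches one of two gold entities.
    print(f1_score(gold, pred))
```

Under this criterion an entity counts as correct only if both its boundaries and its class match the gold annotation, so a prediction with the right span but the wrong class contributes to neither precision nor recall.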
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, V.H., Nguyen, H.T., Snasel, V. (2015). Named Entity Recognition in Vietnamese Tweets. In: Thai, M., Nguyen, N., Shen, H. (eds) Computational Social Networks. CSoNet 2015. Lecture Notes in Computer Science, vol 9197. Springer, Cham. https://doi.org/10.1007/978-3-319-21786-4_18
DOI: https://doi.org/10.1007/978-3-319-21786-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21785-7
Online ISBN: 978-3-319-21786-4
eBook Packages: Computer Science (R0)