Named Entity Recognition in Vietnamese Tweets

  • Conference paper
Computational Social Networks (CSoNet 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9197)

Abstract

Named entity recognition (NER) is the task of detecting named entities in documents and categorizing them into predefined classes such as Person (PER), Location (LOC), Organization (ORG) and so on. Many approaches have been proposed to tackle this problem, both in formal texts such as news or authorized web content and in short texts such as content in online social networks. However, those texts were written in languages other than Vietnamese. In this paper, we propose a method for NER in Vietnamese tweets. Since tweets on Twitter are noisy, irregular and short, and contain acronyms and spelling errors, NER in those tweets is a challenging task. Our method first normalizes tweets and then applies a learning model to recognize named entities using six different types of features. We built a training set of more than 40,000 named entities and a testing set of 2,446 named entities to evaluate our system. The experimental results show that our system achieves encouraging performance, with an F1 score of 82.3%.
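
For readers unfamiliar with the metric, the following is a minimal sketch of how an entity-level F1 score can be computed. It assumes exact-match scoring, where a predicted entity counts as correct only if both its span and its class agree with the gold annotation; the abstract does not state which matching criterion the paper uses. The `Entity` tuple format, the `entity_f1` function and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical entity format: (start_token, end_token, class_label).
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring (an assumption)."""
    # A prediction is a true positive only if span and label both match.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations, invented for illustration only.
gold = {(0, 1, "PER"), (4, 5, "LOC"), (7, 7, "ORG")}
predicted = {(0, 1, "PER"), (4, 5, "ORG"), (7, 7, "ORG")}

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy case the second prediction has the right span but the wrong class, so it counts as both a false positive and a false negative under exact matching.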

Author information

Corresponding author

Correspondence to Hien T. Nguyen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, V.H., Nguyen, H.T., Snasel, V. (2015). Named Entity Recognition in Vietnamese Tweets. In: Thai, M., Nguyen, N., Shen, H. (eds) Computational Social Networks. CSoNet 2015. Lecture Notes in Computer Science, vol 9197. Springer, Cham. https://doi.org/10.1007/978-3-319-21786-4_18

  • DOI: https://doi.org/10.1007/978-3-319-21786-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21785-7

  • Online ISBN: 978-3-319-21786-4

  • eBook Packages: Computer Science, Computer Science (R0)
