A Hybrid Approach of Pattern Extraction and Semi-supervised Learning for Vietnamese Named Entity Recognition

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7653)

Abstract

Contemporary research on Vietnamese Named Entity Recognition relies on supervised learning, which requires a large hand-annotated corpus that is difficult to obtain. We therefore propose a hybrid approach of pattern extraction and semi-supervised learning. A rule-based method generates patterns automatically. Part-of-speech tagging, lexical diversity, and chunking are explored to define the pattern-extraction rules, and the extracted patterns are used to identify potential named entities. Semi-supervised learning then trains on a small set of seed named entities to categorize the named entities in the extracted patterns. In experiments, our approach improves system accuracy compared with other approaches for Vietnamese.
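The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the proper-noun tag `Np`, the single-word left-context pattern, the majority-vote labelling rule, and the helper names `extract_candidates` and `bootstrap` are all assumptions made for illustration, and the toy sentences are invented. The sketch takes runs of proper-noun tokens as potential named entities, then propagates labels from a few seed entities to unlabeled ones that share the same left context.

```python
from collections import Counter, defaultdict


def extract_candidates(tagged_sentence, proper_tag="Np"):
    """Return (left_context, entity) pairs for each run of proper-noun tokens.

    tagged_sentence is a list of (token, pos) tuples. The tag name "Np"
    is an assumption, not something the abstract specifies.
    """
    candidates = []
    i = 0
    while i < len(tagged_sentence):
        if tagged_sentence[i][1] == proper_tag:
            j = i
            while j < len(tagged_sentence) and tagged_sentence[j][1] == proper_tag:
                j += 1
            entity = " ".join(tok for tok, _ in tagged_sentence[i:j])
            left = tagged_sentence[i - 1][0].lower() if i > 0 else "<s>"
            candidates.append((left, entity))
            i = j
        else:
            i += 1
    return candidates


def bootstrap(candidates, seeds, rounds=3):
    """Propagate seed labels to unlabeled entities that share a left context."""
    labels = dict(seeds)
    for _ in range(rounds):
        # Count which labels each context word co-occurs with so far.
        votes = defaultdict(Counter)
        for left, entity in candidates:
            if entity in labels:
                votes[left][labels[entity]] += 1
        # Label an entity only when its context has a strict majority label.
        new_labels = {}
        for left, entity in candidates:
            if entity not in labels and left in votes:
                ranked = votes[left].most_common(2)
                if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                    new_labels[entity] = ranked[0][0]
        if not new_labels:
            break
        labels.update(new_labels)
    return labels


# Invented toy data: pre-segmented, pre-tagged sentences.
sentences = [
    [("ông", "N"), ("Nguyễn", "Np"), ("Văn", "Np"), ("An", "Np"),
     ("đến", "V"), ("thành phố", "N"), ("Hà Nội", "Np")],
    [("ông", "N"), ("Trần", "Np"), ("Bình", "Np"), ("sống", "V"),
     ("ở", "E"), ("thành phố", "N"), ("Đà Nẵng", "Np")],
]
seeds = {"Nguyễn Văn An": "PER", "Hà Nội": "LOC"}

candidates = [pair for sent in sentences for pair in extract_candidates(sent)]
labels = bootstrap(candidates, seeds)
```

In this toy run, the unlabeled candidates inherit a category from the seeds that appear after the same context word. A realistic system would draw its patterns from chunking and richer lexical features, as the abstract indicates, rather than from a single preceding token.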


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vo, DT., Ock, CY. (2012). A Hybrid Approach of Pattern Extraction and Semi-supervised Learning for Vietnamese Named Entity Recognition. In: Nguyen, NT., Hoang, K., Jȩdrzejowicz, P. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2012. Lecture Notes in Computer Science, vol 7653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34630-9_9

  • DOI: https://doi.org/10.1007/978-3-642-34630-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34629-3

  • Online ISBN: 978-3-642-34630-9

  • eBook Packages: Computer Science, Computer Science (R0)
