Abstract
Supervised learning, the approach taken by most contemporary Vietnamese Named Entity Recognition research, requires a large hand-annotated corpus, which is challenging to obtain. We therefore propose a hybrid approach that combines pattern extraction with semi-supervised learning. A rule-based method is applied to generate patterns automatically. Part-of-speech tagging, lexical diversity, and chunking are explored to define the pattern-extraction rules, which are then used to identify potential named entities. Semi-supervised learning starts from a small set of seed named entities and uses them to categorize the named entities found in the extracted patterns. In experiments, our approach improves system accuracy compared with other approaches for Vietnamese.
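The two stages described above can be sketched roughly as follows. This is not the authors' implementation; it is a minimal illustration under stated assumptions. The input is assumed to be already word-segmented and POS-tagged with a VnTagger-style tagset in which `Np` marks proper nouns. A single chunk rule (a maximal run of `Np` tokens) stands in for the paper's rules, the word immediately before a candidate serves as its extraction pattern, and a simple seed-based context-voting bootstrapping loop stands in for the paper's semi-supervised learner. Lexical diversity is not modelled. All names (`extract_candidates`, `bootstrap`, the `PER`/`LOC` labels, the example sentences) are hypothetical.

```python
from collections import Counter, defaultdict

# ASSUMPTION: input is word-segmented and POS-tagged with a VnTagger-style
# tagset in which "Np" marks proper nouns. Only this tag is used by the rules.
PROPER_NOUN = "Np"


def extract_candidates(tagged):
    """Pattern extraction (hypothetical rule): each maximal run of proper-noun
    tokens is one candidate entity; its pattern is the word just before it.
    Returns a list of (entity, left_context) pairs; left_context is None
    when the entity starts the sentence."""
    candidates = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] != PROPER_NOUN:
            i += 1
            continue
        j = i
        while j < len(tagged) and tagged[j][1] == PROPER_NOUN:
            j += 1
        entity = " ".join(token for token, _ in tagged[i:j])
        left = tagged[i - 1][0].lower() if i > 0 else None
        candidates.append((entity, left))
        i = j
    return candidates


def bootstrap(candidates, seeds, rounds=3, min_votes=2):
    """Seed-based bootstrapping (a stand-in for the paper's learner):
    count which categories each context word co-occurs with among labeled
    entities, then label unlabeled entities whose contexts vote strongly."""
    labels = dict(seeds)
    for _ in range(rounds):
        # context word -> Counter of categories seen with labeled entities
        context_votes = defaultdict(Counter)
        for entity, left in candidates:
            if entity in labels and left is not None:
                context_votes[left][labels[entity]] += 1
        # unlabeled entity -> summed category votes over all its contexts
        entity_votes = defaultdict(Counter)
        for entity, left in candidates:
            if entity not in labels and left in context_votes:
                entity_votes[entity].update(context_votes[left])
        new_labels = {}
        for entity, votes in entity_votes.items():
            category, count = votes.most_common(1)[0]
            if count >= min_votes:
                new_labels[entity] = category
        if not new_labels:
            break  # nothing new was learned; stop early
        labels.update(new_labels)
    return labels


# Hypothetical toy data: four tagged sentences and four seed entities.
sentences = [
    [("ông", "Nc"), ("Nguyễn_Văn_An", "Np"), ("đến", "V"), ("Hà_Nội", "Np")],
    [("ông", "Nc"), ("Trần_Bình", "Np"), ("sống", "V"), ("tại", "E"), ("Huế", "Np")],
    [("ông", "Nc"), ("Lê_Minh", "Np"), ("làm_việc", "V"), ("tại", "E"), ("Đà_Nẵng", "Np")],
    [("ông", "Nc"), ("Phạm_Hùng", "Np"), ("làm_việc", "V"), ("tại", "E"), ("Hà_Nội", "Np")],
]
seeds = {"Nguyễn_Văn_An": "PER", "Trần_Bình": "PER", "Huế": "LOC", "Hà_Nội": "LOC"}

candidates = [c for sentence in sentences for c in extract_candidates(sentence)]
print(bootstrap(candidates, seeds))
```

On this toy input, the context word `ông` ("Mr.") co-occurs only with seed persons and `tại` ("at") only with seed locations, so the first round labels Lê_Minh and Phạm_Hùng as PER and Đà_Nẵng as LOC, and the second round finds nothing new and stops. A realistic system would need far richer rules and a vote threshold tuned on held-out data.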
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vo, DT., Ock, CY. (2012). A Hybrid Approach of Pattern Extraction and Semi-supervised Learning for Vietnamese Named Entity Recognition. In: Nguyen, NT., Hoang, K., Jędrzejowicz, P. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2012. Lecture Notes in Computer Science, vol 7653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34630-9_9
DOI: https://doi.org/10.1007/978-3-642-34630-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34629-3
Online ISBN: 978-3-642-34630-9
eBook Packages: Computer Science, Computer Science (R0)