Abstract
Named entity recognition (NER) is the task of locating atomic elements in text and classifying them into predefined categories such as the names of persons, organizations and locations. Most existing NER systems are based on supervised learning, which often requires a large amount of labelled training data that is very time-consuming to build. To address this problem, we introduce a semi-supervised learning method for recognizing named entities in Vietnamese text that combines proper name coreference and named-ambiguity heuristics with a powerful sequential learning model, Conditional Random Fields. Our approach builds on the idea of Liao and Veeramachaneni [6] and extends it with proper name coreference. The model is first trained on a small, manually annotated data set; it then extracts high-confidence named entities and uses proper name coreference rules to find low-confidence ones. The low-confidence named entities are added to the training set so that the model can learn new context features. With the heuristics proposed by Liao and Veeramachaneni, the F-scores of the system for extracting “Person”, “Location” and “Organization” entities are 83.36%, 69.53% and 65.71%, respectively. With our proposed heuristics, they are 93.13%, 88.15% and 79.35%, respectively, which shows that our method improves the accuracy of the system.
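The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Mention` type, the 0.9 confidence threshold, and the sub-sequence coreference rule in `is_coreferent` are all assumptions made for illustration, and the CRF itself is replaced by hand-written predictions. The sketch only shows how a low-confidence mention that corefers with a high-confidence proper name might inherit that name's label before being added to the training set.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    text: str          # surface form of the candidate entity
    label: str         # predicted type: "PER", "LOC" or "ORG"
    confidence: float  # model confidence in [0, 1] (hypothetical CRF output)


def is_coreferent(short: str, full: str) -> bool:
    """Toy proper-name rule (an assumption, not the paper's rule set):
    `short` corefers with `full` when its tokens form a contiguous
    sub-sequence of the tokens of the longer name."""
    s, f = short.split(), full.split()
    if not s or len(s) >= len(f):
        return False
    return any(f[i:i + len(s)] == s for i in range(len(f) - len(s) + 1))


def select_new_examples(mentions: List[Mention],
                        threshold: float = 0.9) -> List[Mention]:
    """Return low-confidence mentions relabelled with the type of a
    coreferent high-confidence mention from the same document."""
    high = [m for m in mentions if m.confidence >= threshold]
    selected = []
    for m in mentions:
        if m.confidence >= threshold:
            continue  # already confident, nothing new to learn from it
        for h in high:
            if is_coreferent(m.text, h.text):
                # Keep the low confidence score, but trust the label of
                # the confident coreferent name.
                selected.append(Mention(m.text, h.label, m.confidence))
                break
    return selected


# Hypothetical predictions for one document.
doc_mentions = [
    Mention("Nguyen Van An", "PER", 0.97),
    Mention("An", "LOC", 0.42),      # misclassified short form
    Mention("Ha Noi", "LOC", 0.95),
    Mention("Hue", "LOC", 0.50),     # no confident coreferent name
]
new_examples = select_new_examples(doc_mentions)
print(new_examples)
# [Mention(text='An', label='PER', confidence=0.42)]
```

In a full bootstrapping loop, the sentences containing `new_examples` would be appended to the training set and the sequence model retrained, repeating until no new examples are found.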
References
Blum, A., Mitchell, T.: Combining Labelled and Unlabelled Data with Co-training. In: Proceedings of the Workshop on Computational Learning Theory, pp. 92–100 (1998)
Bikel, D., Schwartz, R., Weischedel, R.: An Algorithm that Learns What’s in a Name. Machine Learning 34(1-3), 211–231 (1999)
Borthwick, A.: Maximum Entropy Approach to Named Entity Recognition. Ph.D. Thesis, New York University (1999)
Culotta, A., McCallum, A.: Confidence Estimation for Information Extraction. In: Proceedings of HLT-NAACL, pp. 109–112 (2004)
Nguyen, T.H., Cao, H.T.: An Approach to Entity Coreference and Ambiguity Resolution in Vietnamese Texts. Vietnamese Journal of Post and Telecommunication 19, 74–83 (2008)
Liao, W., Veeramachaneni, S.: A Simple Semi-supervised Algorithm for Named Entity Recognition. In: Proceedings of the NAACL HLT Workshop on Semi-supervised Learning for Natural Language Processing, pp. 28–36 (2009)
Lafferty, J., McCallum, A., Pereira, F.: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In: Proceedings of ICML Conference, pp. 282–290 (2001)
Le, H.P., Roussanaly, A., Nguyen, T.M.H., Rossignol, M.: An Empirical Study of Maximum Entropy Approach for Part of Speech Tagging of Vietnamese Texts. In: Proceedings of TALN 2010 Conference, Canada (2010)
McCallum, A., Li, W.: Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-enhanced Lexicons. In: Proceedings of CoNLL, Canada, pp. 188–191 (2003)
Malouf, R.: A Comparison of Algorithms for Maximum Entropy Parameter Estimation. In: Sixth Workshop on Computational Language Learning, CoNLL (2002)
Mohit, B., Hwa, R.: Syntax-based semi-supervised Named Entity Tagging. In: Proceedings of the ACL Interactive Poster and Demonstration Sessions, Michigan, pp. 57–60 (2005)
Nguyen, C.T., Tran, T.O., Phan, X.H., Ha, Q.T.: Named Entity Recognition in Vietnamese Free-Text and Web Documents Using Conditional Random Fields. In: Proceedings of the 8th Conference on Some Selection Problems of Information Technology and Telecommunication, Hai Phong, Vietnam (2005)
Niu, C., Li, W., Ding, J., Rohini, K.S.: A Bootstrapping Approach to Named Entity Classification Using Successive Learner. In: Proceedings of the 41st Annual Meeting of the ACL, pp. 335–342 (2003)
Perrow, M., Barber, D.: Tagging of Name Record for Genealogical Data Browsing. In: Proceedings of the 6th ACM/IEEE JCDL, Chapel Hill, NC, USA, pp. 316–325 (2006)
Tran, Q.T., Pham, T.X.T., Ngo, Q.H., Dinh, D., Collier, N.: Named Entity Recognition in Vietnamese Using Classifier Voting. ACM Transactions on Asian Language Information Processing (TALIP) (2007)
Wong, Y., Ng, H.T.: One Class per Named Entity: Exploiting Unlabelled Text for Named Entity Recognition. In: Proceedings of IJCAI, pp. 1763–1768 (2007)
Yarowsky, D.: Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In: Proceedings of Meeting of the Association for Computational Linguistics, pp. 189–196 (1995)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sam, R.C., Le, H.T., Nguyen, T.T., Nguyen, T.H. (2011). Combining Proper Name-Coreference with Conditional Random Fields for Semi-supervised Named Entity Recognition in Vietnamese Text. In: Huang, J.Z., Cao, L., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 6634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20841-6_42
DOI: https://doi.org/10.1007/978-3-642-20841-6_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20840-9
Online ISBN: 978-3-642-20841-6
eBook Packages: Computer Science (R0)