Bootstrapping and Rule-Based Model for Recognizing Vietnamese Named Entity

  • Hieu Le Trung
  • Vu Le Anh
  • Kien Le Trung
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8398)

Abstract

This paper addresses the problem of Vietnamese Named Entity recognition and classification (VNER) using a bootstrapping algorithm and a rule-based model. The rule-based model relies on contextual rules to provide contextual evidence that a Vietnamese named entity (VNE) belongs to a category. These rules, which exploit the linguistic constraints of a category, are constructed by the bootstrapping algorithm. The bootstrapping algorithm starts with a handful of seed VNEs of a given category and accumulates all contextual rules found around these seeds in a large corpus. These rules are then ranked and used to find new VNEs.
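The loop described above (seeds, rule collection, ranking, new entities) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not taken from the paper: a rule is simply the word to the left of a candidate, rules are scored by the fraction of their matches that are already known entities, and the names `bootstrap`, `top_rules`, and `min_score` as well as the toy corpus are hypothetical. The paper's actual rule form and ranking function are not given in the abstract.

```python
from collections import defaultdict


def left_context(tokens, i):
    """Hypothetical rule form: the word immediately left of position i."""
    return tokens[i - 1] if i > 0 else "<s>"


def bootstrap(corpus, seeds, rounds=2, top_rules=2, min_score=0.5):
    """Grow an entity set for one category from seed entities.

    Assumed scoring (not the paper's): score(rule) = hits / total, where
    hits counts matches that are known entities and total counts all matches.
    """
    entities = set(seeds)
    rules = set()
    for _ in range(rounds):
        hits = defaultdict(int)
        total = defaultdict(int)
        # 1. Collect the contexts found around currently known entities.
        for sent in corpus:
            for i, tok in enumerate(sent):
                ctx = left_context(sent, i)
                total[ctx] += 1
                if tok in entities:
                    hits[ctx] += 1
        # 2. Rank candidate rules by score, then by number of hits.
        ranked = sorted(
            ((hits[c] / total[c], hits[c], c) for c in hits),
            reverse=True,
        )
        rules |= {c for score, _, c in ranked[:top_rules] if score >= min_score}
        # 3. Apply the accepted rules to find new entities.
        for sent in corpus:
            for i, tok in enumerate(sent):
                if left_context(sent, i) in rules:
                    entities.add(tok)
    return entities, rules


# Toy, pre-segmented corpus (illustrative only) for a "city" category.
corpus = [
    ["thành_phố", "Hà_Nội", "đẹp"],
    ["thành_phố", "Huế", "đẹp"],
    ["thành_phố", "Đà_Nẵng", "lớn"],
    ["ông", "Nam", "đến", "Huế"],
]
entities, rules = bootstrap(corpus, seeds={"Hà_Nội", "Huế"})
print(sorted(entities))
print(sorted(rules))
```

In this toy run the seed cities make the left-context word "thành_phố" a highly ranked rule, which in turn picks up an unseen city, while the person name "Nam" is not matched by any accepted rule. A real system would need richer contexts and a guard against rules that drift across categories.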

Our experimental corpus is built from about 250,034 online news articles and over 9,000 literary works. Our VNER system covers 27 categories and more than 300,000 VNEs, which have been recognized and categorized. The accuracy of the recognition and classification algorithm is about 95%.

Keywords

Natural Language Processing; Name Entity Recognition; Entity Recognition; Category Member; Confidence Function
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hieu Le Trung (1)
  • Vu Le Anh (2)
  • Kien Le Trung (3)
  1. Duy Tan University, Da Nang, Vietnam
  2. Nguyen Tat Thanh University, Ho Chi Minh, Vietnam
  3. Hue University of Sciences, Hue, Vietnam
