Advertisement

Ripple Down Rules for Part-of-Speech Tagging

  • Dat Quoc Nguyen
  • Dai Quoc Nguyen
  • Son Bao Pham
  • Dang Duc Pham
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6608)

Abstract

This paper presents a new approach to learn a rule based system for the task of part of speech tagging. Our approach is based on an incremental knowledge acquisition methodology where rules are stored in an exception-structure and new rules are only added to correct errors of existing rules; thus allowing systematic control of interaction between rules. Experimental results of our approach on English show that we achieve in the best accuracy published to date: 97.095% on the Penn Treebank corpus. We also obtain the best performance for Vietnamese VietTreeBank corpus.

Keywords

Transformation Rule Unknown Word Exception Rule Current Word Ripple Down Rules 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Brants, T.: Tnt – a statistical part-of-speech tagger. In: Proc. ANLP, pp. 224–231 (2000)Google Scholar
  2. 2.
    Collins, M.: Discriminative training methods for hidden markov models: theory and experiments with perceptron algorithms. In: Proc. EMNLP, pp. 1–8 (2002)Google Scholar
  3. 3.
    Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proc. NAACL-HLT, pp. 173–180 (2003)Google Scholar
  4. 4.
    Tsuruoka, Y., Tsujii, J.: Bidirectional inference with the easiest-first strategy for tagging sequence data. In: Proc. HLT-EMNLP, pp. 467–474 (2005)Google Scholar
  5. 5.
    Giménez, J., Màrquez, L.: Svmtool: A general pos tagger generator based on support vector machines. In: Proc. LREC, pp. 43–46 (2004)Google Scholar
  6. 6.
    Shen, L., Satta, G., Joshi, A.: Guided learning for bidirectional sequence classification. In: Proc. ACL, pp. 760–767 (2007)Google Scholar
  7. 7.
    Brill, E.: Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics 21(4), 543–565 (1995)MathSciNetGoogle Scholar
  8. 8.
    Hepple, M.: Independence and commitment: assumptions for rapid training and execution of rule-based pos taggers. In: Proc. ACL, pp. 278–277 (2000)Google Scholar
  9. 9.
    Ngai, G., Florian, R.: Transformation-based learning in the fast lane. In: Proc. NAACL, pp. 1–8 (2001)Google Scholar
  10. 10.
    Compton, P., Jansen, R.: Knowledge in context: a strategy for expert system maintenance. In: Proc. AI 1988, pp. 292–306 (1988)Google Scholar
  11. 11.
    Compton, P., Jansen, R.: A philosophical basis for knowledge acquisition. Knowl. Acquis. 2(3), 241–257 (1990)CrossRefGoogle Scholar
  12. 12.
    Richards, D.: Two decades of ripple down rules research. Knowl. Eng. Rev. 24(2), 159–184 (2009)CrossRefGoogle Scholar
  13. 13.
    Nguyen, P.T., Vu, X.L., Nguyen, T.M.H., Nguyen, V.H., Le, H.P.: Building a large syntactically-annotated corpus of vietnamese. In: Proc. LAW, pp. 182–185 (2009)Google Scholar
  14. 14.
    Florian, R., Henderson, J.C., Ngai, G.: Coaxing confidences from an old friend: probabilistic classifications from transformation rule lists. In: Proc. EMNLP, pp. 26–34 (2000)Google Scholar
  15. 15.
    Pham, S.B., Hoffmann, A.: Efficient knowledge acquisition for extracting temporal relations. In: Proc. ECAI, pp. 521–525 (2006)Google Scholar
  16. 16.
    Dien, D., Kiem, H.: Pos-tagger for english-vietnamese bilingual corpus. In: Proc. HLT-NAACL WBT, pp. 88–95 (2003)Google Scholar
  17. 17.
    Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proc. ICML, pp. 282–289 (2001)Google Scholar
  18. 18.
    Tran, O.T., Le, C.A., Ha, T.Q., Le, Q.H.: An experimental study on vietnamese pos tagging. In: Proc. IALP, pp. 23–27 (2009)Google Scholar
  19. 19.
    Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of english: the penn treebank. Computational Linguistics 19(2), 313–330 (1993)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Dat Quoc Nguyen
    • 1
  • Dai Quoc Nguyen
    • 1
  • Son Bao Pham
    • 1
    • 2
  • Dang Duc Pham
    • 1
  1. 1.Human Machine Interaction Laboratory, Faculty of Information TechnologyUniversity of Engineering and Technology, Vietnam National UniversityHanoiViet Nam
  2. 2.Information Technology InstituteVietnam National UniversityHanoiViet Nam

Personalised recommendations