Improving the Performance of a Named Entity Recognition System with Knowledge Acquisition

  • Myung Hee Kim
  • Paul Compton
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7603)


Named Entity Recognition (NER) is important for extracting information from highly heterogeneous web documents. Most NER systems have been developed on formal documents, but informal web documents usually contain noise as well as incorrect and incomplete expressions. The performance of current NER systems drops dramatically as web documents become more informal, so a different kind of NER is needed. Here we propose a Ripple-Down-Rules-based Named Entity Recognition (RDRNER) system. This is a wrapper around the machine-learning-based Stanford NER system that corrects its output using rules added by people to deal with specific application domains. The key advantages of this approach are that it can handle the freer writing style that occurs in web documents and correct errors introduced by the web’s informal characteristics. In these studies, the Ripple-Down Rules approach, with low-cost rule addition, improved the Stanford NER system’s performance on informal web documents in a specific domain to the same level as its state-of-the-art performance on formal documents.
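As a rough illustration of the wrapper idea in the abstract, the following minimal sketch applies a small single-classification Ripple-Down Rules tree to the output of a base tagger. It is not the authors' implementation: the base tagger output is hard-coded here rather than produced by Stanford NER, and every name (Case, Rule, classify, correct, KNOWN_LOCATIONS, TITLES) and both example rules are hypothetical, chosen only to show how a rule and its exception can override a base label.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]  # (word, label assigned by the base tagger)


@dataclass
class Case:
    """One token in context: the unit that rules are evaluated against."""
    tokens: List[Token]
    index: int

    @property
    def word(self) -> str:
        return self.tokens[self.index][0]

    @property
    def base_label(self) -> str:
        return self.tokens[self.index][1]

    @property
    def prev_word(self) -> str:
        return self.tokens[self.index - 1][0] if self.index > 0 else ""


@dataclass
class Rule:
    """A node in a single-classification RDR tree."""
    condition: Callable[[Case], bool]
    conclusion: str
    if_true: Optional["Rule"] = None   # exception, tried when this rule fires
    if_false: Optional["Rule"] = None  # alternative, tried when it does not


def classify(root: Optional[Rule], case: Case, default: str) -> str:
    """Return the conclusion of the last rule that fired, else the default."""
    conclusion = default
    node = root
    while node is not None:
        if node.condition(case):
            conclusion = node.conclusion
            node = node.if_true
        else:
            node = node.if_false
    return conclusion


def correct(tokens: List[Token], root: Optional[Rule]) -> List[Token]:
    """Wrap the base tagger: keep its label unless some rule overrides it."""
    return [
        (word, classify(root, Case(tokens, i), default=label))
        for i, (word, label) in enumerate(tokens)
    ]


# Hypothetical domain knowledge, added by a person after seeing tagger errors.
KNOWN_LOCATIONS = {"sydney"}
TITLES = {"mr", "ms", "dr"}

# Rule: an untagged word found in the location list is a LOCATION,
# except when it follows a personal title, in which case it is a PERSON.
root = Rule(
    condition=lambda c: c.base_label == "O" and c.word.lower() in KNOWN_LOCATIONS,
    conclusion="LOCATION",
    if_true=Rule(
        condition=lambda c: c.prev_word.lower() in TITLES,
        conclusion="PERSON",
    ),
)

# Stand-in for base tagger output on informal, lowercased text (assumed).
base_output = [("i", "O"), ("luv", "O"), ("sydney", "O")]
print(correct(base_output, root))
```

In this sketch a new correction is added as an exception or alternative to an existing rule rather than by editing that rule, which is one way to keep the cost of each rule addition low.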


Keywords: Ripple-Down Rules, Named Entity Recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Myung Hee Kim (1)
  • Paul Compton (1)
  1. The University of New South Wales, Sydney, Australia
