Skip to main content

Improving the Performance of a Named Entity Recognition System with Knowledge Acquisition

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI,volume 7603)

Abstract

Named Entity Recognition (NER) is important for extracting information from highly heterogeneous web documents. Most NER systems have been developed based on formal documents, but informal web documents usually contain noise, and incorrect and incomplete expressions. The performance of current NER systems drops dramatically as informality increases in web documents and a different kind of NER is needed. Here we propose a Ripple-Down-Rules-based Named Entity Recognition (RDRNER) system. This is a wrapper around the machine-learning-based Stanford NER system, correcting its output using rules added by people to deal with specific application domains. The key advantages of this approach are that it can handle the freer writing style that occurs in web documents and correct errors introduced by the web’s informal characteristics. In these studies the Ripple-Down Rule approach, with low-cost rule addition improved the Stanford NER system’s performance on informal web document in a specific domain to the same level as its state-of-the-art performance on formal documents.

Keywords

  • Ripple-Down Rules
  • Named Entity Recognition

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (Canada)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (Canada)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (Canada)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Califf, M.E., Mooney, R.J.: Relational Learning of Pattern-Match Rules for Information Extraction. In: ACL 1997 Workshop in Natural Language Learning (1997)

    Google Scholar 

  2. Rozenfeld, B., Feldman, R.: Self-supervised relation extraction from the Web. Knowl. Inf. Syst. 17, 17–33 (2008)

    CrossRef  Google Scholar 

  3. Collot, M., Belmore, N.: Electronic Language: A New Variety of English. In: Computer-Mediated Communications: Linguistic, Social and Cross-Cultural Perspectives. John Benjamins, Amsterdam/Philadelphia (1996)

    Google Scholar 

  4. Rau, L.F.: Extracting Company Names from Text. In: 6th IEEE Conference on Artificial Intelligence Applications. IEEE Computer Society Press, Miami Beach (1991)

    Google Scholar 

  5. Kang, B.H., Compton, P., Preston, P.: Multiple Classification Ripple Down Rules: Evaluation and Possibilities. In: 9th Banff Knowledge Acquisition for Knowledge Based Systems Workshop (1995)

    Google Scholar 

  6. Bunescu, R.C., Mooney, R.J.: Learning to Extract Relations from the Web using Minimal Supervision. In: 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic (2007)

    Google Scholar 

  7. Asahara, M., Matsumoto, Y.: Japanese Named Entity Extraction with Redundant Morphological Analysis. In: Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics (2003)

    Google Scholar 

  8. Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Linguisticae Investigationes 30, 3–26 (2007)

    CrossRef  Google Scholar 

  9. Etzioni, O., Cafarella, M., Downey, D., Popescu, A., Shaked, T., Soderland, S., Weld, D., Yates, A.: Unsupervised named-entity extraction from the Web: An experimental study. Artif. Intell. 165, 91–134 (2005)

    CrossRef  Google Scholar 

  10. Nguyen, D.P.T., Matsuo, Y., Ishizuka, M.: Relation extraction from wikipedia using subtree mining. In: 22nd National Conference on Artificial Intelligence, vol. 2, pp. 1414–1420. AAAI Press (2007)

    Google Scholar 

  11. Zhu, J., Nie, Z., Liu, X., Zhang, B., Wen, J.R.: StatSnowball: a statistical approach to extracting entity relationships. In: 18th International Conference on World Wide Web, pp. 101–110. ACM, Madrid (2009)

    CrossRef  Google Scholar 

  12. Zacharias, V.: Development and Verification of Rule Based Systems — A Survey of Developers. In: Bassiliades, N., Governatori, G., Paschke, A. (eds.) RuleML 2008. LNCS, vol. 5321, pp. 6–16. Springer, Heidelberg (2008)

    CrossRef  Google Scholar 

  13. Toral, A., Muñoz, R.: A proposal to automatically build and maintain gazetteers for Named Entity Recognition by using Wikipedia. In: 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy (2006)

    Google Scholar 

  14. Kazama, J.i., Torisawa, K.: ExploitingWikipedia as External Knowledge for Named Entity Recognition. In: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic (2007)

    Google Scholar 

  15. Banko, M., Etzioni, O.: The Tradeoffs Between Open and Traditional Relation Extraction. In: ACL 2008: HLT (2008)

    Google Scholar 

  16. Riloff, E., Jones, R.: Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. In: 16th National Conference on Artificial Intelligence and the 11th Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence (1999)

    Google Scholar 

  17. Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by gibbs sampling. In: The Association for Computer Linguistics (2005)

    Google Scholar 

  18. Ratinov, L., Roth, D.: Design Challenges and Misconceptions in Named Entity Recognition. In: CONLL 2009 (2009)

    Google Scholar 

  19. Nadeau, D., Turney, P.D., Matwin, S.: Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity. In: Lamontagne, L., Marchand, M. (eds.) Canadian AI 2006. LNCS (LNAI), vol. 4013, pp. 266–277. Springer, Heidelberg (2006)

    CrossRef  Google Scholar 

  20. Mikheev, A., Moens, M., Grover, C.: Named Entity recognition without gazetteers. In: 9th Conference on European Chapter of the Association for Computational Linguistics, pp. 1–8. Association for Computational Linguistics, Bergen (1999)

    CrossRef  Google Scholar 

  21. Liu, X., Zhang, S., Wei, F., Zhou, M.: Recognizing Named Entities in Tweets. In: 49th Association for Computational Linguistics, pp. 359–367 (2011)

    Google Scholar 

  22. Compton, P., Peters, L., Lavers, T., Kim, Y.S.: Experience with long-term knowledge acquisition. In: 6th International Conference on Knowledge Capture, pp. 49–56. ACM, Banff (2011)

    CrossRef  Google Scholar 

  23. Pham, S.B., Hoffmann, A.: Extracting Positive Attributions from Scientific Papers. In: Discovery Science Conference (2004)

    Google Scholar 

  24. Pham, S.B., Hoffmann, A.: Efficient Knowledge Acquisition for Extracting Temporal Relations. In: 17th European Conference on Artificial Intelligence, Riva del Garda, Italy (2006)

    Google Scholar 

  25. Xu, H., Hoffmann, A.: RDRCE: Combining Machine Learning and Knowledge Acquisition. In: Pacific Rim Knowledge Acquisition Workshop (2010)

    Google Scholar 

  26. Kim, M.H., Compton, P., Kim, Y.S.: RDR-based Open IE for the Web Document. In: 6th International Conference on Knowledge Capture, Banff, Alberta, Canada (2011)

    Google Scholar 

  27. Clark, A., Tim, I.: Combining Distributional and Morphological Information for Part of Speech Induction. In: 10th Annual Meeting of the European Association for Computational Linguistics (2003)

    Google Scholar 

  28. Ho, V.H., Compton, P., Benatallah, B., Vayssiere, J., Menzel, L., Vogler, H.: An incremental knowledge acquisition method for improving duplicate invoices detection. In: Proceedings of the International Conference on Data Engineering, Shanghai, China, pp. 1415–1418 (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, M.H., Compton, P. (2012). Improving the Performance of a Named Entity Recognition System with Knowledge Acquisition. In: ten Teije, A., et al. Knowledge Engineering and Knowledge Management. EKAW 2012. Lecture Notes in Computer Science(), vol 7603. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33876-2_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-33876-2_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33875-5

  • Online ISBN: 978-3-642-33876-2

  • eBook Packages: Computer ScienceComputer Science (R0)