The question of how to evaluate information extraction (IE) systems experimentally has yet to receive a satisfactory answer in the literature. In this paper we propose a novel evaluation model for IE and argue that, among other things, it allows (i) a correct appreciation of the degree of overlap between predicted and true segments, and (ii) a fair evaluation of the ability of a system to correctly identify segment boundaries. We describe the properties of this model and illustrate them by re-evaluating the results of the CoNLL’03 and CoNLL’02 Shared Tasks on Named Entity Extraction.
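To make point (i) concrete, the sketch below scores a prediction by the tokens it shares with the true annotation rather than by exact segment match. This is only an illustration under stated assumptions: the abstract does not spell out the proposed model, so the token-level F1 used here, the `(start, end)` segment representation, and the names `tokens_of` and `token_f1` are hypothetical choices made for the example, not the authors' definitions.

```python
from typing import Iterable, Set, Tuple

# Assumption: a segment is a (start, end) pair of token offsets, end exclusive.
Segment = Tuple[int, int]


def tokens_of(segments: Iterable[Segment]) -> Set[int]:
    """Return the set of token positions covered by the given segments."""
    covered: Set[int] = set()
    for start, end in segments:
        covered.update(range(start, end))
    return covered


def token_f1(true_segments: Iterable[Segment],
             predicted_segments: Iterable[Segment]) -> float:
    """Token-level F1: partial overlap earns partial credit."""
    true_tokens = tokens_of(true_segments)
    pred_tokens = tokens_of(predicted_segments)
    shared = len(true_tokens & pred_tokens)
    precision = shared / len(pred_tokens) if pred_tokens else 0.0
    recall = shared / len(true_tokens) if true_tokens else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A prediction that misses only the first token of a four-token entity.
score = token_f1([(0, 4)], [(1, 4)])
print(round(score, 3))
```

In this example the prediction recovers three of the four true tokens (precision 1.0, recall 0.75), so it receives partial credit, whereas an exact-match criterion would count it as a complete miss.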







Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Andrea Esuli (1)
  • Fabrizio Sebastiani (1)
  1. Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy
