Language Resources and Evaluation, Volume 42, Issue 4, pp 361–393

Evaluation of machine learning-based information extraction algorithms: criticisms and recommendations

  • Alberto Lavelli
  • Mary Elaine Califf
  • Fabio Ciravegna
  • Dayne Freitag
  • Claudio Giuliano
  • Nicholas Kushmerick
  • Lorenza Romano
  • Neil Ireson

Abstract

We survey the evaluation methodology adopted in information extraction (IE), as defined in a number of efforts applying machine learning (ML) to IE. We identify several critical issues that hamper comparison of the results obtained by different researchers. Some of these issues are common to other NLP tasks: e.g., the difficulty of exactly isolating the effects on performance of the data (sample selection and sample size), of the domain theory (the features selected), and of the algorithm parameter settings. Others are specific to IE: how leniently to assess inexact identification of filler boundaries, how to handle the possibility of multiple fillers for a slot, and how the counting is performed. We argue that, when specifying an IE task, these issues should be explicitly addressed and a number of methodological characteristics should be clearly defined. To empirically verify the practical impact of these issues, we survey the results of different algorithms applied to a few standard datasets. The survey shows a serious lack of consensus on these issues, which makes it difficult to draw firm conclusions from a comparative evaluation of the algorithms. Our aim is to elaborate a clear and detailed experimental methodology and to propose it to the IE community; widespread agreement on this proposal should lead to future IE comparative evaluations that are fair and reliable. To demonstrate how the methodology is to be applied, we organized and ran a comparative evaluation of ML-based IE systems (the Pascal Challenge on ML-based IE) in which the principles described in this article are put into practice. In this article we describe the proposed methodology and its motivations, then present the Pascal evaluation and its results.
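The boundary-matching and counting issues named above are easiest to see concretely. Below is a minimal Python sketch written for this summary, not taken from the article or the Pascal Challenge scorer: the character-offset span representation, the "overlap" leniency criterion, and the one_per_slot counting switch are illustrative assumptions.

```python
# Sketch of two scoring choices discussed in the abstract: how leniently
# filler boundaries are matched, and how multiple fillers are counted.
# All names and conventions here are illustrative, not the actual scorer.

def spans_match(gold, pred, mode="exact"):
    """Compare a gold and a predicted filler span, given as (start, end)
    character offsets.

    mode="exact":   boundaries must coincide exactly.
    mode="overlap": any character overlap counts (a lenient criterion).
    """
    if mode == "exact":
        return gold == pred
    # Lenient: spans overlap if neither ends before the other starts.
    return gold[0] < pred[1] and pred[0] < gold[1]

def score_slot(gold_spans, pred_spans, mode="exact", one_per_slot=False):
    """Micro-averaged precision/recall/F1 for a single slot.

    one_per_slot=True credits at most one correct filler for the slot
    (a "one answer per slot" convention); False counts every matched
    occurrence (an "all occurrences" convention).
    """
    unmatched = list(gold_spans)
    tp = 0
    for pred in pred_spans:
        hit = next((g for g in unmatched if spans_match(g, pred, mode)), None)
        if hit is not None:
            tp += 1
            unmatched.remove(hit)
            if one_per_slot:
                break  # only the first correct filler counts
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: one gold filler; two predictions, one with slightly wrong
# boundaries. The two matching modes score the same output very differently.
gold = [(10, 25)]
pred = [(10, 23), (40, 50)]
print(score_slot(gold, pred, mode="exact"))    # (0.0, 0.0, 0.0)
print(score_slot(gold, pred, mode="overlap"))  # (0.5, 1.0, ~0.67)
```

The example makes the abstract's point directly: the same system output scores 0.0 F1 under exact boundary matching but about 0.67 under a lenient overlap criterion, so results reported under unstated conventions cannot be compared across papers.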

Keywords

Evaluation methodology · Information extraction · Machine learning

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Alberto Lavelli (1)
  • Mary Elaine Califf (2)
  • Fabio Ciravegna (3)
  • Dayne Freitag (4)
  • Claudio Giuliano (1)
  • Nicholas Kushmerick (5)
  • Lorenza Romano (1)
  • Neil Ireson (3)

  1. FBK-irst, Povo, Italy
  2. Illinois State University, Normal, USA
  3. University of Sheffield, Sheffield, UK
  4. Fair Isaac Corporation, San Diego, USA
  5. Decho Corporation, Seattle, USA
