Language Resources and Evaluation, Volume 40, Issue 2, pp 183–201

Fact distribution in Information Extraction

Original Paper

DOI: 10.1007/s10579-006-9014-4

Cite this article as:
Stevenson, M. Lang Resources & Evaluation (2006) 40: 183. doi:10.1007/s10579-006-9014-4


Several recent Information Extraction (IE) systems have been restricted to identifying facts that are described within a single sentence. It is not clear what effect this restriction has on the difficulty of the extraction task, or how the performance of systems that consider only single sentences should be compared with that of systems that consider multiple sentences. This paper compares three IE evaluation corpora from the Message Understanding Conferences and finds that a significant proportion of the facts mentioned in them are not described within a single sentence. Systems evaluated only on facts described within single sentences are therefore tested against a limited portion of the relevant information in the text, and their performance is difficult to compare with that of other systems. Further analysis demonstrates that anaphora resolution and world knowledge are required to combine information described across multiple sentences. This result has implications for the development and evaluation of IE systems.
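The single-sentence versus multi-sentence distinction can be made concrete with a small measurement. The sketch below is illustrative only and is not the paper's method: it assumes a hypothetical representation in which each fact maps argument (slot) names to the character spans where that argument is mentioned (names, pronouns and so on), and sentence boundaries are supplied as spans. It also assumes a fact counts as "single-sentence" when one sentence contains a mention of every argument; the paper's exact criterion may differ. `Span`, `is_single_sentence` and `single_sentence_proportion` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Span:
    """Half-open character span [start, end) in a document."""
    start: int
    end: int

    def within(self, other: "Span") -> bool:
        return other.start <= self.start and self.end <= other.end

# Hypothetical fact representation: argument name -> mention spans.
Fact = Dict[str, List[Span]]

def sentences_mentioning(mentions: List[Span], sentences: List[Span]) -> set:
    """Indices of sentences that wholly contain at least one mention."""
    return {i for i, s in enumerate(sentences) for m in mentions if m.within(s)}

def is_single_sentence(fact: Fact, sentences: List[Span]) -> bool:
    """True if some single sentence mentions every argument of the fact."""
    if not fact:
        return False
    per_arg = [sentences_mentioning(ms, sentences) for ms in fact.values()]
    return bool(set.intersection(*per_arg))

def single_sentence_proportion(facts: List[Fact], sentences: List[Span]) -> float:
    """Fraction of facts whose arguments co-occur in one sentence."""
    if not facts:
        return 0.0
    return sum(is_single_sentence(f, sentences) for f in facts) / len(facts)

# Toy document (invented for illustration).
text = "Smith was appointed chairman. He replaces Jones."
sentences = [Span(0, 29), Span(30, 48)]
smith = [Span(0, 5), Span(30, 32)]  # "Smith" and the pronoun "He"
appointment = {"person": smith, "post": [Span(20, 28)]}
succession = {"incoming": smith, "outgoing": [Span(42, 47)], "post": [Span(20, 28)]}
print(single_sentence_proportion([appointment, succession], sentences))  # 0.5
```

In this toy example the appointment fact is found within the first sentence, but the succession fact spreads its arguments across both sentences. Linking them requires knowing that "He" refers to Smith (anaphora resolution) and that the post being vacated is the chairmanship (world knowledge), which mirrors the kind of cross-sentence combination the abstract describes.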


Information Extraction; Evaluation; Message Understanding Conferences

Copyright information

© Springer Science+Business Media 2007

Authors and Affiliations

  1. Department of Computer Science, University of Sheffield, Sheffield, UK