Original Paper

Language Resources and Evaluation, Volume 40, Issue 2, pp 183-201

Fact distribution in Information Extraction

  • Mark Stevenson, Department of Computer Science, University of Sheffield


Several recent Information Extraction (IE) systems have been restricted to the identification of facts which are described within a single sentence. It is not clear what effect this has on the difficulty of the extraction task, or how the performance of systems which consider only single sentences should be compared with those which consider multiple sentences. This paper compares three IE evaluation corpora, from the Message Understanding Conferences, and finds that a significant proportion of the facts mentioned therein are not described within a single sentence. Therefore, systems which are evaluated only on facts described within single sentences are being tested against a limited portion of the relevant information in the text, and it is difficult to compare their performance with other systems. Further analysis demonstrates that anaphora resolution and world knowledge are required to combine information described across multiple sentences. This result has implications for the development and evaluation of IE systems.


Keywords: Information Extraction, Evaluation, Message understanding conferences