Fact distribution in Information Extraction
- Mark Stevenson
Several recent Information Extraction (IE) systems have been restricted to the identification of facts which are described within a single sentence. It is not clear what effect this restriction has on the difficulty of the extraction task, or how the performance of systems which consider only single sentences should be compared with that of systems which consider multiple sentences. This paper compares three IE evaluation corpora, from the Message Understanding Conferences, and finds that a significant proportion of the facts mentioned therein are not described within a single sentence. Systems which are evaluated only on facts described within single sentences are therefore being tested against a limited portion of the relevant information in the text, and it is difficult to compare their performance with that of other systems. Further analysis demonstrates that anaphora resolution and world knowledge are required to combine information described across multiple sentences. This result has implications for the development and evaluation of IE systems.
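The kind of measurement the abstract describes, the share of facts whose parts all fall within one sentence, can be illustrated with a minimal sketch. This is not the paper's procedure: the fact representation, the slot names, and the toy data below are all assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical representation (not from the paper): each fact maps a slot
# name to the index of the sentence in which that slot's filler is mentioned.
Fact = Dict[str, int]


def is_single_sentence(fact: Fact) -> bool:
    """Return True when every slot filler is mentioned in the same sentence."""
    return len(set(fact.values())) <= 1


def single_sentence_proportion(facts: List[Fact]) -> float:
    """Return the fraction of facts fully described within one sentence."""
    if not facts:
        return 0.0
    return sum(is_single_sentence(f) for f in facts) / len(facts)


# Toy data, invented for illustration only; these are not corpus figures.
toy_facts = [
    {"person": 0, "post": 0, "organisation": 0},  # one sentence
    {"person": 2, "post": 2, "organisation": 1},  # spans two sentences
    {"person": 4, "post": 5, "organisation": 4},  # spans two sentences
    {"person": 7, "post": 7, "organisation": 7},  # one sentence
]

print(single_sentence_proportion(toy_facts))
```

A system restricted to single sentences could at best recover the facts counted by `is_single_sentence`, which is why the proportion bounds how much of the annotated information such a system is actually tested against.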
Language Resources and Evaluation
Volume 40, Issue 2, pp 183-201
- Kluwer Academic Publishers
- Information Extraction
- Message understanding conferences
- Mark Stevenson (1)
- Author Affiliations
- 1. Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, S1 4DP, Sheffield, UK