Predicting Relevance of Event Extraction for the End User

  • Silja Huttunen
  • Arto Vihavainen
  • Mian Du
  • Roman Yangarber
Part of the Theory and Applications of Natural Language Processing book series (NLP)


We present work on estimating how relevant the results of an Event Extraction system are to the end user’s needs. Our aim is to develop user-oriented measures of the utility of extracted events, i.e., how useful the factual information found in a document is to the end user. We introduce discourse and lexical features, and build classifiers that learn from users’ ratings of the relevance of the extraction results. Traditional criteria for evaluating the performance of Information Extraction (IE) focus on the correctness of the extracted information, e.g., in terms of recall, precision, and F-measure. We focus instead on subjective criteria for evaluating the quality of the extracted information: the utility of the results to the end user. To measure utility, we use methods from text mining and linguistic analysis to identify features that are good predictors of the relevance of an event or a document. We report on experiments in two real-world event extraction domains: corporate activities reported in business news, and health threats reported in news about infectious epidemics.
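For readers less familiar with the correctness-based criteria mentioned above, the following is a minimal sketch of how precision, recall, and F-measure are commonly computed for a set of extracted facts. It is not taken from the chapter: the function name `precision_recall_f`, the representation of facts as (slot, value) pairs, and the example data are all assumptions made for illustration.

```python
def precision_recall_f(extracted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for extracted facts vs. a gold standard.

    Facts are compared by exact match; this is an illustrative assumption,
    not the matching scheme of any particular IE evaluation.
    """
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)

    # Precision: share of extracted facts that are correct.
    precision = correct / len(extracted) if extracted else 0.0
    # Recall: share of gold-standard facts that were found.
    recall = correct / len(gold) if gold else 0.0

    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0

    # F-beta: weighted harmonic mean of precision and recall.
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Hypothetical (slot, value) facts for one news report.
system_output = {("disease", "cholera"), ("country", "Haiti"), ("victims", "120")}
gold_standard = {
    ("disease", "cholera"),
    ("country", "Haiti"),
    ("victims", "150"),
    ("date", "2010-10"),
}

p, r, f1 = precision_recall_f(system_output, gold_standard)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Scores of this kind say how correct the extracted facts are, but not whether a correctly extracted event is of any use to a given reader, which is the gap the relevance measures described in the abstract are meant to address.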


Keywords: Information Extraction; News Article; Relevance Score; Medical Domain; Event Extraction



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Silja Huttunen (1)
  • Arto Vihavainen (1)
  • Mian Du (1)
  • Roman Yangarber (1)
  1. Department of Computer Science, University of Helsinki, Helsinki, Finland
