Automatically Generated Noun Lexicons for Event Extraction

  • Béatrice Arnulphy
  • Xavier Tannier
  • Anne Vilnat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7182)


In this paper, we propose a method for automatically creating weighted lexicons of event names. Almost all event names are ambiguous in context (i.e., they can have an eventive or a non-eventive reading). Weights representing the relative “eventiveness” of a noun can therefore help disambiguate event mentions in texts.

We applied our method to both French and English corpora. An evaluation based on a machine-learning approach shows that using weighted lexicons can be a good way to improve event extraction. We also study how large a corpus must be to produce a valuable lexicon.
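
As an illustration only, the idea of an "eventiveness" weight can be sketched as a simple ratio. The abstract does not say how the weights are computed, so the formula below (eventive occurrences divided by total occurrences) is an assumption, and the function name, the `min_count` parameter, and the toy observations are all hypothetical.

```python
from collections import Counter


def build_weighted_lexicon(observations, min_count=1):
    """Map each noun to an assumed 'eventiveness' weight in [0, 1].

    observations: iterable of (noun, is_eventive) pairs, one per occurrence.
    The weight is the share of occurrences seen in an eventive context.
    This ratio is an illustrative assumption, not the paper's formula.
    """
    total = Counter()
    eventive = Counter()
    for noun, is_eventive in observations:
        total[noun] += 1
        if is_eventive:
            eventive[noun] += 1
    # Nouns seen fewer than min_count times are left out of the lexicon.
    return {
        noun: eventive[noun] / count
        for noun, count in total.items()
        if count >= min_count
    }


# Toy, hand-made observations (not data from the paper).
observations = [
    ("meeting", True), ("meeting", True), ("meeting", False),
    ("table", False), ("table", False),
    ("earthquake", True),
]
lexicon = build_weighted_lexicon(observations)
```

With such a lexicon, a noun like "meeting" gets an intermediate weight that a downstream classifier could use as a feature, instead of a binary "is an event noun" flag.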


Noun Phrase, Extraction Rule, Event Extraction, Eventive Reading, Semantic Role Labelling
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Béatrice Arnulphy (1)
  • Xavier Tannier (1)
  • Anne Vilnat (1)
  1. LIMSI-CNRS, Univ. Paris-Sud, Orsay, France
