Abstract
In this paper, we propose a method for automatically creating weighted lexicons of event names. Almost all event names are ambiguous in context (i.e., they can be interpreted in either an eventive or a non-eventive reading). Weights representing the relative “eventiveness” of a noun can therefore help to disambiguate event nouns during event detection in texts.
We applied our method to both French and English corpora. An evaluation based on a machine-learning approach shows that using weighted lexicons can be a good way to improve event extraction. We also study the corpus size needed to create a useful lexicon.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arnulphy, B., Tannier, X., Vilnat, A. (2012). Automatically Generated Noun Lexicons for Event Extraction. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_19
DOI: https://doi.org/10.1007/978-3-642-28601-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science (R0)