Language Resources and Evaluation, Volume 47, Issue 2, pp 269–298

Multilingual and cross-domain temporal tagging

Original Paper

Abstract

Extraction and normalization of temporal expressions from documents are important steps towards deep text understanding and a prerequisite for many NLP tasks such as information extraction, question answering, and document summarization. The same temporal information can be expressed in many different ways in documents. Once temporal expressions have been identified, however, they can be normalized according to a standard format, which allows temporal information to be used in a term- and language-independent way. In this paper, we describe the challenges of temporal tagging in different domains, give an overview of existing annotated corpora, and survey existing approaches for temporal tagging. Finally, we present our publicly available temporal tagger HeidelTime, which is easily extensible to further languages due to its strict separation of source code and language resources such as patterns and rules. We present a broad evaluation on multiple languages and domains, both on existing corpora and on a newly created corpus for a language/domain combination for which no annotated corpus has been available so far.
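To illustrate the idea of normalization, the following minimal sketch maps several surface forms of the same date to one ISO 8601 style value, the kind of value used in TIMEX-style annotations. It is not HeidelTime's implementation: the `MONTHS` lexicon, the `PATTERNS` list, and the function `normalize_date` are hypothetical names introduced only for this example, and the sketch covers only fully specified dates in a handful of English and German forms.

```python
import re

# Illustrative month lexicon (English and German). This is an assumption made
# for the example, not the language resources shipped with any real tagger.
MONTHS = {
    "january": 1, "jan": 1, "januar": 1,
    "march": 3, "mar": 3, "märz": 3,
    "december": 12, "dec": 12, "dezember": 12, "dez": 12,
}

# Hypothetical surface patterns; each exposes the groups day, month, and year.
PATTERNS = [
    # e.g. "January 5, 2012"
    re.compile(r"(?P<month>[^\W\d_]+)\.? (?P<day>\d{1,2}), (?P<year>\d{4})"),
    # e.g. "5 Jan 2012" or "5. Januar 2012"
    re.compile(r"(?P<day>\d{1,2})\.? (?P<month>[^\W\d_]+)\.? (?P<year>\d{4})"),
]


def normalize_date(expression):
    """Return a YYYY-MM-DD value for a recognized date, otherwise None."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(expression.strip())
        if match is None:
            continue
        # Look the month word up in the lexicon, ignoring case.
        month = MONTHS.get(match.group("month").lower())
        if month is None:
            continue
        year = int(match.group("year"))
        day = int(match.group("day"))
        return f"{year:04d}-{month:02d}-{day:02d}"
    return None
```

Because the patterns and the month lexicon are kept apart from the matching logic, supporting a further language in this sketch would mean adding entries to the data structures rather than changing `normalize_date`.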

Keywords

Temporal information, Temporal tagger, Named entity recognition, Named entity normalization, TIMEX2, TIMEX3

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  1. Institute of Computer Science, Heidelberg University, Heidelberg, Germany
