TimeBank evolution as a community resource for TimeML parsing

  • Original Paper
  • Language Resources and Evaluation

Abstract

TimeBank is the only reference corpus for TimeML, an expressive language for annotating complex temporal information. It is a rich resource for a broad range of research into various aspects of the expression of time and temporally related events. This paper traces the development of TimeBank from its initial—and somewhat noisy—version (1.1) to a substantially revised release (1.2), now available via the Linguistic Data Consortium. The development path is motivated by the encouraging empirical results of TimeML-compliant annotators developed on the basis of TimeBank 1.1, and is informed by a detailed study of the characteristics of that initial release, which guides a clean-up process turning TimeBank 1.2 into a consistent and robust community resource.

Notes

  1. TimeBank (Version 1.2) is distributed by the Linguistic Data Consortium; see http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T08.

  2. Temporal and Event Recognition for QA Systems; http://www.timeml.org/site/terqas.

  3. At the time of writing, the specification of TimeML is undergoing revision, with respect to the makeinstance tag in particular. While alternative mechanisms are proposed as replacement expression of the tag’s semantics, we incorporate here a description of makeinstance as it was used in the annotation of TimeBank.

  4. timex2 and timex3 differ substantially in their treatment of event anchoring and sets of times. In particular, relational time expressions (e.g., 2 days before departure) are a single timex2; under TimeML analysis, the same expression would be annotated as a collection of related timex3, signal and event tags, with an additional link anchoring the event. Sets of times (e.g., every day) would also get different analyses. This impacts both the boundaries of annotation spans, and attributes of the covering annotations (tags). Overall, timex3 is not a straightforward extension of timex2, as its analysis of a temporal expression is designed to interact with all the other TimeML components.

  5. Strictly speaking, GUTime targets the timex2 tag, most recently popularized by the Time Expression Recognition and Normalization (TERN) program; see http://www.timex2.mitre.org/tern.htm. As far as extent and normalized value of the temporal expression are concerned, timex3 and timex2 are not that dissimilar.

  6. http://www.cis.upenn.edu/treebank.

  7. http://www.cnts.ua.ac.be/conll2003/ner.

  8. A token may initiate a span of tokens that belong to a given category, it may fall inside such a category span, or it may not belong to any category. Thus, with respect to a given category x, a token would be tagged with one of begins_x, insideof_x, or outside tags. This kind of encoding models category assignment to token sequences as an individual token tagging task; see (Boguraev and Ando 2005b).

  9. Following the release of TimeBank 1.1, a dedicated effort focused on developing a custom annotation tool. TANGO (Pustejovsky et al. 2003c) specifically addresses the challenges of producing XML-compliant and internally consistent markup for ‘dense’ annotation tasks—of which TimeML is a particularly good example.

  10. Translingual Information Detection, Extraction, and Summarization; http://www-nlpir.nist.gov/tides.

  11. The TimeML working groups included people involved in TIDES and STAG.

  12. TimeML ANnotation Graphical Organizer; http://www.timeml.org/site/tango; a workshop following TERQAS, focusing on developing annotation infrastructure for TimeML.

  13. The most recent TimeML specifications and annotation guidelines are available at http://www.timeml.org.

  14. One other change has been made to the TimeML specification since the completion of TimeBank 1.2; namely, the removal of the makeinstance tag. All the attributes associated with this tag (i.e., tense, aspect, modality, polarity) have been moved to the event tag itself. (See also Footnote 3.)

  15. The annotators were all novices and received only one to two hours of training in TimeML annotation (see Sect. 5.1).

  16. Technically, each annotator’s data should be considered both as the key and as the response, and recall and precision should be computed in both directions. However, with only two annotators only one direction is needed, because swapping the roles of key and response merely exchanges the precision and recall figures.

  17. The Kappa coefficient adjusts for the number of agreements that would have occurred by chance and is defined by $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed probability and $p_e$ the expected probability. The Kappa coefficient, however, is not well suited for annotation tasks that cannot be construed as pure classification tasks and is therefore not used to measure agreement on whether links were introduced by both annotators. See also (Hirschman et al. 1998).

  18. While certain conclusions can be drawn from the fact that TimeBank 1.1 IAA scores for link identification and typing are low, not a lot should rest on the actual figures: these were inexperienced annotators, whose IAA scores on timex3’s and events were about 10 points lower than those of their experienced counterparts.

  19. One could say that each event-event pair, or event-timex3 pair, that has no temporal link defined between its members by way of a tlink or slink tag is in fact evidence of a missing link. This is clearly impractical, given that the number of possible links is quadratic in the number of events and times in a text. Here, the number of missing links is calculated by finding events that are not temporally linked to any other event or time.

  20. The difference in signal counts between the two corpora is due to reasons explained above in (5.2).
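
As an illustration of the token-level encoding described in Note 8, the following minimal Python sketch converts category spans into per-token tags. The function name encode_spans, the (start, end, category) span format, and the sample sentence are assumptions made for this example; they are not taken from the system of Boguraev and Ando (2005b).

```python
from typing import List, Tuple


def encode_spans(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag every token as begins_x, insideof_x, or outside.

    Each span is (start, end, category) with an exclusive end index.
    Spans are assumed not to overlap (an assumption of this sketch).
    """
    tags = ["outside"] * len(tokens)
    for start, end, category in spans:
        # The first token of a span opens the category.
        tags[start] = f"begins_{category}"
        # Every remaining token of the span continues it.
        for i in range(start + 1, end):
            tags[i] = f"insideof_{category}"
    return tags


# Hypothetical sentence with one event span and one timex3 span.
tokens = ["He", "left", "two", "days", "ago", "."]
spans = [(1, 2, "event"), (2, 5, "timex3")]
tags = encode_spans(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

With this encoding, recognizing spans of a category reduces to assigning one tag per token, which is what allows a standard sequence classifier to be applied.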
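
The Kappa definition in Note 17 can be made concrete with a short Python sketch for two annotators who label the same items. It assumes the common two-annotator (Cohen) estimate of expected agreement, computed from each annotator's own label distribution; the paper does not state which variant was used. The function name cohen_kappa and the sample relation labels are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Return (p_o - p_e) / (1 - p_e) for two equally long label sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement: chance that both pick the same label independently,
    # estimated from each annotator's label frequencies (Cohen's assumption).
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical relation-type judgments on four links.
annotator_1 = ["BEFORE", "AFTER", "BEFORE", "INCLUDES"]
annotator_2 = ["BEFORE", "AFTER", "AFTER", "INCLUDES"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 4))
```

In this example the observed agreement is 3/4 and the expected agreement is 5/16, so the coefficient is (3/4 − 5/16)/(1 − 5/16) = 7/11. Note that the computation presupposes a fixed set of items to classify, which is why, as Note 17 observes, it does not apply to the question of whether both annotators introduced a link at all.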

Abbreviations

TimeML:

A Markup Language for Time

timex:

Time Expression

LDC:

Linguistic Data Consortium

IE:

Information Extraction

IAA:

Inter-Annotator Agreement

References

  • Allen, J. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11), 832–843.

  • Boguraev, B., & Ando, R. K. (2005a). TimeBank-driven TimeML analysis. In: G. Katz, J. Pustejovsky, & F. Schilder (Eds.), International Workshop on Annotating, Extracting, and Reasoning with Time. Dagstuhl, Germany.

  • Boguraev, B., & Ando, R. K. (2005b). TimeML-compliant text analysis for temporal reasoning. In: Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05). Edinburgh, Scotland.

  • Ferro, L. (2001). TIDES: Instruction manual for the annotation of temporal expressions. Technical Report MTR 01W0000046V01, The MITRE Corporation.

  • Fikes, R., Jenkins, J., & Frank, G. (2003). JTP: A system architecture and component library for hybrid reasoning. Technical Report KSL-03-01, Knowledge Systems Laboratory, Stanford University.

  • Gaizauskas, R., Harkema, H., Hepple, M., & Setzer, A. (2006). Task-oriented extraction of temporal information: The case of clinical narratives. In: A. Montanari, J. Pustejovsky, & P. Revesz (Eds.), TIME 2006: International Symposium on Temporal Representation and Reasoning. Budapest, Hungary.

  • Han, B., & Lavie, A. (2004). A framework for resolution of time in natural language. TALIP Special Issue on Spatial and Temporal Information Processing, 3(1), 11–35.

  • Hirschman, L., Robinson, P., Burger, J., & Vilain, M. (1998). Automatic coreference: The role of annotated training data. In: AAAI 1998 Spring Symposium on Applying Machine Learning to Discourse Processing. Stanford, USA, pp. 1419–1422.

  • Hobbs, J., & Pan, F. (2004). An ontology of time for the semantic web. TALIP Special Issue on Spatial and Temporal Information Processing, 3(1), 66–85.

  • Hobbs, J., & Pustejovsky, J. (2004). Annotating and reasoning about time and events. In: AAAI Spring Symposium on Logical Formalizations of Commonsense Reasoning. Stanford, CA.

  • Lee, K., Pustejovsky, J., & Boguraev, B. (2006). Towards an international standard for annotating temporal information. In: Third International Conference on Terminology, Standardization and Technology Transfer. Beijing, China.

  • Mani, I., Wellner, B., Verhagen, M., Lee, C. M., & Pustejovsky, J. (2006). Machine Learning of Temporal Relations. In: Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics. Sydney, Australia.

  • Pustejovsky, J., Castaño, J., Ingria, R., Saurí, R., Gaizauskas, R., Setzer, A., Katz, G., & Radev, D. (2003a). TimeML: Robust specification of event and temporal expressions in text. In: AAAI Spring Symposium on New Directions in Question-Answering (Working Papers). Stanford, CA, pp. 28–34.

  • Pustejovsky, J., Hanks, P., Saurí, R., See, A., Gaizauskas, R., Setzer, A., Radev, D., Sundheim, B., Day, D., Ferro, L., & Lazo, M. (2003b). The TimeBank corpus. In: T. McEnery (Ed.), Corpus Linguistics (pp. 647–656). Lancaster.

  • Pustejovsky, J., Knippen, R., Littman, J., & Saurí, R. (2005). Temporal and event information in natural language text. Language Resources and Evaluation 39(2–3), 123–164.

  • Pustejovsky, J., Mani, I., Bélanger, L., Boguraev, B., Knippen, B., Littman, J., Rumshisky, A., See, A., Symonenko, S., Guilder, J. V., Guilder, L. V., Verhagen, M., & Ingria, R. (2003c). Graphical Annotation Kit for TimeML. Technical report, TANGO (TimeML ANnotation Graphical Organizer) Workshop Version 1.4, <http://www.timeml.org/tango> [date of citation: 2005-06-20].

  • Saurí, R., Littman, J., Knippen, B., Gaizauskas, R., Setzer, A., & Pustejovsky, J. (2005). TimeML Annotation Guidelines, Version 1.2.1. Technical report, TERQAS Workshop/Linguistic Data Consortium. <http://www.timeml.org/site/publications/timeMLdocs/AnnGuide_1.2.1.pdf> [date of citation: 2006-07-16].

  • Setzer, A. (2001). Temporal information in newswire articles: An annotation scheme and corpus study. Ph.D. thesis, University of Sheffield, Sheffield, UK.

  • Verhagen, M. (2005). Temporal closure in an annotation environment. Language Resources and Evaluation, 39(2–3), 211–241.

  • Verhagen, M., Mani, I., Saurí, R., Littman, J., Knippen, R., Jang, S. B., Rumshisky, A., Phillips, J., & Pustejovsky, J. (2005). Automating Temporal Annotation with TARSQI. In: 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05). Ann Arbor, Michigan, Poster/Demo.

Acknowledgements

This work was supported in part by the ARDA NIMD and AQUAINT programs, PNWD-SW-6059 and NBCHC040027-MOD-0003.

Corresponding author

Correspondence to Branimir Boguraev.

About this article

Cite this article

Boguraev, B., Pustejovsky, J., Ando, R. et al. TimeBank evolution as a community resource for TimeML parsing. Lang Resources & Evaluation 41, 91–115 (2007). https://doi.org/10.1007/s10579-007-9018-8

  • DOI: https://doi.org/10.1007/s10579-007-9018-8
