Temporal Expression Recognition in Hindi

  • Nitin Ramrakhiyani
  • Prasenjit Majumder
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8284)

Abstract

Temporal annotation of plain text is considered a useful component of modern information retrieval tasks. In this work, two approaches for identifying and classifying temporal entities in Hindi are developed and analyzed. The first is a rule-based approach that takes plain text as input and, using a set of hand-crafted rules, produces tagged output with the temporal expressions identified. This approach achieves a strict F1-measure of 0.83. The second is a CRF-based classifier that is trained on human-tagged data and then evaluated on a test dataset. The trained classifier identifies temporal expressions in plain text and further classifies them into various classes. This approach achieves a strict F1-measure of 0.78. In the process, a reusable gold-standard dataset for temporal tagging in Hindi was developed. Named the ILTIMEX2012 corpus, it consists of 300 manually tagged Hindi news documents.
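
To make the strict F1-measure concrete, the following is a minimal sketch, not the authors' evaluation script. It assumes that "strict" means a predicted expression counts as correct only when its start offset, end offset, and class label all match a gold annotation exactly. The function name `strict_f1`, the tuple layout, and the class labels `DATE` and `TIME` are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical annotation layout: (start offset, end offset, class label).
Span = Tuple[int, int, str]


def strict_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Return F1 where only exact boundary-and-class matches count as correct."""
    gold_set = set(gold)
    pred_set = set(predicted)
    # With no gold or no predicted spans, precision or recall is undefined;
    # this sketch reports 0.0 in that case (an assumption, not a standard).
    if not gold_set or not pred_set:
        return 0.0
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 5, "DATE"), (10, 15, "TIME")]
    # The second prediction is off by one character, so it does not count.
    predicted = [(0, 5, "DATE"), (10, 14, "TIME")]
    print(strict_f1(gold, predicted))
```

Under this assumption a partially overlapping span earns no credit, which is why strict scores are typically lower than scores computed with lenient (overlap-based) matching.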

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Nitin Ramrakhiyani (1)
  • Prasenjit Majumder (2)
  1. Tata Research Development and Design Centre, Pune, India
  2. DAIICT, Gandhinagar, India
