Language Resources and Evaluation, Volume 44, Issue 3, pp 263–280

SpatialML: annotation scheme, resources, and evaluation

  • Inderjeet Mani
  • Christy Doran
  • Dave Harris
  • Janet Hitzeman
  • Rob Quimby
  • Justin Richer
  • Ben Wellner
  • Scott Mardis
  • Seamus Clancy

Abstract

SpatialML is an annotation scheme for marking up references to places in natural language. It covers both named and nominal references to places, grounding them where possible with geo-coordinates, and characterizes relationships among places in terms of a region calculus. A freely available annotation editor has been developed for SpatialML, along with several annotated corpora. Inter-annotator agreement on SpatialML extents is 91.3 F-measure on a corpus of SpatialML-annotated ACE documents released by the Linguistic Data Consortium. Disambiguation agreement on geo-coordinates on ACE is 87.93 F-measure. An automatic tagger for SpatialML extents scores 86.9 F on ACE, while a disambiguator scores 93.0 F on it. Results are also presented for two other corpora. In adapting the extent tagger to new domains, merging the training data from the ACE corpus with annotated data in the new domain provides the best performance.
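
As a rough illustration of how an extent-level F-measure such as those reported above might be computed, the following sketch compares two sets of annotated spans by exact match. The exact-match criterion, the (start, end) character-offset representation, the function name extent_f_measure, and the sample offsets are all assumptions made for illustration; the abstract does not specify the matching procedure actually used.

```python
def extent_f_measure(reference, candidate):
    """Return (precision, recall, F1) for two collections of (start, end) extents.

    Assumption: an extent counts as matched only if both offsets agree exactly.
    """
    ref = set(reference)
    cand = set(candidate)
    matched = len(ref & cand)
    # Guard against empty inputs to avoid division by zero.
    precision = matched / len(cand) if cand else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical extents marked by two annotators on the same text.
annotator_a = [(0, 6), (15, 21), (40, 52), (60, 66)]
annotator_b = [(0, 6), (15, 21), (41, 52)]

p, r, f = extent_f_measure(annotator_a, annotator_b)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

For inter-annotator agreement, one annotator is treated as the reference and the other as the candidate. Swapping the two exchanges precision and recall but leaves the F-measure unchanged.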

Keywords

Annotation; Guidelines; Spatial language; Geography; Information extraction; Evaluation; Adaptation

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Inderjeet Mani (1)
  • Christy Doran (1)
  • Dave Harris (1)
  • Janet Hitzeman (1)
  • Rob Quimby (1)
  • Justin Richer (1)
  • Ben Wellner (1)
  • Scott Mardis (1)
  • Seamus Clancy (1)

  1. The MITRE Corporation, Bedford, USA
