Creating a Corpus of Geospatial Natural Language

  • Kristin Stock
  • Robert C. Pasley
  • Zoe Gardner
  • Paul Brindley
  • Jeremy Morley
  • Claudia Cialone
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8116)


The description of location using natural language is of interest for a number of research activities, including the automated interpretation and generation of natural language to ease interaction with geographic information systems. For such activities, examples of geospatial natural language are usually drawn from the personal knowledge of researchers, or gathered in small-scale collection activities specific to the project concerned. This paper describes the process used to develop a more generic corpus of geospatial natural language.

The paper discusses the development and evaluation of four methods for semi-automated harvesting of geospatial natural language clauses from text to create a corpus of geospatial natural language. The most successful method uses a set of geospatial syntactic templates that describe common patterns of grammatical geospatial word categories, and achieves a precision of 0.66. Particular challenges were posed by the range of English dialects included, as well as by metaphoric and sporting references.
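As a rough illustration of the template idea described in the abstract, the sketch below matches sequences of word categories against a tagged clause and computes precision over a set of harvested clauses. The category names (`GEO_NOUN`, `SPATIAL_PREP`, `MOTION_VERB`), the two templates, and the function names are hypothetical and chosen only for this example; they are not the paper's actual templates or tag set, and the tagging step itself is assumed to have been done elsewhere.

```python
from typing import List, Sequence, Tuple

# A tagged token is a (word, category) pair. The categories are assumptions
# made for this sketch, not the paper's tag set.
Tagged = Tuple[str, str]

# Hypothetical templates: each is a contiguous sequence of word categories.
TEMPLATES: List[Tuple[str, ...]] = [
    ("GEO_NOUN", "SPATIAL_PREP", "GEO_NOUN"),     # e.g. "pub near Nottingham"
    ("MOTION_VERB", "SPATIAL_PREP", "GEO_NOUN"),  # e.g. "walk along river"
]


def matches_template(tagged: Sequence[Tagged], template: Sequence[str]) -> bool:
    """Return True if the template's categories occur as a contiguous run."""
    cats = [category for _, category in tagged]
    n = len(template)
    return any(
        tuple(cats[i:i + n]) == tuple(template)
        for i in range(len(cats) - n + 1)
    )


def is_candidate(tagged: Sequence[Tagged]) -> bool:
    """Return True if the clause matches at least one template."""
    return any(matches_template(tagged, t) for t in TEMPLATES)


def precision(judgements: Sequence[bool]) -> float:
    """Fraction of harvested clauses that a human judged to be geospatial."""
    if not judgements:
        return 0.0
    return sum(judgements) / len(judgements)


# A literal spatial clause and a sporting reference, tagged by hand.
spatial = [("the", "DET"), ("pub", "GEO_NOUN"),
           ("near", "SPATIAL_PREP"), ("Nottingham", "GEO_NOUN")]
sporting = [("scored", "VERB"), ("in", "SPATIAL_PREP"),
            ("extra", "ADJ"), ("time", "NOUN")]

harvested = [clause for clause in (spatial, sporting) if is_candidate(clause)]
```

Under these assumptions only the first clause is harvested. Precision is then the share of harvested clauses that manual review confirms as geospatial, which is the kind of measure the reported 0.66 refers to.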


Keywords: corpus linguistics, geospatial natural language





Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Kristin Stock (1)
  • Robert C. Pasley (1)
  • Zoe Gardner (1)
  • Paul Brindley (1)
  • Jeremy Morley (1)
  • Claudia Cialone (1)

  1. Nottingham Geospatial Institute, University of Nottingham, United Kingdom
