Abstract
Natural language descriptions of location are of interest to a number of research activities, including the automated interpretation and generation of natural language to ease interaction with geographic information systems. For such activities, examples of geospatial natural language are usually drawn from the personal knowledge of researchers or gathered in small-scale collection activities specific to the project concerned. This paper describes the process used to develop a more generic corpus of geospatial natural language.
The paper discusses the development and evaluation of four methods for the semi-automated harvesting of geospatial natural language clauses from text to create a corpus of geospatial natural language. The most successful method uses a set of geospatial syntactic templates, which describe common patterns of grammatical geospatial word categories, and achieves a precision of 0.66. Particular challenges were posed by the range of English dialects included, as well as by metaphoric and sporting references.
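To make the idea of template-based harvesting and its precision measure concrete, the following is a minimal sketch rather than the authors' method. The word lists, category names (NOUN, PREP, DET), the single template and the example clauses are all hypothetical, chosen only for illustration; the abstract does not give the actual templates or categories. Precision is computed in the usual way, as the share of harvested clauses that a human judges to be genuinely geospatial.

```python
import re

# Hypothetical lexicon mapping words to coarse word categories.
# The paper's actual category lists are not given in the abstract.
CATEGORIES = {
    "church": "NOUN", "river": "NOUN", "station": "NOUN",
    "bridge": "NOUN", "ball": "NOUN", "goal": "NOUN",
    "near": "PREP", "beside": "PREP", "across": "PREP",
    "the": "DET", "a": "DET",
}

# Hypothetical syntactic template: noun, spatial preposition, determiner, noun.
TEMPLATE = ("NOUN", "PREP", "DET", "NOUN")


def tag(clause):
    """Return the word category of each token, or OTHER if unknown."""
    tokens = re.findall(r"[a-z]+", clause.lower())
    return [CATEGORIES.get(token, "OTHER") for token in tokens]


def matches(clause, template=TEMPLATE):
    """True if the template occurs as a contiguous run of categories."""
    tags = tag(clause)
    n = len(template)
    return any(tuple(tags[i:i + n]) == template
               for i in range(len(tags) - n + 1))


def precision(harvested, is_geospatial):
    """Share of harvested clauses judged geospatial by a human annotator."""
    if not harvested:
        return 0.0
    correct = sum(1 for clause in harvested if is_geospatial[clause])
    return correct / len(harvested)


# Made-up candidate clauses with made-up human judgements.
judgements = {
    "The church near the river is old.": True,
    "A station beside the bridge was closed.": True,
    "The ball near the goal was cleared.": False,  # sporting reference
    "She walked home quickly.": False,
}

harvested = [clause for clause in judgements if matches(clause)]
score = precision(harvested, judgements)
print(f"harvested {len(harvested)} clauses, precision {score:.2f}")
```

In this toy run the sporting clause matches the template but is not a description of geographic location, so it counts against precision. This is the kind of false positive the abstract attributes to sporting references. The toy score has no connection to the reported figure of 0.66.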
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Stock, K., Pasley, R.C., Gardner, Z., Brindley, P., Morley, J., Cialone, C. (2013). Creating a Corpus of Geospatial Natural Language. In: Tenbrink, T., Stell, J., Galton, A., Wood, Z. (eds) Spatial Information Theory. COSIT 2013. Lecture Notes in Computer Science, vol 8116. Springer, Cham. https://doi.org/10.1007/978-3-319-01790-7_16
DOI: https://doi.org/10.1007/978-3-319-01790-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01789-1
Online ISBN: 978-3-319-01790-7
eBook Packages: Computer Science (R0)