
An evaluative baseline for geo-semantic relatedness and similarity

GeoInformatica

Abstract

In geographic information science and semantics, the computation of semantic similarity is widely recognised as key to supporting a vast number of tasks in information integration and retrieval. By contrast, the role of geo-semantic relatedness has been largely ignored. In natural language processing, semantic relatedness is often confused with the more specific semantic similarity. In this article, we discuss a notion of geo-semantic relatedness based on Lehrer’s semantic fields, and we compare it with geo-semantic similarity. We then describe and validate the Geo Relatedness and Similarity Dataset (GeReSiD), a new open dataset designed to evaluate computational measures of geo-semantic relatedness and similarity. This dataset is larger than existing datasets of this kind, and includes 97 geographic terms combined into 50 term pairs rated by 203 human subjects. GeReSiD is available online and can be used as an evaluation baseline to determine empirically to what degree a given computational model approximates geo-semantic relatedness and similarity.
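As a rough illustration of how a dataset like GeReSiD can serve as an evaluation baseline, the sketch below averages the human ratings for each term pair into one baseline score. It then measures how well a computational model's scores follow that baseline using Spearman's rank correlation. This is a minimal sketch under stated assumptions, not the authors' evaluation procedure. The choice of Spearman's ρ, the function names (`gold_standard`, `evaluate`, and so on), the dict-based data layout, and the term pairs and ratings are all hypothetical and are not taken from GeReSiD. Because the dataset covers both notions, the same comparison would be run separately against the relatedness ratings and the similarity ratings.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks for values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gold_standard(ratings):
    """Hypothetical helper: average each term pair's human ratings into one score."""
    return {pair: mean(scores) for pair, scores in ratings.items()}


def evaluate(model_scores, ratings):
    """Hypothetical helper: correlate model scores with averaged human ratings."""
    gold = gold_standard(ratings)
    pairs = sorted(gold)  # fixed order so both score lists line up pair by pair
    return spearman_rho([model_scores[p] for p in pairs],
                        [gold[p] for p in pairs])


# Hypothetical example data: NOT taken from GeReSiD.
ratings = {
    ("river", "stream"): [4, 5, 4],
    ("mountain", "hill"): [4, 4, 3],
    ("beach", "parking lot"): [1, 2, 1],
    ("school", "university"): [3, 4, 3],
}
model_scores = {
    ("river", "stream"): 0.9,
    ("mountain", "hill"): 0.7,
    ("beach", "parking lot"): 0.2,
    ("school", "university"): 0.6,
}

rho = evaluate(model_scores, ratings)
print(f"Spearman's rho = {rho:.3f}")
```

The hypothetical model orders the four pairs exactly as the averaged ratings do, so `rho` comes out at 1.0 (up to floating-point rounding). A rank correlation is used in this sketch because it compares only orderings. A model's scores therefore need not share the scale of the human ratings.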

Fig. 1
Fig. 2

Notes

  1. http://github.com/ucd-spatial/Datasets (acc. Apr 10, 2013)

  2. http://oxforddictionaries.com/definition/gold+standard (acc. Apr 10, 2013)

  3. http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353 (acc. Apr 10, 2013)

  4. http://alfonseca.org/eng/research/wordsim353.html (acc. Apr 10, 2013)

  5. Although a better gender, age, and geographic balance would be desirable, we found it difficult to achieve in practice without drastically limiting the size of the sample.

  6. http://github.com/ucd-spatial/Datasets (acc. Apr 10, 2013)

References

  1. Agirre E, Alfonseca E, Hall K, Kravalova J, Paşca M, Soroa A (2009) A study on similarity and relatedness using distributional and WordNet-based approaches. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics. ACL, pp 19–27

  2. Bakillah M, Bédard Y, Mostafavi M, Brodeur J (2009) SIM-NET: a view-based semantic similarity model for Ad Hoc networks of geospatial databases. Trans GIS 13(5–6):417–447

  3. Ballatore A, Wilson D, Bertolotto M (2012) The similarity jury: combining expert judgements on geographic concepts. In: Castano S, Vassiliadis P, Lakshmanan L, Lee M (eds) Advances in conceptual modeling. ER 2012 workshops (SeCoGIS). LNCS, vol 7518. Springer, Berlin, pp 231–240

  4. Ballatore A, Bertolotto M, Wilson D (2013) Computing the semantic similarity of geographic terms using volunteered lexical definitions. Int J Geogr Inf Sci 27(10):2099–2118

  5. Ballatore A, Bertolotto M, Wilson D (2013) Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowl Inf Syst 37(1):61–81

  6. Ballatore A, Wilson D, Bertolotto M (2013) A survey of volunteered open geo-knowledge bases in the semantic web. In: Pasi G, Bordogna G, Jain L (eds) Quality issues in the management of web information, intelligent systems reference library, vol 50. Springer, Berlin, pp 93–120

  7. Banerjee M, Capozzoli M, McSweeney L, Sinha D (1999) Beyond Kappa: a review of interrater agreement measures. Can J Stat 27(1):3–23

  8. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  9. Budanitsky A, Hirst G (2006) Evaluating WordNet-based measures of lexical semantic relatedness. Comput Linguist 32(1):13–47

  10. Cimiano P, Völker J (2005) Towards large-scale, open-domain and ontology-based named entity classification. In: Recent advances in natural language processing, RANLP 2005. ACL, pp 166–172

  11. Dawes J (2008) Do data characteristics change according to the number of scale points used? Int J Mark Res 50(1):61–78

  12. Ferrara F, Tasso C (2013) Evaluating the results of methods for computing semantic relatedness. In: Gelbukh A (ed) Computational linguistics and intelligent text processing. LNCS, vol 7816. Springer, Berlin, pp 447–458

  13. Finkelstein L, Gabrilovich E, Matias Y, Rivlin E, Solan Z, Wolfman G, Ruppin E (2002) Placing search in context: the concept revisited. ACM Trans Inf Syst 20(1):116–131

  14. Finn R (1970) A note on estimating the reliability of categorical data. Educ Psychol Meas 30(1):71–76

  15. Goldstone R, Son J (2005) Similarity. In: Holyoak K, Morrison R (eds) Cambridge handbook of thinking and reasoning. Cambridge University Press, New York, pp 13–36

  16. Hecht B, Raubal M (2008) GeoSR: geographically explore semantic relations in world knowledge. In: The European information society: taking geoinformation science one step further. LNGC, Springer, Berlin

  17. Hecht B, Carton SH, Quaderi M, Schöning J, Raubal M, Gergle D, Downey D (2012) Explanatory semantic relatedness and explicit spatialization for exploratory search. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 415–424

  18. James L, Demaree R, Wolf G (1984) Estimating within-group interrater reliability with and without response bias. J Appl Psychol 69(1):85–98

  19. Janowicz K, Raubal M (2007) Affordance-based similarity measurement for entity types. In: Spatial information theory. LNCS, vol 4736. Springer, Berlin, pp 133–151

  20. Janowicz K, Keßler C, Schwarz M, Wilkes M, Panov I, Espeter M, Bäumer B (2007) Algorithm, implementation and application of the SIM-DL similarity server. In: GeoSpatial semantics: second international conference, GeoS 2007. LNCS, vol 4853. Springer, Berlin, pp 128–145

  21. Janowicz K, Keßler C, Panov I, Wilkes M, Espeter M, Schwarz M (2008) A study on the cognitive plausibility of SIM-DL similarity rankings for geographic feature types. In: Fabrikant S, Wachowicz M (eds) The European information society: taking geoinformation science one step further. LNGC, Springer, Berlin, pp 115–134

  22. Janowicz K, Raubal M, Schwering A, Kuhn W (2008) Semantic similarity measurement and geospatial applications. Trans GIS 12(6):651–659

  23. Janowicz K, Raubal M, Kuhn W (2011) The semantics of similarity in geographic information retrieval. J Spat Info Sci 2(1):29–57

  24. Kaptchuk T (2001) The double-blind, randomized, placebo-controlled trial: gold standard or golden calf? J Clin Epidemiol 54(6):541–549

  25. Kendall M, Smith B (1939) The problem of m rankings. Ann Math Stat 10(3):275–287

  26. Keßler C (2007) Similarity measurement in context. In: Proceedings of the 6th international and interdisciplinary conference on modeling and using context. LNCS, vol 4635. Springer, pp 277–290

  27. Keßler C (2011) What is the difference? A cognitive dissimilarity measure for information retrieval result sets. Knowl Inf Syst 30(2):319–340

  28. Khoo C, Na J (2006) Semantic relations in information science. Annu Rev Inf Sci Technol 40(1):157–207

  29. Kuhn W (2013) Cognitive and linguistic ideas and geographic information semantics. In: Cognitive and linguistic aspects of geographic space. LNGC, Springer, pp 159–174

  30. LeBreton J, Senter J (2008) Answers to 20 questions about interrater reliability and interrater agreement. Organ Res Methods 11(4):815–852

  31. Lehrer A (1985) The influence of semantic fields on semantic change. In: Fisiak J (ed) Historical word formation. Walter de Gruyter, Berlin, pp 283–296

  32. Medin D, Goldstone R, Gentner D (1990) Similarity involving attributes and relations: judgments of similarity and difference are not inverses. Psychol Sci 1(1):64–69

  33. Miller G, Charles W (1991) Contextual correlates of semantic similarity. Lang Cogn Process 6(1):1–28

  34. Mohammad S, Hirst G (2012) Distributional measures of semantic distance: a survey. Comput Res Repository (CoRR) 1–39. arXiv:1203.1858

  35. Montello DR, Fabrikant SI, Ruocco M, Middleton RS (2003) Testing the first law of cognitive geography on point-display spatializations. In: Kuhn W, Worboys M, Timpf S (eds) Spatial information theory. Foundations of Geographic Information Science. LNCS, vol 2825. Springer, pp 316–331

  36. Morris J, Hirst G (2004) Non-classical lexical semantic relations. In: Proceedings of the HLT-NAACL workshop on computational lexical semantics. ACL, pp 46–51

  37. Nelson D, Dyrdal G, Goodmon L (2005) What is preexisting strength? Predicting free association probabilities, similarity ratings, and cued recall probabilities. Psychon Bull Rev 12(4):711–719

  38. Pedersen T, Kolhatkar V (2009) WordNet::SenseRelate::AllWords: a broad coverage word sense tagger that maximizes semantic relatedness. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics, companion volume: demonstration session. ACL, pp 17–20

  39. Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity: measuring the relatedness of concepts. In: Proceedings of human language technologies: the 2004 annual conference of the North American chapter of the association for computational linguistics, companion volume: demonstration session. ACL, pp 38–41

  40. Rada R, Mili H, Bicknell E, Blettner M (1989) Development and application of a metric on semantic nets. IEEE Trans Syst Man Cybern 19(1):17–30

  41. Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th international joint conference on artificial intelligence, IJCAI’95, vol 1. Morgan Kaufmann, pp 448–453

  42. Robinson W (1957) The statistical measurement of agreement. Am Sociol Rev 22(1):17–25

  43. Rodgers J, Nicewander W (1988) Thirteen ways to look at the correlation coefficient. Am Stat 42(1):59–66

  44. Rodríguez M, Egenhofer M (2004) Comparing geospatial entity classes: an asymmetric and context-dependent similarity measure. Int J Geogr Inf Sci 18(3):229–256

  45. Rubenstein H, Goodenough J (1965) Contextual correlates of synonymy. Commun ACM 8(10):627–633

  46. Schütze H (1998) Automatic word sense discrimination. Comput Linguist 24(1):97–123

  47. Schwering A (2008) Approaches to semantic similarity measurement for geo-spatial data: a survey. Trans GIS 12(1):5–29

  48. Schwering A, Kuhn W (2009) A hybrid semantic similarity measure for spatial information retrieval. Spat Cogn Comput 9(1):30–63

  49. Schwering A, Raubal M (2005) Spatial relations for semantic similarity measurement. In: Perspectives in conceptual modeling. LNCS, vol 3770. Springer, pp 259–269

  50. Strube G (1992) The role of cognitive science in knowledge engineering. Contemp Knowl Eng Cogn 622:159–174

  51. Tobler W (1970) A computer movie simulating urban growth in the Detroit region. In: Economic geography. Supplement: proceedings. International Geographical Union. Commission on quantitative methods, vol 46. Clark University, Worcester, pp 234–240

  52. Toutanova K, Klein D, Manning C, Singer Y (2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology, vol 1. ACL, pp 173–180

  53. Turney P (2006) Similarity of semantic relations. Comput Linguist 32(3):379–416

  54. Tversky A (1977) Features of similarity. Psychol Rev 84(4):327–352

  55. Wang C, Wang J, Xie X, Ma WY (2007) Mining geographic knowledge using location aware topic model. In: Proceedings of the 4th ACM workshop on geographical information retrieval. ACM, pp 65–70

  56. Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83

  57. Wright K (2005) Researching internet-based populations: advantages and disadvantages of online survey research, online questionnaire authoring software packages, and web survey services. J Comput-Mediated Commun 10(3), article 11. http://jcmc.indiana.edu/vol10/issue3/wright.html

Acknowledgments

The research presented in this article was funded by a Strategic Research Cluster grant (07/SRC/I1168) from Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support.

Author information

Correspondence to Andrea Ballatore.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(DOC 95.0 KB)

(DOC 161 KB)

About this article

Cite this article

Ballatore, A., Bertolotto, M. & Wilson, D.C. An evaluative baseline for geo-semantic relatedness and similarity. Geoinformatica 18, 747–767 (2014). https://doi.org/10.1007/s10707-013-0197-8

  • DOI: https://doi.org/10.1007/s10707-013-0197-8
