Abstract
In geographic information science and semantics, the computation of semantic similarity is widely recognised as key to supporting a vast number of tasks in information integration and retrieval. By contrast, the role of geo-semantic relatedness has been largely ignored. In natural language processing, semantic relatedness is often confused with the more specific semantic similarity. In this article, we discuss a notion of geo-semantic relatedness based on Lehrer’s semantic fields, and we compare it with geo-semantic similarity. We then describe and validate the Geo Relatedness and Similarity Dataset (GeReSiD), a new open dataset designed to evaluate computational measures of geo-semantic relatedness and similarity. This dataset is larger than existing datasets of this kind, and includes 97 geographic terms combined into 50 term pairs rated by 203 human subjects. GeReSiD is available online and can be used as an evaluation baseline to determine empirically to what degree a given computational model approximates geo-semantic relatedness and similarity.
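As a hedged illustration of how such an evaluation baseline might be used, the minimal sketch below compares the scores of a hypothetical computational measure with mean human ratings using Spearman's rank correlation, a common choice for this kind of evaluation. The term pairs and all numeric values are invented for illustration and are not taken from GeReSiD; the function names are likewise assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data (NOT actual GeReSiD values): one mean human rating and
# one model score per term pair, listed in the same order.
term_pairs = [("river", "stream"), ("park", "garden"),
              ("bakery", "bread"), ("mountain", "pharmacy")]
human_ratings = [4.8, 3.9, 2.1, 1.2]
model_scores = [0.91, 0.75, 0.80, 0.10]

rho = spearman(human_ratings, model_scores)
print(f"Spearman's rho over {len(term_pairs)} pairs: {rho:.2f}")
```

A higher rho suggests that the measure orders the term pairs more like the human subjects did. In practice, a library routine such as `scipy.stats.spearmanr` would typically be preferred over a hand-written implementation.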
Notes
http://github.com/ucd-spatial/Datasets (acc. Apr 10, 2013)
http://oxforddictionaries.com/definition/gold+standard (acc. Apr 10, 2013)
http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353 (acc. Apr 10, 2013)
http://alfonseca.org/eng/research/wordsim353.html (acc. Apr 10, 2013)
Although a better gender, age, and geographic balance would be desirable, we found it difficult to obtain in practice without drastically limiting the size of the sample.
http://github.com/ucd-spatial/Datasets (acc. Apr 10, 2013)
References
Agirre E, Alfonseca E, Hall K, Kravalova J, Paşca M, Soroa A (2009) A study on similarity and relatedness using distributional and WordNet-based approaches. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics. ACL, pp 19–27
Bakillah M, Bédard Y, Mostafavi M, Brodeur J (2009) SIM-NET: a view-based semantic similarity model for Ad Hoc networks of geospatial databases. Trans GIS 13(5–6):417–447
Ballatore A, Wilson D, Bertolotto M (2012) The similarity jury: combining expert judgements on geographic concepts. In: Castano S, Vassiliadis P, Lakshmanan L, Lee M (eds) Advances in conceptual modeling. ER 2012 workshops (SeCoGIS). LNCS, vol 7518. Springer, Berlin, pp 231–240
Ballatore A, Bertolotto M, Wilson D (2013) Computing the semantic similarity of geographic terms using volunteered lexical definitions. Int J Geogr Inf Sci 27(10):2099–2118
Ballatore A, Bertolotto M, Wilson D (2013) Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowl Inf Syst 37(1):61–81
Ballatore A, Wilson D, Bertolotto M (2013) A survey of volunteered open geo-knowledge bases in the semantic web. In: Pasi G, Bordogna G, Jain L (eds) Quality issues in the management of web information, intelligent systems reference library, vol 50. Springer, Berlin, pp 93–120
Banerjee M, Capozzoli M, McSweeney L, Sinha D (1999) Beyond Kappa: a review of interrater agreement measures. Can J Stat 27(1):3–23
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Budanitsky A, Hirst G (2006) Evaluating WordNet-based measures of lexical semantic relatedness. Comput Linguist 32(1):13–47
Cimiano P, Völker J (2005) Towards large-scale, open-domain and ontology-based named entity classification. In: Recent advances in natural language processing, RANLP 2005. ACL, pp 166–172
Dawes J (2008) Do data characteristics change according to the number of scale points used? Int J Mark Res 50(1):61–78
Ferrara F, Tasso C (2013) Evaluating the results of methods for computing semantic relatedness. In: Gelbukh A (ed) Computational linguistics and intelligent text processing. LNCS, vol 7816. Springer, Berlin, pp 447–458
Finkelstein L, Gabrilovich E, Matias Y, Rivlin E, Solan Z, Wolfman G, Ruppin E (2002) Placing search in context: the concept revisited. ACM Trans Inf Syst 20(1):116–131
Finn R (1970) A note on estimating the reliability of categorical data. Educ Psychol Meas 30(1):71–76
Goldstone R, Son J (2005) Similarity. In: Holyoak K, Morrison R (eds) Cambridge handbook of thinking and reasoning. Cambridge University Press, New York, pp 13–36
Hecht B, Raubal M (2008) GeoSR: geographically explore semantic relations in world knowledge. In: The European information society: taking geoinformation science one step further. LNGC, Springer, Berlin
Hecht B, Carton SH, Quaderi M, Schöning J, Raubal M, Gergle D, Downey D (2012) Explanatory semantic relatedness and explicit spatialization for exploratory search. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 415–424
James L, Demaree R, Wolf G (1984) Estimating within-group interrater reliability with and without response bias. J Appl Psychol 69(1):85–98
Janowicz K, Raubal M (2007) Affordance-based similarity measurement for entity types. In: Spatial information theory. LNCS, vol 4736. Springer, Berlin, pp 133–151
Janowicz K, Keßler C, Schwarz M, Wilkes M, Panov I, Espeter M, Bäumer B (2007) Algorithm, implementation and application of the SIM-DL similarity server. In: GeoSpatial semantics: second international conference, GeoS 2007. LNCS, vol 4853. Springer, Berlin, pp 128–145
Janowicz K, Keßler C, Panov I, Wilkes M, Espeter M, Schwarz M (2008) A study on the cognitive plausibility of SIM-DL similarity rankings for geographic feature types. In: Fabrikant S, Wachowicz M (eds) The European information society: taking geoinformation science one step further. LNGC, Springer, Berlin, pp 115–134
Janowicz K, Raubal M, Schwering A, Kuhn W (2008) Semantic similarity measurement and geospatial applications. Trans GIS 12(6):651–659
Janowicz K, Raubal M, Kuhn W (2011) The semantics of similarity in geographic information retrieval. J Spat Info Sci 2(1):29–57
Kaptchuk T (2001) The double-blind, randomized, placebo-controlled trial: gold standard or golden calf? J Clin Epidemiol 54(6):541–549
Kendall M, Smith B (1939) The problem of m rankings. Ann Math Stat 10(3):275–287
Keßler C (2007) Similarity measurement in context. In: Proceedings of the 6th international and interdisciplinary conference on modeling and using context. LNCS, vol 4635. Springer, pp 277–290
Keßler C (2011) What is the difference? A cognitive dissimilarity measure for information retrieval result sets. Knowl Inf Syst 30(2):319–340
Khoo C, Na J (2006) Semantic relations in information science. Annu Rev Inf Sci Technol 40(1):157–207
Kuhn W (2013) Cognitive and linguistic ideas and geographic information semantics. In: Cognitive and linguistic aspects of geographic space. LNGC, Springer, pp 159–174
LeBreton J, Senter J (2008) Answers to 20 questions about interrater reliability and interrater agreement. Organ Res Methods 11(4):815–852
Lehrer A (1985) The influence of semantic fields on semantic change. In: Fisiak J (ed) Historical word formation. Walter de Gruyter, Berlin, pp 283–296
Medin D, Goldstone R, Gentner D (1990) Similarity involving attributes and relations: judgments of similarity and difference are not inverses. Psychol Sci 1(1):64–69
Miller G, Charles W (1991) Contextual correlates of semantic similarity. Lang Cogn Process 6(1):1–28
Mohammad S, Hirst G (2012) Distributional measures of semantic distance: a survey. Comput Res Repository (CoRR) 1–39. http://arXiv.org/abs/1203.1858
Montello DR, Fabrikant SI, Ruocco M, Middleton RS (2003) Testing the first law of cognitive geography on point-display spatializations. In: Kuhn W, Worboys M, Timpf S (eds) Spatial information theory. Foundations of geographic information science. LNCS, vol 2825. Springer, pp 316–331
Morris J, Hirst G (2004) Non-classical lexical semantic relations. In: Proceedings of the HLT-NAACL workshop on computational lexical semantics. ACL, pp 46–51
Nelson D, Dyrdal G, Goodmon L (2005) What is preexisting strength? Predicting free association probabilities, similarity ratings, and cued recall probabilities. Psychon Bull Rev 12(4):711–719
Pedersen T, Kolhatkar V (2009) WordNet::SenseRelate::AllWords: a broad coverage word sense tagger that maximizes semantic relatedness. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics, companion volume: demonstration session. ACL, pp 17–20
Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity: measuring the relatedness of concepts. In: Proceedings of human language technologies: the 2004 annual conference of the North American chapter of the association for computational linguistics, companion volume: demonstration session. ACL, pp 38–41
Rada R, Mili H, Bicknell E, Blettner M (1989) Development and application of a metric on semantic nets. IEEE Trans Syst Man Cybern 19(1):17–30
Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th international joint conference on artificial intelligence, IJCAI’95, vol 1. Morgan Kaufmann, pp 448–453
Robinson W (1957) The statistical measurement of agreement. Am Sociol Rev 22(1):17–25
Rodgers J, Nicewander W (1988) Thirteen ways to look at the correlation coefficient. Am Stat 42(1):59–66
Rodríguez M, Egenhofer M (2004) Comparing geospatial entity classes: an asymmetric and context-dependent similarity measure. Int J Geogr Inf Sci 18(3):229–256
Rubenstein H, Goodenough J (1965) Contextual correlates of synonymy. Commun ACM 8(10):627–633
Schütze H (1998) Automatic word sense discrimination. Comput Linguist 24(1):97–123
Schwering A (2008) Approaches to semantic similarity measurement for geo-spatial data: a survey. Trans GIS 12(1):5–29
Schwering A, Kuhn W (2009) A hybrid semantic similarity measure for spatial information retrieval. Spat Cogn Comput 9(1):30–63
Schwering A, Raubal M (2005) Spatial relations for semantic similarity measurement. In: Perspectives in conceptual modeling. LNCS, vol 3770. Springer, pp 259–269
Strube G (1992) The role of cognitive science in knowledge engineering. Contemp Knowl Eng Cogn 622:159–174
Tobler W (1970) A computer movie simulating urban growth in the Detroit region. In: Economic geography. Supplement: proceedings. International Geographical Union. Commission on quantitative methods, vol 46. Clark University, Worcester, pp 234–240
Toutanova K, Klein D, Manning C, Singer Y (2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology, vol 1. ACL, pp 173–180
Turney P (2006) Similarity of semantic relations. Comput Linguist 32(3):379–416
Tversky A (1977) Features of similarity. Psychol Rev 84(4):327–352
Wang C, Wang J, Xie X, Ma WY (2007) Mining geographic knowledge using location aware topic model. In: Proceedings of the 4th ACM workshop on geographical information retrieval. ACM, pp 65–70
Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83
Wright K (2005) Researching internet-based populations: advantages and disadvantages of online survey research, online questionnaire authoring software packages, and web survey services. J Comput-Mediated Commun 10(3). http://jcmc.indiana.edu/vol10/issue3/wright.html, article 11
Acknowledgments
The research presented in this article was funded by a Strategic Research Cluster grant (07/SRC/I1168) from Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support.
Cite this article
Ballatore, A., Bertolotto, M. & Wilson, D.C. An evaluative baseline for geo-semantic relatedness and similarity. Geoinformatica 18, 747–767 (2014). https://doi.org/10.1007/s10707-013-0197-8
DOI: https://doi.org/10.1007/s10707-013-0197-8