GeoInformatica, Volume 18, Issue 4, pp 747–767

An evaluative baseline for geo-semantic relatedness and similarity

  • Andrea Ballatore
  • Michela Bertolotto
  • David C. Wilson
Article

Abstract

In geographic information science and semantics, the computation of semantic similarity is widely recognised as key to supporting a vast number of tasks in information integration and retrieval. By contrast, the role of geo-semantic relatedness has been largely ignored. In natural language processing, semantic relatedness is often confused with the more specific semantic similarity. In this article, we discuss a notion of geo-semantic relatedness based on Lehrer’s semantic fields, and we compare it with geo-semantic similarity. We then describe and validate the Geo Relatedness and Similarity Dataset (GeReSiD), a new open dataset designed to evaluate computational measures of geo-semantic relatedness and similarity. This dataset is larger than existing datasets of this kind, and includes 97 geographic terms combined into 50 term pairs rated by 203 human subjects. GeReSiD is available online and can be used as an evaluation baseline to determine empirically to what degree a given computational model approximates geo-semantic relatedness and similarity.
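As a rough illustration of how a dataset of this kind can serve as an evaluation baseline, the sketch below compares a model's scores with mean human ratings using Spearman's rank correlation. The choice of Spearman's rho is an assumption made for illustration, and the term pairs and all numbers are hypothetical placeholders, not values taken from GeReSiD.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical term pairs and scores (placeholders, not GeReSiD data).
pairs = [("sea", "ocean"), ("river", "bridge"), ("school", "glacier"), ("park", "forest")]
human_ratings = [0.95, 0.60, 0.10, 0.40]  # assumed mean human ratings per pair
model_scores = [0.80, 0.70, 0.20, 0.10]   # assumed output of some computational measure

rho = spearman(human_ratings, model_scores)
print(f"Spearman's rho between model and human ratings: {rho:.2f}")
```

A rho close to 1 would indicate that the model ranks the pairs much as the human subjects did. Relatedness and similarity ratings would each be compared separately against the model.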

Keywords

Geo-semantic relatedness; Geo-semantic similarity; Gold standards; Geo-semantics; Cognitive plausibility; GeReSiD

Notes

Acknowledgments

The research presented in this article was funded by a Strategic Research Cluster grant (07/SRC/I1168) by Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support.

Supplementary material

10707_2013_197_MOESM1_ESM.doc (DOC, 95 kb)
10707_2013_197_MOESM2_ESM.doc (DOC, 162 kb)

References

  1. Agirre E, Alfonseca E, Hall K, Kravalova J, Paşca M, Soroa A (2009) A study on similarity and relatedness using distributional and WordNet-based approaches. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics. ACL, pp 19–27
  2. Bakillah M, Bédard Y, Mostafavi M, Brodeur J (2009) SIM-NET: a view-based semantic similarity model for Ad Hoc networks of geospatial databases. Trans GIS 13(5–6):417–447
  3. Ballatore A, Wilson D, Bertolotto M (2012) The similarity jury: combining expert judgements on geographic concepts. In: Castano S, Vassiliadis P, Lakshmanan L, Lee M (eds) Advances in conceptual modeling. ER 2012 workshops (SeCoGIS). LNCS, vol 7518. Springer, Berlin, pp 231–240
  4. Ballatore A, Bertolotto M, Wilson D (2013) Computing the semantic similarity of geographic terms using volunteered lexical definitions. Int J Geogr Inf Sci 27(10):2099–2118
  5. Ballatore A, Bertolotto M, Wilson D (2013) Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowl Inf Syst 37(1):61–81
  6. Ballatore A, Wilson D, Bertolotto M (2013) A survey of volunteered open geo-knowledge bases in the semantic web. In: Pasi G, Bordogna G, Jain L (eds) Quality issues in the management of web information, intelligent systems reference library, vol 50. Springer, Berlin, pp 93–120
  7. Banerjee M, Capozzoli M, McSweeney L, Sinha D (1999) Beyond Kappa: a review of interrater agreement measures. Can J Stat 27(1):3–23
  8. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
  9. Budanitsky A, Hirst G (2006) Evaluating WordNet-based measures of lexical semantic relatedness. Comput Linguist 32(1):13–47
  10. Cimiano P, Völker J (2005) Towards large-scale, open-domain and ontology-based named entity classification. In: Recent advances in natural language processing, RANLP 2005. ACL, pp 166–172
  11. Dawes J (2008) Do data characteristics change according to the number of scale points used? Int J Mark Res 50(1):61–78
  12. Ferrara F, Tasso C (2013) Evaluating the results of methods for computing semantic relatedness. In: Gelbukh A (ed) Computational linguistics and intelligent text processing. LNCS, vol 7816. Springer, Berlin, pp 447–458
  13. Finkelstein L, Gabrilovich E, Matias Y, Rivlin E, Solan Z, Wolfman G, Ruppin E (2002) Placing search in context: the concept revisited. ACM Trans Inf Syst 20(1):116–131
  14. Finn R (1970) A note on estimating the reliability of categorical data. Educ Psychol Meas 30(1):71–76
  15. Goldstone R, Son J (2005) Similarity. In: Holyoak K, Morrison R (eds) Cambridge handbook of thinking and reasoning. Cambridge University Press, New York, pp 13–36
  16. Hecht B, Raubal M (2008) GeoSR: geographically explore semantic relations in world knowledge. In: The European information society: taking geoinformation science one step further. LNGC, Springer, Berlin
  17. Hecht B, Carton SH, Quaderi M, Schöning J, Raubal M, Gergle D, Downey D (2012) Explanatory semantic relatedness and explicit spatialization for exploratory search. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval. ACM, pp 415–424
  18. James L, Demaree R, Wolf G (1984) Estimating within-group interrater reliability with and without response bias. J Appl Psychol 69(1):85–98
  19. Janowicz K, Raubal M (2007) Affordance-based similarity measurement for entity types. In: Spatial information theory. LNCS, vol 4736. Springer, Berlin, pp 133–151
  20. Janowicz K, Keßler C, Schwarz M, Wilkes M, Panov I, Espeter M, Bäumer B (2007) Algorithm, implementation and application of the SIM-DL similarity server. In: GeoSpatial semantics: second international conference, GeoS 2007. LNCS, vol 4853. Springer, Berlin, pp 128–145
  21. Janowicz K, Keßler C, Panov I, Wilkes M, Espeter M, Schwarz M (2008) A study on the cognitive plausibility of SIM-DL similarity rankings for geographic feature types. In: Fabrikant S, Wachowicz M (eds) The European information society: taking geoinformation science one step further. LNGC, Springer, Berlin, pp 115–134
  22. Janowicz K, Raubal M, Schwering A, Kuhn W (2008) Semantic similarity measurement and geospatial applications. Trans GIS 12(6):651–659
  23. Janowicz K, Raubal M, Kuhn W (2011) The semantics of similarity in geographic information retrieval. J Spat Info Sci 2(1):29–57
  24. Kaptchuk T (2001) The double-blind, randomized, placebo-controlled trial: gold standard or golden calf? J Clin Epidemiol 54(6):541–549
  25. Kendall M, Smith B (1939) The problem of m rankings. Ann Math Stat 10(3):275–287
  26. Keßler C (2007) Similarity measurement in context. In: Proceedings of the 6th international and interdisciplinary conference on modeling and using context. LNCS, vol 4635. Springer, pp 277–290
  27. Keßler C (2011) What is the difference? A cognitive dissimilarity measure for information retrieval result sets. Knowl Inf Syst 30(2):319–340
  28. Khoo C, Na J (2006) Semantic relations in information science. Annu Rev Inf Sci Technol 40(1):157–207
  29. Kuhn W (2013) Cognitive and linguistic ideas and geographic information semantics. In: Cognitive and linguistic aspects of geographic space. LNGC, Springer, pp 159–174
  30. LeBreton J, Senter J (2008) Answers to 20 questions about interrater reliability and interrater agreement. Organ Res Methods 11(4):815–852
  31. Lehrer A (1985) The influence of semantic fields on semantic change. In: Fisiak J (ed) Historical word formation. Walter de Gruyter, Berlin, pp 283–296
  32. Medin D, Goldstone R, Gentner D (1990) Similarity involving attributes and relations: judgments of similarity and difference are not inverses. Psychol Sci 1(1):64–69
  33. Miller G, Charles W (1991) Contextual correlates of semantic similarity. Lang Cogn Process 6(1):1–28
  34. Mohammad S, Hirst G (2012) Distributional measures of semantic distance: a survey. Comput Res Repository (CoRR) 1–39. http://arXiv.org/abs/1203.1858
  35. Montello DR, Fabrikant SI, Ruocco M, Middleton RS (2003) Testing the first law of cognitive geography on point-display spatializations. In: Kuhn W, Worboys M, Timpf S (eds) Spatial information theory. Foundations of Geographic Information Science. LNCS, vol 2825. Springer, pp 316–331
  36. Morris J, Hirst G (2004) Non-classical lexical semantic relations. In: Proceedings of the HLT-NAACL workshop on computational lexical semantics. ACL, pp 46–51
  37. Nelson D, Dyrdal G, Goodmon L (2005) What is preexisting strength? Predicting free association probabilities, similarity ratings, and cued recall probabilities. Psychon Bull Rev 12(4):711–719
  38. Pedersen T, Kolhatkar V (2009) WordNet::SenseRelate::AllWords: a broad coverage word sense tagger that maximizes semantic relatedness. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics, companion volume: demonstration session. ACL, pp 17–20
  39. Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity: measuring the relatedness of concepts. In: Proceedings of human language technologies: the 2004 annual conference of the North American chapter of the association for computational linguistics, companion volume: demonstration session. ACL, pp 38–41
  40. Rada R, Mili H, Bicknell E, Blettner M (1989) Development and application of a metric on semantic nets. IEEE Trans Syst Man Cybern 19(1):17–30
  41. Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th international joint conference on artificial intelligence, IJCAI’95, vol 1. Morgan Kaufmann, pp 448–453
  42. Robinson W (1957) The statistical measurement of agreement. Am Sociol Rev 22(1):17–25
  43. Rodgers J, Nicewander W (1988) Thirteen ways to look at the correlation coefficient. Am Stat 42(1):59–66
  44. Rodríguez M, Egenhofer M (2004) Comparing geospatial entity classes: an asymmetric and context-dependent similarity measure. Int J Geogr Inf Sci 18(3):229–256
  45. Rubenstein H, Goodenough J (1965) Contextual correlates of synonymy. Commun ACM 8(10):627–633
  46. Schütze H (1998) Automatic word sense discrimination. Comput Linguist 24(1):97–123
  47. Schwering A (2008) Approaches to semantic similarity measurement for geo-spatial data: a survey. Trans GIS 12(1):5–29
  48. Schwering A, Kuhn W (2009) A hybrid semantic similarity measure for spatial information retrieval. Spat Cogn Comput 9(1):30–63
  49. Schwering A, Raubal M (2005) Spatial relations for semantic similarity measurement. In: Perspectives in conceptual modeling. LNCS, vol 3770. Springer, pp 259–269
  50. Strube G (1992) The role of cognitive science in knowledge engineering. Contemp Knowl Eng Cogn 622:159–174
  51. Tobler W (1970) A computer movie simulating urban growth in the Detroit region. In: Economic geography. Supplement: proceedings. International Geographical Union. Commission on quantitative methods, vol 46. Clark University, Worcester, pp 234–240
  52. Toutanova K, Klein D, Manning C, Singer Y (2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology, vol 1. ACL, pp 173–180
  53. Turney P (2006) Similarity of semantic relations. Comput Linguist 32(3):379–416
  54. Tversky A (1977) Features of similarity. Psychol Rev 84(4):327–352
  55. Wang C, Wang J, Xie X, Ma WY (2007) Mining geographic knowledge using location aware topic model. In: Proceedings of the 4th ACM workshop on geographical information retrieval. ACM, pp 65–70
  56. Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83
  57. Wright K (2005) Researching internet-based populations: advantages and disadvantages of online survey research, online questionnaire authoring software packages, and web survey services. J Comput-Mediated Commun 10(3), article 11. http://jcmc.indiana.edu/vol10/issue3/wright.html

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Andrea Ballatore (1, corresponding author)
  • Michela Bertolotto (1)
  • David C. Wilson (2)
  1. School of Computer Science and Informatics, University College Dublin, Dublin 4, Ireland
  2. Department of Software and Information Systems, University of North Carolina, Charlotte, USA
