Knowledge and Information Systems, Volume 37, Issue 1, pp 61–81

Geographic knowledge extraction and semantic similarity in OpenStreetMap

  • Andrea Ballatore
  • Michela Bertolotto
  • David C. Wilson
Regular Paper


In recent years, a web phenomenon known as Volunteered Geographic Information (VGI) has produced large crowdsourced geographic data sets. OpenStreetMap (OSM), the leading VGI project, aims at building an open-content world map through user contributions. OSM semantics consists of a set of properties (called ‘tags’) describing geographic classes, whose usage is defined by project contributors on a dedicated Wiki website. Because of its simple and open semantic structure, the OSM approach often results in noisy and ambiguous data, limiting its usability for analysis in information retrieval, recommender systems and data mining. Devising a mechanism for computing the semantic similarity of the OSM geographic classes can help alleviate this semantic gap. The contribution of this paper is twofold. It consists of (1) the development of the OSM Semantic Network by means of a web crawler tailored to the OSM Wiki website; this semantic network can be used to compute semantic similarity through co-citation measures, providing a novel semantic tool for OSM and GIS communities; (2) a study of the cognitive plausibility (i.e. the ability to replicate human judgement) of co-citation algorithms when applied to the computation of semantic similarity of geographic concepts. Empirical evidence supports the usage of co-citation algorithms—SimRank showing the highest plausibility—to compute concept similarity in a crowdsourced semantic network.
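As an illustration of how a co-citation measure such as SimRank can score concept similarity on a link graph, the following is a minimal sketch of the standard iterative SimRank recursion. It is not the authors' implementation. The toy graph, the page names, the function name `simrank`, the decay factor of 0.8 and the iteration count are all assumptions chosen for illustration and do not come from the OSM Semantic Network itself.

```python
from itertools import product


def simrank(in_links, decay=0.8, iterations=10):
    """Naive iterative SimRank on a directed graph.

    in_links maps every node to the list of nodes that link to it.
    decay and iterations are illustrative defaults, not values from the paper.
    """
    nodes = list(in_links)
    # Initial state: a node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_links[a], in_links[b]
            if not ia or not ib:
                # Nodes without incoming links share no citing context.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbours, damped by decay.
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical wiki pages: two key pages that link to the tag pages they document.
toy_in_links = {
    "key:amenity": [],
    "key:highway": [],
    "amenity=pub": ["key:amenity"],
    "amenity=bar": ["key:amenity"],
    "highway=residential": ["key:highway"],
}

scores = simrank(toy_in_links)
print(scores[("amenity=pub", "amenity=bar")])
print(scores[("amenity=pub", "highway=residential")])
```

In this toy graph the two amenity tags are cited by the same page and so receive a high score, while the amenity and highway tags share no citing context and score zero. A real crawl of the OSM Wiki would yield a much denser graph, for which this quadratic formulation would likely need optimisation.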


Keywords: Semantic similarity · OpenStreetMap · Volunteered Geographic Information · OSM Semantic Network · SimRank · P-Rank · Co-citation · Crowdsourcing



The research presented in this paper was funded by a Strategic Research Cluster grant (07/SRC/I1168) by Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support. They also wish to thank the anonymous reviewers for their valuable suggestions, and Prof. Leslie Daly (UCD School of Public Health, Physiotherapy & Population Science) for his insightful comments on statistical meta-analysis.



Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  • Andrea Ballatore (1)
  • Michela Bertolotto (1)
  • David C. Wilson (2)

  1. School of Computer Science and Informatics, University College Dublin, Dublin 4, Ireland
  2. Department of Software and Information Systems, University of North Carolina, Charlotte, USA
