Geographic knowledge extraction and semantic similarity in OpenStreetMap

Knowledge and Information Systems

Abstract

In recent years, a web phenomenon known as Volunteered Geographic Information (VGI) has produced large crowdsourced geographic data sets. OpenStreetMap (OSM), the leading VGI project, aims at building an open-content world map through user contributions. OSM semantics consists of a set of properties (called ‘tags’) describing geographic classes, whose usage is defined by project contributors on a dedicated Wiki website. Because of its simple and open semantic structure, the OSM approach often results in noisy and ambiguous data, limiting its usability for analysis in information retrieval, recommender systems and data mining. Devising a mechanism for computing the semantic similarity of the OSM geographic classes can help alleviate this semantic gap. The contribution of this paper is twofold. It consists of (1) the development of the OSM Semantic Network by means of a web crawler tailored to the OSM Wiki website; this semantic network can be used to compute semantic similarity through co-citation measures, providing a novel semantic tool for OSM and GIS communities; (2) a study of the cognitive plausibility (i.e. the ability to replicate human judgement) of co-citation algorithms when applied to the computation of semantic similarity of geographic concepts. Empirical evidence supports the usage of co-citation algorithms—SimRank showing the highest plausibility—to compute concept similarity in a crowdsourced semantic network.
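
As an illustration of how a co-citation measure such as SimRank can score concept similarity from link structure alone, the following is a minimal sketch of the naive iterative formulation by Jeh and Widom [29]. It is not the authors' implementation: the function name `simrank`, the decay factor of 0.8, the iteration count, and the three-edge toy graph of wiki page names are all assumptions chosen for illustration.

```python
from itertools import product


def simrank(edges, decay=0.8, iterations=10):
    """Naive iterative SimRank over directed (source, target) link pairs.

    Hypothetical helper for illustration; decay and iterations are
    arbitrary choices, not values taken from the paper.
    """
    edges = set(edges)  # ignore duplicate links
    nodes = sorted({node for edge in edges for node in edge})

    # in_links[n] holds the pages that link *to* n.
    in_links = {node: [] for node in nodes}
    for source, target in edges:
        in_links[target].append(source)

    # Initial state: each node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                updated[(a, b)] = 1.0
            elif not in_links[a] or not in_links[b]:
                # A node with no incoming links has no evidence of similarity.
                updated[(a, b)] = 0.0
            else:
                # Average similarity over all pairs of in-neighbours, damped.
                total = sum(
                    sim[(i, j)] for i, j in product(in_links[a], in_links[b])
                )
                updated[(a, b)] = decay * total / (
                    len(in_links[a]) * len(in_links[b])
                )
        sim = updated
    return sim


# Toy link graph (made-up subset of wiki pages, for illustration only).
toy_links = [
    ("Key:amenity", "Tag:amenity=pub"),
    ("Key:amenity", "Tag:amenity=bar"),
    ("Key:highway", "Tag:highway=residential"),
]
scores = simrank(toy_links)
print(scores[("Tag:amenity=pub", "Tag:amenity=bar")])          # 0.8
print(scores[("Tag:amenity=pub", "Tag:highway=residential")])  # 0.0
```

In this toy graph the two amenity pages are linked from the same page and so receive a high score, while pages with no common incoming links score zero. The naive formulation compares every pair of nodes in each iteration, so a network of realistic size would call for one of the optimised variants, such as those in [33, 37].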

Notes

  1. This usage of the term ‘tag’ is highly unusual: in Web 2.0, tags are generally unstructured text labels used as metadata [22]. However, to be consistent with the OSM terminology, we will refer to the OSM properties as ‘tags’ in the rest of this paper.

  2. http://wiki.openstreetmap.org (acc. August 10, 2012).

  3. http://wiki.openstreetmap.org/wiki/OSMonto (acc. August 10, 2012).

  4. http://wiki.openstreetmap.org/wiki/OSMSemanticNetwork (acc. August 10, 2012).

  5. Pre-computed similarity scores for the entire OSM Semantic Network are available at http://spatial.ucd.ie/osn/similarities (acc. August 10, 2012).

  6. http://cyclopath.org, http://wikimapia.org (acc. August 10, 2012).

  7. http://wiki.openstreetmap.org/wiki/OpenStreetMap_License (acc. August 10, 2012).

  8. http://linkedgeodata.org/ontology (acc. August 10, 2012).

  9. http://wiki.openstreetmap.org/wiki/OSMonto (acc. August 10, 2012).

  10. http://wiki.openstreetmap.org/wiki/Statistics (acc. August 10, 2012).

  11. http://wiki.openstreetmap.org/wiki/Map_Features (acc. August 10, 2012).

  12. http://taginfo.openstreetmap.org, http://wiki.openstreetmap.org/wiki/Tagwatch (acc. August 10, 2012).

  13. ‘osmwiki:’ stands for the namespace http://wiki.openstreetmap.org/wiki/ (acc. August 10, 2012).

  14. http://github.com/ucd-spatial/OsmWikiCrawler (acc. August 10, 2012).

  15. http://www.w3.org/RDF (acc. August 10, 2012).

  16. http://dump.wiki.openstreetmap.org (acc. August 10, 2012).

  17. ‘lgdo:’ stands for the namespace http://linkedgeodata.org/ontology/ (acc. August 10, 2012).

  18. The full algorithm of the crawler is available in the source code documentation.

  19. osmwiki:Proposed_features/Building_attributes (acc. August 10, 2012).

  20. http://linkeddata.org (acc. August 10, 2012).

  21. http://wiki.openstreetmap.org/wiki/OSMSemanticNetwork (acc. August 10, 2012).

  22. http://github.com/ucd-spatial/Datasets (acc. August 10, 2012).

  23. http://github.com/ucd-spatial/Datasets (acc. August 10, 2012).

  24. http://wiki.openstreetmap.org/wiki/OSMSemanticNetwork (acc. August 10, 2012).

  25. http://www.geonames.org (acc. August 10, 2012).

  26. http://wiki.openstreetmap.org/wiki/Category:Projects_by_country (acc. August 10, 2012).

References

  1. Adafre S, de Rijke M (2005) Discovering missing links in Wikipedia. In: Proceedings of the 3rd international workshop on link discovery, ACM, pp 90–97

  2. Agirre E, Alfonseca E, Hall K, Kravalova J, Paşca M, Soroa A (2009) A study on similarity and relatedness using distributional and Wordnet-based approaches. In: Proceedings of human language technologies: the 2009 annual conference of the North American Chapter of the Association for Computational Linguistics, ACL, pp 19–27

  3. Altman D, Gardner M (1988) Statistics in medicine: calculating confidence intervals for regression and correlation. Br Med J (Clin Res Ed) 296(6631):1238–1242

  4. Amsler R (1972) Applications of citation-based automatic classification. Technical report 14. Linguistics Research Center, Austin

  5. Auer S, Lehmann J, Hellmann S (2009) LinkedGeoData: adding a spatial dimension to the web of data. In: Proceedings of the international semantic web conference, ISWC ’09, vol 5823 of LNCS. Springer, Berlin, pp 731–746

  6. Ballatore A, Bertolotto M (2011) Semantically enriching VGI in support of implicit feedback analysis. In: Proceedings of the web and wireless geographical information systems international symposium (W2GIS 2011), vol 6574 of LNCS. Springer, Berlin, pp 78–93

  7. Ballatore A, Wilson D, Bertolotto M (2012) A survey of volunteered open geo-knowledge bases in the semantic web. In: Advanced techniques in web intelligence—3: quality-based information retrieval. Studies in computational intelligence, Springer (in press)

  8. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J (2000) Graph structure in the web. Comput Netw 33(1–6):309–320

  9. Collins A, Loftus E (1975) A spreading-activation theory of semantic processing. Psychol Rev 82(6):407–428

  10. Diener M, Hilsenroth M, Weinberger J (2009) A primer on meta-analysis of correlation coefficients: the relationship between patient-reported therapeutic alliance and adult attachment style as an illustration. Psychother Res 19(4–5):519–526

  11. Dolbear C, Hart G (2008) Ontological bridge building—using ontologies to merge spatial datasets. In: Proceedings of the AAAI spring symposium on semantic scientific knowledge integration, AAAI/SSKI ’08, AAAI, pp 26–28

  12. Egenhofer M (2002) Toward the semantic geospatial web. In: Proceedings of the 10th ACM international symposium on advances in geographic information systems, ACM, pp 1–4

  13. Fegeas R, Cascio J, Lazar R (1992) An overview of FIPS 173, the spatial data transfer standard. Cartogr Geogr Inf Sci 19(5):278–293

  14. Field A (2001) Meta-analysis of correlation coefficients: a Monte Carlo comparison of fixed- and random-effects methods. Psychol Methods 6(2):161–180

  15. Formica A, Pourabbas E (2009) Content based similarity of geographic classes organized as partition hierarchies. Knowl Inf Syst 20(2):221–241

  16. Gartner G, Bennett D, Morita T (2007) Towards ubiquitous cartography. Cartogr Geogr Inf Sci 34(4):247–257

  17. Giunchiglia F, Maltese V, Farazi F, Dutta B (2010) GeoWordNet: a resource for geo-spatial applications. In: The semantic web: research and applications, ESWC 2010, vol 6088 of LNCS. Springer, Berlin, pp 121–136

  18. Goodchild M (2007) Citizens as sensors: the world of volunteered geography. GeoJournal 69(4):211–221

  19. Haklay M (2010) How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environ Plan B Plan Des 37(4):682–703

  20. Haklay M, Singleton A, Parker C (2008) Web mapping 2.0: the neogeography of the GeoWeb. Geogr Compass 2(6):2011–2039

  21. Haklay M, Weber P (2008) OpenStreetMap: user-generated street maps. IEEE Pervasive Comput 7(4):12–18

  22. Halpin H, Robu V, Shepherd H (2007) The complex dynamics of collaborative tagging. In: Proceedings of the 16th international conference on world wide web, ACM, pp 211–220

  23. Hu B (2010) WiKi’mantics: interpreting ontologies with WikipediA. Knowl Inf Syst 25(3):445–472

  24. Hunter J, Schmidt F (1990) Methods of meta-analysis: correcting error and bias in research findings. SAGE, Newbury Park

  25. Janowicz K, Keßler C, Panov I, Wilkes M, Espeter M, Schwarz M (2008) A study on the cognitive plausibility of SIM-DL similarity rankings for geographic feature types. In: The European Information Society: taking geoinformation science one step further, LNGC. Springer, Berlin, pp 115–134

  26. Janowicz K, Keßler C, Schwarz M, Wilkes M, Panov I, Espeter M, Bäumer B (2007) Algorithm, implementation and application of the SIM-DL similarity server. In: GeoSpatial semantics: second international conference, GeoS 2007, vol 4853 of LNCS. Springer, Berlin, pp 128–145

  27. Janowicz K, Raubal M, Kuhn W (2011) The semantics of similarity in geographic information retrieval. J Spat Inf Sci 2(1):29–57

  28. Janowicz K, Raubal M, Schwering A, Kuhn W (2008) Semantic similarity measurement and geospatial applications. Trans GIS 12(6):651–659

  29. Jeh G, Widom J (2002) SimRank: a measure of structural-context similarity. In: Proceedings of the 8th ACM international conference on knowledge discovery and data mining, SIGKDD, ACM, pp 538–543

  30. Keßler C (2011) What is the difference? A cognitive dissimilarity measure for information retrieval result sets. Knowl Inf Syst 30(2):319–340

  31. Kessler M (1963) Bibliographic coupling between scientific papers. Am Doc 14(1):10–25

  32. Kuhn W (2005) Geospatial semantics: why, of what, and how? In: Journal of Data Semantics III. Special issue on Semantic-based Geographical Information Systems, vol 3534 of LNCS. Springer, Berlin, pp 1–24

  33. Li P, Liu H, Yu J, He J, Du X (2010) Fast single-pair SimRank computation. In: Proceedings of the SIAM international conference on data mining, SDM2010. Omnipress, Madison, pp 571–582

  34. Lin Y (2011) A qualitative enquiry into OpenStreetMap making. New Rev Hypermed Multimed 17(1):53–71

  35. Lin Z, Lyu M, King I (2011) MatchSim: a novel similarity measure based on maximum neighborhood matching. Knowl Inf Syst 32(1):1–26

  36. Liu J, Chen H, Furuse K, Kitagawa H, Yu JX (2011) On efficient distance-based similarity search. In: Proceedings of the 11th IEEE international conference on data mining workshops, IEEE, pp 1199–1202

  37. Lizorkin D, Velikhov P, Grinev M, Turdakov D (2008) Accuracy estimate and optimization techniques for SimRank computation. In: Proceedings of the VLDB endowment, vol 1, very large data base endowment, pp 422–433

  38. Mika P (2005) Ontologies are us: a unified model of social networks and semantics. In: Proceedings of the 4th international semantic web conference, ISWC 2005, vol 3729 of LNCS. Springer, Berlin, pp 522–536

  39. Miller G (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41

  40. Miller G, Charles W (1991) Contextual correlates of semantic similarity. Lang Cogn Process 6(1):1–28

  41. Mirizzi R, Ragone A, Di Noia T, Di Sciascio E (2010) Ranking the linked data: the case of DBpedia. In: Proceedings of 10th international conference in web engineering, ICWE 2010, vol 6189 of LNCS. Springer, Berlin, pp 337–354

  42. Mooney P, Corcoran P (2012) Characteristics of heavily edited objects in OpenStreetMap. Future Internet 4(1):285–305

  43. Mooney P, Corcoran P, Winstanley A (2010) Towards quality metrics for OpenStreetMap. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, ACM, pp 514–517

  44. Mülligann C, Janowicz K, Ye M, Lee W (2011) Analyzing the spatial-semantic interaction of points of interest in volunteered geographic information. In: Spatial information theory, vol 6899 of LNCS. Springer, Berlin, pp 350–370

  45. Nakayama K, Hara T, Nishio S (2008) Wikipedia link structure and text mining for semantic relation extraction. Towards a huge scale global web ontology. In: Proceedings of the workshop on semantic search (SemSearch 2008), 5th European semantic web conference (ESWC 2008), vol 334 of CEUR workshop proceedings, pp 59–73

  46. Nitzschke J (2012) OpenStreetMap’s growth accelerates. Technical report, BeyoNav, Chicago. http://www.beyonav.com/openstreetmaps-growth-accelerates

  47. Priedhorsky R, Terveen L (2008) The computational Geowiki: what, why, and how. In: Proceedings of the ACM conference on computer supported cooperative work, CSCW 2008. ACM, pp 267–276

  48. Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. In: Proceedings of the 14th international joint conference on artificial intelligence, IJCAI’95, vol 1. Morgan Kaufmann, pp 448–453

  49. Robu V, Halpin H, Shepherd H (2009) Emergence of consensus and shared vocabularies in collaborative tagging systems. ACM Trans Web 3(4):1–34

  50. Rodríguez M, Egenhofer M (2004) Comparing geospatial entity classes: an asymmetric and context-dependent similarity measure. Int J Geogr Inf Sci 18(3):229–256

  51. Rubenstein H, Goodenough J (1965) Contextual correlates of synonymy. Commun ACM 8(10):627–633

  52. Schwering A (2008) Approaches to semantic similarity measurement for geo-spatial data: a survey. Trans GIS 12(1):5–29

  53. Selçuk Candan K, Li W (2001) On similarity measures for multimedia database applications. Knowl Inf Syst 3(1):30–51

  54. Small H (1973) Co-citation in the scientific literature: a new measure of the relationship between two documents. J Am Soc Inf Sci 24(4):265–269

  55. Sowa J (1991) Principles of semantic networks: explorations in the representation of knowledge. Morgan Kaufmann, San Mateo

  56. Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101

  57. Sui D (2008) The wikification of GIS and its consequences: or Angelina Jolie’s new tattoo and the future of GIS. Comput Environ Urban Syst 32(1):1–5

  58. Turdakov D, Velikhov P (2008) Semantic relatedness metric for Wikipedia concepts based on link analysis and its application to word sense disambiguation. In: Proceedings of the SYRCODIS 2008 colloquium on databases and information systems, vol 355 of CEUR workshop proceedings

  59. Turner A (2006) Introduction to neogeography. O’Reilly Media, Sebastopol

  60. Wan X (2008) Beyond topical similarity: a structural similarity measure for retrieving highly similar documents. Knowl Inf Syst 15(1):55–73

  61. Wittgenstein L [2009 (1953)] Philosophical investigations, 4th edn. Blackwell, Chichester (trans: Anscombe GEM)

  62. Zhao P, Han J, Sun Y (2009) P-Rank: a comprehensive structural similarity measure over information networks. In: Proceedings of the 18th ACM conference on information and knowledge management, CIKM ’09, ACM, pp 553–562

Acknowledgments

The research presented in this paper was funded by a Strategic Research Cluster grant (07/SRC/I1168) by Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support. They also wish to thank the anonymous reviewers for their valuable suggestions, and Prof. Leslie Daly (UCD School of Public Health, Physiotherapy & Population Science) for his insightful comments on statistical meta-analysis.

Author information

Corresponding author

Correspondence to Andrea Ballatore.

About this article

Cite this article

Ballatore, A., Bertolotto, M. & Wilson, D.C. Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowl Inf Syst 37, 61–81 (2013). https://doi.org/10.1007/s10115-012-0571-0

  • DOI: https://doi.org/10.1007/s10115-012-0571-0
