Interpretation and automatic integration of geospatial data into the Semantic Web
Abstract
In the context of disaster management, geospatial information plays a crucial role in decision making aimed at protecting and saving the population. Gathering as much information as possible from different sources to oversee the current situation is a complex task because of the diversity of data formats and structures. Although several approaches have been designed to integrate data from different sources into an ontology, most of them require background knowledge of the data. However, the non-standard data set schemas (NSDS) of relational geospatial data, retrieved for example from web feature services, are not always documented. This lack of background knowledge is a major challenge for automatic semantic data integration. To address this problem, this article presents an automatic approach for integrating geospatial data stored in NSDS. The approach derives a schema mapping from the result of an ontology matching that corresponds to a semantic interpretation process based on geocoding and natural language processing. This article extends a previous publication with an improved unit detection algorithm, data quality and provenance enrichment, and the detection of feature clusters. It also presents an improved evaluation process that better assesses the performance of the approach against a manually created ontology. The experiments show that the automatic approach yields a semantic interpretation error of around 10% relative to the manual approach.
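The abstract does not spell out how column names of an undocumented schema are matched to ontology concepts or how the error against a manual approach is counted. As a rough illustration only, the sketch below maps column names to a hypothetical list of ontology property labels using a normalised Levenshtein similarity (Levenshtein 1966 is among the cited works), peels off a unit suffix from a hypothetical unit table, and computes the share of columns whose automatic label disagrees with a manual reference mapping. The labels, unit table, column names and the 0.6 threshold are all assumptions; this is not the authors' algorithm, which additionally relies on geocoding and natural language processing.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical table of unit suffixes; a stand-in for real unit detection.
UNIT_SUFFIXES = {"m": "metre", "km": "kilometre", "kg": "kilogram"}

# column name -> (matched ontology label or None, detected unit or None)
Mapping = Dict[str, Tuple[Optional[str], Optional[str]]]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance normalised to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def split_column(column: str) -> Tuple[str, Optional[str]]:
    """Turn e.g. 'HEIGHT_M' into ('height', 'metre') by peeling off a unit suffix."""
    tokens = [t for t in re.split(r"[_\W]+", column.lower()) if t]
    unit = None
    if len(tokens) > 1 and tokens[-1] in UNIT_SUFFIXES:
        unit = UNIT_SUFFIXES[tokens.pop()]
    return " ".join(tokens), unit


def match_columns(columns: List[str], labels: List[str], threshold: float = 0.6) -> Mapping:
    """Map each column to its most similar label, or to None below the threshold."""
    mapping: Mapping = {}
    for column in columns:
        term, unit = split_column(column)
        best = max(labels, key=lambda label: similarity(term, label))
        mapping[column] = (best if similarity(term, best) >= threshold else None, unit)
    return mapping


def error_rate(auto: Mapping, manual: Dict[str, str]) -> float:
    """Fraction of columns whose automatic label differs from the manual reference."""
    wrong = sum(1 for col, label in manual.items() if auto.get(col, (None, None))[0] != label)
    return wrong / len(manual)


if __name__ == "__main__":
    labels = ["height", "population", "name", "street"]
    columns = ["HEIGHT_M", "POPULATION", "NAME", "strname"]
    manual = {"HEIGHT_M": "height", "POPULATION": "population",
              "NAME": "name", "strname": "street"}
    auto = match_columns(columns, labels)
    print(auto)
    print(error_rate(auto, manual))
```

With these hypothetical inputs, "HEIGHT_M" maps to "height" with the unit "metre", while the abbreviation "strname" scores below the threshold against every label and stays unmapped, so one of four columns disagrees with the manual mapping. A purely lexical matcher like this cannot resolve such abbreviations, which illustrates why a richer semantic interpretation step is useful.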
Keywords
Semantic interpretation; Data quality; Natural language processing; Ontologies; Spatial fusion; Semantic Web
Acknowledgements
We are funded by the German Federal Ministry of Education and Research (https://www.bmbf.de/en/index.html; project reference 03FH032IX4).
References
- 1. Alt H, Godau M (1995) Computing the Fréchet distance between two polygonal curves. Int J Comput Geom Appl 5(01n02):75–91
- 2. Arenas M, Bertails A, Prud’hommeaux E, Sequeda J (2012) A direct mapping of relational data to RDF. W3C recommendation. https://www.w3.org/TR/rdb-direct-mapping/
- 3. Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z (2007) DBpedia: a nucleus for a web of open data. In: The semantic web, Springer, pp 722–735
- 4. Auer S, Lehmann J, Hellmann S (2009) LinkedGeoData: adding a spatial dimension to the web of data. In: International semantic web conference, Springer, pp 731–746
- 5. Barron C, Neis P, Zipf A (2014) A comprehensive framework for intrinsic OpenStreetMap quality analysis. Trans GIS 18(6):877–895
- 6. Battle R, Kolas D (2011) GeoSPARQL: enabling a geospatial semantic web. Semant Web J 3(4):355–370
- 7. Berretti S, Del Bimbo A, Pala P (2000) Retrieval by shape similarity with perceptual distance and effective indexing. IEEE Trans Multimed 2(4):225–239
- 8. Bizid I, Faiz S, Boursier P, Yusuf JCM (2014) Integration of heterogeneous spatial databases for disaster management. In: Parsons J, Chiu D (eds) Advances in conceptual modeling: ER 2013 workshops, LSAWM, MoBiD, RIGiM, SeCoGIS, WISM, DaSeM, SCME, and PhD symposium, Hong Kong, China, November 2013, revised selected papers. Springer, Cham, pp 77–86. https://doi.org/10.1007/978-3-319-14139-8_10
- 9. Brassel K, Bucher F, Stephan EM, Vckovski A (1995) Completeness. In: Guptill SC, Morrison JL (eds) Elements of spatial data quality. Elsevier, Amsterdam, pp 81–108
- 10. Burggraf DS (2006) Geography markup language. Data Sci J 5:178–204
- 11. Buscaldi D, Rosso P (2008) Geo-WordNet: automatic georeferencing of WordNet. In: LREC
- 12. Das S, Sundara S, Cyganiak R (2012) R2RML: RDB to RDF mapping language, W3C recommendation. World Wide Web Consortium, Cambridge
- 13. Debruyne C, McGlinn K, McNerney L, O’Sullivan D (2017) A lightweight approach to explore, enrich and use data with a geospatial dimension with semantic web technologies. In: Proceedings of the fourth international ACM workshop on managing and mining enriched geo-spatial data, ACM, p 1
- 14. Debruyne C, Meehan A, Clinton É, McNerney L, Nautiyal A, Lavin P, O’Sullivan D (2017) Ireland’s authoritative geospatial linked data. In: International semantic web conference, Springer, pp 66–74
- 15. Do HH, Rahm E (2002) COMA: a system for flexible combination of schema matching approaches. In: Proceedings of the 28th international conference on very large data bases, VLDB endowment, pp 610–621
- 16. Eren H (2016) 8 standards in process control and automation. In: Liptak BG, Eren H (eds) Instrument engineers’ handbook, volume 3: process software and digital networks, vol 3. CRC Press, Boca Raton, p 155
- 17. ESRI (1998) Shapefile technical description. An ESRI white paper
- 18. Euzenat J, Shvaiko P (2007) Ontology matching. Springer, Berlin
- 19. Gao S, Sperberg-McQueen CM, Thompson HS, Mendelsohn N, Beech D, Maloney M (2009) W3C XML schema definition language (XSD) 1.1 part 1: structures. W3C Candidate Recomm 30(7.2):16
- 20. Goodchild MF, Hunter GJ (1997) A simple positional accuracy measure for linear features. Int J Geogr Inf Sci 11(3):299–306
- 21. Grantner E (2007) ISO 8000: a standard for data quality. Logist Spectr 41(4):4–6
- 22. Guo H, Song GF, Ma L, Wang SH (2009) Design and implementation of address geocoding system. Comput Eng 35(1):250–251
- 23. Hartig O, Zhao J (2009) Using web data provenance for quality assessment. CEUR workshop proceedings
- 24. Hillner S, Ngomo ACN (2011) Parallelizing LIMES for large-scale link discovery. In: 7th international conference on semantic systems, ACM, pp 9–16
- 25. Homburg T, Prudhomme C, Würriehausen F, Karmacharya A, Boochs F, Roxin A, Cruz C (2016) Interpreting heterogeneous geospatial data using semantic web technologies. In: International conference on computational science and its applications, Springer, pp 240–255
- 26. Huttenlocher DP, Klanderman GA, Rucklidge WJ (1993) Comparing images using the Hausdorff distance. IEEE Trans Pattern Anal Mach Intell 15(9):850–863
- 27. Jiménez-Ruiz E, Grau BC (2011) LogMap: logic-based and scalable ontology matching. In: International semantic web conference, Springer, pp 273–288
- 28. Jiménez-Ruiz E, Kharlamov E, Zheleznyakov D, Horrocks I, Pinkel C, Skjæveland MG, Thorstensen E, Mora J (2015) BootOX: practical mapping of RDBs to OWL 2. In: International semantic web conference, Springer, pp 113–132
- 29. Kainz W (1995) Logical consistency. Elem Spat Data Qual 202:109–137
- 30. Kalemi E, Martiri E (2011) FOAF-academic ontology: a vocabulary for the academic community. In: 2011 third international conference on intelligent networking and collaborative systems (INCoS), IEEE, pp 440–445
- 31. Lanter DP (1990) Lineage in GIS: the problem and a solution, NCGIA National Center for Geographic Information and Analysis. http://infoscience.epfl.ch/record/51713
- 32. Le Grange JJ, Lehmann J, Athanasiou S, Garcia-Rojas A, Giannopoulos G, Hladky D, Isele R, Ngomo ACN, Sherif MA, Stadler C, et al (2014) The GeoKnow generator: managing geospatial data in the linked data web. In: Linking geospatial data
- 33. Lebo T, Sahoo S, McGuinness D, Belhajjame K, Cheney J, Corsar D, Garijo D, Soiland-Reyes S, Zednik S, Zhao J (2013) PROV-O: the PROV ontology. W3C recommendation. https://www.w3.org/TR/prov-o/
- 34. Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions, and reversals. Soviet Phys Dokl 10:707–710
- 35. Manning C, Surdeanu M, Bauer J, Finkel J, Bethard S, McClosky D (2014) The Stanford CoreNLP natural language processing toolkit. In: 52nd annual meeting of the association for computational linguistics: system demonstrations, pp 55–60
- 36. Melnik S, Garcia-Molina H, Rahm E (2002) Similarity flooding: a versatile graph matching algorithm and its application to schema matching. In: 18th international conference on data engineering, 2002. Proceedings, IEEE, pp 117–128
- 37. Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
- 38. Navigli R, Ponzetto SP (2010) BabelNet: building a very large multilingual semantic network. In: Proceedings of the 48th annual meeting of the association for computational linguistics, Association for computational linguistics, pp 216–225
- 39. Nentwig M, Hartung M, Ngonga Ngomo AC, Rahm E (2017) A survey of current link discovery frameworks. Semant Web 8(3):419–436
- 40. Ngomo ACN, Auer S (2011) LIMES: a time-efficient approach for large-scale link discovery on the web of data. In: IJCAI, pp 2312–2317
- 41. Niu X, Rong S, Zhang Y, Wang H (2011) Zhishi.links results for OAEI 2011. In: Ontology matching, vol 220
- 42. Niwattanakul S, Singthongchai J, Naenudorn E, Wanapu S (2013) Using of Jaccard coefficient for keywords similarity. In: Proceedings of the international multiconference of engineers and computer scientists, vol 1
- 43. OGC (2011) OGC GeoSPARQL: a geographic query language for RDF data. Technical report
- 44. Otero-Cerdeira L, Rodríguez-Martínez FJ, Gómez-Rodríguez A (2015) Ontology matching: a literature review. Expert Syst Appl 42(2):949–971
- 45. Pan JZ (2009) Resource description framework. In: Staab S, Studer R (eds) Handbook on ontologies. Springer, Berlin, pp 71–90
- 46. Patroumpas K, Alexakis M, Giannopoulos G, Athanasiou S (2014) TripleGeo: an ETL tool for transforming geospatial data into RDF triples. In: ICDT workshops, pp 275–278
- 47. Pinkel C, Binnig C, Jiménez-Ruiz E, Kharlamov E, May W, Nikolov A, Sasa Bastinos A, Skjæveland MG, Solimando A, Taheriyan M et al (2016) RODI: benchmarking relational-to-ontology mapping generation quality. Semant Web 9(1):25–52
- 48. Pinkel C, Binnig C, Jimenez-Ruiz E, Kharlamov E, Nikolov A, Schwarte A, Heupel C, Kraska T (2017) IncMap: a journey towards ontology-based data integration. In: Mitschang B, Nicklas D, Leymann F, Schöning H, Herschel M, Teubner J, Härder T, Kopp O, Wieland M (eds) Datenbanksysteme für Business, Technologie und Web (BTW 2017). Gesellschaft für Informatik, Bonn
- 49. Prudhomme C, Homburg T, Ponciano JJ, Boochs F, Roxin A, Cruz C (2017) Automatic integration of spatial data into the semantic web. In: WebIST 2017
- 50. Prud’hommeaux E, Seaborne A, et al (2008) SPARQL query language for RDF. W3C Recommendation. https://www.w3.org/2001/sw/DataAccess/rq23/
- 51. Rahm E, Bernstein PA (2001) A survey of approaches to automatic schema matching. VLDB J 10(4):334–350
- 52. Repici J (2010) The comma separated value (CSV) file format. Creativyst Inc, San Carlos
- 53. Resnik P (1995) Using information content to evaluate semantic similarity in a taxonomy. arXiv:cmp-lg/9511007
- 54. Rijgersberg H, van Assem M, Top J (2013) Ontology of units of measure and related concepts. Semant Web 4(1):3–13
- 55. Scharffe F, Atemezing G, Troncy R, Gandon F, Villata S, Bucher B, Hamdi F, Bihanic L, Képéklian G, Cotton F, et al (2012) Enabling linked-data publication with the Datalift platform. In: Proceedings of AAAI workshop on semantic cities
- 56. Schwering A (2008) Approaches to semantic similarity measurement for geo-spatial data: a survey. Trans GIS 12(1):5–29
- 57. Shvaiko P, Euzenat J (2013) Ontology matching: state of the art and future challenges. IEEE Trans Knowl Data Eng 25(1):158–176
- 58. Stadler C, Unbehauen J, Lehmann J, Auer S (2013) Connecting crowdsourced spatial information to the data web with Sparqlify. Technical report, University of Leipzig
- 59. Svennerberg G (2010) Beginning Google Maps API 3. Apress
- 60. Tarasowa D, Lange C, Auer S (2015) Measuring the quality of relational-to-RDF mappings. In: International conference on knowledge engineering and the semantic web, Springer, pp 210–224
- 61. van Rees E (2013) Open Geospatial Consortium (OGC). Geoinformatics 16(8):28
- 62. Veltkamp RC (2001) Shape matching: similarity measures and algorithms. In: SMI 2001 international conference on shape modeling and applications, IEEE, pp 188–197
- 63. Vertan C, Wozu O (2007) Web ontology language (OWL). W3C Recommendation. https://www.w3.org/TR/owl-features/
- 64. Volz J, Bizer C, Gaedke M, Kobilarov G (2009) Silk: a link discovery framework for the web of data. In: LDOW, vol 538
- 65. Vrandečić D, Krötzsch M (2014) Wikidata: a free collaborative knowledgebase. Commun ACM 57(10):78–85
- 66. Vretanos PA (2005) Web feature service implementation specification. Open Geospatial Consort Specif 1325:04–094
- 67. Wick M, Vatant B, Christophe B (2015) GeoNames ontology. http://www.geonames.org/ontology/documentation.html
- 68. Zaveri A, Rula A, Maurino A, Pietrobon R, Lehmann J, Auer S (2016) Quality assessment for linked data: a survey. Semant Web 7(1):63–93