R2LD: Schema-based Graph Mapping of relational databases to Linked Open Data for multimedia resources data
The Web of Data, which is used to share and interchange heterogeneous data on the Web, has been actively established. Ontology-based Linked Open Data (LOD), which allows computers to understand and process data semantics, has emerged to extend the current Web of Documents. LOD is important for modeling multimedia resources since it provides an efficient way to describe unstructured data resources. This paper proposes a novel and practical schema-based mapping method to populate Linked Open Data sets from relational databases for multimedia resources. The proposed schema-based mapping, R2LD, realizes seamless RDB-to-RDF mapping by taking advantage of the compatible conceptual schema. It can resolve complicated mapping issues in RDB-to-RDF, such as primary and foreign key relationships. The mapping description is straightforward and flexible: it defines mapping information in the form of attribute-value pairs. In particular, the proposed mapping method is suitable for unstructured multimedia resources. R2LD provides an efficient way to implement a SPARQL endpoint over an RDB while preserving the performance of SQL, which is vital to the dissemination of LOD.
Keywords: RDF, Linked Open Data, Direct mapping, SPARQL, SQL, Multimedia resources
1 Introduction
Research and development toward the Web of Data has advanced through the opening and sharing of heterogeneous data types, such as unstructured multimedia resources, on the Web. Linked Open Data (LOD) has emerged as a powerful enabler to extend the current Web of Documents to a Web of interlinked data and, ultimately, into the Semantic Web. Using domain ontologies, LOD can exploit machine-readable data from diverse data resources on the Web by means of web-based standards for encoding datasets and linking them to other published datasets. In the last decade, numerous best practices of LOD, including DBpedia, YAGO and CAMO, have been published by an increasing number of researchers, governments, public organizations and data providers, creating a global data space that interlinks billions of assertions: the Web of Linked Open Data [4, 7, 9]. LOD has thus evolved from a practical research idea into a very promising technology that can turn the Web into a platform for intelligent information systems with semantic search, query and reasoning capabilities. In order to accelerate the LOD paradigm on the Web, the publication of machine-readable LOD sets should take precedence [4, 21].
Since a vast amount of useful data is still stored in relational databases (RDB), one of the most efficient ways to populate LOD sets is to map the data in relational databases into RDF, which is the standard data model of LOD. Owing to the importance of mapping RDB to RDF (RDB2RDF), a variety of mapping approaches have been proposed [11, 16, 18]. The most important step towards RDB2RDF is the pair of standard recommendations by the W3C RDB2RDF Working Group: Direct Mapping and the R2RML mapping language [1, 12, 14].
With these remarkable achievements for RDB2RDF, it was expected that the seamless integration of RDB data with RDF datasets towards the Web of Data could be easily realized. In practice, encouraged by the standard mapping language R2RML, many RDB2RDF systems adopting this common language have been developed in various areas. However, in spite of earnest efforts to adopt this common mapping language, the publication of RDB data on the Web in machine-readable RDF format has not yielded the significant results that were expected. Although direct mapping and R2RML seem to be inevitable for RDB2RDF, this approach has some limitations [11, 13]. The complex mapping structures of R2RML, written in Turtle, hinder its practical application. Because of these elaborate structures, fully-fledged R2RML processors are hard to find. R2RML also does not address some common questions that arise in RDB2RDF, such as how to implement the translation process and how to access the mapped RDF datasets with SPARQL. The translation of SPARQL queries into equivalent SQL queries is one of the crucial issues in RDB2RDF [5, 8].
This paper proposes a novel and practical method based on schema translation for RDB2RDF. Since the conceptual schema of an RDB is similar to the ontological modeling of a certain domain, mapping the RDB schema into RDF Schema is more effective than the conventional instance-based approaches. We describe how to resolve the intrinsic differences between RDB and RDF at the schema level. In addition, we show how to map the operational differences, such as graph pattern matching in RDF and JOIN operations in RDB. To extend this approach, we present a new schema-level mapping method and an effective way to implement a SPARQL endpoint in an RDB.
The rest of the paper is organized as follows. Section 2 reviews related work on RDB-to-RDF mapping approaches, especially the direct mapping method, and analyses the principles of the mapping approaches. Section 3 presents a detailed description of the proposed schema-based mapping: the underlying concepts, the mapping method and the mapping description. Section 4 deals with the schema-based mapping system architecture and the implementation of a SPARQL endpoint in an RDB, with some typical mapping examples. Section 5 concludes the paper and discusses possible future work.
2 Related work
Since vast amounts of information are stored in RDBs, RDB2RDF has been a crucial research topic for LOD applications, both for publishing RDB data on the Web and for integrating data from different RDBs. A myriad of approaches, techniques, and corresponding tools for RDB2RDF have been proposed over the last decade. Alongside these research efforts, several studies have compared the approaches and techniques from diverse perspectives. The motivations, underlying principles, specifications, capabilities, and categorizations of RDB2RDF are covered in these comprehensive surveys of the proposed approaches [3, 8, 11, 16, 17, 18].
The W3C RDB2RDF Working Group has proposed two standards for RDB2RDF: Direct Mapping and R2RML (Relational Database to RDF Mapping Language) [1, 12, 14]. Direct Mapping is the recommended approach to directly translate RDB data and its schema into an RDF representation. R2RML is a generic language for describing a set of customized mapping rules that transform RDB data into RDF datasets. These W3C publications have established a new, normalized discipline for standardized RDB2RDF mapping and the development of compliant tools. Since the emergence of R2RML, direct mapping has become the dominant approach to RDB2RDF. Its basic mapping principles are as follows:
table-to-class: a table is translated into an ontological class identified by a URI.
row-to-resource: each row or tuple of a table is translated into a resource that has triple structure in RDF model.
column-to-predicate: each column of a table is translated into predicate in RDF triple, representing an ontological property.
primary key-to-subject: the primary key used as an identifier of the table is translated into subject with URI in RDF triple.
cell-to-literal value: each cell with a literal value is translated into object in RDF triple, representing a data property.
cell-to-resource with URI: each cell with a foreign key constraint is translated into a resource with URI in RDF triple, representing an object property.
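The six principles above can be sketched in a few lines of Python. This is a minimal illustration and not the W3C algorithm itself: the IRI layout, the base IRI http://example.org/ and the sample row values are assumptions made for the example, while the ARTIST and PUBLISHER identifiers are borrowed from the tables discussed in Section 3.

```python
def direct_map_row(base, table, pk, fks, row):
    """Translate one row into (subject, predicate, object) triples.

    base: base IRI (assumed), pk: primary key column name,
    fks: {column: (referenced_table, referenced_pk)}.
    """
    # primary key-to-subject and row-to-resource
    subject = f"<{base}{table}/{pk}={row[pk]}>"
    # table-to-class
    triples = [(subject, "rdf:type", f"<{base}{table}>")]
    for column, value in row.items():
        if value is None:
            continue  # a NULL cell yields no triple
        # column-to-predicate
        predicate = f"<{base}{table}#{column}>"
        if column in fks:
            # cell-to-resource with URI (foreign key)
            ref_table, ref_pk = fks[column]
            obj = f"<{base}{ref_table}/{ref_pk}={value}>"
        else:
            # cell-to-literal value
            obj = f'"{value}"'
        triples.append((subject, predicate, obj))
    return triples


# Hypothetical sample row; only the table and column names come from the paper.
triples = direct_map_row(
    base="http://example.org/",
    table="ARTIST",
    pk="ID",
    fks={"DID": ("PUBLISHER", "ID")},
    row={"ID": 7, "Name": "Kim", "DID": 3},
)
for triple in triples:
    print(*triple)
```

Note that every row produces its own set of triples, so the size of the generated RDF data grows with the number of instances in the database.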
3 R2LD: Schema-based Graph Mapping of multimedia RDB to LOD
This section describes the concepts and techniques of schema-based mapping for RDB2RDF. After reviewing the mapping requirements, we elaborate on the details of the schema-based mapping, called R2LD, and explain how to describe the mapping relations.
3.1 Requirements for RDB2RDF
The instances of LOD are based on RDF, the standard framework for expressing and interchanging information about resources on the Web. Resources can be anything, including documents, people, physical objects, multimedia and abstract concepts. In RDF, resources are represented in the form of subject–predicate–object expressions, known as triples. The subject denotes the resource, and the predicate denotes traits or aspects of the resource, and expresses a relationship between the subject and the object.
The local vocabularies used in an RDB should be translatable into equivalent common ontology vocabularies, since the main objective of RDB2RDF is to expose and publish RDB data on the Web. This means that the mapping method should provide an efficient way to redefine the column names of RDB tables as ontology vocabularies used in LOD, according to their semantics.
The mapped RDB resources should be identified by URIs of LOD. Dereferencing an HTTP URI for RDF data should return appropriate information. This is mandatory for the mapped RDB data to qualify as LOD.
The mapping method should provide not only the translation of RDB tables into RDF graph patterns, but also the transformation of SPARQL queries into SQL queries. When the RDB data are not duplicated as LOD sets, the mapping description plays the role of an arbitrator between RDB and RDF.
The mapping method should be able to handle primary and foreign key constraints and M:N relationships. Since these are the principal mechanisms for joining tables in an RDB, they play an essential role in constructing the RDF graph model.
Some tables produced by normalization are used only to connect other tables. The mapping method should provide a way to process these normalized tables, since they do not represent the conceptual schema. In addition, even tables composed of conceptual attributes should be decomposable when they are mapped into an RDF graph, in order to maintain a consistent data model.
As most requirement analyses have noted, the above requirements are mandatory for RDB2RDF; however, it is easy to neglect these indispensable functionalities. In particular, many RDB2RDF mapping approaches have had difficulty in handling normalized tables, foreign key constraints and SQL query generation. This paper proposes a more efficient, novel approach to deal with these issues.
3.2 Schema-based Graph Mapping approach
The conceptual schema of a relational database is usually represented with an entity-relationship diagram (ERD), which is almost identical to the RDF data model. Accordingly, the RDB2RDF mapping approach should be based on the RDB schema expressed in the ERD. This provides a consistent way of mapping and makes it possible to preserve the information and conceptual structures of the RDB during the mapping process.
An entity in an ERD is a physical or logical object that can be uniquely identified in a domain, which is the conceptual element corresponding to a class in RDF Schema. Since the RDF triples that become LOD instances are generated by means of the classes and properties of RDF Schema, the RDB schema defined by the ERD should be the starting point of RDB2RDF. This schema-based mapping provides a notable separation of concerns between schema and instance. By focusing on the conceptual schema, the mapping method can realize a seamless translation of RDB data to LOD instances at the schema level. The mapping description also becomes more succinct, since it need not specify a mapping definition for each instance as direct mapping does. Above all, schema-based mapping is expected to satisfy all the requirements for RDB2RDF.
3.2.1 Mapping RDB table schema to RDF graph
database-to-namespace: the database name is mapped into the namespace of RDF. Since the database corresponds in some sense to the domain, it defines the domain vocabularies for tables and columns.
table-to-subject: the table name is mapped into the subject of the RDF data model. While the subject of a triple in the RDF model usually denotes a specific resource with a URI, the subject in schema-based mapping plays the role of an organizer that composes the instances.
column-to-predicate: the column as a property is mapped into the predicate of RDF data model.
table.column-to-object: the cell value described in table.column is mapped into the object of RDF data model.
row-to-instance: each row of the table corresponds to RDF triple.
A table in itself as a resource is mapped into a subject with namespace and can be additionally associated with the common ontology vocabularies, such as rdf:type and rdfs:subClassOf. The table column Ci corresponding to the predicate of RDF data can be translated into the well-known vocabularies, such as DC, FOAF, CAMO, and vocabularies in schema.org. In this manner, the schema-based mapping can realize semantic interoperability of the mapped RDF data by using a simple, effective mapping description. The object value of the predicate is described in the value term, TableName.Ci. The value term notation makes it possible to translate SPARQL query into the equivalent SQL query.
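The schema-level rules above can be illustrated with a short Python sketch. It is only an assumed rendering of the idea: the namespace prefix media, the class foaf:Person, the column Genre and the chosen vocabulary terms are hypothetical, whereas the value term notation TableName.Ci follows the text.

```python
def schema_map_table(namespace, table, column_vocab, class_iri=None):
    """Return schema-level triples (subject, predicate, value term) for one table.

    column_vocab maps each column name to an ontology vocabulary term.
    The object is not a cell value but the value term "TABLE.Column",
    which a later SPARQL-to-SQL step can copy into a SQL query.
    """
    # database-to-namespace and table-to-subject
    subject = f"{namespace}:{table}"
    triples = []
    if class_iri is not None:
        # optional association with a common ontology class
        triples.append((subject, "rdf:type", class_iri))
    for column, predicate in column_vocab.items():
        # column-to-predicate and table.column-to-object
        triples.append((subject, predicate, f"{table}.{column}"))
    return triples


# Hypothetical mapping for the ARTIST table.
triples = schema_map_table(
    namespace="media",
    table="ARTIST",
    column_vocab={"Name": "foaf:name", "Genre": "dc:type"},
    class_iri="foaf:Person",
)
for triple in triples:
    print(*triple)
```

In this sketch the number of triples depends only on the number of columns, not on the number of rows, which reflects the separation between schema and instance described above.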
The subject of an RDF triple usually denotes a specific resource. In schema-based mapping, however, the subject acts as a virtual subject that composes the triple. In practical applications of LOD, since an instance can be identified by its properties, there is no need to use a superfluous identifier as the subject. Rather, the subject in schema-based mapping can represent the class type of the instance.
Schema-based mapping is more succinct and effective than instance-based direct mapping. It can satisfy the theoretical requirements of RDB2RDF while preserving the conceptual schema completely. The mapping approach need not be based on instance modeling, since the instances are stored in the RDB and can be accessed at any time by SQL queries.
3.2.2 Types of the predicate
In a relational database, primary keys and foreign keys are used to establish the relationships between tables. Sometimes these key attributes are independent of the conceptual schema and are required only to connect tables. Moreover, the relationships between tables are not explicitly specified in the tables; they are represented only in the conceptual schema. Since relationships are denoted by predicates in the RDF data model, the relationships implied by the key attributes should be explicitly redefined. There are two types of predicate according to function: the attribute predicate, which maps a column to a value term, and the link predicate.
The link predicate originates from the relationships by the primary key and the foreign key. There are two types of link predicates depending on the target table: internal link (i-link) predicate for the recursive relationship and external link (e-link) predicate for the different tables. The object value of the link predicate has the relation expressions that represent the link equation, TableName.Ci = TableName.Cj. In Fig. 4, for example, the column ARTIST.CoWorker in the table ARTIST becomes the i-link predicate and has the value, ARTIST.CoWorker = ARTIST.ID, as the object, while the column ARTIST.DID becomes e-link predicate and has the value, ARTIST.DID = PUBLISHER.ID, as the object. The link predicate is represented with the ontology vocabulary in the RDF graph model.
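A link predicate and its link equation could be represented as in the following sketch. The class name LinkPredicate and the vocabulary terms ex:coWorker and ex:publisher are hypothetical; the two link equations are the ones quoted in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkPredicate:
    name: str   # ontology vocabulary for the link (hypothetical terms below)
    left: str   # referencing value term, "TABLE.Column"
    right: str  # referenced value term, "TABLE.Column"

    @property
    def kind(self):
        """i-link if both sides belong to the same table, otherwise e-link."""
        same_table = self.left.split(".")[0] == self.right.split(".")[0]
        return "i-link" if same_table else "e-link"

    @property
    def equation(self):
        """The relation expression used as the object value of the predicate."""
        return f"{self.left} = {self.right}"


co_worker = LinkPredicate("ex:coWorker", "ARTIST.CoWorker", "ARTIST.ID")
publisher = LinkPredicate("ex:publisher", "ARTIST.DID", "PUBLISHER.ID")

for link in (co_worker, publisher):
    print(link.name, link.kind, link.equation)
```

Keeping the equation as plain text is a deliberate simplification here: it can be pasted into a SQL WHERE clause without further processing.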
The object value term or relation expression of a predicate can solve difficult problems in RDB2RDF, such as M:N or foreign key relationships. The SQL query that accesses the instance data is compiled from the SPARQL variables together with the object value terms or relation expressions. Figure 4 is a typical example of RDB tables containing the characteristic features of an RDB that should be considered in RDB2RDF. Although the AP table is actually a normalized table, its own attributes, Role and Date, are added to show that schema-based mapping can handle the complexity of relationships in an RDB. The column CoWorker in the ARTIST table is a recursive relation.
Note that a new relation between tables, such as accomplish, can be added without difficulty in the RDB2RDF mapping description. This enables a more complete conceptualization than a simple mapping of RDB to RDF, and allows a new model appropriate for LOD to be generated.
3.3 Mapping description of R2LD
The mapping of the attribute predicate shown in Fig. 6a is straightforward. Several RDB vocabularies, such as ARTIST.Name, PUBLISHER.Name and PRODUCT.Name, can be mapped into the same vocabulary. The correct mapping is then determined from the other predicates while the RDF graph is processed. In addition, common ontologies such as FOAF and GeoNames can be easily imported without any restrictions. This makes it possible to redesign the RDB schema and generate a more appropriate RDF graph model.
The prominent feature of schema-based mapping is the use of join expressions for link predicates, as shown in Fig. 6b, c. Since the relations between resources are implicitly implemented with key types in RDB tables, it is reasonable to explicitly map the link predicates to join expressions. The join expression can resolve diverse difficult problems that arise in RDB2RDF mapping. Any complex table relationships, whether normalized or partitioned for database management, can be accommodated with join expressions in a simple and efficient manner.
The mapping description consists of simple one-to-one correspondences between the vocabularies of RDB and RDF. The many-to-one mapping seen in the attribute predicate can also be regarded as one-to-one, since the ambiguity can be easily resolved while the RDF graph is processed. The mapping description is also flexible enough that useful ontology vocabularies can be easily added to realize a more capable LOD model. The simplicity of the mapping description also makes implementation and application easy.
4 R2LD and SPARQL query processing of multimedia resources
To verify the effectiveness of schema-based mapping, this section describes an implementation of a SPARQL endpoint in an RDB. Schema-based mapping can duplicate an RDB into LOD as direct mapping does. Moreover, the translation of SPARQL to SQL is also straightforward. This section explains SPARQL-to-SQL translation under schema-based mapping.
4.1 Schema-based SPARQL endpoint architecture
The SPARQL endpoint that publishes RDB data as LOD is implemented as a simple interface to the RDB. In conventional mapping approaches, such as direct mapping, robustness and performance have been serious problems. Schema-based mapping, however, achieves robustness through conceptual schema mapping and performance through SQL queries over the RDB data. It can implement a SPARQL endpoint as a simple add-on interface without duplicating the RDB.
4.2 SPARQL query mapping
In the schema-based mapping R2LD, the primary resource for writing SPARQL queries is the mapped RDF graph, which represents the conceptual schema of the RDB. The mapped RDF graph promotes conceptual thinking and therefore provides a more efficient way to write SPARQL queries. Since nearly identical conceptual schemas are used in both SPARQL and SQL, SPARQL graph patterns are efficiently translated into SQL queries. The effectiveness of query processing in schema-based mapping is shown below through typical use cases of SPARQL queries.
4.2.1 Relationships between two tables with foreign keys
Although many solutions have been proposed, primary and foreign key issues remain cumbersome obstacles in RDB2RDF mapping. Some proposed mapping descriptions are also too complicated to apply to real operational databases. In the schema-based mapping R2LD, however, the foreign key relationships are explicitly represented in the mapped RDF graph, and the detailed mapping methods are specified as join expressions in the mapping description. This provides an efficient way to resolve the implicit foreign key relationships in an RDB.
For example, in Fig. 9, the subject ?x is related to the tables ARTIST and CONTACT by means of its predicates, and the predicate name is resolved to ARTIST.Name. Since the join expression that combines ARTIST and CONTACT is obtained from the mapping description in Fig. 6c, the SPARQL query can be easily translated into a SQL query.
The join expression in schema-based mapping is an efficient way to solve the diverse problems that arise in RDB2RDF. All complicated mapping problems can be accommodated in the mapping description. This grants more flexibility than a rigid mapping that simply duplicates the RDB.
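The translation step described above might look like the following sketch. Because Fig. 9 and Fig. 6c are not reproduced here, the CONTACT columns Email and AID, the join expression ARTIST.ID = CONTACT.AID and the vocabulary terms are assumptions; the function also assumes that all triple patterns share one subject variable.

```python
# Hypothetical mapping description: attribute predicates map to value terms,
# and each pair of linked tables maps to a join expression.
ATTRIBUTES = {
    "foaf:name": "ARTIST.Name",
    "foaf:mbox": "CONTACT.Email",
}
JOINS = {
    frozenset({"ARTIST", "CONTACT"}): "ARTIST.ID = CONTACT.AID",
}


def translate(patterns):
    """Translate triple patterns (subject_var, predicate, object_var) into SQL."""
    select, tables = [], []
    for _subject, predicate, obj in patterns:
        term = ATTRIBUTES[predicate]               # value term, e.g. ARTIST.Name
        select.append(f"{term} AS {obj.lstrip('?')}")
        table = term.split(".")[0]
        if table not in tables:
            tables.append(table)
    # Add every join expression whose tables all occur in the query.
    where = [expr for pair, expr in JOINS.items() if pair <= set(tables)]
    sql = f"SELECT {', '.join(select)} FROM {', '.join(tables)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


# SPARQL pattern: ?x foaf:name ?name . ?x foaf:mbox ?mail .
sql = translate([("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mail")])
print(sql)
# SELECT ARTIST.Name AS name, CONTACT.Email AS mail FROM ARTIST, CONTACT WHERE ARTIST.ID = CONTACT.AID
```

The value terms go into the SELECT list and the join expression goes into the WHERE clause, so no instance data has to be converted to RDF beforehand.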
4.2.2 Recursive relationships
Recursive relationships are easily implemented in both the RDB and the RDF graph model. However, mapping recursive relationships is another matter in RDB2RDF. Although some mapping approaches suggest plausible solutions, these methods are often inefficient and impractical. Schema-based mapping can easily deal with this cumbersome problem by means of the join expression.
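As an illustration, the i-link equation ARTIST.CoWorker = ARTIST.ID from Section 3.2.2 can be turned into a self-join by giving the table two aliases. The sketch below runs the resulting SQL on an in-memory SQLite database; the helper name self_join_sql, the aliases x and y, and the sample rows are assumptions made for the example.

```python
import sqlite3


def self_join_sql(table, link_column, key_column, column):
    """Build SQL for the pattern: ?x name ?n1 . ?x coWorker ?y . ?y name ?n2 ."""
    return (
        f"SELECT x.{column}, y.{column} "
        f"FROM {table} AS x, {table} AS y "
        f"WHERE x.{link_column} = y.{key_column}"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ARTIST (ID INTEGER PRIMARY KEY, Name TEXT, CoWorker INTEGER)"
)
# Hypothetical rows: Ann works with Bob, Bob has no co-worker recorded.
conn.executemany(
    "INSERT INTO ARTIST VALUES (?, ?, ?)",
    [(1, "Ann", 2), (2, "Bob", None)],
)

sql = self_join_sql("ARTIST", "CoWorker", "ID", "Name")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

Each SPARQL variable bound to the recursive table receives its own alias, and the link equation is rewritten with those aliases.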
4.2.3 Multi-tables relationships
Even though a SPARQL query may use the same predicate, such as name, for both ?x and ?y, this ambiguity can be resolved with the mapping information of the other predicates. Schema-based mapping is thus flexible in its use of ontology vocabularies.
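One possible way to resolve such an ambiguity is to intersect the candidate tables of all predicates attached to the same variable. The sketch below assumes this strategy; the predicate names and the column PRODUCT.Price are hypothetical, while the three Name columns are the ones mentioned in Section 3.3.

```python
# Hypothetical mapping: one predicate may correspond to several value terms.
CANDIDATES = {
    "name": {"ARTIST.Name", "PUBLISHER.Name", "PRODUCT.Name"},
    "coWorker": {"ARTIST.CoWorker"},
    "price": {"PRODUCT.Price"},
}


def resolve_tables(patterns):
    """Map each subject variable to the tables consistent with all its predicates."""
    tables = {}
    for subject, predicate in patterns:
        candidate_tables = {term.split(".")[0] for term in CANDIDATES[predicate]}
        # Keep only the tables that every predicate of this subject allows.
        tables[subject] = tables.get(subject, candidate_tables) & candidate_tables
    return tables


resolved = resolve_tables([
    ("?x", "name"), ("?x", "coWorker"),
    ("?y", "name"), ("?y", "price"),
])
print(resolved)
```

Here the predicate name alone matches three tables, but the second predicate of each variable narrows the choice to a single table.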
4.2.4 Multi-tables and M:N relationships
In schema-based mapping, any additional relationship can be inserted as a link predicate, such as accomplish in Fig. 5. Such additional relationships, which are independent of the RDB schema, can yield a more reasonable RDF model.
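A link predicate such as accomplish can be thought of as one join expression that chains the key relationships through the AP table. The following sketch runs such an expression on an in-memory SQLite database. The key columns AP.AID and AP.PID, the expression itself and all sample rows are assumptions; only the table names and the AP attributes Role and Date come from the text.

```python
import sqlite3

# Hypothetical mapping entry: the link predicate "accomplish" condenses the
# M:N relationship between ARTIST and PRODUCT into a single join expression.
ACCOMPLISH = "ARTIST.ID = AP.AID AND AP.PID = PRODUCT.ID"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ARTIST (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PRODUCT (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE AP (AID INTEGER, PID INTEGER, Role TEXT, Date TEXT);
INSERT INTO ARTIST VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO PRODUCT VALUES (10, 'Album A'), (11, 'Album B');
INSERT INTO AP VALUES
    (1, 10, 'singer', '2017-01-01'),
    (2, 10, 'composer', '2017-01-01'),
    (1, 11, 'singer', '2018-05-05');
""")

# SPARQL pattern: ?x name ?n . ?x accomplish ?y . ?y name ?p (plus the AP role).
sql = (
    "SELECT ARTIST.Name, PRODUCT.Name, AP.Role "
    "FROM ARTIST, AP, PRODUCT "
    f"WHERE {ACCOMPLISH} "
    "ORDER BY ARTIST.Name, PRODUCT.Name"
)
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The query author sees a single predicate between artist and product, while the normalized AP table stays hidden in the mapping description and its own attributes remain accessible.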
4.2.5 Joined and merged-tables with M:N relationships
Even a complicated SPARQL query can be easily translated into the corresponding SQL query in schema-based mapping. The example shown in Fig. 12 is conceptually intuitive and suitable for LOD. However, this query involves very complex relationships among the tables of the RDB.
The variable ?a is related to the table CONTACT; however, this table is implicitly joined and merged with ARTIST. The variable ?b is related to the table AP, which originally mediates the primary and foreign key relationships introduced by normalization but also has its own attributes. In addition, the link predicate accomplish related to the variable ?y, as seen in the previous example, is a condensed relation that merges several relations in the RDB.
Schema-based mapping supports building more natural SPARQL queries suited to LOD applications and provides an efficient way of mapping them to SQL queries. This makes it possible to implement a SPARQL endpoint in an RDB, which is vital for publishing RDB data as LOD sets.
5 Conclusion
Research and development toward the realization of the Web of Data has advanced through the opening and sharing of heterogeneous data types, such as unstructured multimedia resources, on the Web. Ontology-based LOD, which allows computers to understand and process data semantics, has been proposed as a standard data model. In this model, LOD becomes the enabler of the Web of Data by providing a standard data model for both structured and unstructured information resources of the Web, together with a shared semantic representation.
High-quality data sets should be provided to realize diverse intelligent services on the Web. The conventional approaches to developing LOD sets, such as ontology-based LOD generation and the translation of RDB to LOD, suffer from complexity and difficulty, and from problems in capturing domain peculiarities and in practical adaptability. In addition, the development of information systems based on LOD has made very little progress due to the lack of appropriate, specialized methodologies and tools for the instances of LOD sets.
The most effective and practical way to populate LOD sets is to publish the data stored in RDBs on the Web in the standard form of RDF. Many studies on RDB-to-RDF mapping have been conducted to realize the Web of Data. Two standards have been proposed as important achievements by the W3C: Direct Mapping and the R2RML mapping language. However, a practical RDB-to-RDF mapping approach is still an open question.
This paper proposes a novel and practical RDB2RDF mapping method suitable for LOD at the conceptual schema level. Since the conceptual schema of an RDB is similar to the ontological domain modeling of RDF, the proposed schema-based mapping R2LD can achieve a more coherent mapping than the conventional direct mapping approaches by resolving the structural and operational differences, such as graph pattern matching and JOIN operations. The mapping description is straightforward on account of the compatible conceptual structures and can accommodate complex relationships in an effective manner. In addition, the implementation of schema-based mapping is simple and intuitive, as seen in several typical examples. The schema-based mapping R2LD therefore provides an efficient way to implement a SPARQL endpoint in an RDB, which is vital to the dissemination of LOD.
This paper was supported by Wonkwang University in 2017.
- 1. A Direct Mapping of Relational Data to RDF. W3C Recommendation 27 September 2012. https://www.w3.org/TR/rdb-direct-mapping. Accessed 12 Oct 2018
- 2. A Direct Mapping of Relational Data to RDF. W3C Working Draft 29 May 2012. https://www.w3.org/TR/rdb-direct-mapping/. Accessed 12 Oct 2018
- 3. A Survey of Current Approaches for Mapping of Relational Databases to RDF. W3C RDB2RDF Incubator Group report 1:113–130. https://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf. Accessed 12 Oct 2018
- 6. Erling O (2008) Requirements for Relational to RDF Mapping. https://www.w3.org/wiki/Rdb2RdfXG/ReqForMappingByOErling. Accessed 12 Oct 2018
- 8. Hert M, Reif G, Gall HC (2011) A Comparison of RDB-to-RDF Mapping Languages. Proceedings of the 7th International Conference on Semantic Systems, ACM, New York, pp 25–32. https://doi.org/10.1145/2063518.2063522
- 9. Hu W, Jia C, Wan L et al (2014) CAMO: Integration of Linked Open Data for Multimedia Metadata Enrichment. ISWC Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-319-11964-9_1
- 10. Lee TB (1998) Relational Databases on the Semantic Web. www.w3.org/DesignIssues/RDB-RDF.html. Accessed 12 Oct 2018
- 11. Michel F, Montagnat J, Zucker CF (2014) A Survey of RDB to RDF Translation Approaches and Tools. Dissertation, I3S. <hal-00903568v2>
- 12. R2RML and Direct Mapping Test Cases. W3C Working Group Note 14 August 2012. https://www.w3.org/TR/rdb2rdf-test-cases. Accessed 12 Oct 2018
- 13. R2RML and Direct Mapping Test Cases. W3C Editor's Draft 24 July 2012. http://www.w3.org/2001/sw/rdb2rdf/test-cases. Accessed 12 Oct 2018
- 14. R2RML: RDB to RDF Mapping Language. W3C Recommendation 27 September 2012. https://www.w3.org/TR/r2rml. Accessed 12 Oct 2018
- 15. RDF 1.1 Primer. W3C Working Group Note 25 February 2014. https://www.w3.org/TR/rdf11-primer/. Accessed 12 Oct 2018
- 16. Sahoo SS, Halb W, Hellmann S et al (2009) A Survey of Current Approaches for Mapping of Relational Databases to RDF. W3C RDB2RDF Incubator Group report 1:113–130. https://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf. Accessed 12 Oct 2018
- 17. Sequeda J, Priyatna F, Villazón-Terrazas B (2012) Relational Database to RDF Mapping Patterns. Proceedings of the 3rd International Conference on Ontology Patterns (WOP'12). pp 97–108. http://ceur-ws.org/Vol-929/paper9.pdf. Accessed 02 Feb 2019
- 19. SPARQL 1.1 Query Language. W3C Recommendation 21 March 2013. https://www.w3.org/TR/sparql11-query. Accessed 12 Oct 2018
- 20. Use Cases and Requirements for Mapping Relational Databases to RDF. W3C Working Draft 8 June 2010. https://www.w3.org/TR/rdb2rdf-ucr/. Accessed 12 Oct 2018
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.