
1 Introduction

Semantic Web technologies rely on data which is interlinked and whose semantically enriched representation is available, the so-called Linked Data [1]. Most of the current Linked Data stems originally from (semi-)structured formats. Mappings specify in a declarative way how Linked Data is generated from such raw data. Nevertheless, defining and executing them remains complicated, despite the significant number of tools implemented for this purpose. At first, most approaches that map raw data to its RDF representation [2] incorporated the mappings in the implementation that executes them. Thus, not only is knowledge of Semantic Web and Linked Data required, but also dedicated software development cycles for creating, updating and extending the implementations whenever new or updated semantic annotations are desired. Mapping languages, such as R2RML [3] and RML [4], make it possible to detach the mapping definitions from the implementation that executes them. However, besides the knowledge of the underlying mapping language that is required to define the mappings, manually editing and curating them requires a substantial amount of human effort [5]. Moreover, data specialists are usually not Semantic Web experts or developers. Thus, the task of editing mappings should be addressed independently, disassociated from the corresponding mapping language and/or the underlying technology used. Facilitating the editing of mappings further lowers the barriers to obtaining Linked Data and thus stimulates the adoption of Semantic Web technologies.

Nevertheless, dedicated environments that support users in intuitively editing mappings have not been thoroughly investigated yet, in contrast to applications that actually execute mappings and deliver the RDF representation. Step-by-step wizards prevailed, e.g., the fluidOps editor [6], as an easy-to-reach solution. However, such applications restrict data publishers' editing options, hamper altering parameters in previous steps, and detach mapping definitions from the overall knowledge modeling, since related information is separated across different steps.

We propose the RMLEditorFootnote 1, an editing environment for specifying mappings of raw data to their RDF representation based on graph visualizations, without requiring knowledge of the underlying mapping language. The RMLEditor is developed to support partners of (i) a companies-driven pilot for sharing and integrating the RDF representations of their data and co-developing third-party applications, and of (ii) a community-group-driven bootstrap for showcasing the advantages of Linked Data and Semantic Web technologies. The tool is available to interested parties under custom licensing conditions. We performed an exploratory user validation of our proposed solution, which showed that the RMLEditor achieves the goal it was implemented for: 82 % of the participants found the use of graphs beneficial for editing mappings with the RMLEditor, and 70 % could better conceive that a relationship exists between multiple data sources.

The remainder of the paper is structured as follows: Sect. 2 outlines existing mapping languages and mapping editors. Section 3 describes the mapping process with and without the use of a mapping editor. Section 4 presents our proposed solution, the RMLEditor. Section 5 outlines the use cases, explains the exploratory user validation and presents the results. Last, Sect. 6 discusses the results and presents the conclusions.

2 Related Work

In this section, we discuss existing mapping languages. Moreover, we elaborate on existing mapping editors, distinguishing between editors that support homogeneous data sources and those that support heterogeneous ones.

2.1 Mapping Languages

Mapping languages specify in a declarative way how Linked Data is generated from raw data. At first, existing formalizations, such as XPath [7] or the XQuery language [8], were considered as mapping languages. Nevertheless, languages were also defined specifically for this task. R2RML [3] is the W3C-recommended language to define mappings that generate RDF from data in relational databases. Besides R2RML, other format-specific languages were defined, e.g., for XML dataFootnote 2. There are also query-oriented languages, such as XSPARQL [9], which combines XQuery and SPARQL to map XML data, and TarqlFootnote 3 for data in CSV. However, these languages only support homogeneous data sources. A number of tools were developed that support mappings from heterogeneous data sources to RDF, such as DataliftFootnote 4, RDFizersFootnote 5 and Virtuoso SpongerFootnote 6. However, those tools actually employ separate source-centric approaches for each format they support, which does not allow interlinking sources in different formats. The RDF Mapping Language (RML) [4] circumvents this by enabling the generation of RDF based on multiple heterogeneous data sources, e.g., XML and JSON.
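To illustrate, the following is a minimal sketch of an RML mapping for a JSON source; the file name, JSONPath expressions and URI template are hypothetical, and a second TriplesMap over, e.g., an XML source could be defined analogously and interlinked with this one.

    @prefix rr:   <http://www.w3.org/ns/r2rml#> .
    @prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
    @prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    # Hypothetical mapping of a JSON file to RDF.
    <#PersonMapping>
      rml:logicalSource [
        rml:source "people.json" ;              # assumed input file
        rml:referenceFormulation ql:JSONPath ;  # JSON is iterated via JSONPath
        rml:iterator "$.people[*]"
      ] ;
      rr:subjectMap [
        rr:template "http://example.com/person/{id}" ;
        rr:class foaf:Person
      ] ;
      rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rml:reference "name" ]
      ] .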

2.2 Mapping Editors

Despite the significant number of mapping languages, the number of corresponding editors that support users in defining mappings is considerably smaller. Similar to the mapping languages, a distinction can be made between tools supporting homogeneous data sources and tools supporting heterogeneous data sources.

Homogeneous Data Sources. The fluidOps editor [6] is a browser application that provides an intuitive user interface for editing mappings; its underlying mapping language is R2RML. The fluidOps editor relies on a single step-by-step workflow of six successive steps, which mirror the steps of manually creating a mapping document. Although its Graphical User Interface (GUI) aims to hide the R2RML vocabulary, it still strongly focuses on concepts and terminology introduced by R2RML (e.g., subject maps and object maps). Therefore, knowledge of the language is required to use the editor. This decreases its adoption by non-Semantic Web experts and lowers the GUI's reusability potential. Additionally, only once users reach the final step are they able to preview the mappings in R2RML syntax and identify possible inconsistencies. Consequently, they need to restart the workflow to update the definitions of the previous steps. Pinkel et al. [5] adapted the original fluidOps editor to overcome the flexibility limitations imposed by its database-driven step-by-step workflow. Their extension supports the ontology-driven approach. With the database-driven approach, creating the mappings starts with the data in the databases, and semantic annotations are added afterwards. With the ontology-driven approach, the mappings are created based on an existing ontology and subsequently complemented with data fractions from the databases.

Sheet2RDF [10] is a platform that uses a PEARL [11] document to map data in spreadsheets to RDF. Its GUI allows users to view the source data, define the mappings by editing the PEARL document directly, and view the resulting RDF in a tabular structure. However, the tool's adoption potential decreases because users need knowledge of PEARL to edit the mappings.

An alternative approach is proposed by Rodríguez-Muro et al. [12], who introduced ontopProFootnote 7, a plugin for Protégé [13]. It allows users to generate RDF based on data from databases and an ontology. Tabs are provided to manage the databases and the mappings. Users need to write SPARQL-like templates to define how to map the original data. However, data sources are limited to databases, and users need to understand the templates' custom syntax.

Heterogeneous Data Sources. KarmaFootnote 8 differentiates itself from the aforementioned tools because it supports heterogeneous sources, such as databases, delimited text files (e.g., CSV files), JSON, XML, Microsoft Excel and Web APIs. It uses Global-Local-As-View [14] rules to perform the mappings. These rules can be exported using R2RML or D2RQ [15]. When displaying the mappings to the users, Karma takes a data-centric approach: users can only view the input data. Users are not able to follow the ontology-driven approach, as is possible with the fluidOps editor. DataOps [16] follows the ontology-driven approach to support heterogeneous data formats. However, users are still confronted with the syntax of the used languages.

RDF123 [17] is similar to Sheet2RDF; however, it also supports CSV files, and it uses custom map graphs to represent the mappings, which are converted to a custom map function that produces RDF based on the input data. Consequently, users are not required to understand or know about the map function to define mappings, as is the case with PEARL for Sheet2RDF. Additionally, RDF123 offers a web service that takes a link to a Google Spreadsheet or a CSV file and generates RDF based on the mappings defined with the application. Therefore, the maps can be used outside the desktop application. However, their use is limited to that web service, as a custom map function is used instead of a mapping language.

TopBraid ComposerFootnote 9 supports heterogeneous sources. It allows data integration from databases, XML, UML, RSS, spreadsheets and RDF data backends. The data can be reconciled with DBpedia [18]. However, as in the case of Karma, interlinking data from different sources is not possible. To set up the mapping process, the GUI offers a data-driven step-by-step workflow; an ontology-driven approach is not possible. Similar to the original fluidOps editor, a simpler wizard has been built on top of RML instead of R2RML. This form-based browser applicationFootnote 10 supports heterogeneous data sources, because it has RML as its underlying mapping language, in contrast to the fluidOps editor, which only supports homogeneous sources (i.e., databases).

Last, OpenRefineFootnote 11 is a browser-based tool for cleansing raw data, changing the data format and incorporating external data using Web services. The RDF Refine extensionFootnote 12 [19] allows users to export the data as RDF. An RDF graph is used to visualize the mappings. However, the RDF graph is forced into a hierarchical layout, which weakens the advantages of using a graph representation. Additionally, this extension supports reconciliation services that offer HTTP interfaces. User intervention is needed to assess the quality of each reconciliation; a preview of candidate entities is available to help the user.

3 Mapping Process

The mapping process is a series of steps performed in order to generate Linked Data from raw data. There are two variations of the process, depending on whether a mapping editor is used or not.

Fig. 1. The difference in the mapping process when a mapping editor is used during the mapping creation and when it is not.

Mappings Without Editor. Generating RDF from (semi-)structured data using a mapping language without a mapping editor consists of two consecutive steps. First, the mapping is created (see Fig. 1a). In this step, the user needs to be aware of the input data and has to have knowledge about the domain and the mapping language's specification; the latter is, in principle, possessed by Semantic Web experts. In most cases, a text editor is used to create the mappings. This step results in statements in the mapping language. In the second step, these statements, together with the input data sources, are used by the mapping processor to generate the RDF triples (see Fig. 1b). The mapping processor is a tool that knows how to interpret the mapping language's specification to generate the RDF representation of the corresponding data, taking into consideration the statements derived from the previous step.
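For instance, assuming a hypothetical CSV source with columns id and name, the statements produced in the first step and the triples generated in the second step could look as follows (all file names and values are merely illustrative):

    @prefix rr:   <http://www.w3.org/ns/r2rml#> .
    @prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
    @prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    # Step one: statements in the mapping language (here: RML over CSV).
    <#EmployeeMapping>
      rml:logicalSource [
        rml:source "employees.csv" ;
        rml:referenceFormulation ql:CSV
      ] ;
      rr:subjectMap [ rr:template "http://example.com/employee/{id}" ] ;
      rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rml:reference "name" ]
      ] .

    # Step two: for an input row "7,Alice", the mapping processor generates:
    <http://example.com/employee/7> foaf:name "Alice" .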

Mappings with Editor. When the mapping process is carried out with a mapping editor, users no longer need to be aware of the mapping language's specification (see Fig. 1a'). This allows non-Semantic Web experts to define the mappings. Additionally, the text editor is replaced by the mapping editor. The mapping editor knows how to interpret the mappings created by the user, based on the language's specification, and subsequently generates the mapping language statements (see Fig. 1c). Note that no user knowledge about the specification is needed in this step. Subsequently, the mapping is executed, and RDF triples are generated.

4 RMLEditor

The RMLEditor is a browser-based GUI whose goal is to support users in production environments in defining, in a uniform way, mappings that specify how to generate Linked Data represented using the prevalent RDF framework. In previous work [20], we listed seven desired features of a GUI for uniform mapping editors. First, it should be independent of the underlying mapping language, so that users are able to create mappings without knowledge of the language's syntax. Second, it should allow users to execute the mappings outside of the editor, because the editor is only meant to create the mappings. Third, it should enable users to map multiple data sources at the same time, as data may be spread across multiple sources. Fourth, the editor should support data sources in different data formats, as the generation of Linked Data should be independent of the original format. Fifth, as multiple ontologies and vocabularies can be used to create a mapping, the editor should support the use of both existing and custom ontologies and vocabularies. Sixth, it should allow multiple alternative modeling approaches, as certain use cases might benefit from a specific approach. Finally, by supporting non-linear workflows, users are able to keep an overview of the mapping model and its relationships.

The GUI of the RMLEditor is designed to implement these features. It uses graphs to visualize the mappings. Manipulating these graphs results in creating, updating and extending the mappings, which can be done without any knowledge of the underlying mapping language or other used technologies. The graphs express how the raw data will be represented as RDF; however, this expressiveness is independent of the language's expressiveness. The RMLEditor triggers the mapping processor, which executes the mappings exported by the RMLEditor and generates RDF statements. For the RMLEditor, we chose RML, which can support mappings derived from a GUI that covers all of the aforementioned features. However, any other mapping language could be used instead, provided it allows the features to be implemented.

In Sect. 4.1 we discuss the RMLEditor's architecture. In Sect. 4.2, we elaborate on how the features are implemented in the GUI. In Sect. 4.3, we explain how the RMLProcessor, the mapping processor for RML, is used with the RMLEditor. In Sect. 4.4, we present two real-life use cases of the RMLEditor.

4.1 Architecture

The RMLEditor's high-level architecture is based on the multilayered architecture pattern [21]. This separates the presentation and the logic of the mappings, using the presentation layer and the application layer, respectively. The loading of mappings and data sources is done through the data access layer, which only communicates with the application layer. Communication between the presentation layer and the data access layer is not possible, as the architecture prohibits communication between layers that are not directly adjacent to each other. For the presentation layer, the Webix JavaScript libraryFootnote 13 is used to build the GUI, in cooperation with the d3.js library [22] for the presentation of the graphs. The communication with the application layer is facilitated by the Model-View-Controller pattern [23]. The Graph Markup Language (GraphML) [24] is used to represent the graph visualization of the mappings independently of the underlying mapping language. This allows users to export the graphs in an application-independent format. Additionally, the GraphML version of the graphs is used to generate the corresponding RML statements. Users are able to load graphs by instructing the RMLEditor to load the corresponding GraphML document. When RML statements need to be loaded, the statements are first converted to a GraphML document, then interpreted as graph elements, and finally shown in the GUI.

4.2 Graphical User Interface

The graphical user interface of the RMLEditor allows users to define mappings on existing data. To implement the aforementioned features, the RMLEditor offers three panels to the users: the Input Panel, the Modeling Panel and the Results Panel (see Fig. 2). They are aligned next to each other; however, when users want to focus on a specific panel, they can hide the other panels.

The Input Panel shows the data sources to the users (Feature 3). Each data source is assigned a unique color. Depending on the data format, an adequate visualization is chosen (Feature 4).

The Modeling Panel shows the mappings using a graph representation. The color of each node and edge depends on the data source that is used in that specific mapping, if any. The panel offers the means to manipulate the nodes and edges of the graphs in order to update the mappings. Semantic annotations can be added using multiple vocabularies and ontologies (Feature 5). The Linked Open VocabulariesFootnote 14 (LOV) can be consulted via the GUI to get suggestions on which classes, properties and datatypes to use. As the graphs offer a generic representation of the mappings that does not depend on the underlying mapping language, this panel addresses Feature 1. Additionally, both the graph representation and the RML statements can be exported (Feature 2), allowing the execution of the mappings outside the RMLEditor.

The Results Panel shows the resulting RDF dataset when the mappings defined in the Modeling Panel are executed on the data in the Input Panel. For each RDF triple of the dataset, it shows the subject, predicate and object.

The functionality and the interaction between the panels support the different mapping generation approaches, as we described in previous work [20, 25] (Feature 6). The data-driven approach uses the input data sources as the basis to construct the mappings; the classes, properties and datatypes of the schemas are then assigned to the mappings. When users start with the vocabularies and ontologies to generate the mappings, the schema-driven approach is followed; next, data fractions from the data sources can be associated with the mappings. Additionally, by not restricting users in when to interact with which panels – as would be the case for linear workflows – the RMLEditor supports non-linear workflows (Feature 7).

Fig. 2. The RMLEditor with the Input Panel on the left, the Modeling Panel in the center and the Results Panel on the right.

4.3 RMLProcessor Server

As the RMLEditor uses RML, it needs the functionality of the RMLProcessorFootnote 15, a Java application which generates RDF based on provided RML mapping documents. However, the processor is not needed to define the mappings. For the RMLEditor's needs, the RMLProcessor's functionality is offered through a Web API. The Web API is developed using Node.jsFootnote 16 and offers three functions: (i) executing a mapping document on a set of data sources; (ii) converting a GraphML document to RML to execute the mappings using the RMLProcessor; and (iii) converting RML statements to a GraphML document to visualize the mappings that are loaded in the RMLEditor.

4.4 Real-Life Use Cases

The RMLEditor is developed for a pilot (COMBUSTFootnote 17), initiated by companies in Flanders who aim to build their collaboration network on top of their interlinked data. The RMLEditor covers their need to generate the RDF representation of their raw data, to be further used in other third-party applications. Moreover, the RMLEditor supports the partners of the Open Tourism working group of the Belgian Chapter of the Open Knowledge Foundation (OKFN)Footnote 18 in semantically annotating their data with the Open Standard for Tourism Ecosystems DataFootnote 19, within the frame of the Sustainable Mobile Guides for Tourism (in Flanders) bootstrap. In both cases, the partners of the project have the raw data and the required knowledge about the domain. However, they have no understanding of mapping languages, and without the RMLEditor they need the assistance of a Semantic Web expert. Additionally, for every update and execution of the mappings the expert is consulted, which introduces significant overhead. We collected feedback during the deployment of the RMLEditor and concluded that the partners were able to create mappings for their data using their own domain knowledge. This was confirmed during our exploratory user validation, which involved members of the aforementioned pilot (see Sect. 5). Furthermore, the RMLEditor is the topic of a number of workshops and tutorials for interested groups and usersFootnote 20.

5 Exploratory User Validation

We performed an exploratory user validation to assess the RMLEditor's adequacy in supporting users to define mappings that generate RDF datasets. In Sect. 5.1 we discuss the two use cases that the participants completed. In Sect. 5.2 we explain the two groups of participants, the apparatus and the procedure followed during the validation. In Sects. 5.3 and 5.4 we elaborate on the subjective and objective aspects of the validation, respectively. Finally, in Sect. 5.5 we discuss the results.

5.1 Use Cases

In this section, the two use cases are outlined and it is explained in more detail how the RMLEditor is used to generate the mappings. Each use case covers a different mapping generation approach: data-driven or schema-driven [25]. The first use case involves data about employees and the projects they work on, originally in two different data sources. The goal is to semantically annotate them and link the employees to the projects they work on. The data-driven approach is recommended for editing the mappings. Users start by loading the two data sources into the RMLEditor. Next, mappings are generated based on data fractions from the sources. Subsequently, users semantically annotate the mappings. Identifying suitable classes, properties and datatypes is supported by the LOV. The second use case involves data about movies and their directors. The goal is to semantically annotate them and to interlink each movie with the person who directed it. The schema-driven approach, supported by the RMLEditor, is recommended to the users for editing the mappings. Users start by generating mappings reusing movie concepts from the DBpedia OntologyFootnote 21 and person concepts from FOAFFootnote 22. When the modeling is completed, the data sources are loaded. Subsequently, the mappings are updated with the data fractions to be used for generating the final RDF dataset.
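A minimal sketch of what the second use case's mappings could amount to is shown below; the source files and field names (id, title, director_id) are hypothetical, while the classes and properties are taken from the DBpedia Ontology and FOAF.

    @prefix rr:   <http://www.w3.org/ns/r2rml#> .
    @prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
    @prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
    @prefix dbo:  <http://dbpedia.org/ontology/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

    <#MovieMapping>
      rml:logicalSource [ rml:source "movies.csv" ; rml:referenceFormulation ql:CSV ] ;
      rr:subjectMap [ rr:template "http://example.com/movie/{id}" ; rr:class dbo:Film ] ;
      rr:predicateObjectMap [
        rr:predicate dbo:director ;
        # Interlink each movie with its director across the two sources.
        rr:objectMap [
          rr:parentTriplesMap <#DirectorMapping> ;
          rr:joinCondition [ rr:child "director_id" ; rr:parent "id" ]
        ]
      ] .

    <#DirectorMapping>
      rml:logicalSource [ rml:source "directors.csv" ; rml:referenceFormulation ql:CSV ] ;
      rr:subjectMap [ rr:template "http://example.com/person/{id}" ; rr:class foaf:Person ] ;
      rr:predicateObjectMap [
        rr:predicate foaf:name ;
        rr:objectMap [ rml:reference "name" ]
      ] .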

5.2 Method

Participants. 15 participants from Ghent University and the COMBUST network took part. They were divided into two major groups: (i) Semantic Web experts (10 participants) and (ii) non-Semantic Web experts (5 participants). A Semantic Web expert is a user who has knowledge about Semantic Web technologies and standards, including Linked Data and RDF. A non-Semantic Web expert is not aware of Semantic Web technologies, including the editor's underlying language RML. The Semantic Web experts are further divided into two sub-categories: (i) 5 experts with experience in Linked Data publishing, and (ii) 5 experts with no previous experience in Linked Data publishing.

Apparatus. The evaluation was carried out using the current version of the RMLEditor. Participants used their own computer with ChromeFootnote 23. Both the RMLEditor and the RMLProcessor were hosted on the same physical server. Each participant completed both use cases described in Sect. 5.1.

Procedure. The following steps were followed during the evaluation:

  • Step 1. We gave a presentationFootnote 24 to the participants in which basic concepts, such as Linked Data, RDF and schemas, were explained, in order for them to understand the purpose of the RMLEditor. Additionally, the user interface's panels were described. However, no introductory tutorial regarding how to use the RMLEditor was given.

  • Step 2. We conducted a pre-assessment by asking the participants to fill in a questionnaire.

  • Step 3. Half of each group's participants started with Use Case A, and the other half with Use Case B; subsequently, they completed the other use case. This was done to eliminate the influence of the use-case-specific data and editing approach on the final results of the evaluation. During the execution of the use cases, the experts were able to ask questions when they had problems with the RMLEditor, and we recorded when intervention was needed. The non-Semantic Web experts were observed more closely, and the think-aloud method [26] was used. This allowed for the identification of specific usability issues related to the RMLEditor. After each use case, we conducted an assessment to inquire about the difficulty of the use case.

  • Step 4. (Experts with prior RML knowledge only.) Half of the experts – those with knowledge of the underlying mapping language RML – also had to create a mapping using a plain text editor, to observe differences and preferences between using and not using the RMLEditor. This could not be done by all participants, as the others did not possess the required knowledge.

  • Step 5. At the end of each use case, we collected the created mappings. Additionally, a second and third questionnaire were filled in by the participants after their first and second use case, respectively.

  • Step 6. We conducted a post-assessment by providing the participants with a fourth questionnaire to fill in.

5.3 Subjective Validation

We conducted a subjective validation of the RMLEditor by presenting the participants with four questionnaires during steps 2, 5 and 6 of the validation. With the pre-assessment we assessed the participants' expectations before interacting with the RMLEditor. For the pre-assessment's questionnaire, we used the System Usability Scale (SUS) [27], whose statements were adapted to the use of the RMLEditor. The intermediate questionnaires of step 5 allowed for a subjective self-assessment of the previously completed use case, and more specifically of the approach followed in that use case. With the post-assessment we assessed the participants' experience with the RMLEditor and determined its positive and negative aspects. For the intermediate questionnaires and the post-assessment's questionnaire, we used 7-point Likert scales [28]: from 'not difficult at all' to 'very difficult' to measure difficulty, from 'not useful at all' to 'very useful' to measure usefulness, and from 'not agree at all' to 'very much agree' to measure the degree of agreement with a given statement.

5.4 Objective Validation

After each completed use case, we collected each participant's mapping. We conducted an objective validation of the RMLEditor by comparing all mappings to a baseline mapping that we created ourselves. Additionally, if a participant also created a mapping using RML directly, it was compared with the participant's mapping created using the RMLEditor.

5.5 Results

During the pre-assessment we measured that both groups of participants had high expectations regarding the ease of use and the improvements the RMLEditor would bring to the mapping process. Interestingly, the Semantic Web experts expected the tool to be hard to use if no introductory tutorial was provided, in contrast to the non-Semantic Web experts.

Learning Curve. After completing their first use case, 60 % of both the Semantic Web experts and the non-experts found it difficult, as measured via the subjective validation, to use the RMLEditor to define the mappings that generate the RDF representation. After completing the second use case, all non-experts found it easier to use the RMLEditor. However, 30 % of the experts still found the RMLEditor difficult to use overall; this is partially due to the fact that they all had high expectations of it. All participants needed help at least once during their first use case. During their second use case, however, 40 % of the Semantic Web experts no longer needed any help, whereas all non-Semantic Web experts still did. Nevertheless, overall, a significantly lower level of intervention by us was needed. We deduced that the completeness and accuracy of the mappings increased when users performed their second use case, compared to their first one. This shows that there is a learning curve to the RMLEditor; however, once users learn how to use it, the RMLEditor is adequate for its scope. The latter is verified via the subjective validation: 47 % of the participants found that the overall usage of the RMLEditor was not difficult, although 20 % still found it too difficult to use. Again, this is partially due to the participants' high expectations. Even though the RMLEditor aims to eliminate language- and RDF-specific terminology, we observed during the validation that 40 % of the participants had trouble with the terminology used in the GUI, such as 'child' and 'parent' when data sources need to be interlinked, and 'templates' when the URIs of the entities are constructed based on a data fraction from the data source. However, once it was explained, 90 % of the users could use the terminology as intended.
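For context, these terms stem from the underlying mapping vocabulary. In a hypothetical mapping excerpt (the property worksOn and the fields project_id and id are illustrative), they appear as follows:

    # 'template': the URI of each entity is constructed from a data fraction.
    rr:subjectMap [ rr:template "http://example.com/employee/{id}" ] ;

    # 'child' and 'parent': the fields on which two data sources are joined
    # when their entities are interlinked.
    rr:predicateObjectMap [
      rr:predicate <http://example.com/ns#worksOn> ;  # hypothetical property
      rr:objectMap [
        rr:parentTriplesMap <#ProjectMapping> ;
        rr:joinCondition [ rr:child "project_id" ; rr:parent "id" ]
      ]
    ] .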

Editing Approaches. 67 % of the participants were able to start editing the mappings using the given approach without any assistance. Through oral feedback during the evaluation, participants stated their preferred approach. Because of that preference, when they were asked to follow the other approach, they stated it was counterintuitive. If users prefer the data-driven approach, the schema-driven approach is counterintuitive, because no data is loaded initially. If users prefer the schema-driven approach, the data-driven approach is counterintuitive, because no predefined schema is given.

Graph Visualizations. During the post-assessment, participants were asked to what extent they agreed with the statement that the use of graphs is beneficial for editing mappings, and with the statement that the graphs make the linked nature of the final RDF dataset clear. Through the subjective validation, we found that 82 % of all participants found the use of graphs beneficial for editing mappings, and that graph-based visualizations are adequate for conceiving what the final RDF dataset will look like. The objective validation showed that, in the case of the Semantic Web experts, only 15 % of the twenty completed use cases resulted in incomplete or inaccurate mappings with regard to the modeling of the domain. In 67 % of the use cases they were able to create mappings as complete and accurate as expected.

Linking Data Sources. When we compared, via the objective validation, the mappings created with the RMLEditor to those created directly using RML, participants missed the interlinking between multiple data sources in 33 % of the use cases when using RML. Through the post-assessment, we concluded that the links were missing because of difficulties with the terminology of the language, or because the participants simply forgot the interlinking. With the RMLEditor this only happened in 10 % of the use cases; it still occurred because not all terminology was well covered in the GUI, as the participants made clear in the post-assessment.

6 Discussion and Conclusions

During the assessment both groups of participants were able to generate Linked Data through the RMLEditor's GUI, with the graph-based visualization as the most important contributor. Therefore, the RMLEditor fulfills its initial goal to support users in defining, in a uniform way, mappings that specify how to generate Linked Data. We present four main findings.

First, non-Semantic Web experts are able to generate Linked Data when using the RMLEditor. Additionally, comparing the mappings created with the RMLEditor to those created using RML directly shows that, for experts, the quality of the Linked Data is better when using the RMLEditor. Therefore, second, when Semantic Web experts generate Linked Data using the RMLEditor, the resulting RDF dataset is of at least the same quality as the dataset generated when using the RML language directly, if the quality metrics, defined by the Semantic Web experts, are kept constant. Furthermore, when a relationship between two data sources exists but is not made explicit by the user, this is reflected in the graph: the link between the resource nodes belonging to those sources is absent. Using RML directly, finding such a missing relationship is more difficult. Therefore, third, graph-based visualizations of mappings improve the interlinking between multiple data sources during the mapping creation, compared to the use of no visualizations.

Nevertheless, improvements to the RMLEditor, and more specifically to its GUI, can be made during the next development sprints. As the main aspect of the RMLEditor is the use of graphs, it is important that manipulating them is made as easy as possible. Although the required functionality for these manipulations is available, further improving their accessibility will benefit the mapping process. Additionally, allowing users to determine the size of the graphs lets them select the desired level of detail of their mappings. As the participants were presented with two use cases, they were able to perceive a learning curve. Future evaluations need to be conducted to determine how steep this learning curve is. However, as the fourth finding, providing a set of tutorials that discuss the RMLEditor's features will already improve the initial use of the RMLEditor by new users. Validating these observations during further large-scale user testing is a topic of future research.

Moreover, the RMLEditor supports both the data-driven and the schema-driven approach. Regardless of whether a user starts with the use case following the data-driven or the schema-driven approach, the second use case is less difficult. Therefore, no approach is enforced on the user by the RMLEditor, and the preferred approach depends on the person and the circumstances [5].

Both Semantic Web experts and non-Semantic Web experts have different expectations of the RMLEditor. Non-Semantic Web experts expect that they can generate Linked Data without any knowledge about a mapping language or Semantic Web technology. Semantic Web experts can already generate Linked Data, as they can use RML directly; therefore, they expect that they can at least do the same as with RML. Additionally, they have a set of requirements for their Linked Data, which they look for in the RMLEditor; these requirements improve, according to them, the quality of their generated Linked Data. In the long run, we want to see these expectations united: non-Semantic Web experts do not only generate Linked Data; they are also concerned with the quality of their Linked Data, in order to further improve its adoption.