Link++: A Flexible and Customizable Tool for Connecting RDF Data Sources

Masri, Ali; Zeitouni, Karine; Kedad, Zoubida; Kepeklian, Gabriel

doi:10.1007/978-3-319-47602-5_22

Ali Masri^19,20,
Karine Zeitouni¹⁹,
Zoubida Kedad¹⁹ &
…
Gabriel Kepeklian²⁰

Part of the book series: Lecture Notes in Computer Science ((LNCCN,volume 9989))

Included in the following conference series:

European Semantic Web Conference

1469 Accesses

Abstract

Existing interlinking tools focus on finding similarity relationships between entities of distinct RDF datasets by generating owl:sameAs links. These approaches address the detection of equivalence relations between entities. However, in some contexts, more complex relations are required, and the links to be defined follow more sophisticated patterns. This paper introduces Link++, an approach that enables the discovery of complex links in a flexible manner. Link++ enables the users to generate rich links by specifying a link pattern as well as rules and functions to discover them. When visiting the demo, attendees will be introduced to all the aspects of the system explaining the required steps to define custom functions, connection patterns and linking rules until finally obtaining custom connections.

You have full access to this open access chapter, Download conference paper PDF

LinkD: element-based data interlinking of RDF datasets in linked data

Article Open access 16 July 2022

On Materialized sameAs Linksets

LinkLion: A Link Repository for the Web of Data

Keywords

1 Introduction

Data interlinking aims to find equivalence relations between entities of different datasets in the Web of data. Roughly speaking, a user defines two data sources and a linking rule which specifies how different entities can be related together. The results are a set of triples in the form x owl:sameAs y which are used to navigate from one data source to another in order to gain richer information.

Existing interlinking tools [1–3, 5, 6] support this task by providing a platform with a set of similarity functions that can be combined to form a designated rule. The interlinking engine processes the given datasets, applies an interlinking rule and returns the results.

In the same spirit as for data interlinking, our aim is to answer the need for more complex relations in some application domains. We want to enable the possibility of discovering complex relationships carrying more information in the form of properties that would be used for analysis purposes.

For instance, consider two transportation datasets representing a set of bus stops and a set of train stations respectively. Assume that our goal is to link bus stops that can be reached from a train station, with the intention of developing a multi-modal trip planner that uses these links to compute trips. Using the existing tools, we are faced with two main limitations.

The restriction to a predefined set of functions for composing linking rules. In our case, we lack of precise functions to calculate the closeness of a bus stop and a train station. Existing tools support geographical distances to calculate distance similarity which are not always reliable in real life, e.g., two geographically close stops that are separated by a river might be hard to reach from one another, therefore connecting them does not make sense.
The representation of the generated output. Supporting complex relations requires more complex output patterns. Considering our example, interlinking based on the stations that can be reached from another gives the output BusStop1 nextTo TrainStation132 which delivers no idea about the semantics of this relation. They are next to each others but how close are they? and what are the transportation modes that we can use? etc.

Our contribution is twofold, first we define connection patterns – a new way of customizing the output of a linking process. Second, we propose a platform that enables dynamic functions insertion and integration within the linking process.

In this work we introduce Link++, a tool that enables users to produce complex links by using their defined functions and connection patterns to support any type of relation between datasets.

The paper is structured as follows: In Sect. 2 we give an overall description of our system. We test the approach via a use case explained in Sect. 3 then we finally conclude in Sect. 4.

2 System Design and Implementation

To enable the discovery of complex links we introduce the notions of customized Connection Patterns. A connection pattern defines the content and the format of the link to be generated by defining the properties along with the associated properties. Figure 1 shows an overall view of the designed platform to generate customized connections.

TO begin, a configuration task is executed to define a connection pattern represented by a set of connection properties where each property is calculated by a function. Custom functions can be provided either by the user or by a common pre-coded functions library. In our implementation the functions are represented within a JAVA class and the library dependencies are regular .jar files. The configuration parameters are inputs of the connection generation component where the linking rule is defined. The linking rule is a file that describes the conditions to generate a connection pattern between entities. In short, if a rule between two entities is valid, a connection is instantiated and filled based on the given template (connection pattern). In the connection generation phase the system passes over the data sources and test the validity of a user-defined linking rule. If a rule is valid the connection pattern is evaluated and the resulting connection is stored in a connection store. In our implementation, both the configuration file (connection pattern) and the linking rule are XML files described by a DTD and the connection is generated in Turtle format^{Footnote 1}. Figure 2 shows an example of a connection pattern. An executable version of the system can be found online via the link: https://github.com/alimasri/link-plus-plus.git in addition to a video tutorial on: https://youtu.be/u2gr7Wa4eT4.

3 Scenario

The scenario we present in this demo describes how we can use our tool to connect transportation points of transfer to provide data for multimodal trip planning solutions. Transportation companies work in isolation and provide specific planning solutions for their data. We have used our solution to expand the transportation view by creating chains of connections in cities. We have considered two data sources representing SNCF train stations^{Footnote 2} and VELIB^{Footnote 3} bike sharing stations in the Paris area in France. Both data sources are pre-processed and translated into RDF using the DataLift platform [4].

The specified linking rule aims to find for each train station the nearby bike sharing stations, which is defined to be the walking distance since it is more relevant for our case, and it can be calculated via any distance API over the web. In this scenario we choose the Google distance matrix API^{Footnote 4}, however users are free to choose any existing API or create their own since this is completely independent from the process. For the connection pattern, we are interested in getting the source and destination (generated by default), the calculated walking distance and the estimated walking time. Both files are provided as inputs to our system and we have successfully generated an output file representing the needed connections as shown in a snapshot of our system in Fig. 3.

The output file can be used as an input for trip planning algorithms to create multimodal trips and to facilitate and optimize passengers trips. The reliability of the results depends on the chosen functions and this requires a careful selection from the user.

4 Conclusion

Current interlinking tools focus on discovering equivalence relationships between datasets. In the same spirit as these tools, we have proposed an approach which discovers more complex relations to support specific application requirements such as linking transportation data sources.

In this paper we introduced Link++, a flexible interlinking tool which provides user defined semantic relationships between entities. The system enables the users to define their own functions and connection patterns thus providing customized and flexible linking capability.

Future work will target the dynamic nature of the created links. For instance, in some application domains (e.g. transportation) the created links can be dynamic and vary in real time which indeed affect their correctness and reliability. Moreover, we are interested in defining a functions library where users can share their functions and connection patterns allowing reusability and knowledge sharing.

Notes

References

Jaffri, A., Glaser, H., Millard, I.C.: Managing URI synonymity to enable consistent reference on the semantic web. In: Proceedings of the Workshop on Identity, Reference, and the Web (IRSW) (2008)
Google Scholar
Ngomo, A-C.N., Auer, S.: LIMES: a time-efficient approach for large-scale link discovery on the web of data. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, IJCAI 2011, vol. 3, pp. 2312–2317. AAAI Press (2011)
Google Scholar
Raimond, Y., Sutton, C., Sandler, M.B.: Automatic interlinking of music datasets on the semantic web. In: Linked Data on the Web, vol. 369 (2008)
Google Scholar
Scharffe, F., Atemezing, G., Troncy, R., Gandon, F., Villata, S., Bucher, B., Hamdi, F., Bihanic, L., Képéklian, G., Cotton, F., et al.: Enabling linked data publication with the datalift platform. In: Proceedings of AAAI Workshop on Semantic Cities (2012)
Google Scholar
Scharffe, F., Liu, Y., Zhou, C., Rdf-ai: an architecture for rdf datasets matching, fusion and interlink. In: Proceedings of IJCAI 2009 Workshop on Identity, Reference, and Knowledge Representation (IR-KR) (2009)
Google Scholar
Volz, J., Bizer, C., Gaedke, M., Kobilarov, G.: Silk-a link discovery framework for the web of data. In: Linked Data on the Web, vol. 538 (2009)
Google Scholar

Download references

Acknowledgments

We thank Mr. Bertrand Leroy for the fruitful discussion.

Author information

Authors and Affiliations

DAVID Laboratory, University of Versailles Saint-Quentin-en-Yvelines, Versailles, France
Ali Masri, Karine Zeitouni & Zoubida Kedad
VEDECOM Institute, Versailles, France
Ali Masri & Gabriel Kepeklian

Authors

Ali Masri
View author publications
You can also search for this author in PubMed Google Scholar
Karine Zeitouni
View author publications
You can also search for this author in PubMed Google Scholar
Zoubida Kedad
View author publications
You can also search for this author in PubMed Google Scholar
Gabriel Kepeklian
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ali Masri .

Editor information

Editors and Affiliations

Hasso-Plattner-Institut für Softwaresystemtechnik, Universität Potsdam, Potsdam, Germany
Harald Sack
Innovation Development, Istituto Superiore Mario Boella, Turin, Italy
Giuseppe Rizzo
Technical University of Ilmenau, Ilemnau, Germany
Nadine Steinmetz
Artiﬁcial Intelligence Laboratory, J. Stefan Institute, Ljubljana, Slovenia
Dunja Mladenić
Institut für Informatik III, University of Bonn, Bonn, Germany
Sören Auer
Institut für Informatik III, Universität Bonn, Bonn, Germany
Christoph Lange

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Masri, A., Zeitouni, K., Kedad, Z., Kepeklian, G. (2016). Link++: A Flexible and Customizable Tool for Connecting RDF Data Sources. In: Sack, H., Rizzo, G., Steinmetz, N., Mladenić, D., Auer, S., Lange, C. (eds) The Semantic Web. ESWC 2016. Lecture Notes in Computer Science(), vol 9989. Springer, Cham. https://doi.org/10.1007/978-3-319-47602-5_22

Download citation

DOI: https://doi.org/10.1007/978-3-319-47602-5_22
Published: 20 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47601-8
Online ISBN: 978-3-319-47602-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Link++: A Flexible and Customizable Tool for Connecting RDF Data Sources

Abstract

Similar content being viewed by others