The CSV file format is a convenient and popular way to exchange structured data, but the semantics of the information captured in such files are not explicit. RDF, on the other hand, provides one means to add meaning to data that software can process. The process of converting data in any non-RDF format (e.g., CSV and relational databases) into RDF is called uplift.
There are scenarios where it is necessary to manipulate data during the uplift process. Depending on the source data format, this can be very straightforward. With uplift languages for relational databases such as R2RML (see footnote 1), for example, one can rely on the underlying RDBMS to support some data manipulation tasks (e.g., string concatenation). The same is true for XSPARQL [5], where one can use XQuery to manipulate the data contained in XML. Sometimes, however, relying on the underlying technology is not sufficient [4]. For CSV datasets there is no equivalent, standardized underlying technology. When data in CSV files has to be manipulated to generate RDF, one needs to resort to pre- or post-processing of the data. This increases complexity in terms of the number of steps necessary to generate RDF, and renders the whole data processing “pipeline” less transparent. One solution to this problem is to capture these manipulations as functions in the mappings.
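To make the idea of capturing a manipulation inside the mapping more concrete, the following is a minimal sketch in Python and not the RML extension proposed in this paper. The sample CSV data, the column names, the `full_name` function, the `MAPPING` dictionary layout, and the `uplift` helper are all hypothetical and chosen only for illustration. The mapping names a function and the columns bound to its parameters, so that no separate pre-processing step is needed.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative assumptions.
CSV_DATA = "id,first,last\n1,Ada,Lovelace\n2,Alan,Turing\n"


def full_name(first, last):
    """Hypothetical manipulation: concatenate two column values."""
    return f"{first.strip()} {last.strip()}"


# Hypothetical mapping description: the function call and its parameter
# bindings are part of the mapping itself instead of a separate script.
MAPPING = {
    "subject_template": "http://example.org/person/{id}",
    "predicate": "http://xmlns.com/foaf/0.1/name",
    "function": full_name,
    "parameters": ["first", "last"],  # CSV columns bound to the parameters
}


def uplift(csv_text, mapping):
    """Return one N-Triples-style line per CSV row, applying the function."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = mapping["subject_template"].format(**row)
        # Bind the configured columns to the function's parameters.
        args = [row[column] for column in mapping["parameters"]]
        value = mapping["function"](*args)
        # Note: literal escaping is omitted to keep the sketch short.
        triples.append(f'<{subject}> <{mapping["predicate"]}> "{value}" .')
    return triples


if __name__ == "__main__":
    for line in uplift(CSV_DATA, MAPPING):
        print(line)
```

Because the function and its bindings are declared in the mapping, the whole transformation can be read from one place, which is the transparency argument made above.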
We propose a method to incorporate functions into mapping languages that draws inspiration from, and generalizes, ideas presented in [4]. To demonstrate our method, we extend RML’s vocabulary and engine to include notions for function calls and parameter bindings. The main contributions of this paper can be summarized as follows: (i) a method to incorporate functions in a mapping language; (ii) an implementation of the method extending RML; and (iii) a demonstration of functions incorporated into mappings applied to a real-world dataset.