Visualizing RDF Data Cubes Using the Linked Data Visualization Model

Part of the Lecture Notes in Computer Science book series (LNISA, volume 8798)

Abstract

The data cube is one of the basic means of storing, processing and analyzing statistical data. Recently, the RDF Data Cube Vocabulary became a W3C Recommendation, and at the same time interesting datasets using it started to appear. With them came the need for compatible visualization tools. The Linked Data Visualization Model (LDVM) is a formalism focused on this area and is implemented by Payola, a framework for analysis and visualization of Linked Data. In this paper, we present the capabilities of LDVM and Payola to visualize RDF Data Cubes as well as other statistical datasets not yet compatible with the Data Cube Vocabulary. We also compare our approach to CubeViz, a visualization tool specialized in RDF Data Cube visualizations.

Keywords

  • Linked data
  • RDF
  • Visualization
  • Data cube

The research is supported in part by the EU ICT FP7 under No. 257943 (LOD2 project) and in part by project SVV-2014-260100.

1 Introduction

Data analysts are accustomed to making projections from multi-dimensional datasets to low-dimensional ones using aggregations, slicing and dicing known from OLAP [3]. The results can be easily visualized by well-known and widely implemented techniques such as charts, timelines and map visualizations. More and more stakeholders, including governments and scientific groups, are publishing their datasets in the form of Linked Data (Footnote 1). Our goal is to apply these well-known visualization techniques, which are understandable to non-expert users, and to use the Data Cube Vocabulary (DCV) (Footnote 2) W3C Recommendation to achieve it. An expert user prepares a data cube, and a non-expert user is given an easy way of exploring the cube with simple faceted visualization tools. In this paper, we demonstrate that our Linked Data Visualization Model enables us to create a flexible solution for RDF Data Cube visualizations that fits into a bigger, more general framework.

2 Linked Data Visualization Model

In our previous work we defined the Linked Data Visualization Model (LDVM) [1], an abstract visualization process customized for the specifics of Linked Data. LDVM allows users to create data visualization pipelines that consist of four stages: Source Data, Analytical Abstraction, Visualization Abstraction and View.

Source Data allows a user to define a custom transformation to prepare an arbitrary dataset for further stages, which require their input to be RDF. In this paper we only consider RDF data sources such as RDF files or SPARQL endpoints, e.g. DBpedia.

The Analytical Abstraction enables the user to specify analytical operators that extract the data to be processed from a data source and then transform it to create the desired analysis. The transformation can also compute additional characteristics or even generate a new multi-dimensional dataset. For example, we can create a statistical dataset from DBpedia by querying for resources of type dbpedia-owl:City and using data from their properties, such as dbpedia-owl:populationAsOf for a dimension and dbpedia-owl:populationTotal for a measure. Further analytical steps could be performed within this stage, e.g. filtering cities from a specific country.
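The DBpedia example above could be expressed as a SPARQL query along the following lines. This is only a sketch: the function name, the query shape and the use of dbpedia-owl:country for the country filter are assumptions made for illustration, not the query Payola actually issues.

```python
# Illustrative sketch only; not the query generated by Payola.
PREFIXES = {"dbpedia-owl": "http://dbpedia.org/ontology/"}


def build_city_population_query(country_iri=None, limit=100):
    """Return a SPARQL SELECT query string for city population figures."""
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()]
    patterns = [
        "?city a dbpedia-owl:City .",
        "?city dbpedia-owl:populationAsOf ?asOf .",  # candidate dimension
        "?city dbpedia-owl:populationTotal ?population .",  # candidate measure
    ]
    if country_iri is not None:
        # Optional further analytical step: keep cities of one country only.
        # The dbpedia-owl:country property is an assumption of this sketch.
        patterns.append(f"?city dbpedia-owl:country <{country_iri}> .")
    body = "\n  ".join(patterns)
    return (
        "\n".join(prefix_lines)
        + "\nSELECT ?city ?asOf ?population WHERE {\n  "
        + body
        + f"\n}}\nLIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_city_population_query("http://dbpedia.org/resource/Czech_Republic"))
```

The function only builds the query text; sending it to a SPARQL endpoint is left out so that the sketch stays self-contained.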

In the Visualization Abstraction stage of LDVM we need to prepare the analytical data to be compatible with our Data Cube visualizer. If the analytical data is already described by DCV, this stage can be skipped. Otherwise, we have to use an LDVM transformer to convert non-DCV statistical data to DCV, as it is the format required by our visualizer. This stage is what allows users to reuse statistical analyses with results in various formats without rewriting them, simply by appending an appropriate transformer.
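A transformer of this kind can be pictured as a function that wraps each non-DCV result row into a DCV observation. The sketch below is an assumption-laden illustration: qb:Observation and qb:dataSet are terms of the Data Cube Vocabulary, whereas the function name, the row format and all example.org IRIs are hypothetical and do not describe Payola's transformer.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_dcv_observations(rows, dataset_iri, base_iri):
    """Turn result rows into triples describing qb:Observation resources.

    rows: list of dicts mapping a component IRI (dimension or measure) to a value
    dataset_iri: IRI of the qb:DataSet the observations belong to
    base_iri: hypothetical namespace used to mint observation IRIs
    """
    triples = []
    for index, row in enumerate(rows):
        obs = f"{base_iri}obs/{index}"
        triples.append((obs, RDF_TYPE, QB + "Observation"))
        triples.append((obs, QB + "dataSet", dataset_iri))
        for component_iri, value in row.items():
            triples.append((obs, component_iri, value))
    return triples


if __name__ == "__main__":
    # Made-up row; the component IRIs are placeholders.
    rows = [{"http://example.org/dim/year": 2011, "http://example.org/measure/count": 5}]
    for triple in to_dcv_observations(rows, "http://example.org/dataset", "http://example.org/"):
        print(triple)
```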

In the View stage, DCV-compliant data is passed to a visualizer, which creates a user-friendly data cube visualization. Based on dimension links to SDMX and SKOS concepts, a visualizer can generate more sophisticated facets in order to let the user slice and dice the data cube. A proper visualizer should offer the well-known data cube visualization techniques; Payola, our LDVM implementation, has such a visualizer.
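As a rough mental model, the four LDVM stages can be read as a chain of functions in which each stage consumes the output of the previous one. The following sketch uses toy data and hypothetical function names; it illustrates only the pipeline idea and is not how Payola implements LDVM.

```python
from functools import reduce


def source_data(_):
    # Stage 1: deliver RDF-like triples (hard-coded toy data with made-up values).
    return [
        ("ex:cityA", "ex:population", 3),
        ("ex:cityB", "ex:population", 5),
        ("ex:cityA", "ex:label", "City A"),
    ]


def analytical_abstraction(triples):
    # Stage 2: extract only the statements relevant to the analysis.
    return [(s, o) for s, p, o in triples if p == "ex:population"]


def visualization_abstraction(pairs):
    # Stage 3: reshape the analysis result into what the visualizer expects.
    return [{"dimension": s, "measure": o} for s, o in pairs]


def view(observations):
    # Stage 4: render; here one plain-text bar per observation.
    return [f"{o['dimension']}: {'#' * o['measure']}" for o in observations]


def run_pipeline(stages, initial=None):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, initial)


if __name__ == "__main__":
    stages = [source_data, analytical_abstraction, visualization_abstraction, view]
    for line in run_pipeline(stages):
        print(line)
```

Skipping the Visualization Abstraction for data that is already in the expected shape corresponds to leaving that function out of the list of stages.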

3 Mapping Non-Data Cube Data to Data Cube

While experimenting with statistical data, we encountered Linked Data datasets that contain statistical data but do not use DCV. Since we have a visualizer using DCV, we implemented a tool capable of mapping non-cube RDF data to a form compliant with DCV, as a plugin usable in LDVM analyzers. While creating a new LDVM analyzer in Payola, a user is able to create a new instance of the DCV analytical plugin. The plugin receives arbitrary RDF data on its input and, based on a user-defined pattern, maps the data to a specified DCV data structure definition (DSD). The user is asked to supply a URL containing at least one DSD in RDF. The user is then presented with a list of available DSDs and, after selecting one, a new analytical plugin is created for this DSD. This plugin can then be used by other Payola users without the need to specify the URL of the DSD, and it becomes a part of our extensible library of reusable DCV analyzers.

Fig. 1. User inputs a mapping pattern

To be able to map an arbitrary dataset into a form compliant with DCV, the plugin needs the user to specify the data mapping. Based on DCV, this could be partially automated in the future. As can be seen in Fig. 1, the process is based on the query-by-example principle. The plugin shows the user a generic graph visualization based on a preview of the input that will be processed by the DCV analytical plugin. It lets them select a pattern: step by step, the application asks them to mark a vertex that represents one of the dimensions, measures or attributes of the chosen DSD (red vertices). To narrow down the volume of the results, or to specify more sophisticated patterns, the user is also able to mark vertices (green ones) that refine the pattern but do not represent any DSD component. Based on the given example, the plugin produces a SPARQL query. When executed against a SPARQL endpoint, it creates new links between existing resources and components of the DSD.
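To make the last step more tangible, the sketch below generates a SPARQL CONSTRUCT query from a marked pattern. It is a guess at the general idea only: the paper does not show the query the plugin produces, so the function name, its parameters, the CONSTRUCT form and the example.org component IRIs are all assumptions.

```python
def build_mapping_query(resource_var, pattern_edges, component_vars):
    """Build a CONSTRUCT query linking a resource to DSD components.

    resource_var: variable of the vertex that receives the new links
    pattern_edges: (subject, predicate, object) terms already in SPARQL syntax;
        terms starting with '?' stand for marked vertices, constant terms
        play the role of the refining (green) vertices
    component_vars: dict mapping a variable (red vertex) to a DSD component IRI
    """
    used = {term for edge in pattern_edges for term in edge if term.startswith("?")}
    unknown = ({resource_var} | set(component_vars)) - used
    if unknown:
        raise ValueError(f"variables missing from the pattern: {sorted(unknown)}")
    # New links between the existing resource and the DSD components.
    construct = [
        f"{resource_var} <{component}> {var} ."
        for var, component in sorted(component_vars.items())
    ]
    # The example selected by the user becomes the graph pattern to match.
    where = [f"{s} {p} {o} ." for s, p, o in pattern_edges]
    return (
        "CONSTRUCT {\n  " + "\n  ".join(construct)
        + "\n}\nWHERE {\n  " + "\n  ".join(where) + "\n}"
    )


if __name__ == "__main__":
    edges = [
        ("?city", "a", "<http://dbpedia.org/ontology/City>"),
        ("?city", "<http://dbpedia.org/ontology/populationAsOf>", "?asOf"),
        ("?city", "<http://dbpedia.org/ontology/populationTotal>", "?population"),
    ]
    components = {
        "?asOf": "http://example.org/dsd/dimension/refPeriod",  # hypothetical IRI
        "?population": "http://example.org/dsd/measure/population",  # hypothetical IRI
    }
    print(build_mapping_query("?city", edges, components))
```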

The resulting plugin can be used in various ways in an LDVM analyzer. Connected directly to a data source, it works as a filter and transformer that selects only data related to the specified DSD and maps it to DCV at the same time. It can also be beneficial to use the plugin as an inner analytical operator to filter and map processed data, since data expressed in DCV becomes snowflake-shaped and can be easier to work with in further analytical steps. Finally, as the last plugin of an analyzer, it can transform the results of a non-DCV analysis into DCV in the same way a visualization transformation does.

4 Payola and CubeViz

Payola and CubeViz are visualization tools that use DCV. Both use the Highcharts library to deliver user-friendly visualizations (line, bar, column, area and pie charts; see Fig. 2) and enable users to obtain a permanent link to a created visualization. When sent to a non-expert user, the link lets them view a DCV-based visualization in a faceted browser environment without any knowledge of Linked Data or DCV. In addition, CubeViz provides a packing layout visualization of SKOS hierarchies using the d3js library. Such a visualizer is also present in Payola, but not as a part of the DCV visualizer, since it can also be used more generally for non-DCV data.

The faceted capabilities of the two tools enable a user to slice a DCV cube: the user selects multiple values for two dimensions, one value for each of the remaining dimensions, and a single measure. Configuring the facets in this way makes the tools load a 2-dimensional table, which is visualized by the aforementioned techniques. Both tools are technically capable of dicing (producing sub-cubes), but do not offer a way of visualizing more than 2 dimensions at a time.
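The slicing described above amounts to a simple projection of observations onto a 2-dimensional table. The following sketch shows this on a toy three-dimensional cube; the function, the dictionary-based observation format and all values are made up for illustration and do not reflect the internals of Payola or CubeViz.

```python
def slice_cube(observations, row_dim, col_dim, fixed, measure):
    """Project observations onto a 2-D table.

    row_dim, col_dim: (dimension name, list of selected values)
    fixed: dict holding exactly one value for every remaining dimension
    measure: name of the single measure to display
    """
    row_name, row_values = row_dim
    col_name, col_values = col_dim
    # Cells without a matching observation stay None.
    table = {r: {c: None for c in col_values} for r in row_values}
    for obs in observations:
        if any(obs[name] != value for name, value in fixed.items()):
            continue  # the observation lies outside the slice
        r, c = obs[row_name], obs[col_name]
        if r in table and c in table[r]:
            table[r][c] = obs[measure]
    return table


if __name__ == "__main__":
    # Toy cube with dimensions year, region, sex and one measure (made-up values).
    cube = [
        {"year": 2011, "region": "A", "sex": "F", "count": 10},
        {"year": 2011, "region": "B", "sex": "F", "count": 20},
        {"year": 2012, "region": "A", "sex": "F", "count": 30},
        {"year": 2012, "region": "A", "sex": "M", "count": 40},
    ]
    table = slice_cube(
        cube, ("year", [2011, 2012]), ("region", ["A", "B"]), {"sex": "F"}, "count"
    )
    for year, row in table.items():
        print(year, row)
```

Each row of the resulting table can then be fed to a chart as one series, which is why two free dimensions are the natural limit for the chart types mentioned above.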

Fig. 2. An example of a visualization prepared in Payola. The four-dimensional cube is based on Czech Statistical Office data

A DCV-based dataset can be visualized in both Payola and CubeViz with no additional transformations involved. The difference is that in Payola, any statistical RDF data can be transformed and visualized using the same data cube visualizer. In theory, CubeViz could even be used as an instance of an LDVM visualizer, which would show that LDVM is a more general and reusable framework. This could be achieved by supplying it with a DCV-compatible LDVM visualization abstraction produced by a data cube LDVM pipeline. However, at the time of writing this paper, CubeViz was unstable and crashed when loading data from our SPARQL endpoints, so we could not finish evaluating this possibility.

5 Related Work

Tools like OLAP2DataCube [6] and Tabels (Footnote 3) enable users to convert non-RDF statistical datasets to DCV. Compared to the Payola mapping process, they take a different input data type (relational data instead of RDF). In the phase of mapping data to DCV, they also rely on user input (selecting from a list or even using a custom DSL). From the group of more general visualization tools we name VisualBox (Footnote 4) and Exhibit [4], which are JavaScript-based libraries that are not DCV-capable and require the user to have scripting abilities. GeoGlobe (Footnote 5) and map4rdf (Footnote 6) visualize spatial statistical data from a fixed dataset. Rhizomer [2] also offers multi-dimensional data visualizations (maps for spatial data, timelines, charts, etc.) without involving DCV. Like Payola and CubeViz, Olap4ld [5] relies on DCV; it is an implementation of the Open Java API for OLAP and, by converting OLAP operations to SPARQL, introduces an OLAP-to-SPARQL analytical approach. Linked Statistical Data Analysis (Footnote 7) presents the results of SDMX-ML transformations into DCV. It enables a user to visualize correlations over fixed statistical datasets prepared by a set of custom analytical and transformation scripts (Footnote 8).

6 Conclusions

In this demo we present the Payola Data Cube Vocabulary mapping plugin, which demonstrates how DCV can be utilized throughout the stages of LDVM. For the View stage of LDVM we implemented a DCV visualizer in Payola that is capable of visualizing DCV datasets and provides the user with facets for slicing and dicing data cubes. A sample DCV visualization is located at http://vis.payola.cz/dcv_czso. Compared to CubeViz, another tool for RDF Data Cube visualization, Payola offers a wider range of usage scenarios thanks to being an LDVM implementation. One of those scenarios is visualizing statistical data that is not described by DCV, simply by mapping it to DCV as a part of a standard LDVM pipeline.

Notes

  1. http://wiki.planet-data.eu/web/Datasets

  2. http://www.w3.org/TR/2014/REC-vocab-data-cube-20140116/

  3. http://idi.fundacionctic.org/tabels/

  4. https://github.com/alangrafu/visualbox

  5. http://data.i2g.pl/insigos/hz-geo/globe/

  6. http://oegdev.dia.fi.upm.es/map4rdf/

  7. http://stats.270a.info/

  8. https://github.com/csarven/publishing-statistical-linked-data/blob/master/csarven.publishing-statistical-linked-data.pdf?raw=true

References

  1. Brunetti, J.M., Auer, S., García, R., Klímek, J., Nečaský, M.: Formal linked data visualization model. In: Proceedings of the 15th International Conference on Information Integration and Web-based Applications & Services (IIWAS’13), pp. 309–318 (2013)

  2. Brunetti, J.M., García, R., Auer, S.: From overview to FACETs and pivoting for interactive exploration of semantic web data. Int. J. Seman. Web Inf. Syst. 9(1), 1–20 (2013)

  3. Chaudhuri, S., Dayal, U.: An overview of data warehousing and OLAP technology. SIGMOD Record 26(1), 65–74 (1997)

  4. Huynh, D.F., Karger, D.R., Miller, R.C.: Exhibit: lightweight structured data publishing. In: Proceedings of the 16th International Conference on World Wide Web, WWW ’07, New York, NY, USA, pp. 737–746. ACM (2007)

  5. Kämpgen, B., Harth, A.: No Size fits all – running the star schema benchmark with SPARQL and RDF aggregate views. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) ESWC 2013. LNCS, vol. 7882, pp. 290–304. Springer, Heidelberg (2013)

  6. Salas, P.E., Martin, M., Mota, F.M.D., Breitman, K., Auer, S., Casanova, M.A.: Olap2datacube: an ontowiki plugin for statistical data publishing. In: Proceedings of the 2nd Workshop on Developing Tools as Plug-ins, TOPI 2012, New York, NY, USA. ACM (2012)

Correspondence to Jiří Helmich.

Copyright information

© 2014 Springer International Publishing Switzerland

Cite this paper

Helmich, J., Klímek, J., Nečaský, M. (2014). Visualizing RDF Data Cubes Using the Linked Data Visualization Model. In: Presutti, V., Blomqvist, E., Troncy, R., Sack, H., Papadakis, I., Tordai, A. (eds) The Semantic Web: ESWC 2014 Satellite Events. ESWC 2014. Lecture Notes in Computer Science, vol 8798. Springer, Cham. https://doi.org/10.1007/978-3-319-11955-7_50

  • DOI: https://doi.org/10.1007/978-3-319-11955-7_50

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11954-0

  • Online ISBN: 978-3-319-11955-7

  • eBook Packages: Computer Science (R0)