Towards Exploratory OLAP Over Linked Open Data – A Case Study

  • Conference paper
Enabling Real-Time Business Intelligence (BIRTE 2014, BIRTE 2013)

Abstract

Business Intelligence (BI) tools provide fundamental support for analyzing large volumes of information. Data Warehouses (DW) and Online Analytical Processing (OLAP) tools are used to store and analyze data. An increasing amount of information is now available on the Web in the form of Resource Description Framework (RDF) data, and BI tools have great potential to achieve better results by integrating real-time data from web sources into the analysis process. In this paper, we describe a framework for so-called exploratory OLAP over RDF sources. We propose a system that uses a multidimensional schema of the OLAP cube expressed in RDF vocabularies. Based on this schema, the system can query data sources, extract and aggregate data, and build a cube. We also propose a computer-aided process for discovering previously unknown data sources and building a multidimensional schema of the cube. We present a use case to demonstrate the applicability of the approach.
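The abstract describes a system that derives queries from a multidimensional cube schema and then aggregates the extracted data into a cube. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a simplified in-memory schema description in place of an RDF vocabulary such as the RDF Data Cube vocabulary or QB4OLAP. The classes `Level`, `Measure` and `CubeSchema`, the functions `build_sparql` and `build_cube`, all `example.org` property IRIs, and the sample rows are hypothetical. The sketch turns the schema into a single SPARQL 1.1 aggregate query (`GROUP BY`) and also shows a local roll-up over fact-level rows, which is the kind of step needed when data arrives from several sources.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical, simplified stand-in for a cube schema that would normally be
# described with RDF vocabularies; all IRIs used below are illustrative.

@dataclass(frozen=True)
class Level:
    name: str    # SPARQL variable that holds this level's members
    path: tuple  # chain of RDF properties leading from the fact to the member

@dataclass(frozen=True)
class Measure:
    name: str    # name of the aggregated result variable
    prop: str    # RDF property holding the numeric fact value
    agg: str     # SPARQL 1.1 aggregate function, e.g. "SUM" or "COUNT"

@dataclass(frozen=True)
class CubeSchema:
    fact_class: str
    levels: tuple
    measure: Measure

def build_sparql(schema: CubeSchema) -> str:
    """Translate the schema into one SPARQL 1.1 aggregate query."""
    patterns = [f"?fact a <{schema.fact_class}> ."]
    for level in schema.levels:
        subject = "?fact"
        for i, prop in enumerate(level.path):
            is_last = i == len(level.path) - 1
            # Intermediate hops get fresh variables; the last hop binds the level.
            obj = f"?{level.name}" if is_last else f"?{level.name}_{i}"
            patterns.append(f"{subject} <{prop}> {obj} .")
            subject = obj
    m = schema.measure
    patterns.append(f"?fact <{m.prop}> ?{m.name}_raw .")
    group_vars = " ".join(f"?{lvl.name}" for lvl in schema.levels)
    body = "\n  ".join(patterns)
    return (f"SELECT {group_vars} ({m.agg}(?{m.name}_raw) AS ?{m.name})\n"
            f"WHERE {{\n  {body}\n}}\nGROUP BY {group_vars}")

def build_cube(rows, group_by, measure, agg=sum):
    """Aggregate fact-level rows (dicts) into cube cells keyed by group_by."""
    cells = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in group_by)
        cells[key].append(row[measure])
    return {key: agg(values) for key, values in cells.items()}

EX = "http://example.org/"
schema = CubeSchema(
    fact_class=EX + "Film",
    levels=(Level("genre", (EX + "genre",)),
            Level("country", (EX + "director", EX + "country"))),
    measure=Measure("total_runtime", EX + "runtime", "SUM"),
)
rows = [  # made-up fact rows, as if extracted from one or more sources
    {"genre": "Drama", "country": "FR", "runtime": 100},
    {"genre": "Drama", "country": "FR", "runtime": 120},
    {"genre": "Comedy", "country": "US", "runtime": 90},
]
print(build_sparql(schema))
print(build_cube(rows, ("genre", "country"), "runtime"))
print(build_cube(rows, ("genre",), "runtime"))  # roll-up to genre only
```

Pushing the aggregation into a SPARQL endpoint reduces the amount of data transferred. Aggregating locally, in contrast, lets partial results from several endpoints be merged. Only distributive functions such as SUM and COUNT can be safely re-aggregated in this way, while AVG, for example, cannot.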

Acknowledgment

This research is partially funded by the Erasmus Mundus Joint Doctorate in “Information Technologies for Business Intelligence – Doctoral College (IT4BI-DC)”.

Author information

Correspondence to Dilshod Ibragimov.

Appendix

Prefixes Used in the Paper

[Figure: the RDF prefixes used in the paper; not reproduced here]

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ibragimov, D., Hose, K., Pedersen, T.B., Zimányi, E. (2015). Towards Exploratory OLAP Over Linked Open Data – A Case Study. In: Castellanos, M., Dayal, U., Pedersen, T., Tatbul, N. (eds) Enabling Real-Time Business Intelligence. BIRTE 2013, BIRTE 2014. Lecture Notes in Business Information Processing, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46839-5_8

  • DOI: https://doi.org/10.1007/978-3-662-46839-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46838-8

  • Online ISBN: 978-3-662-46839-5

  • eBook Packages: Computer Science, Computer Science (R0)
