The contribution of linked open data to augment a traditional data warehouse

Abstract

The arrival of Big Data has contributed positively to the evolution of data warehouse (DW) technology. It has given birth to augmented DWs that aim to maximize the effectiveness of existing ones. Various augmentation scenarios have been proposed and adopted by firms and industry, covering aspects such as new data sources (e.g., Linked Open Data (LOD), social, stream and IoT data), data ingestion, advanced deployment infrastructures, programming paradigms, and data visualization. These scenarios help companies reach valuable decisions. By examining traditional DWs, we realized that they do not fulfill all decision-maker requirements, since the data sources feeding a target DW are not rich enough to capture Big Data. The arrival of the LOD era is an excellent opportunity to enrich traditional DWs with a new V dimension: Value. In this paper, we first conceptualize the variety of internal and external sources and study its effect on the ETL phase to ease value capture. Second, we discuss a Value-driven approach to DW design. Third, we give three realistic scenarios for integrating LOD into the DW landscape. Finally, we conduct experiments showing the value added by augmenting an existing DW environment with LOD.
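The core idea of the abstract, enriching an internal DW with attributes drawn from external LOD during ETL, can be illustrated with a minimal sketch. It assumes the LOD has already been fetched as (subject, predicate, object) triples (for instance from a public SPARQL endpoint such as DBpedia's) and links DW rows to LOD resources by a case-insensitive label match. The function names (`index_lod`, `build_label_index`, `augment_dimension`), the `ex:` prefix and the toy data are all hypothetical; this is not the authors' Value-driven method, and a real pipeline would more likely rely on `owl:sameAs` links or entity resolution than on exact labels.

```python
from collections import defaultdict

# Hypothetical sketch: enrich an internal DW dimension with LOD attributes.
# Assumes LOD was already fetched as (subject, predicate, object) triples;
# all names, prefixes (ex:) and data here are illustrative only.


def index_lod(triples):
    """Group triples by subject: {subject: {predicate: object}}.

    Assumes single-valued predicates: a later triple with the same
    subject and predicate overwrites an earlier one.
    """
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o
    return dict(by_subject)


def build_label_index(by_subject, label_pred="rdfs:label"):
    """Map a normalized (stripped, lower-cased) label to its LOD subject."""
    return {
        props[label_pred].strip().lower(): subject
        for subject, props in by_subject.items()
        if label_pred in props
    }


def augment_dimension(rows, triples, wanted, label_pred="rdfs:label"):
    """Return copies of `rows` extended with the LOD predicates in `wanted`.

    rows:   internal dimension rows (dicts with a 'name' key).
    wanted: {LOD predicate: new column name}.
    Behaves like a left outer join: rows with no LOD match are kept,
    with None in the new columns.
    """
    by_subject = index_lod(triples)
    label_index = build_label_index(by_subject, label_pred)
    enriched = []
    for row in rows:
        subject = label_index.get(row["name"].strip().lower())
        props = by_subject.get(subject, {})
        new_row = dict(row)  # copy, so the source rows are left untouched
        new_row["lod_iri"] = subject
        for pred, column in wanted.items():
            new_row[column] = props.get(pred)
        enriched.append(new_row)
    return enriched


# Toy example (hypothetical data).
dim_city = [
    {"city_key": 1, "name": "City A"},
    {"city_key": 2, "name": "City B"},
]
lod = [
    ("ex:CityA", "rdfs:label", "City A"),
    ("ex:CityA", "ex:region", "North"),
    ("ex:CityC", "rdfs:label", "City C"),
]
for r in augment_dimension(dim_city, lod, {"ex:region": "region"}):
    print(r)
```

Here City A gains `region = "North"`, while City B, which has no LOD counterpart, is kept with `None` in the new columns. The left-outer-join behaviour is a deliberate choice in this sketch: augmentation adds attributes but never drops rows that already exist in the DW.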

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Notes

  1. http://www.wfmc.org/
  2. e.g., DBpedia SPARQL endpoint: https://dbpedia.org/sparql
  3. http://linkedgeodata.org/
  4. http://www.scholarlydata.org/dumps/
  5. https://permid.org/download
  6. http://swat.cse.lehigh.edu/projects/lubm/
  7. https://www.springernature.com/gp/researchers/scigraph
  8. http://swat.cse.lehigh.edu/projects/lubm/queries-sparql.txt
  9. http://www.colinda.org/
  10. http://www.wikicfp.com/cfp/

Author information

Corresponding author

Correspondence to Nabila Berkani.

About this article

Cite this article

Berkani, N., Bellatreche, L., Khouri, S. et al. The contribution of linked open data to augment a traditional data warehouse. J Intell Inf Syst (2020). https://doi.org/10.1007/s10844-020-00594-w

Keywords

  • Linked open data
  • Traditional DW augmentation
  • Value