Cluster Computing, Volume 16, Issue 4, pp 915–931

Towards a conceptualization of ETL and physical storage of semantic data warehouses as a service

  • Nabila Berkani
  • Ladjel Bellatreche
  • Selma Khouri

Abstract

Data warehouse technology has become an indispensable tool for businesses and organizations making strategic decisions to ensure their competitiveness. The construction of a data warehouse (\(\mathcal{D}\mathcal{W}\)) involves selecting relevant information sources, extracting relevant data, and loading them into the \(\mathcal{D}\mathcal{W}\). These processes require precise expertise from designers regarding the logical and physical implementations of information sources, which is usually not an easy task. The diversity and heterogeneity of information sources make the construction of the \(\mathcal{D}\mathcal{W}\) complex and time consuming. Domain ontologies have been proposed to reduce heterogeneity between sources, platforms, services, etc. They resolve syntactic and semantic conflicts. The adoption of domain ontologies by organizations has created a new type of database, called semantic databases (\(\mathcal{S}\mathcal{D}\mathcal{B}\)). As a consequence, these become candidates for building the semantic \(\mathcal{D}\mathcal{W}\) (\(\mathcal{S}\mathcal{D}\mathcal{W}\)). To handle the diversity of information sources and hide their implementation aspects, a generic framework for constructing a \(\mathcal{D}\mathcal{W}\) becomes a necessity. In this paper, we first propose an ontology-based approach for designing \(\mathcal{S}\mathcal{D}\mathcal{B}\). Secondly, ETL phases are defined at the ontological level to hide implementation details. Thirdly, a storage service for ontologies and their associated data is given. Finally, our proposal is validated through a case study and a tool.

Keywords

Semantic databases; Data warehouse; Ontologies; ETL processes; Service; BPMN

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Nabila Berkani (1)
  • Ladjel Bellatreche (2, corresponding author)
  • Selma Khouri (1, 2)

  1. National High School for Computer Science (ESI), Algiers, Algeria
  2. LIAS/ISAE-ENSMA, Futuroscope, France
