Ontology-Driven Conceptual Design of ETL Processes Using Graph Transformations

  • Dimitrios Skoutas
  • Alkis Simitsis
  • Timos Sellis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5530)

Abstract

One of the main tasks during the early steps of a data warehouse project is the identification of the appropriate transformations and the specification of inter-schema mappings from the source to the target data stores. This is a challenging task, requiring firstly the semantic and secondly the structural reconciliation of the information provided by the available sources. This task is a part of the Extract-Transform-Load (ETL) process, which is responsible for the population of the data warehouse. In this paper, we propose a customizable and extensible ontology-driven approach for the conceptual design of ETL processes. A graph-based representation is used as a conceptual model for the source and target data stores. We then present a method for devising flows of ETL operations by means of graph transformations. In particular, the operations comprising the ETL process are derived through graph transformation rules, the choice and applicability of which are determined by the semantics of the data with respect to an attached domain ontology. Finally, we present our experimental findings that demonstrate the applicability of our approach.
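As a minimal illustration of the idea of deriving ETL operations through rules whose applicability depends on ontology semantics, the following Python sketch may help. It is not the method of the paper: the toy ontology (`SUBCLASS_OF`), the two rules (`LOAD`, `FILTER`), the node identifiers, and the function names are all assumptions made for this example.

```python
# Hypothetical toy ontology: child concept -> parent concept.
SUBCLASS_OF = {"EuropeanCustomer": "Customer"}


def is_subclass(child, parent):
    """Return True if `child` equals `parent` or descends from it."""
    while child is not None:
        if child == parent:
            return True
        child = SUBCLASS_OF.get(child)
    return False


def derive_flow(source_concept, target_concept):
    """Grow a graph from a source node to a target node by applying rules.

    Nodes map an id to (kind, label); edges are (from_id, to_id) pairs.
    Both rules below are illustrative assumptions, not the paper's rule set.
    """
    nodes = {"s": ("source", source_concept), "t": ("target", target_concept)}
    edges = []
    current, step = "s", 0
    while True:
        concept = nodes[current][1]
        if is_subclass(concept, target_concept):
            # Rule LOAD: every instance of the current node fits the target.
            op = "LOAD"
        elif is_subclass(target_concept, concept):
            # Rule FILTER: only some instances of the current node fit.
            op = "FILTER"
        else:
            raise ValueError(f"no rule applies to {concept!r} -> {target_concept!r}")
        step += 1
        op_id = f"op{step}"
        nodes[op_id] = ("operation", op)
        edges.append((current, op_id))
        if op == "LOAD":
            edges.append((op_id, "t"))
            return nodes, edges
        # FILTER yields an intermediate node that now matches the target concept.
        current = f"i{step}"
        nodes[current] = ("intermediate", target_concept)
        edges.append((op_id, current))


nodes, edges = derive_flow("Customer", "EuropeanCustomer")
print([label for kind, label in nodes.values() if kind == "operation"])
```

In this sketch, the choice between the two rules is driven entirely by the subclass relation in the toy ontology, which mirrors, in a much simplified form, the role the abstract assigns to the attached domain ontology.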

Keywords

Source Node, Intermediate Node, Data Warehouse, Target Node, Operation Type
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Dimitrios Skoutas (1, 2)
  • Alkis Simitsis (3)
  • Timos Sellis (2)

  1. National Technical University of Athens, Athens, Greece
  2. Institute for the Management of Information Systems, R.C. “Athena”, Athens, Greece
  3. HP Labs and Stanford University, Palo Alto, USA
