A UML Based Approach for Modeling ETL Processes in Data Warehouses

  • Juan Trujillo
  • Sergio Luján-Mora
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2813)

Abstract

Data warehouses (DWs) are complex computer systems whose main goal is to facilitate the decision-making process of knowledge workers. ETL (Extraction-Transformation-Loading) processes are responsible for extracting data from heterogeneous operational data sources, transforming it (conversion, cleaning, normalization, etc.) and loading it into DWs. ETL processes are a key component of DWs because incorrect or misleading data will produce wrong business decisions; a correct design of these processes at the early stages of a DW project is therefore necessary to improve data quality. However, not much research has dealt with the modeling of ETL processes. In this paper, we present our approach, based on the Unified Modeling Language (UML), which allows us to accomplish the conceptual modeling of these ETL processes. We provide the necessary mechanisms for an easy and quick specification of the common operations defined in these ETL processes, such as the integration of different data sources, the transformation between source and target attributes, and the generation of surrogate keys. Another advantage of our proposal is the use of the UML (standardization, ease of use and functionality) and the seamless integration of the design of the ETL processes with the DW conceptual schema.
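
To make the three operations named in the abstract concrete, the following is a minimal Python sketch of integrating two sources, transforming a source attribute into a target attribute, and generating surrogate keys. It is not the UML notation the paper proposes, and it is not taken from the paper: the record layouts, field names (`cust_id`, `full_name`, `customer_sk`) and function names are all hypothetical and chosen only for illustration.

```python
from itertools import count

# Hypothetical records from two heterogeneous operational sources.
crm_customers = [{"cust_id": "C1", "name": " ada lovelace "}]
web_customers = [{"id": 7, "full_name": "Alan Turing"}]


def integrate(crm, web):
    """Integration: map both source layouts onto one common record layout."""
    merged = [
        {"source": "crm", "natural_key": r["cust_id"], "name": r["name"]}
        for r in crm
    ]
    merged += [
        {"source": "web", "natural_key": str(r["id"]), "name": r["full_name"]}
        for r in web
    ]
    return merged


def transform(record):
    """Transformation: clean and normalize the source name for the target."""
    return {**record, "name": record["name"].strip().title()}


def add_surrogate_keys(records, start=1):
    """Surrogate keys: assign a warehouse-owned integer key to each row."""
    keys = count(start)
    return [{"customer_sk": next(keys), **r} for r in records]


def run_etl(crm, web):
    """Chain the three steps into one small pipeline (assumed ordering)."""
    return add_surrogate_keys([transform(r) for r in integrate(crm, web)])


# Rows as they might be loaded into a hypothetical customer dimension.
dim_customer = run_etl(crm_customers, web_customers)
```

The surrogate key is generated independently of the source identifiers, so the original keys survive only as `natural_key`; this is one common way to decouple warehouse rows from operational systems, assumed here rather than prescribed by the paper.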

Keywords

ETL processes, Data warehouses, conceptual modeling, UML

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Juan Trujillo (1)
  • Sergio Luján-Mora (1)
  1. Dept. de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain
