Improving the Web Usage Analysis Process: A UML Model of the ETL Process

  • Thilo Maier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3932)


Integrating OLAP and Web usage analysis in a data warehousing environment is a promising approach for sophisticated analysis of the Web channel in the multi-channel environments of organizations. Populating the data warehouse is a laborious and time-consuming task (especially for small projects), which in practice is a major obstacle for concrete ECRM projects. An intuitive, easy-to-deploy ETL component is essential, especially when Web usage analysis researchers need to conduct experiments with a Web warehouse. In this paper we propose a logical object-oriented relational data storage model in UML, which is based on a formal model. A concrete Java instance of our model simplifies modeling and automating the ETL process and has been integrated into our WUSAN (Web USage ANalysis) system. Finally, we illustrate the use of our model for Web usage analysis, although the model is in principle not restricted to this domain.


Keywords: Data Warehouse; Customer Relationship Management; Calculated Attribute; Semantic Basis; Star Schema
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Thilo Maier, Catholic University Eichstätt-Ingolstadt, Ingolstadt, Germany
