ETLDiff: A Semi-automatic Framework for Regression Test of ETL Software

  • Christian Thomsen
  • Torben Bach Pedersen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4081)


Modern software development methods such as Extreme Programming (XP) favor the use of frequently repeated tests, so-called regression tests, to catch new errors when software is updated or tuned, by checking that the software still produces the right results for a reference input. Regression testing is also very valuable for Extract–Transform–Load (ETL) software, as ETL software tends to be very complex and error-prone. However, regression testing of ETL software is currently cumbersome and requires considerable manual effort. In this paper, we describe a novel, easy-to-use, and efficient semi-automatic test framework for regression testing of ETL software. By automatically analyzing the schema, the tool detects how tables are related, and uses this knowledge, along with optional user specifications, to determine exactly what data warehouse (DW) data should be identical across test ETL runs, leaving out change-prone values such as surrogate keys. The framework also provides tools for quickly detecting and displaying differences between the current ETL results and the reference results. In summary, manual work for test setup is reduced to a minimum, while still ensuring an efficient testing procedure.
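
The idea in the abstract can be illustrated with a minimal sketch. This is not the ETLDiff implementation: it assumes a toy star schema in SQLite, and the table names (sales_fact, product_dim), the helper names (build_dw, comparable_rows, diff_runs), and the way surrogate keys are listed by the user are all hypothetical. It reads the declared foreign keys to find related tables, joins them, drops the surrogate key columns, and compares the remaining rows of a reference run and a current run as multisets.

```python
import sqlite3
from collections import Counter


def build_dw(rows, first_key):
    """Load a toy DW (hypothetical schema); surrogate keys start at first_key."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sales_fact (
            product_id INTEGER REFERENCES product_dim(product_id),
            amount INTEGER
        );
    """)
    keys = {}
    for name, amount in rows:
        if name not in keys:
            keys[name] = first_key + len(keys)
            con.execute("INSERT INTO product_dim VALUES (?, ?)", (keys[name], name))
        con.execute("INSERT INTO sales_fact VALUES (?, ?)", (keys[name], amount))
    return con


def comparable_rows(con, fact, surrogate_keys):
    """Join the fact table to the tables it references, minus key columns."""
    tables, joins = [fact], []
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    for fk in con.execute(f"PRAGMA foreign_key_list({fact})").fetchall():
        ref_table, from_col, to_col = fk[2], fk[3], fk[4]
        tables.append(ref_table)
        joins.append(f"JOIN {ref_table} ON {fact}.{from_col} = {ref_table}.{to_col}")
    cols = []
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for info in con.execute(f"PRAGMA table_info({table})").fetchall():
            if info[1] not in surrogate_keys:
                cols.append(f"{table}.{info[1]}")
    sql = f"SELECT {', '.join(sorted(cols))} FROM {fact} {' '.join(joins)}"
    # A Counter keeps duplicate rows, so the comparison is a multiset comparison.
    return Counter(con.execute(sql).fetchall())


def diff_runs(reference, current):
    """Rows only in the reference run and rows only in the current run."""
    return {
        "missing": sorted((reference - current).elements()),
        "unexpected": sorted((current - reference).elements()),
    }


if __name__ == "__main__":
    ref = build_dw([("apple", 10), ("pear", 5)], first_key=1)
    cur = build_dw([("apple", 10), ("pear", 7)], first_key=100)
    keys = {"product_id"}  # user-supplied list of surrogate key columns
    print(diff_runs(comparable_rows(ref, "sales_fact", keys),
                    comparable_rows(cur, "sales_fact", keys)))
```

Because the surrogate key column is excluded before comparison, two runs that assign different key values but load the same data compare as equal, while a changed measure shows up as one missing and one unexpected row.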


Keywords: Data Warehouse, Database Model, Fact Table, Reference Result, Extreme Program
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Christian Thomsen (1)
  • Torben Bach Pedersen (1)
  1. Department of Computer Science, Aalborg University
