Software Quality Journal, Volume 16, Issue 1, pp 79–106

Empirical studies to assess the understandability of data warehouse schemas using structural metrics

  • Manuel Angel Serrano
  • Coral Calero
  • Houari A. Sahraoui
  • Mario Piattini


Data warehouses are powerful tools for making better and faster decisions in organizations where information is an asset of primary importance. Due to the complexity of data warehouses, metrics and procedures are required to continuously assure their quality. This article describes an empirical study and a replication aimed at investigating the use of structural metrics as indicators of the understandability and, by extension, the cognitive complexity of data warehouse schemas. More specifically, a four-step analysis is conducted: (1) check whether, individually and collectively, the considered metrics correlate with schema understandability, using classical statistical techniques; (2) evaluate whether understandability can be predicted by case similarity, using the case-based reasoning technique; (3) determine, for each level of understandability, the subsets of metrics that are important, by means of a classification technique; and (4) assess, by means of a probabilistic technique, the degree of participation of each metric in the understandability prediction. The results show that although a linear model is a good approximation of the relation between structure and understandability, the associated coefficients are not significant enough. Additionally, classification analyses reveal, respectively, that prediction can be achieved by considering structure similarity, that extracted classification rules can be used to estimate the magnitude of understandability, and that some metrics, such as the number of fact tables, have more impact than others.
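The first two steps of the analysis described above can be illustrated with a minimal sketch, under stated assumptions. The toy cases, the choice of three metrics per schema, the use of understanding time as the measure, and the function names `pearson` and `predict_by_similarity` are all hypothetical; they are not taken from the study, which may use different metrics, distance functions, and tooling.

```python
from math import sqrt

# Hypothetical toy data (NOT from the study). Each case is a schema described
# by three assumed structural metrics, with the number of fact tables first,
# and an invented understanding time in seconds.
CASES = [
    {"metrics": (1, 3, 4), "time": 100.0},
    {"metrics": (1, 5, 6), "time": 120.0},
    {"metrics": (2, 4, 8), "time": 180.0},
    {"metrics": (2, 6, 10), "time": 200.0},
    {"metrics": (3, 5, 12), "time": 260.0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def euclidean(a, b):
    """Euclidean distance between two metric vectors of equal length."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def predict_by_similarity(metrics, cases, k=1):
    """Predict understanding time as the mean time of the k most similar cases."""
    ranked = sorted(cases, key=lambda c: euclidean(c["metrics"], metrics))
    nearest = ranked[:k]
    return sum(c["time"] for c in nearest) / len(nearest)


if __name__ == "__main__":
    fact_tables = [c["metrics"][0] for c in CASES]
    times = [c["time"] for c in CASES]
    # Step 1 (sketch): correlate one metric with the understandability measure.
    print("correlation:", round(pearson(fact_tables, times), 3))
    # Step 2 (sketch): predict for a new schema from its nearest stored case.
    print("prediction:", predict_by_similarity((2, 4, 9), CASES, k=1))
```

The sketch compares raw metric values, so a metric with a larger numeric range dominates the distance; a real case-based reasoning setup would normally normalize or weight the metrics first.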


Data warehouse, Quality, Metrics, Empirical studies



This research is part of the CALIPO project, supported by the Dirección General de Investigación of the Ministerio de Ciencia y Tecnología (TIC2003-07804-C05-03). This research is also part of the ENIGMAS project, supported by the Junta de Comunidades de Castilla-La Mancha, Consejería de Ciencia y Tecnología (PBI-05-058). This work was performed during the stay of Houari Sahraoui at the University of Castilla-La Mancha under the “Programa Nacional De Ayudas Para La Movilidad de Profesores en Régimen de año sabático” of the Spanish Ministerio de Educación y Ciencia, REF: 2004-0161. We would like to thank all of the volunteer subjects who participated in these experiments, whose inestimable assistance helped us reach the conclusions in this paper. We also want to thank the reviewers for their valuable comments.



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Manuel Angel Serrano (1)
  • Coral Calero (1)
  • Houari A. Sahraoui (1, 2)
  • Mario Piattini (1)

  1. Alarcos Research Group, Department of Information Technologies and Systems, Universidad de Castilla-La Mancha, Ciudad Real, Spain
  2. Dep. d’Informatique et de Recherche Opérationnelle, Université de Montréal, Montreal, Canada
