Measuring and Comparing Effectiveness of Data Quality Techniques

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5565)


Poor-quality data may be detected and corrected by performing various quality assurance activities that rely on techniques with different efficacy and cost. In this paper, we propose a quantitative approach for measuring and comparing the effectiveness of these data quality (DQ) techniques. Our definitions of effectiveness are inspired by measures proposed in Information Retrieval. We show how the effectiveness of a DQ technique can be mathematically estimated in general cases, using formal techniques that are based on probabilistic assumptions. We then show how the resulting effectiveness formulas can be used to evaluate, compare, and make choices involving DQ techniques.
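The abstract does not spell out the effectiveness definitions, so the following is only an illustrative sketch. It assumes the standard Information Retrieval measures (precision, recall, and their harmonic mean, the F-measure) applied to an error-detection technique. The function name, the record ids, and the availability of ground truth are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Effectiveness:
    precision: float  # share of flagged records that are truly erroneous
    recall: float     # share of truly erroneous records that were flagged
    f_measure: float  # harmonic mean of precision and recall


def detection_effectiveness(flagged: set[int], erroneous: set[int]) -> Effectiveness:
    """Score an error-detection technique against known ground truth.

    flagged   -- ids of records the technique reported as defective
    erroneous -- ids of records that are actually defective (assumed known)
    """
    true_positives = len(flagged & erroneous)
    # Convention assumed here: an empty denominator yields a score of 0.0.
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(erroneous) if erroneous else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Effectiveness(precision, recall, f_measure)


# Hypothetical data: 4 records flagged, 5 truly erroneous, 3 in common.
result = detection_effectiveness({1, 2, 3, 9}, {1, 2, 3, 4, 5})
print(result)
```

Under these assumptions, two techniques run on the same data set can be compared by their scores; the paper itself estimates effectiveness analytically under probabilistic assumptions rather than by counting against ground truth as done here.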


Keywords: data quality technique, data quality measure, data quality assurance



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  1. Dept. of Computer Science, University of Toronto, Canada
  2. Dept. of Computer Science, Università di Milano Bicocca, Italy
  3. Dept. of Computer Science, Rutgers University, USA
  4. Dept. of Information Engineering and Computer Science, University of Trento, Italy
