Data Quality Aware Queries in Collaborative Information Systems

  • N. K. Yeganeh
  • S. Sadiq
  • K. Deng
  • X. Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5446)


The issue of data quality is gaining importance as individuals and corporations increasingly rely on multiple, often external, sources of data to make decisions. Traditional query systems do not factor data quality considerations into their responses. Studies into the diverse interpretations of data quality indicate that fitness for use is a fundamental criterion in the evaluation of data quality. In this paper, we present a four-step methodology that incorporates user preferences for data quality into the response to queries over multiple sources. User preferences are modelled using the notion of preference hierarchies. We have developed an SQL extension to facilitate the specification of preference hierarchies. Further, we demonstrate through experimentation how our approach produces an improved result in query response.
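
The abstract does not spell out how a preference hierarchy is evaluated, so the following is only an illustrative sketch under stated assumptions, not the authors' method or their SQL extension. It assumes a hierarchy is a tree of relative weights over data quality dimensions, that each source reports a score in [0, 1] per leaf dimension, and that sources are ranked by a simple weighted sum. All names (`leaf_weights`, `rank_sources`, the dimensions, and the sources) are hypothetical.

```python
# Hypothetical sketch: rank data sources by a user's preference hierarchy.
# A hierarchy maps each dimension name to (relative_weight, children_or_None).

def leaf_weights(hierarchy, scale=1.0):
    """Push weights down the tree so that leaf weights sum to `scale`."""
    total = sum(weight for weight, _ in hierarchy.values())
    flat = {}
    for name, (weight, children) in hierarchy.items():
        share = scale * weight / total  # normalise among siblings
        if children:
            flat.update(leaf_weights(children, share))
        else:
            flat[name] = share
    return flat


def rank_sources(sources, hierarchy):
    """Return (source, score) pairs, best first, using a weighted sum."""
    weights = leaf_weights(hierarchy)
    scores = {
        name: sum(weights[dim] * metrics[dim] for dim in weights)
        for name, metrics in sources.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Assumed example: accuracy matters three times as much as usability,
# and usability splits evenly into completeness and currency.
PREFERENCES = {
    "accuracy": (3.0, None),
    "usability": (1.0, {
        "completeness": (1.0, None),
        "currency": (1.0, None),
    }),
}

# Assumed per-source quality scores in [0, 1]; higher is better.
SOURCES = {
    "source_a": {"accuracy": 0.9, "completeness": 0.4, "currency": 0.5},
    "source_b": {"accuracy": 0.6, "completeness": 0.9, "currency": 0.9},
}

RANKING = rank_sources(SOURCES, PREFERENCES)
for source, score in RANKING:
    print(f"{source}: {score:.4f}")
```

With these assumed weights, a user who values accuracy most sees `source_a` ranked ahead of `source_b`, even though `source_b` is more complete and more current. Changing the hierarchy changes the ranking without touching the sources' quality scores.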


Data Quality; Data Envelopment Analysis; Analytical Hierarchy Process; User Preference; User Comment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • N. K. Yeganeh (1)
  • S. Sadiq (1)
  • K. Deng (1)
  • X. Zhou (1)
  1. School of Information Technology and Electrical Engineering, The University of Queensland, Australia
