
Conditional Dependencies: A Principled Approach to Improving Data Quality

  • Conference paper
Dataspace: The Final Frontier (BNCOD 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5588)


Abstract

Real-life data is often dirty, and dirty data costs businesses worldwide billions of pounds each year. This paper presents a promising approach to improving data quality. It detects and fixes inconsistencies in real-life data based on conditional dependencies, an extension of traditional database dependencies that enforces bindings of semantically related data values. It identifies records from unreliable data sources by leveraging relative candidate keys, an extension of keys for relations that supports similarity and matching operators across relations. In contrast to traditional dependencies, which were developed to improve the quality of schemas, these revised constraints are designed to improve the quality of data. They yield practical techniques for data repairing and record matching in a uniform framework.
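The abstract describes both kinds of constraint only in prose. As a rough illustration, and not the authors' algorithms, the Python sketch below assumes that a conditional functional dependency (CFD) is an embedded FD X → Y paired with a single pattern tuple whose entries are constants or the wildcard `_`, and it models a relative candidate key as a list of attribute pairs across two relations, each with a comparison operator. The customer data, the `similar` operator with its 0.8 threshold, and every function name are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from operator import eq

WILDCARD = "_"  # assumed notation for an unnamed variable in a pattern tuple


def matches(value, pattern_entry):
    """A value matches a pattern entry if the entry is '_' or equals the value."""
    return pattern_entry == WILDCARD or value == pattern_entry


def cfd_violations(rows, lhs, rhs, pattern):
    """Return violating row indices for the CFD (lhs -> rhs, pattern).

    Singletons (i,) break a constant in the RHS pattern; pairs (i, j) agree
    on lhs but disagree on rhs. Only rows matching the LHS pattern are checked.
    """
    scope = [i for i, r in enumerate(rows)
             if all(matches(r[a], pattern[a]) for a in lhs)]
    violations = []
    for i in scope:
        if not all(matches(rows[i][a], pattern[a]) for a in rhs):
            violations.append((i,))
    for i, j in combinations(scope, 2):
        same_lhs = all(rows[i][a] == rows[j][a] for a in lhs)
        diff_rhs = any(rows[i][a] != rows[j][a] for a in rhs)
        if same_lhs and diff_rhs:
            violations.append((i, j))
    return violations


def similar(x, y, threshold=0.8):
    """Hypothetical similarity operator: difflib ratio at or above a threshold."""
    return SequenceMatcher(None, x, y).ratio() >= threshold


def rck_match(t1, t2, key):
    """Match records from two relations if every comparison in the key holds.

    key: list of (attribute in R1, attribute in R2, operator) triples.
    """
    return all(op(t1[a1], t2[a2]) for a1, a2, op in key)


# Hypothetical customer data: CC = country code, AC = area code.
rows = [
    {"CC": "44", "AC": "131", "city": "Edinburgh"},
    {"CC": "44", "AC": "131", "city": "London"},
    {"CC": "01", "AC": "908", "city": "Murray Hill"},
]
# CFD: for UK numbers with area code 131, the city must be Edinburgh.
phi = (["CC", "AC"], ["city"], {"CC": "44", "AC": "131", "city": "Edinburgh"})
print(cfd_violations(rows, *phi))  # [(1,), (0, 1)]

# Hypothetical key: equal last names and addresses, similar first names.
key = [("LN", "LN", eq), ("addr", "post", eq), ("FN", "FN", similar)]
card = {"LN": "Smith", "FN": "Jonathan", "addr": "10 Oak St"}
billing = {"LN": "Smith", "FN": "Jonathon", "post": "10 Oak St"}
print(rck_match(card, billing, key))  # True
```

With an all-wildcard pattern the CFD degenerates to an ordinary FD, which is why the pair check reuses plain FD semantics. The sketch only detects violations and matches; the repairing side of the paper's framework is not attempted here.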




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fan, W., Geerts, F., Jia, X. (2009). Conditional Dependencies: A Principled Approach to Improving Data Quality. In: Sexton, A.P. (eds) Dataspace: The Final Frontier. BNCOD 2009. Lecture Notes in Computer Science, vol 5588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02843-4_4


  • DOI: https://doi.org/10.1007/978-3-642-02843-4_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02842-7

  • Online ISBN: 978-3-642-02843-4

  • eBook Packages: Computer Science, Computer Science (R0)
