Data Quality: Theory and Practice

  • Wenfei Fan
Conference paper

DOI: 10.1007/978-3-642-32281-5_1

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7418)
Cite this paper as:
Fan W. (2012) Data Quality: Theory and Practice. In: Gao H., Lim L., Wang W., Li C., Chen L. (eds) Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7418. Springer, Berlin, Heidelberg

Abstract

Real-life data are often dirty: inconsistent, inaccurate, incomplete, stale and duplicated. Dirty data have been a longstanding issue, and the prevalent use of the Internet has been increasing the risks, on an unprecedented scale, of creating and propagating dirty data. Dirty data are reported to cost US industry billions of dollars each year. There is no reason to believe that the scale of the problem is any different in any other society that depends on information technology. With this comes the need for improving data quality, a topic as important as traditional data management tasks for coping with the quantity of the data.

We aim to provide an overview of recent advances in the area of data quality, from theory to practical techniques. We promote a conditional dependency theory for capturing data inconsistencies, a new form of dynamic constraints for data deduplication, a theory of relative information completeness for characterizing incomplete data, and a data currency model for answering queries with current values from possibly stale data in the absence of reliable timestamps. We also discuss techniques for automatically discovering data quality rules, detecting errors in real-life data, and correcting errors with performance guarantees.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Wenfei Fan (1)
  1. University of Edinburgh and Harbin Institute of Technology, China
