Analyzing Data Quality Using Data Auditor
Monitoring databases maintain configuration and measurement tables about computer systems, such as networks and computing clusters, and serve important business functions, such as troubleshooting customer problems, analyzing equipment failures, planning system upgrades, etc. These databases are prone to many data quality issues: configuration tables may be incorrect due to data entry errors, while measurement tables may be affected by incorrect, missing, duplicate and delayed polls.
We describe Data Auditor, a system for analyzing data quality and exploring data semantics of monitoring databases. Given a user-supplied constraint, such as a boolean predicate expected to be satisfied by every tuple, a functional dependency, or an inclusion dependency, Data Auditor computes "pattern tableaux", which are concise summaries of subsets of the data that satisfy or fail the constraint. We discuss the architecture of Data Auditor, including the supported types of constraints and the tableau generation mechanism. We also show the utility of our approach on an operational network monitoring database.