Data Quality Control Based on Metric Data Models



We consider statistical edits defined on a metric data space spanned by the non-key attributes (variables) of a given database. Integrity constraints on this data space are derived from definitions, behavioral equations, or a balance equation system; a set of business or economic indicators is a typical example. The variables are linked only by the four basic arithmetic operations. Assuming a multivariate Gaussian distribution and an errors-in-variables model, the unknown (latent) variables can be estimated by a generalized least-squares (GLS) procedure. This approach has two drawbacks: the equations form a non-linear system because variables are multiplied and divided, and independence between all variables is generally assumed because real applications rarely provide information about their correlation. Because no finite-parameter family of densities is closed under all four arithmetic operations, we use MCMC simulation techniques, cf. Smith and Gelfand (1992) and Chib (2004), to derive the “exact” distributions in the non-normal case and under cross-correlation. This research can be viewed as an extension of Köppen and Lenz (2005) in that it studies the robustness of the GLS approach with respect to non-normality and correlation.
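To make the GLS step concrete, the following is a minimal sketch of the linear special case only: a single balance equation of the form a'x = 0 with Gaussian measurement errors of known covariance. It is not the authors' implementation. The function name `gls_reconcile`, the example indicators (revenue, cost, profit), and all numbers are assumptions chosen for illustration. Constraints involving multiplication or division are not covered by this closed form, which is the situation in which the abstract turns to simulation.

```python
def gls_reconcile(z, sigma, a):
    """GLS adjustment of observations z under one linear constraint a'x = 0.

    Minimises (z - x)' sigma^{-1} (z - x) subject to a'x = 0, which gives
    x_hat = z - sigma a (a' sigma a)^{-1} a'z.
    All names here are hypothetical and for illustration only.
    """
    # Violation of the balance equation by the raw observations: a'z
    residual = sum(ai * zi for ai, zi in zip(a, z))
    # Vector sigma @ a
    sigma_a = [sum(sij * aj for sij, aj in zip(row, a)) for row in sigma]
    # Scalar a' sigma a
    a_sigma_a = sum(ai * si for ai, si in zip(a, sigma_a))
    # Spread the violation over the variables in proportion to sigma @ a
    return [zi - si * residual / a_sigma_a for zi, si in zip(z, sigma_a)]


# Assumed example: profit = revenue - cost, i.e. revenue - cost - profit = 0
a = [1.0, -1.0, -1.0]
z = [100.0, 60.0, 45.0]           # observed revenue, cost, profit (inconsistent)
sigma = [[4.0, 0.0, 0.0],         # independence assumed: diagonal covariance
         [0.0, 4.0, 0.0],
         [0.0, 0.0, 1.0]]

x_hat = gls_reconcile(z, sigma, a)
balance = sum(ai * xi for ai, xi in zip(a, x_hat))
print(x_hat, balance)
```

The adjusted values satisfy the balance equation up to rounding, and variables with larger assumed error variance absorb more of the correction. Replacing the diagonal `sigma` with a full covariance matrix shows how cross-correlation changes the adjustment.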



Copyright information

© Physica-Verlag Heidelberg 2010

Authors and Affiliations

  1. Institute of Production, Information Systems and Operations Research, Freie Universität Berlin, Berlin, Germany
  2. Institute of Statistics and Econometrics, Freie Universität Berlin, Berlin, Germany
