A new approach for data editing and imputation

Delgado-Quintero, Sergio; Salazar-González, Juan-José

doi:10.1007/s00186-008-0237-6

Original Article
Published: 21 August 2008

Volume 68, pages 407–428 (2008)

Mathematical Methods of Operations Research

Sergio Delgado-Quintero¹ &
Juan-José Salazar-González¹

Abstract

The editing-and-imputation problem concerns finding errors in a record that does not satisfy a set of consistency rules. Once potential errors have been localized, new values must be imputed to the associated fields. The output dataset should consist of valid records and preserve statistical properties similar to those of the input dataset. Statistical agencies usually do most of this work manually, which consumes a great deal of human resources. This paper presents a mathematical programming model to solve the problem optimally on surveys with categorical values and particular edits. We also describe a heuristic approach to deal with more complex surveys. The heuristic procedure combines the widely accepted hot-deck donor scheme with multivariate regression analysis. It has been implemented in a graphical user interface running on standard personal computers, and has been tested on real-world surveys. This paper demonstrates the satisfactory performance of our automatic procedure.
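
To make the two steps named in the abstract concrete, the following is a minimal sketch of error localization followed by hot-deck donor imputation on a toy categorical record. It is an illustration only and is not the paper's mathematical programming model or its heuristic: the fields, domains, edits, and the function names `is_valid`, `localize_errors`, and `hot_deck_impute` are all hypothetical, and the localization uses brute-force enumeration in the minimal-change spirit of Fellegi and Holt rather than an optimization solver.

```python
from itertools import combinations, product

# Hypothetical categorical domains for a toy survey (assumed, not from the paper).
DOMAINS = {
    "age": ["child", "adult"],
    "marital": ["single", "married"],
    "employed": ["no", "yes"],
}

# Hypothetical edits: each returns True when the record VIOLATES a consistency rule.
EDITS = [
    lambda r: r["age"] == "child" and r["marital"] == "married",
    lambda r: r["age"] == "child" and r["employed"] == "yes",
]


def is_valid(record):
    """A record is valid when no edit flags it."""
    return not any(edit(record) for edit in EDITS)


def localize_errors(record):
    """Return a smallest set of fields that can be changed to make the record valid."""
    fields = list(DOMAINS)
    # Try subsets in order of increasing size, so the first hit is minimal.
    for size in range(len(fields) + 1):
        for subset in combinations(fields, size):
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if is_valid(candidate):
                    return set(subset)
    return None  # the edits admit no valid record at all


def hot_deck_impute(record, flagged, donors):
    """Copy the flagged fields from the valid donor that agrees most on the kept fields."""
    kept = [f for f in DOMAINS if f not in flagged]
    best, best_score = None, -1
    for donor in donors:
        candidate = {**record, **{f: donor[f] for f in flagged}}
        # Skip donors that are invalid themselves or that yield an invalid record.
        if not is_valid(donor) or not is_valid(candidate):
            continue
        score = sum(donor[f] == record[f] for f in kept)
        if score > best_score:
            best, best_score = candidate, score
    return best


# Usage: a record that violates both edits, and a small pool of donor records.
record = {"age": "child", "marital": "married", "employed": "yes"}
donors = [
    {"age": "adult", "marital": "single", "employed": "yes"},
    {"age": "adult", "marital": "married", "employed": "yes"},
    {"age": "child", "marital": "single", "employed": "no"},
]
flagged = localize_errors(record)
print(flagged, hot_deck_impute(record, flagged, donors))
```

The enumeration is exponential in the number of fields, so it only suits tiny examples; it is meant to show what "localizing" and "imputing" refer to, not how a production system would do either.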

Author information

Authors and Affiliations

DEIOC, Universidad de La Laguna, 38271, Tenerife, Spain
Sergio Delgado-Quintero & Juan-José Salazar-González

Corresponding author

Correspondence to Sergio Delgado-Quintero.

About this article

Cite this article

Delgado-Quintero, S., Salazar-González, JJ. A new approach for data editing and imputation. Math Meth Oper Res 68, 407–428 (2008). https://doi.org/10.1007/s00186-008-0237-6

Received: 15 November 2006
Accepted: 12 February 2008
Published: 21 August 2008
Issue Date: December 2008
DOI: https://doi.org/10.1007/s00186-008-0237-6
