Abstract
Data collected in large-scale surveys, such as a census, may contain errors or missing values, so an automatic error correction procedure is needed. We focus on the problem of restoring the consistency of agricultural data concerning cultivation areas and number of livestock, and we propose an optimization-based approach to this balancing problem. Alternative models (linear, quadratic, and mixed integer) are presented. The mixed integer linear model was preferred and used to treat possibly unbalanced data records. Results on real-world Agricultural Census data show the effectiveness of the proposed approach.
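To make the balancing problem concrete, the following is a minimal sketch under strong assumptions, not the mixed integer linear model of the paper. It assumes a single record with one balance rule (the partial areas must sum to a declared total that is taken as reliable), non-negative integer values, and per-field change costs. The function name `balance_record` and the `weights` argument are hypothetical. With only one equality constraint, the minimum weighted sum of absolute changes can be found greedily by adjusting the cheapest fields first; records with several interacting rules would need a proper optimization solver.

```python
def balance_record(parts, total, weights):
    """Adjust `parts` so they sum to `total` with minimum weighted change.

    Illustrative assumptions: values are non-negative integers, `total`
    is trusted, and weights[i] is the cost of changing parts[i] by one unit.
    """
    if total < 0 or any(p < 0 for p in parts):
        raise ValueError("values must be non-negative")

    adjusted = list(parts)
    gap = total - sum(parts)

    # Indices of the fields, cheapest to modify first.
    order = sorted(range(len(parts)), key=lambda i: weights[i])

    if gap > 0:
        # Parts are too small: put the whole shortfall on the cheapest field.
        adjusted[order[0]] += gap
    elif gap < 0:
        # Parts are too large: cut the cheapest fields first, never below zero.
        remaining = -gap
        for i in order:
            cut = min(adjusted[i], remaining)
            adjusted[i] -= cut
            remaining -= cut
            if remaining == 0:
                break
    return adjusted


# Hypothetical record: three crop areas that should add up to 100 hectares.
balanced = balance_record([30, 50, 25], 100, [1.0, 2.0, 0.5])
print(balanced)
```

In this hypothetical record the parts exceed the total by 5, so the excess is removed from the field with the lowest change cost. Lower weights can be read as lower confidence in the reported value.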
Additional information
This work was developed during the biennial research collaboration between the Italian Statistical Office (Istat) and the University of Roma “Sapienza” on the data processing of the 2010 Census of Italian Agriculture.
About this article
Cite this article
Bianchi, G., Bruni, R. & Reale, A. Balancing of agricultural census data by using discrete optimization. Optim Lett 8, 1553–1565 (2014). https://doi.org/10.1007/s11590-013-0652-3
DOI: https://doi.org/10.1007/s11590-013-0652-3