Abstract
Complete-case analysis, also known as listwise deletion (LD), is a relatively popular technique for handling datasets with incomplete entries. It is known to be effective when data are missing completely at random. However, because it reduces the size of the dataset, it can weaken the final statistical analysis. We present an optimization algorithm that increases the size of the final dataset retained after applying LD. It is based on a constrained weighted optimization technique that determines the maximum number of variables and respondents from the initial dataset that are preserved after applying LD. Its main feature is that it allows a specific set of variables (or respondents) to be marked as ones that must be kept during the optimization, while balancing their relative importance by means of suitable weights. Moreover, we provide analytic formulas for the optimal solution that can be easily evaluated numerically, reducing the computational complexity associated with using off-the-shelf packages to solve similar large constrained optimization problems. We illustrate the application of our weighted optimization method with examples and real datasets.
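To make the setting concrete, here is a minimal brute-force sketch, assuming a boolean matrix `observed` (respondents × variables, `True` = observed). It is not the authors' method: the objective (number of respondents complete on the chosen variables, times the sum of per-variable weights) is a hypothetical stand-in, and the names `complete_rows` and `best_variable_subset` are illustrative. Choosing the variables fixes the respondents, since LD keeps exactly the rows fully observed on them; variables in `must_keep` are forced into every candidate. The exhaustive search costs O(2^p), so it only serves as a reference point for small problems.

```python
from itertools import combinations

import numpy as np


def complete_rows(observed, cols):
    """Mask of respondents fully observed on `cols` (the rows LD would keep)."""
    return observed[:, list(cols)].all(axis=1)


def best_variable_subset(observed, weights=None, must_keep=()):
    """Exhaustive search over variable subsets that contain `must_keep`.

    Hypothetical objective (an assumption, not the paper's formula):
        score(S) = (#rows complete on S) * sum of weights[j] for j in S,
    i.e. the weighted number of observed cells that survive LD.
    Ties are resolved in favour of the first maximizer enumerated.
    """
    _, p = observed.shape
    weights = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    forced = sorted(set(must_keep))
    free = [j for j in range(p) if j not in forced]

    best_cols, best_score = None, -np.inf
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            cols = sorted(forced + list(extra))
            if not cols:  # an empty variable set is not a usable dataset
                continue
            n_rows = int(complete_rows(observed, cols).sum())
            score = n_rows * weights[cols].sum()
            if score > best_score:  # strict '>' keeps the first maximizer
                best_cols, best_score = tuple(cols), score
    return best_cols, best_score


# Toy example: 5 respondents x 4 variables (True = observed, False = missing).
observed = np.array([
    [True, True,  True,  False],
    [True, True,  False, True],
    [True, True,  True,  True],
    [True, False, True,  True],
    [True, True,  True,  True],
])

# Variable 3 must be kept; variable 2 is weighted twice as heavily.
cols, score = best_variable_subset(observed, weights=[1, 1, 2, 1], must_keep=(3,))
print(cols, score)  # (0, 2, 3) 12.0
```

Keeping all four variables would leave only 2 complete respondents, whereas dropping variable 1 keeps 3, which is the kind of trade-off between retained variables and retained respondents that the weights are meant to arbitrate.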
Acknowledgements
This work was supported by JSPS KAKENHI Grant Numbers 26380658, 17K04103, and 16H02045, as part of the SSP Project (http://ssp.hus.osaka-u.ac.jp). The authors thank the SSP Project for permission to use the SSP 2015 survey. Finally, the authors thank the anonymous referees for their valuable and constructive comments.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vernizzi, G., Nakai, M. (2019). Weighted Optimization with Thresholding for Complete-Case Analysis. In: Greselin, F., Deldossi, L., Bagnato, L., Vichi, M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-21140-0_15
DOI: https://doi.org/10.1007/978-3-030-21140-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21139-4
Online ISBN: 978-3-030-21140-0
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)