Weighted Optimization with Thresholding for Complete-Case Analysis

Vernizzi, Graziano; Nakai, Miki

doi:10.1007/978-3-030-21140-0_15


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Included in the following conference series:

Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society


Abstract

Complete-case analysis, also known as the listwise deletion (LD) method, is a relatively popular technique for handling datasets with incomplete entries. It is known to be effective when data are missing completely at random. However, by reducing the size of the dataset, it can weaken the final statistical analysis. We present an optimization algorithm that increases the size of the final dataset after applying LD. It is based on a constrained weighted optimization technique that determines the maximum number of variables and respondents from the initial dataset that are preserved after applying LD. The main feature is that the method allows for selecting a specific set of variables (or respondents) that must be kept during the optimization, while balancing their relative importance by means of suitable weights. Moreover, we provide analytic formulas for the optimal solution, which can be easily evaluated numerically, reducing the computational complexity associated with the use of off-the-shelf packages for solving similar large constrained optimization problems. We illustrate the application of our weighted optimization method with some examples and real datasets.
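
To make the trade-off behind listwise deletion concrete, the following is a minimal Python sketch and not the authors' algorithm: the chapter's objective function and analytic formulas are not given in this abstract. It assumes a hypothetical objective (number of complete respondents times the summed weights of the kept variables) and searches all variable subsets exhaustively, which is only feasible for toy data. The names `complete_rows`, `best_subset`, `col_weights`, and `must_keep` are illustrative.

```python
from itertools import combinations


def complete_rows(data, cols):
    """Return indices of rows with no missing value (None) in the given columns."""
    return [i for i, row in enumerate(data)
            if all(row[c] is not None for c in cols)]


def best_subset(data, col_weights=None, must_keep=()):
    """Brute-force search over column subsets that contain `must_keep`.

    Assumed objective (hypothetical, for illustration only):
        score = (#rows surviving listwise deletion) * (sum of kept column weights)
    Returns (score, kept_columns, surviving_row_indices).
    """
    n_cols = len(data[0])
    if col_weights is None:
        col_weights = [1] * n_cols  # equal importance by default
    optional = [c for c in range(n_cols) if c not in must_keep]
    best = None
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            cols = tuple(sorted(set(must_keep) | set(extra)))
            if not cols:
                continue  # keeping no variables is not a useful dataset
            rows = complete_rows(data, cols)
            score = len(rows) * sum(col_weights[c] for c in cols)
            if best is None or score > best[0]:
                best = (score, cols, rows)
    return best


# Toy dataset: 5 respondents (rows), 4 variables (columns), None = missing.
data = [
    [1, 2, None, 4],
    [1, None, 3, 4],
    [1, 2, 3, 4],
    [1, 2, None, None],
    [1, 2, 3, None],
]

# Plain listwise deletion on all four variables keeps a single respondent.
print(complete_rows(data, range(4)))                                # [2]
# Unweighted search: variables 0 and 1 retain four respondents.
print(best_subset(data))                                            # (8, (0, 1), [0, 2, 3, 4])
# Force variable 2 to stay and give variable 3 double weight.
print(best_subset(data, col_weights=[1, 1, 1, 2], must_keep=(2,)))  # (8, (0, 2, 3), [1, 2])
```

The exhaustive search grows exponentially with the number of variables, which is the kind of cost that closed-form solutions such as those announced in the abstract are meant to avoid.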



Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers 26380658, 17K04103, and 16H02045, as part of the SSP Project (http://ssp.hus.osaka-u.ac.jp). The authors thank the SSP Project for permission to use the SSP 2015 survey. Finally, the authors would like to thank the anonymous referees for their valuable and constructive comments.

Author information

Authors and Affiliations

Department of Physics and Astronomy, Siena College, Loudonville, NY, USA
Graziano Vernizzi
Department of Social Sciences, College of Social Sciences, Ritsumeikan University, Kyoto, Japan
Miki Nakai


Corresponding author

Correspondence to Miki Nakai.

Editor information

Editors and Affiliations

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
Francesca Greselin
Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
Laura Deldossi
Department of Economic and Social Sciences, Università Cattolica del Sacro Cuore, Piacenza, Italy
Luca Bagnato
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Maurizio Vichi


Cite this paper

Vernizzi, G., Nakai, M. (2019). Weighted Optimization with Thresholding for Complete-Case Analysis. In: Greselin, F., Deldossi, L., Bagnato, L., Vichi, M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-21140-0_15


DOI: https://doi.org/10.1007/978-3-030-21140-0_15
Published: 07 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21139-4
Online ISBN: 978-3-030-21140-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
