Weighted Optimization with Thresholding for Complete-Case Analysis

  • Conference paper
Statistical Learning of Complex Data (CLADAG 2017)

Abstract

Complete-case analysis, also known as the listwise deletion (LD) method, is a relatively popular technique for handling datasets with incomplete entries. It is known to be effective when data are missing completely at random. However, by reducing the size of the dataset, it can weaken the final statistical analysis. We present an optimization algorithm that increases the size of the dataset that remains after applying LD. It is based on a constrained weighted optimization technique that determines the maximum number of variables and respondents from the initial dataset that are preserved after applying LD. The main feature of the method is that it allows a specific set of variables (or respondents) to be selected that must be kept during the optimization, while balancing their relative importance by means of suitable weights. Moreover, we provide analytic formulas for the optimal solution, which can be easily evaluated numerically, reducing the computational complexity associated with the use of off-the-shelf packages for solving similar large constrained optimization problems. We illustrate the application of our weighted optimization method to some examples and real datasets.
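To make the trade-off described in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' algorithm and does not use their analytic formulas: it is a brute-force search, feasible only for a handful of variables, over subsets of variables that contain a required set. The function names, the `None` encoding of missing entries, and the linear score (`row_weight` times retained respondents plus `col_weight` times retained variables) are all assumptions made for illustration.

```python
from itertools import combinations


def complete_rows(data, cols):
    """Return indices of rows with no missing value (None) in the given columns."""
    return [i for i, row in enumerate(data)
            if all(row[c] is not None for c in cols)]


def best_listwise_subset(data, must_keep=(), row_weight=1.0, col_weight=1.0):
    """Brute-force search over column subsets that contain `must_keep`.

    Hypothetical score (an assumption, not the paper's objective):
        row_weight * (rows surviving LD) + col_weight * (columns kept).
    Returns (score, kept_row_indices, kept_column_indices).
    """
    n_cols = len(data[0])
    optional = [c for c in range(n_cols) if c not in must_keep]
    best = None
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            cols = sorted(set(must_keep) | set(extra))
            if not cols:
                continue  # at least one variable must be kept
            rows = complete_rows(data, cols)
            score = row_weight * len(rows) + col_weight * len(cols)
            # Strict comparison: on ties, the first (smaller) subset wins.
            if best is None or score > best[0]:
                best = (score, rows, cols)
    return best


# Toy dataset: 4 respondents, 3 variables, None marks a missing entry.
data = [
    [1, 2, None],
    [4, None, 6],
    [7, 8, 9],
    [1, 1, None],
]

# Plain LD over all variables keeps only the one fully observed respondent.
plain_ld_rows = complete_rows(data, [0, 1, 2])

# Require variable 0 and weight variables slightly more than respondents.
score, rows, cols = best_listwise_subset(
    data, must_keep=(0,), row_weight=1.0, col_weight=1.5
)
print(plain_ld_rows, rows, cols, score)
```

In this toy case, plain LD retains a single respondent, whereas dropping the third variable retains three respondents and two variables. Changing the two weights shifts the balance between keeping respondents and keeping variables, which is the kind of trade-off the abstract refers to.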

Acknowledgements

This work was supported by JSPS KAKENHI Grant Numbers 26380658, 17K04103, and 16H02045, as part of the SSP Project (http://ssp.hus.osaka-u.ac.jp). The authors thank the SSP Project for permission to use the SSP 2015 survey. Finally, the authors would like to thank the anonymous referees for their valuable and constructive comments.

Corresponding author

Correspondence to Miki Nakai.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Vernizzi, G., Nakai, M. (2019). Weighted Optimization with Thresholding for Complete-Case Analysis. In: Greselin, F., Deldossi, L., Bagnato, L., Vichi, M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-21140-0_15
