A proposal for handling missing data

Psychometrika

Abstract

A method for dealing with the problem of missing observations in multivariate data is developed and evaluated. The method uses a transformation of the principal components of the data to estimate missing entries. The properties of this method and four alternative methods are investigated by means of a Monte Carlo study of 42 computer-generated data matrices. The methods are compared with respect to their ability to predict correlation matrices as well as missing entries.
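The abstract does not spell out the transformation of the principal components that the method applies, so the following is only a generic illustration of the underlying idea: fill missing entries from a low-rank principal-component reconstruction and iterate. The function name `impute_low_rank`, the `rank`, `n_iter`, and `tol` parameters, and the column-mean starting values are all assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def impute_low_rank(X, rank=1, n_iter=50, tol=1e-8):
    """Illustrative iterative principal-component imputation.

    Hypothetical sketch, NOT the procedure from the paper:
    missing entries (NaN) start at their column means and are then
    repeatedly replaced by a rank-`rank` reconstruction of the
    centered data matrix.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Starting values: column means over the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        # Principal components via SVD of the centered matrix.
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        # Observed entries are never altered; only missing ones are updated.
        updated = np.where(missing, approx, X)
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled
```

Calling `impute_low_rank(X, rank=1)` on an array containing `np.nan` returns a completed array of the same shape. The choice of `rank` is left to the user here; the abstract gives no guidance on it.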

The results indicate that whenever there exist modest intercorrelations among the variables (i.e., an average off-diagonal correlation above .2), the proposed method is at least as good as the best alternative (a regression method) while being considerably faster and simpler computationally. Models for determining the best alternative based upon easily calculated characteristics of the matrix are given. The generality of these models is demonstrated using the previously published results of Timm.
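The threshold mentioned above (average off-diagonal correlation above .2) is one of the easily calculated matrix characteristics. A minimal sketch of that check follows, assuming the plain (signed) mean of the off-diagonal correlations and using only the complete rows; the abstract does not say whether signed or absolute correlations are averaged or how incomplete rows are treated, and the name `mean_offdiag_corr` is hypothetical.

```python
import numpy as np

def mean_offdiag_corr(X):
    """Mean of the off-diagonal entries of the correlation matrix.

    Assumptions for this sketch: rows containing any NaN are dropped
    (listwise deletion) and signed correlations are averaged.
    """
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]
    R = np.corrcoef(complete, rowvar=False)
    p = R.shape[0]
    # Boolean mask selecting every entry except the diagonal.
    off_diagonal = R[~np.eye(p, dtype=bool)]
    return off_diagonal.mean()
```

Under these assumptions, a data matrix would meet the condition described in the abstract when `mean_offdiag_corr(X) > 0.2`.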

References

  • Anderson, T. W. Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. Journal of the American Statistical Association, 1957, 52, 200–203.

  • Buck, S. F. A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society, Series B, 1960, 22, 302–307.

  • Christofferson, A. A method for component analysis when the data are incomplete. Seminar communication, University Institute of Statistics, Uppsala, 1965.

  • Dear, R. E. A principal-component missing data method for multiple regression models. System Development Corporation, Technical Report SP-86, 1959.

  • Eckart, C. and Young, G. The approximation of one matrix by another of lower rank. Psychometrika, 1936, 1, 211–218.

  • Edgett, G. L. Multiple regression with missing observations among the independent variables. Journal of the American Statistical Association, 1956, 51, 122–132.

  • Glasser, M. Linear regression analysis with missing observations among the independent variables. Journal of the American Statistical Association, 1964, 59, 834–844.

  • Gleason, T. C. and Staelin, R. Improving the metric quality of questionnaire data. Psychometrika, 1973, 393–410.

  • Haitovsky, Y. Missing data in regression analysis. Journal of the Royal Statistical Society, Series B, 1968, 30, 67–82.

  • Horn, J. L. A rationale and test for the number of factors in factor analysis. Psychometrika, 1965, 30, 179–185.

  • Johnson, R. M. On a theorem stated by Eckart and Young. Psychometrika, 1963, 28, 259–264.

  • Srivastava, J. N. and McDonald, L. On a large class of incomplete multivariate models which can be transformed to make manova applicable. Metron, 1970, 28, 241–252.

  • Staelin, R. and Gleason, T. C. On the quality of principle components. American Marketing Association Combined Conference Proceedings Spring and Fall 1972, B. W. Becker and H. Becker (Eds.), 34, 484–488.

  • Timm, N. H. The estimation of variance-covariance and correlation matrices from incomplete data. Psychometrika, 1970, 35, 417–438.

  • Trawinski, I. M. and Bargmann, R. E. Maximum likelihood estimation with incomplete multivariate data. Annals of Mathematical Statistics, 1964, 35, 647–657.

  • Walsh, J. E. Computer-feasible method for handling incomplete data in regression analysis. Journal of the Association for Computing Machinery, 1961, 18, 201–211.

  • Wilks, S. S. Moments and distributions of estimates of population parameters from fragmentary samples. Annals of Mathematical Statistics, 1932, 3, 163–195.

  • Wold, H. Nonlinear estimation by iterative least squares procedures. In F. N. David (Ed.), Festschrift Jerzy Neyman. Wiley: New York, 1966.

Additional information

This is an extension and elaboration of a paper read at the Spring 1973 meetings of the Psychometric Society. We wish to express our appreciation to Timothy McGuire for his helpful comments.

About this article

Cite this article

Gleason, T.C., Staelin, R. A proposal for handling missing data. Psychometrika 40, 229–252 (1975). https://doi.org/10.1007/BF02291569

  • DOI: https://doi.org/10.1007/BF02291569
