Quality and Quantity

Volume 39, Issue 4, pp 359–372

Multivariate Prediction with Nonlinear Principal Components Analysis: Theory

Article

Abstract

We propose the notion of multivariate predictability as a measure of goodness-of-fit in data reduction techniques which are useful for visualizing and screening data. For quantitative variables this leads to the usual sums-of-squares and variance-accounted-for criteria. For categorical variables we show how to predict the category levels of all variables associated with every point (case). The proportion of predictions which agree with the true categories gives the measure of fit. The ideas are very general; as an illustration we use nonlinear principal components analysis (NLPCA) in association with ordered categorical variables. A detailed example using data from the International Social Survey Program (ISSP) will be given in Blasius and Gower (Quality and Quantity, 39, to appear). It will be shown that the predictability criterion suggests that the fits are rather better than is indicated by “percentage of variance accounted for”.
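The two fit criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the toy data, and the "predicted" table are all hypothetical, and the sketch assumes that predictions have already been obtained from some low-dimensional display by whatever method.

```python
def variance_accounted_for(data, fitted):
    """Return 1 - residual SS / total SS, with total SS taken about the column means.

    `data` and `fitted` are equally shaped lists of rows of numbers;
    `fitted` stands for any low-dimensional approximation of `data`.
    """
    n_rows, n_cols = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    total_ss = sum((row[j] - means[j]) ** 2 for row in data for j in range(n_cols))
    resid_ss = sum(
        (d - f) ** 2
        for data_row, fitted_row in zip(data, fitted)
        for d, f in zip(data_row, fitted_row)
    )
    return 1.0 - resid_ss / total_ss


def predictability(observed, predicted):
    """Proportion of (case, variable) cells whose predicted category equals the observed one."""
    cells = [
        (o, p)
        for obs_row, pred_row in zip(observed, predicted)
        for o, p in zip(obs_row, pred_row)
    ]
    return sum(o == p for o, p in cells) / len(cells)


# Hypothetical data: 4 cases, 3 ordered categorical variables coded 1..3.
observed = [[1, 2, 3],
            [2, 2, 3],
            [3, 1, 1],
            [1, 3, 2]]

# Hypothetical category predictions for the same cells.
predicted = [[1, 2, 3],
             [2, 3, 3],
             [3, 1, 1],
             [1, 3, 1]]

print(predictability(observed, predicted))  # 10 of the 12 cells agree
```

The categorical measure counts exact agreements only, so it can be high even when a sums-of-squares criterion computed on the category codes is modest, which is the kind of contrast the abstract describes.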

Keywords

biplot; large scale data analysis; nonlinear principal components analysis; prediction

References

  1. Blasius, J. & Gower, J. C. (to appear). Multivariate prediction with nonlinear principal components analysis: Application. Quality and Quantity 39.
  2. Borg, I. & Groenen, P. (1997). Modern Multidimensional Scaling. New York: Springer.
  3. Borg, I. & Shye, S. (1995). Facet Theory. Form and Content. Newbury Park, CA: Sage.
  4. Chambers, J.M. (1998). Programming with Data: A Guide to the S Language. New York: Springer.
  5. Eckart, C. & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika 1: 211–218.
  6. Gabriel, K.R. (1971). The biplot-graphic display of matrices with applications to principal components analysis. Biometrika 58: 453–467.
  7. Gabriel, K.R. (1981). Biplot display of multivariate matrices for inspection of data and diagnosis. In: V. Barnett (ed.), Interpreting Multivariate Data, Chichester: Wiley, pp. 147–174.
  8. Genstat 5 Committee (1993). Genstat 5 Release 3 Reference Manual. Oxford: Numerical Algorithms Group.
  9. Gifi, A. (1990). Nonlinear Multivariate Analysis. Chichester: Wiley.
  10. Gower, J.C. (1966). Some distance properties of latent-root and vector methods used in multivariate analysis. Biometrika 53: 325–338.
  11. Gower, J. C. (1993). The construction of neighbour-regions in two dimensions for prediction with multi-level categorical variables. In: O. Opitz, B. Lausen & R. Klar (eds.), Information and Classification: Concepts–Methods–Applications. Proceedings 16th Annual Conference of the Gesellschaft für Klassifikation, Dortmund, April 1992, Berlin: Springer, pp. 174–189.
  12. Gower, J.C. (2002). Categories and quantities. In: S. Nishisato, Y. Baba, H. Bozdogan & K. Kamefuji (eds.), Measurement and Multivariate Analysis, Tokyo: Springer, pp. 1–12.
  13. Gower, J.C. & Hand, D.J. (1996). Biplots. London: Chapman & Hall.
  14. Gower, J.C. & Harding, S. (1998). Prediction regions for categorical variables. In: J. Blasius & M. Greenacre (eds.), Visualization of Categorical Data, San Diego: Academic Press, pp. 405–423.
  15. Greenacre, M.J. (1993). Biplots in correspondence analysis. Journal of Applied Statistics 20: 251–269.
  16. Guttman, L. (1965). A faceted definition of intelligence. Scripta Hierosolymitana 14: 166–181.
  17. Heiser, W.J. & Meulman, J.J. (1994). Homogeneity analysis: exploring the distribution of variables and their nonlinear relationships. In: M.J. Greenacre & J. Blasius (eds.), Correspondence Analysis in the Social Sciences. Recent Developments and Applications, London: Academic Press, pp. 179–209.
  18. Meulman, J.J. & Heiser, W.J. (1999). SPSS Categories 10.0. Chicago: SPSS Inc.
  19. Payne, R. W., Lane, P. W., Baird, D. B., Gilmour, A. R., Harding, S. A., Morgan, G. W., Murray, D. A., Thompson, R., Todd, A. D., Tunicliffe-Wilson, G., Webster, R. & Welham, S. J. (1998). Genstat 5 Release 4.1 Reference Manual Supplement. Oxford: Numerical Algorithms Group.
  20. SPSS (1999). See Meulman and Heiser (1999).

Copyright information

© Springer 2005

Authors and Affiliations

  1. Department of Statistics, The Open University, Milton Keynes, U.K.
  2. Seminar for Sociology, University of Bonn, Bonn, Germany
