Psychometrika, Volume 68, Issue 4, pp. 493–517

Prediction and classification in nonlinear data analysis: Something old, something new, something borrowed, something blue

2003 Presidential Address

Abstract

Prediction and classification are two very active areas in modern data analysis. In this paper, prediction with nonlinear optimal scaling transformations of the variables is reviewed and extended to the use of multiple additive components, much in the spirit of the statistical learning techniques that are currently popular in data mining and other areas. In addition, a classification/clustering method is described that is particularly suitable for analyzing attribute-value data from systems biology (genomics, proteomics, and metabolomics), and that is able to detect groups of objects having similar values on small subsets of the attributes.
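To give a concrete flavor of additive prediction components built from monotonic transformations, the sketch below fits a sum of monotonic transformations of the predictors by small forward stagewise steps. It is only an illustration of the general idea under stated assumptions, not the algorithm or software described in the paper: the function name, the number of steps, the shrinkage value, and the use of scikit-learn's IsotonicRegression in place of an optimal scaling step are all assumptions made for this example.

```python
# Illustrative sketch only (assumed names and settings): forward stagewise fitting of
# a sum of monotonic transformations of the predictors, in the spirit of optimal
# scaling regression with multiple additive components.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def stagewise_monotonic_fit(X, y, n_steps=200, shrinkage=0.1):
    """Approximate y by a sum of monotonic transformations of the columns of X."""
    n, p = X.shape
    fit = np.full(n, y.mean())            # start from the overall mean
    components = []                       # record (variable index, fitted transform)
    for _ in range(n_steps):
        residual = y - fit
        best = None
        for j in range(p):
            # Monotonic (isotonic) fit of the current residual on variable j.
            iso = IsotonicRegression(increasing="auto", out_of_bounds="clip")
            pred = iso.fit(X[:, j], residual).predict(X[:, j])
            sse = np.sum((residual - pred) ** 2)
            if best is None or sse < best[0]:
                best = (sse, j, iso, pred)
        _, j, iso, pred = best
        fit += shrinkage * pred           # take a small step along the best component
        components.append((j, iso))
    return fit, components

# Toy usage: the signal is a monotone function of the first predictor only.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.log1p(5 * X[:, 0]) + 0.1 * rng.normal(size=200)
fit, components = stagewise_monotonic_fit(X, y)
print("training R^2:", 1 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2))
```

Because every step is small, a variable can enter repeatedly with a refined transformation, which is the sense in which several additive components per variable are accumulated.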

Key words

multiple regression; optimal scaling; optimal scoring; statistical learning; data mining; boosting; forward stagewise additive modeling; additive prediction components; monotonic regression; regression splines; distance based clustering; clustering on variable subsets; COSA; genomics; proteomics; systems biology; categorical data; ordinal data; ApoE3 data; cervix cancer data; Boston housing data


References

  1. Bock, R.D. (1960). Methods and applications of optimal scaling (Tech. Rep. 25). Chapel Hill, NC: University of North Carolina, L.L. Thurstone Psychometric Laboratory.
  2. Boon, M.E., Zeppa, P., Ouwerkerk-Noordam, E., & Kok, L.P. (1990). Exploiting the tooth-pick effect of the cytobrush by plastic embedding of cervical samples. Acta Cytologica, 35, 57–63.
  3. Breiman, L. (1996a). Bagging predictors. Machine Learning, 26, 123–140.
  4. Breiman, L. (1996b). Stacked regressions. Machine Learning, 24, 51–64.
  5. Breiman, L., & Friedman, J.H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80, 580–598.
  6. Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984). Classification and regression trees. Belmont, CA: Wadsworth.
  7. Buja, A. (1990). Remarks on functional canonical variates, alternating least squares methods and ACE. Annals of Statistics, 18, 1032–1069.
  8. de Leeuw, J., & Heiser, W.J. (1980). Multidimensional scaling with restrictions on the configuration. In P.R. Krishnaiah (Ed.), Multivariate analysis, Vol. V (pp. 501–522). Amsterdam: North-Holland.
  9. de Leeuw, J., Young, F.W., & Takane, Y. (1976). Additive structure in qualitative data. Psychometrika, 41, 471–503.
  10. Duda, R., Hart, P., & Stork, D. (2000). Pattern classification (2nd ed.). New York, NY: John Wiley & Sons.
  11. Freund, Y., & Schapire, R.E. (1996). Experiments with a new boosting algorithm. Machine Learning: Proceedings of the Thirteenth International Conference (pp. 148–156). San Francisco, CA: Morgan Kaufmann.
  12. Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232.
  13. Friedman, J.H., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Annals of Statistics, 28, 337–407.
  14. Friedman, J.H., & Meulman, J.J. (in press). Clustering objects on subsets of attributes (with discussion). Journal of the Royal Statistical Society, Series B. Available at http://www-stat.stanford.edu/~jhf/ftp/cosa.pdf
  15. Friedman, J.H., & Meulman, J.J. (2003a). Multiple additive regression trees with application in epidemiology. Statistics in Medicine, 22(9), 1365–1381.
  16. Friedman, J.H., & Meulman, J.J. (2003b). COSA [Software]. Available at http://www-stat.stanford.edu/~jhf/COSA.html
  17. Friedman, J., & Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association, 76, 817–823.
  18. Gifi, A. (1990). Nonlinear multivariate analysis. Chichester, U.K.: John Wiley & Sons. (First edition, 1981, University of Leiden, Department of Data Theory)
  19. Groenen, P.J.F., van Os, B.J., & Meulman, J.J. (2000). Optimal scaling by alternating length constrained nonnegative least squares: An application to distance based principal components analysis. Psychometrika, 65, 511–524.
  20. Guttman, L. (1950). The principal components of scale analysis. In S.A. Stouffer, L. Guttman, E.A. Suchman, P.F. Lazarsfeld, S.A. Star, & J.A. Clausen (Eds.), Measurement and prediction. Princeton, NJ: Princeton University Press.
  21. Harrison, D., & Rubinfeld, D.L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.
  22. Hastie, T., & Tibshirani, R. (1990). Generalized additive models. New York, NY: Chapman and Hall.
  23. Hastie, T., Tibshirani, R., & Buja, A. (1994). Flexible discriminant analysis by optimal scoring. Journal of the American Statistical Association, 89, 1255–1270.
  24. Hastie, T., Tibshirani, R., & Friedman, J.H. (2001). The elements of statistical learning. New York, NY: Springer-Verlag.
  25. Hayashi, C. (1952). On the prediction of phenomena from qualitative data and the quantification of qualitative data from the mathematico-statistical point of view. Annals of the Institute of Statistical Mathematics, 2, 93–96.
  26. Heiser, W.J. (1995). Convergent computation by iterative majorization: Theory and applications in multidimensional data analysis. In W.J. Krzanowski (Ed.), Recent advances in descriptive multivariate analysis (pp. 157–189). Oxford, U.K.: Oxford University Press.
  27. Kruskal, J.B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1–28.
  28. Kruskal, J.B. (1964b). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 115–129.
  29. Kruskal, J.B. (1965). Analysis of factorial experiments by estimating monotone transformations of the data. Journal of the Royal Statistical Society, Series B, 27, 251–263.
  30. Max, J. (1960). Quantizing for minimum distortion. Proceedings IEEE (Information Theory), 6, 7–12.
  31. McLachlan, G.J. (1992). Discriminant analysis and statistical pattern recognition. New York, NY: John Wiley & Sons.
  32. Meulman, J.J. (2000). Discriminant analysis with optimal scaling. In R. Decker & W. Gaul (Eds.), Classification and information processing at the turn of the millennium (pp. 32–39). Heidelberg-Berlin, Germany: Springer-Verlag.
  33. Meulman, J.J., Zeppa, P., Boon, M.E., & Rietveld, W.J. (1992). Prediction of various grades of cervical preneoplasia and neoplasia on plastic embedded cytobrush samples: Discriminant analysis with qualitative and quantitative predictors. Analytical and Quantitative Cytology and Histology, 14, 60–72.
  34. Meulman, J.J., & van der Kooij, A.J. (2000, May). Transformations towards independence through optimal scaling. Paper presented at the International Conference on Measurement and Multivariate Analysis (ICMMA), Banff, Canada.
  35. Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto, Canada: University of Toronto Press.
  36. Nishisato, S. (1994). Elements of dual scaling: An introduction to practical data analysis. Hillsdale, NJ: Lawrence Erlbaum.
  37. Ramsay, J.O. (1988). Monotone regression splines in action. Statistical Science, 3(4), 425–441.
  38. Ripley, B.D. (1996). Pattern recognition and neural networks. Cambridge, U.K.: Cambridge University Press.
  39. Takane, Y. (1998). Nonlinear multivariate analysis by neural network models. In C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H.H. Bock, & Y. Baba (Eds.), Data science, classification, and related methods (pp. 527–538). Tokyo: Springer.
  40. Takane, Y., & Oshima-Takane, Y. (2002). Nonlinear generalized canonical correlation analysis by neural network models. In S. Nishisato, Y. Baba, H. Bozdogan, & K. Kanefuji (Eds.), Measurement and multivariate analysis (pp. 183–190). Tokyo: Springer-Verlag.
  41. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.
  42. van der Greef, J., Davidov, E., Verheij, E., Vogels, J., van der Heijden, R., Adourian, A.S., Oresic, M., Marple, E.W., & Naylor, S. (2003). The role of metabolomics in drug discovery: A new vision for drug discovery and development. In G.G. Harrigan & R. Goodacre (Eds.), Metabolic profiling: Its role in biomarker discovery and gene function analysis (pp. 170–198). Boston, MA: Kluwer Academic Publishers.
  43. van der Kooij, A.J., & Meulman, J.J. (1999). Regression with optimal scaling. In J.J. Meulman, W.J. Heiser, & SPSS Inc. (Eds.), SPSS Categories 10.0 (pp. 1–8, 77–101). Chicago, IL: SPSS.
  44. van der Kooij, A.J., Meulman, J.J., & Heiser, W.J. (2003). Local minima in categorical multiple regression. Manuscript submitted for publication.
  45. Vapnik, V. (1996). The nature of statistical learning theory. New York, NY: Springer-Verlag.
  46. Whittaker, J.L. (1990). Graphical models in applied multivariate statistics. New York, NY: John Wiley & Sons.
  47. Winsberg, S., & Ramsay, J.O. (1980). Monotonic transformations to additivity using splines. Biometrika, 67, 669–674.
  48. Yanai, H., Okada, A., Shigemasu, K., Kano, T., & Meulman, J.J. (Eds.) (2003). New developments in psychometrics. Tokyo: Springer-Verlag.
  49. Young, F.W., de Leeuw, J., & Takane, Y. (1976). Regression with qualitative and quantitative variables: An alternating least squares method with optimal scaling features. Psychometrika, 41, 505–528.

Copyright information

© The Psychometric Society 2003

Authors and Affiliations

  1. Data Theory Group, Department of Education, Leiden University, Leiden, The Netherlands
