Optimal predictive partitioning
In many situations, one wishes to group objects into well-defined classes on the basis of one set of descriptor variables, and then predict the classes of new objects from a different set of variables. For example, a bank may categorise customers into distinct classes of financial behaviour by observing how they have behaved over a period of years, and then seek to assign new customers to future behaviour classes using information captured when they open an account. Such situations require a compromise between the compactness and integrity of the cluster structure and the accuracy of the predictive assignment to clusters. We describe two algorithms for achieving such a compromise, discuss some of their features, and illustrate their performance in a simulation study and in a liver transplant problem.
Keywords: Alternating least squares; Clustering; Criterion optimisation; Discrimination; Error rates; Transfer algorithm
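To make the compromise described in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical composite criterion: a weight `lam` times the normalised within-cluster sum of squares on the clustering variables `Y`, plus `1 - lam` times the resubstitution error rate of a nearest-centroid rule on the predictor variables `X`. A simple transfer-style search then moves one object at a time to another cluster whenever that lowers the criterion. The function names, the weight `lam`, and the nearest-centroid predictor are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sqdist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_ss(Y: List[Point], labels: List[int], k: int) -> float:
    """Within-cluster sum of squares on the clustering variables."""
    total = 0.0
    for c in range(k):
        members = [y for y, l in zip(Y, labels) if l == c]
        if members:
            m = centroid(members)
            total += sum(sqdist(y, m) for y in members)
    return total


def error_rate(X: List[Point], labels: List[int], k: int) -> float:
    """Resubstitution error of a nearest-centroid rule on the predictors
    (an assumed stand-in for whatever classifier one would really use)."""
    cents = {}
    for c in range(k):
        members = [x for x, l in zip(X, labels) if l == c]
        if members:
            cents[c] = centroid(members)
    wrong = sum(
        1 for x, l in zip(X, labels)
        if min(cents, key=lambda c: sqdist(x, cents[c])) != l
    )
    return wrong / len(X)


def criterion(Y, X, labels, k, lam, total_ss) -> float:
    """Hypothetical compromise: lam weights compactness, 1 - lam weights error."""
    return lam * within_ss(Y, labels, k) / total_ss + (1 - lam) * error_rate(X, labels, k)


def transfer(Y, X, labels, k, lam, max_sweeps: int = 50) -> Tuple[List[int], float]:
    """Move single objects between clusters while the criterion decreases."""
    labels = list(labels)
    total_ss = within_ss(Y, [0] * len(Y), 1) or 1.0  # guard against zero spread
    best = criterion(Y, X, labels, k, lam, total_ss)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(labels)):
            current = labels[i]
            for c in range(k):
                if c == current:
                    continue
                labels[i] = c
                value = criterion(Y, X, labels, k, lam, total_ss)
                if value < best - 1e-12:
                    best, current, improved = value, c, True
            labels[i] = current  # keep the best cluster found for object i
        if not improved:
            break
    return labels, best
```

Under these assumptions, `lam = 1` reduces to a pure compactness search on `Y`, while smaller values of `lam` favour partitions that are easier to predict from `X`.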