Abstract
Structures found in data by exploratory techniques are notoriously unstable. Suppose that we search for a model within a given family and that we do this on different samples from the same population, D0, D1, ..., DB. When only one data set is available, one can think of D0 as the original data set and of D1, ..., DB as bootstrap samples from D0. Experience shows that one can be practically sure to find different models from different samples. A striking example of this model instability is given by Gong [1] in the context of stepwise logistic regression. The problem can be expected to be even more serious for tree-structured predictors, such as the RECPAM trees [2–4] which are the main concern of this work, since the model is selected from a family much richer than that of linear regression as usually defined.
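The resampling scheme described above can be illustrated with a minimal sketch. It is not the RECPAM procedure: the selection step `select_model` is a hypothetical stand-in that picks the single predictor whose median split separates the mean outcome most, and `make_toy_data`, `bootstrap_samples` and `selection_counts` are likewise names assumed here for illustration. The data are synthetic.

```python
import random
from collections import Counter


def make_toy_data(n, seed=0):
    """Synthetic D0 (an assumption for illustration): three predictors,
    two of which influence the outcome about equally."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        y = x[0] + x[1] + rng.gauss(0.0, 1.0)
        data.append((x, y))
    return data


def bootstrap_samples(d0, B, seed=0):
    """Return D1..DB, each drawn with replacement from d0 and of the same size."""
    rng = random.Random(seed)
    n = len(d0)
    return [[d0[rng.randrange(n)] for _ in range(n)] for _ in range(B)]


def select_model(sample):
    """Hypothetical selection step: index of the predictor whose median
    split yields the largest gap in mean outcome between the two sides."""
    n_pred = len(sample[0][0])
    best_j, best_gap = 0, -1.0
    for j in range(n_pred):
        values = sorted(x[j] for x, _ in sample)
        cut = values[len(values) // 2]
        left = [y for x, y in sample if x[j] < cut]
        right = [y for x, y in sample if x[j] >= cut]
        if not left or not right:
            continue  # degenerate split, skip this predictor
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if gap > best_gap:
            best_j, best_gap = j, gap
    return best_j


def selection_counts(d0, B, seed=0):
    """Count how often each predictor is selected over D0, D1, ..., DB."""
    samples = [d0] + bootstrap_samples(d0, B, seed)
    return Counter(select_model(s) for s in samples)


if __name__ == "__main__":
    d0 = make_toy_data(40, seed=1)
    print(selection_counts(d0, B=20, seed=2))
```

If the tally spreads over more than one predictor, the selected model differs from sample to sample, which is the kind of instability the abstract refers to. A tree-growing procedure chooses among far more candidate models than this single-split rule does.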
References
Gong, G. (1986), “Cross-validation, the Jackknife, and the Bootstrap: Excess error estimation in forward logistic regression”, Journal of the American Statistical Association, 81, 108–113.
Ciampi, A., Chang, C.-H., Hogg, S.A. and McKinney, S. (1986), “Recursive Partition: A versatile method for exploratory data analysis in Biostatistics”, in: Biostatistics 5, Festschrift in honor of Prof. V.M. Joshi’s 70th birthday, I. B. McNeil and G. J. Umphrey, eds, Dordrecht-Holland, 1–28.
Ciampi, A. and Thiffault, J. (1988), “Recursive Partition and Amalgamation (RECPAM) for censored survival data: Criteria for tree selection”, to appear in Statistical Software Newsletter, vol.14, no.2.
Ciampi, A., Hogg, S.A., McKinney, S. and Thiffault, J. (1987), “RECPAM: a computer program for Recursive Partition and Amalgamation for censored survival data and other situations frequently occurring in Biostatistics. I. Methods and program features”, to appear in Computer Methods and Programs in Biomedicine.
Day, W. H. E. (1983), “The role of complexity in comparing classifications”, Mathematical Biosciences, 66, 97–114.
Faith, D. P. and Belbin, L. (1986), “Comparison of Classifications using measures intermediate between metric dissimilarity and consensus similarity”, Journal of Classification, 3, 257–280.
Boorman, S. A. and Arabie, P. (1972), “Structural measures and the method of sorting”, in: Multidimensional Scaling: Theory and Applications in the Behavioral Sciences, vol. 1, R.N. Shepard, A.K. Romney and S.B. Nerlove, eds, New York: Seminar Press, 225–249.
Goodman, L. A. and Kruskal, W. H. (1954), “Measures of association for cross classifications”, Journal of the American Statistical Association, 49, 723–764.
Copyright information
© 1988 Physica-Verlag Heidelberg
About this paper
Cite this paper
Ciampi, A., Thiffault, J. (1988). Recursive Partition in Biostatistics: Stability of Trees and Choice of the Most Stable Classification. In: Edwards, D., Raun, N.E. (eds) Compstat. Physica-Verlag HD. https://doi.org/10.1007/978-3-642-46900-8_36
DOI: https://doi.org/10.1007/978-3-642-46900-8_36
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-0411-9
Online ISBN: 978-3-642-46900-8
eBook Packages: Springer Book Archive