Summary
Tree-based methods are statistical procedures for automatic learning from data, whose main characteristic is the simplicity of the results obtained. This virtue is also their defect: the tree-growing process depends heavily on the data, and small fluctuations in the data may cause large changes in the resulting tree. Our main objective was to define data diagnostics that prevent internal instability in the tree-growing process before a particular split is made. We present a general formulation for the impurity of a node as a function of the proximity between the individuals in the node and its representative. We then compute a stability measure for a split, which allows us to define more robust splits. We have also studied the theoretical complexity of this algorithm and its applicability to large data sets.
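The idea of an impurity based on proximity to a node representative, and of checking a split for stability before making it, can be illustrated with a minimal sketch. This is not the chapter's formulation, which the summary does not give. It assumes a numeric response, takes the node mean as the representative and squared difference as the proximity (so the impurity reduces to the ordinary variance), and uses a simple leave-one-out recomputation of the best threshold as a stand-in diagnostic. All function names are hypothetical.

```python
from statistics import fmean


def node_impurity(values):
    """Mean squared distance between each individual and the node
    representative. ASSUMPTION: representative = mean of the node."""
    if not values:
        return 0.0
    rep = fmean(values)
    return fmean([(v - rep) ** 2 for v in values])


def impurity_decrease(x, y, threshold):
    """Impurity of the parent minus the size-weighted impurity of the
    two children produced by the split x <= threshold."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    return (node_impurity(y)
            - len(left) / n * node_impurity(left)
            - len(right) / n * node_impurity(right))


def best_threshold(x, y):
    """Midpoint between consecutive distinct x values that gives the
    largest impurity decrease. Needs at least two distinct x values."""
    xs = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: impurity_decrease(x, y, t))


def leave_one_out_thresholds(x, y):
    """ASSUMPTION (illustrative diagnostic only): recompute the best
    threshold with each individual removed in turn. A wide spread of
    thresholds suggests the split depends on a few individuals."""
    return [best_threshold(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(x))]


if __name__ == "__main__":
    x = [1, 2, 3, 10, 11, 12]
    y = [0, 0, 0, 1, 1, 1]
    print(best_threshold(x, y))
    print(leave_one_out_thresholds(x, y))
```

In this toy data set the two groups are well separated, so removing any single individual moves the chosen threshold only slightly. A split whose threshold jumps widely under the same check would be a candidate for the kind of instability the summary describes.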
Copyright information
© 1998 Springer Japan
About this paper
Cite this paper
Aluja-Banet, T., Nafria, E. (1998). Robust impurity measures in decision trees. In: Hayashi, C., Yajima, K., Bock, HH., Ohsumi, N., Tanaka, Y., Baba, Y. (eds) Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Tokyo. https://doi.org/10.1007/978-4-431-65950-1_21
DOI: https://doi.org/10.1007/978-4-431-65950-1_21
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-70208-5
Online ISBN: 978-4-431-65950-1
eBook Packages: Springer Book Archive