Robust impurity measures in decision trees

Aluja-Banet, Tomàs; Nafria, Eduard

doi:10.1007/978-4-431-65950-1_21

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Summary

Tree-based methods are statistical procedures for automatic learning from data; their main characteristic is the simplicity of the results obtained. This virtue is also their defect, because the tree-growing process depends heavily on the data: small fluctuations in the data may cause a big change in the resulting tree. Our main objective is to define data diagnostics that prevent internal instability in the tree-growing process before a particular split is made. We present a general formulation for the impurity of a node as a function of the proximity between the individuals in the node and its representative. We then compute a stability measure for a split, which lets us define more robust splits. We have also studied the theoretical complexity of this algorithm and its applicability to large data sets.
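The summary does not give the formulas, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' algorithm. It assumes that the node representative is the centroid, that proximity is squared Euclidean distance, and that stability can be probed by brute-force leave-one-out deletion. The names node_impurity, impurity_reduction, best_threshold and split_stability are hypothetical.

```python
import numpy as np


def node_impurity(Y):
    """Mean squared Euclidean distance from each individual to the node
    representative, taken here (an assumption) to be the node centroid."""
    Y = np.asarray(Y, dtype=float)
    representative = Y.mean(axis=0)
    return float(((Y - representative) ** 2).sum(axis=1).mean())


def impurity_reduction(Y, left):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n, n_left = len(Y), int(left.sum())
    if n_left == 0 or n_left == n:
        return 0.0
    children = (n_left * node_impurity(Y[left])
                + (n - n_left) * node_impurity(Y[~left])) / n
    return node_impurity(Y) - children


def best_threshold(x, Y):
    """Exhaustive search over midpoints between consecutive distinct x values.
    Returns (None, -inf) when x has fewer than two distinct values."""
    values = np.unique(x)
    best_t, best_gain = None, -np.inf
    for t in (values[:-1] + values[1:]) / 2.0:
        gain = impurity_reduction(Y, x <= t)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def split_stability(x, Y):
    """Hypothetical leave-one-out diagnostic: the share of single deletions
    after which the re-optimised split still partitions the remaining
    individuals exactly as the full-data split does."""
    t_full, _ = best_threshold(x, Y)
    if t_full is None:
        return 0.0
    unchanged = 0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        t_i, _ = best_threshold(x[keep], Y[keep])
        if t_i is not None and np.array_equal(x[keep] <= t_i, x[keep] <= t_full):
            unchanged += 1
    return unchanged / len(x)


# Toy data: one numeric predictor and two classes, one-hot encoded.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0, 0, 0, 1, 1, 1])
Y = np.eye(2)[labels]

threshold, gain = best_threshold(x, Y)
stability = split_stability(x, Y)
print(threshold, gain, stability)
```

With one-hot class indicators, this distance-to-centroid impurity coincides with the Gini index, which is one reason a proximity-based formulation can cover familiar criteria. The leave-one-out loop refits the split once per individual and is meant only to illustrate the notion of a stability diagnostic; it says nothing about the complexity of the algorithm studied in the chapter.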

Author information

Authors and Affiliations

Dept. of Statistics and Operational Research, Universitat Politècnica de Catalunya, c. Pau Gargallo, 5, 08028, Barcelona, Spain
Tomàs Aluja-Banet & Eduard Nafria

Editor information

Editors and Affiliations

The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan
Chikio Hayashi, Noboru Ohsumi & Yasumasa Baba
School of Management, Science University of Tokyo, 500 Shimokiyoku, Kuki, Saitama 346, Japan
Keiji Yajima
Institut für Statistik, Rheinisch-Westfälische Technische Hochschule (RWTH), D-52056, Aachen, Germany
Hans-Hermann Bock
Faculty of Environmental Science & Technology, Okayama University, 2-1-1 Tsushima-naka, Okayama 700, Japan
Yutaka Tanaka

About this paper

Cite this paper

Aluja-Banet, T., Nafria, E. (1998). Robust impurity measures in decision trees. In: Hayashi, C., Yajima, K., Bock, HH., Ohsumi, N., Tanaka, Y., Baba, Y. (eds) Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Tokyo. https://doi.org/10.1007/978-4-431-65950-1_21

DOI: https://doi.org/10.1007/978-4-431-65950-1_21
Publisher Name: Springer, Tokyo
Print ISBN: 978-4-431-70208-5
Online ISBN: 978-4-431-65950-1
eBook Packages: Springer Book Archive
