Robust impurity measures in decision trees

  • Conference paper
Data Science, Classification, and Related Methods

Summary

Tree-based methods are statistical procedures for automatic learning from data; their main characteristic is the simplicity of the results obtained. This virtue is also their defect, since the tree-growing process depends heavily on the data: small fluctuations in the data may cause large changes in the resulting tree. Our main objective was to define data diagnostics that prevent internal instability in the tree-growing process before a particular split is made. We present a general formulation for the impurity of a node as a function of the proximity between the individuals in the node and its representative. We then compute a stability measure for a split, which allows us to define more robust splits. We have also studied the theoretical complexity of this algorithm and its applicability to large data sets.
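As a rough illustration of the idea, and not the formulation used in the paper, the sketch below takes the node mean as the representative and squared Euclidean distance as the proximity. With one-hot class indicators this choice reduces to the Gini index. The function names and the leave-one-out diagnostic are assumptions made for this example; the summary does not specify the actual stability measure.

```python
def node_impurity(rows):
    """Mean squared Euclidean distance between each individual and the
    node mean (the assumed 'representative' of the node)."""
    n = len(rows)
    if n == 0:
        return 0.0
    dim = len(rows[0])
    rep = [sum(r[j] for r in rows) / n for j in range(dim)]
    return sum(sum((r[j] - rep[j]) ** 2 for j in range(dim)) for r in rows) / n


def impurity_reduction(rows, goes_left):
    """Parent impurity minus the size-weighted impurities of the two children."""
    n = len(rows)
    if n == 0:
        return 0.0
    left = [r for r, g in zip(rows, goes_left) if g]
    right = [r for r, g in zip(rows, goes_left) if not g]
    return (node_impurity(rows)
            - len(left) / n * node_impurity(left)
            - len(right) / n * node_impurity(right))


def leave_one_out_influence(rows, goes_left):
    """Hypothetical diagnostic: absolute change in the impurity reduction
    of a split when each individual is removed in turn."""
    full = impurity_reduction(rows, goes_left)
    changes = []
    for i in range(len(rows)):
        rest = rows[:i] + rows[i + 1:]
        rest_left = goes_left[:i] + goes_left[i + 1:]
        changes.append(abs(impurity_reduction(rest, rest_left) - full))
    return changes


# Four individuals, two classes (one-hot coded), and a split that separates them.
rows = [[1, 0], [1, 0], [0, 1], [0, 1]]
goes_left = [True, True, False, False]
print(node_impurity(rows))
print(impurity_reduction(rows, goes_left))
print(leave_one_out_influence(rows, goes_left))
```

Under these assumptions, a split whose impurity reduction changes sharply when a single individual is dropped would be flagged as unstable before it is accepted. Each leave-one-out pass recomputes the impurity from scratch, so the diagnostic as written costs quadratic time in the node size.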

© 1998 Springer Japan

About this paper

Cite this paper

Aluja-Banet, T., Nafria, E. (1998). Robust impurity measures in decision trees. In: Hayashi, C., Yajima, K., Bock, HH., Ohsumi, N., Tanaka, Y., Baba, Y. (eds) Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Tokyo. https://doi.org/10.1007/978-4-431-65950-1_21

  • DOI: https://doi.org/10.1007/978-4-431-65950-1_21

  • Publisher Name: Springer, Tokyo

  • Print ISBN: 978-4-431-70208-5

  • Online ISBN: 978-4-431-65950-1

  • eBook Packages: Springer Book Archive
