Algorithmic speedups in growing classification trees by using an additive split criterion

  • David Lubinsky
Conference paper
Part of the Lecture Notes in Statistics book series (LNS, volume 89)

Abstract

We propose a new split criterion for building classification trees. This criterion, called weighted accuracy or wacc, has the advantage that it allows divide-and-conquer algorithms to be used when minimizing the split criterion. This is useful when more complex split families, such as intervals, corners, and rectangles, are considered. The split criterion is derived to imitate the Gini function as closely as possible by comparing the preference regions of the two functions. The wacc function is evaluated in a large empirical comparison and is found to be competitive with the traditionally used functions.
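The following minimal Python sketch shows why an additive criterion makes divide-and-conquer possible. It uses the simplest of the split families above, a single interval on one feature. It assumes a hypothetical additive score: each class-1 observation inside the interval contributes +w_pos and each class-0 observation contributes -w_neg. These weights and all function names are illustrative placeholders, not the paper's wacc definition. Because an interval's score is the sum of its points' contributions, the best interval crossing a midpoint is simply the best run ending at the midpoint plus the best run starting just after it. This yields the classic maximum-subarray recursion, O(n log n) after sorting. Maximizing this score is equivalent to minimizing its negation. A non-additive criterion such as Gini does not decompose this way, because the impurity of a union of two runs is not the sum of their impurities.

```python
from typing import List, Sequence, Tuple

Result = Tuple[float, int, int]  # (score, start index, end index), inclusive


def grouped_scores(xs: Sequence[float], ys: Sequence[int],
                   w_pos: float, w_neg: float) -> Tuple[List[float], List[float]]:
    """Sort by feature value and give each distinct value its summed contribution.

    ASSUMPTION: hypothetical additive score, not the paper's wacc formula.
    A class-1 point adds +w_pos, a class-0 point adds -w_neg. Tied feature
    values are merged because an interval split cannot separate them.
    """
    values: List[float] = []
    scores: List[float] = []
    for x, y in sorted(zip(xs, ys)):
        s = w_pos if y == 1 else -w_neg
        if values and values[-1] == x:
            scores[-1] += s
        else:
            values.append(x)
            scores.append(s)
    return values, scores


def best_interval(scores: Sequence[float], lo: int, hi: int) -> Result:
    """Divide and conquer: best contiguous run within scores[lo..hi]."""
    if lo == hi:
        return scores[lo], lo, lo
    mid = (lo + hi) // 2
    left = best_interval(scores, lo, mid)
    right = best_interval(scores, mid + 1, hi)
    # Additivity: a run crossing mid scores (best run ending at mid)
    # + (best run starting at mid + 1), each found by one linear scan.
    total, best_l, start = 0.0, float("-inf"), mid
    for i in range(mid, lo - 1, -1):
        total += scores[i]
        if total > best_l:
            best_l, start = total, i
    total, best_r, end = 0.0, float("-inf"), mid + 1
    for j in range(mid + 1, hi + 1):
        total += scores[j]
        if total > best_r:
            best_r, end = total, j
    crossing = (best_l + best_r, start, end)
    return max(left, right, crossing, key=lambda r: r[0])


def brute_force(scores: Sequence[float]) -> Result:
    """O(n^2) reference that tries every run, used to check the recursion."""
    best: Result = (float("-inf"), 0, 0)
    for i in range(len(scores)):
        total = 0.0
        for j in range(i, len(scores)):
            total += scores[j]
            if total > best[0]:
                best = (total, i, j)
    return best


def best_interval_split(xs: Sequence[float], ys: Sequence[int],
                        w_pos: float = 1.0,
                        w_neg: float = 1.0) -> Tuple[float, float, float]:
    """Return (score, low, high); the chosen split is 'low <= x <= high'."""
    values, scores = grouped_scores(xs, ys, w_pos, w_neg)
    if not scores:
        raise ValueError("need at least one observation")
    score, i, j = best_interval(scores, 0, len(scores) - 1)
    return score, values[i], values[j]


if __name__ == "__main__":
    print(best_interval_split([1, 2, 3, 4, 5, 6], [0, 0, 1, 1, 1, 0]))
```

For example, with xs = [1, 2, 3, 4, 5, 6] and ys = [0, 0, 1, 1, 1, 0], the sketch selects the interval 3 <= x <= 5 with score 3.0. For one-dimensional intervals, Kadane's linear scan would find the same optimum. The recursive form is shown here because it is the divide-and-conquer pattern that additivity enables.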

Copyright information

© Springer-Verlag New York 1994

Authors and Affiliations

  • David Lubinsky, Department of Computer Science, University of the Witwatersrand, Johannesburg, South Africa