Algorithmic speedups in growing classification trees by using an additive split criterion
We propose a new split criterion for building classification trees. This criterion, called weighted accuracy or wacc, has the advantage that it allows the use of divide-and-conquer algorithms when optimizing the split criterion. This is useful when more complex split families, such as intervals, corners, and rectangles, are considered. The split criterion is derived to imitate the Gini function as closely as possible by comparing preference regions for the two functions. The wacc function is evaluated in a large empirical comparison and is found to be competitive with the traditionally used functions.
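To illustrate why additivity matters for interval splits, the sketch below is a minimal illustration and not the paper's wacc definition. It assumes a hypothetical additive score in which each point of class 1 contributes `+weight` and each point of class 0 contributes `-1`, with the points already sorted by the attribute being split. Under that assumption, finding the best interval reduces to the maximum-sum contiguous run problem, which a divide-and-conquer merge of per-segment summaries solves. The names `Summary`, `point_scores`, and `best_interval` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Summary:
    """Summary of one segment of per-point scores."""
    total: float                    # sum over the whole segment
    prefix: Tuple[float, int]       # (sum, inclusive end index) of the best prefix
    suffix: Tuple[float, int]       # (sum, start index) of the best suffix
    best: Tuple[float, int, int]    # (sum, start, inclusive end) of the best interval


def _first(t):
    # Compare candidates by their score only.
    return t[0]


def _merge(a: Summary, b: Summary) -> Summary:
    """Combine summaries of two adjacent segments (a to the left of b)."""
    prefix = max(a.prefix, (a.total + b.prefix[0], b.prefix[1]), key=_first)
    suffix = max(b.suffix, (b.total + a.suffix[0], a.suffix[1]), key=_first)
    # An interval crossing the boundary is a's best suffix plus b's best prefix.
    crossing = (a.suffix[0] + b.prefix[0], a.suffix[1], b.prefix[1])
    best = max(a.best, b.best, crossing, key=_first)
    return Summary(a.total + b.total, prefix, suffix, best)


def _solve(scores: Sequence[float], lo: int, hi: int) -> Summary:
    """Divide and conquer over the half-open index range [lo, hi)."""
    if hi - lo == 1:
        s = scores[lo]
        return Summary(s, (s, lo), (s, lo), (s, lo, lo))
    mid = (lo + hi) // 2
    return _merge(_solve(scores, lo, mid), _solve(scores, mid, hi))


def point_scores(labels: Sequence[int], weight: float = 1.0) -> List[float]:
    """Hypothetical additive scoring: +weight for class 1, -1 for class 0."""
    return [weight if y == 1 else -1.0 for y in labels]


def best_interval(labels: Sequence[int], weight: float = 1.0) -> Tuple[float, int, int]:
    """Return (score, start, end) of the best non-empty interval, end inclusive."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return _solve(point_scores(labels, weight), 0, len(labels)).best


if __name__ == "__main__":
    # Labels of points sorted by one attribute.
    print(best_interval([0, 1, 1, 0, 1, 0, 0], weight=2.0))  # (5.0, 1, 4)
```

The merge step only works because the score of an interval is the sum of the scores of its points. A criterion such as Gini depends nonlinearly on the class counts inside and outside the interval, so summaries of two halves cannot be combined this simply.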
Keywords: Split Function, Current Group, Optimal Interval, Split Criterion, Good Split