A New Incremental Algorithm for Induction of Multivariate Decision Trees for Large Datasets
Several algorithms for inducing decision trees have been developed for problems with large datasets. However, some of them run into space and/or runtime problems because they use the whole training sample to build the tree, while others do not take the whole training set into account. In this paper, we introduce a new algorithm for inducing decision trees for large numerical datasets, called IIMDT, which builds the tree incrementally, so it is not necessary to keep the whole training set in main memory. A comparison between IIMDT and ICE, an algorithm for inducing decision trees for large datasets, is presented.
Keywords: Decision trees, supervised classification, large datasets