A New Incremental Algorithm for Induction of Multivariate Decision Trees for Large Datasets

  • Anilu Franco-Arcega
  • J. Ariel Carrasco-Ochoa
  • Guillermo Sánchez-Díaz
  • J. Fco Martínez-Trinidad
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5326)

Abstract

Several algorithms for inducing decision trees have been developed for problems with large datasets. However, some of them run into memory and/or runtime problems because they use the whole training sample to build the tree, while others do not take the whole training set into account. In this paper, we introduce a new algorithm for inducing decision trees for large numerical datasets, called IIMDT, which builds the tree incrementally, so it is not necessary to keep the whole training set in main memory. A comparison between IIMDT and ICE, an algorithm for inducing decision trees for large datasets, is presented.

Keywords

Decision trees, supervised classification, large datasets

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Anilu Franco-Arcega (1)
  • J. Ariel Carrasco-Ochoa (1)
  • Guillermo Sánchez-Díaz (2)
  • J. Fco Martínez-Trinidad (1)
  1. Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Santa Maria Tonantzintla, Mexico
  2. Centro Universitario de los Valles, Universidad de Guadalajara, Ameca, Mexico
