SLIQ: A fast scalable classifier for data mining

  • Manish Mehta
  • Rakesh Agrawal
  • Jorma Rissanen
Data Mining
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1057)

Abstract

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, which limits their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident data sets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale to large data sets and to classify data sets irrespective of the number of classes, attributes, and examples (records), thus making it an attractive tool for data mining.
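The abstract names the pre-sorting technique without giving its details. As a rough illustration of the general idea only, the sketch below sorts one numeric attribute a single time, keeps each value's record index, and then evaluates every candidate threshold in one linear scan using running class histograms. The function names (`presort`, `best_split`, `gini`), the use of the gini index as the split criterion, and the toy data are assumptions made for this example; they are not taken from the abstract and should not be read as the authors' implementation.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class histogram holding `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def presort(values):
    """Sort one numeric attribute once, pairing each value with its record index."""
    return sorted((v, i) for i, v in enumerate(values))


def best_split(attr_list, labels):
    """Scan a pre-sorted attribute list once; return (threshold, weighted_gini)."""
    n = len(attr_list)
    left = Counter()            # class histogram of records at or below the threshold
    right = Counter(labels)     # class histogram of records above the threshold
    best = (None, float("inf"))
    for k in range(n - 1):
        value, idx = attr_list[k]
        left[labels[idx]] += 1
        right[labels[idx]] -= 1
        next_value = attr_list[k + 1][0]
        if value == next_value:
            continue            # no threshold fits between two equal values
        n_left, n_right = k + 1, n - (k + 1)
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = ((value + next_value) / 2, score)
    return best


# Toy data (hypothetical): the two classes separate cleanly between 3 and 10.
values = [10, 1, 12, 2, 11, 3]
labels = ["b", "a", "b", "a", "b", "a"]
threshold, score = best_split(presort(values), labels)
```

Because the attribute is sorted once up front, the scan never has to re-sort the records at each tree node. In a breadth-first setting, the same single pass could in principle maintain one pair of histograms per current leaf, which is one way to read the abstract's remark that sorting is integrated with breadth-first growth.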

Copyright information

© Springer-Verlag 1996

Authors and Affiliations

  • Manish Mehta (1)
  • Rakesh Agrawal (1)
  • Jorma Rissanen (1)
  1. IBM Almaden Research Center, San Jose
