Survey Paper

Knowledge and Information Systems, Volume 14, Issue 1, pp 1-37

Top 10 algorithms in data mining

  • Xindong Wu, Department of Computer Science, University of Vermont
  • Vipin Kumar, Department of Computer Science and Engineering, University of Minnesota
  • J. Ross Quinlan, Rulequest Research Pty Ltd
  • Joydeep Ghosh, Department of Electrical and Computer Engineering, University of Texas at Austin
  • Qiang Yang, Department of Computer Science, Hong Kong University of Science and Technology
  • Hiroshi Motoda, AFOSR/AOARD and Osaka University
  • Geoffrey J. McLachlan, Department of Mathematics, The University of Queensland
  • Angus Ng, School of Medicine, Griffith University
  • Bing Liu, Department of Computer Science, University of Illinois at Chicago
  • Philip S. Yu, IBM T. J. Watson Research Center
  • Zhi-Hua Zhou, National Key Laboratory for Novel Software Technology, Nanjing University
  • Michael Steinbach, Department of Computer Science and Engineering, University of Minnesota
  • David J. Hand, Department of Mathematics, Imperial College
  • Dan Steinberg, Salford Systems

Abstract

This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These algorithms are among the most influential in the data mining research community. For each algorithm, we provide a description, discuss its impact, and review current and further research on it. Together, the 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.