Knowledge and Information Systems, Volume 14, Issue 1, pp 1–37

Top 10 algorithms in data mining

Authors

  • Xindong Wu
    • Department of Computer Science, University of Vermont
  • Vipin Kumar
    • Department of Computer Science and Engineering, University of Minnesota
  • J. Ross Quinlan
    • Rulequest Research Pty Ltd
  • Joydeep Ghosh
    • Department of Electrical and Computer Engineering, University of Texas at Austin
  • Qiang Yang
    • Department of Computer Science, Hong Kong University of Science and Technology
  • Hiroshi Motoda
    • AFOSR/AOARD and Osaka University
  • Geoffrey J. McLachlan
    • Department of Mathematics, The University of Queensland
  • Angus Ng
    • School of Medicine, Griffith University
  • Bing Liu
    • Department of Computer Science, University of Illinois at Chicago
  • Philip S. Yu
    • IBM T. J. Watson Research Center
  • Zhi-Hua Zhou
    • National Key Laboratory for Novel Software Technology, Nanjing University
  • Michael Steinbach
    • Department of Computer Science and Engineering, University of Minnesota
  • David J. Hand
    • Department of Mathematics, Imperial College
  • Dan Steinberg
    • Salford Systems
Survey Paper

DOI: 10.1007/s10115-007-0114-2

Cite this article as:
Wu, X., Kumar, V., Ross Quinlan, J. et al. Knowl Inf Syst (2008) 14: 1. doi:10.1007/s10115-007-0114-2

Abstract

This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

Copyright information

© Springer-Verlag London Limited 2007