Knowledge and Information Systems

, Volume 14, Issue 1, pp 1–37 | Cite as

Top 10 algorithms in data mining

  • Xindong WuEmail author
  • Vipin Kumar
  • J. Ross Quinlan
  • Joydeep Ghosh
  • Qiang Yang
  • Hiroshi Motoda
  • Geoffrey J. McLachlan
  • Angus Ng
  • Bing Liu
  • Philip S. Yu
  • Zhi-Hua Zhou
  • Michael Steinbach
  • David J. Hand
  • Dan Steinberg
Survey Paper


This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.


Support Vector Machine Association Rule Mach Learn Frequent Itemsets Apriori Algorithm 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, pp 487–499Google Scholar
  2. 2.
    Ahmed S, Coenen F and Leng PH (2006). Tree-based partitioning of date for association rule mining. Knowl Inf Syst 10(3): 315–331 CrossRefGoogle Scholar
  3. 3.
    Banerjee A, Merugu S, Dhillon I and Ghosh J (2005). Clustering with Bregman divergences. J Mach Learn Res 6: 1705–1749 MathSciNetGoogle Scholar
  4. 4.
    Bezdek JC, Chuah SK, Leep D (1986) Generalized k-nearest neighbor rules. Fuzzy Sets Syst 18(3):237–256.
  5. 5.
    Bloch DA, Olshen RA and Walker MG (2002). Risk estimation for classification trees. J Comput Graph Stat 11: 263–288 CrossRefMathSciNetGoogle Scholar
  6. 6.
    Bonchi F and Lucchese C (2006). On condensed representations of constrained frequent patterns. Knowl Inf Syst 9(2): 180–201 CrossRefGoogle Scholar
  7. 7.
    Breiman L (1968) Probability theory. Addison-Wesley, Reading. Republished (1991) in Classics of mathematics. SIAM, PhiladelphiaGoogle Scholar
  8. 8.
    Breiman L (1999). Prediction games and arcing classifiers. Neural Comput 11(7): 1493–1517 CrossRefGoogle Scholar
  9. 9.
    Breiman L, Friedman JH, Olshen RA and Stone CJ (1984). Classification and regression trees. Wadsworth, Belmont zbMATHGoogle Scholar
  10. 10.
    Brin S and Page L (1998). The anatomy of a large-scale hypertextual Web Search Sngine. Comput Networks 30(1–7): 107–117 Google Scholar
  11. 11.
    Chen JR (2007). Making clustering in delay-vector space meaningful. Knowl Inf Syst 11(3): 369–385 CrossRefGoogle Scholar
  12. 12.
    Cheung DW, Han J, Ng V, Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. In: Proceedings of the ACM SIGMOD international conference on management of data, pp. 13–23Google Scholar
  13. 13.
    Chi Y, Wang H, Yu PS and Muntz RR (2006). Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowl Inf Syst 10(3): 265–294 CrossRefGoogle Scholar
  14. 14.
    Cost S, Salzberg S (1993) A weighted nearest neighbor algorithm for learning with symbolic features. Mach Learn 10:57.78 (PEBLS: Parallel Examplar-Based Learning System)Google Scholar
  15. 15.
    Cover T and Hart P (1967). Nearest neighbor pattern classification. IEEE Trans Inform Theory 13(1): 21–27 zbMATHCrossRefGoogle Scholar
  16. 16.
    Dasarathy BV (ed) (1991) Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society PressGoogle Scholar
  17. 17.
    Dempster AP, Laird NM and Rubin DB (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J Roy Stat Soc B 39: 1–38 zbMATHMathSciNetGoogle Scholar
  18. 18.
    Devroye L, Gyorfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York. ISBN 0-387-94618-7Google Scholar
  19. 19.
    Dhillon IS, Guan Y, Kulis B (2004) Kernel k-means: spectral clustering and normalized cuts. KDD 2004, pp 551–556Google Scholar
  20. 20.
    Dietterich TG (1997). Machine learning: Four current directions. AI Mag 18(4): 97–136 Google Scholar
  21. 21.
    Domingos P (1999) MetaCost: A general method for making classifiers cost-sensitive. In: Proceedings of the fifth international conference on knowledge discovery and data mining, pp 155–164Google Scholar
  22. 22.
    Domingos P and Pazzani M (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29: 103–130 zbMATHCrossRefGoogle Scholar
  23. 23.
    Fix E, Hodges JL, Jr (1951) Discriminatory analysis, nonparametric discrimination. USAF School of Aviation Medicine, Randolph Field, Tex., Project 21-49-004, Rept. 4, Contract AF41(128)-31, February 1951Google Scholar
  24. 24.
    Freund Y and Schapire RE (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J Comput Syst Sci 55(1): 119–139 zbMATHCrossRefMathSciNetGoogle Scholar
  25. 25.
    Friedman JH, Bentley JL, Finkel RA (1977) An algorithm for finding best matches in logarithmic time. ACM Trans. Math. Software 3, 209. Also available as Stanford Linear Accelerator Center Rep. SIX-PUB-1549, February 1975Google Scholar
  26. 26.
    Friedman JH, Kohavi R, Yun Y (1996) Lazy decision trees. In: Proceedings of the thirteenth national conference on artificial intelligence, San Francisco, CA. AAAI Press/MIT Press, pp. 717–724Google Scholar
  27. 27.
    Friedman N, Geiger D and Goldszmidt M (1997). Bayesian network classifiers. Mach Learn 29: 131–163 zbMATHCrossRefGoogle Scholar
  28. 28.
    Fung G and Stoeckel J (2007). SVM feature selection for classification of SPECT images of Alzheimer’s disease using spatial information. Knowl Inf Syst 11(2): 243–258 CrossRefGoogle Scholar
  29. 29.
    Gates GW (1972). The reduced nearest neighbor rule. IEEE Trans Inform Theory 18: 431–433 CrossRefGoogle Scholar
  30. 30.
    Golub GH, Van Loan CF (1983) Matrix computations. The Johns Hopkins University PressGoogle Scholar
  31. 31.
    Gondek D and Hofmann T (2007). Non-redundant data clustering. Knowl Inf Syst 12(1): 1–24 CrossRefGoogle Scholar
  32. 32.
    Han E (1999) Text categorization using weight adjusted k-nearest neighbor classification. PhD thesis, University of Minnesota, October 1999Google Scholar
  33. 33.
    Hand DJ and Yu K (2001). Idiots Bayes—not so stupid after all?. Int Stat Rev 69: 385–398 zbMATHCrossRefGoogle Scholar
  34. 34.
    Gray RM and Neuhoff DL (1998). Quantization. IEEE Trans Inform Theory 44(6): 2325–2384 zbMATHCrossRefMathSciNetGoogle Scholar
  35. 35.
    Hart P (1968). The condensed nearest neighbor rule. IEEE Trans Inform Theory 14: 515–516 CrossRefGoogle Scholar
  36. 36.
    Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of ACM SIGMOD international conference on management of data, pp 1–12Google Scholar
  37. 37.
    Hastie T and Tibshirani R (1996). Discriminant adaptive nearest neighbor classification. IEEE Trans Pattern Anal Mach Intell 18(6): 607–616 CrossRefGoogle Scholar
  38. 38.
    Friedman J, Hastie T and Tibshirani R (2000). Additive logistic regression: a statistical view of boosting with discussions. Ann Stat 28(2): 337–407 zbMATHCrossRefMathSciNetGoogle Scholar
  39. 39.
    Herbrich R, Graepel T, Obermayer K (2000) Rank boundaries for ordinal regression. Adv Mar Classif pp 115–132Google Scholar
  40. 40.
    Hu T and Sung SY (2006). Finding centroid clusterings with entropy-based criteria. Knowl Inf Syst 10(4): 505–514 CrossRefGoogle Scholar
  41. 41.
    Hunt EB, Marin J and Stone PJ (1966). Experiments in induction. Academic Press, New York Google Scholar
  42. 42.
    Inokuchi A, Washio T and Motoda H (2005). General framework for mining frequent subgraphs from labeled graphs. Fundament Inform 66(1-2): 53–82 zbMATHMathSciNetGoogle Scholar
  43. 43.
    Jain AK and Dubes RC (1988). Algorithms for clustering data. Prentice-Hall, Englewood Cliffs zbMATHGoogle Scholar
  44. 44.
    Jin R, Goswami A and Agrawal G (2006). Fast and exact out-of-core and distributed k-means clustering. Knowl Inf Syst 10(1): 17–40 CrossRefGoogle Scholar
  45. 45.
    Kobayashi M and Aono M (2006). Exploring overlapping clusters using dynamic re-scaling and sampling. Knowl Inf Syst 10(3): 295–313 CrossRefGoogle Scholar
  46. 46.
    Koga H, Ishibashi T and Watanabe T (2007). Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing. Knowl Inf Syst 12(1): 25–53 CrossRefGoogle Scholar
  47. 47.
    Kukar M (2006). Quality assessment of individual classifications in machine learning and data mining. Knowl Inf Syst 9(3): 364–384 CrossRefGoogle Scholar
  48. 48.
    Kuramochi M and Karypis G (2005). Gene Classification using Expression Profiles: A Feasibility Study. Int J Artif Intell Tools 14(4): 641–660 CrossRefGoogle Scholar
  49. 49.
    Langville AN and Meyer CD (2006). Google’s PageRank and beyond: the science of search engine rankings. Princeton University Press, Princeton zbMATHGoogle Scholar
  50. 50.
    Leung CW-k, Chan SC-f and Chung F-L (2006). A collaborative filtering framework based on fuzzy association rules and multiple-level similarity. Knowl Inf Syst 10(3): 357–381 CrossRefGoogle Scholar
  51. 51.
    Li T, Zhu S and Ogihara M (2006). Using discriminant analysis for multi-class classification: an experimental investigation. Knowl Inf Syst 10(4): 453–472 CrossRefGoogle Scholar
  52. 52.
    Liu B (2007). Web data mining: exploring hyperlinks, contents and usage Data. Springer, Heidelberg zbMATHGoogle Scholar
  53. 53.
    Lloyd SP (1957) Least squares quantization in PCM. Unpublished Bell Lab. Tech. Note, portions presented at the Institute of Mathematical Statistics Meeting Atlantic City, NJ, September 1957. Also, IEEE Trans Inform Theory (Special Issue on Quantization), vol IT-28, pp 129–137, March 1982Google Scholar
  54. 54.
    Leung CK-S, Khan QI, Li Z and Hoque T (2007). CanTree: a canonical-order tree for incremental frequent-pattern mining. Knowl Inf Syst 11(3): 287–311 CrossRefGoogle Scholar
  55. 55.
    McLachlan GJ (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture.. Appl Stat 36: 318–324 CrossRefGoogle Scholar
  56. 56.
    McLachlan GJ and Krishnan T (1997). The EM algorithm and extensions. Wiley, New York zbMATHGoogle Scholar
  57. 57.
    McLachlan GJ and Peel D (2000). Finite Mixture Models. Wiley, New York zbMATHGoogle Scholar
  58. 58.
    Messenger RC and Mandell ML (1972). A model search technique for predictive nominal scale multivariate analysis. J Am Stat Assoc 67: 768–772 CrossRefGoogle Scholar
  59. 59.
    Morishita S, Sese J (2000) Traversing lattice itemset with statistical metric pruning. In: Proceedings of PODS’00, pp 226–236Google Scholar
  60. 60.
    Olshen R (2001). A conversation with Leo Breiman. Stat Sci 16(2): 184–198 zbMATHCrossRefMathSciNetGoogle Scholar
  61. 61.
    Page L, Brin S, Motwami R, Winograd T (1999) The PageRank citation ranking: bringing order to the Web. Technical Report 1999–0120, Computer Science Department, Stanford UniversityGoogle Scholar
  62. 62.
    Quinlan JR (1979) Discovering rules by induction from large collections of examples. In: Michie D (ed), Expert systems in the micro electronic age. Edinburgh University Press, EdinburghGoogle Scholar
  63. 63.
    Quinlan R (1989) Unknown attribute values in induction. In: Proceedings of the sixth international workshop on machine learning, pp. 164–168Google Scholar
  64. 64.
    Quinlan JR (1993). C4.5: Programs for machine learning. Morgan Kaufmann Publishers, San Mateo Google Scholar
  65. 65.
    Reyzin L, Schapire RE (2006) How boosting the margin can also boost classifier complexity. In: Proceedings of the 23rd international conference on machine learning. Pittsburgh, PA, pp. 753–760Google Scholar
  66. 66.
    Ridgeway G, Madigan D and Richardson T (1998). Interpretable boosted naive Bayes classification. In: Agrawal, R, Stolorz, P, and Piatetsky-Shapiro, G (eds) Proceedings of the fourth international conference on knowledge discovery and data mining., pp 101–104. AAAI Press, Menlo Park Google Scholar
  67. 67.
    Schapire RE (1990). The strength of weak learnability. Mach Learn 5(2): 197–227 Google Scholar
  68. 68.
    Schapire RE, Freund Y, Bartlett P and Lee WS (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Ann Stat 26(5): 1651–1686 zbMATHCrossRefMathSciNetGoogle Scholar
  69. 69.
    Schapire RE and Singer Y (1999). Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3): 297–336 zbMATHCrossRefGoogle Scholar
  70. 70.
    Scholkopf B, Smola AJ (2002) Learning with kernels. MIT PressGoogle Scholar
  71. 71.
    Seidl T and Kriegel H (1998). Optimal multi-step k-nearest neighbor search. In: Tiwary, A and Franklin, M (eds) Proceedings of the 1998 ACM SIGMOD international conference on management of data, Seattle, Washington, United States, 1–4 June, 1998, pp 154–165. ACM Press, New York CrossRefGoogle Scholar
  72. 72.
    Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceedings of the 21st VLDB conference. pp. 407–419Google Scholar
  73. 73.
    Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. In: Proceedings of the KDD Workshop on Text MiningGoogle Scholar
  74. 74.
    Steinbach M and Kumar V (2007). Generalizing the notion of confidence. Knowl Inf Syst 12(3): 279–299 CrossRefGoogle Scholar
  75. 75.
    Tan P-N, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson Addison-WesleyGoogle Scholar
  76. 76.
    Tao D, Li X, Wu X, Hu W and Maybank SJ (2007). Supervised tensor learning. Knowl Inf Syst 13(1): 1–42 CrossRefGoogle Scholar
  77. 77.
    Thabtah FA, Cowling PI and Peng Y (2006). Multiple labels associative classification. Knowl Inf Syst 9(1): 109–129 CrossRefGoogle Scholar
  78. 78.
    Ting KM (2002). An instance-weighting method to induce cost-sensitive trees. IEEE Trans Knowl Data Eng 14: 659–665 CrossRefGoogle Scholar
  79. 79.
    Toussaint GT (2002) Proximity graphs for nearest neighbor decision rules: recent progress. In: Interface-2002, 34th symposium on computing and statistics (theme: Geoscience and Remote Sensing). Ritz-Carlton Hotel, Montreal, Canada, 17–20 April, 2002Google Scholar
  80. 80.
    Toussaint GT (2002) Open problems in geometric methods for instance-based learning. JCDCG 273–283Google Scholar
  81. 81.
    Tsang IW, Kwok JT and Cheung P-M (2005). Core vector machines: Fast SVM training on very large data sets. J Mach Learn Res 6: 363–392 MathSciNetGoogle Scholar
  82. 82.
    Uno T, Asai T, Uchida Y, Arimura H (2004) An efficient algorithm for enumerating frequent closed patterns in transaction databases. In: Proc. of the 7th international conference on discovery science. LNAI vol 3245, Springe, Heidelberg, pp 16–30Google Scholar
  83. 83.
    Vapnik V (1995). The nature of statistical learning theory. Springer, New York zbMATHGoogle Scholar
  84. 84.
    Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. pages 511–518, Kauai, HIGoogle Scholar
  85. 85.
    Washio T, Nakanishi K, Motoda H (2005) Association rules based on levelwise subspace clustering. In: Proceedings. of 9th European conference on principles and practice of knowledge discovery in databases. LNAI, vol 3721, pp. 692–700 Springer, HeidelbergGoogle Scholar
  86. 86.
    Wasserman S and Raust K (1994). Social network analysis. Cambridge University Press, Cambridge Google Scholar
  87. 87.
    Wettschereck D, Aha D and Mohri T (1997). A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms. Artif Intell Rev 11: 273–314 CrossRefGoogle Scholar
  88. 88.
    Wilson DL (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cyberne 2: 408–420 zbMATHCrossRefGoogle Scholar
  89. 89.
    Yang Q and Wu X (2006). 10 challenging problems in data mining research. Int J Inform Technol Decis Making 5(4): 597–604 CrossRefGoogle Scholar
  90. 90.
    Yan X, Han J (2002) gSpan: Graph-based substructure pattern mining. In: Proceedings of ICDM’02, pp 721–724Google Scholar
  91. 91.
    Yu PS, Li X, Liu B (2005) Adding the temporal dimension to search—a case study in publication search. In: Proceedings of Web Intelligence (WI’05)Google Scholar
  92. 92.
    Zhang J, Kang D-K, Silvescu A and Honavar V (2006). Learning accurate and concise naïve Bayes classifiers from attribute value taxonomies and data. Knowl Inf Syst 9(2): 157–179CrossRefGoogle Scholar

Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • Xindong Wu
    • 1
    Email author
  • Vipin Kumar
    • 2
  • J. Ross Quinlan
    • 3
  • Joydeep Ghosh
    • 4
  • Qiang Yang
    • 5
  • Hiroshi Motoda
    • 6
  • Geoffrey J. McLachlan
    • 7
  • Angus Ng
    • 8
  • Bing Liu
    • 9
  • Philip S. Yu
    • 10
  • Zhi-Hua Zhou
    • 11
  • Michael Steinbach
    • 12
  • David J. Hand
    • 13
  • Dan Steinberg
    • 14
  1. 1.Department of computer ScienceUniversity of VermontBurlingtonUSA
  2. 2.Department of Computer Science and EngineeringUniversity of MinnesotaMinneapolisUSA
  3. 3.Rulequest Research Pty LtdSt IvesAustralia
  4. 4.Department of Electrical and Computer EngineeringUniversity of Texas at AustinAustinUSA
  5. 5.Department of Computer ScienceHong Kong University of Science and TechnologyHonkongChina
  6. 6.AFOSR/AOARD and Osaka UniversityMinato-ku, TokyoJapan
  7. 7.Department of MathematicsThe University of QueenslandBrisbaneAustralia
  8. 8.School of MedicineGriffith UniversityBrisbaneAustralia
  9. 9.Department of Computer ScienceUniversity of Illinois at ChicagoChicagoUSA
  10. 10.IBM T. J. Watson Research CenterHawthorneUSA
  11. 11.National Key Laboratory for Novel Software TechnologyNanjing UniversityNanjingChina
  12. 12.Department of Computer Science and EngineeringUniversity of MinnesotaMinneapolisUSA
  13. 13.Department of MathematicsImperial CollegeLondonUK
  14. 14.Salford SystemsSan DiegoUSA

Personalised recommendations