Data Sets and Proper Statistical Analysis of Data Mining Techniques

  • Salvador García
  • Julián Luengo
  • Francisco Herrera
Part of the Intelligent Systems Reference Library book series (ISRL, volume 72)


Presenting and analyzing a Data Mining technique usually involves applying it to a data set from the relevant domain. Fortunately, many well-known data sets are available to researchers and are widely used to check the performance of the technique under consideration. Many of the subsequent sections of this book include a practical experimental comparison of the techniques they describe, as an illustration of this process. Such comparisons require a clear test bed so that the reader can replicate and understand the analysis and the conclusions obtained. Section 2.1 gives an overview of the data sets used to study the algorithms presented as representative in each section, indicating their characteristics, sources and availability. It also covers the partitioning procedure, how it is expected to alleviate the problems associated with validating any supervised method, and the performance measures used in the rest of the book. Section 2.2 surveys the statistical techniques most commonly required in the literature to draw meaningful and correct conclusions, and gives the steps needed to use and interpret the outcome of a statistical test correctly.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Salvador García (1), corresponding author
  • Julián Luengo (2)
  • Francisco Herrera (3)
  1. Department of Computer Science, University of Jaén, Jaén, Spain
  2. Department of Civil Engineering, University of Burgos, Burgos, Spain
  3. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
