A Primer of Statistical Methods for Classification

  • Rajarshi Dey
  • Madhuri S. Mulekar
Part of the STEAM-H: Science, Technology, Engineering, Agriculture, Mathematics & Health book series (STEAM)


Classification techniques are commonly used by scientists and businesses alike for decision-making. They assign objects (or information) to pre-defined groups (or classes) using known characteristics; for example, emails can be classified as real or spam using information in the subject field. Here we describe two soft and four hard classifiers popularly used by statisticians in practice. To demonstrate their application, two simulated and three real-life datasets are used to develop classification criteria. The results of the different classifiers are compared using the misclassification rate and an uncertainty measure.
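
As a minimal sketch of the misclassification rate named above, the following Python snippet applies it to the email example. The data, the 0.5 threshold, and the helper names `misclassification_rate` and `harden` are illustrative assumptions, not taken from the chapter. The sketch also assumes the usual distinction in which a soft classifier outputs estimated class probabilities while a hard classifier outputs class labels directly.

```python
def misclassification_rate(true_labels, predicted_labels):
    """Fraction of objects assigned to the wrong class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one label is required")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def harden(spam_probabilities, threshold=0.5):
    """Turn soft (probability) output into hard class labels.

    The 0.5 threshold is an assumption made for this illustration.
    """
    return ["spam" if p >= threshold else "real" for p in spam_probabilities]


# Hypothetical data: true classes of five emails and a soft classifier's
# estimated probability that each one is spam.
truth = ["spam", "real", "real", "spam", "real"]
soft_output = [0.9, 0.2, 0.6, 0.4, 0.1]

hard_output = harden(soft_output)
rate = misclassification_rate(truth, hard_output)
print(hard_output, rate)
```

Here two of the five hypothetical emails end up in the wrong class, so the rate is 2/5. The same function could score any of the classifiers against a common test set, which is what makes it usable as a shared comparison criterion.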



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Mathematics and Statistics, University of South Alabama, Mobile, USA
