Small-Sample Classification

  • Lori A. Dalton
  • Edward R. Dougherty
Part of the Springer Series in Materials Science book series (SSMATERIALS, volume 225)


In a number of application areas where one wishes to classify objects, such as materials and genomics, sample sizes are often small owing to the expense or unavailability of data points. Many classifier design procedures work well with large samples but are ineffectual or, at best, problematic with small samples. Worse yet, small samples make it difficult, if not impossible, to guarantee an accurate error estimate without modeling assumptions, and without a good error estimate a classifier is useless. The present chapter discusses the problem of small-sample error estimation and how modeling assumptions can be used to obtain bounds on error estimation accuracy. Given the necessity of modeling assumptions, we go on to discuss minimum-mean-square-error (MMSE) error estimation and the design of optimal classifiers relative to prior knowledge and data in a Bayesian context.


LDA, Linear Discriminant Analysis, Error Estimator, Classification Rule, True Error, Classifier Design
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. The Ohio State University, Columbus, USA
  2. Texas A&M University, College Station, USA
