Accurate Bayesian Data Classification Without Hyperparameter Cross-Validation


We extend the standard Bayesian multivariate Gaussian generative data classifier by considering a generalization of the conjugate, normal-Wishart prior distribution, and by deriving the hyperparameters analytically via evidence maximization. The behaviour of the optimal hyperparameters is explored in the high-dimensional data regime. The classification accuracy of the resulting generalized model is competitive with state-of-the art Bayesian discriminant analysis methods, but without the usual computational burden of cross-validation.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4


  1. 1.

    This is the case for rare diseases, or when obtaining tissue material is nontrivial or expensive, but measuring extensive numbers of features in such material (e.g. gene expression data) is relatively simple and cheap.

  2. 2.

    While ϱ(λ) is not a good estimator for ϱ0(λ), Jonsson (1982) showed that in contrast \(\int \!\mathrm {d}\lambda \rho (\lambda )\lambda \) is a good estimate of \(\int \!\mathrm {d}\lambda \rho _{0}(\lambda )\lambda \); the bulk spectrum becomes more biased as d/n increases, but the sample eigenvalue average does not.

  3. 3.

    MATLAB 8.0, The MathWorks, Inc., Natick, Massachusetts, United States.

  4. 4.

    Leave-one-out cross-validation using an Intel i5-4690 x64-based processor, CPU speed of 3.50GHz, 32GB RAM. As the data dimension increases above 30,000, RAM storage considerations become an issue on typical PCs.

  5. 5.


  1. Bensmail, H., & Celeux, G. (1996). Regularized Gaussian discriminant analysis through eigenvalue decomposition. Journal of the American Statistical Association, 91 (436), 1743–1748.

    MathSciNet  Article  Google Scholar 

  2. Berger, J.O., Bernardo, J.M., et al. (1992). On the development of reference priors. Bayesian Statistics, 4(4), 35–60.

    MathSciNet  Google Scholar 

  3. Brown, P.J., Fearn, T., Haque, M. (1999). Discrimination with many variables. Journal of the American Statistical Association, 94(448), 1320–1329.

    MathSciNet  Article  Google Scholar 

  4. Coolen, A.C.C., Barrett, J.E., Paga, P., Perez-Vicente, C.J. (2017). Replica analysis of overfitting in regression models for time-to-event data. Journal of Physics A: Mathematical and Theoretical, 50, 375001.

    MathSciNet  Article  Google Scholar 

  5. Efron, B., & Morris, C.N. (1977). Stein’s paradox in statistics. New York: WH Freeman.

    Google Scholar 

  6. Friedman, J.H. (1989). Regularized discriminant analysis. Journal of the American statistical Association, 84(405), 165–175.

    MathSciNet  Article  Google Scholar 

  7. Geisser, S. (1964). Posterior odds for multivariate normal classifications. Journal of the Royal Statistical Society. Series B (Methodological), 26(1), 69–76.

    MathSciNet  Article  Google Scholar 

  8. Haff, L. (1980). Empirical Bayes estimation of the multivariate normal covariance matrix. The Annals of Statistics, 8(3), 586–597.

    MathSciNet  Article  Google Scholar 

  9. Hinton, G.E., & Salakhutdinov, RR. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.

    MathSciNet  Article  Google Scholar 

  10. Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417.

    Article  Google Scholar 

  11. Hubert, L, & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

    Article  Google Scholar 

  12. James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361–379).

  13. Jonsson, D. (1982). Some limit theorems for the eigenvalues of a sample covariance matrix. Journal of Multivariate Analysis, 12(1), 1–38.

    MathSciNet  Article  Google Scholar 

  14. Keehn, D.G. (1965). A note on learning for Gaussian properties. IEEE Transactions on Information Theory, 11(1), 126–132.

    MathSciNet  Article  Google Scholar 

  15. Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.

    MathSciNet  Article  Google Scholar 

  16. MacKay, D.J. (1999). Comparison of approximate methods for handling hyperparameters. Neural Computation, 11(5), 1035–1068.

    Article  Google Scholar 

  17. Morey, LC, & Agresti, A. (1984). The measurement of classification agreement: an adjustment to the Rand statistic for chance agreement. Educational and Psychological Measurement, 44(1), 33–7.

    Article  Google Scholar 

  18. Raudys, S., & Young, D.M. (2004). Results in statistical discriminant analysis: a review of the former Soviet Union literature. Journal of Multivariate Analysis, 89(1), 1–35.

    MathSciNet  Article  Google Scholar 

  19. Shalabi, A., Inoue, M., Watkins, J., De Rinaldis, E., Coolen, A.C. (2016). Bayesian clinical classification from high-dimensional data: signatures versus variability. Statistical Methods in Medical Research, 0962280216628901.

  20. Srivastava, S., & Gupta, M.R. (2006). Distribution-based Bayesian minimum expected risk for discriminant analysis. In 2006 IEEE international symposium on information theory (pp. 2294–2298): IEEE.

  21. Srivastava, S., Gupta, M.R., Frigyik, B.A. (2007). Bayesian quadratic discriminant analysis. Journal of Machine Learning Research, 8(6), 1277–1305.

    MathSciNet  MATH  Google Scholar 

  22. Stein, C., & et al. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the third Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 197–206).

  23. Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.

    MathSciNet  Article  Google Scholar 

Download references


This work was supported by the Biotechnology and Biological Sciences Research Council (UK) and by GlaxoSmithKline Research and Development Ltd. Many thanks to James Barrett for his support.

Author information



Corresponding author

Correspondence to Mansoor Sheikh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Sheikh, M., Coolen, A.C.C. Accurate Bayesian Data Classification Without Hyperparameter Cross-Validation. J Classif 37, 277–297 (2020).

Download citation


  • Hyperparameters
  • Evidence maximization
  • Bayesian classification
  • High-dimensional data