Accurate Bayesian Data Classification Without Hyperparameter Cross-Validation

  • Mansoor Sheikh
  • A. C. C. Coolen


We extend the standard Bayesian multivariate Gaussian generative data classifier by considering a generalization of the conjugate normal-Wishart prior distribution and by deriving the hyperparameters analytically via evidence maximization. The behaviour of the optimal hyperparameters is explored in the high-dimensional data regime. The classification accuracy of the resulting generalized model is competitive with state-of-the-art Bayesian discriminant analysis methods, but without the usual computational burden of cross-validation.


Hyperparameters, Evidence maximization, Bayesian classification, High-dimensional data



This work was supported by the Biotechnology and Biological Sciences Research Council (UK) and by GlaxoSmithKline Research and Development Ltd. Many thanks to James Barrett for his support.



Copyright information

© The Classification Society 2019

Authors and Affiliations

  1. Institute for Mathematical and Molecular Biomedicine (IMMB), King’s College London, London, UK
  2. Department of Mathematics, King’s College London, London, UK
  3. Saddle Point Science, London, UK
