
Accurate Bayesian Data Classification Without Hyperparameter Cross-Validation

Published in: Journal of Classification

Abstract

We extend the standard Bayesian multivariate Gaussian generative data classifier by considering a generalization of the conjugate, normal-Wishart prior distribution, and by deriving the hyperparameters analytically via evidence maximization. The behaviour of the optimal hyperparameters is explored in the high-dimensional data regime. The classification accuracy of the resulting generalized model is competitive with state-of-the-art Bayesian discriminant analysis methods, but without the usual computational burden of cross-validation.
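For orientation, the standard Bayesian multivariate Gaussian generative classifier that the paper generalizes can be sketched as follows. This is an illustrative plug-in version with maximum-likelihood class means and covariances and a small ridge term for numerical stability (an assumption for this sketch); the paper's actual method instead integrates over a generalized normal-Wishart prior whose hyperparameters are set by evidence maximization.

```python
import numpy as np

def fit_gaussian_classifier(X, y):
    """Fit per-class means, covariances and priors (maximum-likelihood plug-in)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        # small ridge term keeps the covariance invertible in high dimensions
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        params[c] = (mu, cov, len(Xc) / len(X))
    return params

def predict(params, X):
    """Assign each sample to the class with the highest posterior log-score."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, cov, prior = params[c]
        diff = X - mu
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        scores.append(-0.5 * (logdet + maha) + np.log(prior))
    return np.array(classes)[np.argmax(np.column_stack(scores), axis=1)]
```

In the plug-in version above, regularization strength (here the fixed ridge) would normally be tuned by cross-validation; the paper's contribution is to remove exactly that step by choosing hyperparameters analytically.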


Notes

  1. This is the case for rare diseases, or when obtaining tissue material is nontrivial or expensive, but measuring extensive numbers of features in such material (e.g. gene expression data) is relatively simple and cheap.

  2. While ρ(λ) is not a good estimator of ρ0(λ), Jonsson (1982) showed that, in contrast, \(\int\!\mathrm{d}\lambda\,\rho(\lambda)\lambda\) is a good estimate of \(\int\!\mathrm{d}\lambda\,\rho_{0}(\lambda)\lambda\): the bulk spectrum becomes more biased as d/n increases, but the sample eigenvalue average does not.

  3. MATLAB 8.0, The MathWorks, Inc., Natick, Massachusetts, United States.

  4. Leave-one-out cross-validation using an Intel i5-4690 x64-based processor, CPU speed of 3.50GHz, 32GB RAM. As the data dimension increases above 30,000, RAM storage considerations become an issue on typical PCs.

  5. http://archive.ics.uci.edu/ml/index.php
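The distinction drawn in Note 2 can be illustrated numerically: with a true identity covariance and d/n = 2, individual sample eigenvalues are strongly biased (the sample covariance has rank at most n, so the spectrum spreads and many eigenvalues vanish), yet their average, which equals tr(S)/d, remains an accurate estimate of the true average eigenvalue. The following is a self-contained sketch, not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 200, 100                     # undersampled regime: d/n = 2
X = rng.standard_normal((n, d))     # samples from N(0, I_d): true eigenvalues all 1
S = X.T @ X / n                     # sample covariance matrix
eig = np.linalg.eigvalsh(S)

# The bulk spectrum is badly biased: rank(S) <= n, so at least d - n
# eigenvalues are (numerically) zero, while the largest is far above 1.
print("largest eigenvalue:", eig.max())    # well above the true value 1
print("smallest eigenvalue:", eig.min())   # numerically zero

# Yet the eigenvalue average equals tr(S)/d and remains unbiased.
print("mean eigenvalue:", eig.mean())      # close to 1
```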

References

  • Bensmail, H., & Celeux, G. (1996). Regularized Gaussian discriminant analysis through eigenvalue decomposition. Journal of the American Statistical Association, 91 (436), 1743–1748.


  • Berger, J.O., Bernardo, J.M., et al. (1992). On the development of reference priors. Bayesian Statistics, 4(4), 35–60.


  • Brown, P.J., Fearn, T., Haque, M. (1999). Discrimination with many variables. Journal of the American Statistical Association, 94(448), 1320–1329.


  • Coolen, A.C.C., Barrett, J.E., Paga, P., Perez-Vicente, C.J. (2017). Replica analysis of overfitting in regression models for time-to-event data. Journal of Physics A: Mathematical and Theoretical, 50, 375001.


  • Efron, B., & Morris, C.N. (1977). Stein’s paradox in statistics. New York: WH Freeman.


  • Friedman, J.H. (1989). Regularized discriminant analysis. Journal of the American Statistical Association, 84(405), 165–175.


  • Geisser, S. (1964). Posterior odds for multivariate normal classifications. Journal of the Royal Statistical Society. Series B (Methodological), 26(1), 69–76.


  • Haff, L. (1980). Empirical Bayes estimation of the multivariate normal covariance matrix. The Annals of Statistics, 8(3), 586–597.


  • Hinton, G.E., & Salakhutdinov, R.R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.


  • Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417.


  • Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.


  • James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361–379).

  • Jonsson, D. (1982). Some limit theorems for the eigenvalues of a sample covariance matrix. Journal of Multivariate Analysis, 12(1), 1–38.


  • Keehn, D.G. (1965). A note on learning for Gaussian properties. IEEE Transactions on Information Theory, 11(1), 126–132.


  • Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.


  • MacKay, D.J. (1999). Comparison of approximate methods for handling hyperparameters. Neural Computation, 11(5), 1035–1068.


  • Morey, L.C., & Agresti, A. (1984). The measurement of classification agreement: an adjustment to the Rand statistic for chance agreement. Educational and Psychological Measurement, 44(1), 33–37.


  • Raudys, S., & Young, D.M. (2004). Results in statistical discriminant analysis: a review of the former Soviet Union literature. Journal of Multivariate Analysis, 89(1), 1–35.


  • Shalabi, A., Inoue, M., Watkins, J., De Rinaldis, E., Coolen, A.C. (2016). Bayesian clinical classification from high-dimensional data: signatures versus variability. Statistical Methods in Medical Research, 0962280216628901.

  • Srivastava, S., & Gupta, M.R. (2006). Distribution-based Bayesian minimum expected risk for discriminant analysis. In 2006 IEEE international symposium on information theory (pp. 2294–2298). IEEE.

  • Srivastava, S., Gupta, M.R., Frigyik, B.A. (2007). Bayesian quadratic discriminant analysis. Journal of Machine Learning Research, 8(6), 1277–1305.


  • Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the third Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 197–206).

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.



Acknowledgements

This work was supported by the Biotechnology and Biological Sciences Research Council (UK) and by GlaxoSmithKline Research and Development Ltd. Many thanks to James Barrett for his support.

Author information


Corresponding author

Correspondence to Mansoor Sheikh.


Cite this article

Sheikh, M., Coolen, A.C.C. Accurate Bayesian Data Classification Without Hyperparameter Cross-Validation. J Classif 37, 277–297 (2020). https://doi.org/10.1007/s00357-019-09316-6
