Accurate Bayesian Data Classification Without Hyperparameter Cross-Validation

Sheikh, Mansoor; Coolen, A. C. C.

doi:10.1007/s00357-019-09316-6

Published: 02 April 2019

Volume 37, pages 277–297 (2020)

Journal of Classification

Abstract

We extend the standard Bayesian multivariate Gaussian generative data classifier by considering a generalization of the conjugate, normal-Wishart prior distribution, and by deriving the hyperparameters analytically via evidence maximization. The behaviour of the optimal hyperparameters is explored in the high-dimensional data regime. The classification accuracy of the resulting generalized model is competitive with state-of-the-art Bayesian discriminant analysis methods, but without the usual computational burden of cross-validation.

Notes

This is the case for rare diseases, or when obtaining tissue material is nontrivial or expensive, but measuring extensive numbers of features in such material (e.g. gene expression data) is relatively simple and cheap.
While \(\varrho(\lambda)\) is not a good estimator for \(\varrho_{0}(\lambda)\), Jonsson (1982) showed that, in contrast, \(\int\!\mathrm{d}\lambda\,\varrho(\lambda)\lambda\) is a good estimate of \(\int\!\mathrm{d}\lambda\,\varrho_{0}(\lambda)\lambda\); the bulk spectrum becomes more biased as \(d/n\) increases, but the sample eigenvalue average does not.
MATLAB 8.0, The MathWorks, Inc., Natick, Massachusetts, United States.
Leave-one-out cross-validation was run on an Intel i5-4690 x64-based processor (CPU speed 3.50 GHz, 32 GB RAM). As the data dimension increases above 30,000, RAM storage considerations become an issue on typical PCs.
http://archive.ics.uci.edu/ml/index.php
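
The note above citing Jonsson (1982) can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes zero-mean Gaussian data with identity true covariance (so every true eigenvalue equals 1), and the values of d and n are arbitrary illustrative choices. It uses only traces, since for a symmetric matrix C the eigenvalue mean is tr(C)/d and the mean squared eigenvalue is tr(C²)/d.

```python
import random

def sample_covariance(d, n, rng):
    """Sample covariance of n draws from N(0, I_d), treating the mean as known (zero)."""
    C = [[0.0] * d for _ in range(d)]
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for i in range(d):
            xi, row = x[i], C[i]
            for j in range(d):
                row[j] += xi * x[j] / n
    return C

def eigenvalue_moments(C):
    """Mean and variance of the eigenvalues of a symmetric matrix, from traces only."""
    d = len(C)
    mean = sum(C[i][i] for i in range(d)) / d           # tr(C) / d
    second = sum(v * v for row in C for v in row) / d   # tr(C^2) / d = sum_ij C_ij^2 / d
    return mean, second - mean ** 2

rng = random.Random(0)   # fixed seed, illustrative only
d = 30                   # hypothetical dimension
results = {}
for n in (1500, 60):     # small d/n versus large d/n
    results[n] = eigenvalue_moments(sample_covariance(d, n, rng))
    mean, var = results[n]
    print(f"d/n = {d / n:.2f}: eigenvalue mean = {mean:.3f}, eigenvalue variance = {var:.3f}")
```

Under these assumptions the true spectrum has mean 1 and variance 0. The sample eigenvalue mean should stay close to 1 for both values of d/n, while the spread of the sample eigenvalues grows roughly like d/n, which is the broadening of the bulk spectrum that the note refers to.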

References

Bensmail, H., & Celeux, G. (1996). Regularized Gaussian discriminant analysis through eigenvalue decomposition. Journal of the American Statistical Association, 91(436), 1743–1748.
Berger, J.O., Bernardo, J.M., et al. (1992). On the development of reference priors. Bayesian Statistics, 4(4), 35–60.
Brown, P.J., Fearn, T., Haque, M. (1999). Discrimination with many variables. Journal of the American Statistical Association, 94(448), 1320–1329.
Coolen, A.C.C., Barrett, J.E., Paga, P., Perez-Vicente, C.J. (2017). Replica analysis of overfitting in regression models for time-to-event data. Journal of Physics A: Mathematical and Theoretical, 50, 375001.
Efron, B., & Morris, C.N. (1977). Stein’s paradox in statistics. New York: WH Freeman.
Friedman, J.H. (1989). Regularized discriminant analysis. Journal of the American Statistical Association, 84(405), 165–175.
Geisser, S. (1964). Posterior odds for multivariate normal classifications. Journal of the Royal Statistical Society. Series B (Methodological), 26(1), 69–76.
Haff, L. (1980). Empirical Bayes estimation of the multivariate normal covariance matrix. The Annals of Statistics, 8(3), 586–597.
Hinton, G.E., & Salakhutdinov, R.R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24(6), 417.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
James, W., & Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 361–379).
Jonsson, D. (1982). Some limit theorems for the eigenvalues of a sample covariance matrix. Journal of Multivariate Analysis, 12(1), 1–38.
Keehn, D.G. (1965). A note on learning for Gaussian properties. IEEE Transactions on Information Theory, 11(1), 126–132.
Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.
MacKay, D.J. (1999). Comparison of approximate methods for handling hyperparameters. Neural Computation, 11(5), 1035–1068.
Morey, L.C., & Agresti, A. (1984). The measurement of classification agreement: an adjustment to the Rand statistic for chance agreement. Educational and Psychological Measurement, 44(1), 33–37.
Raudys, S., & Young, D.M. (2004). Results in statistical discriminant analysis: a review of the former Soviet Union literature. Journal of Multivariate Analysis, 89(1), 1–35.
Shalabi, A., Inoue, M., Watkins, J., De Rinaldis, E., Coolen, A.C. (2016). Bayesian clinical classification from high-dimensional data: signatures versus variability. Statistical Methods in Medical Research, 0962280216628901.
Srivastava, S., & Gupta, M.R. (2006). Distribution-based Bayesian minimum expected risk for discriminant analysis. In 2006 IEEE international symposium on information theory (pp. 2294–2298). IEEE.
Srivastava, S., Gupta, M.R., Frigyik, B.A. (2007). Bayesian quadratic discriminant analysis. Journal of Machine Learning Research, 8(6), 1277–1305.
Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the third Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 197–206).
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.

Acknowledgements

This work was supported by the Biotechnology and Biological Sciences Research Council (UK) and by GlaxoSmithKline Research and Development Ltd. Many thanks to James Barrett for his support.

Author information

Authors and Affiliations

Institute for Mathematical and Molecular Biomedicine (IMMB), King’s College London, Hodgkin Building 4N/5N (Guy’s Campus), London, SE1 1UL, UK
Mansoor Sheikh & A. C. C. Coolen
Department of Mathematics, King’s College London, The Strand, London, WC2R 2LS, UK
Mansoor Sheikh
Saddle Point Science, London, UK
A. C. C. Coolen

Corresponding author

Correspondence to Mansoor Sheikh.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sheikh, M., Coolen, A.C.C. Accurate Bayesian Data Classification Without Hyperparameter Cross-Validation. J Classif 37, 277–297 (2020). https://doi.org/10.1007/s00357-019-09316-6

Published: 02 April 2019
Issue Date: July 2020
DOI: https://doi.org/10.1007/s00357-019-09316-6
