Bayesian variable selection with sparse and correlation priors for high-dimensional data analysis

Yang, Aijun; Jiang, Xuejun; Shu, Lianjie; Lin, Jinguan

doi:10.1007/s00180-016-0665-3

Original Paper
Published: 06 June 2016

Volume 32, pages 127–143, (2017)

Computational Statistics

Aijun Yang^1,2, Xuejun Jiang^3, Lianjie Shu^4 & Jinguan Lin^5


Abstract

The main challenge in working with gene expression microarrays is that the sample size is small relative to the number of variables (genes). In many studies, the main goal is to find a small subset of genes that best differentiates between types of cancer, enabling simpler and cheaper diagnostic arrays. In this paper, a sparse Bayesian variable selection method in a probit model is proposed for gene selection and classification. We assign a sparse prior to the regression parameters and perform variable selection by indexing the covariates of the model with a binary vector. The correlation prior assigned to the binary vector is able to distinguish between models of the same size. The performance of the proposed method is demonstrated on one simulated data set and two well-known real data sets, and the results show that our method is comparable with other existing methods in variable selection and classification.
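
As a rough illustration of what "indexing the covariates of the model with a binary vector" can mean in a probit model, the sketch below computes class probabilities using only the covariates whose indicator is 1. It is not the authors' implementation: the function names, the toy data, and the coefficient values are hypothetical, and the sparse and correlation priors are not shown because the abstract does not specify their form.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF (the probit link), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probabilities(X, beta, gamma):
    """Return P(y_i = 1) = Phi(sum_j gamma_j * beta_j * x_ij) for each row of X.

    gamma is a binary inclusion vector: gamma_j = 1 keeps covariate j in the
    model, gamma_j = 0 drops it regardless of the value of beta_j.
    """
    probs = []
    for row in X:
        # Linear predictor restricted to the selected covariates.
        eta = sum(g * b * x for g, b, x in zip(gamma, beta, row))
        probs.append(std_normal_cdf(eta))
    return probs


# Hypothetical toy data: 2 samples, 3 covariates (genes).
X = [[1.0, 0.5, -2.0],
     [0.0, 1.5, 3.0]]
beta = [0.8, -0.4, 5.0]   # illustrative coefficients, not estimates
gamma = [1, 1, 0]         # third covariate is excluded from the model

probs = probit_probabilities(X, beta, gamma)
print(probs)
```

Because the third indicator is 0, the large third coefficient has no effect on the probabilities. In a full Bayesian treatment, both the indicators and the coefficients would be sampled from their posterior rather than fixed as they are here.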



Acknowledgments

The authors gratefully acknowledge the financial support of the Natural Science Foundation of China (11501294, 11101432, 11571073), the China Postdoctoral Science Foundation (2015M580374), and the Natural Science Foundation of Jiangsu (BK20141326).

Author information

Authors and Affiliations

College of Economics and Management, Nanjing Forestry University, Nanjing, China
Aijun Yang
School of Economics and Management, Southeast University, Nanjing, China
Aijun Yang
Department of Mathematics, South University of Science and Technology of China, Shenzhen, China
Xuejun Jiang
Faculty of Business Administration, University of Macau, Macau, China
Lianjie Shu
Department of Mathematics, Southeast University, Nanjing, China
Jinguan Lin


Corresponding author

Correspondence to Xuejun Jiang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (rar 9 KB)


About this article

Cite this article

Yang, A., Jiang, X., Shu, L. et al. Bayesian variable selection with sparse and correlation priors for high-dimensional data analysis. Comput Stat 32, 127–143 (2017). https://doi.org/10.1007/s00180-016-0665-3


Received: 17 February 2015
Accepted: 21 May 2016
Published: 06 June 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s00180-016-0665-3
