Tournament screening cum EBIC for feature selection with high-dimensional feature spaces

Chen, ZeHua; Chen, JiaHua

doi:10.1007/s11425-009-0089-4

Published: 23 July 2009

Volume 52, pages 1327–1341, (2009)
Science in China Series A: Mathematics

ZeHua Chen¹ &
JiaHua Chen²

Abstract

Feature selection with a relatively small sample size and an extremely high-dimensional feature space is common in many areas of contemporary statistics. The high dimensionality of the feature space causes serious difficulties: (i) the sample correlations between features become high even if the features are stochastically independent; (ii) the computation becomes intractable. These difficulties make conventional approaches either inapplicable or inefficient. Reducing the dimensionality of the feature space and then applying low-dimensional approaches appears to be the only feasible way to tackle the problem. Along this line, we develop in this article a tournament screening cum EBIC approach for feature selection with high-dimensional feature spaces. The tournament screening procedure mimics that of a tournament. It is shown theoretically that tournament screening has the sure screening property, a necessary property that any valid screening procedure should satisfy. Numerical studies demonstrate that the tournament screening cum EBIC approach enjoys desirable properties, such as a higher positive selection rate and a lower false discovery rate than other approaches.
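
The two stages named in the abstract can be illustrated with a rough Python sketch. The abstract gives no algorithmic details, so everything below is an assumption made for illustration: features are randomly split into groups, each group keeps its top few features ranked by absolute marginal correlation with the response (a simple stand-in, not necessarily the within-group selection rule used in the article), and rounds repeat until few enough features remain. The ebic function uses the commonly stated Gaussian linear model form of the extended Bayesian information criterion, n log(RSS/n) + k log n + 2 gamma log C(p, k). All function and parameter names (ebic, abs_corr, tournament_screen, group_size, winners_per_group, target) are hypothetical.

```python
import math
import random


def ebic(rss, n, k, p, gamma=1.0):
    """EBIC of a Gaussian linear model using k of p features (assumed form).

    rss is the residual sum of squares (must be positive), n the sample size.
    gamma = 0 reduces to the ordinary BIC.
    """
    # log of the binomial coefficient C(p, k), computed stably via lgamma
    log_models = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_models


def abs_corr(x, y):
    """Absolute sample correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def tournament_screen(columns, y, group_size, winners_per_group, target, rng):
    """Tournament-style screening loop (illustrative stand-in scoring).

    columns maps a feature index to its list of observed values.
    Returns the sorted indices of the features that survive all rounds.
    """
    survivors = list(columns)
    while len(survivors) > target:
        rng.shuffle(survivors)  # random draw of groups for this round
        next_round = []
        for start in range(0, len(survivors), group_size):
            group = survivors[start:start + group_size]
            # Rank the group by the stand-in score and keep the winners.
            group.sort(key=lambda j: abs_corr(columns[j], y), reverse=True)
            next_round.extend(group[:winners_per_group])
        if len(next_round) == len(survivors):
            break  # no feature was eliminated; avoid looping forever
        survivors = next_round
    return sorted(survivors)
```

In this sketch the screened set would then be searched for the submodel with the smallest ebic value, with gamma controlling how strongly the size of the model space is penalized. Because a round can overshoot, the loop may return fewer than target features.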

Author information

Authors and Affiliations

Department of Statistics & Applied Probability, National University of Singapore, 3 Science Drive 2, 117543, Singapore, Singapore
ZeHua Chen
Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada
JiaHua Chen

Corresponding author

Correspondence to ZeHua Chen.

Additional information

Dedicated to Professor Zhidong Bai on the occasion of his 65th birthday

ZeHua Chen was supported by the Singapore Ministry of Education's ACRF Tier 1 (Grant No. R-155-000-065-112). JiaHua Chen was supported by the Natural Sciences and Engineering Research Council of Canada and MITACS, Canada.

About this article

Cite this article

Chen, Z., Chen, J. Tournament screening cum EBIC for feature selection with high-dimensional feature spaces. Sci. China Ser. A-Math. 52, 1327–1341 (2009). https://doi.org/10.1007/s11425-009-0089-4

Received: 19 October 2008
Accepted: 22 April 2009
Published: 23 July 2009
Issue Date: June 2009
DOI: https://doi.org/10.1007/s11425-009-0089-4
