Statistics in Biosciences, Volume 7, Issue 2, pp 262–281

An Adaptive Genetic Association Test Using Double Kernel Machines

  • Xiang Zhan
  • Michael P. Epstein
  • Debashis Ghosh


Recently, gene set-based approaches have become very popular in gene expression profiling studies for assessing how genetic variants are related to disease outcomes. Since most genes are not differentially expressed, existing pathway tests that consider all genes within a pathway suffer from considerable noise and power loss. Moreover, for a differentially expressed pathway, it is of interest to select the important genes that drive the effect of the pathway. In this article, we propose an adaptive association test using double kernel machines (DKM), which can both select important genes within the pathway and test for the overall genetic pathway effect. The DKM procedure first uses the garrote kernel machine test for subset selection and then the least squares kernel machine test for testing the effect of the selected subset of genes. An appealing feature of the kernel machine framework is that it provides a flexible and unified method for multi-dimensional modeling of the genetic pathway effect, allowing for both parametric and nonparametric components. The DKM approach is illustrated with applications to simulated data and to data from a neuroimaging genetics study.
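The two-stage idea described above (select a subset of genes, then test the selected subset) can be illustrated with a minimal sketch. This is not the authors' DKM procedure: the selection step below ranks genes by a marginal Gaussian-kernel score statistic as a simplified stand-in for the garrote kernel machine test, the null model contains only an intercept, and significance is assessed by permuting residuals and repeating both stages so that the p-value accounts for the selection. The function names (`sq_dists`, `score_stat`, `two_stage`, `dkm_like_test`) and the parameters `keep`, `rho`, and `n_perm` are hypothetical and chosen for illustration only.

```python
import numpy as np

def sq_dists(Z):
    """Per-gene squared distance matrices, shape (p, n, n)."""
    cols = Z.T
    return (cols[:, :, None] - cols[:, None, :]) ** 2

def score_stat(resid, K):
    """Score-type statistic r'Kr scaled by the residual variance."""
    return float(resid @ K @ resid) / float(resid @ resid / resid.size)

def two_stage(resid, D, keep, rho=1.0):
    # Stage 1 (assumed stand-in for garrote-based selection): compute a
    # marginal Gaussian-kernel statistic per gene and keep the largest ones.
    single = np.array([score_stat(resid, np.exp(-Dj / rho)) for Dj in D])
    selected = np.sort(np.argsort(single)[-keep:])
    # Stage 2: joint kernel statistic on the selected genes only.
    K_sub = np.exp(-D[selected].sum(axis=0) / (rho * keep))
    return score_stat(resid, K_sub), selected

def dkm_like_test(y, Z, keep=2, n_perm=200, seed=0):
    """Permutation p-value for the two-stage statistic (intercept-only null)."""
    rng = np.random.default_rng(seed)
    resid = y - y.mean()
    D = sq_dists(Z)
    observed, selected = two_stage(resid, D, keep)
    # Selection is redone inside every permutation, so the reference
    # distribution reflects the adaptive choice of the subset.
    null = np.array([two_stage(rng.permutation(resid), D, keep)[0]
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return float(p_value), selected

# Simulated pathway of 6 genes in which only genes 0 and 1 affect the outcome.
rng = np.random.default_rng(1)
Z = rng.standard_normal((80, 6))
y = 2.0 * Z[:, 0] + 2.0 * Z[:, 1] + 0.5 * rng.standard_normal(80)
p_value, selected = dkm_like_test(y, Z)
print(p_value, selected)
```

Repeating the selection step within each permutation is the design choice that matters here: testing a subset chosen on the same data against an unadjusted reference distribution would overstate significance.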


Keywords: Double kernel machine; Garrote kernel machine; Least squares kernel machine; Subset testing; Thresholding



This research was supported by NIH grant CA129102. The authors thank the reviewers for helpful comments.

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© International Chinese Statistical Association 2014

Authors and Affiliations

  • Xiang Zhan (1, corresponding author)
  • Michael P. Epstein (3)
  • Debashis Ghosh (1, 2)

  1. Department of Statistics, Pennsylvania State University, University Park, USA
  2. Department of Public Health Sciences, Pennsylvania State University, University Park, USA
  3. Department of Human Genetics, Emory University, Atlanta, USA
