Abstract
QTL mapping experiments typically involve a large number of markers. This poses a challenge to commonly used regression models, because the number of feature variables usually far exceeds the sample size, especially when epistatic effects are considered. Conventional stepwise procedures are known to be greedy, and this weakness is even more conspicuous in such cases. In this article, we propose a two-phase procedure for QTL mapping based on penalized likelihood techniques and the extended Bayesian information criterion (EBIC). The procedure consists of a screening phase and a selection phase. In the screening phase, the main and interaction features are alternately screened by a penalized likelihood mechanism. In the selection phase, a low-dimensional approach using EBIC is applied to the features retained in the screening phase to identify QTL. The two-phase procedure has the asymptotic property that its positive detection rate (PDR) and false discovery rate (FDR) converge to 1 and 0, respectively, as the sample size goes to infinity. The two-phase procedure is compared with both traditional and recently developed approaches in simulation studies. A real data analysis is presented to demonstrate its application.
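The screen-then-select idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`lasso_cd`, `screen`, `ebic`, `select_by_ebic`) is hypothetical. It assumes a Gaussian linear model, unlinked markers coded -1/+1, an L1 penalty standing in for the penalized likelihood mechanism, a single screening pass (main effects first, then pairwise interactions on the residual) instead of the alternating scheme, and a best-subset search as the low-dimensional step. The EBIC is taken in the form n log(RSS/n) + k log n + 2γ log C(P, k), with P the total number of candidate features, which is itself a simplification.

```python
import itertools
import math
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)                      # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]              # drop feature j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]              # put the updated feature back
    return b

def screen(X, y, keep):
    """Screening phase: retain up to `keep` columns with the largest nonzero L1 coefficients."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    lam = 0.1 * np.max(np.abs(Xc.T @ yc)) / len(y)   # assumed penalty level
    b = lasso_cd(Xc, yc, lam)
    order = np.argsort(-np.abs(b))
    return [int(j) for j in order[:keep] if b[j] != 0.0]

def ebic(rss, n, k, p, gamma=1.0):
    """EBIC of a Gaussian linear model with k of p candidate features."""
    log_binom = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binom

def rss_of(F, y, cols):
    """Residual sum of squares of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y))] + [F[:, c] for c in cols])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ coef
    return float(resid @ resid)

def select_by_ebic(F, y, p_total, max_size=4, gamma=1.0):
    """Selection phase: best subset of the retained features by EBIC."""
    n, m = F.shape
    best_score, best_set = ebic(rss_of(F, y, ()), n, 0, p_total, gamma), ()
    for k in range(1, min(max_size, m) + 1):
        for cols in itertools.combinations(range(m), k):
            score = ebic(rss_of(F, y, cols), n, k, p_total, gamma)
            if score < best_score:
                best_score, best_set = score, cols
    return best_set

# Simulated data (assumption): 150 individuals, 30 unlinked markers coded -1/+1,
# two main-effect QTL (markers 3 and 10) and one epistatic pair (markers 5 and 20).
rng = np.random.default_rng(0)
n, p = 150, 30
G = rng.choice([-1.0, 1.0], size=(n, p))
pairs = list(itertools.combinations(range(p), 2))
I = np.column_stack([G[:, a] * G[:, b] for a, b in pairs])
y = 2.0 * G[:, 3] + 2.0 * G[:, 10] + 2.0 * G[:, 5] * G[:, 20] + rng.normal(size=n)

# Phase 1: screen main effects, then screen interactions on the residual.
main_kept = screen(G, y, keep=5)
A = np.column_stack([np.ones(n), G[:, main_kept]])
resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
int_kept = screen(I, resid, keep=5)

# Phase 2: EBIC over the small retained set.
F = np.column_stack([G[:, main_kept], I[:, int_kept]])
labels = [f"m{j}" for j in main_kept]
labels += [f"m{pairs[j][0]}:m{pairs[j][1]}" for j in int_kept]
chosen = select_by_ebic(F, y, p_total=p + len(pairs))
selected = sorted(labels[c] for c in chosen)
print(selected)
```

Because the screening phase cuts several hundred candidate features down to at most ten, the selection phase can afford an exhaustive comparison of small models instead of a greedy stepwise search. The penalty level, the number of retained features, and γ = 1 are illustrative choices only.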
Acknowledgments
The authors would like to express their appreciation to the editor and the anonymous referees for their valuable comments and suggestions, which have led to substantial improvements to the paper.
Additional information
Communicated by D. Mather.
The research leading to this article is supported by the National University of Singapore research grant R-155-000-065-112.
Cite this article
Chen, Z., Cui, W. A two-phase procedure for QTL mapping with regression models. Theor Appl Genet 121, 363–372 (2010). https://doi.org/10.1007/s00122-010-1315-8

