
A two-phase procedure for QTL mapping with regression models

  • Original Paper
Theoretical and Applied Genetics

Abstract

QTL mapping experiments typically involve a large number of markers. This poses a challenge to commonly used regression models, since the number of feature variables is usually much larger than the sample size, especially when epistatic effects are considered. The greedy nature of conventional stepwise procedures is well known and becomes even more conspicuous in such cases. In this article, we propose a two-phase procedure for QTL mapping based on penalized likelihood techniques and the extended Bayesian information criterion (EBIC). The procedure consists of a screening phase and a selection phase. In the screening phase, the main and interaction features are screened alternately by a penalized likelihood mechanism. In the selection phase, a low-dimensional approach using EBIC is applied to the features retained in the screening phase to identify QTL. The two-phase procedure has the asymptotic property that its positive detection rate (PDR) and false discovery rate (FDR) converge to 1 and 0, respectively, as the sample size goes to infinity. The two-phase procedure is compared with both traditional and recently developed approaches in simulation studies. A real data analysis is presented to demonstrate its application.
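
The screen-then-select idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes a Gaussian linear model, markers coded as -1/+1, an L1 (lasso) penalty solved by plain coordinate descent as the penalized likelihood, a single round of main-then-interaction screening instead of full alternation, and forward selection scored by EBIC in its commonly stated form, n log(RSS/n) + k log n + 2γ log C(p, k), with all features pooled into one model space. The function names, `keep`, and `gamma` are hypothetical choices made for illustration.

```python
import itertools
import math
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for 0.5*||y - Xb||^2 + n*lam*||b||_1."""
    n, p = X.shape
    beta, resid = np.zeros(p), y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # take feature j out of the fit
            rho = X[:, j] @ resid
            beta[j] = np.sign(rho) * max(abs(rho) - n * lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # put the updated feature back
    return beta

def screen(X, y, keep):
    """Screening: relax the L1 penalty until at least `keep` features are active."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    lam = np.max(np.abs(Xc.T @ yc)) / len(y)    # smallest lam giving an all-zero fit
    active = np.array([], dtype=int)
    for _ in range(60):                         # bounded number of penalty steps
        lam *= 0.8
        active = np.flatnonzero(lasso_cd(Xc, yc, lam))
        if len(active) >= min(keep, X.shape[1]):
            break
    return active

def ebic(X, y, subset, p_total, gamma=1.0):
    """EBIC of the least-squares fit on `subset` (assumed Gaussian errors)."""
    n, k = len(y), len(subset)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    rss = float(np.sum((y - design @ coef) ** 2))
    log_models = (math.lgamma(p_total + 1) - math.lgamma(k + 1)
                  - math.lgamma(p_total - k + 1))   # log C(p_total, k)
    return n * math.log(rss / n) + k * math.log(n) + 2 * gamma * log_models

def forward_ebic(X, y, candidates, p_total):
    """Selection: add the retained feature that lowers EBIC most; stop when none does."""
    chosen, best = [], ebic(X, y, [], p_total)
    while True:
        scores = {j: ebic(X, y, chosen + [j], p_total)
                  for j in candidates if j not in chosen}
        if not scores:
            return chosen
        j = min(scores, key=scores.get)
        if scores[j] >= best:
            return chosen
        chosen.append(j)
        best = scores[j]

def two_phase(M, y, keep=5):
    """M: n x p marker matrix. Returns (i,) for main effects, (i, j) for epistasis."""
    p = M.shape[1]
    pairs = list(itertools.combinations(range(p), 2))
    inter = np.column_stack([M[:, a] * M[:, b] for a, b in pairs])
    F = np.hstack([M, inter])
    main = screen(M, y, keep)                   # screen main effects
    design = np.column_stack([np.ones(len(y)), M[:, main]])
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    epi = screen(inter, resid, keep) + p        # screen interactions on residuals
    selected = forward_ebic(F, y, list(main) + list(epi), F.shape[1])
    names = [(j,) for j in range(p)] + pairs
    return sorted(names[j] for j in selected)

# Simulated example: two main-effect QTL and one epistatic pair.
rng = np.random.default_rng(0)
M = rng.choice([-1.0, 1.0], size=(200, 20))
y = 2 * M[:, 3] + 2 * M[:, 10] + 2 * M[:, 5] * M[:, 15] + rng.normal(size=200)
qtl = two_phase(M, y)
print(qtl)
```

Screening only needs to retain the true features among a manageable number of candidates, so it can afford to be generous; the EBIC step then works in a low-dimensional space while still penalizing for the size of the original feature space through `p_total`.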



Acknowledgments

The authors would like to express their appreciation to the editor and the anonymous referees for their valuable comments and suggestions which have led to a great deal of improvement on the paper.

Author information

Correspondence to Zehua Chen.

Additional information

Communicated by D. Mather.

The research leading to this article is supported by the National University of Singapore research grant R-155-000-065-112.

About this article

Cite this article

Chen, Z., Cui, W. A two-phase procedure for QTL mapping with regression models. Theor Appl Genet 121, 363–372 (2010). https://doi.org/10.1007/s00122-010-1315-8

  • DOI: https://doi.org/10.1007/s00122-010-1315-8