A fast expectation-maximum algorithm for fine-scale QTL mapping

  • Original Paper
Theoretical and Applied Genetics

Abstract

Recent single-nucleotide-polymorphism (SNP) array technology makes it possible to genotype millions of SNP markers across the genome, which in turn calls for fast and efficient methods for fine-scale quantitative trait loci (QTL) mapping. Single-marker association (SMA) is the simplest method for fine-scale QTL mapping, but it usually produces many false-positive signals and has low QTL-detection power. Compared with SMA, the haplotype-based method of Meuwissen and Goddard, which treats the QTL effect as random and estimates variance components (VC) with identity-by-descent (IBD) matrices inferred from the unknown population history, is more powerful for fine-scale QTL mapping; it also tends to give a continuous QTL-detection profile, which reduces the number of false-positive signals. However, variance component estimation is usually very time consuming and can be difficult to converge. An extremely fast method, EMF (Expectation-Maximization algorithm under a Fixed effect model), is therefore proposed in this research; it assumes a biallelic QTL and uses an expectation-maximization (EM) algorithm to solve for the model effects. Simulation experiments showed that (1) EMF was computationally much faster than the VC method; and (2) EMF and VC performed similarly in QTL-detection power and parameter estimation, and both outperformed paired-marker analysis and SMA. However, the power of EMF would be lower than that of VC if the QTL was multiallelic.

References

  • Chen WM, Abecasis GR (2007) Family-based association tests for genome wide association scans. Am J Hum Genet 81:913–926

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38

  • Druet T, Fritz S, Boussaha M, Ben-Jemaa S, Guillaume F, Derbala D, Zelenika D, Lechner D, Charon C, Boichard D, Gut IG, Eggen A, Gautier M (2008) Fine mapping of quantitative trait loci affecting female fertility in dairy cattle on BTA03 using a dense single-nucleotide polymorphism map. Genetics 178:2227–2235

  • Haley CS, Knott SA (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315–324

  • Hernández-Sánchez J, Haley CS, Woolliams JA (2006) Prediction of IBD based on population history for fine gene mapping. Genet Sel Evol 38:231–252

  • Hernández-Sánchez J, Grunchec JA, Knott S (2009) A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics 25:1377–1383

  • Hill WG, Hernández-Sánchez J (2007) Prediction of multi-locus identity-by-descent. Genetics 176:1–9

  • Kaplan NL, Hill WG, Weir BS (1995) Likelihood methods for locating disease genes in nonequilibrium populations. Am J Hum Genet 56:18–32

  • Kimmel G, Karp RM, Jordan MI, Halperin E (2008) Association mapping and significance estimation via the coalescent. Am J Hum Genet 83:675–683

  • Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–194

  • Lee SH, van der Werf JHJ (2006) Simultaneous fine mapping of multiple closely linked quantitative trait loci using combined linkage disequilibrium and linkage with a general pedigree. Genetics 173:2329–2337

  • Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies via imputation of genotypes. Nat Genet 39:906–913

  • McPeek MS, Strahs A (1999) Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine scale genetic mapping. Am J Hum Genet 65:858–875

  • Meuwissen THE, Goddard ME (2000) Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155:421–430

  • Meuwissen THE, Goddard ME (2001) Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol 33:605–634

  • Meuwissen THE, Goddard ME (2007) Multipoint identity-by-descent prediction using dense markers to map quantitative trait loci and estimate effective population size. Genetics 176:2551–2560

  • Minichiello M, Durbin R (2006) Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet 79:910–922

  • Morris AP (2006) A flexible Bayesian framework for modeling haplotype association with disease, allowing for dominance effects of the underlying causative variants. Am J Hum Genet 79:679–694

  • Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554

  • Pérez-Enciso M (2003) Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics 163:1497–1510

  • Schnabel RD, Kim J-J, Ashwell MS, Sonstegard TS, Van Tassell CP, Connor EE, Taylor JF (2005) Fine-mapping milk production quantitative trait loci on BTA6: analysis of the bovine osteopontin gene. Proc Natl Acad Sci USA 102:6896–6901

  • Terwilliger JD (1995) A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic marker loci. Am J Hum Genet 56:777–787

  • Wang T, Fernando RL, van der Beek S, van Arendonk JAM (1995) Covariance between relatives for a marked quantitative trait locus. Genet Sel Evol 27:251–274

  • Wang WYS, Barratt BJ, Clayton DG, Todd JA (2005) Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 6:109–118

  • Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678

  • Xiong M, Guo SW (1997) Fine-scale genetic mapping based on linkage disequilibrium: theory and applications. Am J Hum Genet 60:1513–1531

Acknowledgments

The three reviewers are thanked for their useful comments. This research was supported by Chinese National Natural Science Foundation grant 31001001.

Author information

Corresponding author

Correspondence to Ming Fang.

Additional information

Communicated by M. Sillanpaa.

Appendices

Appendix 1: Derivation of the probability of an individual haplotype being IBD to the ancestral haplotype at the putative QTL loci

The first term in the denominator of Eq. (2) can be written as

$$ \begin{aligned} p(S_{j}, S_{j+1} \mid \text{IBD})\, p(\text{IBD}) &= p(S_{j},\;\text{IBD},\;S_{j+1}) \\ &= \sum_{\phi} p(\phi) \times p(S \mid \phi). \end{aligned} $$
(10)

The second term in Eq. (10) can be factorized as \( p(S \mid \phi) = \prod_{k=j}^{j+1} p(S(k) \mid \phi(k)) \), where ϕ is the IBD status of the segment comprising the QTL locus, the flanking markers (markers j and j + 1) and the regions between them, and \( p(S(k) \mid \phi(k)) \) is the probability of the IBS status between the individual and ancestral haplotypes at marker k, conditional on the IBD status at marker k. For the four possible IBS statuses of the flanking markers \( (S_{j},\;S_{j+1}) \), namely (1, 1), (1, 0), (0, 1) and (0, 0), \( p(S \mid \phi) \) is easily obtained from \( a_{j} \) and \( a_{j+1} \), where \( a_{j} \) (\( a_{j+1} \)) denotes the probability that the individual and ancestral haplotypes are IBS but not IBD at marker j (j + 1) (see Meuwissen and Goddard 2001). Assuming that all alleles of a marker are equally frequent in the base population, \( a_{j} \) can be estimated as 1/(number of alleles at the jth marker). This assumption can be relaxed, as illustrated in the “Discussion”.
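The factorization above can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes (this is not stated in the text) that a marker that is IBD is always IBS, i.e. marker mutation is ignored, and that a non-IBD marker is IBS with probability a. The function names are hypothetical.

```python
from itertools import product


def p_ibs_given_ibd(s, ibd, a):
    """P(S = s | IBD status) at a single marker.

    Assumption: an IBD marker is always IBS (mutation ignored);
    a non-IBD marker is IBS with probability a.
    """
    if ibd:
        return 1.0 if s == 1 else 0.0
    return a if s == 1 else 1.0 - a


def p_s_given_phi(s_pair, phi_pair, a_pair):
    """Product of the per-marker terms over flanking markers j and j + 1."""
    prob = 1.0
    for s, ibd, a in zip(s_pair, phi_pair, a_pair):
        prob *= p_ibs_given_ibd(s, ibd, a)
    return prob


# Biallelic markers with equally frequent alleles: a = 1 / 2 at both markers.
a_pair = (0.5, 0.5)
for phi_pair in product((True, False), repeat=2):
    for s_pair in product((1, 0), repeat=2):
        print(phi_pair, s_pair, p_s_given_phi(s_pair, phi_pair, a_pair))
```

For any fixed IBD configuration of the two markers, the four printed probabilities sum to one, which is a quick sanity check on the table.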

The first term in Eq. (10), p(ϕ), is the probability of the IBD status of the segment comprising the QTL locus, the flanking markers (markers j and j + 1) and the regions between them; it is derived from f(c) (see Meuwissen and Goddard 2001). In Meuwissen and Goddard (2001), f(c) is the probability of an IBD region of size c between two individual haplotypes. In EMF, it instead refers to the probability of an IBD region of size c between an individual haplotype and the ancestral haplotype, and can thus be expressed as

$$ f(c) = \left[\exp(-c)\right]^{T} \alpha , $$
(11)

where the first term is the probability that a segment of size c remains unbroken over T generations of meiosis, and the second term, α, is the probability that the intact IBD segment is inherited from the ancestral haplotype carrying the mutant QTL allele. Here \( \alpha = N_{M}/N \), where \( N_{M} \) is the number of current haplotypes carrying the mutation M and N is the total number of haplotypes. Because \( N_{M} \) is unknown, α cannot be computed directly; in practice, α must be set beforehand, and its effect will be discussed later. The calculation of p(ϕ) from f(c) is explained at length in Meuwissen and Goddard (2001) and is not repeated here. Once p(ϕ) and \( p(S \mid \phi) \) have been calculated, Eq. (10) is obtained by summing all terms for which ϕ is IBD at the QTL (see also Table III of Meuwissen and Goddard 2001 for more details).
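As a small numerical illustration of Eq. (11), the sketch below evaluates f(c) directly. It assumes that c is expressed in Morgans, which the text does not state explicitly, and the parameter values are made up for the example.

```python
import math


def f(c, T, alpha):
    """Eq. (11): probability that a segment of size c (assumed in Morgans)
    stays intact for T generations and descends from the mutant ancestor."""
    return math.exp(-c) ** T * alpha


# Hypothetical values: a 1 cM segment, 100 generations, alpha = 0.1.
print(f(0.01, 100, 0.1))
```

Because \( [\exp(-c)]^{T} = \exp(-cT) \), f(c) decays exponentially in both the segment size and the number of generations, while α only rescales it.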

The second term in the denominator of Eq. (2) can be calculated with a similar approach, and it can be factorized as

$$ p(S_{j},\;S_{j+1} \mid \text{nonIBD})\, p(\text{nonIBD}) = p(S_{j},\;\text{nonIBD},\;S_{j+1}) = \sum_{\phi} p(\phi) \times p(S \mid \phi), $$
(12)

which is calculated by summing all terms for which ϕ is nonIBD at the QTL (see Table III in Meuwissen and Goddard 2001). For clarity, all notation used in this section is listed in Table 2.
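The two sums in Eqs. (10) and (12) can be combined as in the following sketch. It assumes that Eq. (2), which is not reproduced in this section, is Bayes' rule with Eq. (10) as numerator and Eqs. (10) plus (12) as denominator. The prior p(ϕ) used here is a set of made-up numbers, not values derived from f(c), and the IBS rule (an IBD marker is always IBS) is likewise an assumption.

```python
def joint_prob(s_pair, a_pair, prior, qtl_ibd):
    """Sum of p(phi) * p(S | phi) over configurations with the given QTL status.

    prior maps phi = (ibd_marker_j, ibd_qtl, ibd_marker_j1) to p(phi).
    """
    total = 0.0
    for (ibd_j, ibd_q, ibd_j1), p_phi in prior.items():
        if ibd_q != qtl_ibd:
            continue
        p_s = 1.0
        for s, ibd, a in zip(s_pair, (ibd_j, ibd_j1), a_pair):
            if ibd:  # assumption: IBD implies IBS
                p_s *= 1.0 if s == 1 else 0.0
            else:
                p_s *= a if s == 1 else 1.0 - a
        total += p_phi * p_s
    return total


def posterior_ibd(s_pair, a_pair, prior):
    """Assumed form of Eq. (2): Eq. (10) / (Eq. (10) + Eq. (12))."""
    ibd_term = joint_prob(s_pair, a_pair, prior, True)
    non_ibd_term = joint_prob(s_pair, a_pair, prior, False)
    return ibd_term / (ibd_term + non_ibd_term)


# Hypothetical prior over the eight IBD configurations (sums to 1).
prior = {
    (True, True, True): 0.05, (True, True, False): 0.02,
    (False, True, True): 0.02, (False, True, False): 0.01,
    (True, False, True): 0.005, (True, False, False): 0.02,
    (False, False, True): 0.02, (False, False, False): 0.855,
}
for s_pair in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(s_pair, posterior_ibd(s_pair, (0.5, 0.5), prior))
```

With this illustrative prior, haplotypes that match the ancestral haplotype at both flanking markers receive a higher posterior IBD probability at the QTL than haplotypes that match at neither.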

Table 2 List of the notation symbols

Appendix 2: Extension to the combination of the linkage disequilibrium and linkage analysis

A two-generation pedigree is taken as an example to illustrate how linkage information can be incorporated; the approach extends to more complex pedigrees. Given the linkage phases of the unrelated founders and their offspring, the probabilities that offspring i inherited each of the father’s two QTL alleles \( A_{1}^{\text{P}} \) and \( A_{2}^{\text{P}} \), denoted \( \text{Prob}(A_{1}^{\text{P}}) \) and \( \text{Prob}(A_{2}^{\text{P}}) \), and each of the mother’s two QTL alleles \( A_{1}^{\text{M}} \) and \( A_{2}^{\text{M}} \), denoted \( \text{Prob}(A_{1}^{\text{M}}) \) and \( \text{Prob}(A_{2}^{\text{M}}) \), can be easily inferred from the flanking markers according to Haldane’s recombination rule (e.g., using the method of Wang et al. 1995). Let the probabilities that the two QTL alleles (indexed 1 and 2) of the father (\( i_{\text{P}} \)) and mother (\( i_{\text{M}} \)) of offspring i are IBD to the ancestral mutant allele be denoted by \( \pi_{i_{\text{P}}}^{1} \) and \( \pi_{i_{\text{P}}}^{2} \) (for the father) and \( \pi_{i_{\text{M}}}^{1} \) and \( \pi_{i_{\text{M}}}^{2} \) (for the mother). The probabilities of the three QTL genotypes, combining LD and LA information, can then be calculated as

$$ P_{i1} = \left( \text{Prob}(A_{1}^{\text{P}})\,\pi_{i_{\text{P}}}^{1} + \text{Prob}(A_{2}^{\text{P}})\,\pi_{i_{\text{P}}}^{2} \right) \cdot \left( \text{Prob}(A_{1}^{\text{M}})\,\pi_{i_{\text{M}}}^{1} + \text{Prob}(A_{2}^{\text{M}})\,\pi_{i_{\text{M}}}^{2} \right) $$

for genotype MM,

$$ P_{i3} = \left( \text{Prob}(A_{1}^{\text{P}})(1 - \pi_{i_{\text{P}}}^{1}) + \text{Prob}(A_{2}^{\text{P}})(1 - \pi_{i_{\text{P}}}^{2}) \right) \cdot \left( \text{Prob}(A_{1}^{\text{M}})(1 - \pi_{i_{\text{M}}}^{1}) + \text{Prob}(A_{2}^{\text{M}})(1 - \pi_{i_{\text{M}}}^{2}) \right) $$

for mm, and \( P_{i2} = 1 - P_{i1} - P_{i3} \) for Mm or mM, which assumes that the QTL locus is in Hardy–Weinberg equilibrium.
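The genotype probabilities of this appendix can be sketched as a short function. The function name, argument layout and input values are hypothetical; in practice the transmission probabilities would come from the flanking markers and the IBD probabilities from the LD step.

```python
def genotype_probs(prob_p, pi_p, prob_m, pi_m):
    """Return (P_i1, P_i2, P_i3) for genotypes MM, Mm/mM and mm.

    prob_p = (Prob(A1^P), Prob(A2^P)): transmission probabilities of the
             father's two QTL alleles to offspring i
    pi_p   = (pi^1, pi^2): probabilities that the father's alleles are IBD
             to the ancestral mutant allele
    prob_m, pi_m: the same quantities for the mother
    """
    pat_mutant = prob_p[0] * pi_p[0] + prob_p[1] * pi_p[1]
    mat_mutant = prob_m[0] * pi_m[0] + prob_m[1] * pi_m[1]
    pat_wild = prob_p[0] * (1 - pi_p[0]) + prob_p[1] * (1 - pi_p[1])
    mat_wild = prob_m[0] * (1 - pi_m[0]) + prob_m[1] * (1 - pi_m[1])
    p1 = pat_mutant * mat_mutant   # MM
    p3 = pat_wild * mat_wild       # mm
    p2 = 1.0 - p1 - p3             # Mm or mM
    return p1, p2, p3


# Hypothetical inputs for one offspring.
print(genotype_probs((0.9, 0.1), (0.8, 0.1), (0.5, 0.5), (0.2, 0.4)))
```

The paternal and maternal factors are multiplied because the two gametes are treated as independent, which is where the Hardy–Weinberg assumption enters.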

Appendix 3: Simulation of multiple QTL mutations

A chromosome segment 2 cM long was simulated. Twelve markers were evenly spaced on the segment and one QTL was located at 1.05 cM. The population was founded 500 generations ago, with an effective population size (\( N_{e} \)) of 200 and a sex ratio of 1:1. In the base population, each marker was assigned two alleles with equal frequency, and the QTL was assigned a single allele. Marker alleles mutated at a rate of \( 4 \times 10^{-4} \) per generation. A new QTL mutation occurred every two generations: one haplotype was chosen at random, and its QTL allele was mutated to a new allele and given a new number. This high QTL mutation rate could result in about 6–12 alleles in the present population. The effect of each QTL allele was sampled from the standard normal distribution N(0, 1). In the last generation, the QTL allele effects were rescaled so that the QTL effect had mean zero and variance 0.2. The residual effect was sampled from a normal distribution with mean 0 and variance 0.9; the overall mean was set to zero, and no polygenic effect was simulated. With these settings, the heritability explained by the QTL was 0.18. The phenotypic value of each individual was then generated by summing the overall mean, the QTL effect and the residual error.
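The last steps of this simulation (rescaling the QTL effects and generating phenotypes) can be sketched as follows. This is only an illustration under stated assumptions: allele effects are taken to be additive within an individual, the rescaling is applied to individual QTL values rather than to allele effects directly, and the genotypes are drawn uniformly at random instead of coming from the 500-generation population history. All names are hypothetical.

```python
import random
import statistics


def simulate_phenotypes(genotypes, n_alleles, qtl_var=0.2, res_var=0.9, seed=1):
    """Return (qtl_values, phenotypes) for a list of (allele, allele) genotypes."""
    rng = random.Random(seed)
    # Effect of each QTL allele sampled from N(0, 1).
    allele_effect = [rng.gauss(0.0, 1.0) for _ in range(n_alleles)]
    # Assumption: the QTL value of an individual is the sum of its two allele effects.
    raw = [allele_effect[a1] + allele_effect[a2] for a1, a2 in genotypes]
    # Rescale so that the QTL values have mean 0 and variance qtl_var.
    mean = statistics.fmean(raw)
    scale = (qtl_var / statistics.pvariance(raw)) ** 0.5
    qtl = [(g - mean) * scale for g in raw]
    # Phenotype = overall mean (zero) + QTL effect + residual ~ N(0, res_var).
    residual_sd = res_var ** 0.5
    phenotypes = [0.0 + g + rng.gauss(0.0, residual_sd) for g in qtl]
    return qtl, phenotypes


# Hypothetical data: 1000 individuals, 8 QTL alleles drawn uniformly.
demo_rng = random.Random(7)
genotypes = [(demo_rng.randrange(8), demo_rng.randrange(8)) for _ in range(1000)]
qtl, phenotypes = simulate_phenotypes(genotypes, n_alleles=8)
print(statistics.pvariance(qtl), 0.2 / (0.2 + 0.9))
```

The second printed value is the expected share of phenotypic variance due to the QTL, 0.2/(0.2 + 0.9), which rounds to the heritability of 0.18 quoted above.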

About this article

Cite this article

Fang, M. A fast expectation-maximum algorithm for fine-scale QTL mapping. Theor Appl Genet 125, 1727–1734 (2012). https://doi.org/10.1007/s00122-012-1949-9

  • DOI: https://doi.org/10.1007/s00122-012-1949-9
