A fast expectation-maximum algorithm for fine-scale QTL mapping

Fang, Ming

doi:10.1007/s00122-012-1949-9

Original Paper
Published: 04 August 2012

Volume 125, pages 1727–1734, (2012)

Theoretical and Applied Genetics

Ming Fang¹

Abstract

Single-nucleotide polymorphism (SNP) arrays now make it possible to genotype millions of SNP markers across the genome, which in turn calls for fast and efficient methods for fine-scale quantitative trait locus (QTL) mapping. Single-marker association (SMA) is the simplest such method, but it usually produces many false-positive signals and has low QTL-detection power. The haplotype-based method of Meuwissen and Goddard, which treats the QTL effect as random and estimates variance components (VC) using identity-by-descent (IBD) matrices inferred from the unknown historical population, is more powerful than SMA for fine-scale QTL mapping; it also tends to give a continuous QTL-detection profile, which reduces the number of false-positive signals. However, variance component estimation is usually very time consuming and can be difficult to converge. This study therefore proposes an extremely fast method, EMF (Expectation-Maximization algorithm under a Fixed-effect model), which assumes a biallelic QTL and uses an expectation-maximization (EM) algorithm to solve for the model effects. Simulation experiments showed that (1) EMF was computationally much faster than the VC method; and (2) EMF and VC performed similarly in QTL-detection power and parameter estimation, and both outperformed paired-marker analysis and SMA. However, the power of EMF would be lower than that of VC if the QTL were multiallelic.

References

Chen WM, Abecasis GR (2007) Family-based association tests for genome wide association scans. Am J Hum Genet 81:913–926
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38
Druet T, Fritz S, Boussaha M, Ben-Jemaa S, Guillaume F, Derbala D, Zelenika D, Lechner D, Charon C, Boichard D, Gut IG, Eggen A, Gautier M (2008) Fine mapping of quantitative trait loci affecting female fertility in dairy cattle on BTA03 using a dense single-nucleotide polymorphism map. Genetics 178:2227–2235
Haley CS, Knott SA (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315–324
Hernández-Sánchez J, Haley CS, Woolliams JA (2006) Prediction of IBD based on population history for fine gene mapping. Genet Sel Evol 38:231–252
Hernández-Sánchez J, Grunchec JA, Knott S (2009) A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics 25:1377–1383
Hill WG, Hernández-Sánchez J (2007) Prediction of multi-locus identity-by-descent. Genetics 176:1–9
Kaplan NL, Hill WG, Weir BS (1995) Likelihood methods for locating disease genes in nonequilibrium populations. Am J Hum Genet 56:18–32
Kimmel G, Karp RM, Jordan MI, Halperin E (2008) Association mapping and significance estimation via the coalescent. Am J Hum Genet 83:675–683
Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–194
Lee SH, van der Werf JHJ (2006) Simultaneous fine mapping of multiple closely linked quantitative trait loci using combined linkage disequilibrium and linkage with a general pedigree. Genetics 173:2329–2337
Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies via imputation of genotypes. Nat Genet 39:906–913
McPeek MS, Strahs A (1999) Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine scale genetic mapping. Am J Hum Genet 65:858–875
Meuwissen THE, Goddard ME (2000) Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155:421–430
Meuwissen THE, Goddard ME (2001) Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol 33:605–634
Meuwissen THE, Goddard ME (2007) Multipoint identity-by-descent prediction using dense markers to map quantitative trait loci and estimate effective population size. Genetics 176:2551–2560
Minichiello M, Durbin R (2006) Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet 79:910–922
Morris AP (2006) A flexible Bayesian framework for modeling haplotype association with disease, allowing for dominance effects of the underlying causative variants. Am J Hum Genet 79:679–694
Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554
Pérez-Enciso M (2003) Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics 163:1497–1510
Schnabel RD, Kim J-J, Ashwell MS, Sonstegard TS, Van Tassell CP, Connor EE, Taylor JF (2005) Fine-mapping milk production quantitative trait loci on BTA6: analysis of the bovine osteopontin gene. Proc Natl Acad Sci USA 102:6896–6901
Terwilliger JD (1995) A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic marker loci. Am J Hum Genet 56:777–787
Wang T, Fernando RL, van der Beek S, van Arendonk JAM (1995) Covariance between relatives for a marked quantitative trait locus. Genet Sel Evol 27:251–274
Wang WYS, Barratt BJ, Clayton DG, Todd JA (2005) Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 6:109–118
Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678
Xiong M, Guo SW (1997) Fine-scale genetic mapping based on linkage disequilibrium: theory and applications. Am J Hum Genet 60:1513–1531

Acknowledgments

The three reviewers are thanked for their useful comments. This research was supported by Chinese National Natural Science Foundation grant 31001001.

Author information

Authors and Affiliations

Life Science College, Heilongjiang Bayi Agricultural University, Daqing, 163319, People’s Republic of China
Ming Fang

Corresponding author

Correspondence to Ming Fang.

Additional information

Communicated by M. Sillanpaa.

Appendices

Appendix 1: Derivation of the probability of an individual haplotype being IBD to the ancestral haplotype at the putative QTL loci

Equation (2) can be written as,

$$ \begin{aligned} p(S_{j},\;S_{j+1} \mid \text{IBD})\, p(\text{IBD}) &= p(S_{j},\;\text{IBD},\;S_{j+1}) \\ &= \sum {p(\phi) \times p(S \mid \phi)}. \end{aligned} \tag{10} $$

The second term in Eq. (10) can be factorized as $ p(S \mid \phi) = \prod\limits_{k = j}^{j+1} {p(S(k) \mid \phi (k))} $, where ϕ is the IBD status of the segment comprising the QTL locus, the flanking markers (markers j and j + 1) and the regions between them, and $ p(S(k) \mid \phi (k)) $ is the probability of the IBS status between the individual and ancestral haplotypes at marker k conditional on the IBD status at marker k. For the four IBS statuses of the flanking markers $ (S_{j},\;S_{j + 1}) $, namely (1, 1), (1, 0), (0, 1) and (0, 0), $ p(S \mid \phi) $ can be easily obtained from $ a_{j} $ and $ a_{j+1} $, where $ a_{j} $ ($ a_{j+1} $) denotes the probability that the individual and ancestral haplotypes are IBS at marker j (j + 1) without being IBD there (see Meuwissen and Goddard 2001). Assuming that all alleles of a marker have equal frequency in the base population, $ a_{j} $ can be estimated as 1/(number of alleles of the jth marker). This assumption can be relaxed, as illustrated in the “Discussion”.
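The factorization of $ p(S \mid \phi) $ over the two flanking markers can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and it assumes that a marker that is IBD is always IBS (marker mutation ignored) and that a non-IBD marker is IBS with probability $ a_{j} $.

```python
def p_ibs_at_marker(s, ibd, a):
    """P(S(k) = s | phi(k)) at a single marker.

    s   : 1 if the individual and ancestral haplotypes are IBS here, else 0
    ibd : True if the marker is IBD under the configuration phi
    a   : probability of being IBS when not IBD (e.g. 1 / number of alleles)
    """
    if ibd:
        # Assumption: IBD implies IBS (no marker mutation).
        return 1.0 if s == 1 else 0.0
    return a if s == 1 else 1.0 - a


def p_s_given_phi(s_pair, phi_pair, a_pair):
    """Product of the per-marker terms over markers j and j + 1."""
    prob = 1.0
    for s, ibd, a in zip(s_pair, phi_pair, a_pair):
        prob *= p_ibs_at_marker(s, ibd, a)
    return prob


def a_equal_frequencies(n_alleles):
    """a_j under equal allele frequencies in the base population."""
    return 1.0 / n_alleles
```

For two biallelic markers, `a_equal_frequencies(2)` gives 0.5 for each marker, so a configuration in which neither marker is IBD assigns probability 0.25 to each of the four IBS statuses.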

The first term in Eq. (10), p(ϕ), is the probability of the IBD status of the segment comprising the QTL locus, the flanking markers (markers j and j + 1) and the regions between them; it is derived from f(c) (see Meuwissen and Goddard 2001). In Meuwissen and Goddard (2001), f(c) is the probability of having an IBD region of size c between two individual haplotypes; in EMF it instead refers to the probability of having an IBD region of size c between an individual haplotype and the ancestral haplotype, and can thus be expressed as

$$ f(c) = \left[ \exp (-c) \right]^{T} \alpha , \tag{11} $$

where the first term is the probability that a segment of size c remains unbroken over T generations of meiosis, and the second term, α, is the probability that the intact IBD segment is inherited from the ancestral haplotype carrying the mutant QTL allele. α equals $ N_{M}/N $, where $ N_{M} $ is the number of current haplotypes carrying the mutation M and N is the total number of haplotypes. Because $ N_{M} $ is unknown, α cannot be computed directly; in practice, α must be set beforehand, and the effect of α will be discussed later. The calculation of p(ϕ) from f(c) is explained at length in Meuwissen and Goddard (2001) and is not repeated here. Once p(ϕ) and $ p(S \mid \phi) $ have been calculated, Eq. (10) is obtained by summing all terms for which ϕ is IBD at the QTL (see also Table III of Meuwissen and Goddard 2001 for more details).
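As a small numerical illustration of Eq. (11), the sketch below evaluates f(c) for a preset α. The function names are hypothetical, and the code assumes that c is expressed in Morgans, which this section does not state explicitly.

```python
import math


def alpha_from_counts(n_mutant, n_total):
    """alpha = N_M / N; usable only when N_M is known, e.g. in a simulation."""
    return n_mutant / n_total


def f_segment(c, generations, alpha):
    """f(c) = exp(-c)^T * alpha, which equals exp(-c * T) * alpha.

    c           : segment size (assumed to be in Morgans)
    generations : T, the number of generations of meiosis
    alpha       : preset probability that the intact segment descends from
                  the ancestral haplotype carrying the mutant QTL allele
    """
    return math.exp(-c * generations) * alpha
```

With c = 0.002, T = 100 and α = 0.1, `f_segment` returns 0.1 × exp(−0.2), roughly 0.082; f(c) falls as either the segment size or the number of generations grows.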

The second term in the denominator of Eq. (2) can be calculated in a similar way; it can be factorized as

$$ p(S_{j},\;S_{j + 1} \mid \text{nonIBD})\,p(\text{nonIBD}) = p(S_{j},\;\text{nonIBD},\;S_{j+1}) = \sum {p(\phi) \times p(S \mid \phi)}, \tag{12} $$

which is calculated by summing all terms for which ϕ is nonIBD at the QTL (see Table III in Meuwissen and Goddard 2001). For clarity, all notation used in this section is listed in Table 2.
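Equation (2) is not reproduced in this section, but the text implies a Bayes-rule ratio whose denominator is the sum of Eqs. (10) and (12). Under that assumption, the two sums and the resulting posterior IBD probability can be sketched as below; the function name and the tuple layout of `configs` are hypothetical.

```python
def posterior_ibd_at_qtl(configs):
    """Posterior probability that the QTL is IBD given the flanking IBS statuses.

    configs : iterable of (p_phi, p_s_given_phi, ibd_at_qtl) tuples, one per
              IBD configuration phi of the segment (hypothetical layout).
    """
    configs = list(configs)
    # Eq. (10): sum over configurations that are IBD at the QTL.
    joint_ibd = sum(p * lik for p, lik, ibd in configs if ibd)
    # Eq. (12): sum over configurations that are nonIBD at the QTL.
    joint_nonibd = sum(p * lik for p, lik, ibd in configs if not ibd)
    total = joint_ibd + joint_nonibd
    if total <= 0.0:
        raise ValueError("observed IBS statuses have zero probability")
    return joint_ibd / total
```

For example, three toy configurations with (p(ϕ), p(S|ϕ)) of (0.3, 1.0) and (0.1, 0.5) that are IBD at the QTL, and (0.6, 0.25) that is nonIBD, give joint probabilities of 0.35 and 0.15 and hence a posterior of 0.7.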

Table 2 List of the notation symbols

Appendix 2: Extension to the combination of the linkage disequilibrium and linkage analysis

A two-generation pedigree is used as an example to illustrate how linkage information can be incorporated; the approach can be extended to more complex pedigrees. Given the linkage phases of the unrelated founders and their offspring, the probabilities that offspring i inherits each of the father’s two QTL alleles, $ A_{1}^{\text{P}} $ and $ A_{2}^{\text{P}} $, and each of the mother’s two QTL alleles, $ A_{1}^{\text{M}} $ and $ A_{2}^{\text{M}} $, denoted $ \text{Prob}(A_{1}^{\text{P}} ) $ and $ \text{Prob}(A_{2}^{\text{P}} ) $, and $ \text{Prob}(A_{1}^{\text{M}} ) $ and $ \text{Prob}(A_{2}^{\text{M}} ) $, respectively, can be easily inferred from the flanking markers according to Haldane’s recombination rule (e.g., using the method of Wang et al. 1995). Let $ \pi_{{i_{\text{P}} }}^{1} $ and $ \pi_{{i_{\text{P}} }}^{2} $ denote the probabilities that the two QTL alleles (indexed 1 and 2) of the father ($ i_{\text{P}} $) of offspring i are IBD to the ancestral mutant allele, and let $ \pi_{{i_{\text{M}} }}^{1} $ and $ \pi_{{i_{\text{M}} }}^{2} $ denote the corresponding probabilities for the mother ($ i_{\text{M}} $). The probabilities of the three QTL genotypes, combining LD and LA information, can then be calculated as

$$ P_{i1} = \left( \text{Prob}(A_{1}^{\text{P}} )\pi_{{i_{\text{P}} }}^{1} + \text{Prob}(A_{2}^{\text{P}} )\pi_{{i_{\text{P}} }}^{2} \right) \cdot \left( \text{Prob}(A_{1}^{\text{M}} )\pi_{{i_{\text{M}} }}^{1} + \text{Prob}(A_{2}^{\text{M}} )\pi_{{i_{\text{M}} }}^{2} \right) $$

for genotype MM,

$$ P_{i3} = \left( \text{Prob}(A_{1}^{\text{P}} )(1 - \pi_{{i_{\text{P}} }}^{1} ) + \text{Prob}(A_{2}^{\text{P}} )(1 - \pi_{{i_{\text{P}} }}^{2} ) \right) \cdot \left( \text{Prob}(A_{1}^{\text{M}} )(1 - \pi_{{i_{\text{M}} }}^{1} ) + \text{Prob}(A_{2}^{\text{M}} )(1 - \pi_{{i_{\text{M}} }}^{2} ) \right) $$

for genotype mm, and $ P_{i2} = 1 - P_{i1} - P_{i3} $ for Mm or mM, which assumes that the QTL locus is in Hardy–Weinberg equilibrium.
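The three genotype probabilities can be computed as in the following sketch. It assumes the symmetric form of the formulas, in which every transmission probability is weighted by the IBD probability of the corresponding parental allele, and that the two transmission probabilities of each parent sum to one; the function and argument names are hypothetical.

```python
def qtl_genotype_probs(prob_pat, pi_pat, prob_mat, pi_mat):
    """Return (P_i1, P_i2, P_i3) for genotypes MM, Mm/mM and mm.

    prob_pat : (Prob(A1_P), Prob(A2_P)), transmission of the father's alleles
    pi_pat   : IBD probabilities of the father's alleles 1 and 2
    prob_mat : (Prob(A1_M), Prob(A2_M)), transmission of the mother's alleles
    pi_mat   : IBD probabilities of the mother's alleles 1 and 2
    """
    # Probability that the paternally (maternally) inherited allele is M.
    m_from_father = sum(p * pi for p, pi in zip(prob_pat, pi_pat))
    m_from_mother = sum(p * pi for p, pi in zip(prob_mat, pi_mat))
    # Probability that the inherited allele is m (not IBD to the mutation).
    non_m_from_father = sum(p * (1.0 - pi) for p, pi in zip(prob_pat, pi_pat))
    non_m_from_mother = sum(p * (1.0 - pi) for p, pi in zip(prob_mat, pi_mat))

    p_mm_mutant = m_from_father * m_from_mother          # P_i1, genotype MM
    p_mm_wild = non_m_from_father * non_m_from_mother    # P_i3, genotype mm
    p_het = 1.0 - p_mm_mutant - p_mm_wild                # P_i2, Mm or mM
    return p_mm_mutant, p_het, p_mm_wild
```

For instance, with paternal transmission probabilities (0.9, 0.1) and IBD probabilities (0.8, 0.2), and maternal values (0.5, 0.5) and (0.4, 0.6), the parental M probabilities are 0.74 and 0.5, giving approximately (0.37, 0.50, 0.13).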

Appendix 3: Simulation of multiple QTL mutations

A chromosome segment of 2 cM was simulated. Twelve markers were evenly spaced on the segment and one QTL was located at 1.05 cM. The population was created 500 generations ago, with an effective population size ($ N_{e} $) of 200 and a sex ratio of 1:1. In the base population, two alleles were assigned to each marker with equal frequency, and only one allele was assigned to the QTL. The marker alleles mutated at a rate of 4 × 10⁻⁴ per generation. A new QTL mutation occurred every two generations: one haplotype was chosen at random, and its QTL allele was mutated to a new allele that was assigned a new number. This high QTL mutation rate might result in about 6–12 alleles in the present population. The effect of each QTL allele was randomly sampled from the standard normal distribution N(0, 1). In the last generation, the effect of each QTL allele was rescaled so that the mean of the QTL effect was zero and its variance was 0.2. The residual effect was sampled from a normal distribution with mean 0 and variance 0.9; the overall mean was set to zero, and no polygenic effect was simulated. With these settings, the heritability explained by the QTL was 0.18. The phenotypic value of each individual was then generated by summing the overall mean, the QTL effect and the residual error.
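The final steps of this simulation (rescaling the QTL effect and generating phenotypes) can be sketched as below. This is an illustration under stated assumptions rather than the author's code: the function names are hypothetical, the rescaling is applied to each individual's total QTL value, and the variance is taken as the population variance over the final generation.

```python
import math
import random
import statistics

QTL_VAR = 0.2        # target variance of the QTL effect (from the text)
RESIDUAL_VAR = 0.9   # residual variance (from the text)


def rescale_qtl_values(raw_values, target_var=QTL_VAR):
    """Centre the QTL values at zero and scale them to the target variance."""
    mean = statistics.fmean(raw_values)
    sd = statistics.pstdev(raw_values)
    if sd == 0.0:
        raise ValueError("QTL values have zero variance")
    scale = math.sqrt(target_var) / sd
    return [(v - mean) * scale for v in raw_values]


def simulate_phenotypes(qtl_values, residual_var=RESIDUAL_VAR,
                        overall_mean=0.0, rng=None):
    """Phenotype = overall mean + QTL effect + normally distributed residual."""
    rng = rng or random.Random()
    residual_sd = math.sqrt(residual_var)
    return [overall_mean + g + rng.gauss(0.0, residual_sd) for g in qtl_values]


def qtl_heritability(qtl_var=QTL_VAR, residual_var=RESIDUAL_VAR):
    """Share of the phenotypic variance explained by the QTL."""
    return qtl_var / (qtl_var + residual_var)
```

With the stated variances, `qtl_heritability()` evaluates to 0.2 / (0.2 + 0.9), about 0.18, in agreement with the value reported above.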

About this article

Cite this article

Fang, M. A fast expectation-maximum algorithm for fine-scale QTL mapping. Theor Appl Genet 125, 1727–1734 (2012). https://doi.org/10.1007/s00122-012-1949-9

Received: 14 April 2012
Accepted: 15 July 2012
Published: 04 August 2012
Issue Date: December 2012
DOI: https://doi.org/10.1007/s00122-012-1949-9
