Abstract
Recent single-nucleotide-polymorphism (SNP) array technology makes it possible to genotype millions of SNP markers across the genome, which in turn calls for fast and efficient methods for fine-scale quantitative trait loci (QTL) mapping. Single-marker association (SMA) is the simplest method for fine-scale QTL mapping, but it usually produces many false-positive signals and has low QTL-detection power. Compared with SMA, the haplotype-based method of Meuwissen and Goddard, which treats the QTL effect as random and estimates variance components (VC) with identity-by-descent (IBD) matrices inferred from the unknown historical population, is more powerful for fine-scale QTL mapping; furthermore, their method tends to show a continuous QTL-detection profile, which diminishes many false-positive signals. However, variance component estimation is usually very time consuming and can be difficult to converge. Thus, an extremely fast method, EMF (Expectation-Maximization algorithm under a Fixed effect model), is proposed in this research; it assumes a biallelic QTL and uses an expectation-maximization (EM) algorithm to solve for the model effects. Simulation experiments showed that (1) EMF was computationally much faster than the VC method; (2) EMF and VC performed similarly in QTL-detection power and parameter estimation, and both outperformed paired-marker analysis and SMA. However, the power of EMF would be lower than that of VC if the QTL was multiallelic.
References
Chen WM, Abecasis GR (2007) Family-based association tests for genome wide association scans. Am J Hum Genet 81:913–926
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc B 39:1–38
Druet T, Fritz S, Boussaha M, Ben-Jemaa S, Guillaume F, Derbala D, Zelenika D, Lechner D, Charon C, Boichard D, Gut IG, Eggen A, Gautier M (2008) Fine mapping of quantitative trait loci affecting female fertility in dairy cattle on BTA03 using a dense single-nucleotide polymorphism map. Genetics 178:2227–2235
Haley CS, Knott SA (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69:315–324
Hernández-Sánchez J, Haley CS, Woolliams JA (2006) Prediction of IBD based on population history for fine gene mapping. Genet Sel Evol 38:231–252
Hernández-Sánchez J, Grunchec JA, Knott S (2009) A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics 25:1377–1383
Hill WG, Hernández-Sánchez J (2007) Prediction of multi-locus identity-by-descent. Genetics 176:1–9
Kaplan NL, Hill WG, Weir BS (1995) Likelihood methods for locating disease genes in nonequilibrium populations. Am J Hum Genet 56:18–32
Kimmel G, Karp RM, Jordan MI, Halperin E (2008) Association mapping and significance estimation via the coalescent. Am J Hum Genet 83:675–683
Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–194
Lee SH, van der Werf JHJ (2006) Simultaneous fine mapping of multiple closely linked quantitative trait loci using combined linkage disequilibrium and linkage with a general pedigree. Genetics 173:2329–2337
Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies via imputation of genotypes. Nat Genet 39:906–913
McPeek MS, Strahs A (1999) Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine scale genetic mapping. Am J Hum Genet 65:858–875
Meuwissen THE, Goddard ME (2000) Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155:421–430
Meuwissen THE, Goddard ME (2001) Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol 33:605–634
Meuwissen THE, Goddard ME (2007) Multipoint identity-by-descent prediction using dense markers to map quantitative trait loci and estimate effective population size. Genetics 176:2551–2560
Minichiello M, Durbin R (2006) Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet 79:910–922
Morris AP (2006) A flexible Bayesian framework for modeling haplotype association with disease, allowing for dominance effects of the underlying causative variants. Am J Hum Genet 79:679–694
Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554
Pérez-Enciso M (2003) Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics 163:1497–1510
Schnabel RD, Kim J-J, Ashwell MS, Sonstegard TS, Van Tassell CP, Connor EE, Taylor JF (2005) Fine-mapping milk production quantitative trait loci on BTA6: analysis of the bovine osteopontin gene. Proc Natl Acad Sci USA 102:6896–6901
Terwilliger JD (1995) A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic marker loci. Am J Hum Genet 56:777–787
Wang T, Fernando RL, van der Beek S, van Arendonk JAM (1995) Covariance between relatives for a marked quantitative trait locus. Genet Sel Evol 27:251–274
Wang WYS, Barratt BJ, Clayton DG, Todd JA (2005) Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 6:109–118
Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678
Xiong M, Guo SW (1997) Fine-scale genetic mapping based on linkage disequilibrium: theory and applications. Am J Hum Genet 60:1513–1531
Acknowledgments
The three reviewers are thanked for their useful comments. This research was supported by Chinese National Natural Science Foundation grant 31001001.
Additional information
Communicated by M. Sillanpaa.
Appendices
Appendix 1: Derivation of the probability of an individual haplotype being IBD to the ancestral haplotype at the putative QTL loci
Equation (2) can be written as,
The second term in Eq. (10) can be factorized as \( p(S\,|\,\phi) = \prod\limits_{\text{marker loci}\;j}^{j+1} p(S(j)\,|\,\phi(j)) \), where ϕ is the IBD status of a segment including the QTL locus, the flanking markers (markers j and j + 1) and the regions between them, and \( p(S(j)\,|\,\phi(j)) \) is the probability of the IBS status between the individual and ancestral haplotypes at marker j, conditional on the IBD status of marker j. The values of \( p(S\,|\,\phi) \) for the four IBS statuses of the flanking markers \( (S_{j},\;S_{j + 1}) \), namely (1, 1), (1, 0), (0, 1) and (0, 0), can be easily obtained from \( a_{j} \) and \( a_{j+1} \), where \( a_{j} \) (\( a_{j+1} \)) denotes the probability that the individual and ancestral haplotypes are IBS but not IBD at marker j (j + 1) (see Meuwissen and Goddard 2001). Assuming that all alleles of a marker have equal frequency in the base population, \( a_{j} \) can be estimated as 1/(number of alleles of the jth marker). However, this assumption can be relaxed, as illustrated in the “Discussion”.
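The factorization over the two flanking markers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, and it assumes that a marker which is IBD to the ancestral haplotype is always IBS (marker mutation is ignored), while a nonIBD marker is IBS with probability \( a_{j} \).

```python
from itertools import product


def p_marker(s, phi, a):
    """P(S(j) = s | phi(j) = phi) for one marker (hypothetical helper).

    s   : 1 if the marker is IBS with the ancestral haplotype, else 0
    phi : 1 if the marker is IBD with the ancestral haplotype, else 0
    a   : probability of IBS when the marker is not IBD
    Assumption: an IBD marker is always IBS (no marker mutation).
    """
    if phi == 1:
        return 1.0 if s == 1 else 0.0
    return a if s == 1 else 1.0 - a


def p_s_given_phi(s_pair, phi_pair, a_pair):
    """Product of the per-marker terms over the two flanking markers."""
    prob = 1.0
    for s, phi, a in zip(s_pair, phi_pair, a_pair):
        prob *= p_marker(s, phi, a)
    return prob


# Biallelic markers with equal allele frequencies: a_j = 1 / 2.
a_pair = (0.5, 0.5)
statuses = list(product((1, 0), repeat=2))  # (1,1), (1,0), (0,1), (0,0)
table = {
    (s, phi): p_s_given_phi(s, phi, a_pair)
    for s in statuses
    for phi in statuses
}
```

For each marker IBD status, the four entries of `table` over the IBS statuses sum to one, which is a quick sanity check on the factorization.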
The first term in Eq. (10), p(ϕ), is the probability of the IBD status of the segment including the QTL locus, the flanking markers (markers j and j + 1) and the regions between them, and it is derived from f(c) (see Meuwissen and Goddard 2001). In Meuwissen and Goddard (2001), f(c) is the probability of having an IBD region of size c between two individual haplotypes; in EMF, it refers instead to the probability of having an IBD region of size c between an individual haplotype and the ancestral haplotype, and thus can be expressed as
where the first term is the probability that the segment of size c remains unbroken over T generations of meiosis, and the second term α is the probability that the intact IBD segment is inherited from the ancestral haplotype carrying the mutant QTL allele. α equals \( N_{M}/N \), where \( N_{M} \) is the number of current haplotypes containing the mutation M and N is the total number of haplotypes. Because \( N_{M} \) is unknown, α cannot be obtained directly; in practice, α should therefore be set beforehand, and the effect of α will be discussed later. The calculation of p(ϕ) with f(c) has been explained at length in Meuwissen and Goddard (2001), and thus is not presented here. Once p(ϕ) and \( p(S\,|\,\phi) \) have been calculated, Eq. (10) can be obtained by summing all possible terms in which ϕ is IBD at the QTL (see also Table III of Meuwissen and Goddard 2001 for more details).
The second term in the denominator of Eq. (2) can also be calculated using a similar approach, and it can be factorized as
which is calculated by summing all possible terms relevant to ϕ that is nonIBD at QTL (see Table III in Meuwissen and Goddard 2001). For clarification, all notations involved in this section are listed in Table 2.
Appendix 2: Extension to the combination of the linkage disequilibrium and linkage analysis
A pedigree with two generations is taken as an example to illustrate how linkage information can be incorporated; the approach can be extended to more complex pedigrees. Given the linkage phases of the unrelated founders and their offspring, the probabilities that offspring i carries each of the father’s two QTL alleles, \( A_{1}^{\text{P}} \) and \( A_{2}^{\text{P}} \), and each of the mother’s two QTL alleles, \( A_{1}^{\text{M}} \) and \( A_{2}^{\text{M}} \), denoted \( \text{Prob}(A_{1}^{\text{P}}) \) and \( \text{Prob}(A_{2}^{\text{P}}) \), and \( \text{Prob}(A_{1}^{\text{M}}) \) and \( \text{Prob}(A_{2}^{\text{M}}) \), respectively, can be easily inferred from flanking markers according to Haldane’s recombination rule (e.g., using the method of Wang et al. 1995). Let the probabilities that the two QTL alleles (indexed 1 and 2) of the father (\( i_{\text{P}} \)) and mother (\( i_{\text{M}} \)) of offspring i are IBD to the ancestral mutation allele be denoted by \( \pi_{i_{\text{P}}}^{1} \) and \( \pi_{i_{\text{P}}}^{2} \) (for the father) and \( \pi_{i_{\text{M}}}^{1} \) and \( \pi_{i_{\text{M}}}^{2} \) (for the mother). The probabilities of the three QTL genotypes combining LD and LA information can then be calculated as \( P_{i1} = (\text{Prob}(A_{1}^{\text{P}})\pi_{i_{\text{P}}}^{1} + \text{Prob}(A_{2}^{\text{P}})\pi_{i_{\text{P}}}^{2}) \cdot (\text{Prob}(A_{1}^{\text{M}})\pi_{i_{\text{M}}}^{1} + \text{Prob}(A_{2}^{\text{M}})\pi_{i_{\text{M}}}^{2}) \) for genotype MM, \( P_{i3} = (\text{Prob}(A_{1}^{\text{P}})(1 - \pi_{i_{\text{P}}}^{1}) + \text{Prob}(A_{2}^{\text{P}})(1 - \pi_{i_{\text{P}}}^{2})) \cdot (\text{Prob}(A_{1}^{\text{M}})(1 - \pi_{i_{\text{M}}}^{1}) + \text{Prob}(A_{2}^{\text{M}})(1 - \pi_{i_{\text{M}}}^{2})) \) for mm, and \( P_{i2} = 1 - P_{i1} - P_{i3} \) for Mm or mM, which assumes that the QTL locus is in Hardy–Weinberg equilibrium.
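As a worked illustration, the three genotype probabilities can be computed as in the following Python sketch. The function name and the input values are hypothetical, and the sketch assumes that each parental allele's transmission probability is weighted by that same allele's IBD probability (allele 1 with \( \pi^{1} \), allele 2 with \( \pi^{2} \)) and that the two transmission probabilities of each parent sum to one.

```python
def qtl_genotype_probs(prob_p, pi_p, prob_m, pi_m):
    """Return (P_i1, P_i2, P_i3) for genotypes MM, Mm/mM and mm.

    prob_p : (Prob(A1^P), Prob(A2^P)), paternal transmission probabilities
    pi_p   : IBD probabilities of the father's alleles 1 and 2
    prob_m : (Prob(A1^M), Prob(A2^M)), maternal transmission probabilities
    pi_m   : IBD probabilities of the mother's alleles 1 and 2
    """
    # Probability that the paternally (maternally) inherited allele is M.
    pat_mutant = prob_p[0] * pi_p[0] + prob_p[1] * pi_p[1]
    mat_mutant = prob_m[0] * pi_m[0] + prob_m[1] * pi_m[1]
    # Probability that the paternally (maternally) inherited allele is m.
    pat_wild = prob_p[0] * (1 - pi_p[0]) + prob_p[1] * (1 - pi_p[1])
    mat_wild = prob_m[0] * (1 - pi_m[0]) + prob_m[1] * (1 - pi_m[1])

    p_mm_mutant = pat_mutant * mat_mutant   # P_i1, genotype MM
    p_mm_wild = pat_wild * mat_wild         # P_i3, genotype mm
    p_het = 1.0 - p_mm_mutant - p_mm_wild   # P_i2, genotype Mm or mM
    return p_mm_mutant, p_het, p_mm_wild


# Hypothetical inputs chosen only for illustration.
p1, p2, p3 = qtl_genotype_probs(
    prob_p=(0.9, 0.1), pi_p=(0.8, 0.2),
    prob_m=(0.5, 0.5), pi_m=(0.4, 0.6),
)
```

With these inputs the paternal allele is M with probability 0.74 and the maternal allele with probability 0.5, so the probabilities for MM, heterozygote and mm are approximately 0.37, 0.50 and 0.13.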
Appendix 3: Simulation of multiple QTL mutations
A chromosome segment of 2 cM was simulated. Twelve markers were evenly spaced on the segment and one QTL was located at 1.05 cM. The population was founded 500 generations ago, the effective population size (\( N_{e} \)) was 200, and the sex ratio was 1:1. In the base population, two alleles were assigned to each marker with equal frequency, and only one allele was assigned to the QTL. The marker alleles mutated at a rate of \( 4 \times 10^{-4} \) per generation. A new QTL mutation occurred every two generations: one individual haplotype was randomly chosen, and the QTL allele on that haplotype was mutated to a new QTL allele and assigned a new number. This high QTL mutation rate might result in about 6–12 alleles in the present population. The effect of each QTL allele was randomly sampled from the standard normal distribution N(0, 1). In the last generation, the effect of each QTL allele was rescaled so that the mean of the QTL effect was zero and its variance was 0.2. The residual effect was sampled from a normal distribution with mean 0 and variance 0.9; the overall mean was set to zero, and no polygenic effect was simulated. With these settings, the heritability explained by the QTL was 0.18. The phenotypic value of each individual was then generated by summing the overall mean, the QTL effect and the residual error.
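The final phenotype-generation step can be sketched as follows. This is a minimal sketch under stated assumptions rather than the simulation program used in the study: it skips the 500 generations of population history, draws the two QTL alleles of each individual uniformly from a hypothetical set of eight alleles, and takes an individual's QTL effect to be the sum of its two allele effects. The function name, sample size and seed are likewise illustrative.

```python
import random
import statistics


def simulate_phenotypes(n_individuals=1000, n_alleles=8,
                        var_qtl=0.2, var_resid=0.9, seed=1):
    """Generate rescaled QTL effects and phenotypes (illustrative only)."""
    rng = random.Random(seed)
    # Allele effects sampled from the standard normal distribution N(0, 1).
    allele_effects = [rng.gauss(0.0, 1.0) for _ in range(n_alleles)]
    # Assumption: alleles are drawn uniformly instead of arising from
    # the simulated population history.
    genotypes = [(rng.randrange(n_alleles), rng.randrange(n_alleles))
                 for _ in range(n_individuals)]
    raw_qtl = [allele_effects[a] + allele_effects[b] for a, b in genotypes]

    # Rescale so that the QTL effect has mean zero and variance var_qtl.
    # This is equivalent to shifting and scaling every allele effect.
    mean_raw = statistics.fmean(raw_qtl)
    scale = var_qtl ** 0.5 / statistics.pstdev(raw_qtl)
    qtl = [(g - mean_raw) * scale for g in raw_qtl]

    # Residuals from N(0, var_resid); the overall mean is zero.
    overall_mean = 0.0
    resid = [rng.gauss(0.0, var_resid ** 0.5) for _ in range(n_individuals)]
    pheno = [overall_mean + g + e for g, e in zip(qtl, resid)]
    return qtl, pheno


qtl, pheno = simulate_phenotypes()
# Expected QTL heritability: 0.2 / (0.2 + 0.9)
h2_expected = 0.2 / (0.2 + 0.9)
```

The expected heritability evaluates to 0.2/1.1 ≈ 0.18, matching the value quoted in the text; the realized value in any single replicate varies with the sampled residuals.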
Cite this article
Fang, M. A fast expectation-maximum algorithm for fine-scale QTL mapping. Theor Appl Genet 125, 1727–1734 (2012). https://doi.org/10.1007/s00122-012-1949-9
DOI: https://doi.org/10.1007/s00122-012-1949-9