Computationally efficient map construction in the presence of segregation distortion

  • Original Paper
  • Published in: Theoretical and Applied Genetics

Abstract

Key message

We present a novel estimator for map construction in the presence of segregation distortion which is highly computationally efficient. For multi-parental designs this estimator outperforms methods that do not account for segregation distortion, at no extra computational cost.

Abstract

Inclusion of genetic markers exhibiting segregation distortion in a linkage map can result in biased estimates of genetic distance and distortion of map positions. Removal of distorted markers is hence a typical filtering criterion; however, this may result in exclusion of biologically interesting regions of the genome such as introgressions and translocations. Estimation of additional parameters characterizing the distortion is computationally slow, as it relies on estimation via the Expectation Maximization algorithm or a higher dimensional numerical optimisation. We propose a robust M-estimator (RM) capable of handling tens of thousands of distorted markers from a single linkage group. We show via simulation that for multi-parental designs the RM estimator can perform much better than uncorrected estimation, at no extra computational cost. We then apply the RM estimator to chromosome 2B in wheat in a multi-parent population segregating for the Sr36 introgression, a known transmission distorter. The resulting map contains over 700 markers, and is consistent with maps constructed from crosses which do not exhibit segregation distortion.

Abbreviations

  • \(M_1, M_2, M_3\): Genetic markers
  • \(M_s\): Segregation distortion locus (SDL)
  • \(r_{12}, r_{23}, r_{13}\): Recombination fractions between markers \(M_1, M_2\) and \(M_3\)
  • \(r_{1s}, r_{s3}\): Recombination fractions between \(M_s\) and markers \(M_{1}, M_{3}\)
  • \(g_y(t)\): Expected proportion of allele \(y\) at \(M_2\)
  • \(n_{xyz}\): Number of lines with multilocus genotype \(x, y, z\) at markers \(M_1, M_2, M_3\)
  • \(n_{x.z}\): Number of lines with multilocus genotype \(x, z\) at markers \(M_1, M_3\)
  • \(\mathbb P_d\): Distorted probability model
  • \(\mathbb P_{d,f}\): Distorted probability model for MAGIC8 population, assuming funnel \(f\)
  • \(\mathbb P_{u,f}\): Undistorted probability model for MAGIC8 population, assuming funnel \(f\)
  • \(\mathbb P_u\): Undistorted probability model
  • \(p_{f}\): Proportion of lines from funnel \(f\) in a MAGIC8 population
  • \(p_{xyzf}\): Proportion of lines from funnel \(f\) having multi-locus genotype \(x, y, z\) at markers \(M_1, M_2, M_3\)
  • \(p_{.y.f}\): Proportion of lines from funnel \(f\) having genotype \(y\) at marker \(M_2\)
  • \(\hat{p}_{.y.}\): Empirical proportion of lines having genotype \(y\) at \(M_2\)
  • \(\hat{p}_{x.z}\): Empirical proportion of lines having multi-locus genotype \(x, z\) at \(M_1, M_3\)
  • \(\hat{p}_{xyz}\): Empirical proportion of lines having multi-locus genotype \(x, y, z\) at \(M_1, M_2, M_3\)
  • \(p_{xyz} (r_{12}, r_{23} )\): Probability of multi-locus genotype \(x, y, z\) at markers \(M_1\), \(M_2\), \(M_3\), with given recombination fractions and no distortion
  • \(p_{.y.}\): Proportion of lines having genotype \(y\) at \(M_2\)
  • \(G\): Number of underlying genotypes at each marker
  • \(F\): Set of all funnels for the MAGIC8 population
  • \(|F|\): Number of funnels for the MAGIC8 population

Acknowledgments

Dr Huang is the recipient of an Australian Research Council Discovery Early Career Researcher Award (project number DE120101127).

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to B. Emma Huang.

Additional information

Communicated by Jiankang Wang.

Appendix

Proof of Eq. 9

$$\begin{aligned} {\mathbb E} ( s_{xz} )&=\frac{p_{xaz}}{4 p_{.a.}} + \frac{p_{x h z}}{2 p_{.h.}} + \frac{p_{x b z}}{4 p_{.b.}} + \mathbb E ( \epsilon _{x z} )\\&= \frac{1}{4} \mathbb P_d (M_1 = x, M_3 = z \vert M_2 = a ) + \frac{1}{2}\mathbb P_d (M_1 = x, M_3 = z \vert M_2 = h )+ \frac{1}{4} \mathbb P_d (M_1 = x, M_3 = z \vert M_2 = b ) + {\mathbb E} (\epsilon _{x z} )\\&=\frac{1}{4} \mathbb P_u (M_1 = x, M_3 = z \vert M_2 = a ) + \frac{1}{2}\mathbb P_u (M_1 = x, M_3 = z \vert M_2 = h )+ \frac{1}{4} \mathbb P_u (M_1 = x, M_3 = z \vert M_2 = b )+{\mathbb E}(\epsilon _{xz})\\&= \mathbb P_u (M_1 = x, M_2 = a, M_3 = z ) + \mathbb P_u(M_1 = x, M_2 = h, M_3 = z )+ \mathbb P_u (M_1 = x, M_2 = b, M_3 = z )+{\mathbb E}(\epsilon _{x z})\\&= \mathbb P_u (M_1 = x, M_3 = z )+{\mathbb E} (\epsilon _{x z} ) \end{aligned}$$
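The key step in this proof is that conditioning on \(M_2\) removes the distortion, so that the Mendelian weights \(\frac{1}{4}, \frac{1}{2}, \frac{1}{4}\) recover the undistorted two-marker probabilities. The following is a minimal numerical sketch of that identity, not the authors' implementation. It assumes an F2 population with no crossover interference and distortion that acts only through the genotype at \(M_2\); the function names, the recombination fractions and the distorted proportions `q` are all hypothetical choices made for illustration. Genotypes are coded 0, 1, 2 for \(a\), \(h\), \(b\).

```python
from itertools import product

def gamete_probs(r12, r23):
    """Probability of each three-locus gamete (alleles coded 0/1), no interference."""
    probs = {}
    for g in product((0, 1), repeat=3):
        p = 0.5
        p *= r12 if g[0] != g[1] else 1.0 - r12
        p *= r23 if g[1] != g[2] else 1.0 - r23
        probs[g] = p
    return probs

def f2_genotype_probs(r12, r23):
    """Undistorted F2 probabilities of (x, y, z); each entry counts copies of allele 1."""
    gam = gamete_probs(r12, r23)
    joint = {}
    for (g, pg), (h, ph) in product(gam.items(), repeat=2):
        key = tuple(a + b for a, b in zip(g, h))
        joint[key] = joint.get(key, 0.0) + pg * ph
    return joint

def m2_marginal(joint):
    """Marginal genotype proportions at M2."""
    return [sum(p for k, p in joint.items() if k[1] == y) for y in range(3)]

def distort_at_m2(joint, q):
    """Assumed distortion: M2 gets marginal q, while P(M1, M3 | M2) is unchanged."""
    marg = m2_marginal(joint)
    return {k: p / marg[k[1]] * q[k[1]] for k, p in joint.items()}

def pair_marginal(joint):
    """Sum out M2 to get the probabilities of (x, z) at M1 and M3."""
    pair = {}
    for (x, _, z), p in joint.items():
        pair[(x, z)] = pair.get((x, z), 0.0) + p
    return pair

def reweighted_pair_probs(joint_d):
    """Population analogue of s_xz: Mendelian weights applied to P_d(x, z | y)."""
    weights = (0.25, 0.5, 0.25)
    marg = m2_marginal(joint_d)
    s = {}
    for (x, y, z), p in joint_d.items():
        s[(x, z)] = s.get((x, z), 0.0) + weights[y] * p / marg[y]
    return s

p_u = f2_genotype_probs(0.1, 0.2)             # hypothetical r12, r23
p_d = distort_at_m2(p_u, q=(0.6, 0.3, 0.1))   # hypothetical distorted proportions
s = reweighted_pair_probs(p_d)
pair_u = pair_marginal(p_u)
pair_d = pair_marginal(p_d)

# Reweighting matches the undistorted model; the raw distorted marginal does not.
max_gap = max(abs(s[k] - pair_u[k]) for k in pair_u)
naive_gap = max(abs(pair_d[k] - pair_u[k]) for k in pair_u)
print(f"reweighted gap: {max_gap:.2e}, uncorrected gap: {naive_gap:.3f}")
```

In this toy setting the reweighted quantities agree with the undistorted two-marker probabilities up to floating-point error, whereas the uncorrected marginal of the distorted model does not. With sample proportions in place of the true ones, the difference is what the appendix denotes \(\epsilon_{xz}\).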

Proof of Eq. 14

It is not true that

$$\begin{aligned} \mathbb P_{f,d} (M_1 = x, M_3 = z\vert M_2 \ne A) = \mathbb P_{f,u} (M_1 = x, M_3 = z\vert M_2 \ne A) \end{aligned}$$

However, if we take \(p_f = |F|^{-1}\) we can rewrite

$$\begin{aligned} \sum _{f \in F} | F |^{-1} \mathbb P_{f, d}( M_1 = x, M_3 = z \vert M_2 \ne A) \end{aligned}$$

as

$$\begin{aligned} | F |^{-1} \sum _{f \in F} \sum _{B \le y \le H} \mathbb P_{f, u}( M_1 = x, M_3 = z \vert M_2 = y)\mathbb P_{f, d}( M_2 = y \vert M_2 \ne A) \end{aligned}$$

We now make the approximation that \(\mathbb P_{f, u} ( M_1 = x, M_3 = z \vert M_2 = y )\) has the same value for all funnels when \(x\), \(y\) and \(z\) are distinct. Similarly, assume that there is a unique value across all funnels for \(x = y, y \ne z\) and another value for \(x \ne y, y = z\). Note that this holds exactly for \(r_{12} = r_{23} = 0\) and \(r_{12} = r_{23} = 0.5\), and it is always true that \(\mathbb P_{f,u }( M_1 = x, M_3 = x \vert M_2 = x )\) is funnel-independent. If \(x \ne A\) and \(z \ne A\), we now have

$$\begin{aligned}&| F |^{-1}\left( \sum _{f \in F} \mathbb P_{f,u } ( M_1 = x, M_3 = z \vert M_2 = x )\mathbb P_{f, d} ( M_2 = x \vert M_2 \ne A)\right. \\&\quad + \sum _{f \in F} \mathop {\sum _{B \le y \le H}}_{y \ne x, y \ne z} \mathbb P_{f,u } ( M_1 = x, M_3 = z \vert M_2 = y )\mathbb P_{f, d}( M_2 = y \vert M_2 \ne A)\\&\quad + \left. \sum _{f \in F} \mathbb P_{f,u } ( M_1 = x, M_3 = z \vert M_2 = z )\mathbb P_{f, d} ( M_2 = z \vert M_2 \ne A )\right) \end{aligned}$$

From our assumption about funnel independence this becomes

$$\begin{aligned}&\mathbb P_{u} ( M_1 = x, M_3 = z \vert M_2 = x )c_x + \mathop {\sum _{y \ne A}}_{y \ne x, y \ne z} \mathbb P_u ( M_1 = x, M_3 = z \vert M_2 = y )c_y\\&\quad + \mathbb P_u( M_1 = x, M_3 = z \vert M_2 = z )c_z \end{aligned}$$

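Here \(\mathbb P_u\) refers to the approximate funnel-independent values, and \(c_y = |F|^{-1} \sum _{f \in F} \mathbb P_{f, d} ( M_2 = y \vert M_2 \ne A )\) is the funnel-averaged weight attached to genotype \(y\). However \(c_x\), \(c_y\) and \(c_z\) are all exactly \(\frac{1}{7}\), so this is equal to

$$\begin{aligned} \frac{8}{7} \mathbb P_u\left( M_1 = x, M_2 \ne A, M_3 = z\right) \end{aligned}.$$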

Similar arguments can be made for the cases \(x = A, z = A\), etc. Substituting this approximation back into Eq. 13 gives the desired result. Note that as we did not use the fact that \(M_2\) was between \(M_1\) and \(M_3\), these approximations are equally valid if the marker order is \(M_2, M_1, M_3\).
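The origin of the factor \(\frac{8}{7}\) can be traced in one line. As a sketch, assume that in the undistorted MAGIC8 model each of the eight founders is equally likely at \(M_2\), so that \(\mathbb P_u ( M_2 = y ) = \frac{1}{8}\); this assumption is stated here for illustration and is not spelled out in the text above. With every weight equal to \(\frac{1}{7}\),

$$\begin{aligned} \sum _{y \ne A} \frac{1}{7}\, \mathbb P_u ( M_1 = x, M_3 = z \vert M_2 = y )&= \sum _{y \ne A} \frac{1}{7}\, \frac{\mathbb P_u ( M_1 = x, M_2 = y, M_3 = z )}{\mathbb P_u ( M_2 = y )}\\&= \frac{8}{7} \sum _{y \ne A} \mathbb P_u ( M_1 = x, M_2 = y, M_3 = z )\\&= \frac{8}{7}\, \mathbb P_u ( M_1 = x, M_2 \ne A, M_3 = z ) \end{aligned}$$

The sum over \(y \ne A\) runs over the seven founders \(B, \ldots , H\), which is why the result is a statement about the event \(M_2 \ne A\).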

Simulation of error terms

As the expectation and distribution of \(\epsilon _{xz}\) from Sect. 2.2 are analytically intractable, we characterized them through simulation. We considered the case of three dominant markers \(M_1, M_2\) and \(M_3\). The recombination fractions \(r_{12}\) and \(r_{23}\) took on values \(0.05, 0.1, 0.2, 0.3, 0.4\) and \(0.5\). The true genotypic probabilities \(p_{.a.}\) and \(p_{.h.}\) took on values \(\frac{1}{10}, \frac{2}{10}, \ldots , \frac{8}{10}\), with the restriction that \(p_{.a.} + p_{.h.} \le 0.9\). The last genotypic fraction \(p_{.b.}\) had value \(1 - p_{.a.} - p_{.h.}\). All eight possible combinations of dominant founders at the markers were considered. In total, 10,368 different sets of parameters were considered. For each set of parameters, 30,000 F2 populations of \(300\) individuals were generated. For each population, the values \(\epsilon _{00}, \epsilon _{01}, \epsilon _{10}\) and \(\epsilon _{11}\) were calculated.
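The count of 10,368 parameter sets follows directly from the grid described above. A short enumeration, given here only as a bookkeeping sketch with hypothetical variable names, reproduces it: six values for each recombination fraction, the admissible pairs of \(p_{.a.}\) and \(p_{.h.}\), and two choices of dominant founder at each of the three markers.

```python
from fractions import Fraction
from itertools import product

# Recombination fractions used for both r12 and r23.
r_values = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]

# p_.a. and p_.h. range over 1/10, ..., 8/10 subject to p_.a. + p_.h. <= 0.9.
# Exact fractions avoid floating-point trouble at the boundary.
tenths = [Fraction(k, 10) for k in range(1, 9)]
geno_pairs = [(pa, ph) for pa, ph in product(tenths, repeat=2)
              if pa + ph <= Fraction(9, 10)]

# One dominant founder (a or b) at each of M1, M2, M3.
dominant_patterns = list(product(("a", "b"), repeat=3))

grid = [
    (r12, r23, pa, ph, 1 - pa - ph, dom)
    for r12, r23 in product(r_values, repeat=2)
    for pa, ph in geno_pairs
    for dom in dominant_patterns
]

print(len(geno_pairs), len(dominant_patterns), len(grid))
```

The restriction \(p_{.a.} + p_{.h.} \le 0.9\) leaves 36 admissible pairs and guarantees \(p_{.b.} \ge 0.1\), so the total is \(6 \times 6 \times 36 \times 8 = 10{,}368\).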

Figure 4 shows a histogram of the estimated values of \({\mathbb E} [\epsilon _{00} ]\) across all the scenarios considered. Note that Fig. 4 is not a histogram of the distribution of \(\epsilon _{00}\) itself: each observation is the estimated expectation for one scenario, and the histogram shows that this expectation was close to zero in every scenario considered. The behaviour of the other three expectations is similar. In considering whether it is reasonable to assume that \(\epsilon _{xz} \simeq 0\), it is therefore sufficient to examine the variance of \(\epsilon _{xz}\).

Table 1 Simulation parameters that produced the five largest values of \(\mathrm{Var} (\epsilon _{01} )\)

Table 1 lists the five scenarios for which the largest value of \(\mathrm{Var} (\epsilon _{01} )\) was observed. These scenarios all involve extreme distortion. They also involve specific choices of dominant founders. For example, in the first scenario Eq. 11 implies that

$$\begin{aligned} \epsilon _{01} = \frac{1}{4} \left( \frac{\hat{p}_{001}}{\hat{p}_{.0.}} - \frac{\hat{p}_{001}}{p_{.0.}}\right) + \frac{3}{4}\left( \frac{\hat{p}_{011}}{\hat{p}_{.1.}} - \frac{\hat{p}_{011}}{p_{.1.}}\right) . \end{aligned}$$

In this case \(p_{.1.} = 1 - p_{.a.}=0.2\), which is relatively small. So the difference in the second term can potentially be large, whereas the difference in the first term will tend to be small. Now consider the same scenario but with the dominant founder at \(M_2\) being \(a\). From Eq. 12, in this case

$$\begin{aligned} \epsilon _{01}&= \frac{3}{4} \left( \frac{\hat{p}_{001}}{\hat{p}_{.0.}} - \frac{\hat{p}_{001}}{p_{.0.}}\right) + \frac{1}{4}\left( \frac{\hat{p}_{011}}{\hat{p}_{.1.}} - \frac{\hat{p}_{011}}{p_{.1.}}\right) . \end{aligned}$$

The difference in the first term will tend to be small, and the difference in the second will be potentially large. However it is now multiplied by a factor of \(\frac{1}{4}\) rather than \(\frac{3}{4}\), and as a result \(\mathrm{Var} (\epsilon _{01} )\) is expected to be smaller in this case.
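The reason a small marginal proportion inflates the variance can be seen in a toy Monte Carlo. The sketch below is not the simulation used in this appendix: it looks at a single term of the form \(\hat{p}_{joint} / \hat{p}_{marg} - \hat{p}_{joint} / p_{marg}\), with a hypothetical conditional probability of 0.5 for the joint genotype given the marginal one, 300 individuals per population, and a fixed seed. The function and argument names are illustrative only.

```python
import random
from statistics import pvariance

def ratio_error_variance(p_marg, cond=0.5, n=300, reps=2000, seed=1):
    """Variance of p_hat_joint / p_hat_marg - p_hat_joint / p_marg over simulated populations.

    p_marg is the true marginal proportion; cond is the (assumed) probability
    of the joint genotype given the marginal genotype.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        n_joint = n_marg = 0
        for _ in range(n):
            u = rng.random()
            if u < p_marg:
                n_marg += 1
                if u < p_marg * cond:
                    n_joint += 1
        if n_marg == 0:
            continue  # ratio undefined; practically never happens for these settings
        p_hat_joint = n_joint / n
        p_hat_marg = n_marg / n
        errors.append(p_hat_joint / p_hat_marg - p_hat_joint / p_marg)
    return pvariance(errors)

var_rare = ratio_error_variance(0.2)    # marginal proportion as small as p_.1. = 0.2
var_common = ratio_error_variance(0.8)  # complementary proportion 0.8
print(f"variance at 0.2: {var_rare:.5f}, variance at 0.8: {var_common:.5f}")
```

To first order the variance of such a term scales like \((1 - p_{marg}) / (n\, p_{marg})\), so it is markedly larger at 0.2 than at 0.8. This is consistent with the observation that weighting the high-variance term by \(\frac{1}{4}\) rather than \(\frac{3}{4}\) is expected to reduce \(\mathrm{Var} (\epsilon _{01} )\).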

When applying the approximation \(\epsilon _{xz} \simeq 0\) we actually make four approximations simultaneously. The worst case for each individual approximation is not expected to be representative of the performance when actually applied to recombination fraction estimation. For example, in the first scenario in Table 1, the largest values of \(|\epsilon _{00}|, |\epsilon _{10}|\) and \(|\epsilon _{11}|\) observed across 30,000 populations were \(0.0186, 0.0089\) and \(0.11\). In the specific population that gave a value of \(-0.297\) for \(\epsilon _{01}\), the corresponding values of the other error terms were \(0.014, 0.0062\) and \(-0.022\). In general, we observed that scenarios with large values for one of the error terms have very small values for other terms. Hence for nearly all situations we expect the approximation to perform reasonably well.

Cite this article

Shah, R., Cavanagh, C.R. & Huang, B.E. Computationally efficient map construction in the presence of segregation distortion. Theor Appl Genet 127, 2585–2597 (2014). https://doi.org/10.1007/s00122-014-2401-0
