Abstract
Detecting a quantitative trait locus (QTL), that is, a gene influencing a measurable quantitative trait, on a given chromosome is a major problem in genetics. We study a population structured in families and assume that the QTL location is the same for all families. We consider the likelihood ratio test (LRT) process related to the test of the absence of a QTL on the interval [0, T] representing a chromosome. We give the asymptotic distribution of the LRT process under the null hypothesis that there is no QTL in any family, and under local alternatives with a QTL at \(t^{\star }\in [0, T]\) in at least one family. We show that the LRT is asymptotically the supremum of the sum of the squares of independent interpolated Gaussian processes, the number of processes being the number of families. We propose several new methods to compute critical values for QTL detection. Since all these methods rely on asymptotic results, the validity of the asymptotic assumption is checked on simulated data. Finally, we show how to optimize the QTL detection process.
References
Azaïs JM, Cierco-Ayrolles C (2002) An asymptotic test for quantitative gene detection. Ann Inst Henri Poincaré (B) 38:1087–1092
Azaïs JM, Delmas C, Rabier CE (2014) Likelihood ratio test process for quantitative trait locus detection. Statistics 48:787–801
Azaïs JM, Gassiat E, Mercadier C (2009) The likelihood ratio test for general mixture models with possibly structural parameter. ESAIM 13:301–327
Azaïs JM, Wschebor M (2009) Level sets and extrema of random processes and fields. Wiley, New York
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 57(1):289–300
Chang MN, Wu R, Wu SS, Casella G (2009) Score statistics for mapping quantitative trait loci. Stat Appl Genet Mol Biol 8:16
Chen HY, Zhang Q, Yin CC, Wang CK, Gong WJ, Mei G (2006) Detection of quantitative trait loci affecting milk production traits on bovine chromosome 6 in a Chinese Holstein population by the daughter design. J Dairy Sci 89:782–790
Churchill G, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138:963–971
Cierco C (1998) Asymptotic distribution of the maximum likelihood ratio test for gene detection. Statistics 31:261–285
Davies RB (1987) Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74:33–43
Delong D (1981) Crossing probabilities for a square root boundary by a Bessel process. Commun Stat Theory Methods 10:2197–2213
Didelez V, Pigeot I, Walter P (2006) Modifications of the Bonferroni-Holm procedure for a multi-way ANOVA. Stat Papers 47(2):181–209
Estrella A (2003) Critical values and p values of Bessel process distributions: computation and application to structural break tests. Econ Theory 19:1128–1143
Frary A, Nesbitt TC, Frary A, Grandillo S, van der Knaap E et al (2000) fw2.2: a quantitative trait locus key to the evolution of tomato fruit size. Science 289:85–88
Gassiat E (2002) Likelihood ratio inequalities with applications to various mixtures. Ann Inst Henri Poincaré (B) 38(6):897–906
Genz A (1992) Numerical computation of multivariate normal probabilities. J Comput Graph Stat 1:141–149
Gudbjartsson DF, Walters GB, Thorleifsson G, Stefansson H, Halldorsson BV, Zusmanovich P, Helgadottir A (2008) Many sequence variants affecting diversity of adult human height. Nat Genet 40(5):609–615
Haldane JBS (1919) The combination of linkage values and the calculation of distance between the loci of linked factors. J Genet 8:299–309
Haley CS, Knott S (1992) A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69(4):315–324
Jung BC, Jhun M, Song SH (2007) A new random permutation test in ANOVA models. Stat Papers 48(1):47–62
Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185–199
Le Cam L (1986) Asymptotic methods in statistical decision theory. Springer, New York
Li C, Zhou A, Sang T (2006) Rice domestication by reducing shattering. Science 311:1936–1939
Plackett RL (1954) A reduction formula for normal multivariate integrals. Biometrika 41:351–360
Rabier CE, Genz A (2014) The supremum of Chi-Square processes. Methodol Comput Appl Probab 16:715–729
Rabier CE (2014) On quantitative trait locus mapping with an interference phenomenon. TEST 23(2):311–329
Rabier CE (2015) On stochastic processes for quantitative trait locus mapping under selective genotyping. Statistics 49:19–34
Rebaï A, Goffinet B, Mangin B (1994) Approximate thresholds of interval mapping tests for QTL detection. Genetics 138:235–240
Rebaï A, Goffinet B, Mangin B (1995) Comparing power of different methods for QTL detection. Biometrics 51:87–99
Ron M, Kliger D, Feldmesser E, Seroussi E, Ezra E, Weller JI (2001) Multiple quantitative trait locus analysis of bovine chromosome 6 in the Israeli Holstein population by a daughter design. Genetics 159:727–735
Siegmund D, Yakir B (2007) The statistics of gene mapping. Springer, New York
Silva AA, Azevedo ALS, Gasparini K, Verneque RS, Peixoto MGCD, Panetto BR, Guimaraes SEF, Machado MA (2011a) Quantitative trait loci affecting lactose and total solids on chromosome 6 in Brazilian Gir dairy cattle. Genet Mol Res 10:3817–3827
Silva AA, Azevedo ALS, Verneque RS, Gasparini K, Peixoto MGCD, da Silva MVGB, Lopes PS, Guimaraes SEF, Machado MA (2011b) Quantitative trait loci affecting milk production traits on bovine chromosome 6 in zebuine Gyr breed. J Dairy Sci 94:971–980
Van der Vaart AW (1998) Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge
Weller JI, Golik M, Seroussi E, Ron M, Ezra E (2008) Detection of quantitative trait loci affecting twinning rate in Israeli Holsteins by the daughter design. J Dairy Sci 91:2469–2474
Weller JI, Kashi Y, Soller M (1990) Power of daughter and granddaughter designs for determining linkage between marker loci and quantitative trait loci in dairy cattle. J Dairy Sci 73:2525–2537
Wu R, Ma CX, Casella G (2007) Statistical genetics of quantitative traits. Springer, New York
Acknowledgments
This work has been supported by the National Center for Scientific Research (CNRS), the Animal Genetic Department of the French National Institute for Agricultural Research, and SABRE. We thank Simon de Givry for help with human data.
Appendices
Appendix 1: Proof of Theorem 1
1.1 Preliminaries
Let t belong to the interval \([t_1,t_2]\), and let us recall Lemma 2.3 of Azaïs et al. (2014).
Lemma 1
The conditional expectation \(x(t)=\mathbb {E}\left\{ X(t)\mid X(t_{1}),X(t_{2})\right\} \) is linear in \(X(t_{1}),X(t_{2})\):
\(x(t)=\alpha (t)X(t_{1})+\beta (t)X(t_{2}),\)
with \( \alpha (t)= Q^{1,1}_t - Q^{-1,1}_t\) and \( \beta (t)= Q^{1,1}_t - Q^{1,-1}_t\).
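As a numerical sanity check of Lemma 1, the sketch below computes \(\alpha (t)\) and \(\beta (t)\) from the probabilities \(Q^{a,b}_t\), which we assume denote \(\mathbb {P}\left\{ X(t)=1\mid X(t_1)=a,X(t_2)=b\right\} \) (their definition is in the main text, not in this appendix). It further assumes a \(\pm 1\) genotype process under Haldane's (1919) no-interference model, with distances in Morgans. The helpers `recomb`, `q` and `alpha_beta` are hypothetical names. The final loop checks that \(2Q^{a,b}_t-1\) equals \(\alpha (t)a+\beta (t)b\) for all four marker configurations.

```python
import math
from itertools import product


def recomb(d):
    """Recombination fraction for a distance d (Morgans) under Haldane's model (assumption)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def q(t, t1, t2, a, b):
    """P{X(t) = 1 | X(t1) = a, X(t2) = b} for a +/-1 Markov genotype process."""
    r1, r2 = recomb(t - t1), recomb(t2 - t)

    def trans(x, y, r):
        # probability of moving from state x to state y across an interval with recombination fraction r
        return r if x != y else 1.0 - r

    w_plus = trans(a, 1, r1) * trans(1, b, r2)
    w_minus = trans(a, -1, r1) * trans(-1, b, r2)
    return w_plus / (w_plus + w_minus)


def alpha_beta(t, t1, t2):
    """Lemma 1 coefficients: alpha = Q^{1,1} - Q^{-1,1}, beta = Q^{1,1} - Q^{1,-1}."""
    q11 = q(t, t1, t2, 1, 1)
    return q11 - q(t, t1, t2, -1, 1), q11 - q(t, t1, t2, 1, -1)


t1, t, t2 = 0.0, 0.3, 1.0
alpha, beta = alpha_beta(t, t1, t2)
for a, b in product((1, -1), repeat=2):
    conditional_mean = 2.0 * q(t, t1, t2, a, b) - 1.0  # E{X(t) | X(t1)=a, X(t2)=b}
    assert abs(conditional_mean - (alpha * a + beta * b)) < 1e-12
```

The check works because the symmetry \(Q^{-a,-b}_t=1-Q^{a,b}_t\) makes the conditional mean an odd function of \((a,b)\), so no constant or \(ab\) term can appear.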
Then, we have the following relationship
Since the model is regular, we can apply Theorem 5.39 of Van der Vaart (1998). As a result, according to formulae (4) and (5), we have
where \(o_{P_{\theta _{0}}}(1)\) denotes a sequence of random vectors that converges to zero in probability under \(H_0\).
Let \(S_{n}(.,i)\) be the following process, for n observations:
According to Lemma 1,
We will call \(Z^{i}(.)\) the limiting process of \(S_{n}(.,i)\).
1.2 Study under \(H_{0}\)
Without loss of generality, let us assume \(n=1\) and let us consider the process S(., i) defined in the following way:
where \(h(t)=x(t)/\sqrt{\mathbb {E}\left\{ x^{2}(t)\right\} } \).
h(.) is a random process, independent of Y and C. It is easy to see that
Besides,
So, we have
\(\mathbb {E}\left\{ Z^{i}(t)\right\} =0\), \(\mathbb {V}\left\{ Z^{i}(t)\right\} =1\) and \( \mathrm {Cov}\left\{ Z^{i}(t_{1}),Z^{i}(t_{2})\right\} = \rho (t_1,t_2) \).
A direct application of the central limit theorem shows that \(S_{n}(t_1,i)\) and \(S_{n}(t_2,i)\) converge jointly to a Gaussian vector \((Z^{i}(t_1),Z^{i}(t_2))\). According to formula (9), we have \(\Lambda _{n}(t)=\sum _{i=1}^{I}S_{n}^{2}(t,i)+o_{P_{\theta _{0}}}(1)\). As a result, \(\Lambda _{n}(.)\mathop {\rightarrow }\limits ^{F.d.}\sum _{i=1}^{I}\left\{ Z^{i}(.)\right\} ^{2}\), where \(\mathop {\rightarrow }\limits ^{F.d.}\) denotes convergence of the finite-dimensional distributions.
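The finite-dimensional statement can be illustrated by simulation. The sketch below is a minimal illustration, not the paper's procedure, and rests on stated assumptions: Haldane's model, so that \(\rho (s,t)=e^{-2|t-s|}\); independent families; and an interpolated limit \(Z^{i}(t)=(\alpha Z^{i}(t_1)+\beta Z^{i}(t_2))/\sqrt{\alpha ^2+\beta ^2+2\alpha \beta \rho (t_1,t_2)}\), mirroring the normalisation of h. The coefficients come from the normal equations of the \(L^2\) projection of X(t) on \((X(t_1),X(t_2))\), which coincides with Lemma 1 because that conditional expectation is linear. It then checks that \(\sum _i\{Z^{i}(t)\}^2\) has the mean I and variance 2I of a \(\chi ^2_I\) variable. All function names are hypothetical.

```python
import numpy as np


def rho(s, t):
    """Assumed Haldane correlation between genotypes at loci s and t (Morgans)."""
    return np.exp(-2.0 * np.abs(t - s))


def projection_coeffs(t, t1, t2):
    """alpha(t), beta(t) from the normal equations of the L2 projection of X(t) on X(t1), X(t2)."""
    r12 = rho(t1, t2)
    gram = np.array([[1.0, r12], [r12, 1.0]])
    rhs = np.array([rho(t1, t), rho(t, t2)])
    return np.linalg.solve(gram, rhs)


def simulate_lambda(t, t1, t2, n_families, n_sim, seed=0):
    """Draws of sum_i Z^i(t)^2 for independent interpolated Gaussian processes."""
    rng = np.random.default_rng(seed)
    alpha, beta = projection_coeffs(t, t1, t2)
    r12 = rho(t1, t2)
    norm = np.sqrt(alpha**2 + beta**2 + 2.0 * alpha * beta * r12)
    # (Z^i(t1), Z^i(t2)): standard Gaussian pair with correlation rho(t1, t2), one pair per family
    z1 = rng.standard_normal((n_sim, n_families))
    z2 = r12 * z1 + np.sqrt(1.0 - r12**2) * rng.standard_normal((n_sim, n_families))
    zt = (alpha * z1 + beta * z2) / norm  # interpolated value, unit variance by construction
    return (zt**2).sum(axis=1)


lam = simulate_lambda(t=0.3, t1=0.0, t2=1.0, n_families=3, n_sim=200_000)
print(lam.mean(), lam.var())  # should be close to I = 3 and 2I = 6
```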
1.3 Study under \(H_{\lambda t^{\star }}\)
In this part, we set
where \(\varepsilon \) is a standard normal random variable. Recall that \(t^{\star }\) denotes the QTL location.
According to formula (9), we have
Recall that under \(H_{\lambda t^{\star }}\), if there is a QTL within family i (i.e. \(\lambda _{i}\ne 0\)), the density of \(Y\big |X(t_1),X(t_2),C\) verifies
Since the model with \(t^*\) fixed is differentiable in quadratic mean, the alternative defines a contiguous sequence of alternatives. By Le Cam’s first lemma, relation (12) remains true under the alternative. As a result, \(\Lambda _{n}(.)\mathop {\rightarrow }\limits ^{F.d.}\sum _{i=1}^{I}\left\{ Z^{i}(.)\right\} ^{2}\).
The mean function of \(Z^{i}(.)\), denoted \(m_{t^{\star }}^{i}(t)\), can be computed using the process \(S_{n}(.,i)\). According to formulae (16) and (11), we have
where \(S_{n}^{0}(.,i)\) is the process obtained under \(H_{0}\).
Recall that \(h_{j}(.)\) is the equivalent of the process h(.) for individual j. According to the law of large numbers:
Besides, we have \(\mathbb {E}\left\{ X(t^{\star })h(t_{1})\right\} =\rho (t_1,t^{\star })\) and \(\mathbb {E}\left\{ X(t^{\star })h(t_{2})\right\} =\rho (t^{\star },t_2)\).
As a result,
Due to the interpolation, we have
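To visualise the shape of the mean function, the sketch below evaluates \(\mathbb {E}\left\{ X(t^{\star })h(t)\right\} \) for t between the flanking markers, extending the two expectations computed above through \(h(t)=(\alpha (t)X(t_1)+\beta (t)X(t_2))/\sqrt{\mathbb {E}\left\{ x^{2}(t)\right\} }\). It assumes Haldane's model, \(t^{\star }\in [t_1,t_2]\), and that \(m_{t^{\star }}^{i}(t)\) is proportional to this quantity; the family-specific scaling (which involves \(\lambda _i\)) is not reproduced. Because \(X(t^{\star })-x(t^{\star })\) is orthogonal to h(t), the Cauchy-Schwarz inequality shows that the curve peaks at \(t=t^{\star }\), where it equals \(\sqrt{\mathbb {E}\left\{ x^{2}(t^{\star })\right\} }\). Function names are hypothetical.

```python
import math


def rho(s, t):
    """Assumed Haldane correlation between genotypes at loci s and t (Morgans)."""
    return math.exp(-2.0 * abs(t - s))


def coeffs(t, t1, t2):
    """alpha(t), beta(t) and sqrt(E{x(t)^2}) for t in [t1, t2] (closed-form projection)."""
    r12, r1, r2 = rho(t1, t2), rho(t1, t), rho(t, t2)
    alpha = (r1 - r2 * r12) / (1.0 - r12**2)
    beta = (r2 - r1 * r12) / (1.0 - r12**2)
    norm = math.sqrt(alpha**2 + beta**2 + 2.0 * alpha * beta * r12)
    return alpha, beta, norm


def mean_shape(t, t_star, t1, t2):
    """E{X(t*) h(t)}: equals rho(t1, t*) at t = t1 and rho(t*, t2) at t = t2."""
    alpha, beta, norm = coeffs(t, t1, t2)
    return (alpha * rho(t1, t_star) + beta * rho(t_star, t2)) / norm


t1, t2, t_star = 0.0, 1.0, 0.3
grid = [k / 100 for k in range(101)]
values = [mean_shape(t, t_star, t1, t2) for t in grid]
peak = grid[values.index(max(values))]  # grid point where the curve is largest
```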
1.4 Study of the supremum of the LRT process
Since the model with t fixed is regular, we have the relationship (cf. section “Study under \(H_0\)”)
under the null hypothesis. Our goal is now to prove that the remainder term above is \(o_{P_{\theta _{0}}}(1)\) uniformly in t.
Let us now consider t as an extra parameter. Let \((t^*,\theta ^*)\) be the true parameter, which is assumed to belong to \(H_0\). Note that \(t^*\) has no meaning when \(\theta \) belongs to \(H_{0}\). It is easy to check that, under \(H_0\), the Fisher information relative to t is zero, so that the model is not regular.
It can be proved that Assumptions 1, 2 and 3 of Azaïs et al. (2009) hold. So we can apply Theorem 1 of Azaïs et al. (2009), and we have
where the observation \(X_j\) stands for \((Y_j,X_j(t_1),X_j(t_2),C_j)\) and where \(\mathcal {D}\) is the set of scores defined in Azaïs et al. (2009); see also Gassiat (2002). A similar result holds under \(H_0\) with a set \(\mathcal {D}_0\). Let us specify the sets of scores \(\mathcal {D}\) and \(\mathcal {D}_0\). These sets are defined as the sets of scores of one-parameter families that converge to the true model \(p_{t^*,\theta ^*} \) and that are differentiable in quadratic mean.
It is easy to see that
where \(l'\) is the gradient with respect to \(\theta \). In the same manner
where now the gradient is taken with respect to \(\mu _1\), ..., \(\mu _{I}\) and \(\sigma \) only. Obviously, this gradient does not depend on t.
Using the transform \( U \rightarrow -U \) in the expressions of the sets of scores, we see that the indicator function can be removed in formula (15). Then, since the Fisher information matrix is diagonal (see formula (5)), it is easy to see that
This is exactly the desired result. Since the model with \(t^*\) fixed is differentiable in quadratic mean, the alternative defines a contiguous sequence of alternatives. By Le Cam’s first lemma, relation (15) remains true under the alternative.
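Theorem 1 reduces critical values to quantiles of \(\sup _t\sum _{i=1}^{I}\left\{ Z^{i}(t)\right\} ^{2}\). The paper proposes dedicated methods for this; the sketch below is only a plain Monte Carlo alternative under stated assumptions: Haldane's model (so the values at consecutive markers follow a Gaussian AR(1) recursion with coefficient \(e^{-2d}\)), the same marker map for every family, and interpolation between markers as in Lemma 1. The function `critical_value` and its parameters are hypothetical. The supremum is taken over a finite grid, so the result can only underestimate the quantile of the true supremum.

```python
import numpy as np


def critical_value(markers, n_families, level=0.05, n_sim=20_000, step=0.01, seed=0):
    """Monte Carlo (1 - level)-quantile of sup_t sum_i Z^i(t)^2 over a marker map in Morgans."""
    rng = np.random.default_rng(seed)
    m = np.asarray(markers, dtype=float)
    r = np.exp(-2.0 * np.diff(m))  # assumed Haldane correlation between consecutive markers
    shape = (n_sim, n_families)
    z = np.empty(shape + (len(m),))
    z[..., 0] = rng.standard_normal(shape)
    for k in range(1, len(m)):
        # Gaussian AR(1) step: unit variance, correlation r[k-1] with the previous marker
        z[..., k] = r[k - 1] * z[..., k - 1] + np.sqrt(1.0 - r[k - 1] ** 2) * rng.standard_normal(shape)
    sup = (z**2).sum(axis=1).max(axis=1)  # supremum over the markers themselves
    for k in range(len(m) - 1):
        t = np.arange(m[k] + step, m[k + 1], step)  # grid points strictly inside the interval
        if t.size == 0:
            continue
        r1, r2, r12 = np.exp(-2.0 * (t - m[k])), np.exp(-2.0 * (m[k + 1] - t)), r[k]
        alpha = (r1 - r2 * r12) / (1.0 - r12**2)
        beta = (r2 - r1 * r12) / (1.0 - r12**2)
        norm = np.sqrt(alpha**2 + beta**2 + 2.0 * alpha * beta * r12)
        zt = (alpha * z[..., k, None] + beta * z[..., k + 1, None]) / norm  # (n_sim, I, len(t))
        sup = np.maximum(sup, (zt**2).sum(axis=1).max(axis=1))
    return np.quantile(sup, 1.0 - level)


thr = critical_value([0.0, 0.25, 0.5, 0.75, 1.0], n_families=2)
```

Any such threshold exceeds the pointwise \(\chi ^2_I\) quantile, since the supremum dominates the value at every single locus.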
Appendix 2: Proof of Theorem 2
The proof is the same as that of Theorem 1, provided that we can restrict our attention to the interval \((t^{\ell },t^{r})\) when considering a single instant t, and to the intervals \((t^{\ell },t^{r})\) and \((t'^{\ell },t'^{r})\) when considering two instants t and \(t'\). For that, we need to prove that
which is a direct consequence of the independence of the increments of the Poisson process.
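The Markov property behind this argument can be checked numerically. Assuming Haldane's model, where crossovers follow a Poisson process so that the \(\pm 1\) genotype process is a Markov chain along the chromosome, the sketch below enumerates all genotype configurations at five loci and verifies that also conditioning on markers outside \((t^{\ell },t^{r})\) leaves \(\mathbb {P}\left\{ X(t)=1\mid \cdot \right\} \) unchanged. The locus positions and helper names are hypothetical.

```python
import math
from itertools import product


def recomb(d):
    """Haldane recombination fraction for a distance d in Morgans (assumption)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


# s < t_left < t < t_right < u (Morgans); index 2 is the locus t of interest
LOCI = [0.1, 0.4, 0.55, 0.8, 0.95]
T_IDX, LEFT, RIGHT = 2, 1, 3


def path_prob(genotypes):
    """Joint probability of a +/-1 configuration: uniform start, independent intervals."""
    p = 0.5
    for k in range(1, len(LOCI)):
        r = recomb(LOCI[k] - LOCI[k - 1])
        p *= r if genotypes[k] != genotypes[k - 1] else 1.0 - r
    return p


def prob_t_is_one(observed):
    """P{X(t) = 1 | X at the observed loci}, with observed mapping locus index -> +/-1."""
    num = den = 0.0
    for g in product((1, -1), repeat=len(LOCI)):
        if all(g[k] == v for k, v in observed.items()):
            p = path_prob(g)
            den += p
            num += p if g[T_IDX] == 1 else 0.0
    return num / den


max_gap = 0.0
for a, b, c, e in product((1, -1), repeat=4):
    flanking_only = prob_t_is_one({LEFT: a, RIGHT: b})
    all_markers = prob_t_is_one({0: c, LEFT: a, RIGHT: b, 4: e})
    max_gap = max(max_gap, abs(flanking_only - all_markers))
```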
Appendix 3: Proof of results introduced in Sect. 8
Recall that \(\mathbb {T}_{K^{i}}^{i}=\{t_{1}^{i},\ldots ,t_{K^{i}}^{i}\}\). Let \(t\in [t^{i}_{1},t^{i}_{K^{i}}]\backslash \mathbb {T}_{K^{i}}^{i}\), and let \(t^{\ell ,i}\) and \(t^{r,i}\) be the informative markers of family i flanking t. We define \(x^{i}(t) = \mathbb {E}\left\{ X(t)\mid X(t^{\ell ,i}),X(t^{r,i}),C=i\right\} \). Besides, for \(a,b\in \{-1,1\}\), the quantities \(Q^{a,b}_{t,i}\) are defined by \(Q^{a,b}_{t,i}=\mathbb {P}\left\{ X(t)=1\mid X(t^{\ell ,i})=a,X(t^{r,i})=b,C=i\right\} \).
Lemma 2
We have the following relationship:
\(x^{i}(t)=\alpha _{i}(t)X(t^{\ell ,i})+\beta _{i}(t)X(t^{r,i}),\)
with \( \alpha _{i}(t)= Q^{1,1}_{t,i} - Q^{-1,1}_{t,i}\) and \( \beta _{i}(t)= Q^{1,1}_{t,i} - Q^{1,-1}_{t,i}\).
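Applying Lemma 2 requires locating, for each family, the informative markers that flank t. A minimal sketch follows, assuming the informative positions of family i are stored as a sorted list; `flanking_markers` is a hypothetical helper. A missing left (right) neighbour signals the case \(t<t_{1}^{i}\) (\(t>t^{i}_{K^{i}}\)) treated separately, and a position that is itself an informative marker is returned as its own neighbour.

```python
from bisect import bisect_left


def flanking_markers(t, informative):
    """Return (t_left, t_right) among the sorted informative positions of one family.

    None marks a missing side: t before the first or after the last informative marker.
    """
    k = bisect_left(informative, t)
    if k < len(informative) and informative[k] == t:
        return t, t  # t is itself an informative marker
    left = informative[k - 1] if k > 0 else None
    right = informative[k] if k < len(informative) else None
    return left, right


markers_family_i = [0.12, 0.35, 0.70]  # hypothetical informative marker map (Morgans)
print(flanking_markers(0.50, markers_family_i))  # (0.35, 0.7)
```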
Let \(S_{n}(.,i)\) be the following process, for n observations:
According to Lemma 2
We will call \(Z^{i}(.)\) the limiting process of \(S_{n}(.,i)\).
Let us now consider the case where the first informative marker does not lie at the beginning of the chromosome (\(0<t_{1}^{i}\)). For \(t\in [0,t_{1}^{i})\), we have
where \(\tilde{x}^{i}(t) = 2 \mathbb {P}\left\{ X(t)=1\mid X(t^{i}_{1}),C=i\right\} - 1\). Recall that in the classical situation, when t has two flanking markers, \(x^{i}(t) = 2 \mathbb {P}\left\{ X(t)=1\mid X(t^{\ell ,i}),X(t^{r,i}),C=i\right\} - 1\). In our case,
Besides, we have
As a result,
By symmetry, when \(t^{i}_{K^{i}}<T\), we have
To conclude, it suffices to use the same kind of arguments as in formula (9) to prove that the LRT process converges asymptotically to the process \(\sum _{i=1}^{I}\left\{ Z^{i}(.)\right\} ^{2}\).
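Under the additional assumption of Haldane's model, \(\rho (s,t)=e^{-2|t-s|}\) (an assumption here, consistent with the Poisson-process argument of Appendix 2), the quantity \(\tilde{x}^{i}(t)\) has a simple closed form. The derivation sketch below also assumes that, as for h in the flanked case, the process is normalised by \(\sqrt{\mathbb {E}\{(\tilde{x}^{i}(t))^2\}}\):

```latex
\begin{aligned}
\mathbb{P}\{X(t)=1 \mid X(t_1^i)=\pm 1, C=i\}
  &= \tfrac12\bigl(1 \pm e^{-2(t_1^i - t)}\bigr), \\
\tilde{x}^i(t) &= e^{-2(t_1^i - t)}\, X(t_1^i) = \rho(t, t_1^i)\, X(t_1^i), \\
\mathbb{E}\{(\tilde{x}^i(t))^2\} &= \rho(t, t_1^i)^2, \\
\frac{\tilde{x}^i(t)}{\sqrt{\mathbb{E}\{(\tilde{x}^i(t))^2\}}} &= X(t_1^i),
  \qquad t \in [0, t_1^i).
\end{aligned}
```

Under these assumptions, \(Z^{i}\) is therefore constant on \([0,t_{1}^{i})\), equal to \(Z^{i}(t_{1}^{i})\), and by the same argument it equals \(Z^{i}(t^{i}_{K^{i}})\) on \((t^{i}_{K^{i}},T]\).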
Cite this article
Rabier, CE., Azaïs, JM., Elsen, JM. et al. Chi-square processes for gene mapping in a population with family structure. Stat Papers 60, 239–271 (2019). https://doi.org/10.1007/s00362-016-0835-y
DOI: https://doi.org/10.1007/s00362-016-0835-y
Keywords
- Chi-square process
- Gaussian process
- Likelihood ratio test
- Mixture models
- QTL detection
- MCQMC
Mathematics Subject Classification
- 62M86
- 65C05
- 62P10