Detecting disease gene in DNA haplotype sequences by nonparametric dissimilarity test

Yuan, Ao; Yue, Qingqi; Apprey, Victor; Bonney, George

doi:10.1007/s00439-006-0216-z

Original Investigation
Published: 29 June 2006

Volume 120, pages 253–261, (2006)
Human Genetics

Abstract

Association studies for complex diseases based on haplotype data have received increasing attention in the last few years. A commonly used nonparametric method, which takes haplotype structure into consideration, is to use the U-statistic to compare the similarities between genetic compositions in the case and control populations. Although the method and its variants are convenient to use in practice, there are some areas where the tests cannot detect even large differences between cases and controls. To overcome this problem and enhance the power, we propose a new form of the weighted U-statistic, which directly compares the dissimilarity between the haplotype structures in the case and control populations. We show that this test statistic is asymptotically a linear combination of the absolute values of normal random variables under the null hypothesis, and shifts strictly toward the right under the alternative, and therefore has no blind areas of detection. Simulation studies indicate that our test statistic overcomes the weakness of the existing ones and is robust and powerful as well.

References

Bourgain C, Génin E, Holopainen P, Mustalahti K, Mä M, Partanen J (2000) Use of closely related affected individuals for the genetic study of complex diseases in founder populations. Am J Hum Genet 68:154–159
Cheung VG, Nelson SF (1998) Genomic mismatch scanning identifies human genomic DNA shared identical by descent. Genomics 47:1–7
Devlin B, Roeder K, Wasserman L (2000) Genomic control for association studies: a semiparametric test to detect excess-haplotype sharing. Biostatistics 1:369–387
Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman and Hall, New York
Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
Grant GR, Manduchi E, Cheung VG, Ewens WJ (1999) Significant test for direct identity-by-descent mapping. Ann Hum Genet 63:441–454
Jorde LB (2000) Linkage disequilibrium and the search for complex disease genes. Genome Res 10:1435–1444
Kimura M (1980) A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120
Lee M-LT, Dehling HG (2005) Generalized two-sample U-statistics for clustered data. Stat Neerl 59:313–323
McGuire G, Prentice M, Wright F (1999) Improved error bounds for genetic distances from DNA sequences. Biometrics 55:1064–1070
Schaid DJ, McDonnell SK, Hebbring SJ, Cunningham JM (2005) Nonparametric tests of association of mutation genes with human disease. Am J Hum Genet 76:780–793
Tajima F, Nei M (1982) Biases of the estimates of DNA divergence obtained by the restriction enzyme technique. J Mol Evol 18:115–120
Tzeng JY, Devlin B, Wasserman L, Roeder K (2003a) On the identification of disease mutations by the analysis of haplotype similarity and goodness of fit. Am J Hum Genet 72:891–902
Tzeng JY, Byerley W, Devlin B, Roeder K, Wasserman L (2003b) Outlier detection and false discovery rates for whole-genome DNA matching. J Am Stat Assoc 98:236–246
Van der Meulen MA, Te Meerman GJ (1997) Haplotype sharing analysis in affected individuals from nuclear families with at least one affected offspring. Genet Epidemiol 14:915–920
Vardi Y, Ying Z, Zhang CH (2001) Two-sample tests for growth curves under dependent right censoring. Biometrika 88:949–960
Weeks DE, Lange K (1988) The affected-pedigree-member method of linkage analysis. Am J Med Genet 42:315–326
Wilcoxon F (1945) Individual comparisons by ranking methods. Biometrics 1:80–83

Acknowledgments

The research has been supported in part by the National Center for Research Resources at NIH grant 2G12RR003048. The authors are grateful to the two reviewers and the editor for their helpful suggestions. We thank Mrs. Ashelyn Mosby for her careful reading of the manuscript.

Author information

Authors and Affiliations

National Human Genome Center, Department of Community Health and Family Medicine, Howard University, Washington, DC, USA
Ao Yuan, Qingqi Yue, Victor Apprey & George Bonney

Corresponding author

Correspondence to Ao Yuan.

Appendix

Proof of the proposition

(i) Note that under $H_0$, $U_{m,n} = \mu_{\hat{p}, \hat{q}} = \hat{p}^{\prime} D\hat{q}$ and $D = D_{p,p}$, so we have

$$ \hat{U}_{m,n} - \mu_{\check{p},\check{p}} = \left( \hat{U}_{m,n} - U_{m,n} \right) + (U_{m,n} - \mu_{p,p}) + \left( \mu_{p,p} - \mu_{\check{p},\check{p}} \right). \tag{6} $$

Let $D_{p,q}^{(1,0)}(\cdot,\cdot) = \partial D_{p,q}(\cdot,\cdot)/\partial p$ and $D_{p,q}^{(0,1)}(\cdot,\cdot) = \partial D_{p,q}(\cdot,\cdot)/\partial q$ be the column vectors of first partial derivatives, and let $D_{p,q}^{(1,0)}$ and $D_{p,q}^{(0,1)}$ be the corresponding matrices of column arrays. The first term in (6) is

$$\begin{aligned} \hat{U}_{m,n} - U_{m,n} &= \hat{p}^{\prime} \left(D_{\hat{p},\hat{q}} -D_{p,q} \right) \hat{q} = \hat{p}^{\prime} \left[D^{(1,0)}_{p,q} \left(\hat{p} - p \right) + D^{(0,1)}_{p,q} \left( \hat{q} - p \right) + O \left( \left\|\hat{p} - p \right\|^2 + \left\| \hat{q} - q \right\|^2 \right) \right] \hat{q}\\ &= \hat{p}^{\prime} \left[D^{(1,0)}_{p,q} \left( \hat{p} - p \right) + D^{(0,1)}_{p,q} \left(\hat{q} - p \right) \right] \hat{q} + O_P(1/N). \end{aligned}$$

Note that the $(i, j)$th component of $D_{p,q}^{(1,0)}$ is the vector with $l$th entry

$$ {\frac{\partial}{\partial p_l}} D_{p,q} \left( {\bf h}_i, {\bf h}_j \right) = \left\{\begin{array}{*{20}l} {\frac{{\rm e}^{p_i-q_i} - {\rm e}^{q_i-p_i}}{2}}\;{\frac{{\rm e}^{p_j-q_j} + {\rm e}^{q_j-p_j}}{2}} D \left({\bf h}_i, {\bf h}_j \right) &\hbox{if}\;l=i\\ 0 & \hbox{else}\end{array}\right. \quad(l=1, \ldots, k). $$

Thus $D_{p,q}^{(1,0)} = 0$ under $H_0$; similarly $D_{p,q}^{(0,1)} = 0$ under $H_0$, and so under $H_0$,

$$ \hat{U}_{m,n} - U_{m,n} = O_P(1/N). $$
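The vanishing of the derivative under $H_0$ can be checked numerically. The following is a minimal sketch that only transcribes the displayed entry, using the identities $(e^x - e^{-x})/2 = \sinh x$ and $(e^x + e^{-x})/2 = \cosh x$; the function name `d_kernel_dp` and the matrix `D` of base dissimilarities $D(\mathbf{h}_i, \mathbf{h}_j)$ are hypothetical illustration names, not part of the original text.

```python
import math

def d_kernel_dp(p, q, D, i, j, l):
    """l-th entry of the partial derivative in p of D_{p,q}(h_i, h_j),
    transcribed from the displayed formula: it is nonzero only when l == i."""
    if l != i:
        return 0.0
    # (e^x - e^-x)/2 = sinh(x) and (e^x + e^-x)/2 = cosh(x)
    return math.sinh(p[i] - q[i]) * math.cosh(p[j] - q[j]) * D[i][j]

# Hypothetical toy inputs: three haplotypes with a symmetric dissimilarity matrix.
D = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# Under H0 (q equal to p) every entry vanishes because sinh(0) = 0.
null_entries = [d_kernel_dp(p, p, D, i, j, l)
                for i in range(3) for j in range(3) for l in range(3)]
# Away from H0 some entries are nonzero.
alt_entry = d_kernel_dp(p, q, D, 0, 1, 0)
```

Since every entry carries the factor $\sinh(p_i - q_i)$, the first-order term drops out exactly when $p = q$, which is what leaves only the $O_P(1/N)$ remainder.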

Note that $\partial(p^{\prime}Dq)/\partial p = q^{\prime}D$ and $\partial(p^{\prime}Dq)/\partial q = p^{\prime}D$, so under $H_0$ the second term in (6) is

$$ \hat{p}^{\prime} D\hat{q} - p^{\prime} Dp = p^{\prime} D \left( \hat{p} - p \right) + p^{\prime} D\left( \hat{q} - p \right) + O \left( \left\| \hat{p} - p \right\|^2 + \left\| \hat{q} - p \right\|^2 \right). $$

Also, $\partial\mu_{p,p}/\partial p = \partial(p^{\prime}Dp)/\partial p = 2p^{\prime}D$, so the third term in (6) under $H_0$ is

$$ -2p^{\prime} D\left( \check{p} - p \right) + O \left( \left\| \check{p} - p \right\|^2 \right) = -2p^{\prime} D\left( \check{p} - p \right) + O_P(1/N). $$

Collecting the above relationships, we have

$$ \sqrt{N} \left( \hat{U}_{m,n} - \mu_{\check{p},\check{p}} \right) = \sqrt{N}p^{\prime} D\left[ \left(\hat{p} - p \right) + \left( \hat{q} - p \right) - 2 \left(\check{p} - p \right) \right] + O_P \left(1/\sqrt{N} \right). $$

Note that under $H_0$,

$$ \sqrt{N} \left[ \left( \hat{p} - p \right) - \left( \hat{q} - p \right) \right] \mathop{\rightarrow}\limits^{D} N \left(0, \left(\gamma_1+\gamma_2 \right)R \right), $$

and for each i, the i-th entry of $\left( \hat{p} - p \right) + \left( \hat{q} - p \right) - 2\left(\check{p}-p \right)$ is

$$\left\{\begin{array}{*{20}l} \left( \hat{p}_i - p_i \right) - \left(\hat{q}_i - p_i \right) & \hbox{if}\;\hat{p}_i - p_i \geq \hat{q}_i - p_i \\ \left(\hat{q}_i - p_i \right) - \left(\hat{p}_i - p_i \right) & \hbox{else}. \end{array} \right.$$

Thus, writing $|a| = (|a_1|, \ldots, |a_k|)^{\prime}$ for a vector $a = (a_1, \ldots, a_k)^{\prime}$, we have

$$ \sqrt{N} \left(\hat{U}_{m,n} - \mu_{\check{p},\check{p}} \right) = p^{\prime} D \left| \sqrt{N} \left[ \left( \hat{p} - p \right) - \left( \hat{q} - p \right) \right] \right| + O_P \left(1/\sqrt{N} \right) \mathop{\rightarrow}\limits^{D} \left(\gamma_1 + \gamma_2 \right)^{1/2} p^{\prime} D|W|, $$

where W ∼ N(0,R).
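Because the null limit $(\gamma_1 + \gamma_2)^{1/2} p^{\prime} D|W|$ is not a standard distribution, its quantiles are most easily approximated by simulation. The sketch below is one way to do that, under an assumption not stated in this appendix: that $R$ is the multinomial covariance $\mathrm{diag}(p) - pp^{\prime}$. The names `sample_null_limit` and `upper_quantile` are hypothetical, and in practice $p$ would be replaced by an estimate.

```python
import math
import random

def sample_null_limit(p, D, gamma1, gamma2, n_draws=10000, seed=0):
    """Monte Carlo draws of sqrt(gamma1 + gamma2) * p'D|W| with W ~ N(0, R).

    Assumption (not from the source): R = diag(p) - p p'.
    """
    rng = random.Random(seed)
    k = len(p)
    # Row vector p'D: weight attached to |W_j|.
    w = [sum(p[i] * D[i][j] for i in range(k)) for j in range(k)]
    scale = math.sqrt(gamma1 + gamma2)
    draws = []
    for _ in range(n_draws):
        # X_i = sqrt(p_i) Z_i has covariance diag(p); subtracting p * sum(X)
        # gives covariance diag(p) - p p' when the entries of p sum to one.
        x = [math.sqrt(p[i]) * rng.gauss(0.0, 1.0) for i in range(k)]
        s = sum(x)
        W = [x[i] - p[i] * s for i in range(k)]
        draws.append(scale * sum(w[j] * abs(W[j]) for j in range(k)))
    return draws

def upper_quantile(draws, alpha=0.05):
    """Empirical (1 - alpha) quantile, usable as an approximate critical value."""
    ordered = sorted(draws)
    idx = min(len(ordered) - 1, max(0, math.ceil((1 - alpha) * len(ordered)) - 1))
    return ordered[idx]

# Hypothetical toy inputs.
p = [0.5, 0.3, 0.2]
D = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
draws = sample_null_limit(p, D, gamma1=2.0, gamma2=2.0, n_draws=2000, seed=1)
critical_value = upper_quantile(draws, alpha=0.05)
```

With nonnegative $p$ and $D$ every draw is nonnegative, so the test rejects only for large values of the statistic, in line with the one-sided shift described in the abstract.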

(ii) In this case

$$\begin{aligned} \hat{U}_{m,n} - \check{p}^{\prime} D\check{p} &= \left( \hat{p}^{\prime} D_{\hat{p},\hat{q}} \hat{q} - \hat{p}^{\prime} D_{p,q} \hat{q} \right) + \left( \hat{p}^{\prime} D_{p,q} \hat{q} - p^{\prime} D_{p,q}q \right) + \left( p^{\prime} D_{p,q}q - \check{p}^{\prime} D\check{p} \right)\\ &= \hat{p}^{\prime} \left[ D_{p,q}^{(1,0)} \left( \hat{p} - p \right) + D_{p,q}^{(0,1)} \left( \hat{q} - q \right) \right] \hat{q} + q^{\prime} D_{p,q} \left(\hat{p} - p \right)\\ &\quad + p^{\prime} D_{p,q} \left(\hat{q} - q \right) + \left( p^{\prime} D_{p,q}q - \tilde{p}^{\prime} D\tilde{p} \right) + O_P(1/N). \end{aligned}$$

Since $\hat{p} \rightarrow p$ (a.s.) and $\hat{q} \rightarrow q$ (a.s.), by Slutsky’s theorem, $\sqrt{N} \hat{p}^{\prime} \left[ D_{p,q}^{(1,0)} \left( \hat{p} - p \right)+ D_{p,q}^{(0,1)} \left( \hat{q} - q \right) \right] \hat{q}\;\hbox{and}\;\sqrt{N} p^{\prime} \left[ D_{p,q}^{(1,0)} \left( \hat{p} - p \right) + D_{p,q}^{(0,1)} \left( \hat{q} - q \right) \right] q$ have the same asymptotic distribution. With the components of $D^{(1,0)}_{p,q}$ given in (i), it is easy to check that

$$ p^{\prime} D_{p,q}^{(1,0)} \left( \hat{p} - p \right) q = a^{\prime} \left( \hat{p} - p \right), \quad p^{\prime} D_{p,q}^{(0,1)} \left( \hat{q} - q \right)q = b^{\prime} \left( \hat{q} - q \right) $$

where $a = (a_1, \ldots, a_k)^{\prime}$ and $b = (b_1, \ldots, b_k)^{\prime}$ with

$$\begin{aligned} a_i &= p_i {\frac{{\rm e}^{p_i-q_i} - {\rm e}^{q_i-p_i}}{2}} \sum\limits_{j=1}^k {\frac{{\rm e}^{p_j-q_j} + {\rm e}^{q_j-p_j}}{2}} q_jD \left( {\bf h}_i, {\bf h}_j \right)\quad (i=1, \ldots, k)\\ b_i &= q_i {\frac{{\rm e}^{q_i-p_i} - {\rm e}^{p_i-q_i}}{2}} \sum\limits_{j=1}^k {\frac{{\rm e}^{q_j-p_j} + {\rm e}^{p_j-q_j}}{2}} p_jD \left({\bf h}_i, {\bf h}_j \right) \quad (i=1,\ldots,k). \end{aligned}$$

This gives

$$\begin{aligned} &\sqrt{N} \left( \hat{U}_{m,n} - \check{p}^{\prime} D\check{p} \right) - \sqrt{N} \left( p^{\prime} D_{p,q}q - \tilde{p}^{\prime} D\tilde{p} \right)\\ &\quad = \sqrt{N} \left[ \left( a^{\prime} + q^{\prime} D_{p,q} \right) \left( \hat{p} - p \right) + \left( b^{\prime} + p^{\prime} D_{p,q} \right) \left( \hat{q} - q \right) \right] + O_P \left(1/\sqrt{N} \right) \mathop{\rightarrow}\limits^{D} N(0, \sigma^2) \end{aligned}$$

with

$$ \sigma^2 = \gamma_1 \left( a^{\prime} + q^{\prime} D_{p,q} \right) R \left(a + D_{p,q}q \right) + \gamma_2 \left( b^{\prime} + p^{\prime} D_{p,q} \right) Q \left( b + D_{p,q}p \right). $$
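The variance formula can be evaluated directly once $a$, $b$ and the kernel matrices are in hand. The sketch below does this under two labeled assumptions that are not stated in this appendix: $R = \mathrm{diag}(p) - pp^{\prime}$ and $Q = \mathrm{diag}(q) - qq^{\prime}$ (both can be overridden), and $D_{p,q}$ is supplied by the caller as a symmetric matrix `Dpq`. All function names are hypothetical.

```python
import math

def multinomial_cov(p):
    """diag(p) - p p'; an assumed form for R and Q, not given in the source."""
    k = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]

def quad_form(u, M):
    """u' M u for a vector u and a square matrix M."""
    k = len(u)
    return sum(u[i] * M[i][j] * u[j] for i in range(k) for j in range(k))

def sigma2_alt(p, q, D, Dpq, gamma1, gamma2, R=None, Q=None):
    """sigma^2 = gamma1 (a + Dpq q)' R (a + Dpq q) + gamma2 (b + Dpq p)' Q (b + Dpq p).

    D holds the base dissimilarities D(h_i, h_j); Dpq is the weighted kernel
    matrix D_{p,q}, assumed symmetric so that q'Dpq equals (Dpq q)'.
    """
    k = len(p)
    R = multinomial_cov(p) if R is None else R
    Q = multinomial_cov(q) if Q is None else Q
    # a_i and b_i as displayed, written with sinh and cosh.
    a = [p[i] * math.sinh(p[i] - q[i])
         * sum(math.cosh(p[j] - q[j]) * q[j] * D[i][j] for j in range(k))
         for i in range(k)]
    b = [q[i] * math.sinh(q[i] - p[i])
         * sum(math.cosh(q[j] - p[j]) * p[j] * D[i][j] for j in range(k))
         for i in range(k)]
    u = [a[i] + sum(Dpq[i][j] * q[j] for j in range(k)) for i in range(k)]
    v = [b[i] + sum(Dpq[i][j] * p[j] for j in range(k)) for i in range(k)]
    return gamma1 * quad_form(u, R) + gamma2 * quad_form(v, Q)

# Hypothetical toy check: with p equal to q, a and b vanish.
D = [[0.0, 1.0], [1.0, 2.0]]
s_equal = sigma2_alt([0.5, 0.5], [0.5, 0.5], D, D, gamma1=2.0, gamma2=2.0)
s_diff = sigma2_alt([0.6, 0.4], [0.3, 0.7], D, D, gamma1=2.0, gamma2=2.0)
```

When $p = q$ the vectors $a$ and $b$ are zero, so the expression reduces to $(\gamma_1 + \gamma_2)\, p^{\prime} D R D p$; the toy check uses this as a sanity test of the implementation.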

Similarly, we obtain asymptotic versions of Tzeng et al.’s results. Let $\sigma_T^2$ and $\tilde{\sigma}^2_T$ be their asymptotic variances under $H_0$ and $H_1$, respectively. Assume $0 < \lim N/m = \gamma_1 < \infty$, $0 < \lim N/n = \gamma_2 < \infty$, and non-degeneracy of the kernel. Then

$$ \sqrt{N} \left( \hat{p}^{\prime} A \hat{p} - \hat{q}^{\prime} A\hat{q} \right)/\sigma_T \mathop{\rightarrow}\limits^{D} N(0,1), $$

where $\sigma_T^2 = 4(\gamma_1 + \gamma_2) p^{\prime}ARAp$. $\sigma_T^2$ is consistently estimated by $\hat{\sigma}^2_T$, which is $\sigma_T^2$ with $p$ replaced by $\hat{p}$, the estimate of $p$ from the pooled data. Under $H_1$,

$$ \sqrt{N} \tilde{\sigma}^{-1}_T \left( \hat{p}^{\prime} A\hat{p} - \hat{q}^{\prime} A\hat{q} - p^{\prime} Ap + q^{\prime} Aq \right) \mathop{\rightarrow}\limits^{D} N(0,1), $$

where $\tilde{\sigma}^2_T =4 \left( \gamma_1 p^{\prime}ARAp + \gamma_2 q^{\prime}AQAq \right),\;\hbox{and}\;\tilde{\sigma}^{2}_T$ is consistently estimated by its empirical version.
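The standardized similarity statistic under $H_0$ can be sketched as follows, starting from haplotype counts in cases and controls. This is an illustration under stated assumptions rather than the authors' implementation: it takes $N = m + n$, $\gamma_1 = N/m$, $\gamma_2 = N/n$, $R = \mathrm{diag}(p) - pp^{\prime}$ evaluated at the pooled frequencies, and a symmetric similarity matrix $A$. The function name `tzeng_type_z` is hypothetical.

```python
import math

def tzeng_type_z(case_counts, control_counts, A):
    """sqrt(N) (p_hat' A p_hat - q_hat' A q_hat) / sigma_T_hat, with
    sigma_T_hat^2 = 4 (gamma1 + gamma2) p' A R A p at the pooled estimate of p.

    Assumptions (not from the source): N = m + n, gamma1 = N/m, gamma2 = N/n,
    R = diag(p) - p p', and A symmetric.
    """
    m, n = sum(case_counts), sum(control_counts)
    N = m + n
    k = len(A)
    p_hat = [c / m for c in case_counts]
    q_hat = [c / n for c in control_counts]
    pooled = [(case_counts[i] + control_counts[i]) / N for i in range(k)]

    def quad(v):
        return sum(v[i] * A[i][j] * v[j] for i in range(k) for j in range(k))

    # With R = diag(p) - p p', the form (Ap)' R (Ap) is the variance of (Ap)_i
    # when index i is drawn with probabilities p.
    Ap = [sum(A[i][j] * pooled[j] for j in range(k)) for i in range(k)]
    mean_Ap = sum(pooled[i] * Ap[i] for i in range(k))
    var_Ap = sum(pooled[i] * Ap[i] ** 2 for i in range(k)) - mean_Ap ** 2
    sigma2 = 4.0 * (N / m + N / n) * var_Ap
    if sigma2 <= 0.0:
        raise ValueError("degenerate kernel: estimated variance is not positive")
    return math.sqrt(N) * (quad(p_hat) - quad(q_hat)) / math.sqrt(sigma2)

# Hypothetical counts for three haplotypes, with identity similarity.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
z = tzeng_type_z([30, 10, 10], [10, 20, 20], A)
z_swapped = tzeng_type_z([10, 20, 20], [30, 10, 10], A)
z_same = tzeng_type_z([30, 10, 10], [30, 10, 10], A)
```

The statistic is a difference of two quadratic forms, so it is zero whenever $\hat{p}^{\prime} A \hat{p} = \hat{q}^{\prime} A \hat{q}$, even if $\hat{p} \neq \hat{q}$; this is the kind of blind area that the abstract says the dissimilarity statistic is designed to avoid.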

About this article

Cite this article

Yuan, A., Yue, Q., Apprey, V. et al. Detecting disease gene in DNA haplotype sequences by nonparametric dissimilarity test. Hum Genet 120, 253–261 (2006). https://doi.org/10.1007/s00439-006-0216-z

Received: 19 January 2006
Accepted: 26 May 2006
Published: 29 June 2006
Issue Date: September 2006
DOI: https://doi.org/10.1007/s00439-006-0216-z
