Genomewide association filtering using a highly locus-specific transmission/disequilibrium test
Abstract
Multimarker transmission/disequilibrium tests (TDTs) are powerful association and linkage tests used to perform genomewide filtering in the search for disease susceptibility loci. In contrast to case/control studies, they have a low rate of false positives for population stratification and admixture. However, the length of a region found in association with a disease is usually very large because of linkage disequilibrium (LD). Here, we define a multimarker proportional TDT (mTDT _{ P }) designed to improve locus specificity in complex diseases that has good power compared to the most powerful multimarker TDTs. The test is a simple generalization of a multimarker TDT in which haplotype frequencies are used to weight the effect that each haplotype has on the whole measure. Two concepts underlie the features of the metric: the ‘common disease, common variant’ hypothesis and the decrease in LD with chromosomal distance. Because of this decrease, the frequency of haplotypes in strong LD with common disease variants decreases with increasing distance from the disease susceptibility locus. Thus, our haplotype proportional test has higher locus specificity than common multimarker TDTs that assume a uniform distribution of haplotype probabilities. Because of the common variant hypothesis, risk haplotypes at a given locus are relatively frequent and a metric that weights partial results for each haplotype by its frequency will be as powerful as the most powerful multimarker TDTs. Simulations and real data sets demonstrate that the test has good power compared with the best tests but has remarkably higher locus specificity, so that the association rate decreases at a higher rate with distance from a disease susceptibility or disease protective locus.
Keywords
Human Leukocyte Antigen Class; Association Rate; Chromosomal Distance; Relative Genotype Risk; Family Trio
Introduction
Genomewide genotyping of single-nucleotide polymorphisms (SNPs) can yield a few hundred thousand binary markers in a single chip array, providing a relatively unbiased examination of the entire genome for common risk variants. Many loci have been found to be associated with multifactorial diseases using this technology. However, in most cases, the information provided is not enough to localize the causal variant behind the association. Nonetheless, genomewide association studies yield useful information for delimiting an associated region, which facilitates fine mapping of the region with a reduced number of markers.
There are two main types of genomewide association analysis: case–control studies and family-based studies. Although case–control association studies are the most common, they have high type I error rates under population stratification (Spielman et al. 1993; Zhang et al. 2003). In family-based studies, transmission/disequilibrium tests (TDTs) are powerful tests requiring only family trios with both parents and one affected offspring. In contrast to case–control studies, TDTs are known to be robust to population structure. Therefore, they are an interesting alternative to case–control studies when family trios can be genotyped. The classic single-marker biallelic TDT can detect association due to linkage. Multimarker generalizations of the classic TDT enhance it by detecting marker interactions, such as when a trait does not depend on a single marker but there is association when more than one marker is considered together, which may point to linkage disequilibrium (LD) or gene–gene interaction (epistasis). This may be the case for genomewide genotyping in which a disease susceptibility locus cannot be genotyped but some markers in LD with the locus can be. Thus, the power of a multimarker TDT can significantly exceed that of a single-marker TDT.
Different approaches have been used to define multimarker TDTs, each of them computing statistical significance in a different way. The most widely used are: (1) TDTs that are straightforward extensions of the classic single-marker biallelic TDT; (2) TDTs that group haplotypes to reduce the degrees of freedom (df); and (3) TDTs based on haplotype similarities to reduce df and improve the test power.
The idea behind the first approach is simple. In nuclear families with one affected child, there must be a difference between the counts for nontransmitted and transmitted haplotypes if they are directly associated with the disease or in linkage with a susceptibility locus. The most commonly used test in this approach is the classic multimarker TDT (mTDT) (Spielman and Ewens 1996; Lazzeroni and Lange 1998), a straightforward extension of the biallelic monomarker TDT obtained by treating each haplotype as a particular allele (Sham 1997; Bourgain et al. 2001). Within this approach, a nonlinear transformation can also be applied to the transmitted/nontransmitted haplotype counts, as in TDT_{E} (Zhao et al. 2007), which is based on the concept of entropy. More specific tests have also been defined to improve power for uncertain transmission cases (Clayton 1999; Zhao et al. 2000) or genotyping errors (Gordon et al. 2001). The main problem with tests using this approach is that the df of the approximate χ^{2} distribution increase with the number of haplotypes, so that permutation tests may be required to determine the null distribution for sparse data.
The second approach tries to reduce the df by grouping haplotypes using different criteria such as haplotype distance (Li et al. 2001) or a haplotype evolutionary relationship (Seltman et al. 2001). These tests are very time-consuming when used in genomewide searches, as they first have to infer a model to group the haplotypes. As an example, a cladogram must be estimated under the assumptions that there are no recurrent disease mutations and no recombination or gene conversion. Violation of these strong assumptions may decrease the general accuracy of the test.
The third approach also tries to reduce the df using haplotype similarities. However, instead of counts for the haplotype groups, similarity metrics are used, such as the length measure used in the length contrast test (TDT_{LC}) (Yu et al. 2005) and the signed rank test (TDT_{SR}) (Yu et al. 2005) and other metrics such as those used in the maximum identity length contrast (MILC) test (Bourgain et al. 2001) and the haplotype-sharing TDT (HSTDT) (Zhang et al. 2003). The TDT_{LC} and TDT_{SR} tests assume that there must be less variation among haplotypes transmitted to affected offspring than among nontransmitted haplotypes, as they distinguish the sign of the difference in the measure between transmitted and nontransmitted data sets. TDTs based on this assumption are therefore more restrictive than other multimarker TDTs, because they do not detect statistically significant differences in haplotype similarities when these are greater among nontransmitted haplotypes. This may occur when a haplotype is not in linkage with a disease susceptibility gene but with a protective gene, so that it will be more frequent in healthy individuals. There is a more important issue in similarity-based TDTs: similarity measures are computed by pairwise comparisons between individuals. Thus, their computational complexity is a quadratic function of the number of founders, in contrast to most TDT measures, which use sample counts and increase linearly with the number of individuals. For current genotype samples with up to a few thousand individuals, similarity-based TDTs are thus a real burden.
Our goal was to define a computationally feasible multimarker measure, named proportional mTDT (mTDT_{P}), that combines high power, high robustness to population admixture and stratification, and high locus specificity as an association test. With such a test, association rates are expected to decrease quickly with distance from a disease susceptibility or protective locus. The measure belongs to the first of the approaches and is a generalization of mTDT that weights partial results for each haplotype by its frequency. The success of the measure in improving locus specificity is based on two assumptions: (1) according to the decrease in LD with chromosomal distance, the frequency of haplotypes in linkage with a disease haplotype is higher at shorter distances from the disease locus; and (2) according to the ‘common disease, common variant’ (CDCV) hypothesis, disease susceptibility variants are quite common in complex diseases and a combination of several genes, rather than a single gene, together with environmental factors, causes the disease. A consequence of these assumptions is that haplotypes in very strong LD with a disease or protective variant are common and their frequency will notably decrease with chromosomal distance.
Therefore, at both extremes of the spectrum of chromosomal distances (no linkage, i.e., the null hypothesis, and zero distance to the disease locus), there should be little difference between mTDT_{P} and mTDT. Between these extremes, differences between the two tests arise: association detected by mTDT_{P} decreases more rapidly as we move away from the disease locus.
In “Methods”, after an analysis of mTDT and the reasons why it cannot be considered a highly locus-specific test, we propose mTDT_{P}, a modification of mTDT that considers differences in haplotype frequencies to improve both specificity and sensitivity. “Simulation studies” compares sensitivity, specificity and robustness for several state-of-the-art multimarker TDTs defined under different approaches, across different genetic models, relative risks, haplotype lengths and numbers of disease susceptibility loci. As mentioned above, our goal was not only to study test power and robustness under different configurations, but also to observe the rate at which statistical significance decreases with chromosomal distance. Simulations to study association rates at different chromosomal distances from a disease susceptibility locus have previously been performed for single-marker TDTs (Zhao et al. 2007). In “Real data sets”, we compare the power and locus specificity of our test (mTDT_{P}) with other TDTs using real trio samples for Crohn’s disease and multiple sclerosis (MS), and robustness using control trio samples of individuals from the International HapMap Project (IHMP) (HapMapConsortium 2003). We conclude with the “Discussion”.
Methods
Assume that the data represent M nuclear families in which one child is affected and that L SNPs are genotyped for all the family members. As an example, for L = 2 and assuming biallelic SNPs, there will be only k = 4 different haplotypes: AB, Ab, aB and ab. Consider a sample composed of all transmitted and nontransmitted haplotypes from heterozygous parents. Let n be the sample size. Thus, subsamples S_{T} and S_{U} of transmitted and nontransmitted haplotypes, respectively, both contain n/2 haplotypes.
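As an illustration of this setup, the following Python sketch enumerates the k haplotypes for L biallelic SNPs and tallies transmitted and nontransmitted haplotypes from heterozygous parents. The function names and the toy parent list are hypothetical and are not part of the original study; phased haplotypes are assumed to be already available.

```python
from collections import Counter
from itertools import product


def enumerate_haplotypes(alleles_per_snp):
    """Return all haplotypes for the given SNPs, e.g. AB, Ab, aB, ab for L = 2."""
    return ["".join(h) for h in product(*alleles_per_snp)]


def transmission_counts(parents):
    """Tally transmitted (S_T) and nontransmitted (S_U) haplotypes.

    `parents` holds one (transmitted, nontransmitted) pair per parent.
    Parents carrying two identical haplotypes are skipped, since only
    heterozygous parents enter the sample.
    """
    n_t, n_u = Counter(), Counter()
    for transmitted, nontransmitted in parents:
        if transmitted == nontransmitted:
            continue
        n_t[transmitted] += 1
        n_u[nontransmitted] += 1
    return n_t, n_u


# Hypothetical toy data: four parents, one of them homozygous (ab/ab).
haplotypes = enumerate_haplotypes([("A", "a"), ("B", "b")])
toy = [("AB", "ab"), ("AB", "Ab"), ("ab", "ab"), ("aB", "AB")]
n_t, n_u = transmission_counts(toy)
n = sum(n_t.values()) + sum(n_u.values())  # sample size; each subsample has n/2
```

Because every retained parent contributes exactly one haplotype to each subsample, S_T and S_U always have the same size n/2.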
Analysis of mTDT
Both mTDT and mTDT_{s} give all haplotypes the same weight, regardless of their frequencies, as each summand is the square of a standard normal distribution under the null hypothesis. Even under the null hypothesis, the variability in haplotype frequency is usually very high, with some haplotypes very frequent and others very rare. Therefore, the assumption that differences in transmission of multimarker haplotypes follow a χ^{2} distribution under the null hypothesis of no linkage leads to a test that is too simplistic and unrealistic. The longer the haplotypes, the greater is the departure of the true null distribution from a \(\chi_{k-1}^2\) distribution, as there are more differences among haplotype frequencies.
We explore the consequences of this simplification once we introduce a generalization of mTDT that considers differences in haplotype frequencies.
Definition of mTDT_{P}
Factors n_{i}/n, ∀i ∈ {1,…, k}, weight haplotypes according to their frequencies, which means that differences in transmission for the most frequent haplotypes have a greater effect on the measure.
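The weighting can be sketched in Python as follows. The sketch assumes the commonly used form of mTDT (Sham 1997), ((k − 1)/k) Σ_i (n_iT − n_iU)²/(n_iT + n_iU), and assumes that the proportional version replaces the uniform factor by n_i/n with n_i = n_iT + n_iU. Both forms, the function names and the counts are assumptions made for illustration only.

```python
def y_squared(n_t, n_u):
    """Per-haplotype summand (n_iT - n_iU)^2 / (n_iT + n_iU)."""
    total = n_t + n_u
    return (n_t - n_u) ** 2 / total if total else 0.0


def mtdt(counts):
    """Assumed mTDT form: every haplotype gets the same weight."""
    k = len(counts)
    return (k - 1) / k * sum(y_squared(t, u) for t, u in counts.values())


def mtdt_p(counts):
    """Assumed proportional form: each summand is weighted by n_i / n."""
    n = sum(t + u for t, u in counts.values())
    return sum((t + u) / n * y_squared(t, u) for t, u in counts.values())


# Hypothetical counts: haplotype -> (transmitted, nontransmitted).
counts = {"AB": (30, 10), "Ab": (5, 5), "aB": (3, 7), "ab": (12, 28)}
uniform = mtdt(counts)         # rare and common haplotypes count equally
proportional = mtdt_p(counts)  # common haplotypes AB and ab dominate
```

In this toy sample the rare haplotype aB contributes 1.6 of the unweighted sum of 18, but only 0.16 of the weighted sum of 6.72, which shows how the weighting damps random effects from rare haplotypes.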
Taking into account that haplotype counts are correlated, the asymptotic variance of mTDT _{ P } under the null hypothesis is derived in Appendix 1.
Under different genotype frequencies, the variance (Appendix 1) is larger than \({\frac{2}{k-1}}\), so that, as occurs with mTDT (Sham 1997), mTDT_{P} will tend to be anticonservative. A feature of this measure is that it reduces the impact of random effects due to rare haplotypes without the need to impose a lower bound on the counts of the haplotypes to be used, as is usually done for mTDT (Sham and Curtis 1995).
However, the main feature of mTDT_{P} is that, in contrast to most multimarker TDTs, which lack either power or locus specificity, it has both high power and high locus specificity for detecting disease susceptibility or disease protective loci in complex diseases. The measure is comparable in power to the powerful mTDT because, assuming the CDCV hypothesis, nonrecombinant haplotypes have a high impact on the measure when the chromosomal distance to a disease locus is very short, as their frequencies, and hence their weights, are high. As we move away from the disease locus, the recombination fraction increases, nonrecombinant haplotypes become less frequent among haplotypes transmitted to affected children, and their impact on the whole measure decreases faster than when no weighting is used, as in mTDT.
To assess statistical significance without permutation tests, we need to characterize the distribution of mTDT_{P} under the null hypothesis of no linkage. We first consider the simpler but unrealistic situation in which haplotype counts are obtained from independent samples (“Independent random variables: characterization and approximation of a weighted χ^{2} distribution”) and use it as a starting point for considering the dependencies among them (“Dependent random variables: approximation of mTDT_{P} under the null hypothesis”).
Independent random variables: characterization and approximation of a weighted χ^{2} distribution
The computation of the distribution function of \(W_{k,{\bf w}=(w_1,\ldots,w_k)}\) is computationally demanding because it requires numerical integration (Solomon and Stephens 1977; Gabler and Wolff 1987). As we are interested in a TDT for genomewide association filtering, permutation tests should be avoided and an easily computable approximation of the asymptotic test distribution under the null hypothesis is required.
It is straightforward to show that in the case of equal weights (\(w_i={\frac{1}{k}},\forall i \in \{1,\ldots, k\}\)), \(\delta={\frac{1}{k}}\) and the approximation turns out to be a true weighted χ^{2} distribution, as the three distribution functions are exactly the same.
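A short moment calculation illustrates the independent case. It assumes that W denotes \(\sum_{i=1}^{k} w_i X_i\) with independent \(X_i \sim \chi_1^2\) and weights summing to one, a normalization suggested by the equal-weight example but assumed here:

\[
E[W] = \sum_{i=1}^{k} w_i = 1, \qquad \mathrm{Var}[W] = 2\sum_{i=1}^{k} w_i^2 \ge \frac{2}{k},
\]

with equality exactly when \(w_i = 1/k\) for all i, in which case \(W = \frac{1}{k}\sum_{i=1}^{k} X_i \sim \frac{1}{k}\chi_k^2\). Under these assumptions, unequal weights leave the mean unchanged but spread the distribution relative to the equal-weight case.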
Dependent random variables: approximation of mTDT_{P} under the null hypothesis
As each individual carries a pair of haplotypes, haplotype counts are not obtained from independent samples. Therefore, \(Y_{i}^2,\ i\in \{1,\ldots,k\}\), are not independent \(\chi_1^2\) variables and thus mTDT_{P} under the null is not \(W_{k,{\bf w}=({\frac{n_1} {n}},\ldots,{\frac{n_k}{n}})}\). Therefore, the exact distribution needs to be assessed.
As stated above, under the null hypothesis of no linkage and when the frequencies of all parental heterozygous genotypes are equal, mTDT asymptotically follows a \(\chi_{k-1}^2\) distribution and, therefore, mTDT_{P} = mTDT/(k − 1) is a scaled \(\chi_{k-1}^2\). For k = 2, the asymptotic variance is 2. Moreover, for k = 2, mTDT_{P} also reduces to the simple (i.e., monomarker, biallelic) TDT.
To use the approximation of the weighted sum of χ^{2} distributions W considered above (Gabler and Wolff 1987) for the null distribution of mTDT_{P}, while accounting for the fact that the χ^{2} variables are not independent, we modified the limiting distributions G and U. It can easily be shown that, under equal heterozygous genotype frequencies, the modified distributions are exactly a scaled \(\chi_{k-1}^2\) with scale factor 1/(k − 1).
Inputs:
  k: the number of different haplotypes in the sample
  weights: a list of k weights
  HP: the value of the statistic mTDT_{P} for the current sample
Output:
  result: p value
Description:
  result = 0
  DS = 1
  R1 = 0
  df = k − 1
  Foreach haplotype i = 1,…,k
    dZero = 0.5/weights(i)
    R1 = R1 + weights(i)*gammai(dZero, HP*dZero)
    DS = DS*weights(i)
  R2 = pValTestChiSquare(HP/DS^{1/k}, k)
  result = max(R1, 1 − R2)
In order to check whether mTDT _{ P } follows a weighted χ^{2} distribution in the more general case of different parental heterozygous genotype frequencies, we performed permutations in “Simulation studies” (Zhang et al. 2003; Yu et al. 2005) and we did not find significant differences (data not shown).
Simulation studies
We compared the performance of our solution mTDT_{P} with several state-of-the-art multimarker TDTs, such as the classic mTDT and other TDTs based on different approaches: the similarity-based tests mTDT_{LC} and mTDT_{SR}, the entropy-based mTDT_{E} and the group-based mTDT_{1T}. mTDT_{1T} (Ott 1999) is a \(\chi_1^2\) test under the null hypothesis of no linkage that compares the haplotype with the most significant difference n_{iT} − n_{iU} against the rest of the haplotypes in a sample.
We also modified mTDT using some well-known corrections of χ^{2} tests, and some modifications of these (Appendix 2), to improve the specificity by reducing random errors due to low frequencies: the Yates (1934) correction mTDT_{Y}, its modification mTDT_{YP} and the Laplace corrections mTDT_{L1} and mTDT_{L2}.
Besides robustness to population stratification and power, we are interested in measuring locus specificity. Thus, the decrease in the rate of associations detected with increasing linkage distance or recombination fraction (θ) was assessed between two extreme points: θ = 0, for which all associations detected are true positives (power), and θ = 0.0002, for which most associations detected are type I errors.
Statistical significance levels were obtained using a permutation procedure for mTDT_{LC}, mTDT_{SR} and mTDT_{E} (Zhang et al. 2003; Yu et al. 2005). For mTDT_{P}, the approximation of a weighted χ^{2} with weights being the haplotype frequencies was used (“Independent random variables: characterization and approximation of a weighted χ^{2} distribution”). For the remaining tests, the exact χ^{2} distribution was used.
Simulation setup
We tried to reproduce the same simulations used in several studies to check TDT accuracy (Zhang et al. 2003; Yu et al. 2005) and explained in the following subsections.
As our main goal is to have a useful test to perform genomewide association filtering, computational complexity is a main issue and a linear relationship between computational complexity and the number of SNPs is highly desirable. Therefore, we applied the tests in a computationally feasible way in which only consecutive or overlapping clusters of SNPs (known as sliding windows) were tested together. For simulations of a cluster as suggested by Crawford et al. (2004), we assumed that recombination rates among all the markers tested are very low, which is equivalent to assuming that they belong to the same low-recombination block (Daly et al. 2001). The recombination fraction within blocks (θ_{B}) for a common population with exponential growth, such as an African population, has been estimated as 0.000088 (Hinds et al. 2005) and we used this value in the simulations.
We also modified the method for introducing a disease mutation compared to other studies (Sham 1997; Zhang et al. 2003; Yu et al. 2005). Instead of considering only one ancestral chromosome with the disease-causing mutation, or the improvement of using two ancestral chromosomes (Zhang et al. 2003), we used a more realistic simulation of the inheritance of complex diseases, in which the number of ancestral disease chromosomes can change according to the coalescent model, as for any other gene.
Populations were drawn using msHOT (Hellenthal and Stephens 2007), a program for generating samples based on the coalescent model that incorporates recombination. The samples for all the populations were obtained using trioSampling, a computer program available on the supplementary website. In the following subsections, we describe the simulations in detail and highlight any departures from the setup commonly used (Sham 1997; Zhang et al. 2003; Yu et al. 2005). A more detailed explanation of the simulations performed can be accessed on the supplementary website.
Robustness
To check the robustness to population stratification, simulations were performed as described by Zhang et al. (2003) and Yu et al. (2005). Therefore, we considered stratified populations. However, instead of using samples of 200 nuclear families (Zhang et al. 2003; Yu et al. 2005), we produced samples with 500 nuclear families. Moreover, we used a recombination fraction from the markers to the disease locus of θ = 0.5 to represent a true null. Association rates were estimated based on 1,000 replications. Families were randomly sampled by choosing haplotypes with the disease mutation and randomly choosing the haplotypes transmitted to children considering recombinations. For the first subpopulation, the minor allele frequency (MAF) for the markers was 0.5 and the probability of the disease mutation in parents p_{D} was 0.2. For the second subpopulation, different MAFs q for the markers were used, q ∈ {0.1, 0.3, 0.5}, and p_{D} was 0.3. Different proportions of individuals from the first sample were used, \(pp\in\{1/2, 1/4, 1/6\}\). Therefore, by varying pp and q, nine different scenarios were considered to test the robustness.
Locus specificity and sensitivity
Simulations for power (sensitivity), i.e., assuming no recombination between the disease susceptibility locus and the markers tested, were similar to those used in several studies assuming one founder disease haplotype (Lam et al. 2000; Zhang et al. 2003; Yu et al. 2005), except that the SNPs used were assumed to be in high LD, i.e., to belong to the same low-recombination block (Daly et al. 2001). Therefore, we performed simulation analyses using haplotype data sets for 200 nuclear families (family trios with both parents and an affected child). Association rates were estimated based on 100 replications of the simulations described below (Sham 1997; Zhang et al. 2003; Yu et al. 2005).
Values used to configure sample parameters in the specificity/sensitivity simulations
Parameter  Values
Relative risk  2, 4, 6, 8, 10
Genetic model  Additive, recessive, dominant
θ to disease loci  0, 5e−05, 1e−04, 1.5e−04, 2e−04
Haplotype length  1, 2, 4, 6, 8, 10
The fourth parameter checks the decrease in association rate due to chromosomal distance. We considered five different recombination fractions (θ) from the markers to the disease susceptibility locus, ranging from perfect LD (no recombination) to θ = 0.0002. Use of the recombination fraction to choose markers for the samples forced us to modify the pattern of population growth to simulate the LD decrease with distance in a more realistic way in a human population (Kruglyak 1999; Crawford et al. 2004). For greater consistency with real populations and complex diseases in which different numbers of founders can carry the disease loci, we used the coalescent model (Nordborg 2001) to draw populations with a variable number of founder haplotypes and population growth as explained above. Any position can be a disease susceptibility locus. Disease founder haplotypes were chosen by selecting one SNP with a mutant allele with frequency in the interval [0.2, 0.4] to mimic a common disease (Yu et al. 2005).
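The effect of θ on haplotype sharing can be illustrated with the standard result that a haplotype escapes recombination between two loci for g generations with probability (1 − θ)^g. The sketch below evaluates it for the five θ values listed above. The number of generations is a hypothetical value chosen only for illustration and is not taken from the simulations, which rely on the coalescent model instead.

```python
def nonrecombinant_fraction(theta, generations):
    """Probability that no recombination occurs in `generations` meioses."""
    return (1.0 - theta) ** generations


# Recombination fractions from the simulation table.
thetas = [0.0, 5e-5, 1e-4, 1.5e-4, 2e-4]
generations = 5000  # hypothetical age of the disease mutation, in generations

fractions = [nonrecombinant_fraction(theta, generations) for theta in thetas]
for theta, fraction in zip(thetas, fractions):
    print(f"theta = {theta:.5f}: nonrecombinant fraction = {fraction:.3f}")
```

Under this simplified model the share of ancestral, nonrecombinant haplotypes falls steadily as θ grows, which is the decline that a frequency-weighted measure is meant to translate into a faster loss of association.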
We later produced a second set of simulations with more realistic relative risks (1.2, 1.6, 2.0, 2.4 and 2.6) and samples of 500 nuclear families, and focused only on the most powerful statistics that were also highly efficient (computational complexity linear in the number of families).
To determine how the frequency of the disease mutation affects mTDT_{P} and the other measures, we generated a third set of simulations with the same parameters as the second one but with the frequency of the disease mutation in the interval [0.1, 0.2].
Simulation results
The sensitivity and specificity of the tests were analyzed by counting rates of association for different chromosomal distances from markers to disease loci.
Type I error rates in the presence of population stratification and admixture, with recombination fraction to the disease locus θ = 0.5, based on 1,000 simulations
α  MAF  pp  Type I error rate
0.01  0.1  0.5  0.009 
0.01  0.3  0.5  0.012 
0.01  0.5  0.5  0.013 
0.01  0.1  0.75  0.012 
0.01  0.3  0.75  0.016 
0.01  0.5  0.75  0.015 
0.01  0.1  0.833  0.011 
0.01  0.3  0.833  0.013 
0.01  0.5  0.833  0.013 
0.05  0.1  0.5  0.054 
0.05  0.3  0.5  0.063 
0.05  0.5  0.5  0.071 
0.05  0.1  0.75  0.060 
0.05  0.3  0.75  0.061 
0.05  0.5  0.75  0.052 
0.05  0.1  0.833  0.055 
0.05  0.3  0.833  0.056 
0.05  0.5  0.833  0.058 
Results for sensitivity (θ = 0) show that mTDT, mTDT_{1T} and mTDT_{P} achieve the best results under all scenarios tested, with little difference among the three, whereas locus specificity results (θ ∈ {0.00005, 0.0001, 0.00015, 0.0002}) show that mTDT_{P} performs better than all the other methods. That is, association rates decrease faster with mTDT_{P} than with the other methods as the recombination fraction θ to the disease locus increases. These differences become more pronounced as RR and haplotype length increase.
Results for α = 0.05 and haplotype lengths of 1, 2, 4, 6, 8 and 10 for one locus are available on the supplementary web site (Figs. S1–S6). Results for two loci and disease models Additive, DomOrDom and RecOrRec (Figs. S7–S12) and for two loci and disease models DomAndDom, Threshold and Modified (Figs. S13–S18) are available on the supplementary web site. We also used the corrections to the small data problem mentioned in Appendix 2 (Figs. S19–S36). As expected, the same pattern was always observed: all the corrections improved the specificity at the cost of a reduction in sensitivity. The stronger the correction, the stronger was this pattern. It should be noted that for haplotypes of length 1, i.e., only one marker, mTDT, mTDT_{1T} and mTDT_{P} are equivalent and therefore yield the same results. Differences among them increase with haplotype length.
As mTDT and mTDT_{P} showed a constant pattern of higher power than the other statistics for all the scenarios considered, we focused on them together with mTDT_{Y}, the measure that applies the lightest correction to the small data problem. Disregarding mTDT_{LC} and mTDT_{SR} made it feasible to perform a second and a third set of simulations using a larger number of nuclear families: 500. We did not use mTDT_{1T} because it chooses the haplotype with the highest power and therefore requires multiple-testing correction. When we used the Bonferroni correction (data not shown), the measure was no longer competitive, in agreement with previous reports that this correction overcorrects association results (Tang et al. 2009).
Results for α = 0.05 and haplotype lengths of 1, 2, 4, 6, 8 and 10 for one locus are available on the supplementary web site (Figs. S37–S42). Results for two loci and disease models Additive, DomOrDom and RecOrRec (Figs. S43–S48) and for two loci and disease models DomAndDom, Threshold and Modified (Figs. S49–S54) are available on the supplementary web site.
Real data sets
As in the simulation study, besides mTDT and the tests designed to cope with the small data problem (mTDT_{Y}, mTDT_{YP}, mTDT_{L1} and mTDT_{L2}), we used the same state-of-the-art tests for comparison with mTDT_{P}: mTDT_{1T}, mTDT_{E}, mTDT_{LC} and mTDT_{SR}. We added a further test for the real data sets: mTDT_{1U}, which is the same as mTDT_{1T} but uses the most frequent nontransmitted haplotype instead of the most frequent transmitted one. Our purpose was to cover the case in which a disease is more common when a protective locus is absent in affected individuals, a situation for which mTDT_{1T} would be powerless.
A multimarker TDT for genomewide association searches requires a very efficient exploration approach for the method to be feasible. One possible approach would be to divide the SNP sequence into blocks of low recombination using an algorithm based on confidence intervals (Gabriel et al. 2002). However, we chose to split regions in a block-free way because a low-recombination block differs considerably depending on the definition used by the algorithm that splits a region into blocks (Halldórsson et al. 2004). Thus, we used sliding windows (Daly et al. 2001) to apply the test to very small subsets of consecutive markers, such as 6, 8 or 10 markers. Each subset is a window and windows can share markers.
We used sliding windows of 1, 2, 4, 6, 8 and 10 SNPs per window and an offset of 1 to compute p values. For tests whose null distribution is unknown, significance levels were computed for each sliding window using standard permutation tests (1,000 permutations). For all tests for which the null distribution or its approximation is known, we used that distribution to compute p values.
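The window layout described above can be sketched in a few lines of Python. The function name and the index convention (0-based start, exclusive end) are choices made for this illustration, not part of the original software.

```python
def sliding_windows(n_markers, window_size, offset=1):
    """Yield (start, end) marker index pairs, with `end` exclusive.

    Consecutive windows are shifted by `offset`, so with offset 1
    neighbouring windows share window_size - 1 markers.
    """
    for start in range(0, n_markers - window_size + 1, offset):
        yield start, start + window_size


# Example: a region with 26 markers scanned with the window sizes used here.
for size in (1, 2, 4, 6, 8, 10):
    n_windows = len(list(sliding_windows(26, size)))
    print(f"window size {size}: {n_windows} windows")
```

For a fixed window size the number of windows, and hence the number of tests, grows linearly with the number of markers, which is the property that makes this scan suitable for genomewide filtering.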
Phase reconstruction
We inferred haplotype frequencies using all the information from the family (Yu et al. 2005; Rinaldo et al. 2005). Haplotypes that could not be resolved using family information were inferred using the EM algorithm under the restriction of family information (Abecasis et al. 2001; Yu et al. 2005).
To avoid inaccurate haplotype reconstruction, the EM algorithm is usually applied within a low-recombination block (Niu et al. 2002). However, although we first performed a preliminary division of the chromosome into blocks of low recombination using some of the algorithms proposed for that purpose (Gabriel et al. 2002), we finally decided to use sliding windows for the following two reasons.
On the one hand, results from different block-building algorithms differ considerably (Halldórsson et al. 2004) and may bias results from TDT measures. Moreover, the chance that a haplotype of a few SNPs covers more than one block decreases as the number of genotyped SNPs increases. As an example, with a current genomewide SNP array of about 500,000 SNP markers, and taking 20,700 bp as the estimated average block size in Caucasian populations (Hinds et al. 2005), there are about 20 SNPs per block. For windows of length 10, the haplotype is therefore unlikely to span more than one block.
On the other hand, in trio samples the EM algorithm is used under the restriction of family information (Zhang et al. 2003; Yu et al. 2005) and, therefore, it is more accurate than the simple EM algorithm for inferring the phase, even beyond block boundaries, as the only positions whose transmitted/nontransmitted alleles cannot be resolved using family information are those for which the three family members are heterozygous (Sebastiani et al. 2004). We compared (data not shown) the results of two main ways to proceed within each family: (1) choosing the most likely phase according to the EM algorithm under the restriction of family information or (2) using weighted phases, with the frequencies reported by the algorithm as weights (Zhang et al. 2003; Yu et al. 2005). In agreement with these works, we found no significant differences between the two methods. Therefore, we opted for the first choice, as it has lower computational complexity.
Data sets used
We used nine data sets of trio genotypes, one with individuals with Crohn’s disease (affectedCrohn) and the others with individuals with MS. The Crohn data set is a publicly available set originally used by Rioux et al. (2001).
Real data sets
Data set   Chr.  First SNP   Last SNP    SNPs
EVI5       1     92388330    93651891    93
IL2R       10    6103680     7715013     353
IL7R       5     35847586    35991293    35
HLA        6     30736061    33163225    468
KIAA0350   16    11050221    11226546    26
CD226      18    65550188    65997985    38
CD58       1     116677600   116983610   19
IRF5       7     128055671   128309250   15
For all the sets used, we prepared data sets of unaffected individuals from data publicly available at the website of the IHMP (HapMapConsortium 2003), consisting of genotype data for 30 family trios (HapMap Phase II) typed in the CEPH population, that is, Utah residents with ancestry from Northern and Western Europe. The tests on unaffected trios are used as a control, since an association found in unaffected individuals may point to a disease protective locus, genotyping errors or departures from Hardy–Weinberg equilibrium.
Crohn affected and unaffected data sets from the IHMP are all available on the supplementary website.
Results for real data sets
To show these results we used comparative TDT (CTDT) maps, which are drawn by averaging the p values for each sliding window covering the same marker. A computer program to construct these maps was built using BioCASE (Montes and AbadGrau 2009). Each row in a CTDT map represents sample results obtained from a different TDT. The height of the colored bar for each marker represents the range of the p value. If the p value is greater than 0.05, there is no color for that marker position, meaning that association is not significant. If the p value is less than 0.01, the colored bar has maximum height.
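The construction of a CTDT map row can be sketched as follows. This is a minimal illustration in Python, not the BioCASE-based program: the function names are hypothetical, windows are assumed to advance one marker at a time, and the linear scaling of bar height between the two thresholds is an assumption, since only the behavior above 0.05 and below 0.01 is stated here.

```python
def average_p_per_marker(window_p_values, window_size, n_markers):
    """Average the p values of all sliding windows covering each marker.

    window_p_values[i] is the p value of the window that starts at
    marker i and covers markers i .. i + window_size - 1.
    """
    sums = [0.0] * n_markers
    counts = [0] * n_markers
    for start, p in enumerate(window_p_values):
        for marker in range(start, min(start + window_size, n_markers)):
            sums[marker] += p
            counts[marker] += 1
    # Markers covered by no window get None instead of an average.
    return [s / c if c else None for s, c in zip(sums, counts)]


def bar_height(p, low=0.01, high=0.05):
    """Map an averaged p value to a relative bar height in [0, 1]."""
    if p is None or p > high:
        return 0.0  # not significant: no colored bar
    if p < low:
        return 1.0  # maximum height
    # Assumption: height grows linearly as p falls from `high` to `low`.
    return (high - p) / (high - low)
```

For example, with windows of two markers and window p values 0.2, 0.04 and 0.02 over four markers, the averaged p values are 0.2, 0.12, 0.03 and 0.02, so only the last two markers receive a colored bar.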
The association of the KIAA0350/CLEC16A locus with MS was reported by the IMSGC genomewide association study (International Multiple Sclerosis Genetics Consortium et al. 2007); however, it did not reach genomewide significance. Later, it was replicated in several populations and is now considered a risk factor for MS (Martínez et al. 2010; M et al. 2009). Our results for the KIAA0350 locus (Fig. 8a) reveal that mTDT _{ P } detected a strong association (maximum height bar) from locus rs28087 to locus rs248836. Compared with mTDT and the alternative corrections for coping with the small data problem, mTDT _{ P } is more specific, as the range of markers with maximum association is smaller. The other tests did not detect association at p values below 0.01.
Interferon regulatory factor 5 (IRF5) has been found to be associated with MS in a candidate gene study in several populations (Kristjansdottir et al. 2008). Results for IRF5 (Fig. 8b) show an interesting pattern in mTDT _{ P } and mTDT _{1T }: there is a locus with maximum association (rs3807306), which may mean that the actual disease susceptibility locus is somewhere between this marker and its left and right neighbors, and association decreases continuously with distance from this marker, both to the left and to the right along the chromosome. For the other mTDT measures, this pattern applies only to the right side of the locus with maximum association. Thus, mTDT _{ P } again yields the maximum information: the power is maximum for a shorter region and decreases significantly with distance from this region.
However, results obtained by mTDT _{ P } do not always show a narrower region of association. Sometimes the region is as wide as, or even wider than, that detected by mTDT. This is the case for the human leukocyte antigen (HLA) locus (see the fifth CTDT map in Figs. S38–S43 on the supplementary website). This would mean that the pattern at that locus is not one of a single gene associated with the disease, with the other associations detected as a result of linkage; rather, many genes along the HLA locus can influence disease onset. This is consistent with other studies suggesting that the HLA class II genes (HLA-DRB1) are the major determinants of MS risk in the major histocompatibility complex (MHC) region. Despite the recognized effect of HLA class II genes on risk, it is not clear what contributions other genes in this region may make. The MHC region has extensive LD spanning several megabases (Mb) and high levels of variability, with the HLA genes having hundreds of alleles. The MS data set analyzed here has not been genotyped at a sufficient marker density across the entire MHC region to model the class II effects appropriately, so we cannot be confident that the associations are not attributable to either the class II loci themselves or other (untyped) loci within the region.
Discussion
With current SNP genotype samples of a few hundred or a few thousand family trios, the locus specificity of a test has become as important as its power, as it is very common to find associations due to linkage at loci at a considerable distance from the disease susceptibility locus. These associations usually cannot be replicated in other samples from close populations, as they are at some distance from the disease susceptibility locus and their haplotypes may have departed, due to recombination, from the common ancestors in the first sample used. A test that lacks locus specificity may detect association at a considerable chromosomal distance from the disease susceptibility locus. Such associations can be considered spurious, as they do not point to a susceptibility locus or to positions very close to it, and they will hardly be replicated in a slightly different sample. Thus, more than two markers may be used so that power will increase with a lower risk of low specificity. Therefore, it is very important to consider the locus specificity of TDTs to increase the chances of finding truly risky haplotypes, i.e., those actually at the disease susceptibility locus or at a very short distance from it, and thus the chances of replicating the results in other samples. With this goal, we proposed mTDT _{ P }, which is based on mTDT, one of the first multimarker TDTs. mTDT, together with mTDT _{1T } and mTDT _{ P }, has the highest power under a wide range of scenarios in light of our simulations. Because mTDT _{ P } is based on mTDT, the new assumption used to define mTDT _{ P } is crucial to improving locus specificity without risking the high power of mTDT. Therefore, the new assumption, and thus the modification introduced by the test, had to be as simple as possible for the test to be as generic as mTDT and to focus on making association rates decrease faster with chromosomal distance from the disease susceptibility locus.
To achieve this, the new assumption was very specific: association decreases with chromosomal distance from a specific locus because of recombinations. As a consequence, haplotypes in phase with a disease variant at the time at which a variant appeared would recombine more often with other haplotypes with increasing distance from the disease locus. Thus, in a sample of trios with affected offspring, the frequency of these nonrecombinant haplotypes will be lower than if the haplotype were closer to the disease locus. Therefore, by weighting each summand in mTDT by the haplotype frequency, we reduce the effect that haplotypes at some chromosomal distance to a disease locus can have on the measure because of linkage. Moreover, in positions close to the disease locus, and assuming the CDCV hypothesis, there would be very few, but common, haplotypes with strong association with the risk variant, so that the weighting procedure will not reduce the power.
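The effect of the weighting can be illustrated with a minimal Python sketch. It is not the mTDT _{ P } implementation: it assumes per-haplotype summands of the form (T − U)²/(T + U), where T and U are the transmitted and untransmitted counts of a haplotype, omits any scaling constant and the null distribution, and estimates each haplotype frequency as its share of all informative parental haplotypes. The function name and input format are hypothetical.

```python
from collections import Counter


def haplotype_tdt_sums(parental_pairs):
    """Return (unweighted_sum, frequency_weighted_sum) over haplotypes.

    parental_pairs: iterable of (transmitted, untransmitted) haplotypes,
    one pair per parent of an affected child.
    """
    n_t, n_u = Counter(), Counter()
    for transmitted, untransmitted in parental_pairs:
        if transmitted == untransmitted:
            continue  # parents with two identical haplotypes are uninformative
        n_t[transmitted] += 1
        n_u[untransmitted] += 1

    total = sum(n_t.values()) + sum(n_u.values())
    unweighted = 0.0
    weighted = 0.0
    for hap in set(n_t) | set(n_u):
        t, u = n_t[hap], n_u[hap]
        summand = (t - u) ** 2 / (t + u)
        unweighted += summand
        # Assumption: weight = relative frequency of this haplotype
        # among all informative parental haplotypes.
        weighted += (t + u) / total * summand
    return unweighted, weighted
```

In this sketch, a rare haplotype with a distorted transmission ratio contributes as much as a common one to the unweighted sum, but only in proportion to its frequency to the weighted sum, which is the behavior the argument above relies on.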
We performed simulations under a wide range of population and disease variables, such as the number of disease loci, the disease model, the relative risk of a genotype, haplotype length, etc. Simulations confirmed the correctness of the assumptions and the improvement in locus specificity achieved by mTDT _{ P } without reducing the power. We also used several real trio data sets with affected offspring.
As these TDTs are to be applied to genomewide data sets, a multiple testing correction should be performed. Multiple testing correction for GWAS is currently a very active research topic (Betensky and Rabinowitz 2000; Wei et al. 2009; Gorlov et al. 2009), as most current approaches do not consider LD between different markers and usually overcorrect association results, so that true-effect associations may be missed. As the objective of the simulations was to compare the power and locus specificity of different tests, we did not perform multiple testing corrections in any of the tests, and p values were compared directly. Moreover, mTDT _{1T } and mTDT _{1U }, which choose the haplotype with the lowest p value, were not competitive when the Bonferroni correction was applied. Current real genomewide data usually have hundreds of thousands of markers. We considered using sliding windows and comparative TDT maps as visual tools for genomewide screening, also including the use of IHMP samples as controls. In these two visual tools, instead of a unique p value for each window with multiple testing correction, the average p value over all the windows a marker belongs to is drawn in order to reduce the chances of spurious associations. Therefore, we chose a simple approach to detect association decay with distance in order to select a region for a further fine-mapping study, including denser screening of the selected region and sample replication, for which multiple testing correction may be required.
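Why tests that pick the haplotype with the lowest p value suffer under the Bonferroni correction can be seen in a short Python sketch. It assumes that the correction is taken over the number of haplotypes examined in a window; this is an illustrative assumption, and the function name is hypothetical.

```python
def bonferroni_min_p(p_values):
    """Bonferroni-adjusted p value for a test that reports the smallest
    of k per-haplotype p values (assumption: k = number of haplotypes)."""
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p value is required")
    # Multiply the best p value by the number of tests, capped at 1.
    return min(1.0, k * min(p_values))
```

With four haplotypes and a best p value of 0.004, the adjusted value is 0.016, so the penalty grows linearly with the number of haplotypes observed in the window.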
The results obtained using mTDT _{ P } analysis for the MS data set showed a more precise definition of MS-implicated variants among the loci analyzed. KIAA0350/CLEC16A has been associated with several autoimmune diseases in genomewide association and replication studies (International Multiple Sclerosis Genetics Consortium et al. 2007; Todd et al. 2007; Márquez et al. 2009). Fine mapping of the region for type 1 diabetes (T1D) by resequencing of exons and flanking regions and SNP genotyping for the surrounding genes revealed that the most probable causal variant would be localized at the 3′ end of the KIAA0350/CLEC16A gene. Results for the mTDT _{ P } CTDT map of the KIAA0350/CLEC16A locus using MS data reveal that the region with greatest association is the last 3′ 60 kb of the gene, whereas the other TDTs extend the association to the intergenic 3′ region. These mTDT _{ P } results point to the 3′ end of the KIAA0350 gene as the causative association region in MS, as described for T1D. We also observed for some other loci that the mTDT _{ P } map extends to a larger region than the other TDT maps. This is the case for the IRF5 locus. The most probable causal variant for association of the IRF5 locus with MS is a functional 5-bp biallelic insertion–deletion polymorphism that differentially binds the SP1 transcription factor to the IRF5 promoter (Kristjansdottir et al. 2008). The mTDT _{ P } map revealed maximal association at IRF5 and extended it to the 5′ region, including the IRF5 promoter, whereas the other maps did not reveal any association with the IRF5 promoter. In designing a fine mapping of the IRF5 locus based on mTDT, mTDT _{ Y }, mTDT _{ YP }, mTDT _{L1} or mTDT _{L2} results, we would be erroneously focusing on the middle of the gene instead of on the promoter, where the most probable causative variants are located.
An interesting question is whether mTDT _{ P } would still be useful when disease-susceptibility variants have very low frequencies, i.e., under the ‘common disease, many rare variants’ (CDMRV) hypothesis. In general, GWAS are not suitable for capturing rare variants, and other techniques, such as DNA resequencing of candidate genes, are often used (Bodner and Bonilla 2008). However, it has recently been claimed that many of the associations found by GWAS are due to ‘synthetic associations’ between very rare variants and less rare alleles, such as SNP markers (Dickson et al. 2010), on the basis that what is usually tested are not the causative genes but SNP markers around them. Under this hypothesis, we believe mTDT _{ P } may have less power than mTDT if we consider results from our simulations (Fig. 7 and supplementary Figures S55–S59): using usual mutation frequencies in common diseases (interval [0.2, 0.4]), mTDT _{ P } outperforms mTDT in power; if we reduce mutation frequencies to the interval [0.1, 0.2], still too high to be considered rare variants, the differences in power between the two tests narrow and mTDT even outperforms mTDT _{ P } under several scenarios.
Our ultimate goal is to have a multimarker test that: (1) requires little computational time, like mTDT or mTDT _{ HE }; (2) provides high power under very different circumstances, like mTDT or mTDT _{1T }; (3) performs stronger filtering than state-of-the-art TDTs so that it can detect association in narrower regions when used as a first genomewide step in searching for disease susceptibility or protective genes. mTDT _{ P } achieves these three goals better than all the other tests we used. Moreover, by producing highly informative comparative TDT (CTDT) maps using different low-complexity TDT measures with very different specificity and sensitivity behaviors, and using IHMP samples as both control and test validators, we provide a robust tool for visual exploration that may assist molecular biologists in decisions about the regions to choose for fine mapping.
In conclusion, we believe mTDT _{ P } can benefit genomewide association studies, as its higher locus specificity may be crucial to improving the chances of detecting only associations close to a disease susceptibility or protective locus and, therefore, their chances of being replicated in different samples.
Web source
A supplementary website has been created for this study at http://bios.ugr.es/TDTP, where Figures S1–S43, Table S1, a detailed explanation of the simulations performed and the C++ source code of the software developed for this work are available.
Notes
Acknowledgments
The authors were supported by the Spanish Research Program under project TIN200767418C0303, the Andalusian Research Program under project P08TIC03717 and the European Regional Development Fund (ERDF). We acknowledge the International Multiple Sclerosis Genetics Consortium (IMSGC) for giving us access to their data repository.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
References
Abecasis GR, Martin R, Lewitzky S (2001) Estimation of haplotype frequencies from diploid data. Am J Hum Genet 69:198
Abramowitz M, Stegun I (1972) Handbook of mathematical functions. Dover, New York
Betensky R, Rabinowitz D (2000) Simple approximations for the maximal transmission/disequilibrium test with a multiallelic marker. Ann Hum Genet 64:567–574
Bodner W, Bonilla C (2008) Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet 40:695–701
Bourgain C, Genin E, Holopainen P, Mustalahti K, Mäki M, Partanen J (2001) Maximum identity length contrast: a powerful method for susceptibility gene detection in isolated populations. Am J Hum Genet 68:154–159
CastañoMartínez A, LópezBlázquez F (2005) Distribution of a sum of weighted central chisquare variables. Commun Stat Theory Methods 34:515–524
Clayton D (1999) A generalization of the transmission/disequilibrium test for uncertain haplotype transmission. Am J Hum Genet 65:1170–1177
Crawford DC, Bhangale T, Li N, Hellenthal G, Rieder MJ, Nickerson DA, Stephens M (2004) Evidence for substantial finescale variation in recombination rates across the human genome. Nat Genet 36:700–706
Daly M, Rioux J, Schaffner S, Hudson T, Lander E (2001) Highresolution haplotype structure in the human genome. Nat Genet 29:229–232
Dickson S, Wang K, Krantz I, Hakonarson H, Goldstein D (2010) Rare variants create synthetic genomewide associations. PLoS Biol 8:e1000294
Fan RZ, Xiong MM (2001) Linkage transmission disequilibrium test of two unlinked disease loci. Adv Appl Stat 1:277–308
D'Netto MJ, Ward H, Morrison K, DeLuca S, Handunnetthi L, Sadovnick A, Ebers G (2009) Risk alleles for multiple sclerosis in multiplex families. Neurology 72:1984–1988
Gabler S, Wolff C (1987) A quick and easy approximation to the distribution of a sum of weighted chisquare variables. Stat Hefte Stat Pap 28:317–325
Gabriel S, Schaffner S, Nguyen H, Moore J, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, LiuCordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander E, Daly M, Altshuler D (2002) The structure of haplotype blocks in the human genome. Science 296
Gordon D, Heath SC, Liu X, Ott J (2001) A transmission/disequilibrium test that allows for genotyping errors in the analysis of singlenucleotide polymorphism data. Am J Hum Genet 69:371–380
Gorlov IP, Gallick GE, Gorlova OY, Amos C, Logothetis CJ (2009) Gwas meets microarray: are the results of genomewide association studies and geneexpression profiling consistent? prostate cancer as an example. PLoS ONE 4(8):e6511
Halldórsson B, Bafna V, Lippert R, Schwartz R, de La Vega F, Clark A, Istrail S (2004) Optimal haplotype blockfree selection of tagging snps for genomewide association studies. Genome Res 14:1633–1640
HapMapConsortium TI (2003) The international hapmap project. Nat Biotechnol 426:789–796
Hellenthal G, Stephens M (2007) mshot: modifying hudson’s ms simulator to incorpore crossover and gene conversion hot spots. Bioinformatics 23:520–521
Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR (2005) Wholegenome patterns of common dna variation in three human populations. Science 18:1072–1079
International Multiple Sclerosis Genetics Consortium, Hafler DA, Compston A, Sawcer S, Lander ES, Daly M, Jager PD, de Bakker P, Gabriel S, Mirel D, Ivinsonand A, PericakVance M, Gregory S, Rioux J, McCauley J, Haines J, Barcellos L, Cree B, Oksenberg J, Hauser S (2007) Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med 357(9):851–62
Johnson N, Kotz S, Balakrishnan N (1994) Continuous univariate distributions. Wiley, New York
Kristjansdottir G, Sandling J, Bonetti A, IM IR, L LM, C CW, Gustafsdottir S, Sigurdssonand S, Lundmark A, K PTKK, Elovaara I, Pirttilä T, Reunanen M, L LP, Saarela J, Hillert J, Olsson T, Landegren U, Alcina A, Fernández O, Leyva L, Guerrero M, Lucas M, Izquierdo G, Matesanz F, Syvänen A (2008) Interferon regulatory factor 5 (irf5) gene variants are associated with multiple sclerosis in three distinct populations. J Med Genet 45:362–369
Kruglyak L (1999) Prospects for wholegenome linkage disequilibrium mapping of common disease genes. Nat Genet 22:139–142
Lam J, Roader K, Devlin B (2000) Haplotype fine mapping by evolutionary trees. Am J Hum Genet 66:659–673
Lazzeroni LC, Lange K (1998) A conditional inference framework for extending the transmission/disequilibrium test. Human Heredity 48:67–81
Li J, Wannng D, Dong J, Jiang R, Zhang K, Zhang S, Zhao H, Sun F (2001) The power of transmission disequilibrium tests for quantitative traits. Genet Epidemiol 18 (Supp 1):632–637
Márquez A, Varadé J, Robledo G, Martínez A, Mendoza J, Taxonera C, FernándezArquero M, DíazRubio M, GómezGarcía M, LópezNevot M, de la Concha E, Martín J, Urcelay E (2009) Specific association of a clec16a/kiaa0350 polymorphism with nod2/card15() crohn’s disease patients. Eur J Hum Genet 17(10):1304–1308
Martínez A, Perdigones N, Espino MCL, Varadé J, Lamas J, J JS, FernándezArquero M, de la Calle H, Arroyo R, de la Concha E, B BFG, Urcelay E (2010) Chromosomal region 16p13: further evidence of increased predisposition to immune diseases. Ann Rheum Dis 69:309–11
Montes R, AbadGrau MM (2009) Biocase: Accelerating software development of genomewide filtering applications. In: IWANN ’09: Proceedings of the 10th international workconference on artificial neural networks. Springer, Berlin, pp 1097–1100
Niu T, Qin Z, XU X, Liu J (2002) Bayesian haplotype inference for multiple linked singlenucleotide polymorphisms. Am J Hum Genet 70:157–169
Nordborg M (2001) Coalescent theory. Wiley, Chichester, pp 179–212
Ott J (1999) Analysis of human genetic linkage. John Hopkins, Baltimore
Rinaldo A, Bacanu SA, Devlin B, Sonpar V, Wasserman L, Roeder K (2005) Characterization of multilocus linkage disequilibrium. Genet Epidemiol 28:193–206
Rioux JD, Daly MJ, Silverberg MS, Lindblad K, Steinhart H, Cohen Z, Delmonte T, Kocher K, Miller K, Guschwan S, Kulbokas EJ, O’Leary S, Winchester E, Dewar K, Green T, Stone V, Chow C, Cohen A, Langelier D, Lapointe G, Gaudet D, Faith J, Branco N, Bull SB, McLeod RS, Griffiths AM, Bitton A, Greenberg GR, Lander ES, Siminovitch KA, Hudson TJ (2001) Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to crohn disease. Nat Genet 29:223–228
Sebastiani P, Grau MMA, Alpargu G, Ramoni MF (2004) Robust transmission disequilibrium test for incomplete family genotypes. Genetics 168:2329–2337
Seltman H, Roeder K, Devlin B (2001) Transmission/disequilibrium test meets measured haplotype analysis: familybased association analysis guided by evolution of haplotypes. Am J Hum Genet 68:223–235
Sham PC (1997) Transmission/disequilibrium tests for multiallelic loci. Am J Hum Genet 61:774–778
Sham PC, Curtis D (1995) An extended transmission/disequilibrium test (TDT) for multiallelic marker loci. Ann Hum Genet 59:323–336
Solomon H, Stephens MA (1977) Distribution of a sum of weighted chisquared variables. J Am Stat Assoc 72:881–885
Spielman RS, Ewens WJ (1996) The tdt and other familybased tests for linkage disequilibrium and association. Am J Hum Genet 59:983–989
Spielman RS, McGinnis RE, Ewens WJ (1993) Transmission test for linkage disequilibrium: the insulin gene region and insulindependent diabetes mellitus (IDDM). Am J Hum Genet 52:506–516
Stuart A (1955) A test for homogeneity of the marginal distributions in a twoway classification. Biometrika Trust 42:412–416
Tang R, Feng T, Sha Q, Z S (2009) A variablesized slidingwindow approach for genetic association studies via principal component analysis. Ann Hum Genet 73:631–637
Todd J, Walker N, Cooper J, Smyth D, Downes K, Plagnol V, Bailey R, Nejentsev S, Field S, Payne F, Lowe C, Szeszko J, Hafler J, Zeitels L, Yang J, Vella A, Nutland S, Stevens H, Schuilenburg H, Coleman G, Maisuria M, Meadows W, Smink L, Healy B, Burren O, Lam A, Ovington N, Allen J, Adlem E, Leung H, Wallace C, Howson J, Guja C, IonescuTîrgovişte C (2007) Robust associations of four new chromosome regions from genomewide analyses of type 1 diabetes. Nat Genet 39:857–864
Wei Z, Sun W, Wang K, Hakonarson H (2009) Multiple testing in genomewide association studies via hidden Markov models. Bioinformatics 25(21):2802–2808, http://www.bioinformatics.oxfordjournals.org/cgi/content/abstract/25/21/2802, http://www.bioinformatics.oxfordjournals.org/cgi/reprint/25/21/2802.pdf
Yates F (1934) Contingency table involving small numbers and the χ^{2} test. J R Stat Soc 1:217–235
Yu K, Gu CC, Xiong C, An P, Province M (2005) Global transmission/disequilibrium tests based on haplotype sharing in multiple candidate genes. Genet Epidemiol 29:223–235
Zhang S, Sha Q, Chen H, Dong J, Jiang R (2003) Transmission/disequilibrium test based on haplotype sharing for tightly linked markers. Am J Hum Genet 73:566–579
Zhao H, Zhang S, Merikangas KR, Trixler M, Wildenauer DB, Sun F, Kidd KK (2000) Transmission/disequilibrium tests using multiple tighly linked markers. Am J Hum Genet 67:936–946
Zhao J, Boerwinkle E, Xiong M (2007) An entropybased genomewide transmission/disequilibrium test. Hum Genet 121:357–367