# Performance of statistical methods on CHARGE targeted sequencing data

## Abstract

### Background

The CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) Sequencing Project is a national, collaborative effort from 3 studies: Framingham Heart Study (FHS), Cardiovascular Health Study (CHS), and Atherosclerosis Risk in Communities (ARIC). It uses a case-cohort design, whereby a random sample of study participants is enriched with participants in extremes of traits. Although statistical methods are available to investigate the role of rare variants, few studies have evaluated their performance in a case-cohort design.

### Results

We evaluate several methods, including the sequence kernel association test (SKAT), Score-Seq, and weighted (Madsen and Browning) and unweighted burden tests. Using genotypes from the CHARGE targeted-sequencing project for FHS (n = 1096), we simulate phenotypes in a large population for 11 correlated traits and then sample individuals to mimic the CHARGE Sequencing study design. We evaluate type I error and power for 77 targeted regions.

### Conclusions

We provide some guidelines on the performance of these aggregate-based tests to detect associations with rare variants when applied to case-cohort study designs, using CHARGE targeted sequencing data. Type I error is conservative when we consider variants with minor allele frequency (MAF) < 1%. Power is generally low, although it is relatively higher for Score-Seq. A greater number of causal variants and a greater proportion of variance explained improve the power, but power tends to be lower when causal genotypes have bi-directional effects, especially for Score-Seq.

## Keywords

Case-cohort design; CHARGE targeted sequencing data; Rare variants; Type I error; Power; SKAT; Score-Seq; Madsen and Browning; Burden tests

## Background

Genome-wide association studies (GWAS) have identified hundreds of disease susceptibility loci that harbor common variants, but most of these variants are not causal and explain only a small portion of the genetic risk for most diseases. The role of rare variants with minor allele frequency (MAF) < 0.05 has not been comprehensively explored in GWAS, although rare variants are believed to play an important role in disease etiology [1]-[12]. Emerging sequencing technologies allow for the characterization of virtually all of an individual’s genetic variation. Hence, the motivations for this work are: 1) the shift in measurement of genetic variants away from common variation using genotyping arrays toward genotyping or sequencing of rare variants, which requires a greater understanding of rare variant methods; and 2) the high cost of sequencing, which requires careful consideration of efficient study designs. Here we discuss the case-cohort study design for sequencing studies and evaluate the possible limitations of current methods for data collected under this study design.

The Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Sequencing Project is a national, collaborative effort from three studies: Framingham Heart Study (FHS), Cardiovascular Health Study (CHS) and Atherosclerosis Risk in Communities (ARIC). What makes the CHARGE sequencing study different from other studies is its case-cohort design, where a cohort random sample plus selected individuals with extreme values from one or more pre-specified traits are considered for analysis. Such a study design is advantageous when investigators wish to examine multiple traits. One component of the CHARGE targeted sequencing study involves 1096 individuals from FHS, consisting of a cohort random sample of 504 study participants from the Offspring Cohort and 592 participants selected from the extremes of 11 traits.

In recent years, many statistical approaches have been developed to jointly analyze multiple rare variants in aggregate-based tests to gain power. However, current statistical methods for rare variant association studies rarely consider a case-cohort design, and hence bias in estimation and in type I error might arise in analyses of CHARGE targeted sequencing data. The methods developed to date are generally intended for studies in which participants are assumed to be independent and a random representation of the general population, such as the case-control design. Typically, all study participants are included in the analyses of a case-cohort design. For dichotomous traits, participants affected by a specific disease or trait are treated as cases, and participants with other diseases or from a random non-diseased sample are treated as controls. For quantitative traits, all participants are used in genetic association studies of a specific trait, but bias in effect estimates may arise when selected extreme values are included. Participants who are “potential risk carriers” for multiple traits make the issue more complex still. The uniquely ascertained participants in a case-cohort design with correlated traits form a non-representative dataset and may generate biases.

To address the concerns regarding the case-cohort study design and application of methods for rare variants, we evaluate type I error and power of statistical methods for aggregate-based association tests of rare variants in the case-cohort study design of the CHARGE targeted sequencing project. We examine the statistical performance of commonly used methods: the Sequence Kernel Association Test (SKAT) [13], Score-Seq [14], and weighted [15] and unweighted (T1 [16]) burden tests. These methods have been well studied using simulated data. Although Ladouceur et al. [17] used Sanger sequencing from 1998 individuals for both continuous and binary traits in their power comparison, they did not perform type I error comparisons. Our work contributes in the following aspects. (1) We evaluate the statistical performance of several statistical methods that aggregate data in a genomic region on measured CHARGE targeted sequencing data based on a case-cohort design. (2) We evaluate seventy-seven targeted sequencing regions in CHARGE, representing a wide range of genotype structures. (3) We consider complex, correlated phenotypes. (4) We evaluate both type I error and power, because power is a valid measure only if type I error is properly controlled. We aim to provide some guidelines on the performance of these methods to detect associations with rare variants on CHARGE targeted sequencing data using the case-cohort study design.

## Methods

### Data

… *t* distribution with 15 degrees of freedom. The number of traits was doubled after each iteration; half were generated by adding the previous traits to a randomly generated *t* value, and the second half were generated by adding the negative of the previous traits to a randomly generated *t* value. We generated 2^{4} = 16 traits in this manner, and we selected the first 11 traits for analysis. The correlation among the traits is given in Table 1.
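The doubling scheme described above can be sketched in a few lines. This is only an illustration: the function name `generate_traits`, the single starting trait, the population size, and the seed are assumptions, and the sketch makes no claim to reproduce the exact correlations reported for the simulated traits.

```python
import numpy as np

def generate_traits(n_individuals, n_iterations=4, df=15, seed=0):
    """Double the set of traits n_iterations times using t-distributed noise."""
    rng = np.random.default_rng(seed)
    # Assumption: start from a single trait drawn from a t distribution.
    traits = rng.standard_t(df, size=(n_individuals, 1))
    for _ in range(n_iterations):
        noise_pos = rng.standard_t(df, size=traits.shape)
        noise_neg = rng.standard_t(df, size=traits.shape)
        # First half: previous traits plus a random t value.
        # Second half: negative of the previous traits plus a random t value.
        traits = np.hstack([traits + noise_pos, -traits + noise_neg])
    return traits

all_traits = generate_traits(n_individuals=5000)
selected_traits = all_traits[:, :11]  # keep the first 11 of the 2**4 = 16 traits
print(all_traits.shape, selected_traits.shape)
```

Calling `np.corrcoef(selected_traits, rowvar=False)` on the result gives the empirical correlation matrix of the simulated traits, which can be inspected for the mix of positive and negative correlations.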

**Table 1. Correlation among 11 traits**

| Trait | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.6 | 0.2 | 0.6 | -0.2 | 0.2 | 0.6 | 0.2 | -0.6 | -0.2 | 0.2 |
| 2 | 0.6 | 1 | 0.6 | 0.2 | 0.2 | -0.2 | 0.2 | 0.6 | -0.2 | -0.6 | -0.2 |
| 3 | 0.2 | 0.6 | 1 | 0.6 | 0.6 | 0.2 | -0.2 | 0.2 | 0.2 | -0.2 | -0.6 |
| 4 | 0.6 | 0.2 | 0.6 | 1 | 0.2 | 0.6 | 0.2 | -0.2 | -0.2 | 0.2 | -0.2 |
| 5 | -0.2 | 0.2 | 0.6 | 0.2 | 1 | 0.6 | 0.2 | 0.6 | 0.6 | 0.2 | -0.2 |
| 6 | 0.2 | -0.2 | 0.2 | 0.6 | 0.6 | 1 | 0.6 | 0.2 | 0.2 | 0.6 | 0.2 |
| 7 | 0.6 | 0.2 | -0.2 | 0.2 | 0.2 | 0.6 | 1 | 0.6 | -0.2 | 0.2 | 0.6 |
| 8 | 0.2 | 0.6 | 0.2 | -0.2 | 0.6 | 0.2 | 0.6 | 1 | 0.2 | -0.2 | 0.2 |
| 9 | -0.6 | -0.2 | 0.2 | -0.2 | 0.6 | 0.2 | -0.2 | 0.2 | 1 | 0.6 | 0.2 |
| 10 | -0.2 | -0.6 | -0.2 | 0.2 | 0.2 | 0.6 | 0.2 | -0.2 | 0.6 | 1 | 0.6 |
| 11 | 0.2 | -0.2 | -0.6 | -0.2 | -0.2 | 0.2 | 0.6 | 0.2 | 0.2 | 0.6 | 1 |

We considered both positive and negative correlations among traits. There were strong positive correlations between pairs of traits such as traits 5 and 9, traits 6 and 10, and traits 7 and 11. There were also strong negative correlations between pairs of traits such as traits 1 and 9, traits 2 and 10, and traits 3 and 11. We picked some representative traits having a wide range of pairwise correlations to test the performance of the statistical methods. The selected traits included traits 1, 2, 6, 9, and 10. We focus on traits with differing correlations, positive and negative, especially with correlation 0.6 between traits 1 and 2, 0.2 between traits 1 and 6, -0.6 between traits 1 and 9, and -0.2 between traits 1 and 10.

Next, we sampled a subset of individuals from the large population, using the same sampling scheme that was used to select participants for the CHARGE targeted sequencing project. We first selected a random cohort with 504 individuals. We then sampled extremes for each of 11 traits, with participants in the extreme for one trait not eligible for selection for other traits. We chose the top 50 unselected individuals at the extremes for each of 10 traits, and then chose the top 92 to mimic one trait in FHS that had more individuals in the extreme. All individuals, regardless of selection, are analyzed using continuous traits in our case-cohort design.
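The sampling scheme in the preceding paragraph can be sketched as follows. The function name `case_cohort_sample`, the simulated population, and the choice to treat "extreme" as the largest trait values are assumptions made for illustration; the text does not specify which tail was used or the size of the simulated population.

```python
import numpy as np

def case_cohort_sample(traits, cohort_size=504, n_extreme=50,
                       n_extreme_large=92, seed=0):
    """Return indices of a random cohort plus trait-specific extremes."""
    rng = np.random.default_rng(seed)
    n, n_traits = traits.shape
    selected = np.zeros(n, dtype=bool)

    # Step 1: random cohort drawn from the whole population.
    cohort = rng.choice(n, size=cohort_size, replace=False)
    selected[cohort] = True

    # Step 2: extremes, one trait at a time, from individuals not yet selected.
    # The last trait gets the larger quota, mimicking the one FHS trait
    # that had more individuals in the extreme.
    extremes = []
    for t in range(n_traits):
        quota = n_extreme_large if t == n_traits - 1 else n_extreme
        available = np.flatnonzero(~selected)
        # Assumption: "extreme" means the largest values of the trait.
        ranked = available[np.argsort(traits[available, t])[::-1]]
        chosen = ranked[:quota]
        selected[chosen] = True
        extremes.append(chosen)
    return cohort, extremes

# Hypothetical population of 20,000 individuals with 11 traits.
population = np.random.default_rng(1).standard_normal((20000, 11))
cohort, extremes = case_cohort_sample(population)
sample = np.concatenate([cohort] + extremes)
print(len(sample))  # 504 + 10 * 50 + 92 = 1096
```

Because each trait's extremes are drawn only from individuals not yet selected, no individual appears twice in the final sample.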

… *y*_{0} under the null hypothesis. We generated phenotypes for our power evaluation using the equation

$$y_p = y_0 + \sum_{j=1}^{P} \beta_j G_j, \qquad (1)$$

where the sum $\sum_{j=1}^{P} \beta_j G_j$ is the contribution of the *P* causal SNPs that generates the additional power (coefficient *β*_{j} for SNP *G*_{j} with *j* = 1, ..., *P*) and *y*_{0} is generated under the null hypothesis. We randomly selected a portion of rare variants with MAF < 1% as causal variants, and the effect sizes for the causal variants were calculated as 0.4*|log10(MAF)|, following the approach of Wu et al. [13]. Power increases with the number of causal variants in this aggregate sum and with their effect sizes.
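As a small worked illustration of the effect-size rule, a variant with MAF = 0.01 gets an effect of 0.4 × 2 = 0.8 and a variant with MAF = 0.001 gets 0.4 × 3 = 1.2, so rarer variants receive larger effects. The sketch below adds the causal contribution to a null phenotype; the function names and the toy genotype matrix are hypothetical.

```python
import numpy as np

def causal_effect_sizes(maf):
    """Effect size 0.4 * |log10(MAF)| for each causal variant."""
    maf = np.asarray(maf, dtype=float)
    return 0.4 * np.abs(np.log10(maf))

def add_causal_effects(y0, genotypes, beta):
    """Add the summed contribution of causal variants to a null phenotype."""
    # genotypes: (n individuals) x (P causal variants), coded 0, 1, or 2
    return np.asarray(y0, dtype=float) + np.asarray(genotypes) @ beta

maf = np.array([0.01, 0.001])
beta = causal_effect_sizes(maf)

# Toy example: three individuals, two causal variants, null phenotype of zero.
genotypes = np.array([[1, 0],
                      [0, 1],
                      [1, 1]])
y_power = add_causal_effects(np.zeros(3), genotypes, beta)
print(beta, y_power)
```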

1. When the total number of variants for a region was low and less than 10, we selected all variants as causal regardless of whether they are high risk or not. We had 2 such regions.
2. When the number of high risk variants in a region was low and less than 5 and the total number of variants was between 10 and 100, we randomly selected an additional 50% of the variants as causal. We had 22 such regions.
3. When the number of high risk variants in a region was low and less than 5 but the total number was greater than 100, we selected an additional 5% of the non-high risk variants as causal.
4. When the number of high risk variants in a region was greater than 5, we chose all of them as causal.
5. We assigned causal variants to have the same direction of genetic effects for phenotypes using rules 1-4. We also assigned a second set of causal variants to have bi-directional effects on phenotypes using rules 1-4 by setting the first half to have positive effects and the second half to have negative effects on the phenotypes.

Note that rules 2-4 ensure that the number of causal variants in a region is 5 or more. However, removal of variants with a high missing rate (>10%) leaves several regions with fewer than 5 causal variants: regions 1, 9, 12, 18, and 39 have 4 causal variants each, and region 45 has 3.

### Statistical methods description

Let *G*_{ij} denote the genotype of the *j*th variant for the *i*th person, with values of 0, 1, or 2 according to the number of rare alleles for variant *j*, where *i* = 1, 2, ..., *n* and *j* = 1, 2, ..., *P*. Let *Y*_{i} denote the trait, and *Z*_{ik} the *k*th covariate for participant *i*, where *k* = 1, ..., *M*. We present methods for quantitative traits, but they can readily be extended to dichotomous traits.

1. **Unweighted burden test statistic (T1-Count)** [16]

… *j*. Association with this genotype score and the trait of interest can be evaluated using linear or logistic regression.

3. **SKAT statistics** [13]

SKAT relates the trait to the covariates and the genotypes through the linear model

$$Y_i = \alpha_0 + \sum_{k=1}^{M} \alpha_k Z_{ik} + \sum_{j=1}^{P} \beta_j G_{ij} + \epsilon_i, \qquad (3)$$

where *α*_{0}, *α*_{k} and *β*_{j} are regression parameters. SKAT is a general approach and uses weights computed from the data. For our purposes in testing the null hypothesis H_{0}: **β** = 0, SKAT takes a simple form, where **β** is the vector of all *β*_{j}s. For a given set of weights, the score test can be expressed as

$$Q = \sum_{j=1}^{P} w_j \left[ \sum_{i=1}^{n} G_{ij} \left( y_i - \hat{\mu}_i \right) \right]^2,$$

where $\hat{\mu}_i$ is the predicted mean of *y*_{i} from the model when there are no genotypes in the model. We used the Beta distribution for the weights, *w*_{j} ~ *Beta*(*a*_{1}, *a*_{2}), with the default parameters *a*_{1} = 1 and *a*_{2} = 25 as suggested by Wu et al. [13]. Association between the trait of interest and the rare variants can be evaluated using the score test, and its significance is computed analytically using a mixture of chi-square distributions.

4. **Score-Seq statistics** [14]

We can relate *Y*_{i} to *G*_{i} and *Z*_{i} using the following linear regression model,

*Y*_{i} = *τS*_{i} + *γ*^{T}*Z*_{i} + *ϵ*_{i}, where *ϵ*_{i} ~ *N*(0, *σ*^{2}).

Here *S*_{i} = *ξ*^{T}*G*_{i} is a scalar, a weighted linear combination of *G*_{i1},..., *G*_{iP} with weights *ξ*_{j}. *ξ* = (*ξ*_{1},..., *ξ*_{P})^{T} is a *P* × 1 vector with *ξ* = *β*/*τ*, *τ* is a scalar constant, *γ* is the vector of covariate coefficients, and *β* is the vector of coefficients for *G*_{i} as defined in Equation (3).

Under *H*_{0}, the test statistic *T* = *U*/*V*^{1/2} has an asymptotic standard normal distribution. Lin and Tang [14] also provide permutation-type tests for more accurate p values. We implemented permutation-type tests in this article.
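To make the simplest of these aggregate tests concrete, the following is a minimal sketch of an unweighted burden test for a quantitative trait: the rare alleles in a region are counted for each person, and the trait is regressed on that count (optionally with covariates). It is not the software used in this article. The function names are hypothetical, the genotype matrix is assumed to be already restricted to variants with MAF < 1%, and the optional `weights` argument is only a placeholder for weighted variants of the score.

```python
import numpy as np
from scipy import stats

def burden_score(G, weights=None):
    """Per-person (optionally weighted) count of rare alleles in a region."""
    G = np.asarray(G, dtype=float)
    if weights is None:
        weights = np.ones(G.shape[1])  # unweighted: plain allele count
    return G @ weights

def burden_test(y, G, Z=None, weights=None):
    """Regress a quantitative trait on the burden score; return slope, t, p."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    columns = [np.ones(n), burden_score(G, weights)]
    if Z is not None:
        columns.append(np.asarray(Z, dtype=float).reshape(n, -1))
    X = np.column_stack(columns)
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = coef[1] / np.sqrt(cov[1, 1])
    p_value = 2 * stats.t.sf(abs(t_stat), dof)  # two-sided test on the slope
    return coef[1], t_stat, p_value

# Toy data: 1000 people, 20 rare variants, trait driven by the allele count.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.005, size=(1000, 20))
y = 2.0 * G.sum(axis=1) + rng.standard_normal(1000)
slope, t_stat, p_value = burden_test(y, G)
print(slope, p_value)
```

Because all variants enter the score with the same sign, a test of this kind is expected to lose power when causal variants act in opposite directions.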

## Results and discussion

We applied the four methods to the simulated data for each targeted region using targeted sequence genotype data from 1096 FHS participants. We evaluated type I error and power with 10,000 replicates, at significance levels of *α* = 0.001, 0.01, and 0.05. We restricted our analyses to genetic variants with MAF < 1%. We evaluate the statistical performance of the four approaches under a case-cohort design to provide some guidelines for studies with case-cohort designs.

### Type I error

… *α* = 0.01. Assuming the number of rejections follows a binomial distribution, we calculated the 95% confidence interval (CI) for the type I error, and its bounds are indicated as the two ends for each region in Figure 1. The horizontal solid line indicates the nominal level of *α* = 0.01. When the nominal level is within the 95% CI of type I error, we consider type I error to be properly controlled. We use numbers to indicate regions for simplicity of presentation. The mapping from region numbers to gene names, chromosomes, and positions is given in Additional file 1: Table S1.
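The check described above can be sketched as follows. The text states only that a binomial distribution was assumed, so the normal-approximation (Wald) interval used here is an assumption, as are the function names and the example rejection counts.

```python
import math

def type1_error_ci(n_rejections, n_replicates, z=1.96):
    """Empirical type I error with a normal-approximation 95% CI."""
    p_hat = n_rejections / n_replicates
    se = math.sqrt(p_hat * (1 - p_hat) / n_replicates)
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)

def classify(n_rejections, n_replicates, alpha):
    """Compare the nominal level with the CI of the empirical type I error."""
    _, lower, upper = type1_error_ci(n_rejections, n_replicates)
    if lower > alpha:
        return "inflated"
    if upper < alpha:
        return "conservative"
    return "controlled"

# Hypothetical rejection counts out of 10,000 null replicates at alpha = 0.01.
for rejections in (70, 100, 130):
    print(rejections, type1_error_ci(rejections, 10000),
          classify(rejections, 10000, alpha=0.01))
```

With 10,000 replicates and a true rejection rate of 0.01, the standard error is about 0.001, so the interval is roughly 0.01 ± 0.002.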

From Figure 1 (a) for T1, we observe that the nominal significance level of 0.01 is within the 95% CI for most targeted regions. There are a few regions where the type I error is slightly inflated and the lower bound of the 95% CI is close to the nominal level, such as regions 26, 71 and 72. From Figure 1 (b) for MB, the type I error is also within the 95% CI for most targeted regions, with only a few regions having the lower bound of 95% CI close to the nominal level. From Figure 1 (c) for Score-Seq, the nominal level for most regions is within the 95% CI, with a few regions having a lower bound above the nominal level. From Figure 1 (d) for SKAT, no regions have type I error above the nominal level and the type I error tends to be conservative so that there are some regions with the upper bound of the 95% CI below the nominal level. As a result, SKAT has some regions with better controlled type I error than other methods such as region 24 compared to Score-Seq, although SKAT has more regions with conservative type I error. Regions 13 and 73 tend to have type I error close to or lower than the nominal level in T1, Score-Seq and SKAT. A closer look at them indicates that region 13 has 34 variants but region 73 has a large number of 257 variants with MAF < 1%. Overall, no methods have regions with 95% CIs above the nominal level except that SKAT has a few regions with the upper bound of 95% CI lower than the nominal level.

Type I error is also mostly under control when the nominal level alpha is larger at 0.05 or smaller at 0.001 (Additional file 1: Figure S1 in the Supplement). The overall control of type I error across methods is consistent with previous reports [13]-[15],[19]. Hence, although the case-cohort study design could induce biases by including extremes when using existing statistical methods, the type I error remains under control when applying these approaches to a case-cohort design.

### Power

We explored different characteristics of regions to investigate the possible explanations for the difference in power over regions. The characteristics included the total number of causal variants, the total allele count, and the proportion of the variance explained by the rare variants (R^{2}). The summary is in Additional file 1: Table S1 of the Supplement. The power for T1 and MB is consistently low across all regions regardless of these characteristics, whether the genetic effects are in the same direction or in opposite directions.

When the number of causal variants for a region was large, the power tended to be high for Score-Seq and SKAT (Additional file 1: Figure S3 (a) of the Supplement), particularly for Score-Seq, which is consistent with the observation from [14]. For example, regions 44 and 54 have 51 and 60 causal variants, and the power of Score-Seq was 0.258 and 0.360, respectively. When the number of causal variants was modest and greater than 10, the power for Score-Seq was lower, varying from 0.05 to 0.2. When the number of causal variants was low and less than 10, the power was even lower.

We also evaluated relevant measures, total allele count and total number of variants. The total allele count was the total number of the minor alleles in all causal variants in a region. If a region had a higher total allele count, the power tended to be high (Additional file 1: Figure S3 (c) of the Supplement). However, not all regions that had higher total allele count had higher power. For example, region 73 had a total allele count of 31, but the power was only 0.006 for SKAT and was 0.055 for Score-Seq when genetic effects.

… *y*_{p} are from equation (1). The calculated *R*^{2} over regions is given in Figure 4, with the overall mean *R*^{2} equal to 0.0040. Looking at Figures 3 and 4, both methods tended to have larger power when *R*^{2} is larger, particularly Score-Seq with a larger increase (Additional file 1: Figure S3 (d) in the Supplement).

We investigated the power when causal variants have bi-directionality with 50% of variants having a positive effect and 50% of variants having a negative effect. The influence of relevant measures for power is plotted in Additional file 1: Figure S3 in the Supplement, when the causal effects are both uni-directional and bi-directional. The power over 77 regions is plotted in the lower half of Figure 3. The power for SKAT was low, and did not change much compared with the scenarios having the same directionality of genetic effect. The power for Score-Seq however decreased. The other characteristics of the targeted regions including the total number of causal variants and the total allele count did not change the power for SKAT much, compared to the power evaluated when all rare alleles shared the same directionality. This result is expected, as the power for SKAT is closely related to R^{2} and is robust to directionality (Figure 4). Further observation indicates that the power for alpha = 0.01 tended to be lower (Additional file 1: Figure S4 in the Supplement). Other factors that may influence the power include sample size and weighting schemes/distributions of effects [14],[17].

## Conclusion

Many statistical methods have been developed in recent years to evaluate the risk conferred by rare variants in human complex diseases. However, no statistical method has considered the case-cohort design. We evaluated several approaches to assess the association between groups of rare variants and a complex quantitative trait, because most FHS traits in the targeted sequencing project were quantitative. We aimed to evaluate several representative statistical methods rather than to conduct a comprehensive evaluation of all existing methods, and we compared their performance on our CHARGE targeted sequencing data. Our work contributes in that we are the first to evaluate both type I error and power using a case-cohort design of observed targeted CHARGE sequencing data. Seventy-seven targeted regions represent a wide range of genotypic characteristics from real sequencing data, and we evaluated the performance using correlated complex phenotypes by mimicking the sampling scheme used in the CHARGE targeted sequencing project.

Type I error in the case-cohort design of CHARGE targeted sequencing data was mostly under control, although type I error for SKAT tended to be conservative. We tested the type I error over correlated traits using SKAT. Most regions for most traits had appropriate type I error. These results suggest that correlated complex phenotypes in a case-cohort design may influence the behavior of type I error, but not substantially. Power was generally low in our studies, whether the effects of causal variants had the same directionality or bi-directionality, consistent with observations in previous studies [13],[14],[17]. As SKAT tended to have somewhat conservative type I error, power should be evaluated carefully, considering the type I error for each method in a particular region.

We examined different characteristics of targeted regions to explore possible explanations for the difference in power across regions in the case-cohort design based on CHARGE targeted sequencing data. The characteristics that we examined included the total number of causal variants, the total allele count, the proportion of variance explained by the causal variants (R^{2}), the significance level and directionality. Power for Score-Seq tended to be higher, when the total number of causal variants and the total allele count were larger. Score-Seq tended to have higher power when R^{2} is larger. Bi-directionality does not seem to influence power much for SKAT, but lowers the power for Score-Seq. Other characteristics were also investigated in prior reports to determine the influence on power of characteristics such as the ratio of causal variants to total number of variants, effect sizes and sample sizes [13],[14],[17]. The proportion of the total number of causal variants among the total number of variants in a region is often used when the regions have a similar number of variants [13]. Our studies used effect sizes of the form 0.4*|log10(MAF)| [13] to generate larger effect sizes for rarer variants. Our Targeted Sequencing study in FHS had a fixed sample size of 1096, and hence we did not examine the influence of varying sample sizes. Although our results for power are limited to this sample size, we expect that comparisons across methods will be similar. Our work could also be extended to exome sequencing by applying tests to all variants within the exome of a gene, or within subsets of exons of a gene. Future improvements for statistical methods and a better understanding of the underlying genetic structure may aid in evaluating rare variant association studies.

### Availability of supporting data

The data set supporting the results of this article is available in the dbGaP repository, http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000651.v3.p8.

## Notes

### Acknowledgements

Chuanhua Xing’s work is supported by 5 RC2 HL102419-02 (Boerwinkle), NIH/NHLBI.

## References

1. Xing C, Satten GA, Allen AS: A weighted accumulation test for associating rare genetic variation with quantitative phenotypes. BMC Proc. 2011, 5 (Suppl 9): S6. 10.1186/1753-6561-5-S9-S6.
2. Hoffmann TJ, Marini NJ, Witte JS: Comprehensive approach to analyzing rare genetic variants. PLoS One. 2010, 5 (11): e13584. 10.1371/journal.pone.0013584.
3. Ahituv N, Kavaslar N, Schackwitz W, Ustaszewska A, Martin J, Hebert S, Doelle H, Ersoy B, Kryukov G, Schmidt S, Yosef N, Ruppin E, Sharan R, Vaisse C, Sunyaev S, Dent R, Cohen J, McPherson R, Pennacchio L: Medical sequencing at the extremes of human body mass. Am J Hum Genet. 2007, 80: 779-791. 10.1086/513471.
4. Azzopardi D, Dallosso AR, Eliason K, Hendrickson BC, Jones N, Rawstorne E, Colley J, Moskvina V, Frye C, Sampson JR: Multiple rare nonsynonymous variants in the adenomatous polyposis coli gene predispose to colorectal adenomas. Cancer Res. 2008, 68: 358-363. 10.1158/0008-5472.CAN-07-5733.
5. Brunham LR, Singaraja RR, Hayden MR: Variations on a gene: Rare and common variants in ABCA1 and their impact on HDL cholesterol levels and atherosclerosis. Annu Rev Nutr. 2006, 26: 105-129. 10.1146/annurev.nutr.26.061505.111214.
6. Cohen JC, Kiss RS, Pertsemlidis A, Marcel YL, McPherson R, Hobbs HH: Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science. 2004, 305: 869-872. 10.1126/science.1099870.
7. Cohen JC, Pertsemlidis A, Fahmi S, Esmail S, Vega GL, Grundy SM, Hobbs HH: Multiple rare variants in NPC1L1 associated with reduced sterol absorption and plasma low-density lipoprotein levels. Proc Natl Acad Sci U S A. 2006, 103: 1810-1815. 10.1073/pnas.0508483103.
8. Ji W, Foo JN, O’Roak BJ, Zhao H, Larson MG, Simon DB, Newton-Cheh C, State MW, Levy D, Lifton RP: Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat Genet. 2008, 40: 592-599. 10.1038/ng.118.
9. Nejentsev S, Walker N, Riches D, Egholm M, Todd JA: Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science. 2009, 324: 387-389. 10.1126/science.1167728.
10. Romeo S, Yin W, Kozlitina J, Pennacchio LA, Boerwinkle E, Hobbs HH, Cohen JC: Rare loss-of-function mutations in ANGPTL family members contribute to plasma triglyceride levels in humans. J Clin Invest. 2009, 119: 70-79.
11. Slatter TL, Jones GT, Williams MJ, van Rij AM, McCormick SP: Novel rare mutations and promoter haplotypes in ABCA1 contribute to low-HDL-C levels. Clin Genet. 2008, 73: 179-184. 10.1111/j.1399-0004.2007.00940.x.
12. Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, Cooper GM, Nord AS, Kusenda M, Malhotra D, Bhandari A: Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science. 2008, 320: 539-543. 10.1126/science.1155174.
13. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X: Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011, 89 (1): 82-93. 10.1016/j.ajhg.2011.05.029.
14. Lin DY, Tang ZZ: A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet. 2011, 89 (3): 354-367. 10.1016/j.ajhg.2011.07.015.
15. Madsen B, Browning S: A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009, 5 (2): e1000384. 10.1371/journal.pgen.1000384.
16. Morris A, Zeggini E: An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genet Epidemiol. 2010, 34 (2): 188-193. 10.1002/gepi.20450.
17. Ladouceur M, Dastani Z, Aulchenko YS, Greenwood CMT, Richards JB: The empirical power of rare variant association methods: results from Sanger sequencing in 1,998 individuals. PLoS Genet. 2012, 8 (2): e1002496. 10.1371/journal.pgen.1002496.
18. Wang K, Li M, Hakonarson H: ANNOVAR: Functional annotation of genetic variants from next-generation sequencing data. Nucleic Acids Res. 2010, 38: e164. 10.1093/nar/gkq603.
19. Basu S, Pan W: Comparison of statistical tests for disease association with rare variants. Genet Epidemiol. 2011, 35 (7): 606-619. 10.1002/gepi.20609.

## Copyright information

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.