Common variants have long been thought to have a major role in common phenotypes; however, genome-wide association studies (GWAS) that focused on common variants have explained only a small proportion of heritability. Because rare variants are expected to have larger effect sizes than common variants, they may explain a proportion of the missing heritability [1]. As reviewed by Bansal et al [2], because of limited power for detecting the individual effects of rare variants, several approaches for identifying them have used collapsing methods (called burden tests) that assess the combined effect of multiple rare variants in a genomic region.

Because these burden tests implicitly assume that all variants influence the trait in the same direction and with the same magnitude of effect (after incorporating known weights), their power is reduced if certain variants confer increased risk while others are protective [3]. Several methods have been developed recently that are less sensitive to this mixture of rare-variant effects. Two one-sided statistics by Ionita-Laza et al [4] quantify enrichment in risk variants and protective variants, respectively. The C-alpha test by Neale et al [5] uses the distribution of genetic variation observed in cases and controls to detect the presence of a mixture, thus implicating genes as risk factors for disease. By testing the variance rather than the mean, the C-alpha test maintains consistent power when the target set contains both risk and protective variants [5]. The sequence kernel association test (SKAT) by Wu et al [6] is a score-based variance component test for association of variants in a region for continuous and discrete traits, which easily adjusts for covariates.

Performance of burden tests has been shown to depend on how rare and common variants are weighted when collapsing them in a genomic region [2, 7]. In this article, using the Genetic Analysis Workshop 18 (GAW18) sequence data, we investigated whether performance of the most commonly used nonburden test, SKAT, also depends on the weighting scheme. In particular, we evaluated 3 weighting schemes: equal weighting, Madsen-Browning weighting, and SKAT default linear weighting. Analyses were performed without knowledge of the underlying simulation model. However, we used the GAW18 answers in presenting the results below.


Sequence and phenotypic data

GAW18 provided whole genome sequencing (WGS) data and longitudinal phenotype data for related individuals of Mexican American heritage. Because SKAT can only handle unrelated individuals, we used both sequence and phenotype data on a subset of 142 unrelated individuals. Using PLINK [8], we extracted the data and created input files for SKAT. We used the first-visit measurements for diastolic blood pressure (DBP) and systolic blood pressure (SBP) for all 200 simulated data and the most likely genotype data based on sequencing (geno.csv files). For covariates of both DBP and SBP, we used age, sex, blood pressure medication, and smoking.

Annotation files

To obtain a gene list to run SKAT, we used the annotation file that was constructed based on the 1000 Genomes Project ( Of the 8,348,674 single-nucleotide polymorphisms (SNPs) in the data set, 2,652,577 were located in genes, with a total of 10,922 genes. Some of these SNPs belonged to multiple overlapping genes and were used for multiple genes in which they were contained. Because 1,856,005 SNPs in the genotype files were not found in the 1000 Genomes annotation file, not all genes were available for testing. For the SBP phenotype, we were missing 4 causal genes, and for the DBP phenotype, we were missing 3 causal genes.

Statistical analysis

We applied SKAT to the analysis of all 200 replicates of simulated DBP and SBP phenotypes for 2,652,577 SNPs that are located in genes, spanning all odd-numbered chromosomes provided by GAW18. Because our analysis was based on 142 unrelated individuals, instead of using minor allele frequency (MAF) provided by GAW18 (which is based on 959 related individuals), we computed MAF based on these 142 individuals using PLINK [8]. We also constructed a data set including low-frequency SNPs (with our computed MAF <0.05).

For a continuous trait Y, SKAT uses a linear model Y i = γ 0 + γ 1 X i + β G i with genotype values G i and covariates X i for subject i. As described by Wu et al [6], SKAT assumes that the genetic effect β j of an individual variant j follows an arbitrary distribution with mean 0 and variance w j τ, where τ is a variance component and w j is a prespecified weight for variant j. SKAT further assumes that w j follows Beta (MAF j ; a 1 ,a 2 ). Weighting with a 1 = a 2 = 1 corresponds to equally weighting all variants regardless of their MAF, which was shown to be equivalent to the C-alpha test by Neale at al [5]. Weighting with a 1 = a 2 = 0.5 is the same as the weight used by Madsen and Browning [7]. Default linear weighting by SKAT uses a 1 = 1 and a 2 = 25, which puts more weight on rare variants than Madsen-Browning (M-B) weighting, as shown in Figure 1.

Figure 1
figure 1

Weighting schemes used in sequence kernel association test analysis. MAF, minor allele frequency.

For both data sets (one with all SNPs and another with only low-frequency SNPs), we ran SKAT using these 3 weighting schemes: equal weighting, Madsen-Browning weighting, and SKAT default linear weighting. For each scenario, we obtained p-values for the 10,922 genes in 200 replications. To evaluate performance, we computed power (true-positive) and type I error (false-positive) rates at level 0.05. For each gene, power was computed by the proportion of replicates with p < 0.05 over 200 replicates. The overall power was computed by averaging these values across all causal genes. Type I error was computed by averaging these values across all null genes.


Overall empirical power and type I error

Table 1 presents type I error and overall power using 3 weighting schemes for both data sets. All 3 weighting schemes performed poorly. No causal gene met the threshold for genome-wide significance, and the lowest p-value for a true positive occurred at 1.0 × 10−5 for analyses of SBP and 1.0 × 10−4 for analyses of DBP. The type I error was slightly below 0.05, roughly keeping the level 0.05. Overall power across all causal genes was about 0.05, which was almost identical to the type I error rate. The equal weighting scheme using all SNPs performed slightly better for both DBP and SBP phenotypes.

Table 1 Empirical power and type I error at α = 0.05 for diastolic blood pressure and systolic blood pressure phenotypes

Empirical power at causal genes

Although overall power was similar across the 3 weighting schemes, power at each causal gene varied greatly and depended on the weighting scheme. When using all SNPs, results were considerably different across 3 weighting schemes, as shown in the left plot of Figure 2. In particular, the correlation between power using equal weighting and power using default weighting was 0.17 (shown in Table 2). In contrast, when using rare SNPs, results were more consistent across different weighting schemes, as shown in the right plot of Figure 2. Results using default weighting were very similar when using all SNPs and when using only rare SNPs (with correlation 0.97).

Figure 2
figure 2

Power comparison using 3 weighting schemes. (A) All single-nucleotide polymorphisms (SNPs). (B) Rare SNPs. DBP, diastolic blood pressure M-B, Madsen-Browning; SBP, systolic blood pressure.

Table 2 Spearman correlation across empirical powers at all causal genes

To understand how power depended on the weighting scheme, we selected causal genes that show power over 0.2 from at least one of weighting schemes from Figure 2; these selected causal genes are presented in Table 3. For genes TXNIP, GLUL, PDCD6IP, DGKE, and SCAP, the SKAT default weighting gave much higher power than equal weighting. Except for PDCD6IP, all causal SNPs were low frequency (with MAF <0.05). Using only low-frequency SNPs provided similar performance.

Table 3 Causal genes with empirical power over 0.2 using any weighting scheme

However, for genes MAP4, FLNB, DHX8, FGR, S100A6, and KRT23, the equal weighting scheme gave much higher power than the SKAT default weighting (0.32 vs. 0.04 at MAP4 for DBP). Looking within the top 20 causal variants, we found that MAP4 has 6 of the top 10 variants with the greatest effect on SBP and DBP. For all of these variants, one was common with a MAF of 0.38, and the other 5 were low frequency (with MAF <0.05). Similarly, gene FLT3 had 2 variants that fell within the top 20 causal variants, one common with a MAF of 0.42 and one rare with a MAF of 0.0016. No other causal gene had more than one variant in the top 20 causal variants. Because of this, we suspect that equal weighting performed better than SKAT default weighting, which gives a much higher weight to rarer variants.

We also considered whether performance would depend on the signal-to-noise ratio, defined as the number of causal variants divided by the total number of variants in the gene, for each of the top 15 genes. For DBP, MAP4 had the 10th lowest sig signal-to-noise ratio, and FLT3 had the 13th lowest. For the SBP phenotype, they were 7th and 10th, respectively. Therefore, we found that signal-to-noise ratio could not account for the relatively high performance on MAP4 and FLT3.

Discussion and conclusions

SKAT has become one of the most commonly used nonburden tests for analyzing rare variants because it is fast and because nonburden tests are shown to be more powerful when most variants are noncausal or the effects of causal variants are in different directions. Using SKAT, we were able to analyze the entire sequencing data for all 200 replications of both DBP and SBP simulated phenotypes under six different cases (3 weighting schemes times two data sets: one with all SNPs another with only low-frequency SNPs). Performance of burden tests has been shown to depend on how to weigh rare and common variants when collapsing them in a genomic region. In this article, using the GAW 18 sequence data, we found that performance of the nonburden test SKAT also depended on the weighting scheme.

In this article, we focused on the ability of SKAT to detect genes with various MAF-based weighting schemes. When causal SNPs within a gene are mostly rare, then MAF-based weighting by upweighting rare variants and downweighting common variants would be more desirable. Indeed we found that the SKAT default weighting scheme (which heavily downweights common variants) performed better for such genes. We also found that this SKAT default weighting scheme behaved similarly to other weighting schemes after eliminating all common SNPs. In contrast, we observed that an equal weighting scheme performed the best for MAP4 and FLT3, both of which included a common variant with a large effect.

However, we found that SKAT with all 3 weighting schemes performed poorly. Overall power across all causal genes was about 0.05, which was almost identical to the type I error rate. We suspect that this poor performance is partly due to a small sample size. Wu et al [6] previously demonstrated that SKAT, as with other rare variant methods, is severely hindered by small sample sizes, and this data set was no exception. Lin and Tang [9] had previously compared several rare variant methods and found that SKAT is very conservative when sample size and the level α for calculating power are small. Our results were also consistent with a study performed by Daye et al [10] that showed at a sample size of 200, SKAT performs at or around the α = 0.05 level regardless of coverage.

Many GAW18 investigators that analyzed only unrelated individuals (with a sample size of 142) have also observed poor performance. Because of high sequencing costs, small sample sizes will continue to be a problem with rare variant analysis. We observed that analysis using all related individuals (with a sample size of 847) performed much better [11]. For example, results based on a burden test using all related individuals provided an empirical power of 0.99 for MAP4. SKAT was recently extended to handle related individuals for continuous phenotypes [12]. We expect that if the extended SKAT were applied to all related individuals in GAW18 data, it would provide better performance for detecting rare variants.

To apply both burden and nonburden tests for a gene, it is necessary to know which SNPs are contained in the gene. GAW18 performed WGS of 1043 individuals of Mexican American heritage with an average 60x sequencing depth, with a goal of finding novel SNPs. However, this creates a problem for applying these rare variant approaches because available annotation information does not contain these novel variants. In particular, among 1458 causal SNPs in the Answers, only 731 SNPs were contained in the annotation file that was constructed based on the 1000 Genomes Project. In addition to a small sample size, we suspect that our results were also affected by our use of incomplete annotation information. Although the results and issues presented in this article were based on GAW18 data, they may be shared with other sequencing studies.