Review

Two general approaches have been widely used to study the genetics of asthma: genome-wide linkage studies followed by positional cloning, and candidate gene association studies. The results of linkage studies for asthma have been described in detail elsewhere [1–3]; this review focuses on the published candidate gene association studies for asthma.

Candidate gene approach: basic principles and potential problems

"Candidate genes" are selected because their biological function suggests that they could play a role in the pathophysiology of asthma (such as genes encoding cytokines and their receptors, chemokines and their receptors, transcription factors, IgE receptor, etc.). Association studies between variation in these candidate genes and asthma-related phenotypes are mostly conducted in unrelated case and unrelated control samples by comparing allele or genotype frequencies between samples. Association studies with candidate genes are appealing because they are hypothesis-driven and can identify genetic variation that has relatively modest effects on susceptibility [4]. Compared with linkage analysis, case-control studies are much simpler to perform and less costly because they do not require the collection of families. However, the interpretation of association studies is not always straightforward (for example, see ref. [5]). In particular, there are a large number of negative association studies with candidate genes that are never reported. Because the reported p-values are rarely adjusted for the total number of studies performed (both reported and unreported), the type I error rate in the reported studies is actually higher than the nominal level.

A statistically significant association between a variant in a candidate gene and a disease phenotype can have three possible explanations. (1) The marker allele truly affects gene function by altering the amino-acid sequence or by modifying splicing, transcriptional properties, or mRNA stability, and thereby directly affects disease risk. (2) The marker allele is in linkage disequilibrium (LD) with the true disease-causing variant. LD, or allelic association, is the nonrandom association of alleles at linked loci in populations, and will usually only be detected over small distances (approximately ≤ 60 kb) [6, 7], although LD over longer distances has been observed. Thus, the marker allele must be located in relatively close proximity to the disease-causing variant. (3) The association is a false-positive result (type I error). Using a p-value of 0.05 as the threshold for significance will result in a 5% type I error rate. However, in most cases p-values are calculated using large-sample approximations. As a result, the probability of type I errors for many of these approximations is higher than the nominal level when the sample size is small, as it is in most published studies. Further, false-positive results are more likely if multiple comparisons are made, either with multiple polymorphisms in the same gene, with polymorphisms in multiple genes, or with multiple phenotypes. In these cases, the 5% false-positive rate expected when the null hypothesis (of no association) is rejected at p < 0.05 no longer applies because there is a 5% type I error rate expected with each independent comparison (for example, with polymorphisms in different genes). A common correction for multiple comparisons is to multiply the p-values by the number of comparisons (known as a Bonferroni correction). Therefore, in a study of 10 variants, one would need to obtain an uncorrected p-value of 0.005 to have the equivalent of a 5% type I error rate.
However, the Bonferroni correction can be overly conservative if the multiple tests correspond to correlated variables. For example, phenotypes are often correlated (e.g., asthma and IgE levels) as are the genotypes of single nucleotide polymorphisms (SNPs) that are in LD. Thus, these comparisons do not represent independent tests and the Bonferroni correction can be too stringent in these circumstances. In fact, the Bonferroni correction can be conservative even for independent tests [8]. Because there is no simple correction for multiple correlated comparisons, alternative methods to control the error rates are used. Permutation tests are useful because they preserve the correlation structure of the data and provide accurate p-values. They can be used to control the probability of committing any type I error, and this can lead to stringent thresholds in studies with a large number of candidate genes. An alternative approach is to control the False Discovery Rate (FDR) [9], which is the proportion of false positives in the set of rejected hypotheses. Because the FDR is a more liberal criterion than the probability of committing any type I error, procedures that control it are more powerful.
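
The difference between the Bonferroni correction and FDR control can be illustrated with a short sketch. The code below is only an illustration under stated assumptions: the function names are hypothetical, the ten p-values are invented for the example and do not come from any study, and the FDR procedure shown is the Benjamini-Hochberg step-up method, which is one common way to control the FDR and is assumed here rather than taken from ref. [9].

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def familywise_error(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


# Hypothetical p-values for 10 variants (illustrative only).
p_values = [0.001, 0.004, 0.012, 0.03, 0.04, 0.2, 0.35, 0.5, 0.7, 0.9]
alpha = 0.05

bonf_hits = [p for p, adj in zip(p_values, bonferroni(p_values)) if adj <= alpha]
bh_hits = [p for p, adj in zip(p_values, benjamini_hochberg(p_values)) if adj <= alpha]

print("Bonferroni keeps:", bonf_hits)
print("Benjamini-Hochberg keeps:", bh_hits)
print("Uncorrected family-wise error, 10 tests:", round(familywise_error(alpha, 10), 3))
```

With these invented values, the Bonferroni correction retains two variants and the Benjamini-Hochberg procedure retains three, which is the sense in which FDR control is more powerful. Neither function accounts for correlation among tests; a permutation test would be needed for that.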

Type I errors can also result from genotyping errors, particularly if there is a systematic error such as overcalling one genotype over another. This is particularly worrisome in case-control studies in unrelated individuals because Mendelian error checks cannot be performed as they can for family studies. One way to minimize this is to make sure that the genotypes in the cases and controls are in Hardy-Weinberg proportions. Systematic errors in genotyping will often yield genotype frequencies that are not in Hardy-Weinberg proportions. In fact, there are a surprisingly large number of published associations that either did not check for Hardy-Weinberg equilibrium or presented data that were not in Hardy-Weinberg proportions [10]. When published association studies of a variety of diseases were re-examined, 12% of the 133 SNPs reported were not in Hardy-Weinberg equilibrium in the controls, suggesting genotyping error. Further, the proportion of SNPs that deviated from equilibrium was higher among the SNPs for which a positive association was reported [10]. Some of these markers were not identified by the authors as showing deviations from equilibrium. One explanation for this could be that they used an incorrect test of significance, which may not be uncommon (e.g. [11, 12]). In particular, for a biallelic marker with three genotypes (such as for SNPs), the significance testing for Hardy-Weinberg equilibrium is based on a 1-degree-of-freedom test (three genotype classes, minus one, minus one for the allele frequency estimated from the data). On the other hand, markers showing departure from Hardy-Weinberg equilibrium should not be automatically discarded because deviation from Hardy-Weinberg equilibrium among cases is expected for variants close to a susceptibility locus under many genetic models [13–15]. Nevertheless, markers that show deviations from expectations should be closely scrutinized and retyped to ensure that they are genotyped correctly.
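
As an illustration of the 1-degree-of-freedom test for a biallelic marker, the following sketch computes the Pearson chi-square statistic for departure from Hardy-Weinberg proportions. It is a minimal example under stated assumptions: the function name and the genotype counts are hypothetical, and the p-value relies on the large-sample chi-square approximation, so an exact test may be preferable when genotype counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg proportions at a biallelic marker.

    Returns (chi2, p_value). Uses the large-sample approximation.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A, estimated from the data
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("marker is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control sample with a strong heterozygote deficit.
chi2, p_value = hwe_chi_square(400, 200, 400)
print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")
```

Evaluating the same statistic against a chi-square distribution with 2 degrees of freedom would overstate the p-value and could hide a real deviation, which is one way an incorrect test of significance can arise.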
Lastly, type I errors can result from population substructure, or sampling cases and controls that differ with respect to ethnic background. Because allele frequencies vary among ethnic groups, great care must be taken to ensure that case and control subjects have similar ethnic compositions. If they differ, an allele may be significantly more frequent in the cases compared with controls due to differences in ethnicity, but this may be misinterpreted as an association with the disease. Methods are now available to directly test for stratification and to correct for any imbalances [16, 17], but these require genotyping the case and control samples for 30 or more informative loci (i.e., loci that discriminate between pairs of racial or ethnic groups [17]). Another approach to address the problem of population admixture is to conduct family-based association studies with analytical methods that are robust to population admixture, such as the transmission disequilibrium test (TDT) [18]. These methods have become increasingly popular in genetic studies of complex disorders, although they are uniformly less powerful than studying an equivalent number of unrelated cases and controls.
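
The TDT statistic in its basic biallelic form can be sketched in a few lines. This is an illustration only: the function name and the transmission counts are hypothetical, and the sketch assumes the standard McNemar-type form of the test, in which only heterozygous parents contribute and the statistic is compared with a chi-square distribution with 1 degree of freedom.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic TDT for one allele at a biallelic marker.

    transmitted:   times heterozygous parents passed the allele to an affected child
    untransmitted: times heterozygous parents passed the other allele instead
    Returns (chi2, p_value) using the 1-df chi-square approximation.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts from 100 informative heterozygous parents.
chi2, p_value = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Because each comparison is made within a parent, between the allele that was transmitted and the one that was not, differences in allele frequency between subpopulations do not bias the statistic.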

Not all associations that are not replicated are false-positive results. An association may not be replicated because of different patterns of LD in different populations. Differences in LD patterns can be caused by differences in allele frequencies and/or the presence of more than one causal variant. Although there are no examples of this, it remains a theoretical possibility. This can be addressed by examining haplotypes instead of single SNPs. Many studies have now shown that examining multiple SNPs as haplotypes is often preferable to single SNP analysis [19, 20]. A haplotype is composed of alleles at different loci that are inherited together on the same chromosome. Thus, even if the disease-causing variant itself is not identified, a shared haplotype that contains the disease variant will be more common in cases than in controls. This could also help to identify the true susceptibility variant. On the other hand, an association may not be replicated because the phenotype is defined differently between studies. For example, the phenotype "atopy" has been defined as a positive skin prick test (SPT), a positive RAST test, high total serum IgE, or a combination of these tests. Although these phenotypes are clearly related, it is likely that some genes that influence total IgE levels do not influence specific IgE response to allergens, and vice versa.

Lastly, positive associations may not be replicated because the true model of genetic susceptibility for diseases such as asthma and atopy is complex. It is most likely that any particular susceptibility variant has a relatively minor effect on the phenotype and that the magnitude of its effect will be influenced by genes at other loci (gene-gene interactions) [21, 22] and by environmental factors (gene-environment interactions) [23–25]. In fact, some variants may only confer susceptibility in combination with other genes (epistasis) or in certain environments. Because background genes and environmental factors differ between populations, it would not be surprising if associations with single SNPs or haplotypes differed between populations.

Review of the association study literature

We searched the public databases for published candidate gene association studies of asthma and related phenotypes, using keywords "association" or "case-control" together with each of the following: "asthma", "bronchial hyperresponsiveness", "BHR", "atopy", "SPT", "atopic dermatitis", "IgE", and "drug response". We identified 199 studies with at least one significant association reported. These studies identified 64 genes as potential susceptibility loci. We then searched for all other association studies with variants in these 64 genes (Table 1 [see Additional file 1]). For this analysis, we considered an initial association replicated if at least one other study found an association with variation in the same gene, but not necessarily with the same variant. Using these criteria, 33 gene associations were replicated and 9 associations were not replicated either in a second study or in a second sample in the same study. Additionally, 22 genetic associations were only reported in a single study, i.e. replication studies have not been published, even though it is likely that at least some have been performed. In this regard, the establishment of a database of non-significant candidate gene studies would be helpful for prioritizing genes for genetic studies and would therefore benefit the asthma genetics community overall. Nonetheless, the results of this survey underline the potential for a great amount of heterogeneity underlying asthma. However, many of these studies have been conducted in small samples (<100 cases and controls), few correct for multiple testing, and in many cases, replication studies were performed in different ethnic groups than those studied in the original report. Lastly, all markers were reported to be in Hardy-Weinberg equilibrium in only 34.8% of studies, at least one marker was reported to deviate from equilibrium in 2.2% of studies, and in the majority of studies (63%) testing for Hardy-Weinberg proportions was not mentioned at all.

Regardless of these caveats, some genes stand out because they were associated with asthma phenotypes rather consistently across studies and populations. In particular, variation in eight genes has been associated with asthma phenotypes in five or more studies: interleukin-4 (IL4), interleukin-13 (IL13), β2 adrenergic receptor (ADRB2), human leukocyte antigen DRB1 (HLA-DRB1), tumor necrosis factor (TNF), lymphotoxin-alpha (LTA), high-affinity IgE receptor (FCER1B) and IL-4 receptor (IL4RA). These loci likely represent true asthma or atopy susceptibility loci or genes important for disease modification. An example of the latter is the ADRB2 gene, which has been more consistently associated with asthma severity than with asthma or atopy per se.

Conclusions

Variants in 64 genes have been associated with asthma or atopy phenotypes in at least one study, although many of these studies are methodologically limited and need replication. In the future, association studies incorporating gene-gene and gene-environment interactions may help to disentangle some of the complexities of these diseases and explain some of the discrepant results. Lastly, the development of guidelines for establishing appropriate thresholds for significance in association studies would impose more rigorous standards on candidate gene studies, similar to what is now standard for linkage studies [26], and the creation of a database for unpublished association studies would be helpful for evaluating the overall evidence for association of asthma or atopy candidate genes.