Background

A possible genetic predisposition to tuberculosis has been suggested in several studies,[13] but the functional immunologic correlate of the genetic polymorphisms identified is often unclear[4]. We have sought to first identify immunologic defects that may predispose to tuberculosis, then assess for genetic polymorphisms associated with these immunologic defects. Because there is evidence that extrapulmonary tuberculosis is the result of an underlying immune defect, [5] we have focused our search on persons with prior extrapulmonary disease. In previous studies we noted that HIV-seronegative adults with prior extrapulmonary tuberculosis had lower levels of CD4+ lymphocytes, unstimulated cytokine production, and tumor necrosis factor (TNF)-α production in response to lipopolysaccharide (LPS) or LPS + interferon (IFN)-γ than persons with prior pulmonary tuberculosis or latent M. tuberculosis infection[6, 7].

In the current study we have combined two previous study populations in which cytokine responses were well-characterized, [6, 7] to identify genetic polymorphisms associated with extrapulmonary tuberculosis, and by extension, the associated immunologic abnormalities. We make use of a candidate gene approach utilizing single nucleotide polymorphisms (SNPs) and microsatellites previously reported to be associated with tuberculosis, and we also include SNPs in candidate genes that are hypothesized to play a role in tuberculosis pathogenesis.

Because several loci could contribute to the phenotype expressed in complex diseases such as tuberculosis, it is important to search for gene-gene interactions (epistasis), which may give a more accurate prediction of disease risk. Such interactions are difficult to detect using traditional statistical methods such as single locus association tests because those tests were not developed to detect purely interactive effects. Such tests identify genes with main effects and then follow-up analyses assess for interactions between genes that exhibit a main effect. New statistical and computational methods that have better power to detect interactions, including those without main effects, in relatively small sample sizes, are required. Multifactor dimensionality reduction (MDR) analysis is a novel method developed to address this need[8]. In the current study both single locus association tests and MDR analysis were used to detect potential single or multi-locus interactions that predict extrapulmonary tuberculosis.

Results

Clinical and Demographic Characteristics

There were 24 extrapulmonary tuberculosis cases (18 black), 24 pulmonary tuberculosis controls (19 black), and 57 controls (49 black) with latent M. tuberculosis infection. The clinical and demographic characteristics of the study population are in Table 1. There were no significant differences between cases and controls according to age, sex, or race. However, pulmonary controls had lower median body mass index (BMI) at the time of study than either extrapulmonary tuberculosis cases or controls with latent M. tuberculosis infection. Extrapulmonary tuberculosis cases had lower median CD4+ lymphocytes at time of study than controls. The sites of disease among extrapulmonary cases included lymphatic (n = 9), pleural (n = 4), bone/joint (n = 4), pericardial (n = 2), genito-urinary (n = 2), miliary/meningeal (n = 1), gastrointestinal (n = 1), and laryngeal (n = 1).

Table 1 Clinical and Demographic Characteristics of the Study Population

SNPs Associated with Tuberculosis in Previous Studies

Of the 27 SNPs and 3 microsatellites previously reported in the literature as being associated with tuberculosis (Table 2), 25 SNPs and all 3 microsatellites were assessed by PCR-based SNP genotyping. The IL-1β -511 (C→T) and vitamin D receptor (VDR) Bsm1 SNPs were on the GeneChip® Human Mapping 50K Xba Array. Of the 30 polymorphisms tested, 5 SNPs (rs 17235416, rs1800450, rs5743708, rs6265786, rs2393799) were unable to be genotyped after quality control and were therefore excluded. Only markers with < 10% missing data were considered for analysis. No imputation of missing values was performed. For univariate analyses, missing values were removed for analysis; for the MDR analysis, missing observations were ignored, such that attribute construction and accuracy calculations were only performed based on observed values.

Table 2 Polymorphisms Tested that had Previously Been Reported as Associated with Tuberculosis

In single locus association testing of each polymorphism among all study participants, the IL-1β +3953 (C→T) SNP was associated with any tuberculosis (P = 0.044) and extrapulmonary tuberculosis (P = 0.049) compared to PPD+ controls (Table 3). Among black patients, the Fok1 SNP in the VDR gene was associated with extrapulmonary tuberculosis compared to pulmonary tuberculosis controls (P = 0.018).

Table 3 Of the Polymorphisms Tested that had Previously Been Reported as Associated with Tuberculosis (Table 2), the SNPs Associated with Tuberculosis in this Study

In MDR analysis of the 25 polymorphisms among all study participants, the microsatellite in TLR2 (GT) had 61% prediction accuracy for any tuberculosis compared to controls with M. tuberculosis infection (P = 0.038) (Table 3). It also had 63% prediction accuracy for any tuberculosis in blacks (P = 0.047) and 76% prediction accuracy for extrapulmonary tuberculosis in blacks (P = 0.002). Results did not change when the analysis included CD4+ lymphocytes and BMI as potential covariates (results not shown).

SNPs in Candidate Genes

There were 661 SNPs in genes presumed to be associated with tuberculosis pathogenesis (Table 4), of which 613 (93%) had genotyping efficiency > 90% (i.e., < 10% missing data) and were included in the analysis. In the MDR analysis of these 613 SNPs, an intronic SNP in TNF-α (C→T) (rs1811063) predicted tuberculosis with 62% accuracy among all patients (P = 0.05; Table 5). The combination of two SNPs in TLR4 (C→T) (rs1399431) and TNF-α (C→T) (rs7791836) predicted tuberculosis risk with 71% accuracy in blacks (P = 0.02). Results did not change when the analysis included CD4+ lymphocytes and BMI as potential covariates (results not shown). There were no SNPs associated with extrapulmonary tuberculosis compared to controls.

Table 4 Polymorphisms in the Affymetrix GeneChip® Human Mapping 50K Xba Array in Genes Hypothesized to be Associated with Tuberculosis Pathogenesis
Table 5 Of the SNPs Listed in Table 4, the SNPs Associated with Tuberculosis in This Study Population

Discussion

In this study population we first identified immunologic defects among persons with extrapulmonary tuberculosis, then explored possible associations with candidate genetic variants. In single locus association tests among genetic variants previously associated with tuberculosis susceptibility, IL-1β +3953 was associated with extrapulmonary disease, as well as all forms of tuberculosis, compared to persons with M. tuberculosis infection. Among black patients, the Fok1 SNP in the VDR gene distinguished extrapulmonary from pulmonary disease. The MDR analysis provides important additional results, however, because it is more powerful than traditional logistic regression analysis,[9] adjusts for multiple comparisons via permutation testing, and because the results were more consistent across the groups assessed. MDR also evaluates potential gene-gene and gene-environment interactions, which are important to investigate in complex phenotypes such as tuberculosis pathogenesis. The TLR2 microsatellite predicted extrapulmonary tuberculosis in black patients with 76% accuracy. It also predicted all forms of tuberculosis in blacks, as well as in the full study population. The prediction accuracy estimates the importance of variables in the dataset, with the intent of generalizing the model to the full population.

These results point to the importance of IL-1β +3953, VDR Fok1 A/G, and the TLR2 microsatellite in extrapulmonary tuberculosis risk, with the latter two being particularly significant in blacks, even in such a small sample size. This is important in light of the increased proportion of extrapulmonary tuberculosis in blacks[5]. Further studies are needed to determine if these polymorphisms could account for this epidemiological finding. This study included very few individuals of other racial backgrounds, making comparisons of genetic models between racial groups impossible. Our results are consistent with the previously noted association between IL-1β +3953 and tuberculous pleurisy, and VDR Fok1 and several forms of extrapulmonary tuberculosis--both noted among in Gujarati Asians living in London[10, 11]. They are also consistent with the recently identified link between TLR and the innate immune response to M. tuberculosis: the relationship between TLR signaling, the up-regulation of the VDR, and vitamin D-mediated killing of intracellular M. tuberculosis via the microbial peptide cathelicidin[12]. In that study, blacks had low 25-hydroxyvitamin D levels and low cathelicidin messenger RNA induction. While immediate biological interpretation of our results is not possible, defects in M. tuberculosis recognition and/or subsequent intracellular signaling in the nuclear factor kappa B (NF-κB) pathway related to TLR2 polymorphisms would be consistent with the subtle innate immune defects in the extrapulmonary tuberculosis patients in this study population: low unstimulated cytokine production, decreased TNF-α production in response to LPS or LPS + IFN-γ, and low CD4+ lymphocytes[6, 7, 4].

The results from SNPs in genes hypothesized to play a role in tuberculosis pathogenesis showed that no SNPs were significantly associated with extrapulmonary tuberculosis. There was, however, a SNP in TNF-α that was associated with tuberculosis among all study participants. And among blacks, a combination of SNPs in TNF-α and TLR4 predicted tuberculosis risk with 71% accuracy. These SNPs cannot be interpreted as easily in light of the immunologic findings, which pertain to extrapulmonary tuberculosis. However, they suggest that similar cytokine pathways as those noted above are important in tuberculosis pathogenesis.

It was notable that persons with prior extrapulmonary tuberculosis or latent M. tuberculosis infection had significantly higher body mass index (BMI) than persons with pulmonary tuberculosis. If BMI after completion of treatment is comparable to BMI prior to development of disease, it would suggest that lower BMI may predispose to pulmonary tuberculosis. It should be noted that the median BMI in persons with pulmonary disease was within the normal range, whereas BMI in persons in the other two groups was high. A recent large population-based study in Hong Kong found that as BMI increased, the risk of pulmonary tuberculosis decreased, but the risk of extrapulmonary tuberculosis did not[13]. The protective effect of increased BMI on pulmonary disease in that study persisted even after controlling for confounding variables, including smoking and diabetes mellitus. The above findings suggest that risk factors for extrapulmonary tuberculosis are not ameliorated by increased BMI--which would be consistent with an immunogenetic predisposition to extrapulmonary tuberculosis. Factors related to the M. tuberculosis strain also play a role in extrapulmonary disease[14].

The most important limitation of the current study was the small sample size. This was driven by the intensive immunologic testing performed on specimens from each participant. Statistically significant genetic associations were indeed identified, but the sample was under-powered (with either traditional methods or MDR analysis) to detect other potential associations. Additionally, even for the positive associations detected, reliable estimates of effect size cannot be determined, due to the "winner's curse" related to the small sample size[15]. Table 6 shows the detectable effect sizes with the current sample size for the variants previously associated with tuberculosis. These calculations assume 80% power and an uncorrected alpha of 0.05%. Larger studies are needed in which candidate gene polymorphisms and associated cytokine pathways are fully evaluated in all patients, so that any genetic polymorphisms identified can be interpreted in the context of the cytokine abnormalities noted. Given this, one should consider the results of this study--particularly those from the candidate gene approach and interaction analysis--as hypothesis-generating rather than hypothesis-confirming. While MDR has greater statistical power than traditional methods, its power to detect gene-gene and gene-environment interactions is less than that to detect main effects. In searching for interactive effects, the small sample size did not allow for investigating interactions with more than two loci.

Table 6 Odds Ratios that Could Have Been Detected Given the Sample Size and Minor Allele Frequency in the Study Population

There were other limitations of this study. First, of the SNPs tested that had previously been associated with tuberculosis, most pertained to pulmonary rather than extrapulmonary disease. Due to possible differences in the pathophysiology of these two disease manifestations, [7] one might expect that not all of the SNPs tested would be related to the pathogenesis of extrapulmonary tuberculosis. Second, not all genes that are presumably associated with tuberculosis pathogenesis had SNPs included in the GeneChip® Human Mapping 50K Xba Array, so not all such genes could be assessed. Third, we were unable to directly incorporate cytokine response data into the genetic analyses because the methodology used to quantify cytokine responses was not the same in the two immunologic studies.

Additionally, concerns with multiple testing arise when screening such a large number of genetic variants. The results of the first approach are presented without any corrections for multiple testing. Because these variants have been associated with tuberculosis before, each test represents an independent statistical hypothesis. By using MDR and permutation in the second approach, P values were empirically derived based on the total number of tests at each stage.

Conclusions

The results of this study suggest that an evaluation of the underlying mechanism(s) of the genetic variants in IL-1β +3953, VDR Fok1 A/G, and the TLR2 microsatellite--and their role in the pathogenesis of extrapulmonary tuberculosis--is warranted. Comprehensive immunogenetic studies will contribute to our understanding of tuberculosis pathogenesis, and may allow us to identify persons at highest risk of developing tuberculosis.

Methods

Study participants

The study population was pooled from two immunologic studies that have been described previously; the inclusion criteria for both studies were similar and are described in detail elsewhere[6, 7]. Briefly, patients were identified through the Baltimore City Health Department Eastern Chest Clinic and Nashville Metropolitan Health Department Tuberculosis Clinic. In this case-control study, eligibility criteria for case patients included a history of treated culture-confirmed extrapulmonary tuberculosis, age ≥ 18 years old, and HIV-seronegative status. Extrapulmonary disease was defined as any site outside of the pulmonary parenchyma. Exclusion criteria included serum creatinine >2 mg/dL, use of corticosteroids or other immunosuppressive agents at the time of diagnosis or time of study entry, malignancy, or diabetes mellitus. The criteria for pulmonary tuberculosis control patients included HIV-seronegative adults ≥ 18 years old who had completed treatment for culture-confirmed pulmonary tuberculosis, and had no evidence of extrapulmonary tuberculosis. Positive cultures of sputum, bronchoalveolar lavage, or pulmonary parenchyma were required. Controls with latent M. tuberculosis infection were ≥ 18 years old, HIV-seronegative, and had a positive tuberculin skin test (defined as ≥ 10 mm induration after intradermal placement of 5 tuberculin units of PPD) without evidence of active tuberculosis. Participants in this control group were U.S.-born (and therefore not vaccinated with BCG), and were mostly close contacts of tuberculosis cases. Exclusion criteria for both control groups were the same as for the case group. Controls were drawn from the same two clinic populations as cases, and were not related to the cases. Extrapulmonary tuberculosis cases and pulmonary controls completed treatment prior to study entry.

This study was approved by the institutional review boards of the Johns Hopkins Hospital, the Baltimore City Health Department, the National Institutes of Health, Vanderbilt University Medical Center, and the Nashville Metropolitan Health Department. All study participants provided written informed consent.

Sample Collection

CD4+ lymphocytes were quantified by flow cytometry. DNA was extracted from blood samples using the Puregene DNA Isolation kit (Gentra Systems, Minneapolis, MN) following the manufacturer's protocol. Genomic DNA was stored at -70°C until genotyping. Laboratory personnel were blinded to the case-control status of the specimens.

SNP Genotyping

As of November 15, 2006, 27 SNPs were identified in the literature as being associated with tuberculosis (Table 2). SNPs not included in the GeneChip® Human Mapping 50K Xba Array (see below) were genotyped using validated TaqMan SNP genotyping assays from Applied Biosystems. For each SNP, 25 ng DNA was used. The polymerase chain reaction (PCR) was performed in an ABI 9700 thermocycler under the following conditions: 95°C for 10 minutes followed by 50 cycles of 92°C for 15 seconds and 60°C for 1 minute. The 384-well plates were read on an ABI 7900HT sequence detection system according to manufacturer's manual. The probes were labeled with FAM or VIC dye at the 5' end and a minor-groove binder and non-fluorescent quencher at the 3' end.

Microsatellite Genotyping

As of November 15, 2006, 3 microsatellites were identified in the literature as being associated with tuberculosis (Table 2). A 5'(GT)n microsatellite associated with NRAMP1, a VNTR associated with interleukin (IL)-1RA, and a GT repeat polymorphism in the toll-like receptor (TLR) 2 gene were all genotyped using fluorescent fragment analysis [16]. All 3 microsatellites were amplified by PCR. The PCR primers and conditions for the 5'(GT)n and IL-1RA microsatellites were previously reported[17, 10]. The TLR2 microsatellite was amplified using the following conditions: 1 minute at 96°C; 30 cycles of 94°C for 1 minute, 1 minute at 53°C, and 2 minutes at 70°C; and a final elongation period of 10 minutes at 70°C. The primers used to amplify the TLR2 microsatellite were previously published [18]. The forward primers for NRAMP1, IL-1RA and TLR2 microsatellites (5'-ACT CGC ATT AGG CCA ACG AG-3', 5'-CTC AGC AAC ACT CCT AT-3', and 5'-GCA TTG CTG AAT GTA TCA GGG A-3' respectively) were all 5' labeled with FAM purchased from MWG Biotech. Following PCR, the amplicons were electrophoresed on a 3730s DNA analyzer (Applied Biosystems) and analyzed with GeneMapper 4.0 (Applied Biosystems).

Candidate Gene Genotyping

Twenty six candidate genes were identified based on their potential role in tuberculosis immunopathogenesis and the inclusion of SNPs in these genes in the GeneChip® Human Mapping 50K Xba Array (Affymetrix, Inc., Santa Clara, CA). A total of 661 SNPs in these 26 candidate genes were present (Table 4). Genotyping was performed according to the manufacturer's protocol. Briefly, a complexity reduction process was performed where genomic DNA (250 ng) was digested with XbaI, ligated to XbaI adaptor (Affymetrix), and amplified by polymerase chain reaction (PCR) using primers specific to the ligated adaptor. Cycling conditions were an initial denaturation of 94°C for 3 minutes followed by 30 cycles of 94°C for 30 seconds, 60°C for 45 seconds, and 68°C for 60 seconds. A final extension of 68°C for 7 minutes concluded the reactions. PCR products were assayed by gel electrophoresis, purified, fragmented to < 250 bp using dilute DNaseI (Affymetrix), biotin end-labeled with terminal deoxynucleotidyl transferase, and hybridized to the 50K Xba Array at 48°C for 16 hours at 60 rpm. The hybridized arrays were washed and stained on Fluidics Station 450 and scanned with the GeneChip Scanner according to the manufacturer's settings (Affymetrix). The arrays were analyzed with software GDAS version 3.0.2 (Affymetrix), which provides rank scores for the probability of particular genotypes at SNP loci. The scores were AA or BB for homozygous alleles and AB for heterozygous alleles, and confidence scores showed the accuracy of the genotype call. Standard procedures and default analysis parameters for individual DNA samples were employed. An internal control run in parallel did not detect any DNA contamination. All procedures were performed using the same lots of reagents.

Quality Control

Genotype calling was performed using GDAS version 3.0.2 software as noted above. This software uses the Dynamic Models algorithm for genotype calling [19]. All markers with significant (P < 0.05 after Bonferroni correction) departure from Hardy-Weinberg equilibrium were excluded from final analyses. In addition, SNPs in the GeneChip® Human Mapping 50K Xba Array with <90% efficiency (>10% missing data) were excluded from the analysis. Missing data patterns did not deviate from random expectations.

Statistical Analysis

Clinical and demographic characteristics were compared among the three patient groups (extrapulmonary tuberculosis, pulmonary tuberculosis, and PPD+) using the Kruskal-Wallis test for continuous variables and the chi-squared and Fisher exact tests for categorical variables. Missing observations were individually excluded from all stages of analyses (for both the univariate and MDR analyses (no imputation was performed).

Two populations were assessed for genetic factors associated with extrapulmonary tuberculosis: 1) all individuals regardless of race; 2) only black participants. This stratified analysis was performed to minimize spurious associations due to population stratification [20]. There were not enough individuals to perform analyses stratified by other racial groups. The two populations were further divided into three subgroups for association analyses: a) any tuberculosis vs. PPD+; b) extrapulmonary tuberculosis vs. PPD+; and c) extrapulmonary vs. pulmonary tuberculosis.

Genetic analyses were performed in two stages (see below). Single locus chi-squared association tests (genotypic tests, with 2 degrees of freedom) (STATA version 9.0; College Station, TX) were performed on variants previously reported to be associated with tuberculosis susceptibility. MDR analyses were performed both on previously reported variants and on variants in candidate genes. MDR was performed to explore potential single-locus associations and gene-gene interactions between variants[21]. Details of the MDR algorithm and implementation are described in the Additional file 1 and Figure 1.

Figure 1
figure 1

Summary of the general steps to implement the MDR method, adapted from Ritchie and Motsinger 2005 [52]. In step one, the exhaustive list of n combinations are generated from the pool of all independent variables. In step two, for k = 1 to N, the combinations are represented in k-dimensional space, and the number of responders and non-responders are counted in each multifactor cell. In step three, the ratio of responders to non-responders is calculated within each cell. In step four, each multifactor cell in the k-dimensional space is labeled as high-likelihood/high-risk if the ratio of responsive individuals to non-responsive individuals exceeds a threshold and low-likelihood/low-risk if the threshold is not exceeded. In step five the training accuracy is calculated. This is then repeated for each multifactor combination. In step seven, the model with the best training accuracy is selected and evaluated in the test set. In step eight, the testing accuracy of the model is estimated. In step nine a permutation test is conducted to determine the statistical significance of the model(s). Steps 1 through 6 are repeated for each possible cross-validation interval. Bars represent hypothetical distributions of responders (left) and non-responders (right) with each multifactor combination. Dark-shaded cells represent high-likelihood genotype combinations while light-shaded cells represent low-likelihood genotype combinations.

The first stage of the genetic analysis examined variants reported to be associated with tuberculosis susceptibility in previous studies (Table 2). Because each variant represented a distinct statistical hypothesis (since each was being evaluated for replication), no correction for multiple testing was used.

In the second stage of genetic analysis, a nested candidate-gene study was compiled from the Affymetrix Xba genotyping arrays. Only SNPs in genes hypothesized to play a role in tuberculosis pathogenesis were included (see Table 4).

A Linux version of the MDR software was used for data analysis (compiled and benchmarked on a PC with a 600 MHz Pentium-III running Red Hat 2.2.5-15, written in C and compiled with the GNU C compiler). Presently, MDR software is being distributed in a JAVA version with a graphical user interface http://www.epistasis.org/mdr.html.

For both analysis stages MDR was also performed with genetic factors plus CD4+ lymphocytes and body mass index (BMI). This allowed for exploration of both gene-gene and gene-clinical factor interactions.