Introduction

Tuberculosis (TB) is the leading cause of death from a single infectious agent worldwide. China, the country with the third-highest TB burden, accounted for 8.4% of all cases worldwide in 2019 [1]. Tuberculosis is caused by Mycobacterium tuberculosis (MTB). Although approximately one-third of the world's population is infected with MTB, only 3–10% of infected individuals develop active clinical TB during their lifetime [2]. Whether TB occurs and how it progresses is determined by complex interactions among three factors: the MTB strain itself, the environment, and host genetics [3,4,5]. Animal model studies, twin and family studies, and numerous case-control studies have shown that host genetics is important in determining disease progression and outcome after MTB infection [6].

When MTB invades the host, it first encounters the innate immune system, whose activity is shaped by innate immunity genes. Pattern recognition receptors (PRRs) are key signaling molecules of the innate immune system that mediate the initial recognition of MTB [7]. Numerous studies have shown that genetic variation in PRRs, such as C-type lectin receptors (CLRs), Toll-like receptors (TLRs), RIG-I-like receptors (RLRs), and NOD-like receptors (NLRs), or in the genes encoding their adapter proteins, modulates MTB-induced immune responses and helps determine the outcome of MTB infection [8,9,10].

Macrophage-inducible C-type lectin (Mincle) is a recently described CLR encoded by the C-type lectin domain family 4 member E (CLEC4E) gene. Trehalose-6,6′-dimycolate (TDM), also known as cord factor, is the most abundant cell wall glycolipid of MTB and is important for the initial recognition of the bacterium. Mincle has been reported to be the mammalian receptor for TDM from MTB [11]. Lu et al found impaired production of interleukin-6 (IL-6) and tumor necrosis factor (TNF) in TDM-stimulated macrophages from Mincle−/− mice exposed to Malassezia spp. [12]. Moreover, in response to a TB vaccine containing trehalose-6,6′-dibehenate (TDB), a synthetic analog of TDM, Mincle has been shown to play a pivotal role in generating Th1/Th17 immune responses and in granuloma formation [13]. Together, these studies suggest that Mincle plays a significant role in the recognition of mycobacteria.

To date, only two studies have investigated the relationship between the CLEC4E gene and TB susceptibility in humans [14, 15]. However, both had small sample sizes, and their results were inconsistent. More studies are therefore needed to evaluate the possible role of CLEC4E variants in TB. Moreover, no such study has been carried out in the population of western China. We therefore designed this relatively large-scale study to investigate whether single nucleotide polymorphisms (SNPs) in the CLEC4E gene are associated with susceptibility to TB in a Han population from western China.

Methods

Subjects

Han Chinese patients with TB from western China were recruited at West China Hospital of Sichuan University between January 2014 and February 2016. TB was diagnosed according to TB guidelines [16] based on laboratory test results, clinical symptoms, and radiological examination. Patients were included if they had typical symptoms and signs of tuberculosis and met at least one of the following conditions: (1) positive smears from at least two separate clinical specimens and/or a positive MTB culture and/or a positive test for MTB nucleic acid (TB-DNA); (2) typical manifestations of active tuberculosis on CT or other imaging examinations; or (3) a pathological diagnosis supporting tuberculous lesions. Patients with immunodeficiency, autoimmune diseases, or other infectious diseases were excluded. Healthy controls were enrolled from the same population during the same period at the Physical Examination Center of West China Hospital of Sichuan University. All controls were healthy, with normal laboratory test results, physical examination, and imaging findings. Individuals with a history of TB or of non-Han ethnicity were excluded. In total, 900 TB cases and 1534 healthy controls were enrolled.

The study protocol was reviewed and approved by the Ethics Committee of West China Hospital of Sichuan University. Written informed consent was obtained from all participants before any study-related procedure was performed.

SNP selection and genotyping

Information on genetic variants in CLEC4E and in its upstream and downstream intergenic regions was obtained from the dbSNP database (https://www.ncbi.nlm.nih.gov/SNP/). Haploview V4.2 was then used to select tag SNPs from these variants with a pairwise threshold of r2 ≥ 0.8. Tag SNPs with a minor allele frequency (MAF) > 0.20 in the East Asian population of the 1000 Genomes Project were selected.
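The tag SNP selection step can be illustrated with a simplified greedy pairwise tagging sketch. This is an illustration under stated assumptions, not a reimplementation of Haploview V4.2: the function name `greedy_tag_snps`, its input dictionaries, and the toy SNP IDs are hypothetical, and pairwise r2 values are assumed to be precomputed (for example from 1000 Genomes East Asian genotypes).

```python
from typing import Dict, List, Tuple


def greedy_tag_snps(
    snps: List[str],
    maf: Dict[str, float],
    r2: Dict[Tuple[str, str], float],
    r2_threshold: float = 0.8,
    maf_threshold: float = 0.20,
) -> List[str]:
    """Simplified greedy pairwise tagging (hypothetical helper, not Haploview's code)."""

    def pair_r2(a: str, b: str) -> float:
        # A SNP always tags itself; missing pairs are treated as unlinked.
        if a == b:
            return 1.0
        return r2.get((a, b), r2.get((b, a), 0.0))

    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # Pick the SNP capturing the most still-untagged SNPs at r2 >= threshold
        # (ties broken by earliest position in the input list).
        best = max(
            snps,
            key=lambda s: (sum(pair_r2(s, t) >= r2_threshold for t in untagged), -snps.index(s)),
        )
        tags.append(best)
        untagged = {t for t in untagged if pair_r2(best, t) < r2_threshold}
    # Keep only common tag SNPs, mirroring the MAF > 0.20 filter described above.
    return [t for t in tags if maf[t] > maf_threshold]


snps = ["A", "B", "C", "D"]
r2 = {("A", "B"): 0.9, ("B", "C"): 0.85, ("A", "C"): 0.5}
maf = {"A": 0.3, "B": 0.3, "C": 0.3, "D": 0.1}
print(greedy_tag_snps(snps, maf, r2))
```

In this toy input, B captures A and C at r2 ≥ 0.8, D can only tag itself, and the MAF filter then drops D, so only B is retained.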

Peripheral blood samples were collected from all 2434 individuals and transferred for preservation to the "Tuberculosis Researches" biological specimen bank in the Department of Laboratory Medicine, West China Hospital, Sichuan University; demographic information on these individuals was also collected. Genomic DNA was extracted with the QIAamp® DNA Blood Mini Kit (Qiagen, Hilden, Germany). SNPs were genotyped using the improved multiplex ligation detection reaction (iMLDR) method (Genesky Biotechnologies Inc., Shanghai, China) [17]. To check reproducibility, 10% of the samples were randomly selected for re-genotyping; concordance was 100%.
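The re-genotyping quality check amounts to a random draw followed by a concordance rate. The sketch below is a minimal illustration assuming genotype calls stored as two-letter strings keyed by sample ID. The helper names, the fixed seed, and the sample IDs are hypothetical, and unordered genotypes (AG vs GA) are treated as identical.

```python
import random
from typing import Dict, List


def select_for_regenotyping(sample_ids: List[str], fraction: float = 0.10, seed: int = 2014) -> List[str]:
    """Randomly draw a fixed fraction of samples, without replacement, for repeat genotyping."""
    rng = random.Random(seed)
    k = round(len(sample_ids) * fraction)
    return rng.sample(sample_ids, k)


def _normalize(genotype: str) -> str:
    """Treat 'GA' and 'AG' as the same unordered genotype."""
    return "".join(sorted(genotype))


def concordance(original: Dict[str, str], repeat: Dict[str, str]) -> float:
    """Fraction of re-genotyped samples whose repeat call matches the original call."""
    shared = [s for s in repeat if s in original]
    if not shared:
        raise ValueError("no overlapping samples")
    matches = sum(_normalize(original[s]) == _normalize(repeat[s]) for s in shared)
    return matches / len(shared)


ids = [f"S{i}" for i in range(2434)]
picked = select_for_regenotyping(ids)
print(len(picked))
```

With 2434 samples, a 10% draw selects 243 samples; a concordance of 1.0 corresponds to the 100% reproducibility reported above.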

Statistical analysis

General characteristics of the participants were compared using the chi-square test, independent t-test, and Mann-Whitney U test in SPSS 20.0 (IBM, Chicago, USA). Tests for Hardy-Weinberg equilibrium (HWE), and comparisons of genotype distributions and allele frequencies of candidate SNPs between the TB and control groups, overall and within age and sex subgroups, were performed using PLINK 1.07 [18]. Unconditional logistic regression models adjusted for age and sex were used to test the dominant and recessive models. Odds ratios (ORs) with 95% confidence intervals (95% CIs) and P-values were calculated. Haploview version 4.2 was used to assess linkage disequilibrium (LD) by D′ and r2 values and to estimate haplotype structure and haplotype frequencies. A P value < 0.05 was considered statistically significant, and Bonferroni correction was applied for multiple testing.
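The adjusted dominant- and recessive-model ORs described above come from unconditional logistic regression. The sketch below shows the idea with a small, dependency-free Newton-Raphson fit. It is not PLINK 1.07's implementation; the helper names (`code_genotype`, `design_row`, `logistic_or`) are hypothetical, and genotypes are assumed to be coded as 0, 1, or 2 copies of the minor allele. The OR for the genotype term is exp(β), with a Wald 95% CI of exp(β ± 1.96·SE).

```python
import math
from typing import List


def code_genotype(minor_allele_count: int, model: str) -> int:
    """Recode a 0/1/2 minor-allele count under a dominant or recessive model."""
    if model == "dominant":
        return int(minor_allele_count >= 1)  # AG or GG vs AA
    if model == "recessive":
        return int(minor_allele_count == 2)  # GG vs AA or AG
    raise ValueError(f"unknown model: {model}")


def design_row(minor_allele_count: int, age: float, male: int, model: str) -> List[float]:
    """Hypothetical covariate layout: intercept, genotype term, age, sex."""
    return [1.0, float(code_genotype(minor_allele_count, model)), float(age), float(male)]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def _score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information matrix."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return grad, info


def logistic_or(X, y, coef_index, max_iter=50, z=1.959964):
    """Newton-Raphson logistic fit; returns (OR, lower 95% CI, upper 95% CI) for one coefficient."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = _score_and_information(X, y, beta)
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    _, info = _score_and_information(X, y, beta)
    # Var(beta_j) is the j-th diagonal element of the inverse information matrix.
    unit = [1.0 if j == coef_index else 0.0 for j in range(len(beta))]
    se = math.sqrt(_solve(info, unit)[coef_index])
    b = beta[coef_index]
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)
```

With real data, each participant would contribute one `design_row(...)` and y = 1 for TB cases. In a sex-stratified analysis the sex column would be dropped, because a constant column makes the information matrix singular.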

Functional annotation

SNPs in strong LD (r2 > 0.90) with the TB-associated SNPs were identified using data from the 1000 Genomes Project [19]. DNase hypersensitivity, protein binding, and transcription factor binding motifs were analyzed using HaploReg version 4.1 [20]. To explore whether these variants might explain the observed associations through effects on gene expression, we queried the GTEx project [21] for expression quantitative trait loci (eQTLs). We also searched the lncRNASNP2 database [22] (http://bioinfo.life.hust.edu.cn/lncRNASNP/#!/) for information on long noncoding RNAs (lncRNAs).

Results

Characteristics of the study subjects

In total, 900 TB patients and 1534 healthy Han Chinese individuals were enrolled in our study. Among patients, the positive rate of TB-DNA testing was 50.5%, higher than those of MTB smear (32.8%) and culture (33.7%), as reported in a previous article by our research group [23]. Compared with the control group, the TB group had a higher male-to-female ratio (1.514 vs. 1.151, P < 0.001). The median age was 41 (27, 57) years for TB cases and 36 (29, 45) years for controls (P < 0.001); details are shown in Table 1.

Table 1 Basic characteristics of the participants enrolled in the study

SNPs of CLEC4E are associated with susceptibility to TB

Four SNPs of CLEC4E (rs10841856, rs10770847, rs10770855, and rs4480590) were selected for genotyping. Genotype frequencies of all four SNPs were in Hardy-Weinberg equilibrium (P > 0.05). All four SNPs had a MAF > 0.20 and were included in further analyses. The chromosomal locations, functional annotations, HWE P-values in control subjects, and MAFs of these SNPs are summarized in Table 2. We performed haplotype analyses of all four variants in or near CLEC4E. Their LD patterns are shown in Fig. 1; no haplotype block was identified.
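The HWE check can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is a simplified illustration with a hypothetical function name; PLINK's `--hardy` option reports, to our understanding, an exact test by default, so its P-values can differ from this asymptotic version, especially for rare genotypes.

```python
import math
from typing import Tuple


def hwe_chisq(n_hom_major: int, n_het: int, n_hom_minor: int) -> Tuple[float, float]:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_hom_major + n_het + n_hom_minor
    p = (2 * n_hom_major + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: HWE test undefined")
    observed = (n_hom_major, n_het, n_hom_minor)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Genotype counts of 25/50/25 match the HWE expectation exactly (P = 1), whereas 50/0/50 (no heterozygotes) gives chi-square = 100 and a vanishingly small P.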

Table 2 Characteristics of CLEC4E SNPs
Fig. 1

Linkage disequilibrium plots showing the strength of LD between pairs of SNPs in the CLEC4E gene. D′ (A) and r2 (B) values were multiplied by 100. In (A), squares without a number have a value of 100, corresponding to a D′ value of 1; D′ equals 1 when two SNPs are completely linked. In (B), squares without a number have a value of 80, corresponding to an r2 value of 0.8. Values of r2 ≥ 0.8 were considered significant. The four SNPs in our study were not in linkage disequilibrium.
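The caption distinguishes D′ from r2; both follow from the two-locus haplotype frequency and the allele frequencies. The sketch below assumes the haplotype frequency is already known (in practice it is estimated from unphased genotypes, which Haploview does internally); the function name is hypothetical. It also shows why D′ = 1 does not imply r2 = 1.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, D', r2) for alleles A and B with haplotype frequency p_ab."""
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# A always occurs with B, but B is more common than A: D' = 1, yet r2 is well below 0.8.
print(ld_stats(0.3, 0.3, 0.5))
```

Here r2 is about 0.43, so two SNPs can be "completely linked" in the D′ sense while one is a poor proxy for the other in the r2 sense used for tagging.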

Associations of CLEC4E SNPs with TB stratified by sex and age

Before Bonferroni correction, a weak association was observed between TB susceptibility and the minor G allele and the GG and GA genotypes of rs10841856 (Table S1). Likewise, rs10841856 was weakly associated with TB risk under the dominant model, but statistical significance was lost after Bonferroni correction (Table S2).
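Allele-level and dominant-model comparisons of this kind reduce to 2×2 tables. The sketch below computes an unadjusted OR with a Woolf (log-scale) 95% CI and a 1-df Pearson chi-square P-value. It is illustrative only: the study's reported ORs are age- and sex-adjusted, so they will not match this unadjusted calculation, and the helper names and counts are hypothetical.

```python
import math
from typing import Tuple


def allele_counts(n_aa: int, n_ag: int, n_gg: int) -> Tuple[int, int]:
    """Return (G count, A count); each person carries two alleles."""
    return 2 * n_gg + n_ag, 2 * n_aa + n_ag


def dominant_counts(n_aa: int, n_ag: int, n_gg: int) -> Tuple[int, int]:
    """Dominant model for the minor allele G: carriers (AG + GG) vs non-carriers (AA)."""
    return n_ag + n_gg, n_aa


def odds_ratio_2x2(a: int, b: int, c: int, d: int, z: float = 1.959964) -> Tuple[float, float, float, float]:
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls.

    Returns (OR, lower, upper, P) with a Woolf CI and a 1-df Pearson chi-square P.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return or_, lower, upper, p_value


print(odds_ratio_2x2(10, 20, 5, 40))
```

For an allelic test, `a, b` would be the case G and A counts from `allele_counts`, and `c, d` the corresponding control counts; for the dominant model, `dominant_counts` supplies carriers and non-carriers instead.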

When the data were stratified by sex, the frequency of the minor G allele of rs10841856 was higher among male TB subjects (47.69%) than among male controls (40.61%). In males, the G allele was strongly associated with TB risk, with an age-adjusted OR of 1.334 (95% CI: 1.142–1.560; P < 0.001; P = 0.001 after Bonferroni correction). Both the homozygous GG genotype (21.03% vs. 16.32%, P = 0.002; P = 0.008 after Bonferroni correction) and the heterozygous AG genotype (53.32% vs. 48.72%, P = 0.002; P = 0.008 after Bonferroni correction) were more common in male TB subjects than in male controls. Likewise, rs10841856 was significantly associated with TB susceptibility in males under the dominant model, with an age-adjusted OR of 1.557 (95% CI: 1.222–1.984; P < 0.001; P < 0.001 after Bonferroni correction). In contrast, no significant differences in rs10841856 were found between female TB subjects and female controls. These results suggest that the G allele of rs10841856 may be a risk factor for TB, especially in males. In addition, a weak association between the rs10770847 G allele and TB risk was found among males (P = 0.044 after adjusting for age); however, the association lost statistical significance after Bonferroni correction for multiple testing (Table 3 and Table 4).
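The figures in the paragraph above can be cross-checked with simple arithmetic. This sketch assumes the Bonferroni factor is 4, the number of SNPs genotyped, which is consistent with P = 0.002 becoming P = 0.008. The OR computed from the rounded male allele frequencies is unadjusted, so it only approximates the age-adjusted OR of 1.334. The function names are hypothetical.

```python
def odds_from_freq(freq: float) -> float:
    """Convert an allele frequency to allele odds."""
    return freq / (1.0 - freq)


def allelic_or_from_freq(case_freq: float, control_freq: float) -> float:
    """Unadjusted allelic OR from case and control allele frequencies."""
    return odds_from_freq(case_freq) / odds_from_freq(control_freq)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p * n_tests)


# Male G-allele frequencies of rs10841856 reported above.
print(round(allelic_or_from_freq(0.4769, 0.4061), 3))
# Genotype P = 0.002 and rs10770847 P = 0.044 after correction for 4 SNPs.
print(bonferroni(0.002, 4), bonferroni(0.044, 4))
```

The unadjusted allelic OR comes out at roughly 1.33, close to the adjusted estimate, and 0.044 × 4 exceeds 0.05, which is why the rs10770847 association does not survive correction.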

Table 3 Comparison of CLEC4E SNP polymorphisms in relation to TB risk in the Chinese Han population, stratified by sex
Table 4 Comparison of CLEC4E SNPs in relation to TB risk in the Chinese Han population, stratified by sex (dominant and recessive models)

We next stratified the data by age and found no significant differences in rs10841856 between TB subjects and controls in either the < 40-year or the ≥ 40-year age group. For the other three loci (rs10770847, rs4480590, and rs10770855), allele frequencies and genotype distributions did not differ significantly between TB patients and healthy controls after Bonferroni correction, either in the overall sample or after stratification by sex or age (Table S3 and S4).

Functional annotation

Rs10841856 is located in an intron of CLEC4E. Based on LD information from the 1000 Genomes Project, eight SNPs are in strong LD (r2 > 0.90) with rs10841856. Of these, rs11046135 lies near the 5′UTR of CLEC4E, rs7485954 lies in the upstream transcript region, and the remaining six are located in introns of CLEC4E. According to data from the Encyclopedia of DNA Elements (ENCODE) project [24], rs7307228, rs4242896, rs7139227, rs11046135, and rs7485954 may fall within regions with strong promoter and/or enhancer activity; rs10841847, rs7139227, rs10841856, and rs11046135 within DNase hypersensitivity sites; rs7139227 within a transcription factor binding region; and rs10841847, rs7307228, rs6487242, rs7139227, rs4562874, rs10841856, rs11046135, and rs7485954 within regulatory motifs (Table S5). According to the GTEx project, these eight SNPs are eQTLs for CLEC4E and RP11-561P12.5 and are associated with decreased expression of CLEC4E and increased expression of RP11-561P12.5 (Table S6).

Discussion

The role of host genetic factors in tuberculosis susceptibility has received increasing attention in recent years. Mincle is an indispensable receptor for TDM-induced innate immune responses (such as granuloma formation) and for macrophage activation in vitro during mycobacterial infection [25]. In the present study of a western Han Chinese population, the minor G allele of rs10841856 in CLEC4E, the gene encoding Mincle, was significantly associated with increased susceptibility to tuberculosis, especially among male subjects. Interestingly, Deo et al [14] reported that the minor G allele of rs10841847 was a risk factor for pulmonary tuberculosis in a northern Chinese population. According to the 1000 Genomes Project, rs10841847, which is also an intronic variant of CLEC4E, is in strong LD with rs10841856 (D′ = 0.95). Our finding of an association between rs10841856 and TB risk in males supports the involvement of CLEC4E genetic polymorphism in TB. By contrast, Bowker et al [15] genotyped four tag SNPs of CLEC4E and found no differences between South African TB patients and controls. Such discrepancies may reflect confounding factors, including differences in ethnic background and sample size.

TB has a higher incidence in males than in females. In 2018, males accounted for 68% and females for only 31% of TB patients in China [1]. In our study, TB was likewise more common in males than in females. Recently, Haiko et al [26] conducted a genome-wide association study emphasizing the importance of sex-stratified analysis, because strong sex-specific effects were found on both the autosomes and the X chromosome; such effects should be considered when studying associations between SNPs and TB. When our data were stratified by sex, the G allele of rs10841856 was a risk allele for TB, especially in males, and under the dominant model a significant association was found only in males. These results suggest that the effect of CLEC4E rs10841856 on TB risk depends on sex. Sex-specific effects of SNPs have previously been described in several diseases, including TB [27,28,29]. To our knowledge, this is the first report of sex-specific associations for CLEC4E variants, which could serve as a basis for replication studies in independent populations.

The rs10841856 polymorphism is located in an intron. Although intronic polymorphisms are generally not thought to change the encoded amino acids, they may affect splicing, transcription, and gene expression [30,31,32]. According to GTEx data, rs10841856 may be an eQTL of CLEC4E and RP11-561P12.5: the variant was associated with decreased expression of CLEC4E and increased expression of RP11-561P12.5 in whole blood. Decreased CLEC4E expression has been associated with bacterial infection in several studies [33, 34]. For MTB infection, Pahari et al [35] observed that a CLEC4E agonist improved host immunity and reduced bacterial load in the lungs of infected mice, and they described a novel role for CLEC4E in inducing autophagy during defense against MTB infection. Rs10841856 may therefore be associated with reduced CLEC4E expression, which could weaken host defense against MTB. RP11-561P12.5 is a lncRNA located at chromosome 12: 8700957-8720209, adjacent to CLEC4E. Although little has been reported on the biological functions of RP11-561P12.5, according to the lncRNASNP2 database it may bind miR-197-3p. Van Rensburg et al [36] showed that the neutrophil-associated miR-197-3p had significantly lower transcript levels in TB cases; moreover, miR-197-3p binds to the 3′UTR of the IL-22 receptor gene IL22RA1, thereby affecting the production of IL-22 [37]. IL-22 can inhibit MTB growth within macrophages [38] and promote innate immune responses, thereby limiting damage during pathogen infection [39]. Because the rs10841856 polymorphism influences the expression of RP11-561P12.5, we speculate that, by binding miR-197-3p, the lncRNA RP11-561P12.5 may similarly affect IL-22 production and thereby contribute to the development of TB.

No association between the other three SNPs (rs4480590, rs10770847, and rs10770855) and tuberculosis was found in this study after Bonferroni correction. To date, no TB-related studies of rs4480590, rs10770847, or rs10770855 have been reported. These three SNPs may therefore not be related to TB risk in the western Chinese Han population; however, multicenter studies with large samples are needed to verify these findings.

The present study has some limitations. First, the SNPs studied were mainly located in intronic regions, so variants in exons and regulatory sequences should also be considered, and more comprehensive and systematic association studies of CLEC4E variants are needed. Second, all participants were from the western Chinese Han population. As with any novel genetic association, our findings should be replicated in other populations, and functional and pathway analyses are required to validate them further.

In conclusion, a strong association was observed between rs10841856 (under both the allelic and dominant models) and susceptibility to TB among males in a western Chinese Han population. Rs10841856 and the SNPs in strong LD with it are associated with decreased expression of CLEC4E and increased expression of RP11-561P12.5. Accordingly, rs10841856 in CLEC4E may be a novel variant that plays a significant role in increasing the risk of TB among Han males from western China.