Background

Tuberculosis (TB) is a highly contagious infectious disease caused by Mycobacterium tuberculosis (M.tb) that primarily affects the lungs and spreads through the respiratory tract [1, 2]. An estimated 25% of the global population is infected with M.tb. Despite large global efforts to curb the spread of M.tb complex strains, 10.6 million people develop TB every year [3,4,5]. China still has the second-highest number of TB infections globally [6]. A thorough understanding of the transmission mechanisms of M.tb is therefore of great significance for the prevention and treatment of TB.

The metabolism of fatty acids is critical to the survival of M.tb within the host. M.tb utilizes diverse lipids as major carbon and energy sources during infection. Fatty acids are degraded via beta-oxidation to generate reducing power and energy [7,8,9]. Fatty acids are also crucial components of the M.tb cell wall [10], and M.tb uses them to produce essential metabolic intermediates closely related to its virulence [11]. M.tb can incorporate fatty acids into phospholipids or convert them into triglycerides as a carbon store for energy. This conversion has been linked to the promotion of drug resistance in M.tb [12, 13]. In response to hypoxia, M.tb within lipid-loaded macrophages accumulates neutral lipids, which results in the loss of acid-fastness and the development of antibiotic resistance [14]. Some virulence genes can facilitate the spread of M.tb, and fatty acid metabolism plays a significant role in enhancing the virulence and pathogenicity of M.tb. Nonetheless, the exact regulatory mechanism of fatty acid metabolism in M.tb is still unclear, and there is limited research on how mutations in fatty acid metabolism genes affect the transmission of M.tb. Further investigation is therefore necessary.

Whole genome sequencing (WGS) is a reliable tool for studying M.tb transmission. In this study, WGS was used to study the influence of fatty acid metabolism gene mutations on the transmission of M.tb in China. Specifically, genomic clusters were used as a proxy for M.tb transmission [15].

Results

Sample description

The M.tb isolates were classified according to the seven geographical regions of China. Most M.tb isolates were from Eastern China (66.8%), Southern China (15.4%) and Central China (4.5%), as shown in Fig. 1. The majority of M.tb isolates belonged to lineage 2 (n = 2744, 85.93%), followed by lineage 4 (n = 439, 13.75%) and lineage 3 (n = 10, 0.31%). Most of the isolates belonged to sub-lineage 2.2, while fewer isolates belonged to sub-lineage 4.4 and sub-lineage 4.5. The M.tb isolates were clustered into 499 groups, with sizes ranging from 2 to 108 isolates. Clusters containing 2 isolates were defined as small clusters, those containing 3–5 isolates as medium clusters, and those containing 6 or more isolates as large clusters. There were 86 cross-regional clusters, each spanning 2 to 6 regions, as shown in Table 1. The phylogenetic tree of the M.tb isolates is shown in Fig. 2.
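The cluster size classes defined above can be written as a small helper. This is only an illustrative sketch: the function name `size_class` is hypothetical and is not part of the study's pipeline.

```python
def size_class(n_isolates: int) -> str:
    """Map a cluster's isolate count to the size class defined in the text."""
    if n_isolates < 2:
        # A genomic cluster contains at least two isolates by definition.
        raise ValueError("a cluster contains at least 2 isolates")
    if n_isolates == 2:
        return "small"
    if n_isolates <= 5:
        return "medium"
    return "large"
```

For example, the largest cluster reported here (108 isolates) falls in the "large" class.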

Fig. 1

Distribution of 3197 isolates of Mycobacterium tuberculosis in seven natural geographical regions of China

Table 1 Characteristics of Mycobacterium tuberculosis in China

Fig. 2

Phylogenetic tree for the 3197 Mycobacterium tuberculosis isolates from China

The effect of mutations in fatty acid metabolism genes on clustering

After excluding positions with a mutation frequency lower than 0.01, we analyzed 73 mutation positions. In the univariate comparison between clustered and non-clustered isolates, 43 mutation positions in fatty acid metabolism genes differed significantly (P < 0.05), as detailed in Supplementary Table 1. Following univariate analysis, the 73 mutation positions were entered into a multivariate regression. To correct for possible confounding, the lineage and geographical location of M.tb were included as covariates alongside the 73 mutation positions. Five mutation positions in fatty acid metabolism genes had a significant influence on clustering (P < 0.05), as shown in Table 2. Three of these were identified as risk factors for clustering: gca (136,605, 317G > C, Arg106Pro; OR, 22.144; 95% CI, 2.591–189.272), ogt (1,477,346, 286G > C, Gly96Arg; OR, 3.893; 95% CI, 1.432–10.583), and rpsA (1,834,776, 1235C > T, Ala412Val; OR, 3.674; 95% CI, 1.217–11.091).
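To make the reported odds ratios easier to read: in a logistic regression the OR is exp(β), and a Wald-type 95% CI is exp(β ± 1.96·SE), so the OR should sit at the geometric mean of its CI bounds. The sketch below assumes the reported intervals are Wald intervals (the text does not state this), and the helper names are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_ci(beta: float, se: float) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


def implied_beta_se(lower: float, upper: float) -> Tuple[float, float]:
    """Recover (beta, SE) on the log-odds scale from a reported 95% CI.

    Assumes a symmetric Wald interval on the log scale.
    """
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return beta, se


# Reported values for gca 317G > C (Arg106Pro): OR 22.144, 95% CI 2.591-189.272
beta, se = implied_beta_se(2.591, 189.272)
or_, lo, hi = wald_ci(beta, se)
```

Under that assumption, the OR implied by the gca interval agrees with the reported 22.144 to within rounding. The width of the interval reflects a large standard error on the log-odds scale, so the point estimate should be read with caution.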

Table 2 Analysis of the effect of fatty acid metabolism gene mutations on clustering

Effects of mutations in fatty acid metabolism genes on clustering of lineage 2

Positions in fatty acid metabolism genes with a mutation frequency below 0.01 in lineage 2 were removed. We analyzed 55 mutation positions, and 18 of them showed statistically significant differences between clustered and non-clustered isolates (P < 0.05), as detailed in Supplementary Table 2. Following univariate analysis, the 55 mutation positions were analyzed by multivariate regression. To correct for possible confounding, the geographical location of M.tb was included as a covariate alongside the 55 mutation positions. Mutations at six positions in fatty acid metabolism genes were significantly associated with the clustered isolates of lineage 2 (P < 0.05), as shown in Table 3. Three of these were identified as risk factors for clustering: ogt (1,477,346, 286G > C, Gly96Arg; OR, 3.952; 95% CI, 1.453–10.749), rpsA (1,834,776, 1235C > T, Ala412Val; OR, 3.636; 95% CI, 1.204–10.982), and gca (136,605, 317G > C, Arg106Pro; OR, 22.789; 95% CI, 2.669–194.569).

Table 3 Analysis of the effect of fatty acid metabolism gene mutations on clustering of lineage 2

Effects of mutations in fatty acid metabolism genes on clustering of lineage 4

After excluding positions with a mutation frequency lower than 0.01, we analyzed 33 mutation positions in fatty acid metabolism genes in lineage 4. In the comparison between clustered and non-clustered isolates of lineage 4, two mutation positions differed significantly (P < 0.05), as detailed in Supplementary Table 3. Following univariate analysis, the 33 mutation positions were included in a multivariate regression analysis, with the geographical location of M.tb as a covariate to control for possible confounding. No risk factor was identified for the clustered isolates of lineage 4, as shown in Table 4.

Table 4 Analysis of the effect of fatty acid metabolism gene mutations on clustering of lineage 4

The effect of mutations in fatty acid metabolism genes on the cross-regional transmission of M.tb

After excluding positions with a mutation frequency below 0.01 among clustered isolates, 61 mutation positions in fatty acid metabolism genes were analyzed. In the comparison between cross-regional and non-cross-regional clusters, 26 mutation positions showed significant differences (P < 0.05), as detailed in Supplementary Table 4. Following univariate analysis, the 61 mutation positions were included in a multiple regression analysis, with lineage as a covariate to correct for potential confounding. Five mutation positions in fatty acid metabolism genes had a significant influence on cross-regional transmission (P < 0.05), as shown in Table 5. Among these, the mutation at arsA (3,001,498) was identified as a cross-regional risk factor (885C > G, Thr295Thr; OR, 6.278; 95% CI, 2.508–15.711). Notably, this arsA mutation is synonymous.

Table 5 Analysis of the influence of fatty acid metabolism gene mutations on cross-regional transmission

Effects of mutations in fatty acid metabolism genes on cluster size of M.tb

A total of 61 mutation positions in fatty acid metabolism genes were analyzed. Of these, 31 mutation positions were significantly associated with cluster size (P < 0.05), and 20 of them were positively related to cluster size. Notably, seven of these mutation positions were synonymous: fgd1 (491,742, 960T > C, Phe320Phe), fadB (957,117, 825T > C, Asp275Asp), fadH (1,306,259, 1968T > C, Ala656Ala), rpsA (1,834,177, 636A > C, Arg212Arg), fadD15 (2,449,629, 1470G > A, Gln490Gln), fas (2,847,281, 2052T > C, Asp684Asp), and agpS (3,476,350, 612C > T, Ser204Ser). For further details, refer to Fig. 3.
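The association with cluster size was assessed by Spearman rank correlation (see Statistical analysis). As a rough illustration of what that statistic measures, the sketch below computes Spearman's rho in plain Python as the Pearson correlation of tie-averaged ranks. The study itself used R. The toy data, and the choice to code each cluster as carrying a mutation (1) or not (0), are assumptions made only for this example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: mutation carried (1/0) per cluster vs. cluster size
has_mutation = [0, 0, 1, 0, 1, 1]
cluster_size = [2, 2, 3, 2, 6, 12]
rho = spearman_rho(has_mutation, cluster_size)
```

A positive rho in such a setup means that clusters carrying the mutation tend to be larger. It does not by itself show that the mutation causes larger clusters.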

Fig. 3

Correlation analysis of fatty acid metabolism gene mutation positions and clusters

Discussion

Fatty acid metabolism plays a crucial role in the growth of M.tb. To investigate the impact of fatty acid metabolism gene mutations on the spread of TB in China, we analyzed 3107 isolates of M.tb and 83 fatty acid metabolism genes. In China, most of the M.tb isolates belonged to lineage 2 (Beijing lineage), followed by lineage 4 (Euro-American lineage) and lineage 3 (East African-Indian lineage). Most of the clustered isolates (n = 1463, 91.67%) also belonged to lineage 2, indicating that transmission in China is driven mainly by lineage 2 isolates.

Based on our findings, we observed a missense mutation (317G > C, Arg106Pro) at position 136,605 of gca (Rv0112) and another missense mutation (1235C > T, Ala412Val) at position 1,834,776 of rpsA (Rv1630). These mutations were associated with an increased risk of M.tb transmission, particularly within lineage 2, and were also correlated with cluster size. Although the functions of gca are not fully understood, it may be involved in transport across the M.tb cell membrane and in the synthesis of the cell wall, both of which play critical roles in the pathogenesis of TB. Further research is needed to understand the mechanism by which this mutation promotes transmission. RpsA (Rv1630) is the largest protein of the 30S ribosomal subunit and plays a crucial role in translation. Mutations or deletions of rpsA can have a significant impact on the growth and metabolism of M.tb [16,17,18]. In our data, the rpsA missense mutation (c.1235C > T, p.Ala412Val) was associated with the spread of M.tb isolates overall and of lineage 2 isolates, and with cluster size. Interestingly, both Beijing isolates of M.tb and multidrug-resistant isolates exhibit two non-synonymous single nucleotide polymorphisms in the ogt gene [19,20,21]. The authors of those studies hypothesized that these mutations in ogt (Rv1316c) may contribute to the successful global distribution of these isolates, which is consistent with our findings. Our results revealed a missense mutation (286G > C, Gly96Arg) at position 1,477,346 of the ogt gene. The ogt gene encodes an enzyme called N-acetylglucosamine (O-GlcNAc) transferase, which is a glycosyltransferase responsible for catalyzing the addition of O-GlcNAc modification onto specific serine or threonine residues of proteins. O-GlcNAc transferase may play a role in regulating M.tb growth, adaptability, and pathogenicity by modifying and affecting key M.tb proteins. This missense mutation potentially promotes the transmission of M.tb isolates, including lineage 2 isolates, and could have implications for M.tb metabolism, cell wall synthesis, drug resistance, and other characteristics [22, 23]. In our study, we did not find any mutations in fatty acid metabolism genes that had an impact on the transmission of lineage 4. This may be partly because our sample size was insufficient to capture rare lineage 4 strains or related mutations; a larger sample set might provide more accurate results.

A synonymous mutation at position 3,001,498 of arsA (Rv2684) (885C > G, Thr295Thr) was found to affect the transmission of isolates across different regions. The expression of arsA allows M.tb to adapt to different environments within the host. Specifically, arsA helps the bacterium to evade the host immune response [2, 24].

In addition, our results suggest that both synonymous and non-synonymous mutations can affect the transmission of M.tb, indicating that synonymous mutations in fatty acid metabolism genes of M.tb are not all neutral. This is consistent with the finding of Xukang Shen and colleagues that synonymous mutations in yeast genes are mostly strongly non-neutral [25].

Conclusion

The results of this study suggest that mutations in fatty acid metabolism genes may increase the transmission risk of M.tb, which highlights the need for further investigation into the effects of these mutations on M.tb control and dissemination. These findings provide valuable insights into the therapy of TB.

Method

Sample collection

A total of 1550 M.tb culture-positive cases were collected from two medical institutions from 2011 to 2018 in China: Shandong Public Health Clinical Research Center (SPHCC) and Weifang Respiratory Clinical Hospital (WRCH). All samples were collected anonymously and informed consent was not required. Our research was approved by the Ethics Committee of Shandong Provincial Hospital, which is affiliated with Shandong First Medical University.

DNA extraction and sequencing

Genomic DNA from 1447 isolates was extracted with cetyltrimethylammonium bromide (CTAB) and underwent quality control (QC). The Illumina HiSeq 4000 system was used to sequence the genomes [26], and the sequence data were deposited in the National Center for Biotechnology Information (NCBI) under BioProject PRJNA1002108. In addition, 1755 isolates of M.tb from 23 provinces, 4 municipalities, and 5 autonomous regions in China were included in this study [27,28,29,30,31,32,33,34]. See Supplementary Tables 5 and 6 for the sample numbers. A total of 3202 genomes were analyzed, and M.tb H37Rv was used as the reference genome sequence.

Single nucleotide polymorphism (SNP) analysis

Sequencing reads were mapped to the reference strain H37Rv with the BWA-MEM algorithm (version 0.7.17-r1188). We only included samples with a coverage rate of 98% or higher and a minimum depth of at least 20× [35]. Variant calling was performed using Samclip (version 0.4.0) and SAMtools (version 1.15), and the resulting variants were further filtered by FreeBayes (version 1.3.2) and BCFtools (version 1.15.1). We excluded single nucleotide polymorphisms (SNPs) located in repeat regions, such as polymorphic GC-rich sequences (PE/PPE genes) and direct repeat SNPs, as well as repeat bases identified by Tandem Repeats Finder (version 4.09) and RepeatMasker (version 4.1.2-P1) [36, 37]. Finally, the SNPs were annotated with SnpEff (version 4.1l), and the results were extracted with the Python programming language [38].

Prediction of drug resistance

To identify drug resistance mutations, we compared known indels and SNPs using TBProfiler (version 2.8.12) and the tuberculosis database (TBDB) [39, 40]. We then searched for genotypic markers of drug resistance mutations in both first-line drugs (such as isoniazid, rifampicin, pyrazinamide, ethambutol, and streptomycin) and second-line drugs (such as ethionamide, quinolones, amikacin, capreomycin, and kanamycin), using a set of genetic polymorphisms. Mutations that were not correlated with phenotypic drug resistance were excluded as markers of genetic drug resistance [41]. For more information about the mutations detected as molecular resistance predictions in 3202 isolates, please refer to Supplementary Table 7.

Phylogenetic analysis

The isolates were divided into different lineages according to Coll et al. [42] (Supplementary Tables 5 and 6). The maximum likelihood phylogenetic tree was constructed using IQ-TREE (version 1.6.12) with the JC nucleotide substitution model, the gamma model of rate heterogeneity, and 100 bootstrap replicates [43]. M. canettii CIPT 140010059 was used as the outgroup, and five isolates belonging to two lineages were excluded. The phylogenetic tree was visualized with iTOL (https://itol.embl.de/). Isolates of lineage 1 were also excluded from further analysis because of their small number. A total of 3193 isolates were therefore included in the final analysis.

Propagation analysis

Cluster analysis was used to study the effect of fatty acid metabolism gene mutations on the transmission of M.tb. A cluster was defined as a group of isolates differing from each other by fewer than 10 SNPs (see Supplementary Table 8). To study regional variation, the geographical locations of the isolates in China were divided into seven natural regions. The clusters were then classified as cross-regional or non-cross-regional. A cross-regional cluster is one whose isolates come from two or more different regions.
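The clustering rule can be sketched as follows. The text does not specify the linkage rule, so this example assumes single linkage: two isolates join the same cluster whenever a chain of pairs, each under the threshold, connects them. Representing each isolate as a set of (position, alternate base) calls against H37Rv, and all function and variable names, are assumptions made for illustration.

```python
from itertools import combinations

SNP_THRESHOLD = 10  # isolates differing by fewer than 10 SNPs are linked


def snp_distance(a: set, b: set) -> int:
    """Number of variant calls present in exactly one of the two isolates."""
    return len(a ^ b)


def genomic_clusters(variants: dict) -> list:
    """Single-linkage clusters (size >= 2) under the SNP threshold."""
    parent = {name: name for name in variants}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(variants, 2):
        if snp_distance(variants[a], variants[b]) < SNP_THRESHOLD:
            parent[find(a)] = find(b)

    groups = {}
    for name in variants:
        groups.setdefault(find(name), set()).add(name)
    # Isolates linked to no other isolate are non-clustered and dropped here.
    return [members for members in groups.values() if len(members) >= 2]


def is_cross_regional(cluster: set, region_of: dict) -> bool:
    """True when the cluster's isolates come from two or more regions."""
    return len({region_of[name] for name in cluster}) >= 2


# Hypothetical toy data: A, B and C share a backbone; D is unrelated.
base = {(pos, "T") for pos in range(100, 120)}
variants = {
    "A": base,
    "B": base | {(500, "G")},
    "C": base | {(500, "G"), (600, "A"), (700, "C")},
    "D": {(pos, "C") for pos in range(1000, 1030)},
}
region_of = {"A": "Eastern China", "B": "Southern China",
             "C": "Eastern China", "D": "Eastern China"}
clusters = genomic_clusters(variants)
```

In this toy data, A, B and C form one cross-regional cluster and D remains non-clustered. The all-pairs comparison is quadratic in the number of isolates, which is acceptable for a sketch but would usually be replaced by a precomputed distance matrix for thousands of genomes.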

Acquisition of fatty acid metabolic genes

According to the NCBI database, a total of 83 fatty acid metabolism genes were obtained. Mutations in these genes were extracted with bcftools (version 1.15.1) using the include filter 'FMT/GT="1/1" && QUAL>=100 && FMT/DP>=10 && (FMT/AO)/(FMT/DP)>=0'. The results are shown in Supplementary Table 9.
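To show what the filter expression accepts, the predicate below mirrors it for a single variant call. It is a reading aid, not a replacement for bcftools. The function name and argument names are hypothetical, with `gt`, `qual`, `dp` and `ao` standing for the FMT/GT, QUAL, FMT/DP and FMT/AO fields.

```python
def passes_filter(gt: str, qual: float, dp: int, ao: int) -> bool:
    """Mirror of the include expression quoted in the text, for one call."""
    return (
        gt == "1/1"        # homozygous alternate genotype
        and qual >= 100    # variant quality
        and dp >= 10       # read depth; also guards the division below
        and ao / dp >= 0   # alternate-allele fraction, as quoted
    )
```

As quoted, the alternate-allele fraction threshold of 0 does not exclude any call with non-negative counts, so the effective criteria are the genotype, quality and depth conditions.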

Statistical analysis

The data are presented as number (percent). Positions with a mutation frequency < 0.01 in fatty acid metabolism genes were excluded from the analysis [44]. SPSS version 26 was used for statistical analysis. Categorical variables were compared between clustered and non-clustered isolates, and between cross-regional and non-cross-regional clusters, using Pearson's chi-square test or Fisher's exact test as appropriate. Variables examined in the univariate analysis were included in a binary logistic regression model for multivariate analysis. To analyze the effect of fatty acid metabolism gene mutations on cluster size, Spearman's rank correlation analysis was carried out in R version 4.1.0. All reported statistical tests were 2-sided, and P values < 0.05 were considered statistically significant.
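The frequency filter and the univariate comparison can be illustrated in a few lines of standard-library Python. The study used SPSS, so this is only a sketch of the underlying arithmetic. It implements the Pearson chi-square test for a 2×2 table without continuity correction. Fisher's exact test, which the text applies where appropriate, is not covered. All names and the toy counts are hypothetical.

```python
from math import erfc, sqrt

MIN_FREQUENCY = 0.01  # positions mutated in fewer than 1% of isolates are dropped


def keep_position(n_mutated: int, n_total: int) -> bool:
    """Apply the mutation frequency cut-off to one genomic position."""
    return n_mutated / n_total >= MIN_FREQUENCY


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square for the table [[a, b], [c, d]].

    Rows: mutated / not mutated. Columns: clustered / non-clustered.
    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30 of 40 mutated isolates are clustered,
# against 70 of 160 isolates without the mutation.
stat, p_value = chi_square_2x2(30, 10, 70, 90)
```

With these toy counts the difference would be called significant at the 0.05 level. In the study, such univariate results were followed by multivariate logistic regression with lineage or region as covariates.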