Background

Drug-resistant tuberculosis (DR-TB) is a growing threat to global TB control efforts. The burden of DR-TB in high-burden countries is largely driven by transmission of resistant strains. Understanding the factors that drive DR-TB transmission, and the interventions that reduce it, may be critical for successful control of the DR-TB epidemic in these settings. Furthermore, addressing the heterogeneity of DR-TB transmission is important, as disease burden varies widely within and between settings and transmission can be localized in subpopulations.

Molecular epidemiological studies have been useful in a number of countries in supporting TB control by identifying drivers of transmission. These studies have shown that patient-related [1, 2], environmental [2, 3] and bacterial factors influence TB transmission [4]. However, findings on risk-factors for clustering vary between studies, particularly between low-incidence and high-incidence countries [5, 6]. In low-incidence countries, alcohol and drug abuse, immigrant status, homelessness, urban residence and young age are the major risk-factors influencing clustering [7,8,9,10,11]. In lower-middle income countries, by contrast, information on risk-factors is scarce, and only a few studies in high-TB incidence countries have assessed the risk-factors for clustering. The risk-factors identified in these studies include age [12], prior imprisonment [13], treatment failure, visiting social settings such as bars and churches [14], and Human Immunodeficiency Virus (HIV) infection [15,16,17].

South Africa (SA) has one of the highest burdens of DR-TB in the world. The prevalence of DR-TB varies greatly across provinces, with the majority of DR-TB cases in Kwa-Zulu Natal (KZN), Western Cape (WC), Eastern Cape (EC) and Gauteng (GP) [18]. This variation in DR-TB burden could be due to the varying distribution of individual and community-level risk-factors and to variations in TB control programme performance. Thus, a better understanding of risk-factors for clustering can help to direct resources and efforts towards high-risk groups and towards areas that contribute disproportionately to transmission.

This study aimed to identify the potential risk-factors driving DR-TB transmission in SA, using data collected from sentinel molecular surveillance of rifampicin-resistant TB (RR-TB) conducted between 2014 and 2018. In addition, we aimed to describe the characteristics of cases by cluster size and to investigate whether risk-factors vary by cluster size.

Methods

Study population and setting

The study used retrospective data from sentinel molecular surveillance of RR-TB. It included culture-positive samples from patients newly diagnosed with RR-TB via the Xpert MTB/RIF or Xpert MTB/RIF Ultra assay between 2014 and 2018. The surveillance was implemented in eight of the nine provinces, with at least one district targeted per province. The districts were: Nelson Mandela Metro (EC), Frances Baard (Northern Cape [NC]), Ehlanzeni (Mpumalanga [MP]), Dr Kenneth Kaunda (North West [NW]), Umgungunglovo (KZN), City of Johannesburg (GP), Mangauang (Free State [FS]), and City of Cape Town Metro, Cape Winelands and West Coast (WC). All RR-TB samples were submitted to the Center for TB (CTB) at the National Institute for Communicable Diseases (NICD) in Johannesburg for culture and genotyping. All culture-confirmed samples were genotyped by a combination of spoligotyping and 24-loci MIRU-VNTR typing. Said et al. provide a detailed description of the study design, study population and laboratory methods [19].

Cluster definition

Clustered cases were defined as two or more patients having identical patterns by both spoligotyping and 24-loci MIRU-VNTR typing. A non-clustered (unique) case was defined as any case from the study population having a unique pattern not shared by any other case.
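The cluster definition above can be illustrated with a minimal sketch. It is not the study's own pipeline: the function name assign_clusters, the input layout and the toy patterns are all hypothetical, and the only rule taken from the text is that two isolates cluster when both their spoligotype and their 24-loci MIRU-VNTR pattern are identical.

```python
from collections import defaultdict


def assign_clusters(isolates):
    """Group isolates whose spoligotype AND MIRU-VNTR pattern are both identical.

    `isolates` maps an isolate ID to a (spoligotype, miru_vntr) tuple of strings.
    Returns (clusters, unique): `clusters` is a sorted list of sorted ID lists
    with two or more members; `unique` is a sorted list of IDs whose combined
    pattern is shared by no other isolate.
    """
    by_pattern = defaultdict(list)
    for isolate_id, pattern in isolates.items():
        by_pattern[pattern].append(isolate_id)

    clusters = sorted(sorted(ids) for ids in by_pattern.values() if len(ids) >= 2)
    unique = sorted(ids[0] for ids in by_pattern.values() if len(ids) == 1)
    return clusters, unique


# Hypothetical toy data: the pattern strings are placeholders, not study genotypes.
example = {
    "A": ("000000000003771", "244233352644425173353823"),
    "B": ("000000000003771", "244233352644425173353823"),
    # Same spoligotype as A and B, but one MIRU-VNTR locus differs: not clustered.
    "C": ("000000000003771", "244233352644425153353823"),
    "D": ("777777607760771", "124326153220224326153322"),
}
clusters, unique = assign_clusters(example)
```

In this toy input, isolates A and B form one cluster, while C and D are unique because each differs from every other isolate in at least one of the two typing results.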

Multi-drug-resistant (MDR) TB was defined as resistance to at least isoniazid (INH) and rifampicin (R); while extensively drug-resistant (XDR) TB was defined as MDR-TB with additional resistance to any fluoroquinolone (FLQ) and to at least one of the three injectable second-line drugs: amikacin (AMK), kanamycin (KAN) and/or capreomycin (CAP).
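The resistance definitions above can be written as a small classifier. This is a sketch under stated assumptions: the function name classify_resistance and the drug codes used as set members are illustrative, and the "RR" label for rifampicin resistance without MDR is an assumption based on the RR-TB category used elsewhere in the text.

```python
# Drug codes follow the abbreviations in the text; "RIF" stands for rifampicin.
FLUOROQUINOLONE = "FLQ"
INJECTABLES = {"AMK", "KAN", "CAP"}


def classify_resistance(resistant_to):
    """Classify an isolate from the set of drug codes it is resistant to.

    Returns "XDR", "MDR", "RR" (rifampicin-resistant but not MDR; an assumed
    label) or "other".
    """
    drugs = set(resistant_to)
    is_mdr = {"INH", "RIF"} <= drugs  # resistant to at least INH and rifampicin
    has_injectable = bool(drugs & INJECTABLES)  # at least one of AMK, KAN, CAP

    if is_mdr and FLUOROQUINOLONE in drugs and has_injectable:
        return "XDR"
    if is_mdr:
        return "MDR"
    if "RIF" in drugs:
        return "RR"
    return "other"
```

For example, an isolate resistant to INH, RIF and FLQ but to none of the injectables is still classified as MDR, because the XDR definition requires both additional resistances.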

Analysis

Descriptive statistics were used to present the number and proportion of clustered strains, non-clustered strains, clusters and distribution of cluster size. We defined the size of a cluster by categorising cases into four groups: 2–5 cases per cluster [small cluster], 6–10 cases per cluster [medium cluster], 11–25 cases per cluster [large cluster], and ≥ 26 cases [very large cluster].
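The size categories above translate directly into a lookup function. The sketch below is illustrative only; the function name cluster_size_category and the "unique" label for non-clustered cases are assumptions, while the cut-offs are those stated in the text.

```python
def cluster_size_category(n_cases):
    """Map the number of cases sharing a genotype to a cluster-size category."""
    if n_cases < 1:
        raise ValueError("a genotype must be carried by at least one case")
    if n_cases == 1:
        return "unique"      # non-clustered comparison group
    if n_cases <= 5:
        return "small"       # 2-5 cases per cluster
    if n_cases <= 10:
        return "medium"      # 6-10 cases per cluster
    if n_cases <= 25:
        return "large"       # 11-25 cases per cluster
    return "very large"      # 26 or more cases per cluster
```

Together with the non-clustered group, these four categories give the five outcome levels used in the regression analysis.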

We investigated risk-factors for cases belonging to molecular clusters of different sizes. Because the outcome of interest was a categorical variable with five levels (non-clustered cases plus the four cluster-size categories), we used multinomial logistic regression, an extension of simple logistic regression. For each risk factor, an odds ratio (OR) was calculated for each of the four cluster-size outcomes, with cases not in a cluster as the comparison group. Risk-factors were first investigated in single-variable analysis, and variables with an association of p < 0.2 were included in the initial multivariable model. The final multivariable model was built by stepwise backward elimination of variables that did not contribute significantly, using a p-value of 0.05 as the threshold.
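As a simplified illustration of what a single-variable OR means here, the sketch below computes a crude odds ratio with a Wald 95% confidence interval from a 2x2 table, comparing one cluster-size category with the non-clustered group. It is not the multinomial model fitted in the study; for a single binary exposure, however, the multinomial estimate for a given category coincides with this cross-product ratio. The function name odds_ratio_wald and the counts are hypothetical.

```python
import math


def odds_ratio_wald(exposed_cat, unexposed_cat, exposed_ref, unexposed_ref, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    exposed_cat / unexposed_cat: cases in one cluster-size category with and
    without the risk factor; exposed_ref / unexposed_ref: the same counts for
    the non-clustered comparison group.
    """
    a, b, c, d = exposed_cat, unexposed_cat, exposed_ref, unexposed_ref
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study tables.
or_, ci_low, ci_high = odds_ratio_wald(30, 20, 40, 80)
```

With these made-up counts the odds ratio is 3.0 and the interval excludes 1, which is the pattern that would be read as a significant association in a single-variable analysis.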

Exposure variables (from the questionnaire) included demographic characteristics (age, sex, income and province), clinical characteristics (previous treatment and HIV status), high-risk work settings for transmission (health care workers and mine workers) and laboratory findings (sputum smear result and drug susceptibility profile).

Results

During the 5-year study period, a total of 374,399 TB cases were reported by the TB surveillance program for the ten districts in eight provinces included in the study. The TB surveillance program reports only laboratory-confirmed TB cases, defined by a positive TB result on an Xpert MTB/RIF or Xpert MTB/RIF Ultra assay, culture, line probe assay or smear microscopy. Of the 374,399 cases, 17,399 were RR-TB (3365 from Nelson Mandela Metro in EC, 919 from Mangauang in FS, 4042 from City of Johannesburg in GP, 1533 from Umgungunglovo in KZN, 2798 from Ehlanzeni in MP, 1383 from Dr Kenneth Kaunda in NW, 605 from Frances Baard in NC and 2754 from three districts in WC). The current study is a sentinel surveillance and included only patients who provided written informed consent and a second sputum sample for the study. A total of 2893 culture-confirmed RR-TB cases had genotyping results, which is 17% of the reported RR-TB cases in the ten districts.

Of the 2893 isolates with genotyping results, 864 (29.9%) were collected from the three districts in WC, 696 (24.1%) were from Nelson Mandela Metro, 419 (14.5%) were from Ehlanzeni, 343 (11.9%) were from Dr Kenneth Kaunda, 224 (7.7%) were from Umgungunglovo, 138 (4.8%) were from City of Johannesburg, 132 (4.6%) were from Frances Baard and 76 (2.6%) were from Mangauang. For one (0.03%) isolate, no information on province was available.

Strain families based on spoligotyping could be assigned to 2752 (95.1%) cases. The most common family identified was Beijing (1432/2752, 52.0%), followed by LAM (323/2752, 11.7%), T (263/2752, 9.6%), EAI (208/2752, 7.6%), X (204/2752, 7.4%), S (172/2752, 6.3%) and H (86/2752, 3.1%). The remaining 2.3% (64/2752) of isolates belonged to other genotype families.

A total of 51.8% (1498/2893) of the isolates belonged to molecular clusters. A total of 277 clusters were identified, with cluster size ranging from two to 259 isolates. Most clusters (226/277, 81.6%) were small (2–5 cases), 10.8% (30/277) were medium sized (6–10 cases), 4.7% (13/277) were large (11–25 cases) and 2.9% (8/277) were very large, with 26–259 cases.
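The proportions in this paragraph can be re-derived from the reported counts. The short check below is only an arithmetic aid; the variable names are illustrative, and the counts are those given in the text, which sum to 277 clusters.

```python
# Number of clusters in each size category, as reported in the text.
cluster_counts = {
    "small (2-5)": 226,
    "medium (6-10)": 30,
    "large (11-25)": 13,
    "very large (>=26)": 8,
}

total_clusters = sum(cluster_counts.values())

# Percentage of all clusters falling in each size category, to one decimal.
cluster_shares = {
    label: round(100 * count / total_clusters, 1)
    for label, count in cluster_counts.items()
}

# Percentage of genotyped isolates that belonged to any cluster.
clustered_share = round(100 * 1498 / 2893, 1)
```

The category counts add up to the reported total of 277 clusters, so 277 is the denominator for all four percentages.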

Characteristics of study population

Questionnaire data were available for all the provinces with the exception of WC. The characteristics of patients from the seven provinces are summarized in Table 1. For WC, only demographics (age and gender) and laboratory test results (sputum smear status and drug susceptibility testing) were available from the laboratory information system (Table 3).

Table 1 Characteristics of DR-TB cases in molecular clusters of different sizes in seven high-burden districts
Table 2 Single and multiple variable multinomial logistic regression analysis for risk-factors associated with cases in a molecular cluster of different sizes in South Africa (2014–2018)

Characteristics of patients from the seven provinces

The age of patients enrolled in the surveillance ranged from 18 to 89 years (interquartile range [IQR]: 29–45). Over half of all cases (1100/1956, 56.2%) were male. The majority of patients with known occupational status (1245/1693, 73.5%) were not in employment. Of those in employment, 11.8% worked in the health care system and 20.1% worked in mines. Information on previous history of TB was available for 82.2% of cases; of these, 52.0% had been previously treated for TB. HIV status was known for 81.8% of cases; of these, 73.7% were HIV positive.

Sputum smear results were available for 99.8% of cases, of which the majority (71.1%) were smear positive. Drug susceptibility testing (DST) for at least INH and RIF was available for 98.5% of the isolates. MDR-TB represented over half of the resistant strains (56.7%).

Characteristics of patients from three districts in Western Cape

The proportion of males (60.2%) was higher than that of females (37.5%), while sex was not available for 0.2% of the patients. Age ranged from 18 to 77 years (IQR: 29–45). Sputum smear results were available for 89.0% of cases, of which 50.1% were smear positive. Drug susceptibility testing data were available for 99.4% of the isolates. Of those, the majority (69.8%) were MDR-TB (Table 3).

Table 3 Characteristics of RR-TB cases in molecular clusters of different sizes and single-variable multinomial logistic regression analysis for risk-factors associated with cases in a molecular cluster of different sizes in Western Cape (2014–2018)

Factors associated with clustering

Results of the single-variable multinomial logistic regression analysis of cases from the seven surveillance sites are shown in Table 2 and Figs. 1, 2, 3 and 4. Factors associated with clustering in the univariate analysis were carried forward into the multivariable model (Table 2). Patients in the 11–25 and ≥ 26 isolates/cluster groups were more likely to be infected by the Beijing family (OR = 0.32, 95% CI 0.12–0.82 and OR = 0.23, 95% CI 0.12–0.46, respectively), to have XDR-TB (OR = 5.08, 95% CI 2.26–11.40), to live in EC (OR = 5.14, 95% CI 2.07–12.76 and OR = 6.53, 95% CI 3.46–12.35, respectively) or KZN (OR = 5.52, 95% CI 2.00–15.33), and to have a history of imprisonment (OR = 3.24, 95% CI 1.39–7.51). Individuals living in NC (OR = 0.51, 95% CI 0.29–0.87) or infected with RIF-R TB (OR = 0.60, 95% CI 0.40–0.91) were less likely to belong to a cluster of > 5 isolates (Table 2). However, being HIV positive, having been previously treated, having smear grading 2+ or 3+, and having MDR-TB were associated with large or very large clusters only in the univariate analysis.

Fig. 1 Coefficient plots of adjusted odds ratios with 95% confidence intervals from multinomial logistic regression analysis (Cluster = 2–5 cases)

Fig. 2 Coefficient plots of adjusted odds ratios with 95% confidence intervals from multinomial logistic regression analysis (Cluster = 6–10 cases)

Fig. 3 Coefficient plots of adjusted odds ratios with 95% confidence intervals from multinomial logistic regression analysis (Cluster = 11–25 cases)

Fig. 4 Coefficient plots of adjusted odds ratios with 95% confidence intervals from multinomial logistic regression analysis (Cluster = ≥ 26 cases)

Multivariate analysis was not performed for WC because no risk-factor (questionnaire) data were available. In addition, some of the numbers in the available demographic and laboratory data were small, so the confidence intervals of the regression model were too wide for the analysis to be meaningful. In the univariate analysis, smear grading 2+ was significantly associated with cluster size n = 11–25. Patients infected with MDR-TB or XDR-TB were more likely to be in the ≥ 26 isolates/cluster group or in clusters of size n = 6–10. Patients infected with Beijing strains had 15.8 times the odds of being in a cluster of ≥ 26 cases, but only 1.72 times the odds of being in a small cluster (2–5 cases) (Table 3).

Discussion

The DR-TB epidemic has been attributed to several drivers, including environmental, social, and host-related risk-factors that promote transmission. In high-burden settings such as SA, considerable demographic and geographic heterogeneity in DR-TB transmission exists, implying that specific risk groups as well as high-burden areas might be prioritized for targeted intervention. Thus, in this study, we analyzed potential risk-factors for genotypic clustering in SA, during a five-year period, by comparing demographic, clinical and epidemiologic characteristics with cluster sizes. To our knowledge, this study is the largest that has been conducted in SA to assess risk-factors related to transmission.

The majority (81.6%) of the clusters identified in the study were small; the few large and very large clusters were identified mainly in districts in WC, EC and MP. Being part of a cluster suggests that M. tuberculosis was recently transmitted to the patient [1]. The size of clusters could depend on a number of factors related to the host, the environment or differences between the strains themselves. In this study, specific cluster sizes were associated with patient demographic, clinical or epidemiological characteristics. Cases in either large or very large molecular clusters were more likely to have multiple risk-factors.

Variation in the distribution of DR-TB clusters across settings indicates different transmission dynamics. Living in Nelson Mandela Metro, EC was found to be a risk-factor in both univariate and multivariate analysis. EC province has the third highest number of people infected with DR-TB in SA [20]. The cases in the current study were from Port Elizabeth in the Nelson Mandela district, which is one of the major cities in the EC. TB is a major public health challenge in this district. One in 100 people is infected with TB, and 90% of those diagnosed with TB are also co-infected with HIV and/or AIDS. The district also has the third highest rate of loss to follow-up on treatment [21]. Given these challenges, the current TB control strategy needs to implement rigorous TB and DR-TB surveillance systems for early case detection and treatment, as well as improved transmission control measures.

In contrast, living in Mangauang, FS or Frances Baard, NC was associated with small cluster size in the univariate analysis, and the association remained significant for Frances Baard, NC in the multivariate analysis. Small cluster sizes may indicate limited transmission among close contacts or reactivation of disease, emphasizing the importance of contact investigations and infection control as the primary interventions in these areas.

In the univariate analysis, XDR-TB was associated with all cluster sizes except small clusters. In the multivariate analysis, RR-TB was associated with small cluster sizes, while XDR-TB was associated with medium and very large cluster sizes. DR-TB strains are more likely to be clustered than drug-sensitive strains because the long treatment duration might provide greater opportunity for transmission. Community-based active case-finding interventions are important, particularly in settings where DR-TB transmission may be attributed to community contacts. In addition, educating household and community members about DR-TB transmission, attitudes and prevention practices is needed.

In high-incidence settings, smear positivity is expected to be associated with clustering. Almost 90% of TB transmission in the community is associated with sputum smear-positive cases [22,23,24], as smear-positive patients often have more advanced disease and higher bacterial loads than those who are smear-negative. In addition, patients with higher smear grading could be more likely to transmit disease, and their contacts more likely to develop active TB, than those with lower grading [23]. A meta-analysis reported that, compared with scanty, sputum smear grading 2+ and 3+ were significant risk-factors in all the studies included [25]. In this study, smear grading 2+ and 3+ were associated with clustering only in the univariate analysis. The lack of association between cluster size and smear grading might be due to source cases being outside the sampled study population. The first cases of large clusters are usually not identifiable, and the majority of TB transmission in high-burden settings does not come from known contacts [26,27,28].

Being HIV positive was a risk-factor for clustering (small and very large cluster sizes) in the univariate analysis, but was not significant in the multivariate analysis. The role of HIV coinfection remains unclear, with some studies finding an increase in clustering of TB with HIV infection [12, 29] and others finding no association [30,31,32,33].

Workers in certain occupational sectors, such as mining and healthcare, are at particular risk for TB and its transmission. In this study, neither working in a mine nor working in the health sector was associated with clustering in the multivariate analysis. However, the majority (68%) of cases in this study were unemployed, which might limit the statistical power of this finding.

Large clusters of prevalent genotypes can become established in a certain area due to prolonged and uncontrolled transmission. The univariate analysis in this study showed that cases infected with the Beijing genotype were more likely to be in medium-sized clusters (6–10 cases). The multivariate analysis, however, showed an association of Beijing genotypes with all cluster sizes. Beijing strains are widely distributed globally and are known to be associated with high clustering [34,35,36,37]. The majority of large or very large clusters in this study belonged to the Beijing family, which suggests that these strains might have greater transmissibility [37].

Multivariate analysis was not performed for districts in WC, as no questionnaire data were available. In the univariate analysis, smear positivity was a risk-factor for medium and large cluster sizes. Patients infected with MDR-TB had 7.7 times the odds of being in a very large cluster (≥ 26 cases). Patients infected with the Beijing genotype had 15.8 times the odds of being in a cluster of ≥ 26 cases, compared with only 1.49 times the odds of being in a small cluster (2–5 cases). The Beijing genotype is an endemic strain in WC. It was linked to an outbreak of MDR-TB at a school [38], and a subgroup of the Beijing family of strains (R220 genotype) was identified as a commonly transmitted DR-TB strain in the province [39].

The study had a number of limitations. First, there is selection bias in the study population because only culture-positive samples from selected districts were included. In addition, as a sentinel surveillance, the study included only patients who accessed health care and consented to provide a second sputum sample; patients who did not consent, were undiagnosed and/or died in the community would not have been included. As a result, our findings may not be generalizable to the entire SA population. Second, we were not able to obtain risk-factor data for all enrolled participants. Third, sample collection in the different provinces occurred during different time periods due to implementation considerations (approvals, logistics etc.), which could have affected the clustering analysis. Areas with shorter sampling durations may have missed transmission events, leading to underestimation of clustering. Fourth, the clusters in the study were not supported by contact investigations to confirm the linkages between clustered isolates using epidemiological data. Lastly, clustering and recent TB transmission rates may have been overestimated, because the clustering analysis was based on traditional typing, whereas whole-genome sequencing (WGS) could have offered better resolution of strains and further discrimination between individuals in clusters. Despite these limitations, our study provides important information on risk-factors that might be contributing to the high DR-TB transmission in SA.

Conclusion

Sociodemographic, clinical and bacterial risk-factors influenced the rate of M. tuberculosis genotypic clustering. Hence, high-risk groups and hotspot areas for clustering in EC, WC, KZN and MP should be prioritized for targeted intervention to prevent ongoing DR-TB transmission.