Background

Tuberculosis (TB) continues to be a challenge to control. Although widespread, coordinated control efforts have produced a decline in global incidence for the first time in history, TB still causes 10.4 million new cases and 1.4 million deaths worldwide each year [1].

Brazil ranks sixteenth among the world’s 22 high TB burden countries. According to World Health Organization estimates, Brazil had 63,189 incident TB cases in 2016, an incidence rate of 30.9 per 100,000 per year, and a mortality rate of 2.2 per 100,000 [1]. The state of Espírito Santo has one of the lowest TB incidence rates in Brazil (28.6/100,000/year) [2]. Vitoria, a large urban setting and the capital of Espírito Santo state, reports more than 279 TB cases each year, which gives it one of the highest incidence rates in the country (40.2/100,000/year) [3].

Understanding how TB transmission occurs is key to managing TB strategically from a public health perspective. Genotyping methods for the Mycobacterium tuberculosis complex (MTBC) have been widely used in molecular epidemiological studies [4]. These methods help to detect the spread of strains, understand the dynamics of the disease, and develop TB control strategies that limit TB spread locally and globally.

Studies have suggested that identical IS6110 RFLP patterns of M. tuberculosis (Mtb) isolates from epidemiologically linked patients reflect TB resulting from recent transmission [5, 6]. Successive cases in a transmission chain form “clusters” of patients who share Mtb strains of the same genotype [7]. If a large proportion of new TB cases in a given community result from recent transmission, this indicates that the local TB control program is inadequate.

Many studies have investigated risk factors for clustering, suggesting that patient-related risk factors are important for TB transmission [5, 6, 8]. There is substantial evidence, however, that bacterial factors also contribute to variability in cluster size and in the extent of TB transmission in a community [8]. Indeed, molecular epidemiologic studies have suggested that some strains are transmitted more successfully than others [9, 10, 11].

We genotyped a large collection of Mtb strains collected over an 11-year period in the metropolitan area of Vitoria, Espírito Santo state, Brazil, and evaluated the relationship of strain genotype and clustering with the epidemiologic, clinical, and demographic characteristics of the patients. We hypothesized that this approach would improve our understanding of the risk factors for recent TB transmission in this particular setting.

Methods

Study population

This cross-sectional study examined all TB patients newly diagnosed in the metropolitan area of Vitoria, Brazil, between 2000 and 2010. The metropolitan area comprises four municipalities (Vitória, Cariacica, Serra and Vila Velha) with about 1,200,000 inhabitants. The study sample included isolates from all patients with positive culture results. Isolates were classified according to the cluster size of their Mtb strains, and the associations of cluster size with molecular and epidemiologic features were assessed.

Genotyping methods

IS6110 restriction fragment length polymorphism (RFLP) analysis

Sputum cultures for TB diagnosis are performed routinely by the reference Mycobacteriology Laboratory at the Núcleo de Doenças Infecciosas of the Federal University of Espírito Santo (NDI – UFES). For genotype analysis, we analyzed all available stored Mtb isolates obtained consecutively at the reference laboratory.

We used the standard IS6110 RFLP protocol [12] to genotype the isolates. Briefly, genomic mycobacterial DNA was extracted, digested, and separated by gel electrophoresis. The DNA fragments resolved in the agarose gel were transferred to a Hybond N-Plus membrane (GE Healthcare Life Sciences) and hybridized with a probe made from a PCR product of the 3′ part of the PvuII fragment of IS6110. The IS6110-containing fragments on the membrane were detected by chemiluminescence (ECL direct™ nucleic acid labeling and detection system, GE Healthcare Limited, UK) and exposure to X-ray film (Hyperfilm™ ECL, GE Healthcare Limited, UK). Mtb strain 14323 was used as the reference strain for comparing RFLP patterns.

The IS6110 RFLP band patterns were analyzed with BioNumerics software version 6.5 (Applied Maths, Belgium). A dendrogram showing the degree of similarity among the isolates was constructed using the unweighted pair group method with arithmetic mean (UPGMA) and the Dice coefficient (1.0% tolerance, 1.5% optimization).
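The band comparison and dendrogram construction described above can be illustrated with a minimal sketch. It assumes that each fingerprint is a list of band positions already normalized against the reference strain (expressed as percent of gel run length), so that the 1.0% tolerance becomes an absolute window; the greedy band-matching step and the function names are our own illustrative choices, and the BioNumerics 1.5% optimization (pattern shifting) is not modelled. Groups are merged by UPGMA, that is, by the arithmetic mean of all pairwise Dice similarities between their members, until no pair of groups reaches the chosen similarity threshold.

```python
from itertools import combinations


def dice_similarity(bands_a, bands_b, tolerance=1.0):
    """Dice coefficient 2*m / (n_a + n_b), m = number of matched bands.

    Assumption: positions are normalized (percent of run length), and two
    bands match when they differ by at most `tolerance` units.
    """
    if not bands_a and not bands_b:
        return 1.0
    unused = sorted(bands_b)
    matches = 0
    for band in sorted(bands_a):
        # Greedily pair each band with the closest unused band within tolerance.
        candidates = [b for b in unused if abs(b - band) <= tolerance]
        if candidates:
            best = min(candidates, key=lambda b: abs(b - band))
            unused.remove(best)
            matches += 1
    return 2.0 * matches / (len(bands_a) + len(bands_b))


def upgma_groups(patterns, threshold):
    """Agglomerate isolates by UPGMA until the best merge falls below `threshold`.

    `patterns` maps isolate id -> list of band positions.
    Returns a list of sets of isolate ids.
    """
    ids = list(patterns)
    sim = {frozenset((i, j)): dice_similarity(patterns[i], patterns[j])
           for i, j in combinations(ids, 2)}
    groups = [frozenset([i]) for i in ids]

    def mean_similarity(g, h):
        # UPGMA linkage: mean of all pairwise similarities between members.
        total = sum(sim[frozenset((i, j))] for i in g for j in h)
        return total / (len(g) * len(h))

    while len(groups) > 1:
        (g, h), best = max(
            (((g, h), mean_similarity(g, h)) for g, h in combinations(groups, 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        groups.remove(g)
        groups.remove(h)
        groups.append(g | h)
    return [set(g) for g in groups]
```

In this sketch, identical patterns correspond to a Dice similarity of 1.0, and calling `upgma_groups(patterns, 0.7)` would group patterns at the 70% similarity level used to define families.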

Two or more isolates with identical RFLP patterns (fingerprints) were defined as belonging to a cluster, while strains whose RFLP patterns had at least 70% similarity were considered members of the same “family”. As described in other studies, isolates belonging to a cluster were considered to result from recent infection, whereas isolates whose RFLP patterns differed distinctly from every other pattern identified in the study were considered unique (non-clustered) and were taken to represent reactivation of an old infection. Clusters composed of patterns with fewer than six bands were also tested by spoligotyping, as this increases cluster reliability [13,14,15,16]. Clusters in our study were named with the abbreviation ES, for Espírito Santo State.
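The cluster definitions above, together with the cluster-size categories used in the statistical analysis, can be sketched as follows. This assumes, for illustration only, that each fingerprint has already been reduced to a canonical set of matched band positions, so that “identical RFLP pattern” means an identical sorted tuple; the isolate identifiers and band values are hypothetical.

```python
from collections import defaultdict


def size_category(cluster_size):
    """Cluster-size categories: unique, 2-5, 6-9 and >=10 isolates/cluster."""
    if cluster_size == 1:
        return "unique"
    if cluster_size <= 5:
        return "2-5"
    if cluster_size <= 9:
        return "6-9"
    return ">=10"


def classify_isolates(fingerprints):
    """Map isolate id -> (size category, needs spoligotyping confirmation).

    `fingerprints` maps isolate id -> iterable of canonical band positions;
    isolates with identical sorted band tuples form one cluster.
    """
    by_pattern = defaultdict(list)
    for isolate, bands in fingerprints.items():
        by_pattern[tuple(sorted(bands))].append(isolate)

    result = {}
    for pattern, members in by_pattern.items():
        category = size_category(len(members))
        # Clustered patterns with fewer than six bands are confirmed by spoligotyping.
        needs_spoligotyping = category != "unique" and len(pattern) < 6
        for isolate in members:
            result[isolate] = (category, needs_spoligotyping)
    return result


def clustered_fraction(classified):
    """Share of isolates in any cluster (interpreted as recent transmission)."""
    clustered = sum(1 for category, _ in classified.values() if category != "unique")
    return clustered / len(classified)
```

Applied to the counts reported in the Results (461 clustered isolates out of 959 genotyped), `clustered_fraction` would correspond to roughly 0.48.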

Spoligotyping

Isolates were also spoligotyped with a commercial kit (Ocimum Biosolutions Inc., India) according to a standard protocol [17, 18], which classifies strains into spoligotype-based families according to the presence or absence of spacer regions. Results were recorded in a 43-digit binary format and compared with SITVITWEB [19], an updated version of the SpolDB4 database [18] maintained by the Pasteur Institute of Guadeloupe (available at http://www.pasteur-guadeloupe.fr:8081/SITVITDemo/), which provides information on Mtb spoligotypes worldwide. Orphan patterns were entered into SPOTCLUST [20] to estimate the probability that a strain belongs to a given family.
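The 43-digit binary spoligotype is commonly condensed into a 15-digit octal code: spacers 1–42 are read in 14 triplets, each converted to one octal digit, and spacer 43 is appended as a final 0/1 digit. The sketch below shows that conversion together with a simplified LAM check based on the absence of spacers 21–24 and 33–36. That LAM rule is a rule-of-thumb assumption for illustration; the study’s family assignments come from SITVITWEB and SPOTCLUST, which apply fuller criteria.

```python
def binary_to_octal(binary):
    """Convert a 43-character 0/1 spoligotype ('1' = spacer present) to octal."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected a 43-character string of 0/1")
    # Spacers 1-42 in triplets -> one octal digit each; spacer 43 as last digit.
    digits = [str(int(binary[i:i + 3], 2)) for i in range(0, 42, 3)]
    return "".join(digits) + binary[42]


def octal_to_binary(octal):
    """Inverse conversion: 15 octal digits back to 43 spacer calls."""
    if len(octal) != 15 or octal[14] not in "01":
        raise ValueError("expected 14 octal digits followed by 0 or 1")
    # format(int(d, 8), "03b") expands each octal digit to three spacer calls.
    return "".join(format(int(d, 8), "03b") for d in octal[:14]) + octal[14]


def looks_like_lam(binary):
    """Simplified rule: spacers 21-24 and 33-36 (1-based) are all absent."""
    def all_absent(first, last):
        return all(binary[i - 1] == "0" for i in range(first, last + 1))
    return all_absent(21, 24) and all_absent(33, 36)
```

For example, a pattern with every spacer present converts to `777777777777771`, whereas `octal_to_binary("777777607760771")` yields a pattern lacking spacers 21–24 and 33–36, which `looks_like_lam` flags as LAM-like.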

Long sequence polymorphism (LSP)

A multiplex PCR adapted from Gibson et al. [21] was performed to identify isolates of the RDRio genotype. RDRio and non-RDRio strains were distinguished by PCR product size: a 1175-bp band indicated RDRio, whereas a 530-bp band identified non-RDRio strains.
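The band-size rule above can be written as a small decision function. This is a sketch under two assumptions that are ours, not the protocol’s: a sizing tolerance of ±30 bp (a hypothetical value), and the reading that detection of both bands indicates a mixed population, as labelled “MP” in Fig. 1.

```python
def call_rdrio(band_sizes_bp, tolerance_bp=30):
    """Classify an isolate from multiplex PCR amplicon sizes (in bp).

    `tolerance_bp` is a hypothetical sizing tolerance, not from the protocol.
    """
    def band_seen(target):
        return any(abs(size - target) <= tolerance_bp for size in band_sizes_bp)

    rdrio, non_rdrio = band_seen(1175), band_seen(530)
    if rdrio and non_rdrio:
        return "mixed population"  # both amplicons present (assumed reading of "MP")
    if rdrio:
        return "RDRio"
    if non_rdrio:
        return "non-RDRio"
    return "undetermined"
```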

Epidemiological, clinical and molecular characteristics

We obtained general epidemiologic characteristics including gender, age, race, schooling (years), and previous history of TB, from the Brazilian national surveillance system (SINAN) and also from laboratory records maintained at the NDI-UFES. SINAN is the Brazilian Information System for notifiable diseases and its data are publicly accessible via the website of the Data Processing Department of Brazilian Ministry of Health (DATASUS) [22].

The following socio-demographic variables were evaluated: age (< 20 years, 21–30 years, 31–40 years, 41–50 years and > 50 years), gender (male, female), race (white, black and others), and schooling (< 4 years, 4–8 years, > 8 years). The TB-related covariates were: clinical form (PTB, pulmonary; EPTB, extrapulmonary; pulmonary + extrapulmonary), chest X-ray suspicious for TB (no, yes) and result of the initial sputum smear (positive, negative). The genotype variables were spoligotype lineage (LAM, non-LAM) and RDRio status (RDRio, non-RDRio).

Statistical analysis

To identify risk factors for clustering, we performed univariate analysis using the t-test for continuous variables and the chi-square or Fisher’s exact test for categorical variables. Factors significantly associated with clustering were entered into a stepwise multiple logistic regression model to identify independent predictors of clustering. Because the distribution of isolates by cluster size was not normal, we categorized cluster size into four groups: unique patterns and clusters of 2–5, 6–9 or ≥10 isolates. Molecular and epidemiologic data were described according to this cluster-size classification. Based on a theoretical model of the determinants of TB [23], we performed crude analyses and built a hierarchical polytomous regression model to identify factors associated with cluster size. Although the cluster-size categories imply an order, they did not meet the assumptions of ordered logistic regression. We therefore chose polytomous regression, which models these categories simultaneously without assuming an order.
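For a single binary factor against clustering status, the univariate comparison can be sketched as below. The 2×2 layout, the hypothetical counts, and the switch to Fisher’s exact test whenever an expected count falls below 5 are our assumptions (the switch is a common rule of thumb, not necessarily the rule applied in the study). The chi-square p-value uses the identity that, with one degree of freedom, P(χ² > s) = erfc(√(s/2)).

```python
from math import comb, erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # One degree of freedom: P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher p-value: sum of hypergeometric probs <= observed."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Small relative slack so ties with the observed table are included.
    return sum(p for p in map(prob, support) if p <= p_observed * (1 + 1e-7))


def univariate_test(table):
    """Rows: factor present/absent; columns: clustered/unique (hypothetical)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    min_expected = min(r * k / n for r in rows for k in cols)
    if min_expected < 5:
        return "fisher", fisher_exact_2x2(table)
    return "chi-square", chi_square_2x2(table)[1]
```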

All analyzed isolates were included in the model. Unique patterns were defined as the reference group, and each of the three cluster-size categories was compared with this reference.
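The crude polytomous comparisons (each cluster-size category versus unique patterns) can be illustrated for one binary covariate. With an intercept and a single binary covariate the multinomial model is saturated, so its odds ratio for category k versus the reference equals the 2×2 cross-product ratio, and its Wald interval matches the Woolf (log-based) interval computed below. The counts are hypothetical, and adjusted odds ratios such as those in Table 3 require fitting the full model, which this sketch does not do.

```python
from math import exp, log, sqrt


def crude_or(exposed_k, unexposed_k, exposed_ref, unexposed_ref, z=1.96):
    """Cross-product odds ratio for category k vs reference, Woolf 95% CI."""
    odds_ratio = (exposed_k * unexposed_ref) / (unexposed_k * exposed_ref)
    se = sqrt(1 / exposed_k + 1 / unexposed_k + 1 / exposed_ref + 1 / unexposed_ref)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


def polytomous_crude_ors(counts, reference="unique"):
    """counts: {category: (n_exposed, n_unexposed)} -> {category: (OR, low, high)}.

    Every non-reference category is compared with the same reference group.
    """
    ref_exposed, ref_unexposed = counts[reference]
    return {category: crude_or(e, u, ref_exposed, ref_unexposed)
            for category, (e, u) in counts.items() if category != reference}
```

For example, with hypothetical counts `{"unique": (10, 20), "2-5": (20, 10), "6-9": (10, 20)}`, the 2–5 category has a crude OR of 4.0 relative to unique patterns and the 6–9 category an OR of 1.0.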

The hierarchical levels for both models were defined as follows: level 1: the molecular variables (spoligotype and RDRio genotype); level 2: the variables of level 1 and demographic variables (municipality of residence); level 3: the variables of level 2 and socio-demographic variables (age, gender, skin color and schooling); and level 4: the variables of level 3 and clinical variables (chest X-ray suspicious for TB, result of the initial sputum smear, and TB clinical form). Therefore, the total effect of each variable is adjusted for the variables at the same level and at the levels above. Descriptive data are presented as absolute and relative frequencies or as mean and standard deviation. Results of the association analyses are presented as odds ratios (OR) with 95% confidence intervals (95% CI). All analyses were conducted with the Stata® statistical package, version 13.0 (StataCorp LP, College Station, TX, USA).
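The cumulative entry of variable blocks described above can be sketched as a small helper that returns the covariate set for each level. The column names are hypothetical labels for the variables listed in the text, and the model fitting itself is not shown; each variable’s effect would be read from the model at the level where it first enters.

```python
# Hypothetical column names for the variables listed in the text,
# ordered from the most distal block (level 1) to the most proximal (level 4).
LEVELS = [
    ("molecular", ["spoligotype_lam", "rdrio"]),
    ("demographic", ["municipality"]),
    ("socio-demographic", ["age_group", "gender", "skin_color", "schooling"]),
    ("clinical", ["xray_suspicious", "sputum_smear", "clinical_form"]),
]


def covariates_for_level(level):
    """Covariates in the level-`level` model: its own block plus all blocks above."""
    if not 1 <= level <= len(LEVELS):
        raise ValueError(f"level must be between 1 and {len(LEVELS)}")
    covariates = []
    for _, block in LEVELS[:level]:
        covariates.extend(block)
    return covariates


def level_of(variable):
    """Level at which a variable first enters, i.e. where its effect is reported."""
    for index, (_, block) in enumerate(LEVELS, start=1):
        if variable in block:
            return index
    raise KeyError(variable)
```

Under this scheme, for example, age is reported from the level 3 model, adjusted for gender, skin color and schooling (same level) and for the molecular and demographic variables (levels above), but not for the clinical variables.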

Results

Between January 2000 and December 2010, 5470 TB patients were diagnosed in the metropolitan area of Vitoria. Of these, 1320 (24%) had a culture performed, and we obtained good-quality RFLP patterns from 959 (72.6%) of them.

The IS6110 RFLP analysis showed that 461 (48%) cases had an isolate belonging to a cluster and 498 (52%) had a unique pattern (Table 1). Cluster size ranged from two to 34 isolates: 108 isolates (11.2%) belonged to clusters of 10 or more isolates, 87 (9.1%) to clusters of 6–9 isolates and 266 (27.7%) to clusters of 2–5 isolates.

Table 1 Distribution of characteristics of TB patients according to their M. tuberculosis isolates’ IS6110 RFLP cluster status

All clusters were grouped into 30 RFLP families; six of these families comprised 24.1% of the clustered isolates, while 108 isolates (11.2%) belonged to the six largest clusters (≥ 10 isolates).

The largest cluster was ES14, with 34 isolates, followed by ES1b, ES8, ES14o, ES19h and ES25, with 20, 16, 15, 13 and 10 isolates, respectively (Fig. 1). The ES14 genotype has an eight-band pattern and belongs to the largest family (n = 86), whose members share this pattern plus one to three additional bands. The ES14 cluster was present throughout the 11-year period; in 2003, 68% of all TB cases belonging to the largest clusters were caused by three clonal groups, ES14, ES19h and ES25, whereas in 2007 the clusters ES14o, ES1b and ES8 accounted for 88% of all TB cases belonging to the larger clusters.

Fig. 1

Genotypic profile of the six largest clusters found in the study. (a) SIT (International Shared Type); (b) RDRio profile: (0) not RDRio, (1) RDRio, (MP) mixed population.

All isolates of the ES14 cluster were of the RDRio genotype, whereas the other large clusters, ES1b (20 isolates) and ES8 (16 isolates), were exclusively non-RDRio (wild type, WT).

Of the isolates spoligotyped, 448 (52%) were classified as LAM and 412 (48%) as non-LAM. Spoligotyping of the ES14 family showed one predominant sublineage (LAM9/SIT42; n = 42/86, 49%).

The results of the univariate analysis of variables by genotype clustering status are presented in Table 1. Patients with extrapulmonary TB (EPTB) were less likely to be infected with a clustered strain than those with pulmonary TB (PTB) (OR = 0.56, 95% CI 0.37–0.84; OR = 0.98, 95% CI 0.98–0.99, respectively). Individuals aged 21–30, 31–40, 41–50 and > 50 years were less likely to be infected with a clustered strain than individuals < 20 years of age (OR = 0.53, 95% CI 0.33–0.85; OR = 0.51, 95% CI 0.31–0.82; OR = 0.56, 95% CI 0.34–0.92; and OR = 0.45, 95% CI 0.27–0.76, respectively).

Table 2 summarizes the demographic and clinical characteristics and laboratory findings of TB patients whose isolates belonged to a cluster (2–5, 6–9 or ≥10 isolates) or had a unique genotype pattern.

Table 2 Distribution of characteristics of TB patients according to their M. tuberculosis isolates’ IS6110 RFLP cluster size

The hierarchical polytomous regression model (Table 3) showed that, at the first level, patients in the 6–9 and ≥10 isolates/cluster groups were more likely to have isolates of the LAM lineage (adjusted OR = 1.17, 95% CI 1.08–1.26; adjusted OR = 1.25, 95% CI 1.14–1.37, respectively), with unique patterns as the reference.

Table 3 Crude and adjusted odds ratio by hierarchical polytomous regression analysis of the association of characteristics of TB patients according to their M. tuberculosis isolates’ IS6110 RFLP cluster patterns (unique patterns for reference)

In contrast, patients in the 2–5 isolates/cluster group tended to be less likely to have isolates of the RDRio genotype (adjusted OR = 0.91, 95% CI 0.80–1.03), although the confidence interval included 1. At the second level, living in the municipality of Serra, within the metropolitan area of Vitória, decreased the odds of belonging to the 6–9 isolates/cluster group (adjusted OR = 0.29, 95% CI 0.10–0.84), with unique patterns as the reference group. At the third level, individuals aged 21–30, 31–40 and > 50 years were less likely than individuals < 20 years of age to belong to the 2–5 isolates/cluster group rather than to the unique-pattern group (adjusted OR = 0.49, 95% CI 0.28–0.85; OR = 0.43, 95% CI 0.24–0.77; and OR = 0.49, 95% CI 0.26–0.91, respectively).

A chest X-ray suspicious for TB was less frequent among patients infected with strains in the 2–5 isolates/cluster group than among those with unique patterns (adjusted OR = 0.35, 95% CI 0.15–0.79). Extrapulmonary disease was also less likely in patients infected with strains in the 2–5 isolates/cluster group than in those with unique patterns (adjusted OR = 0.45, 95% CI 0.24–0.85). Furthermore, TB patients whose isolates belonged to the 6–9 isolates/cluster group were more likely to be smear-positive than those with unique patterns (crude OR = 2.68, 95% CI 1.13–6.34); however, this difference was no longer statistically significant after adjustment in the regression analysis.

Discussion

Over the past two decades, many studies of TB transmission have been complemented by genotyping techniques. The IS6110 RFLP test has been used to distinguish TB due to recent transmission from reactivation disease [24]. In the present study, we evaluated the transmission dynamics of TB in Vitoria, Brazil, over an 11-year period, comparing demographic, clinical and epidemiologic characteristics with Mtb genotypes and genotype clustering. We observed that a large proportion of recently transmitted TB was due to a limited set of Mtb genotypes and that certain cluster sizes were associated with patient demographic, clinical, or epidemiological characteristics.

A limitation of our study is that the patient data were derived from SINAN, a secondary database with limited information. Data such as smoking, HIV status, drug abuse and drug susceptibility testing results were not regularly reported to SINAN at the time of the study. Nevertheless, the same database has been used in earlier disease surveillance studies [11, 22, 25, 26]. A strength of the study is its large sample size, which provides greater statistical power than most comparable studies. In addition, the long study period increases the chance of finding epidemiological links through genotyping of Mtb strains, whereas a small sample size and a poorly defined study area can lead to underestimated clustering proportions [27].

We provide evidence that six Mtb strains (clusters) consistently contributed to the high burden of recently transmitted TB from 2000 to 2010 in the metropolitan area of Vitória-ES. They accounted for 12% of all culture-confirmed TB cases in this area during this period. This observation was probably possible because of the high case coverage and the length of the study period, which increase the chance of detecting genotype clusters, as reported by van Soolingen et al. [28]. This finding is consistent with a TB control program that is failing to interrupt disease transmission in this area, and a better approach should be implemented. This failure may be related to a variety of factors, including delayed diagnosis and poor contact investigation strategies. Maciel and colleagues recently proposed a new case-finding strategy based on screening populations in neighborhoods with a high density of recently transmitted TB and on social network analysis [29].

In the present study, isolates of the ES14 family accounted for the largest proportion of recently transmitted TB cases, which suggests that these strains are either more transmissible or more likely to cause disease after infection. Strains of this family belonged to the LAM family (mostly LAM9) and to the RDRio genotype. Isolates with the particular eight-band pattern that defines this family have also been reported as predominant in studies conducted in Rio de Janeiro, São Paulo and Rio Grande do Sul in Brazil, and are frequently found among isolates originating from the Caribbean, Europe, Africa and other South American countries [21, 30,31,32]. These findings suggest that the incidence of TB in this region may be strongly influenced by a relatively small subset of actively circulating strains. In areas with a higher incidence of TB, RFLP patterns are known to be less variable than in low-incidence areas [33]. Recently, Ribeiro et al. (2015) showed that new TB cases not only cluster in space but that certain M. tuberculosis lineages tend to cluster even after controlling for known individual and socioeconomic factors that can influence transmission [29].

In our study, the clinical manifestation of TB (pulmonary or extrapulmonary) was associated with a particular cluster size (2–5 isolates/cluster). Although Gomes and colleagues found no association between the clinical manifestation of TB and clustering rates, when cases were stratified by cluster size, a larger proportion of EPTB cases fell into a particular cluster-size category [25]. Our earlier studies also demonstrated that RDRio strains are less likely than non-RDRio strains to cause extrapulmonary disease [11], and the high prevalence of RDRio strains in the present study could be associated with the level of clustering of EPTB cases. However, many other factors influence the clinical manifestations of TB, including the duration of illness before diagnosis and underlying host factors.

Our findings are consistent with several studies that have demonstrated the predominance of isolates of the LAM family and of the RDRio lineage among TB cases in Brazil [9, 11, 34,35,36]. Lazzarini and colleagues showed that the LAM1 and LAM2 sublineages belonged exclusively to the RDRio genotype, that LAM4, LAM5, LAM6 and LAM9 included both RDRio and non-RDRio isolates, and that all LAM3 isolates were non-RDRio [9], although exceptions to this rule have recently been reported [32]. Indeed, previous studies showed that RDRio strains are significantly more often found in clusters (an indication of recent transmission) than non-RDRio strains, in both Brazilian and non-Brazilian populations [36]. These data are consistent with our finding that RDRio isolates belonged to the 6–9 isolates/cluster group. Although the proportion of RDRio strains in the 6–9 isolates/cluster group was significantly greater than that of non-RDRio strains, it is not clear whether this difference can be attributed to enhanced virulence and transmissibility of RDRio strains. Previous studies suggested either that these strains were recently introduced into some regions of Brazil and evolved after their introduction, or that RDRio strains are biologically more “fit” [11, 36].

Glynn and colleagues suggested that such clustered strains are particularly transmissible or more likely to cause disease after infection [36]. Other possible explanations for their predominance are that they have been present in a geographic setting longer than other strains and have therefore had more time to become widespread, or that a founder effect is operating in some populations, with subsequent spread following human migration patterns [33].

Conclusions

Our findings suggest that strains belonging to the LAM family and the RDRio genotype are more likely to form the largest clusters (6–9 and ≥10 isolates/cluster). We confirmed that the ES14 family is the most prevalent Mtb genotype in Vitória – ES, Brazil, suggesting that a large proportion of TB cases in a city can be caused by a small set of lineages circulating there. This provides an opportunity to characterize bacterial, rather than host, factors that affect transmission. Our data suggest that once an M. tuberculosis lineage enters a community, that strain and its related family strains are more likely to propagate than strains introduced from outside.