Introduction

Corynebacterium pseudotuberculosis is a Gram-positive bacterium that infects several different species of mammals. Strains of the biovar ovis infect sheep and goats, and strains of the biovar equi infect larger mammals such as horses, camels, and buffaloes. The manifestation of the infection depends on the host [1,2,3,4]. This bacterium causes significant economic loss to animal production all over the world due to reduced production of wool, milk and meat, carcass condemnation, as well as the death of infected animals [4,5,6]. C. pseudotuberculosis can also affect humans, causing distinct kinds of lymphadenitis. Contamination occurs through contact with infected animals and consumption of infected food [4, 5, 7].

This organism affects several countries such as Australia, Brazil, Canada, Egypt, Israel, New Zealand, South Africa, United Kingdom and United States [4, 8,9,10,11,12,13,14,15,16,17]. Cases in other countries such as Portugal [18], Mexico [19] and Equatorial Guinea [20] have been reported in recent years. In the United States, C. pseudotuberculosis infections are reemerging and considered endemic [19], and the state with the highest number of cases is Texas, which borders Mexico [21]. The spread of C. pseudotuberculosis to other countries underscores the importance of improving the understanding of this bacterium. In the present study, six Mexican C. pseudotuberculosis strains were investigated, two from the biovar equi and four from the biovar ovis. This is the first time that strains of this bacterium isolated in Mexico have been completely sequenced. The two equi strains are also the first isolates of the biovar equi from this country [19]. The characterization of these strains is important for achieving a better understanding of this species, since they may present relevant features not yet identified in other strains.

Organism information

C. pseudotuberculosis is a pathogenic bacterium that belongs to the CMNR (Corynebacterium, Mycobacterium, Nocardia, and Rhodococcus) group. This group is characterized by high GC content (46–74%) and by the structure of the cell wall, which is mainly composed of peptidoglycan, arabinogalactan and mycolic acids [4, 22]. C. pseudotuberculosis is placed in the phylum Actinobacteria, class Actinobacteria, order Actinomycetales, suborder Corynebacterineae and genus Corynebacterium [23,24,25,26,27,28,29,30]. The species is considered a facultative intracellular pathogen [4, 31] which is Gram-positive, pleomorphic, non-motile, non-sporulating, mesophilic and can survive both in the host and in the soil [25, 31,32,33,34,35]. Its strains are classified into two biovars, ovis and equi, according to their host preference and nitrate reduction capacity, which is identified through the presence or absence of the narG gene in a multiplex PCR test [36]. The biovar equi can reduce nitrate and affects mostly large ruminants. The biovar ovis is not able to reduce nitrate and affects mostly small ruminants [4]. More information about classification, general features of this species and some details about the target strains are shown in Table 1 (Additional file 1).

Table 1 Classification and general features of Corynebacterium strains MEX1, MEX9, MEX25, MEX29, MEX30, and MEX31 according to the MIGS recommendations [41]

Six C. pseudotuberculosis strains, covering both biovars, were isolated in Mexico from different hosts. The strain MEX1 was isolated from a retropharyngeal abscess in a goat. The strain MEX9 was isolated from a prescapular abscess in a goat. The strain MEX25 was isolated from a parotid abscess in a sheep. The strain MEX29 was isolated from a retropharyngeal abscess in a sheep. These four strains tested negative for the narG gene in the multiplex PCR test and were classified as belonging to the biovar ovis. All ovis strains were obtained from outbreaks that occurred relatively close to Mexico City. MEX30 and MEX31 were isolated from abscesses in the pectoral muscles of two horses [19]. These two strains tested positive for the narG gene in the multiplex PCR test and were consequently classified as belonging to the biovar equi. Although both equi strains were obtained in the same city, they can be considered isolated cases.
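
The biovar assignment described above reduces to a single rule on the narG multiplex PCR result. The following minimal Python sketch illustrates that rule, assuming each PCR outcome is recorded as a boolean; the function name and the dictionary layout are illustrative and not part of the original workflow.

```python
# narG multiplex PCR results as reported in the text (True = narG detected).
NARG_PCR = {
    "MEX1": False, "MEX9": False, "MEX25": False, "MEX29": False,
    "MEX30": True, "MEX31": True,
}


def biovar_from_narg(narg_detected: bool) -> str:
    """Return the biovar implied by the narG PCR result (hypothetical helper)."""
    return "equi" if narg_detected else "ovis"


# Biovar assigned to each strain by the rule above.
BIOVARS = {strain: biovar_from_narg(hit) for strain, hit in NARG_PCR.items()}

if __name__ == "__main__":
    for strain, biovar in BIOVARS.items():
        print(f"{strain}: biovar {biovar}")
```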

To verify the phylogenetic relationship of these strains to other strains of C. pseudotuberculosis, we generated a phylogenetic tree (Fig. 1) based on the core proteome and progressive refinement, using a bootstrap value of 100. The tree was generated using the PEPR software (https://github.com/enordber/pepr.git) with the Maximum-Likelihood method. The Mexican strains clustered according to their respective biovars and host preferences, as shown in previous works [1, 37].

Fig. 1

Phylogenetic tree of the new Corynebacterium pseudotuberculosis strains of this work (inside the rectangles) together with other strains of the CMNR group. The blue rectangles highlight the biovar ovis strains and the red rectangle highlights the biovar equi strains of this work. The numbers near the nodes indicate bootstrap values

MEX30 and MEX31 were isolated in Valparaiso, in the first reported case of infection of horses in Mexico [19]. They clustered together, probably because they came from the same source, which could be the transport of infected animals. Affected horses have been identified in all regions of the US, and the state of Texas, which borders Mexico, has the highest number of cases [9, 21].

Ovis strains were isolated in Tlaxcala (MEX1) and Rio Frio de Juárez (MEX29), about 50 km from each other, and in Guanajuato (MEX9 and MEX25), 400–450 km from the other two localities. Nevertheless, the strains cluster by host rather than by locality of isolation. MEX1 and MEX9 were isolated from goats, and MEX25 and MEX29 were isolated from sheep. MEX25 and MEX29 (sheep) clustered with isolates from llama (USA) and cow (Israel), while MEX1 and MEX9 (goat) clustered with isolates from goat and sheep (Brazil), all with 100% bootstrap support. Strains of the biovar ovis are more clonal but do not show the same degree of clustering by host as those of the biovar equi [1, 37]. Given the maximum distance of 450 km between localities of isolation, this genetic structure may be better explained by farming history than by host preference. The goat and sheep farms could have had different sources of ovis strains, and transportation of infected animals, followed by contact and transmission of the disease, probably occurred between farms keeping the same host species [38,39,40].

Genome sequencing information

Genome project history

The present project is a collaboration between the National Autonomous University of Mexico (UNAM), Mexico City, Mexico, and the Federal University of Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil. The six C. pseudotuberculosis strains were isolated by UNAM researchers. Sequencing was performed at the National Reference Laboratory for Aquatic Animal Diseases (AQUACEN), and assembly and annotation were performed at the Laboratory of Cellular and Molecular Genetics (LGCM); both laboratories are located at UFMG. All genomes are complete and available at the National Center for Biotechnology Information (NCBI). This information is shown in Table 2 and conforms with MIGS recommendations [41]. As mentioned above, the present study reports the first sequencing of C. pseudotuberculosis, and the first isolation of the biovar equi, from Mexico. These data can provide new insights into the diagnosis and treatment of diseases caused by this organism.

Table 2 Project information

Growth conditions and genomic DNA preparation

The samples used in the present study are held in the sample collection of LGCM. All six strains were grown in brain heart infusion medium (BHI; HiMedia Laboratories Pvt. Ltd., India) with 1.5% bacteriological agar, supplemented with 0.5% Tween 80, at 37 °C for 72 h under rotation. Genomic DNA was extracted following the protocol of Pacheco et al. [36].

Genome sequencing and assembly

The first step in sequencing each genome was library construction, following the manufacturer’s recommendations (Ion Xpress™ Plus gDNA Fragment Library Preparation). This was performed in three steps: (i) DNA fragmentation using the Ion Shear™ Plus Reagents Kit, (ii) addition of adapters using Ion Xpress™ Barcode Adapters and (iii) library amplification using the Ion PGM™ Template OT2 200 kit (all kits from Thermo Fisher Scientific, USA). The resulting library was loaded onto the semiconductor chip Ion 318 Chip Kit v2 (Thermo Fisher Scientific) and sequenced on the Ion Personal Genome Machine™ (Thermo Fisher Scientific). The number of reads and the mean read length for the MEX1, MEX9, MEX25, MEX29, MEX30 and MEX31 strains are, respectively: 1,100,551 and 244; 1,496,261 and 201; 1,117,243 and 206; 1,371,907 and 230; 1,127,325 and 186; and 1,262,316 and 230.
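
The read counts and mean read lengths above allow a rough estimate of sequencing depth when combined with the genome sizes reported under Genome properties. The sketch below is an illustration only: it assumes depth can be approximated as reads × mean read length / genome size, uses raw read counts before any quality filtering, and introduces hypothetical names (RUNS, mean_depth) that do not come from the original pipeline.

```python
# strain: (number of reads, mean read length in bases, genome size in bp).
# Read statistics are taken from this section; genome sizes are taken from
# the "Genome properties" section.
RUNS = {
    "MEX1": (1_100_551, 244, 2_337_090),
    "MEX9": (1_496_261, 201, 2_337_578),
    "MEX25": (1_117_243, 206, 2_337_529),
    "MEX29": (1_371_907, 230, 2_337_866),
    "MEX30": (1_127_325, 186, 2_368_140),
    "MEX31": (1_262_316, 230, 2_367_880),
}


def mean_depth(reads: int, mean_read_length: float, genome_size: int) -> float:
    """Approximate depth: total sequenced bases divided by genome size."""
    return reads * mean_read_length / genome_size


if __name__ == "__main__":
    for strain, (reads, length, size) in RUNS.items():
        print(f"{strain}: ~{mean_depth(reads, length, size):.0f}x")
```

Because unfiltered reads are used, these figures are best read as upper bounds on the depth actually available to the assembler.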

The assembly process was managed using SIMBA software [42]. The quality assessment of the reads was performed using FastQC software [43]. The assemblies were performed using SPAdes version 3.6 [44] for MEX1 and MEX31, and Newbler version 2.9 (Roche, USA) for MEX9, MEX25, MEX29, and MEX30. This produced the following numbers of contigs: 6 for MEX1, 7 for MEX9, 7 for MEX25, 9 for MEX29, 33 for MEX30 and 13 for MEX31. The N50 values were: 543,202 for MEX1, 372,309 for MEX9, 543,326 for MEX25, 367,275 for MEX29, 103,276 for MEX30 and 535,978 for MEX31. The QUAST software [45] was used to evaluate the quality of the assemblies for all strains. The scaffolds were constructed using CONTIGuator software version 2.0 [46] with C. pseudotuberculosis strain 29156 (CP010795.1) as the reference for MEX9, MEX25 and MEX29, C. pseudotuberculosis strain MEX9 as the reference for MEX1, C. pseudotuberculosis strain 316 (CP003077.1) as the reference for MEX30 and C. pseudotuberculosis strain E19 (CP012136.1) as the reference for MEX31. Gap closure was performed using CLC Genomics Workbench 7 (Qiagen, USA). This process resulted in six complete genome sequences.
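
For readers unfamiliar with the N50 statistic reported above, the following sketch shows the standard definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function is illustrative rather than the QUAST implementation, and the contig lengths in the example are hypothetical.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first contig that brings the cumulative length
        # to at least half of the assembly.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([500, 300, 200, 100, 50]))
```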

Genome annotation

Genome annotation was performed in two steps: automatic annotation and manual curation. The RAST [47] and tRNAscan-SE [48] software were used in the automated annotation. An in-house script was also employed to transfer the annotation from a reference genome. The Artemis software version 16.0.0 [49], the UniProt [50] and the National Center for Biotechnology Information (NCBI) databases [51] were used in the manual curation. Putative frameshifts were analyzed using CLC Genomics Workbench 7 (Qiagen, USA) and fixed whenever possible.

Genome properties

The genome sizes of the strains are: 2,337,090 bp (base pairs) for MEX1, 2,337,578 bp for MEX9, 2,337,529 bp for MEX25, 2,337,866 bp for MEX29, 2,368,140 bp for MEX30 and 2,367,880 bp for MEX31. The respective percentages of predicted coding regions are: 86.16% for MEX1, 86.33% for MEX9, 85.94% for MEX25, 86.66% for MEX29, 83.06% for MEX30 and 86.64% for MEX31. These genome sizes and the G+C content (~52%) are consistent with other C. pseudotuberculosis studies [2, 6, 52]. There are 64 predicted RNA genes in strains of the biovar ovis (MEX1, MEX9, MEX25 and MEX29) and 66 in strains of the biovar equi (MEX30 and MEX31). The numbers (and percentages) of predicted protein-coding genes and pseudogenes of the MEX1, MEX9, MEX25, MEX29, MEX30 and MEX31 strains are, respectively: 2021 (94.22%) and 60 (2.80%); 2025 (94.36%) and 57 (2.66%); 2016 (94.07%) and 63 (2.94%); 2032 (94.73%) and 49 (2.28%); 2008 (91.77%) and 114 (5.21%); and 2058 (94.32%) and 61 (2.80%). Table 3 shows detailed information about the properties and statistics of these genomes. The numbers of genes associated with general COG functional categories [53, 54] were generated with the in-house script Blast Cog (https://github.com/aquacen/blast_cog) and are summarized in Table 4. The circular maps of the C. pseudotuberculosis MEX1 and MEX30 strains in comparison with the other strains of the present study are shown in Figs. 2 and 3, respectively.
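
The gene percentages quoted above can be checked with simple arithmetic. The sketch below assumes that the denominator is the total number of predicted genes, taken as the sum of protein-coding genes, pseudogenes and RNA genes; this assumption is not stated in the text, and Table 3 remains the authoritative source. Under that assumption the function reproduces the values reported for MEX1 and MEX30. The function name is hypothetical.

```python
from typing import Tuple


def gene_percentages(protein_coding: int, pseudogenes: int,
                     rna_genes: int) -> Tuple[float, float]:
    """Return (% protein-coding genes, % pseudogenes) of all predicted genes.

    Assumes total genes = protein-coding + pseudogenes + RNA genes.
    """
    total = protein_coding + pseudogenes + rna_genes
    return (round(100 * protein_coding / total, 2),
            round(100 * pseudogenes / total, 2))


if __name__ == "__main__":
    # Counts taken from the text above.
    print("MEX1:", gene_percentages(2021, 60, 64))
    print("MEX30:", gene_percentages(2008, 114, 66))
```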

Table 3 Genome statistics
Table 4 Number of genes associated with general COG functional categories
Fig. 2

Circular map of C. pseudotuberculosis strain MEX1 (biovar ovis) in comparison with the other strains of this study. The cluster of methylation type III, which is only present in biovar ovis strains, is highlighted in blue

Fig. 3

Circular map of C. pseudotuberculosis strain MEX30 (biovar equi) in comparison with the other strains of this study. The cluster of CRISPR-Cas, which is only present in biovar equi strains, is highlighted in blue. The nitrate reductase gene cluster is highlighted by a black rectangle

Insights from the genome sequence

The nucleotide sequences, analyzed using the Gegenees software version 2.1 [55], show high similarity (> 92%) between the strains. Higher similarity (≥ 99.7%) was found among strains belonging to the same biovar (Fig. 4). This is consistent with a previous study [1] of 15 strains of C. pseudotuberculosis, which showed similarity greater than 99% within the biovar ovis strains and at least 95% sequence similarity within the biovar equi strains. Moreover, the sequence similarity among strains isolated from the same host is higher than the similarity among strains isolated from different hosts (Figs. 1 and 4).

Fig. 4

Alignment generated using Gegenees software showing the percentage similarity among the strains, based on the accessory genome. The blue rectangle highlights the grouping of the biovar ovis. The red rectangle highlights the grouping of the biovar equi

Traditionally, the two biovars are differentiated using a nitrate reduction test, in which equi is positive, and ovis is negative [56]. Figure 3 highlights the cluster of genes related to nitrate reduction in Mexican equi strains with the black rectangle. The Protein Family Sorter tool [57] was used to search for genes or clusters of genes that may be used to differentiate the biovars. Within the six genomes of the present study, we found the cluster of genes that is related to proteins of type III restriction-modification (RM) systems [58, 59] exclusively in the biovar ovis (highlighted in blue in Fig. 2). A cluster of genes related to the proteins of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR-Cas) systems, probably belonging to type I-E [60], was found exclusively in the biovar equi (highlighted in blue in Fig. 3). Both systems work as protection systems, defending the bacteria against exogenous DNA. We analyzed 40 other sequenced strains of C. pseudotuberculosis to confirm these results in other strains. The same pattern was observed.
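
The search for biovar-specific gene clusters described above can be thought of as a presence/absence comparison. The sketch below is a conceptual illustration only and does not use the Protein Family Sorter tool or its interface: it assumes each strain is represented by a set of feature labels and returns the features present in every strain of one biovar and absent from all strains of the other. The labels, including the "core" placeholder, and the function name are hypothetical; the presence pattern encoded for the three named clusters follows the text.

```python
from typing import Dict, Set

BIOVAR = {
    "MEX1": "ovis", "MEX9": "ovis", "MEX25": "ovis", "MEX29": "ovis",
    "MEX30": "equi", "MEX31": "equi",
}

# Toy presence sets; "core" stands in for everything shared by all strains.
FEATURES: Dict[str, Set[str]] = {
    strain: ({"core", "type_III_RM"} if biovar == "ovis"
             else {"core", "CRISPR_Cas", "nitrate_reductase"})
    for strain, biovar in BIOVAR.items()
}


def biovar_exclusive(features: Dict[str, Set[str]],
                     biovar_by_strain: Dict[str, str],
                     biovar: str) -> Set[str]:
    """Features found in all strains of `biovar` and in no other strain."""
    inside = [features[s] for s, b in biovar_by_strain.items() if b == biovar]
    outside = [features[s] for s, b in biovar_by_strain.items() if b != biovar]
    if not inside:
        return set()
    shared = set.intersection(*inside)
    elsewhere = set().union(*outside)
    return shared - elsewhere


if __name__ == "__main__":
    print("ovis only:", sorted(biovar_exclusive(FEATURES, BIOVAR, "ovis")))
    print("equi only:", sorted(biovar_exclusive(FEATURES, BIOVAR, "equi")))
```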

RM systems have two main components, a DNA methyltransferase and a restriction endonuclease. The first methylates the host DNA at possible cleavage sites; the second is responsible for the cleavage of DNA from external sources [61]. RM systems are reviewed in [62]. CRISPR-Cas systems are adaptive immune systems in bacteria and archaea. They use a complex of proteins known as Cas that are responsible for acquiring new, short sequences from external sources (exogenous genetic elements). These short sequences are incorporated into the bacterial chromosome and are called CRISPRs. The CRISPRs are transcribed into small RNAs that guide the Cas proteins to recognize and cleave foreign DNA, protecting the bacterial genome [63]. Reviews of CRISPR-Cas systems can be found in [63,64,65].

Possible new drug targets were predicted using the Specialty Genes Search from the Pathosystems Resource Integration Center (PATRIC) bioinformatics resource center [66]. The result shows a new putative target, the gene nrdF2, for five of the six strains used in the present study. In the C. pseudotuberculosis MEX30 strain, this gene is annotated as a pseudogene, which can explain why it was not considered a putative target. The product of this gene is the small subunit of ribonucleotide reductase (RNR), which is involved in dNTP (deoxynucleoside triphosphate) synthesis by reducing ribonucleotides to deoxyribonucleotides. The RNRs can be classified into three classes (I, II and III). Class I is oxygen dependent and has two subclasses (Ia and Ib). Class Ia is coded by the nrdA and nrdB genes; class Ib is coded by nrdE and nrdF. Therefore, the RNR found in the biovar ovis strains belongs to class Ib [67]. Previous studies [68,69,70] show the importance of this gene for growth under normal conditions (in vitro) in Mycobacterium tuberculosis, Corynebacterium ammoniagenes and Corynebacterium glutamicum. Additionally, other studies have pointed to this gene as a potential target for an M. tuberculosis vaccine [70,71,72].

Conclusions

In the present study, we investigated six strains of C. pseudotuberculosis from different hosts and their sequenced genomes, the first whole-genome investigation of this organism from Mexico. The phylogenomic analysis suggested that the genetic structure of the biovar ovis is more influenced by animal transportation than by host preference. An in silico analysis of protein families showed two important clusters that may differentiate the biovars equi and ovis. Also, the present work identified a new putative drug target against C. pseudotuberculosis, the gene nrdF2, which has been previously described as a potential vaccine target [70,71,72]. Further in silico and in vitro analyses are required to validate these findings. These results could provide a better understanding of this organism and its mechanisms of virulence and pathogenesis, and could support the development of new diagnostics, vaccines, and treatments.