CRISPR-Cas systems are present predominantly on mobile genetic elements in Vibrio species
Bacteria are prey for many viruses that hijack the bacterial cell in order to propagate, which can result in bacterial cell lysis and death. Bacteria have developed diverse strategies to counteract virus predation, one of which is the clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR associated (Cas) proteins immune defense system. Species within the bacterial family Vibrionaceae are marine organisms that encounter large numbers of phages. Our goal was to determine the significance of CRISPR-Cas systems as a mechanism of defense in this group by investigating their prevalence, phylogenetic distribution, and genome context.
Herein, we describe all the CRISPR-Cas system types and their distribution within the family Vibrionaceae. In Vibrio cholerae genomes, we identified multiple variant type I-F systems, which were also present in 41 additional species. In a large number of Vibrio species, we identified a mini type I-F system comprised of tniQcas5cas7cas6f, which was always associated with Tn7-like transposons. The Tn7-like elements, in addition to the CRISPR-Cas system, also contained additional cargo genes such as restriction modification systems and type three secretion systems. A putative hybrid CRISPR-Cas system was identified containing type III-B genes followed by a type I-F cas6f and a type I-F CRISPR that was associated with a prophage in V. cholerae and V. metoecus strains. Our analysis identified CRISPR-Cas types I-C, I-E, I-F, II-B, III-A, III-B, III-D, and the rare type IV systems as well as cas loci architectural variants among 70 species. All systems described contained a CRISPR array that ranged in size from 3 to 179 spacers. The systems identified were present predominantly within mobile genetic elements (MGEs) such as genomic islands, plasmids, and transposon-like elements. Phylogenetic analysis of Cas proteins indicated that the CRISPR-Cas systems were acquired by horizontal gene transfer.
Our data show that CRISPR-Cas systems are phylogenetically widespread but sporadic in occurrence, actively evolving, and present on MGEs within Vibrionaceae.
Keywords: CRISPR-Cas systems, Vibrio species, Mobile genetic elements, Transposons, Tn7, Genomic islands, Horizontal gene transfer
Cas: CRISPR associated proteins
CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats
DRs: Direct repeats
R end: Right end attachment site
L end: Left end attachment site
MGEs: Mobile genetic elements
PAM: Protospacer adjacent motif
T3SS: Type III secretion system
VPI: Vibrio pathogenicity island
Bacteriophages (phages) are viruses that infect bacteria by injecting their viral DNA or RNA into bacterial host cells. This foreign DNA can then circularize and replicate or integrate into the bacterial host chromosome to form a prophage by site specific recombination mediated by an integrase. Phages are abundant in many ecosystems and are estimated to outnumber bacteria by ten-fold. Some phages are useful to the bacterium by adding new genes and producing new phenotypes that can impact fitness and bacterial virulence [2, 3, 4, 5, 6, 7]. However, many phages are harmful to their bacterial host, causing bacterial cell lysis and death, and are important modulators of bacterial populations [8, 9, 10]. Bacteria have evolved mechanisms to protect against phage infection, including restriction modification (RM) systems and phage exclusion mechanisms such as receptor modification. A more recent addition to this list is the clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR associated proteins (Cas) system [11, 12, 13]. This system is a bacterial immunity defense mechanism against foreign DNA such as phages and plasmids [13, 14, 15, 16, 17]. CRISPR-Cas systems are widespread among prokaryotes, found in ~ 84% of archaeal and ~ 47% of bacterial genomes. A CRISPR-Cas system consists of three functional components: a set of cas genes, a leader sequence and a CRISPR array. The CRISPR array features direct repeats (DRs), which vary in size from 21 to 37 bp and can occur from 1 to over 100 times depending on the species and the strain. These repeats are separated by non-repetitive DNA of similar size called spacers. The spacer sequences were shown to be acquired from previously infecting phage and act as memory to protect against future infection [14, 19]. Cas proteins are encoded by the cas gene cluster in an operon, which is usually located upstream from a CRISPR array. These Cas proteins are required for expression of cas genes, new spacer acquisition and target recognition and degradation [16, 20, 21, 22, 23, 24].
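The repeat-spacer layout of a CRISPR array can be illustrated with a minimal Python sketch. It assumes the direct repeat is already known and occurs as exact copies; the function name and the toy sequences are hypothetical and are not taken from this study.

```python
def split_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`.

    Assumption: all repeats are identical copies. Real arrays often carry a
    degenerate terminal repeat, which this sketch would not recognise.
    """
    parts = array_seq.upper().split(repeat.upper())
    # parts[0] is the sequence upstream of the first repeat (leader side) and
    # parts[-1] is the sequence downstream of the last repeat; neither is a spacer.
    return parts[1:-1]


# Toy data (hypothetical sequences, not from any genome analysed here).
repeat = "GTTCACTGCCGTACAGGCAGCTTAGAAA"
spacer_1 = "ATGCATGCATGCATGCATGCATGCATGCATGC"
spacer_2 = "TTGACCGGTTAACCGGTTGACCAATTGGCCAA"
array = "AATTAATT" + repeat + spacer_1 + repeat + spacer_2 + repeat + "CCGG"

spacers = split_spacers(array, repeat)
print(len(spacers), "spacers found")
```

An array with n repeats yields n - 1 spacers, which is why the flanking pieces are discarded.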
CRISPR-Cas systems are classified based on Cas protein content and arrangements in CRISPR-Cas loci with two main classes (1 and 2) and at least six types (I, II, III, IV, V, VI) that have been defined and updated [15, 25, 26]. The first level of classification is into class 1 and class 2, which separates CRISPR systems based on the type of CRISPR RNA (crRNA)–effector protein complexes that are utilized. Class 1 systems (type I, type III, and type IV) use a multi-subunit crRNA–effector complex, which is made up of several Cas proteins bound together with the mature crRNA to form a large protein complex. This complex binds the crRNA, which acts as a guide to target complementary foreign DNA, and uses its nuclease activity to cleave the targeted sequences [27, 28]. The class 2 systems, consisting of type II, type V, and type VI, have only one effector protein (Cas9, Cas12, and Cas13, respectively) that fulfills all the functions of the multiprotein effector complex.
There are currently six types of CRISPR systems characterized, which are differentiated based on their signature cas genes, gene arrangement and direct repeats. Signature Cas proteins are the effectors that cleave target DNA/RNA and are used to differentiate between these types. These hallmark proteins are Cas3 for type I, Cas9 for type II, Cas10 for type III, Csf1 for type IV, Cas12 (Cpf1) for type V, and Cas13 for type VI [15, 25, 26]. Within the system types are subtypes that have additional signature genes and gene arrangements [15, 26]. These subtypes include seven type I subtypes (I-A, I-B, I-C, I-U, I-D, I-E, and I-F), three type II subtypes (II-A, II-B, and II-C), four type III subtypes (III-A, III-B, III-C, and III-D), six type V subtypes (V-A, V-B, V-C, V-D, V-E, V-U), and three type VI subtypes (VI-A, VI-B, VI-C) [15, 26]. Common to almost all types of CRISPR systems is the presence of Cas1 and Cas2 proteins, which function in adaptation to add new spacer sequences into the CRISPR array. However, recent studies indicate that these proteins are absent from a few active CRISPR-Cas systems, in which case the system is dependent on adaptation modules from other systems [15, 16, 23]. The most conserved CRISPR-associated proteins are Cas1 and Cas3, but in general Cas proteins are numerous and highly divergent, making classification challenging [15, 25, 26, 30].
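The signature-protein rule described above can be written as a small lookup. This is only a sketch of the type-level call: the function name is hypothetical, protein names are assumed to be spelled exactly as in the text, and subtype assignment (which also depends on gene arrangement and repeats) is not attempted.

```python
# Signature protein -> system type, as listed in the text.
SIGNATURE_TO_TYPE = {
    "Cas3": "I", "Cas9": "II", "Cas10": "III",
    "Csf1": "IV", "Cas12": "V", "Cas13": "VI",
}
# Types I, III and IV are class 1 (multi-subunit effector complex);
# types II, V and VI are class 2 (single effector protein).
TYPE_TO_CLASS = {"I": 1, "III": 1, "IV": 1, "II": 2, "V": 2, "VI": 2}
TYPE_ORDER = ["I", "II", "III", "IV", "V", "VI"]


def call_types(cas_proteins):
    """Return (type, class) pairs for every signature protein in the locus."""
    found = {SIGNATURE_TO_TYPE[p] for p in cas_proteins if p in SIGNATURE_TO_TYPE}
    return [(t, TYPE_TO_CLASS[t]) for t in TYPE_ORDER if t in found]


print(call_types(["Cas1", "Cas2", "Cas9"]))    # [('II', 2)]
print(call_types(["Cas10", "Cmr3", "Cas6f"]))  # [('III', 1)]
```

A locus lacking any signature protein returns an empty list, which mirrors the difficulty of classifying divergent or minimal systems from protein content alone.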
Spacers are sequences derived from protospacers from foreign invading mobile genetic elements (MGEs) that are incorporated into a CRISPR array between two repeat sequences generally adjacent to the CRISPR leader sequence. The protospacer adjacent motif (PAM) is a sequence consisting of two or three nucleotides immediately before the protospacer sequence and it is necessary to distinguish CRISPR targeted protospacer sequences (non-self) from the system’s own genome (self).
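As an illustration of how a PAM relates to a protospacer, the sketch below reads the nucleotides immediately before an exact spacer match in a target sequence. The function name and the toy sequences are hypothetical, and the sketch assumes an exact match on the strand supplied; strand and orientation conventions differ between system types and are not handled.

```python
from typing import Optional


def upstream_pam(target: str, spacer: str, pam_len: int = 2) -> Optional[str]:
    """Return the `pam_len` bases immediately before the first exact match of
    `spacer` in `target`, or None if there is no match or too little flank."""
    target, spacer = target.upper(), spacer.upper()
    start = target.find(spacer)   # -1 when the spacer is not found
    if start < pam_len:           # covers both "not found" and "too close to the start"
        return None
    return target[start - pam_len:start]


# Toy data (hypothetical, not from this study).
spacer = "ATGCATGCATGCATGCATGCATGCATGCATGC"
target = "TTGACC" + spacer + "AAA"

print(upstream_pam(target, spacer))             # CC
print(upstream_pam(target, spacer, pam_len=3))  # ACC
```

Collecting these flanks over many protospacer hits and tallying them is one simple way to see whether a conserved motif emerges.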
The family Vibrionaceae has eight genera with whole genome sequences available (Aliivibrio, Enterovibrio, Grimontia, Listonella, Vibrio, Photobacterium, Photococcus, Salinivibrio) that are ubiquitous in the marine environment, where phages are abundant. There have been limited studies describing CRISPR-Cas systems in this bacterial family and most studies were confined to V. cholerae, a significant pathogen of humans that causes cholera [15, 32, 33, 34, 35, 36, 37]. A type I-F CRISPR-Cas system was described within an ICP phage (International Centre for Diarrheal Disease Research, Bangladesh cholera phage), which was isolated from cholera stool samples. A CRISPR-Cas type I-E system was described in V. cholerae biotype classical strains that caused the earlier pandemics of cholera [32, 34]. The type I-E system was present within a 17-kb genomic island (GI) named GI-24. GIs are non-self-mobilizing integrative and excisive elements that can contain a diverse range of traits and are present in a subset of strains of a species and absent from others. All GIs contain a recombination module comprised of an integrase required for site specific recombination, associated attachment sites (attL and attR) and, in many cases, a recombination directionality factor [38, 39]. GI-24 contained a recombination module and was inserted between homologues of ORFs VC0289-VC0290 in the El Tor strain N16961, which lacked this island [32, 40]. Recently, a type I-F CRISPR-Cas system was identified within a 29-kb genomic island named Vibrio Pathogenicity Island-6 (VPI-6) that contained a recombination module and could excise from the chromosome as a complete unit. Thus, to date, the CRISPR-Cas systems that have been identified in V. cholerae are all present within MGEs.
Here, we determined the prevalence, diversity and phylogenetic distribution of CRISPR-Cas systems present within Vibrionaceae through comparative genomics, bioinformatics, and phylogenetic analyses of available genome sequences in the NCBI database. In addition, we examined the genomic context of each system to determine whether it was acquired as a single module or within an MGE. Several variant type I-F CRISPR-Cas systems were identified in V. cholerae and in 41 additional species. The canonical type I-F system and a variant type I-Fv in V. cholerae were present within the genomic island VPI-6. A mini type I-F system (tniQcas5cas7cas6f) was within Tn7-like transposons in V. cholerae, V. parahaemolyticus and over 40 additional species. In V. parahaemolyticus, within the Tn7-associated CRISPR-Cas system was the pathogenicity island containing the type three secretion system 2 (T3SS-2). A putative hybrid type III-B/I-F system, which contained a type III-B cas gene cluster, a cas6f gene and a type I-F CRISPR array, was identified in several V. cholerae and V. metoecus strains. The hybrid system was present within a prophage at the same genome location in both species. Multiple CRISPR-Cas system types including type I-C, I-E, I-F, II-B, III-A, III-B, III-D and the rare type IV system were uncovered. Interestingly, the majority of these CRISPR-Cas systems were identified within MGEs that included genomic islands, plasmids and transposon-like elements. A number of novel cas gene arrangements and cas gene contents were found among the systems identified. The data suggest that many variations of Cas protein content exist within the different types, and the acquisition of CRISPR-Cas systems on MGEs is a common feature in this group. Phylogenetic analysis of Cas proteins and their sporadic occurrence within a species also suggested that the CRISPR-Cas systems were acquired by horizontal gene transfer. Overall, these data show that CRISPR-Cas systems are phylogenetically widespread but not the predominant defense mechanism of this group.
CRISPR-Cas systems present in the family Vibrionaceae
Using BLAST and comparative genome analyses, we examined species belonging to the family Vibrionaceae available in the NCBI genome database for the presence of CRISPR-Cas systems. We identified eight different system types: type I-C, I-E, I-F, II-B, III-A, III-B, III-D, and IV as well as variants of these types and hybrid systems among 70 species (Additional file 1: Figure S1A). These CRISPR-Cas systems were sporadic in their occurrence and distribution within and among species. The majority of the systems were detected on MGEs such as genomic islands, plasmids, and transposon-like elements, suggesting a possible vector for horizontal gene transfer (Additional file 1: Figure S1B). The most predominant type identified was the type I system, which encompassed type I-F, type I-E and type I-C systems and accounted for 81% of the systems identified. Within the type I systems, the type I-F subtype was the most abundant and was found across four genera, 41 species, and 116 strains (Additional file 1: Figure S1C). A type II-B system was present in two Vibrio species and three Salinivibrio species (Additional file 1: Figure S1C). The type III systems, consisting of type III-A, type III-B, and type III-D, were the next most prominent type, making up 14% of the systems identified. The rare type IV system was identified on a plasmid in two strains of V. parahaemolyticus. Because the systems are present in only a few strains of any particular species, the most parsimonious conclusion from the distribution data is that CRISPR-Cas systems are not ancestral to any species within this family.
CRISPR-Cas type I-F systems in V. cholerae
To determine whether the CRISPR-Cas system and the VPI-6 island had a similar evolutionary history and were acquired together, phylogenetic analysis of the cas1 gene from the type I-F system and intV gene from VPI-6 was performed. Overall there was congruency between the intV and cas1 gene trees, with four divergent branches within the intV tree, and strains found within these four divergent branches showed a somewhat similar branching pattern in the cas1 gene tree. This would suggest a similar evolutionary history. A few strains, 2012Env-2, HE-48, A325, showed different clustering patterns between the trees. However, the bootstrap values for many branches for the cas1 tree were low indicating the branching patterns are not robust as there were limited polymorphic sites (Additional file 1: Figure S2).
A mini type I-F system within a Tn7-like transposon
[Table 1: Mini type I-F-carrying Tn7 R-end and L-end attachment sites. Recoverable column headers: Right end (R), Left end (L). Surviving entry text: RM system, Type I-C CRISPR-Cas. Table body not recovered.]
In V. cholerae HE-45, the Tn7-like transposon was inserted at another novel Tn7 insertion site downstream of inosine-5′-monophosphate dehydrogenase (IMPDH), also annotated as guanosine 5′-monophosphate oxidoreductase (guaC) (Fig. 2b) (Table 1). The Tn7-like element encompassed a 36-kb region containing a mini type I-F system and a restriction modification system. In strain 490–93, at the same genomic location in which the Tn7-like element for HE-45 was inserted, we identified a region with a mini type I-F system and a xylulose metabolism gene cluster; however, due to a short contig, we were unable to locate a Tn7-like element or R and L sites (Fig. 2b). This is the only V. cholerae strain in the NCBI genome database (> 900 genomes sequenced) that contains this metabolic cluster. Overall, it appears that these Tn7-like elements have captured not only CRISPR-Cas defense systems but also restriction modification systems.
In strain ISF-25-6, which contains a T3SS-2β on chromosome 1, a variant Tn7 associated mini CRISPR-Cas type I-F system was present on chromosome 2 at the IMPDH locus between VPA1158 and VPA1159 relative to strain RIMD2210633 (Fig. 3a) (Table 1). This region also contains a restriction modification system and the entire 35-kb region is flanked by attTn7 sites (Table 1). In non-human pathogenic strains, the Tn7 associated mini type I-F system is located in chromosome 1 at the SRP-RNA insertion site between VP0953 and VP0954 relative to RIMD2210633 (Fig. 3b). At this site, depending on the strain, two divergent CRISPR-Cas systems were present, associated with two divergent Tn7-like transposons. In one strain, CDC_K4762, the region also contained a type IV toxin antitoxin system; we were able to identify the R site, but due to a short contig the L site could not be determined (Fig. 3b) (Table 1). Comparative genomic analysis indicated that the Tn7 associated CRISPR-Cas mini type I-F was acquired at least four times in this species.
Putative hybrid CRISPR-Cas type III-B/I-F system
The CRISPR array associated with the type III-B/I-F hybrid system contained a type I-F direct repeat and a type I-F PAM (Fig. 5c and b). In addition, in two V. metoecus strains YB4D01 and RC341, a highly similar type III-B/I-F hybrid system within a prophage highly homologous to that present in V. cholerae HE-45 was identified (Fig. 5a) (Additional file 2: Table S7). Phylogenetic analysis of an integrase gene, cmr1, and cas6f genes among five strains showed no congruency suggesting no shared evolutionary history. However, there was a limited number of polymorphic sites among the strains examined (Additional file 1: Figure S3).
Next, we examined the CRISPR arrays associated with the type I-F system and the putative type III-B/I-F hybrid system identified in V. cholerae. The CRISPRMap program classified the direct repeat sequence as a type I-F system repeat in all strains analyzed (Additional file 2: Tables S1 and S7). The arrays analyzed ranged in size from 2 to 83 spacers and a total of 1504 spacers were identified. Using the CRISPRTarget program to identify spacer homology in the plasmid and phage databases, we found that 356 of the 1504 spacers matched protospacers. A total of 215 spacers matched regions within the same sequences of phages X29/phi-2 (accession number KJ572845) and Kappa, as well as three filamentous phages, fs2, fs-1 and KSF1 (Additional file 1: Figure S4). In V. cholerae strain 984–81, spacers had four targets in CTXphi. Several spacers were also found to target the Vibrio phages pYD21-A, YFJ, CP-T1, and Martha 12B12.
CRISPR-Cas type I-F systems within mobile genetic elements (MGEs)
We determined that 97% of the type I-F systems identified in this study were associated with MGEs, based on the presence of signature genes in the vicinity of the CRISPR-Cas genes and on comparative genome analysis. For example, the type I-F system in V. metoecus YB5B04 was present within an 18-kb island integrated between a gene encoding a hypothetical protein and trmA, with respect to V. metoecus OYP8G12, which lacked the island (Additional file 1: Figure S5A). The 5′ end of the island was marked with int, which encoded a putative integrase required for site specific recombination. We also identified attL and attR sites flanking the island, indicating that the 18-kb region was likely acquired as a unit by site specific recombination (Additional file 1: Figure S5A). The GC content of this region was 43% compared to the overall genome GC content of 47%, suggesting it is not ancestral to the genome. In V. parahaemolyticus A4EZ703, a type I-F system was present within an island inserted between VPA0712 and VPA0713 with respect to V. parahaemolyticus RIMD2210633, which lacked the region (Additional file 1: Figure S5B). The 63-kb island had a GC content of 41%, compared to 45% across the genome. This island contained an integrase at its 3′ end and was flanked by attL and attR sites (Additional file 1: Figure S5B). The type I-F system in V. vulnificus 93 U204 was present within a 25-kb genomic island inserted between VV1_0634 and the tRNA-Met locus with respect to V. vulnificus CMCP6, which lacked the region (Additional file 1: Figure S5C). The type I-F system was also present within a genomic island region in V. fluvialis that contained an integrase gene (Additional file 1: Figure S5D). Although the CRISPR-Cas systems were identified within different genomic islands in these strains, it is not possible to determine whether they were acquired with the island or whether they are a recent addition to the island.
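The GC content comparisons used above as evidence of foreign origin rest on a simple calculation, sketched here in Python. The function names are hypothetical and the sequences are toy strings; ambiguity codes such as N are assumed to be excluded from the count, which is a choice of this sketch rather than a method stated in the text.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of `seq`."""
    counted = [base for base in seq.upper() if base in "ACGT"]  # skip N and other codes
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(base in "GC" for base in counted) / len(counted)


def gc_difference(region: str, genome: str) -> float:
    """GC fraction of a candidate island minus that of the whole genome."""
    return gc_content(region) - gc_content(genome)


# Toy data (hypothetical): an AT-rich region inside a more GC-balanced background.
region = "ATATATGCAT"
genome = "ATGCGCGCAT" + region
print(f"region {gc_content(region):.2f}, genome {gc_content(genome):.2f}, "
      f"difference {gc_difference(region, genome):+.2f}")
```

A negative difference of a few percentage points, as reported for the islands above, is consistent with horizontal acquisition but does not prove it on its own.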
Phylogenetic analysis of the Cas6f proteins
The Cas6f proteins associated with the Tn7-like transposon mini type I-F system formed four highly divergent branches (clade VII to clade X), within which were highly variant Cas6f proteins. Some species were present on multiple distantly related branches, indicating that in some species the system was acquired multiple times from diverse sources (Fig. 6).
Type I-E CRISPR-Cas systems in V. cholerae
Type I-E CRISPR-Cas systems in Vibrionaceae
A total of 28 strains encompassing ten species of Vibrio, four species of Photobacterium, and nine species of Salinivibrio contained a type I-E system (Additional file 2: Table S4). In V. metoecus YB5B06, the type I-E system was present within GI-24, similar to V. cholerae classical strains (Fig. 7a). The type I-E system was also identified in V. albensis strains ATCC 14547 and VL426 and was present in a 12-kb genomic region inserted at the same genomic location as GI-24 with respect to N16961; however, no integrase was identified (Fig. 7a; Additional file 2: Table S4).
Vibrio azureus LC2–005 and NBRC 104587 also contained the I-E system, each with two CRISPR arrays. We did not identify any spacer hits for either of these two strains. A canonical type I-E system consisting of cas3cas8ecse2cas7cas5cas6ecas1cas2 was present in two strains of V. gazogenes, CECT 5068 and DSM 21264. The associated type I-E arrays consisted of 15 total spacers (13 and 2 spacers, respectively); however, no protospacer matches were identified. We identified a canonical type I-E system in V. harveyi ATCC 43516 with 37 spacers. In S. sharmensis DSM 18182, the type I-E system had 79 spacers with protospacer targets in Salinivibrio phage SMHB1 (Additional file 2: Table S4).
CRISPR-Cas type I-E system present within an excisable genomic island GI-24
The type I-E system present in V. harveyi ATCC 43516 was carried on an 85-kb region inserted between LA59_08695 and LA59_08700, with respect to V. harveyi ATCC 33843, which lacked the region (Fig. 7d). The 85-kb region had a GC content of 40%, compared to 45% for the whole genome; however, no integrase or transposase genes were identified. The region also contained genes for a type three secretion system (T3SS) (Fig. 7d). In P. profundum SS9 and V. halioticoli NBRC 102217, the type I-E system was identified within a homologous conjugative plasmid, suggesting horizontal transfer between these distantly related species.
Phylogenetic analysis of Cas8e proteins
The Cas8e protein sequences were aligned and a neighbor-joining tree was constructed. The branching pattern demonstrates the presence of six major clades designated I to VI (Additional file 1: Figure S7). We identified 12 V. cholerae biotype classical strains that contained highly homologous Cas8e proteins that clustered in lineage I with a Cas8e protein from V. metoecus strain YB5B06 and two Cas8e proteins from two V. albensis strains (Additional file 1: Figure S7). Divergent but related to this group were Cas8e proteins from two strains of V. azureus in lineage II. The next three divergent lineages, III, IV and V, grouped Cas8e proteins based on the genus and species they were present in. Lineage III consisted of Cas8e proteins from four strains of V. gazogenes and one strain each of V. spartinae and V. ruber that were all highly related. Lineage IV was comprised of Cas8e proteins from V. parahaemolyticus and V. harveyi that clustered together, and branching with these were Cas8e proteins from two Photobacterium species (Additional file 1: Figure S7). Lineage V was comprised of Cas8e proteins from eight strains of Salinivibrio and one strain of Photobacterium galatheae. Finally, clade VI consisted of the two most divergent Cas8e proteins, from V. halioticoli NBRC 102217 and P. profundum SS9, which contained a variant type I-E system carried on a plasmid.
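Neighbor-joining starts from a matrix of pairwise distances between aligned sequences. The sketch below computes a simple p-distance (proportion of differing sites) as one possible input; the distance model actually used for the trees is not stated in this section, so the choice of p-distance, the function names, and the toy alignment are all assumptions made for illustration.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing positions between two aligned sequences,
    ignoring any column where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances for a {name: aligned_sequence} mapping."""
    names = sorted(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Toy alignment (hypothetical protein fragments, not real Cas8e sequences).
toy = {"strain_a": "MKT-A", "strain_b": "MRTQA", "strain_c": "MKTQA"}
matrix = distance_matrix(toy)
print(matrix[("strain_a", "strain_b")])  # 0.25
```

The resulting matrix is what a neighbor-joining implementation would consume to decide which pair of sequences to join first.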
Type I-C CRISPR-Cas systems in Vibrionaceae
Previously, we identified a type I-C system in V. metschnikovii CIP 69.14. We used the Cas proteins from this species as seeds in BLAST searches to identify putative systems in the Vibrionaceae. This analysis identified type I-C CRISPR-Cas systems in 12 species: V. metschnikovii, V. cidicii, V. hangzhouensis, V. navarrensis, P. aquimaris, P. marinum, V. anguillarum, V. salilacus, V. fujianensis, Vibrio sp. V03-P4A6T147, Salinivibrio sp. DV, and Photobacterium sp. CECT 9192 (Additional file 2: Table S5). All type I-C systems identified, with the exception of the one present in V. anguillarum, contained the canonical CRISPR-Cas type I-C cas gene arrangement and a type I-C 32-bp direct repeat (Additional file 2: Table S5).
In Vibrio sp. V03-P4A6T147 and V. hangzhouensis CGMCC 1.7062, we were unable to identify CRISPR arrays due to short contig sequences. Across the remaining species, a total of 491 spacers were identified, and each had a conserved type I-C PAM. In P. marinum, there were two CRISPR arrays flanking the type I-C cas gene cluster, each with a type I-C repeat. The CRISPR arrays ranged in size from 2 spacers up to 179 spacers in Salinivibrio sp. DV, the largest array identified in this study (Additional file 2: Table S5). Protospacer targets were identified for 31 of the 491 spacers, and 16 of these 31 targets were hits to the Salinivibrio phage SMHB1.
Type I-C CRISPR-Cas system present within a Tn7-like transposon
In V. cidicii 1048–83, the type I-C system is present within a 25-kb region that contains three transposase genes and had a GC content of 40%, compared to 48% GC content for the entire genome (Fig. 9e). In V. anguillarum PF7, V. hangzhouensis and P. aquimaris, the type I-C system was within a region that contained both transposase and integrase genes. However, several strains of Vibrio, Photobacterium and Salinivibrio contained only a complete type I-C system integrated within the genome with no additional genes present, suggesting it was the sole acquisition at the insertion site (Additional file 1: Figure S8). Overall, it appears that the CRISPR-Cas systems in these species were acquired as distinct units or modules and not within any identifiable MGE.
Phylogenetic analysis of the Cas8c proteins present in Vibrionaceae showed that V. metschnikovii and V. navarrensis Cas8c proteins were closely related to each other but were the most divergent Cas8c proteins and formed a separate highly divergent branch (Fig. 9d). In V. metschnikovii CIP69.14, the type I-C system was not associated with any MGE or signature MGE genes. In V. anguillarum PF7, two divergently transcribed cas gene clusters are present, cas3cas5cas8cas7 and cas3cas5cas8cas7cas4cas1cas2. The Cas8c proteins from this species clustered within two distinct lineages, one with Cas8c from V. salilacus, Vibrio sp., V. fujianensis and Salinivibrio sp. DV and the second with Cas8c proteins from V. cidicii and V. hangzhouensis. The Cas8c proteins from three Photobacterium species clustered together with long branch lengths, indicating they are not closely related to each other (Fig. 9d).
Type II-B CRISPR-Cas system in Vibrionaceae
Analyzing the CRISPR array, a type II-B system repeat sequence of 37 bp was identified in Vibrio and in Salinivibrio strains (Fig. 10b). We used CRISPRone to detect the trans-activating crRNA (tracrRNA), which is usually located between the cas genes and CRISPR array region and is complementary to the repeat sequence of the type II-B system, allowing it to pair with the repeat fragment of the pre-crRNA for interference. We identified the tracrRNA downstream of cas1 in three out of the six strains analyzed, as shown for S. kushneri IC202 and S. sharmensis CBH463 (Fig. 10e). The inability to detect the tracrRNA in the other three strains could be due to the threshold set by the program for the pairing length, a match of 15 nucleotides with at most two mismatches. Spacer analysis identified from 3 to 51 spacers per strain, with a total of 130 spacers, and 15 putative protospacers were identified (Additional file 2: Table S6). Using these protospacers, we were able to identify the PAM sequence for these II-B systems and found it to be 3′ NGG 5′ (Fig. 10c), which is in agreement with what was previously shown in Francisella novicida.
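The pairing threshold mentioned above (15 nucleotides with at most two mismatches) can be illustrated with a brute-force search for an anti-repeat, that is, a stretch complementary to part of the CRISPR repeat. This is a sketch of the threshold idea only and is not the algorithm implemented in CRISPRone; all function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_anti_repeat(region: str, repeat: str, min_len: int = 15, max_mismatch: int = 2):
    """Return (region_start, anti_repeat_start, mismatches) for the first window of
    `min_len` bases in `region` that pairs with the repeat, or None if none does."""
    region = region.upper()
    anti = reverse_complement(repeat)  # sequence that would base-pair with the repeat
    for i in range(len(anti) - min_len + 1):
        probe = anti[i:i + min_len]
        for j in range(len(region) - min_len + 1):
            mismatches = hamming(probe, region[j:j + min_len])
            if mismatches <= max_mismatch:
                return (j, i, mismatches)
    return None


# Toy data (hypothetical repeat; one substitution introduced into its anti-repeat).
repeat = "GTCGACCTGCAGGCATGCAAGCTTGGC"
anti = reverse_complement(repeat)
mutated = anti[:5] + ("A" if anti[5] != "A" else "C") + anti[6:]
region = "CCCC" + mutated + "CCCC"
print(find_anti_repeat(region, repeat))
```

Tightening `max_mismatch` or lengthening `min_len` makes the search miss more divergent anti-repeats, which is the kind of effect the text proposes for the three strains without a detected tracrRNA.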
Phylogenetic analysis based on the Cas9 obtained from the 6 strains demonstrated two major clades. Clade separation was genus specific: Clade I contains species belonging to Salinivibrio and showed highly related Cas9 proteins among three species. Divergent from these were Cas9 proteins in clade II from two Vibrio species (Fig. 10d).
CRISPR-Cas type II-B systems present within MGEs
The type II-B system in V. natriegens CCUG 16373 was present within a 30-kb region that was inserted adjacent to a tRNA-Met locus and was absent from V. natriegens CCUG 16374 (Fig. 10a). The 30-kb region contained a restriction modification system and three integrases, one of which was adjacent to the tRNA locus, suggesting site specific integration (Fig. 10a). Within two Salinivibrio species, the type II-B system is also present within a genomic island that contains an integrase and is inserted at a tRNA locus (Fig. 10e).
Type III CRISPR-Cas systems in Vibrionaceae
We used the Cas10 protein from the putative hybrid type III-B/I-F system to determine whether other species contained type III systems within Vibrionaceae. We identified 15 species that contained a type III system (Additional file 2: Table S7). Based on cas gene arrangement and cas gene homology, three subtypes were identified: type III-A, type III-B, and type III-D (Additional file 2: Table S7). In addition to these subtypes, we also uncovered a hybrid type III-B/I-F system in V. palustris CECT 9027 and Salinivibrio sp. DV (Additional file 2: Table S7). Interestingly, the type I-F direct repeat in Salinivibrio sp. DV was identical to the repeat present in V. metoecus YB4D01 and V. cholerae (Additional file 2: Table S7). This suggests a common origin in distantly related species and recent horizontal transfer between these genera. In addition, we identified a type III-B system in V. spartinae CECT 9026 with three type I-F CRISPR arrays (Additional file 2: Table S7). In Salinivibrio sp. MA351, we identified a III-B system followed by a type I-F array, but this system also clustered with a complete type I-F system (Additional file 2: Table S7).
The genome sequences of four V. gazogenes strains, ATCC 43941, ATCC 43942, CECT 5068 and DSM 21264, each contained at least one type III system. Vibrio gazogenes ATCC 43941 and ATCC 43942 harbored identical type III-B systems on chromosome 1 with cas2cas1 divergently transcribed from hphpcmr1cas10cmr3cmr4cmr5cmr6 and with two CRISPR arrays, one at each end of the cas gene cluster (Additional file 1: Figure S9A). Strains CECT 5068 and DSM 21264 harbored a homologous type III-B system with two CRISPR arrays, found on chromosome 2 (Additional file 1: Figure S9B). These strains also contained a type III-A system with two arrays, one at each end of the cas locus (Additional files 2 and 1: Table S7; Figure S9C).
We identified an additional five strains with a type III-A system containing the cas gene arrangement of cas10cas7cas5cas7cas1cas2 (Additional file 2: Table S7). The type III-A system in these strains contained one type III-A CRISPR array, with the exception of P. aphoticum JCM 19237, which contained two type III-A CRISPR arrays. Seven strains containing a type III-D system were also identified containing cas10csm3csx10csm3csx19cas7cas6 along with cas1cas2 genes in close proximity (Additional file 2: Table S7).
In 18 of the strains with a type III system characterized in this study, cas1cas2 were present. Of note was the presence of a reverse transcriptase (RT) domain in 14 of the 18 Cas1 proteins identified. In the type III-A system of V. gazogenes CECT 5068 and DSM 21264, the Cas1 protein is fused with RT and Cas6 domains. In 11 strains with either a type III-A, III-B or III-D system, only RT and Cas1 domains are fused. In P. aphoticum JCM 19237, the RT encoding gene is adjacent to cas1. These RT containing Cas1 proteins have been shown previously to be primarily found in proximity to type III systems, are not specific to any subtype, and function autonomously. In addition, the reverse transcriptase activity of the RT-Cas1 domain is required for spacer acquisition from RNA.
Neighbor-joining trees were constructed from the Cas1 domain sequences and the Cas10 proteins to determine the evolutionary history of these proteins. In clade I of the Cas1 domain tree, the seven strains containing a type III-A system are clustered (Additional file 1: Figure S10A). This clade contained two V. gazogenes strains with a cas1 gene with cas6 and retron domains, which are distantly related to cas1 genes from Photobacterium and Vibrio species. In clade II, the Cas1 proteins from the four V. gazogenes strains containing a type III-B system cluster together (Additional file 1: Figure S10A). In these strains, the Cas1 is directly next to but transcribed divergently from the type III-B system cas genes and contains a retron domain. Clade III consists of three Vibrio species with a type III-D system. The Cas1 from these three strains has a fused RT domain (Additional file 1: Figure S10A). Clade IV contains four species with a type III-D system that formed the most divergent cluster, with a Cas1 only domain. This clade is highly divergent, characterized by long branch lengths (Additional file 1: Figure S10A).
In the Cas10 tree from the strains containing Cas1, the proteins are separated by subtype, with all type III-A proteins clustered together, all type III-B proteins clustered together and all type III-D proteins clustered together (Additional file 1: Figure S10A-B). The Cas10 proteins from the seven strains containing a type III-D system are much more closely related and cluster in a single clade (Additional file 1: Figure S10B). In the Cas10 tree, proteins from the type III-B systems are the most divergent (Additional file 1: Figure S10A-B). These data suggest that the systems within each type III subtype share a similar evolutionary history, which is not the case in the Cas1 domain tree.
CRISPR-Cas type III systems within MGEs
Phylogenetic analysis of all the Cas10 proteins identified demonstrated that these systems are highly divergent from one another. One exception is the Cas10 from the hybrid type III-B/I-F systems associated with a prophage, which all clustered together (Additional file 1: Figure S11). Branching from these type III-B/I-F hybrid systems were V. palustris CECT 9027, V. spartinae CECT 9026 and Salinivibrio sp. MA351, which all have a type III-B system and I-F arrays. In clade II, Cas10 proteins from V. gazogenes strains cluster together, demonstrating homologous type III-B systems. Clade III comprises the seven strains containing a type III-A system, which are homologous to each other and encompass Photobacterium and Vibrio species. Clade IV contains the diverse type III-D systems and comprises Cas10 proteins from Vibrio, Salinivibrio and Photobacterium species (Additional file 1: Figure S11).
Type IV CRISPR-Cas systems in Vibrionaceae
CRISPR-Cas systems are an adaptive, bacterial defense system against invading DNA (and RNA in some cases) such as phages and plasmids. Most of the studies of these systems have been conducted in Escherichia coli, Pseudomonas aeruginosa and several species of Streptococcus, leaving many families of bacteria unexamined [12, 29, 66, 67]. In this study, comparative genomics and bioinformatics analyses were used to identify the types of CRISPR-Cas systems present in marine species belonging to the family Vibrionaceae. In multiple species, different types of CRISPR-Cas systems were identified, including novel variants. The variation was marked by the presence of different cas genes and different gene arrangements.
In V. cholerae, canonical and variant type I-F systems were identified within genomic islands, and mini type I-F systems were present within Tn7-like transposons. The mini type I-F system was also identified in a large number of V. parahaemolyticus strains as well as many additional species. A previous study has suggested that the Tn7-like transposons have co-opted the mini type I-F CRISPR-Cas systems to identify and target their insertion sites. The Tn7-like transposon lacks key genes required for its function, specifically the TnsE and TnsD homologues required for integration, and the mini CRISPR-Cas system lacks genes required for target cleavage (cas3) and spacer acquisition (cas1 and cas2). Peters and co-workers hypothesize that the transposon and the CRISPR-Cas system it carries work together to form a functional element that allows for targeted integration into MGEs and/or movement within a genome. In V. parahaemolyticus, a significant pathogen of humans, the main virulence mechanism, a T3SS-2 system, is present within a CRISPR-Cas-carrying Tn7-like element. T3SS-2 systems are large genomic regions of > 80 kb that are present only in human pathogenic strains and are absent from environmental isolates [49, 52, 53]. T3SS-2 systems have been identified at different genome locations depending on the strain analyzed, and the mechanism of acquisition and insertion is unknown. Our analysis shows that the T3SS-2 systems in V. parahaemolyticus are all part of CRISPR-carrying Tn7-like transposons and indicates a mechanism by which these virulence factors can be mobilized, which needs to be investigated further.
A putative hybrid system consisting of type III-B genes and cas6f followed by a type I-F array was present in eight V. cholerae strains and two V. metoecus strains. The strains were isolated between 1974 and 2012 from three continents, and each strain contained a unique array with between 12 and 50 spacers. Although further experimental evidence is required, the type III-B/I-F system could be a true hybrid system. It has been proposed that extensive recombination events within the CRISPR-Cas region may lead to the formation of a hybrid system [16, 68]. The type III-B/I-F hybrid system is associated with a prophage region, which may explain the high similarity between the systems found in a few strains of two species, although one cannot rule out the possibility that the CRISPR-Cas system is only co-localized with the prophage.
This study is the first large-scale genome analysis to characterize the CRISPR-Cas systems in the family Vibrionaceae, members of which are marine inhabitants that encounter large numbers of viruses and thus would be good candidates for the presence of diverse CRISPR-Cas systems. In addition to type I-F systems, our analysis uncovered type I-E, I-C, III-A, III-B and III-D systems and system variants, as well as type II-B and type IV systems. Thus, there is a diversity of systems present among the different species, but their occurrence is infrequent, suggesting that these systems are not a major contributor to bacterial fitness and survival. The multiple variations in cas gene arrangement and content identified here suggest that gene rearrangements occur frequently and that the systems are constantly in flux. In addition, our analysis demonstrates that the majority of systems identified in Vibrionaceae are associated with MGEs and that many of these elements carry additional cell defense systems, mainly restriction modification systems and toxin-antitoxin systems. In these MGEs, CRISPR-Cas may play more of a role in protecting the MGE on which it is carried than the strain in which it is present, similar to the role of the type I-F system present in phages isolated from cholera stool samples [33, 37]. Phylogenetic analysis using marker Cas proteins suggests that within some species acquisition occurred multiple times, resulting in strains with multiple CRISPR-Cas systems of different types, which raises the question of whether some strains are more susceptible to uptake of these systems than others. Overall, the data show that CRISPR-Cas systems are sporadic in distribution and are not ancestral to any species of Vibrionaceae. In addition, the presence of these systems on MGEs carrying additional cargo genes such as virulence genes could suggest additional roles within the cell.
The CRISPR-Cas repertoire is an ever-expanding defense system in bacteria. In Vibrionaceae, these systems are highly diverse, with multiple subtypes, and sporadic in distribution. The identification of protospacer targets suggests that these systems were active within this family. Importantly, the association of these systems with genomic islands, plasmids, phages, and transposons suggests a vector for their transfer among species. Furthermore, the mobility of these systems can lead to novel variant subtypes, of which we have identified several examples, including a type III-B/I-F hybrid system. This work opens the door for future studies to determine how CRISPR-Cas systems affect the survival of these bacteria and whether these systems have novel functions.
CRISPR-Cas sequence analysis and predictions
The NCBI non-redundant protein sequences database was queried using BLASTp, initially using the Cas protein sequences previously identified in V. cholerae as well as signature Cas proteins identified in other species as seeds. A complete list of the bacterial strains within Vibrionaceae that contained a CRISPR-Cas system with a CRISPR array (repeat and spacer), together with the accession numbers of the sequence contigs, is provided in Additional file 2: Table S8. The FASTA sequences containing putative CRISPR-Cas systems were downloaded from the NCBI genome database and searched for CRISPR arrays (repeats and spacers) using the CRISPRDetect software tool and the CRISPRFinder tool [70]. The direct repeat sequences of putative CRISPR arrays determined by CRISPRDetect and CRISPRFinder were used as input for CRISPRMap to assign a type and subtype to each of the newly identified CRISPR arrays.
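To illustrate the repeat and spacer structure that array-detection tools report, the following minimal Python sketch splits a sequence into spacers given a known direct repeat. It is not the algorithm used by CRISPRDetect or CRISPRFinder: it assumes exact, non-degenerate repeat copies on the forward strand, and the function name `extract_spacers` and the toy sequences are hypothetical.

```python
from typing import List


def extract_spacers(sequence: str, repeat: str) -> List[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`.

    Assumption: repeats are identical copies; real arrays often contain
    degenerate repeats, which dedicated tools handle and this sketch does not.
    """
    sequence = sequence.upper()
    repeat = repeat.upper()

    # Collect the start position of every non-overlapping repeat copy.
    positions = []
    start = sequence.find(repeat)
    while start != -1:
        positions.append(start)
        start = sequence.find(repeat, start + len(repeat))

    # A spacer is the stretch between the end of one repeat and the start of the next.
    return [
        sequence[left + len(repeat):right]
        for left, right in zip(positions, positions[1:])
    ]


# Toy example (hypothetical sequences, not taken from any genome in this study).
toy_repeat = "GTTCAC"
toy_array = "AAA" + toy_repeat + "ACGTACGT" + toy_repeat + "TTGGCCAA" + toy_repeat + "CCC"
toy_spacers = extract_spacers(toy_array, toy_repeat)
```

An array with n repeat copies yields n - 1 spacers under this definition, which is how array sizes are counted in spacers.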
Identification of putative protospacers
All CRISPR spacers identified by CRISPRFinder and CRISPRDetect were used as queries for the CRISPRTarget program, with default parameters, to identify the complementary protospacer sequence of each spacer. A cutoff score of 22 was used for the analysis. Using Weblogo, we aligned the 8-bp 3′ flanking sequences of the protospacer hits to visualize the motif. Phage genomes were downloaded from the NCBI database and the distribution of protospacer hits was mapped onto the phage genome. Targeted gene loci were determined by examination of the target phage genome.
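The flanking-sequence step described above can be sketched in Python as follows. This is an illustration only, not the CRISPRTarget or Weblogo implementation: it assumes each hit is given as forward-strand, 0-based, end-exclusive coordinates on the target, and the helper names `flank_3prime` and `position_counts` are hypothetical. The per-position base counts are the kind of input from which a sequence logo is drawn.

```python
from collections import Counter
from typing import Dict, List


def flank_3prime(target: str, hit_start: int, hit_end: int, width: int = 8) -> str:
    """Return the `width` bases immediately 3' of a hit at target[hit_start:hit_end].

    Assumption: the hit lies on the forward strand of `target`; a hit on the
    reverse strand would need the reverse complement of the upstream region.
    """
    return target[hit_end:hit_end + width].upper()


def position_counts(flanks: List[str]) -> List[Dict[str, int]]:
    """Count the bases observed at each position across equally aligned flanks."""
    width = min(len(f) for f in flanks)
    return [dict(Counter(f[i] for f in flanks)) for i in range(width)]


# Toy target (hypothetical): 4 nt upstream, a 10-nt protospacer, then the flank.
toy_target = "TTTT" + "ACGTACGTAC" + "GGAATTCC" + "AAAA"
toy_flank = flank_3prime(toy_target, 4, 14)
toy_counts = position_counts(["GGA", "GGT", "GCA"])
```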
Identification of MGEs
The genomic regions surrounding the CRISPR-Cas systems were analyzed for the presence of markers of MGEs such as genomic islands, phages, and transposons. Markers included integrases (int), transposases (tnp) and site-specific attachment (att) sites. In addition, the %GC content of each region was determined and compared to the overall %GC content of the chromosome. To identify regions of difference between strains with and without a CRISPR-Cas system, the sequence/contig with the CRISPR-Cas system was compared with related strains or species lacking the system, and figures were generated using Easyfig [74]. Phaster, a phage identification tool, was used to identify phage regions associated with CRISPR-Cas systems [54]. Contigs too small for core genes to be identified were excluded from the MGE analysis.
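The %GC comparison between a candidate region and the whole chromosome can be sketched as below. This is a minimal illustration, assuming sequences are held as plain strings and that ambiguous bases (for example N) are excluded from the denominator; the function names `gc_percent` and `gc_deviation` are hypothetical, and no threshold for calling a region atypical is implied.

```python
def gc_percent(seq: str) -> float:
    """Return %GC of `seq`, counting only unambiguous A, C, G and T bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def gc_deviation(chromosome: str, start: int, end: int) -> float:
    """Return %GC of chromosome[start:end] minus %GC of the whole chromosome."""
    return gc_percent(chromosome[start:end]) - gc_percent(chromosome)


# Toy chromosome (hypothetical): an AT-rich backbone with a GC-rich insert.
toy_chromosome = "AT" * 10 + "GC" * 5 + "AT" * 10
toy_deviation = gc_deviation(toy_chromosome, 20, 30)
```

A region whose %GC departs markedly from the chromosomal value is one indication, alongside integrases, transposases and att sites, that it may have been acquired horizontally.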
Comparative analysis of CRISPR-Cas systems
The GenBank sequences of the CRISPR-Cas systems were downloaded from the NCBI genome database. The nucleotide sequences were compared using the Easyfig comparison tool. BLASTn, with default settings, was used for comparison unless otherwise stated.
Phylogenetic analysis of CRISPR-Cas systems
Evolutionary analysis was performed on Cas1, Cas6f, Cas8e, Cas8c, Cas9, and Cas10 proteins as well as integrase genes associated with CRISPR-Cas systems from species within the family Vibrionaceae. Additionally, evolutionary analysis was conducted using Cas1 and Cas10 sequences from the type III systems. Protein sequences were obtained from the NCBI database and aligned using the ClustalW algorithm. Aligned protein sequences were used to generate a Neighbor-Joining tree with 1000 bootstrap replicates, using the Poisson model of substitution with pairwise deletion in MEGA7.
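For readers unfamiliar with the distance settings named above, the sketch below computes a Poisson-corrected distance between two aligned protein sequences with pairwise deletion. It is not MEGA7 code. It assumes the standard Poisson correction d = -ln(1 - p), where p is the proportion of differing sites among the sites compared, and it assumes that `-` and `?` mark gaps or missing data; the function name `poisson_distance` is hypothetical.

```python
import math


def poisson_distance(a: str, b: str, gap_chars: str = "-?") -> float:
    """Poisson-corrected distance between two aligned protein sequences.

    Pairwise deletion: a column is skipped only when one of these two
    sequences has a gap or missing-data character at that column.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")

    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # ignore this column for this pair only
        compared += 1
        if x != y:
            differing += 1

    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    p = differing / compared
    if p >= 1.0:
        raise ValueError("distance is undefined when every compared site differs")
    return -math.log(1.0 - p)


# Toy aligned pair (hypothetical): 5 comparable sites, 1 difference, so p = 0.2.
toy_distance = poisson_distance("MKTA-I", "MRTAYI")
```

A matrix of such pairwise distances is the input to neighbor-joining; bootstrap support is obtained by resampling alignment columns and rebuilding the tree for each replicate.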
Genomic island GI-24 excision assays
PCR assays to detect both the empty site attB following excision of GI-24 and the circular intermediate attP of excised GI-24 were performed as previously described (Carpenter et al. 2015). DNA was isolated from overnight liquid incubations of V. cholerae O395 and N16961 using the NucleoSpin Tissue kit (Macherey-Nagel, Duren, Germany) following the manufacturer’s instructions. To detect excision, a nested two-stage PCR was performed to amplify the empty chromosomal site attB or the circular intermediate attP. The first round of PCR was performed in 20 μl reactions using 10 ng of isolated DNA as template. For attB, the primer pair was attB_R1L: agtttggtgcgggtatcaag and attB_R1R: gccactgcgtgactctgtta; for attP, the primer pair was attP_R1R: gctccctccttcaagtaccgctc and attP_R1L: gcgaaactgccaacgcacg. Following the first round of PCR, 1 μl of the reaction product was used as template for a second round of PCR using attB primers attBR2L: aaagtgggcgagtagggtt and attBR2R: tctggacaccatcatgcaat, and attP primers attPR2R: ccgatagcgacaatgacactgc and attPR2L: gagacccttgcacccaatccatc. The PCR products were analyzed by gel electrophoresis on a 1% agarose gel stained with ethidium bromide. Experiments were performed with two biological replicates and three technical replicates.
We thank Alexandra L. Burgess, John R. Vaile III, and Ji Kent Kwah for help with initial data collection. We gratefully acknowledge members of the Boyd Group for constructive feedback on the manuscript and thank the anonymous reviewers for their constructive comments and suggestions.
This research was supported by a National Science Foundation grant (award IOS-1656688) to E.F.B, which in part supported AR and DPM. NDM was funded by a University of Delaware graduate fellowship award and the NIH through a CBI training grant: 5T32GM008550. DPM was supported in part by Delaware INBRE: NIGMS (P20GM103446) summer undergraduate scholar’s fellowship. The funding sources played no role in the design of the study nor collection, analysis, and interpretation of data nor in the writing of the manuscript.
Availability of data and materials
All data and materials are presented within the manuscript and/or as additional supporting files.
EFB conceived of the study, directed the research, drafted and edited the manuscript. NDM, AR, JDB, and DPM performed the research and edited the manuscript. All authors have read and approved the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 11. Mojica FJ, Diez-Villasenor C, Soria E, Juez G. Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Mol Microbiol. 2000;36:244–6.
- 15. Makarova KS, Wolf YI, Alkhnbashi OS, Costa F, Shah SA, Saunders SJ, et al. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Microbiol. 2015;13:722–736.
- 16. Makarova KS, Haft DH, Barrangou R, Brouns SJ, Charpentier E, Horvath P, et al. Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol. 2011;9:467–477.
- 19. Hille F, Charpentier E. CRISPR-Cas: biology, mechanisms and relevance. Philos Trans R Soc Lond B Biol Sci. 2016;371(1707).
- 30. Koonin EV, Makarova KS. CRISPR-Cas: an adaptive immunity system in prokaryotes. F1000 Biol Rep. 2009;1:95.
- 36. Carpenter MR, Kalburge SS, Borowski JD, Peters MC, Colwell RR, Boyd EF. CRISPR-Cas and Contact-Dependent Secretion Systems Present on Excisable Pathogenicity Islands with Conserved Recombination Modules. J Bacteriol. 2017;199(10).
- 51. de Souza SM, Orth K. Intracellular Vibrio parahaemolyticus escapes the vacuole and establishes a replicative niche in the cytosol of epithelial cells. MBio. 2014;5(5):e01506–14.
- 54. Arndt D, Grant JR, Marcu A, Sajed T, Pon A, Liang Y, et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 2016;44(Web Server issue):W16–21.
- 64. Silas S, Makarova KS, Shmakov S, Paez-Espino D, Mohr G, Liu Y, et al. On the Origin of Reverse Transcriptase-Using CRISPR-Cas Systems and Their Hyperdiverse, Enigmatic Spacer Repertoires. MBio. 2017;8(4).
- 68. Silas S, Lucas-Elio P, Jackson SA, Aroca-Crevillen A, Hansen LL, Fineran PC, et al. Type III CRISPR-Cas systems can provide redundancy to counteract viral escape from type I systems. elife. 2017;6.
- 70. Grissa I, Vergnaud G, Pourcel C. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res. 2007;35(Web Server issue):W52–7.
- 74. Sullivan MJ, Petty NK, Beatson SA. Easyfig: a genome comparison visualizer. Bioinformatics. 2011;27(7):1009–10.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.