Arenaviruses are small RNA viruses with two ambisense genome segments. The large (L) segment encodes the viral polymerase (L) and the matrix protein (Z), while the small (S) segment encodes the glycoprotein (GP) and the nucleoprotein (NP). On each segment, the two open reading frames (ORFs) are separated by a structured non-coding intergenic region (IGR) that facilitates transcription termination (Fig. 1A) [1,2,3]. The untranslated regions (UTRs) at the genome termini contain highly conserved, complementary nucleotides that are critical for viral RNA synthesis [4,5,6].

Fig. 1

Analysis of Tacaribe virus non-coding sequences. (A) Schematic diagram of the genome, indicating discrepancies in non-coding regions. Open reading frames (colored boxes), terminal non-coding regions (grey boxes) and intergenic regions (hairpin) are indicated. Arrowheads indicate the location of missing data and discrepancies. (B) Genome termini. Sequences determined in this study (TCRV-EXP, shown in bold) were compared to the reference sequences for strain TRVL-11573 (TCRV-REF) and the 2012 Florida strain (TCRV-Florida) as well as Junín virus (JUNV, strain Romero) and Machupo virus (MACV, strain Carvallo). Mismatches are shown in red, and missing data are indicated by dashes. Complementarity of the genome end sequences is shown, with the promoter region at the 3’ end boxed. (C) Intergenic regions. Annotation is as described for (B). GenBank accession numbers are as follows: TCRV-EXP (S segment, MT081316; L segment, MT081317), TCRV-REF (S segment, M20304; L segment, J04340), TCRV-Florida (S segment, KF923400; L segment, KF923401). JUNV and MACV S and L segment sequences were as published in references [23] and [24], respectively.

The arenaviruses that infect mammals (i.e., mammarenaviruses) are divided into the Old World arenaviruses, which are primarily found in Africa, and the New World arenaviruses, which are mostly found in South America. Tacaribe virus (TCRV; species Tacaribe mammarenavirus) is the prototype of such New World arenaviruses (also known as TCRV serocomplex viruses). While many members of this group are causative agents of hemorrhagic fever, TCRV is not itself a human pathogen, making it important both for comparative pathogenesis studies and as a BSL2 system for basic research into arenavirus biology [7, 8].

TCRV was originally isolated from dead bats collected in Trinidad as part of a rabies surveillance program at the Trinidad Regional Virus Laboratory (TRVL). Further efforts led to several additional isolations during the period from 1956 to 1958; however, only the strain TRVL-11573 was preserved [9]. It has since been disseminated to laboratories worldwide, where it has formed the basis for all molecular biology research on this virus. Indeed, it remained the only strain in existence until 2012, when a nearly identical virus isolate was recovered from ticks collected in a Florida state park [10]. Unsurprisingly, given its importance for research, sequences for both segments of the TRVL-11573 strain were generated early on and have been available in the GenBank database since 1993 (accession nos. M20304 [S] and J04340 [L]) [11,12,13,14]. The genome sequence established by these reference sequences is coding-complete and has formed an important basis for many molecular and functional studies. However, more recent studies have increasingly suggested that these sequences also contain significant errors [10, 15, 16]. Furthermore, no currently available TCRV sequence includes the 5’ end of the S segment, information that is critical for the development of molecular systems dependent on viral RNA synthesis (e.g., reverse genetics systems). To address these issues, we have generated a complete (end-to-end) genome sequence based on the TCRV prototype strain TRVL-11573 using modern sequencing methods.

Provenance and sequencing

TCRV (strain TRVL-11573) [9] was obtained through the University of Geneva and was originally sourced from the Arbovirus Reference Laboratory of the CDC [17]. Virus stocks were grown on Vero76 cells (CCLV-RIE0228), and viral RNA was isolated from the culture supernatants using a QIAamp Viral RNA Mini Kit (QIAGEN) and reverse transcribed using virus-specific primers and Superscript III (Invitrogen). The resulting cDNA was then used with iProof (Bio-Rad) to amplify specific overlapping regions of the genome, which were then purified using a NucleoSpin Gel and PCR Clean-Up Kit (Macherey-Nagel). Genome ends were amplified from cDNA using ligation-anchored PCR, as described previously [18,19,20]. Briefly, for 3’ end amplification, a 3’-end-blocked linker (/5Phos/GAAGAGAAGGTGGAAATGGCGTTTTGG/3Phos/) was ligated to the viral RNA using T4 RNA ligase (NEB) prior to reverse transcription with a gene-specific primer and subsequent PCR using a gene-specific primer and a primer complementary to the linker sequence. In contrast, for 5’ end amplification, cDNA was synthesized using an internal gene-specific primer and cleaned up using a QIAquick PCR Purification Kit (QIAGEN) prior to linker ligation and PCR as described above. Sanger sequencing of all products with specific primers was performed by Eurofins/GATC. Additional details of the experimental protocols are available on request. IGR folding predictions were performed using Mfold [21].
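
The ligation-anchored PCR described above relies on a primer that anneals to the ligated linker. As a minimal sketch, the reverse complement of the stated linker sequence can be derived as follows. The function name is illustrative, and the result is not necessarily the primer used in this study, which is not specified here.

```python
# Linker sequence as given in the text; the /5Phos/ and /3Phos/
# modifications are chemical groups and are omitted from the string.
LINKER = "GAAGAGAAGGTGGAAATGGCGTTTTGG"

# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Sequence (5'->3') that is fully complementary to the linker.
    print(reverse_complement(LINKER))
```

In practice, a primer might cover only part of this sequence, depending on the desired length and melting temperature.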

Sequence properties

Sequencing of the TCRV genome revealed 7103 nucleotides for the complete L segment (GenBank accession MT081317) and 3422 nucleotides for the complete S segment (GenBank accession MT081316) (Fig. 1A) and identified 16 nucleotides that were missing from the 5’ end of the existing S segment reference sequence (Fig. 1B). Significant discrepancies were also identified in comparison to the previously reported 3’ end sequence of the L segment. Specifically, we observed differences at nucleotides 6 and 8 of the 3’ terminus that change the predicted base pairing between the 5’ and 3’ termini (Fig. 1B). The new sequence data suggest that the TCRV genome ends are identical to those of the closely related Junín virus (JUNV) and Machupo virus (MACV). Interestingly, it has been reported recently that publicly available reference sequences for JUNV and MACV also contained errors at these same positions and that such errors can hamper the development of reverse genetics systems [22,23,24]. We also identified discrepancies in the IGRs (Fig. 1C). While the single-nucleotide insertion in the L segment IGR appears to have little effect on the energetics of folding, the more extensive changes in the S segment are predicted to have a dramatic effect on the stability of the secondary structures formed in this region (ΔG = -78.0 (vRNA)/76.6 (cRNA) kcal/mol compared to ΔG = -52.4 (vRNA)/53.8 (cRNA) kcal/mol for the reference sequence). These changes indicate that the IGRs of TRVL-11573 are identical to those reported for the Florida strain (Fig. 1C).
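
To illustrate how changes at individual terminal positions alter the predicted pairing between genome ends, the following sketch lists the positions (counted from each terminus) at which the 5’ and 3’ ends fail to form a Watson-Crick pair. The sequences are arbitrary placeholders chosen for the example, not TCRV termini; G-U wobble pairs are deliberately not counted, which is a simplifying assumption.

```python
# Canonical Watson-Crick RNA pairs only (a simplification; no G-U wobble).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def unpaired_positions(five_prime_end: str, three_prime_end: str) -> list[int]:
    """Return 1-based positions, counted from the termini, that do not pair.

    Both arguments are written 5'->3'. Nucleotide i from the 5' terminus is
    compared with nucleotide i from the 3' terminus, so the 3' end is
    reversed before the comparison.
    """
    from_three_prime_terminus = three_prime_end.upper()[::-1]
    return [
        i
        for i, pair in enumerate(
            zip(five_prime_end.upper(), from_three_prime_terminus), start=1
        )
        if pair not in WATSON_CRICK
    ]


if __name__ == "__main__":
    five = "ACGGUCAU"            # placeholder 5' end, not a viral sequence
    three_paired = "AUGACCGU"    # perfectly complementary placeholder 3' end
    three_changed = "CUAACCGU"   # same, altered at positions 6 and 8 from the 3' terminus
    print(unpaired_positions(five, three_paired))   # []
    print(unpaired_positions(five, three_changed))  # [6, 8]
```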

Comparison of the coding regions also highlighted several obvious differences. In particular, the NP sequence contains two frameshift mutations (a deletion and an insertion) that result in a 4-amino-acid exchange from GPPT to DLQL (Fig. 2A) in a loop region flanking key exonuclease active site residues. While this mutation was originally proposed to explain the reduced ability of TCRV NP, in comparison to other arenavirus NPs, to inhibit type I interferon (IFN) production during infection [25], the presence of a GPPT-to-DLQL mutation could not be confirmed by more recent sequences derived from the TRVL-11573 isolate [15], nor was it found in the 2012 Florida isolate [10]. Furthermore, the sequence data for GP revealed a 12-amino-acid deletion that eliminates potential N-linked glycosylation sites that are present in the reference sequence. While it is unclear whether this difference is due to the loss of this region during virus passaging over the decades or to improvements in sequencing techniques, our observation is consistent with findings from a recently reported partial GP sequence for TRVL-11573 (KP159416) [16] (Fig. 2B) as well as the 2012 Florida strain [10], suggesting that other current isolates also lack this sequence. Overall, the sequences generated in this study support both of these reported deviations from the currently available reference sequence for the TRVL-11573 isolate of TCRV.
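
The way a nearby deletion and insertion can exchange a short run of amino acids while restoring the downstream reading frame can be sketched with a toy example. The codons below are hypothetical, chosen only so that the translation reproduces a GPPT-to-DLQL exchange; they are not taken from the TCRV NP sequence, and the codon table is limited to the codons used.

```python
# Minimal codon table covering only the hypothetical codons used below.
CODON_TABLE = {
    "GGA": "G", "CCT": "P", "CCA": "P", "ACT": "T",
    "GAC": "D", "CTC": "L", "CAA": "Q", "CTT": "L",
}


def translate(nt: str) -> str:
    """Translate complete codons of a nucleotide string in frame 1."""
    usable = len(nt) - len(nt) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


def delete_then_insert(nt: str, inserted: str) -> str:
    """Delete the first nucleotide and insert one after the last nucleotide.

    The deletion shifts the reading frame; the insertion restores the
    original frame for anything downstream of this window.
    """
    return nt[1:] + inserted


if __name__ == "__main__":
    window = "GGACCTCCAACT"                    # hypothetical codons: GPPT
    altered = delete_then_insert(window, "T")  # GACCTCCAACTT
    print(translate(window), "->", translate(altered))  # GPPT -> DLQL
```

Because the two single-nucleotide changes compensate for each other, only the residues between them differ, which is why such paired indels can be difficult to distinguish from sequencing errors without independent data.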

Fig. 2

Analysis of Tacaribe virus coding region sequences. Sequence discrepancies in the (A) nucleoprotein (NP) or (B) glycoprotein (GP) open reading frame. Arrowheads indicate the location of discrepancies, which are shown in red text in the respective sequences. Amino acid positions are indicated. Exonuclease catalytic site residues (in NP) and putative N-linked glycosylation sites (in GP) are boxed in black. GenBank accession numbers are as follows: TCRV-EXP (S segment, MT081316; L segment, MT081317), TCRV-REF (S segment, M20304; L segment, J04340), TCRV-Florida (S segment, KF923400; L segment, KF923401), TCRV-NP partial (KC329849), TCRV-GP partial (KP159416).

Interestingly, taking these discrepancies into account, TRVL-11573 shows a much higher degree of sequence similarity to the virus isolated from ticks in Florida than was reported based on the previously available reference sequence [10], with the S segments showing 99.7% identity (9 nucleotide mismatches; 6 amino acid changes) and the L segments showing 99.9% identity (10 nucleotide mismatches; 2 amino acid changes). Indeed, when all publicly available TRVL-11573 sequences, including partial sequences, are taken into account, only one nucleotide position in the S segment and two in the L segment appear to be unique to the Florida isolate. This represents an unexpected level of conservation between viruses from different countries whose isolation dates are separated by more than 50 years, particularly given the extensive laboratory passage history of TRVL-11573.
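
As a rough arithmetic check of these identity values, the sketch below computes percent identity from a mismatch count and an alignment length. It assumes that the aligned length equals the full segment lengths reported here (3422 and 7103 nucleotides); the actual alignments with the Florida sequences may be slightly shorter, and the function names are illustrative.

```python
def count_mismatches(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(mismatches: int, aligned_length: int) -> float:
    """Percent identity given a mismatch count and the aligned length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * (1 - mismatches / aligned_length)


if __name__ == "__main__":
    # Assumption: aligned length equals the full segment length.
    print(f"S segment: {percent_identity(9, 3422):.2f}%")   # 99.74%
    print(f"L segment: {percent_identity(10, 7103):.2f}%")  # 99.86%
```

Rounded to one decimal place, these values agree with the 99.7% and 99.9% stated above.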

In summary, we detected a number of significant differences in both the coding and non-coding sequences of TCRV strain TRVL-11573 compared to the early sequences that have until now represented the only publicly available reference for this important prototype arenavirus. We anticipate that the availability of the first complete sequence of TCRV, covering both the coding and non-coding regions and based on modern sequencing methods, will be instrumental for future molecular and evolutionary studies of this virus.