Genomic epidemiology supports multiple introductions and cryptic transmission of Zika virus in Colombia
Colombia was the second most affected country during the American Zika virus (ZIKV) epidemic, with over 109,000 reported cases. Despite the scale of the outbreak, limited genomic sequence data were available from Colombia. We sought to sequence additional samples and use genomic epidemiology to describe ZIKV dynamics in Colombia.
We sequenced ZIKV genomes directly from clinical diagnostic specimens and infected Aedes aegypti samples selected to cover the temporal and geographic breadth of the Colombian outbreak. We performed phylogeographic analysis of these genomes, along with other publicly-available ZIKV genomes from the Americas, to estimate the frequency and timing of ZIKV introductions to Colombia.
We attempted PCR amplification on 184 samples; 19 samples amplified sufficiently to perform sequencing. Of these, 8 samples yielded sequences with at least 50% coverage. Our phylogeographic reconstruction indicates two separate introductions of ZIKV to Colombia, one of which was previously unrecognized. We find that ZIKV was first introduced to Colombia in February 2015 (95%CI: Jan 2015 – Apr 2015), corresponding to 5 to 8 months of cryptic ZIKV transmission prior to confirmation in September 2015. Despite the presence of multiple introductions, we find that the majority of Colombian ZIKV diversity descends from a single introduction. We find evidence for movement of ZIKV from Colombia into bordering countries, including Peru, Ecuador, Panama, and Venezuela.
Similarly to genomic epidemiological studies of ZIKV dynamics in other countries, we find that ZIKV circulated cryptically in Colombia. More accurately dating when ZIKV was circulating refines our definition of the population at risk. Additionally, our finding that the majority of ZIKV transmission within Colombia was attributable to transmission between individuals, rather than repeated travel-related importations, indicates that improved detection and control might have succeeded in limiting the scale of the outbreak within Colombia.
KeywordsColombia Genomic epidemiology Phylogeography Virus evolution Zika virus
- Ct value
Cycle threshold value
Instituto Nacional de Salud de Colombia
real time reverse-transcription polymerase chain reaction
Universidad del Rosario
In recent years, countries across the Americas have experienced the emergence and endemic circulation of various mosquito-borne viruses, making this a critical area for public health surveillance and epidemiologic research. Zika virus (ZIKV) caused a particularly widespread epidemic, with over 800,000 suspected or confirmed cases reported . Given estimated seroprevalence rates between 36% and 76% [2, 3, 4, 5], the true number of ZIKV infections in the Americas is likely much higher. With neither a vaccine nor ZIKV-specific treatments available, understanding the epidemiology of ZIKV is our primary tool for controlling disease spread . However, because many infections are asymptomatic , the analysis of surveillance data alone yields inaccurate estimates of when ZIKV arrived in a country [7, 8]. In such cases, introduction timing and transmission dynamics post-introduction are better inferred from genomic epidemiological studies, which use joint analysis of viral genome sequences and epidemiologic case data. Indeed, such studies have defined our understanding of when ZIKV arrived in Brazil , described general patterns of spread from Brazil into other countries in the Americas [7, 9, 10], and been used to investigate the extent of endemic transmission occurring post-introduction . Genomic epidemiological studies of the spread of ZIKV in the Americas have aided our understanding of the epidemic [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], but generally, ZIKV pathogen sequencing has remained a challenge for the public health community .
Colombia has a population of approximately 48 million people. In addition, Colombia has Aedes aegypti and Ae. albopictus mosquitoes, which are commonly found at elevations below 2000m above sea level . Public health surveillance for arboviral diseases, along with other notifiable conditions, is performed by the Instituto Nacional de Salud de Colombia (INS) . While suspected cases from other municipalities were reported earlier , the INS first confirmed ZIKV circulation in mid-September 2015, in the Turbaco municipality on the Caribbean coast. ZIKV spread throughout the country, appearing in areas infested with Ae. aegypti that experience endemic dengue transmission and previous circulation of chikungunya virus . Over the entire epidemic Colombia reported 109,265 cases of Zika virus disease , making it the second most ZIKV-affected country in the Americas after Brazil. The extent of the epidemic led the INS to start active surveillance for congenital Zika syndrome  as well as other neurological syndromes associated with ZIKV infection . While the INS determined that epidemic ZIKV transmission ended in July 2016, they continue to perform surveillance for endemic transmission.
Despite numerous reported cases, only 12 whole ZIKV genomes from Colombian clinical samples were publicly available. These sequences included 1 sample from Barranquilla, Atlántico department, 4 samples from Santander department, and 7 sequences for which departmental or municipal information was unspecified. We sequenced an additional 8 samples from ZIKV-positive human clinical and Ae. aegypti specimens, sampled from previously unrepresented Colombian departments. We describe here the first detailed phylogeographic analysis of Colombian ZIKV to estimate when, and how frequently, ZIKV was introduced into Colombia.
INS sample selection and processing
The INS National Virological Reference Laboratory collected diagnostic specimens from over 32,000 suspected ZIKV cases over the course of the epidemic. Of these, roughly 800 serum specimens were positive for ZIKV by real time RT-PCR (rRT-PCR) and had cycle threshold (Ct) values less than 30. From this set we selected 176 serum specimens that were ZIKV-positive, but negative for dengue and chikungunya viruses, as per results from the Trioplex rRT-PCR assay . Specimens were selected such that each Colombian department was represented over the entire time period in which specimens were submitted. We extracted RNA using the MagNA Pure 96 system (Roche Molecular Diagnostics, Pleasanton, CA, USA) according to manufacturer’s instructions. An extraction negative was used for each plate; positive controls were eschewed given the risk of cross-contaminating low titer clinical samples . We attempted reverse transcription and PCR-amplification of ZIKV using the two-step multiplex PCR protocol developed by Quick and colleagues on all 176 extracted samples . Briefly, cDNA was generated using random hexamer priming and the Protoscript II First Strand cDNA Synthesis Kit (New England Biolabs, Ipswich, MA, USA). We amplified cDNA using the Zika Asian V1 ZIKV-specific primer scheme , which amplifies 400bp long overlapping amplicons across the ZIKV genome, over 35 cycles of PCR. Amplicons were purified using 1 × AMPure XP beads (Beckman Coulter, Brea, CA, USA) and quantified using with the Qubit dsDNA High Sensitivity assay on the Qubit 3.0 instrument (Life Technologies, Carlsbad, CA, USA). Due to long storage periods and variable storage temperature, the vast majority of samples were too degraded to amplify. Of the 176 processed samples, only 15 amplified sufficiently to perform sequencing.
UR sample selection and processing
Universidad del Rosario (UR) collected and performed diagnostic testing on 23 human clinical samples from different geographic regions, and 38 Ae. aegypti samples from the Cordoba department of Colombia. RNA was extracted using the RNeasy kit (Qiagen, Hilden, Germany) and a single TaqMan assay (Applied Biosystems, Foster City, CA, USA) directed to ZIKV was employed  to confirm ZIKV presence. Approximately 60% of samples (14 clinical samples and 23 Ae. aegypti samples) were found to be ZIKV-positive by rRT-PCR. From these, we attempted amplification on 8 samples with sufficiently high viral copy numbers that they were likely to amplify. Amplification, purification, and quantification of ZIKV amplicons from UR samples were performed as described above. Of the eight samples that we performed PCR on, four samples amplified sufficiently to conduct sequencing; three samples were from human clinical specimens and one was from an Ae. aegypti sample.
We sequenced amplicons from 4 UR and 15 INS samples using the Oxford Nanopore MinION (Oxford Nanopore Technologies, Oxford, UK) according to the protocol described in Quick et al. . Amplicons were barcoded using the Native Barcoding Kit EXP-NBD103 (Oxford Nanopore Technologies, Oxford, UK) and pooled in equimolar fashion. Sequencing libraries were prepared using the 1D Genomic DNA Sequencing kit SQK-LSK108 (Oxford Nanopore Technologies, Oxford, UK). We used AMPure XP beads (Beckman Coulter, Brea, CA, USA) for all purification steps performed as part of library preparation. Prepared libraries were sequenced on R9.4 flowcells (Oxford Nanopore Technologies, Oxford, UK) at the INS in Bogotá and at the Fred Hutchinson Cancer Research Center in Seattle.
Raw signal level data from the MinION were basecalled using Albacore version 2.0.2 (Oxford Nanopore Technologies, Oxford, UK) and demultiplexed using Porechop version 0.2.3_seqan2.1.1 (https://github.com/rrwick/Porechop). Primer binding sites were trimmed from reads using custom scripts, and trimmed reads were mapped to Zika reference strain H/PF/2013 (GenBank Accession KJ776791) using BWA v0.7.17 . We used Nanopolish version 0.9.0 (https://github.com/jts/nanopolish) to determine single nucleotide variants from the event-level data, and used custom scripts to extract consensus genomes given the variant calls and the reference sequence. Coverage depth of at least 20x was required to call a SNP; sites with insufficient coverage were masked with N, denoting that the exact nucleotide at that site is unknown. After bioinformatic assembly, 8 samples produced sufficiently complete genomes to be informative for phylogenetic analysis.
All publicly available Asian lineage ZIKV genomes and their associated metadata were downloaded from ViPR  and NCBI GenBank. The full download contained both published and unpublished sequences; we sought written permission from submitting authors to include sequences that had not previously been published on. Any sequences for which we did not receive approval were removed. Additionally, we excluded sequences from the analysis if any of the following conditions were met: the sequence had ambiguous base calls at half or more sites in the alignment, the sequence was from a cultured clone for which a sequence from the original isolate was available, the sequence was sampled from countries outside the Americas or Oceania, or geographical sampling information was unknown. Finally, we also excluded viruses whose estimated clock rate was more than 4 times the interquartile distance of clock rates across all sequences in the analysis. Sequences that deviate this greatly from the average evolutionary rate either have far too many or too few mutations than expected given the date they were sampled. This deviation usually occurs if the given sampling date is incorrect, or if the sequence has been affected by contamination, lab adaptation, or sequencing error. After curation, the final dataset consisted of 360 sequences; 352 publicly available ZIKV full genomes from the the Americas (including Colombia) and Oceania, and the 8 Colombian genomes from the present study.
Data were cleaned and canonicalized using Nextstrain Fauna (https://github.com/nextstrain/fauna), a databasing tool that enforces a schema for organizing sequence data and sample metadata, thereby creating datasets compatible with the Nextstrain Augur analytic pipeline (https://github.com/nextstrain/augur) and the Nextstrain Auspice visualization platform (https://github.com/nextstrain/auspice). A full description of the Nextstrain pipelines can be found in Hadfield et al. . Briefly, Nextstrain Augur performs a multiple sequence alignment with MAFFT , which is then trimmed to the reference sequence. A maximum likelihood phylogeny is inferred using IQ-TREE . Augur then uses TreeTime  to estimate a molecular clock; rates inferred by TreeTime are comparable to BEAST , a program that infers temporally-resolved phylogenies in a Bayesian framework . Given the inferred molecular clock, TreeTime then creates a temporally-resolved phylogeny, infers sequence states at internal nodes, and estimates the geographic migration history across the tree. These data are exported as JSON files that can be interactively visualized on the web using Nextstrain Auspice.
We generated a series of datasets in which ZIKV genomes from either Colombia or Mexico were subsampled. For each value from 1 to n, where n is the total number of available sequences for the country of interest, we generated 30 subsampled datasets in which n sequences were randomly sampled without replacement. Each subsampled set was then combined with all ZIKV genomes from other countries included in the main phylogeographic analysis described above, and the phylogeographic inference was rerun. For the Colombian rarefaction analysis, we have n=20 high quality whole genomes, so we inferred 20×30=600 phylogeographically labelled trees. For the Mexican rarefaction analysis, we have n=51 high quality whole genomes in total, and inferred 51×30=1530 phylogeographically labelled trees. For each labelled tree, we used custom scripts to traverse the resulting phylogeny and count the number of ZIKV introductions into the country of interest. We then plotted the inferred introduction count as a function of the number of sequences sampled.
Sequencing and sampling characteristics of reported ZIKV genomes
Metadata and percent of genome that was able to be called unambiguously for 19 samples that amplified sufficiently to attempt sequencing
Belén de Umbría
Valle del Cauca
Valle del Cauca
Valle del Cauca
Valle del Cauca
General patterns of ZIKV transmission in the Americas
Multiple introductions of ZIKV to Colombia
We infer patterns of ZIKV introduction and spread in Colombia from the phylogenetic placement of 20 Colombian sequences. Previously, 12 of these sequences were publicly available. We add an additional 8 Colombian sequences sampled over a broad geographic and temporal range. Colombian ZIKV sequences clustered into two distinct clades (Fig. 3b). Both clades are descended from viruses inferred to be from Brazil (Fig. 3a). Viruses immediately ancestral to both Colombian clades are estimated by the phylogeographic model to have 100% model support for a Brazilian origin. However, lack of genomic sampling from many ZIKV-affected countries in the Americas may limit our ability to infer direct introduction from Brazil or transmission through unsampled countries prior to arrival in Colombia.
Transmission within Colombia
We estimate that clade 1 was introduced to Colombia around late February of 2015 (95% CI: January 2015 to April 2015) (Figs. 3b and 4), and that clade 2 was introduced in mid September of 2015 (95% CI: July 2015 to November 2015) (Figs. 3b and 5). Our estimate of clade 1 introduction timing supports between five and eight months of cryptic ZIKV transmission within Colombia prior to initial case detection in September 2015, a finding that is consistent with other genomic epidemiological studies of ZIKV [7,8,10].
Transmission from Colombia to other countries
We find evidence for onward transmission from Colombia into other countries in the Americas. Clade 1 shows movement of viruses into countries that share a border with Colombia, namely Panama, Venezuela, and Peru, as well as into the Dominican Republic and Martinique (Fig. 4). Clade 2 indicates movement of Colombian ZIKV into neighboring Ecuador (Fig. 5). Transmission from Colombia into bordering countries seems reasonable, and these patterns agree with previously documented trends of ZIKV expansion in the Americas [7,9,10], but provide more detail due to the greater amounts of sequence data now available. For instance, analysis by Metsky et al.  also supports movement of ZIKV from Colombia to Martinique and the Dominican Republic. However, without sequence data from Panama, Peru, or Venezuela, they were unable to capture spread from Colombia into these countries.
Despite the scale of the Colombian epidemic, publicly available sequence data were limited, and no detailed genomic epidemiological analysis of ZIKV dynamics had been performed. We sought to improve genomic sampling for Colombia, and to perform a detailed genomic analysis of the Colombian epidemic. Only 12 Colombian genomes were available prior to this study. To these data we added 8 new sequences sampled broadly across Colombia, and performed a phylogeographic analysis of American ZIKV. We describe general transmission patterns across the Americas and present estimates of ZIKV introduction timing and frequency specific to Colombia. We find evidence of at least two introductions of ZIKV to Colombia, yet remarkably the majority of Colombian viruses cluster within a single clade, indicating that a single introduction event caused the majority of ZIKV cases in Colombia. Under the assumption that viruses sequenced from Colombia are random samples of Colombian ZIKV cases, we find that Colombian ZIKV diversity is well represented by 20 Colombian genomes. It is therefore unlikely that further genomic sampling would reveal more introductions of ZIKV into Colombia. ZIKV dispersal out of Colombia appears widespread, with movement to bordering countries (Panama, Venezuela, Ecuador, and Peru) as well as more distal countries in the Caribbean.
While it may be tempting to read the inferred phylogeographic migration history as a complete record of transmission between countries, we caution against doing this for analyses of ZIKV. In contrast to other large outbreaks, such as the Ebola epidemic in West Africa, genomic sampling of the American ZIKV epidemic is sparse. Many ZIKV-affected countries have minimal genomic sampling; others have none at all. Thus while the phylogeographic model will correctly infer the geographic location of internal nodes given the dataset at hand, adding sequences from previously unsampled countries may alter migration histories such that apparent direct transmission from country A to country C instead becomes transmission from country A to country B to country C.
Consistent with other studies, our estimates of when introductions to Colombia occurred support cryptic ZIKV transmission prior to initial case confirmation. Perhaps more surprisingly, our estimate of the age of clade 1 indicates that ZIKV likely spread to Colombia even before official confirmation of ZIKV circulation in Brazil . These findings underscore the utility of genomic epidemiology to date introduction events and describe transmission patterns that are difficult to detect using traditional surveillance methods, thereby providing more accurate definitions of the population at risk and a better understanding of how importation and within-country transmission shape epidemics.
We found evidence for two separate introductions of ZIKV to Colombia, one of which occurred five to eight months prior to the official confirmation of ZIKV in Colombia. Refining our estimates of when ZIKV circulated in Colombia improves our definition of when individuals were at risk for ZIKV infection. Accurately defining this exposure period increases the ability of population-level observational studies to properly measure associations between ZIKV infection and outcomes of interest. In addition, we found that the majority of ZIKV diversity in Colombia descends from one of these two introductions, and rarefaction analyses suggest that we would not identify more introductions with greater genomic sampling. Taken together, these findings suggest that most cases of ZIKV infection were attributable to ZIKV transmission within Colombia after a single introduction event, and that cases of ZIKV infection acquired in other countries and brought back to Colombia were rare. As the majority of Colombian ZIKV infections were locally-acquired, infection prevention and control measures targeting local spread might have limited the scale of the outbreak within Colombia.
We thank R. Shabman, B. Pickett, P. Rahal, L. Karan, R. Delgado, A. Enfissi, N. Grubaugh, and R. Lanciotti for giving us permission to include unpublished genomes available on GenBank in our analysis. We also thank Maria Fernanda Torres for facilitating this collaboration and Adam Geballe for generously loaning space in his laboratory at the Fred Hutch.
AB, LHM, DPR, IML, MEH, JDR, MMR, and TB conceived of the study and contributed to study design. AB, LHM, KLD, LP, AR, and CT, performed the lab work. AB and BP performed the bioinformatic assembly. AB performed the phylogeographic analysis. AB, LHM, and TB interpreted the data and wrote the manuscript. DPR, MEH, DPC, JDR, and MMR facilitated the collaboration and sought ethics approvals for the work. JDR, DPC, and MMR provided clinical samples and metadata. All authors provided critical feedback that helped shape the research and the manuscript. All authors read and approved the final manuscript.
AB is supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1256082. LHM, DPR, MEH, and IML are supported by NIH U54 GM111274. LHM is an Open Philanthropy Project fellow of the Life Sciences Research Foundation. This work was partially supported by Grupo de Virología, Dirección de Redes en Salud Pública, Instituto Nacional de Salud. Bogotá D.C, Colombia. CT and JDR are funded by Gobernación de Córdoba, sistema general de regalias (SGR) Colombia, Grant No. 754/2013 and by Dirección de Investigación e Innovación from Universidad del Rosario. TB is a Pew Biomedical Scholar and is supported by NIH R35 GM119774-01. The funding bodies had no role in the design of the study, data collection, data analysis, data interpretation, nor in the writing of the manuscript.
Ethics approval and consent to participate
In Colombia, most ZIKV diagnostic testing occurred at the INS. However, Colombian academics were also involved in ZIKV surveillance, sampling Ae. aegypti and limited human cases. Our study includes samples collected by both the INS and the Universidad del Rosario (UR). Sample collection by UR was approved by the Universidad del Rosario Ethics Committee, approval number 617-2017. Written consent was obtained for human specimen collection. Sample collection performed by the INS was approved by the Comité de Ética y Metodologías de la Investigación (CEMIN) - Instituto Nacional de Salud, approval number CEMIN 39-2017. Consent for specimen collection was waived as specimens were collected as part of the Colombian public health response. All sample selection for inclusion in the genomic analysis, sequencing library preparation, and data analysis were performed on anonymized samples and data. Administrative permission to access data collected by the INS was required. This permission was granted in Collaboration Agreement Fred-Hutch-Ref-#COL171204.
Consent for publication
The authors declare no competing interests.
- 1.PAHO. Zika Cumulative Cases. https://www.paho.org/hq/index.php?option=com_content&view=article&id=12\%390: zika-cumulative-cases&Itemid=42090&lang=en. Accessed 10 Sept 2018.
- 21.Pacheco O, Beltrán M, Nelson CA, Valencia D, Tolosa N, Farr SL, et al.Zika Virus Disease in Colombia - Preliminary Report. N Engl J Med. 2016. https://doi.org/10.1056/NEJMoa1604037.
- 22.Rojas DP, Dean NE, Yang Y, Kenah E, Quintero J, Tomasi S, et al.The epidemiology and transmissibility of Zika virus in Girardot and San Andres island, Colombia, September 2015 to January 2016. Euro Surveill. 2016; 21(28). https://doi.org/10.2807/1560-7917.es.2016.21.28.30283.
- 23.Instituto Nacional de Salud. Boletin epidemiologico semanal - semana epidemiologica 08 de 2016. https://www.ins.gov.co/buscador-eventos/BoletinEpidemiologico/2016\%20B\%olet\%C3\%ADn\%20epidemiol\%C3\ %B3gico\%20semana\%208.pdf. Accessed 11 Mar 2016.
- 27.Santiago GA, Vázquez J, Courtney S, Matías KY, Andersen LE, Colón C, et al.Performance of the Trioplex real-time RT-PCR assay for detection of Zika, dengue, and chikungunya viruses. Nat Commun. 2018; 9(1). https://doi.org/10.1038/s41467-018-03772-1.
- 36.Pettersson JHO, O Pettersson JH, Eldholm V, Seligman SJ, Lundkvist Å, Falconar AK, et al.How Did Zika Virus Emerge in the Pacific Islands and Latin America?MBio. 2016; 7(5). https://doi.org/10.1128/mbio.01239-16.
- 37.PAHO. Epidemiological Alert: Zika virus infection. PAHO. 2015. https://www.paho.org/hq/dmdocuments/2015/2015-may-7-cha-zika-virus-epi-alert.pd.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.