The pepper virome: natural co-infection of diverse viruses and their quasispecies
- 1.1k Downloads
The co-infection of diverse viruses in a host plant is common; however, little is known about viral populations and their quasispecies in the host.
Here, we report the first pepper viromes that were co-infected by different types of viral genomes. The pepper viromes are dominated by geminivirus DNA-A followed by a novel carlavirus referred to as Pepper virus A. The two pepper cultivars share similar viral populations and replications. However, the quasispecies for double-stranded RNA virus and two satellite DNAs were heterogeneous and homogenous in susceptible and resistant cultivars, respectively, indicating the quasispecies of an individual virus depends on the host.
Taken together, we provide the first evidence that the host plant resistant to viruses has an unrevealed antiviral system, affecting viral quasispecies, not replication.
KeywordsCultivar Pepper Plant virus Quasispecies Virome
Aconitum latent virus
Aphid lethal paralysis virus
Bell pepper endornavirus
Chilli leaf curl virus
Croton yellow vein mosaic virus alphasatellite
Gaillardia latent virus
Next generation sequencing
Pepper leaf curl Bangladesh virus
Pepper leaf curl virus betasatellite
Pea streak virus
Potato virus P
Single nucleotide variations
Tobacco vein clearing virus
Tomato yellow leaf curl virus;
Pepper plants in the Capsicum species, which are members of the Solanaceae family, have been cultivated worldwide and consumed as fresh fruits or powdered spices . A wide range of pathogens, including viruses, cause serious diseases in pepper plants. In particular, cultivated pepper plants are susceptible to diverse plant viruses, including DNA and RNA viruses . Out of 68 known virus species that infect peppers, about 20 cause viral diseases in pepper plants . Among the known RNA viruses that infect pepper plants, viruses in the genera Potyvirus, Cucumovirus, Polerovirus, and Crinivirus have been reported . Of the known DNA viruses that infect pepper plants, members of the genus Begomovirus, in the family Geminiviridae, are frequently reported. Begomoviruses transmitted by Bemisia tabaci are very destructive and emerging viruses in the production of pepper .
Next-generation sequencing (NGS) enables us to identify known as well as novel viruses in various plant species . In the case of pepper plants, several known viruses, including Pepper vein yellows virus and Bell pepper endornavirus (BPEV), and a novel virus, Pepper yellow leaf curl virus, have been identified by Illumina NGS techniques [5, 6, 7, 8].
A wide range of viruses display a significant level of genetic diversity within the host by replicating with high mutation rates . Viral populations are often composed of mutant clouds rather than a uniform viral genome with the same nucleotide sequence, which are referred to as viral quasispecies . It is known that high mutation rates as well as genetic recombination or reassortment are important factors for viral quasispecies [10, 11]. Studies on the quasispecies of medically important RNA viruses have been intensively conducted . In the case of plant RNA viruses, a previous study using three different viruses demonstrated that the quasispecies of many RNA viruses are influenced by host-virus interactions . In addition, some plant DNA viruses belonging to begomoviruses, such as Tomato yellow leaf curl virus (TYLCV), also exhibit high mutation rates, suggesting quasispecies of single-stranded (ss) DNA viruses in the host .
Several previous studies have reported that many plants are often co-infected by different plant viruses and viroids [14, 15, 16]. Plants, including fruit trees, ornamental plants, bulb plants, and tubers, are usually cultivated by vegetative reproduction, such as clonal propagation and grafting. Therefore, rates of virus co-infection are much higher than in plants propagated by seeds. For instance, previous studies have reported the co-infection of eight viruses and two viroids in a single grapevine cultivar , nine different RNA viruses in garlic bulbs , and 11 RNA viruses in a single sweet potato cultivar .
Recently, we screened available pepper transcriptomes to identify BPEV-associated sequences . Out of 77 screened pepper transcriptomes, two pepper transcriptomes were co-infected by different DNA and RNA viruses. In this study, we carried out systematic bioinformatics analyses to identify viruses, including a novel RNA virus that infects pepper plants. In addition, we revealed viral populations and quasispecies in two different pepper cultivars.
De novo transcriptome assembly and virus identification
De novo assembly of viral genomes
We identified several contigs associated with DNA virus components. A total of 13 and 16 contigs from PJ and TW, respectively, were closely associated with ChiLCV DNA-A (Fig. 3b). The lengths of ChiLCV DNA-A-associated contigs ranged from 1936 nt to 208 nt. The contigs were aligned on the complete sequences of five genes for ChiLCV DNA-A except the C1 gene. In addition, we identified seven and 13 contigs associated with PepLCVB from PJ and TW, respectively (Fig. 3c). The lengths of the contigs associated with PepLCVB ranged from 968 nt to 239 nt. Moreover, six CYVMVA-associated contigs, which fully cover a gene-encoding replication protein, were identified (Fig. 3c). The size of CYVMVA-associated contigs ranged from 1164 nt to 524 nt. Using contigs aligned to the reference genome, we obtained partial consensus sequences for DNA virus components. We aligned the obtained nucleotide and protein sequences for the PepLCVB C1 gene and CYVMVA replication gene with the respective reference sequence (Additional file 2: Figures S2 and S3). The sequence alignment showed the newly identified PepLCVB and CYVMVA might be novel sequences displaying many polymorphisms with the reference sequences.
Identification and complete genome assembly of a novel ssRNA virus
Phylogenetic relationships of identified viruses
Single nucleotide variations of identified viruses in two different pepper cultivars
Viral population and the amount of virus replication
Our results showed that at least three different types of plant viruses were co-infected in the PJ and TW cultivars. We examined the amount of viral RNAs and copy numbers of the individual viral genome in each cultivar (Additional file 1: Table S15). The amount of viral reads associated with DNA-A (68.9% and 75.9% in PJ and TW, respectively) was dominant, followed by ssRNA (27.3% and 20% in PJ and TW, respectively) and dsRNA (2.9% and 3.1% in PJ and TW, respectively) (Fig. 7b and c). The number of reads associated with alphasatellite and betasatellite DNAs ranging 0.1% to 0.6% were very low. Based on the copy number of each viral genome, the portion of ssDNA was increased to 89.3% and 89.7% in PJ and TW, respectively, while the portion of dsRNA with the largest genome among the identified viruses was decreased to 0.7% in both cultivars (Fig. 7d and e). The portion of ssRNA was decreased, while the portion of satellite DNAs was increased.
In this study, we conducted systemic bioinformatics analyses to reveal viral populations and their quasispecies in two different pepper plants, which are susceptible and resistant to various viruses, respectively. We will discuss how our in silico analyses using transcriptome data were very effective for discovering a wide range of new biologically significant results.
Although pepper plants are susceptible to a large number of RNA and DNA viruses, the coinfection of diverse DNA and RNA viruses in pepper plants was surprising. In addition, it was very difficult to reveal individual viral genomes from complex viral populations showing a high level of genetic diversity. For this, much experience and many techniques are required. An important factor for identifying viruses using NGS data is the length of viral contig. For example, BLAST results using short lengths of contigs revealed various viruses that are members in the same genus, although these contigs were derived from a single virus genome. For example, all contigs associated with AcLV, PeSV, PVP, and GalLV identified by BLAST search in our study were finally revealed as sequences derived from a novel virus, PepVA. Therefore, the BLAST results from partial sequences sometimes lead to the wrong interpretation for virus identification. Therefore, we recommend complete or nearly complete viral genome sequences for virus identification using BLAST search.
The two libraries in this study were constructed based on oligo(dT) primers to amplify the mRNAs of pepper. Thus, the identification of ssRNA with poly(A) tail, such as PepVA, was not surprising. However, the identification of RNA viruses without poly(A) tail, such as BPEV, as well as the identification of DNA viruses, such as TVCV, and several geminiviral DNA-A, alphasatellite, and betasatellite sequences from the plant transcriptomes is the first reported to our knowledge. Recently, several studies have shown that RNA viruses without poly(A) tail can be recovered by RNA-Seq data [14, 19]. Previous studies have adapted the enrichment of circular DNAs by PCR followed by NGS to identify DNA viruses and their satellite DNAs [22, 23]. In addition, plant RNA and DNA viruses were de novo assembled by small RNA sequencing . Our data showed that geminivirus segments, such as DNA-A and satellite DNAs, can also be identified from mRNA transcriptome data, which can fully cover most regions of virus-encoded genes. Based on our results, we suggest the possible application of mRNA transcriptome data for plant virus-associated studies.
We identified ten contigs with no close relatives to known viruses. However, conserved domain prediction and phylogenetic studies have identified a novel virus tentatively named PepVA belonging to the genus Carlavirus. Due to the poly(A) tail of PepVA, it was possible to assemble two complete genomes of PepVA from PJ and TW using only transcriptome data.
Recent studies have demonstrated that plant transcriptome data can be usefully applied in the assembly of complete or nearly complete genomes for several viruses and viroids [6, 14, 25]. In our experience, several factors, such as viral sequence coverage, the amount of viral replication, and sequencing methods, are important for the assembly of viral genomes. The number of sequence reads from paired-end sequencing is twice that from single-end sequencing. In general, paired-end sequencing is optimal for the de novo assembly of plant transcriptomes as well as viral genomes. In this study, we used two well-known de novo assemblers – Trinity and Velvet – for de novo transcriptome assembly and compared their viral genome assembly ability. As we have demonstrated in our recent study , Trinity was far superior for de novo assembly of the virus genome with longer contigs compared with Velvet. In general, velvet generated several short-length contigs, which is not optimal for virus genome assembly; however, the contigs generated by Velvet can be useful for the identification of viruses with low titers. Based on our results, the optimal combination for virus identification and viral genome assembly from plant transcriptomes might be de novo assembly using Trinity with paired-end sequencing. However, we also highly recommend assembling transcriptomes with two methods for virus-associated studies. Previously, many studies have used conventional PCR followed by Sanger sequencing to examine quasispecies of plant RNA and DNA viruses in different hosts [12, 26, 27]. Although the sizes of viral genomes are relatively small, several fragments for a target viral genome should be amplified instead of amplifying a complete viral genome. Furthermore, to understand the comprehensive nature of quasispecies for target viruses, a high level of sequence coverage is necessary. The sequence coverage through the conventional methods composed of PCR and Sanger sequencing is relatively low, while several NGS-based approaches produce numerous sequence reads, which are enough to reveal the quasispecies of target viruses. Therefore, several studies associated with viral quasispecies including viroids have used diverse NGS techniques instead of the PCR-based conventional methods [28, 29, 30, 31].
The same list of diverse viruses has been infected in two different pepper transcriptomes, indicating that the two cultivars were grown in the same field. The presence of ALPV suggests the sampled pepper plants were severely damaged by aphids, which are vectors for many plant viruses . In addition, the identification of geminiviral components in pepper plants suggests the transmission of geminivirus by Bemisia tabaci . The co-infection of diverse viruses in pepper plants indicates that these plants were naturally grown without any disease control by humans. Therefore, these samples are good examples for studying the natural co-infection of diverse viruses in different pepper cultivars and their quasispecies. A previous study demonstrated that BPEV and TVCV are transmitted by seeds [34, 35]. In addition, a recent study indicates the possible transmission of a geminivirus into tomatoes by seeds . Based on previous results, the co-infection of diverse viruses in pepper plants might also be mediated by not only insect vectors, but also seeds.
In general, the co-infection of various viruses in the host might be harmful for the host. For example, a previous study showed that the mixed infection of two geminiviruses in the pepper plants exhibited a synergistic interaction with increased disease symptoms . Although viruses are regarded as pathogens, some viruses might be good, which means that they are symbiotic partners in the host and play beneficial roles in the plant [38, 39]. In contrast, it is highly possible that some co-infected viruses do not have any association with the host disease symptoms.
The two different pepper cultivars, PJ and TW, have been demonstrated to be susceptible and resistant to diverse viruses, including Pepper leaf curl virus (PepLCV) infection in field conditions [20, 21]. Unexpectedly, we found a significant difference in mutation rates, but not in viral replication. The most surprising result was that out of the five examined viral genomes, BPEV, alphasatellite, and betasatellite exhibited homogenous sequences in TW and heterogeneous sequences displaying quasispecies in PJ. Moreover, the mutation rates for PepVA were also slightly reduced in TW compared with PJ. Based on these results, we infer that the mutation of BPEV, alphasatellite, and betasatellite DNAs was strongly inhibited in TW, which is a resistant cultivar to various pepper viruses. In addition, we found that mutation rates are highly correlated with the size of the viral genome. For example, BPEV with a relatively large dsRNA genome displayed the lowest mutation rate, while DNA-A and betasatellite showed a high level of mutation rates in PJ. We provide the first evidence that the strong inhibition of the viral mutation for dsRNA and satellite DNAs in the plant cultivar leads to resistance to various viruses. We hypothesized that the quasispecies nature of satellite DNAs might be highly correlated with resistance to geminivirus infection.
In this study, we demonstrated that two different pepper cultivars were co-infected by four different viral genomes – ssRNA, dsRNA, ssDNA, and dsDNA. Furthermore, we quantified the amount of viral RNAs for an individual virus and examined the copy numbers for the respective virus. Although two transcriptomes were sequenced by different methods, there were no big differences between two cultivars for viral populations. In addition, we found that the amounts of two satellite DNAs, dsDNA, and dsRNA replications were low, indicating that their replication was very low in the pepper host.
Taken together, we provide a wide range of new results associated with virus identification, viral genome assembly, viral populations, and virus quasispecies in two different plant cultivars using only pepper transcriptome data. Based on our results, it might be of interest to find a correlationship between the co-infection of different viruses and disease symptoms in the pepper host.
Plant materials, library preparation and sequencing
Two different pepper (Capsicum annuum) cultivars named Pusa Jwala (PJ) and Taiwan-2 (TW) and grown in India were used in this study. PJ and TW were identified as susceptible and resistant, respectively, to diverse viruses, including capsicum chlorosis, cucumber mosaic, and groundnut bud necrosis, which were screened by the ELISA test. Four different tissues composed of the root, stem, leaf, and fruit were pooled and used for total RNA extraction. The two prepared libraries were sequenced by the HiSeq2000 system. The PJ library was single-end sequenced, while the TW library was paired-end sequenced.
De novo transcriptome assembly
Two raw datasets were downloaded from the SRA (Sequence Read Archive) database (http://www.ncbi.nlm.nih.gov/sra) in NCBI with their respective accession numbers. Initially, the raw data were de novo assembled by the Velvet (ver. 1.2.10) and Oases (ver. 0.2.08) programs  and the assembled sequences were deposited in the TSA (Transcriptome Shotgun Assembly) database with the accession numbers GAXV00000000 (PJ) and GAXU00000000 (TW). All bioinformatics analyses were conducted using a 64-core CPU and 256-GB RAM workstation installed with Xubuntu 12.04.4 LTS. The raw data were again de novo assembled by Trinity (ver. r20140717) with default parameters .
Identification of viruses from transcriptome data
Four different transcriptome datasets assembled by two different de novo assemblers were subjected to a BLAST search against the virus reference database (http://www.ncbi.nlm.nih.gov/genome/viruses/). We used MEGABLAST, which is optimized for highly similar sequences with the e-value 1E-5 as a cutoff. The identified viruses were manually filtered to remove endogenous virus-like sequences and bacteriophages. Finally, virus-associated contigs from each dataset were used for further analyses.
Assembly of consensus viral genomes
Virus-associated contigs were aligned to the respective reference viral genome using the BWA program  and visualized by the Tablet program . In addition, we used the ClustalW program implemented in the MEGA6 program  followed by manual modification to obtain a consensus viral genome from each pepper cultivar. As a result, we obtained ten consensus sequences for five viral genomes from two pepper cultivars.
Identification of a novel virus and its annotation
Our BLAST search and sequence alignment demonstrated that the 10 virus-associated contigs were derived from a single novel virus. The longest contigs were subjected to conserved domain prediction  and the ORF finder . The complete genome sequences for PepVA were deposited in GenBank with the following accession numbers: KU923763 (PepVA isolate PJ) and KU726694 (PepVA isolate TW).
In the case of PepVA, five proteins were blasted against NCBI’s non-redundant (NR) protein database and homologous viral protein sequences were subjected to sequence alignment. The amino acid sequences for an individual viral protein were aligned by ClustalW followed by manual editing. Phylogenetic trees were constructed by 1000 bootstrap replicates using the Neighbor-Joining method with the Poisson correction of distance using the MEGA6 program . For BPEV, ChiLCV DNA-A, PepLCVB, and CYVMVA, the obtained consensus nucleotide sequence from an individual cultivar were blasted against the non-redundant nucleotide (nt) database. Homologous nucleotide sequences were subjected to sequence alignment followed by manual editing. Phylogenetic trees were constructed by 1000 bootstrap replicates using the Neighbor-Joining method with the Maximum Composite Likelihood model using the MEGA6 program.
Analysis of single nucleotide variation for each viral genome
To detect single nucleotide variations (SNVs) for the identified viral genome in each cultivar, we mapped raw sequence data on the individual consensus viral genome sequence using BWA with default parameters . The aligned reads in the SAM file format were converted into BAM file format using SAMtools (ver. 1.3) . The BAM file was sorted and indexed. SNV calling was performed by BCFtools (ver. 1.3), resulting in a VCF (Variant Call Format) file containing information on SNVs. The identified SNVs for each virus were visualized using the Tablet program .
Viral RNAs and copy numbers
To analyze the amount of viral RNAs in each transcriptome, we performed MEGABLAST against the identified consensus viral sequences. To calculate the viral copy number for each virus, the obtained number of viral reads was multiplied by 101 and then divided by the total length of the individual genome.
This work is dedicated to the memory of my father, Tae Jin Cho (1946–2015).
This work was partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government Ministry of Education (No. NRF-2016R1D1A1A02937216) and the support of the “Cooperative Research Program for Agriculture Science & Technology Development” (Project No. PJ01186102) conducted by the Rural Development Administration, Republic of Korea.
Availability of data and materials
Raw sequencing data used in this study are available with following accession numbers in SRA database (SRR1068215 and SRR1123893) and assembled sequences were deposited in the TSA (Transcriptome Shotgun Assembly) database with the accession numbers GAXV00000000 (PJ) and GAXU00000000 (TW). The obtained viral consensus sequences were deposited in GenBank with their respective accession number as follows: BPEV isolate PJ (KU923755), BPEV isolate TW (KU923756), ChilLCV isolate PJ (KU923757), ChiLCV isolate TW (KU923758), CYVMVA isolate PJ (KU923759), CYVMVA isolate TW (KU923760), PepLCVB isolate PJ (KU923761), PepLCVB isolate TW (KU923762), PepVA isolate PJ (KU923763), and PepVA isolate TW (KU726694).
WKC and YJ designed the research; YJ, HC, SMK, SLK, and BCL performed the research; YJ, HC, SMK, SLK, BCL, and WKC analysed the data; and YJ, HC, SMK, SLK, BCL, and WKC wrote the paper. All authors read and approved the final manuscript.
The authors declare that they have no competing interest.
Consent for publication
Ethics approval and consent to participate
This study did not include the use of any animals, human or otherwise, so did not require ethical approval.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 5.Alabi O, Al Rwahnih M, Jifon J, Gregg L, Crosby K, Mirkov T. First Report of Pepper vein yellows virus Infecting Pepper (Capsicum spp.) in the United States. Plant Dis. 2015;99(11):1656. https://doi.org/10.1094/PDIS-03-15-0329-PDN.
- 8.Dombrovsky A, Glanz E, Lachman O, Sela N, Doron-Faigenboim A, Antignus Y. The complete genomic sequence of Pepper yellow leaf curl virus (PYLCV) and its implications for our understanding of evolution dynamics in the genus Polerovirus. PLoS One. 2013;8:e70722.CrossRefPubMedPubMedCentralGoogle Scholar
- 14.Jo Y, Choi H, Cho JK, Yoon J-Y, Choi S-K, Cho WK. In silico approach to reveal viral populations in grapevine cultivar Tannat using transcriptome data. Sci Rep. 2015;5:15841. https://doi.org/10.1038/srep15841.
- 20.Kumar S, Kumar R, Kumar S, Kumar Singh A, Singh M, Bahadur Rai A, et al. Incidence of leaf curl disease on Capsicum germplasm under field conditions. Indian Journal of Agricultural Sciences. 2011;81:187.Google Scholar
- 22.Hagen C, Frizzi A, Gabriels S, Huang M, Salati R, Gabor B, et al. Accurate and sensitive diagnosis of geminiviruses through enrichment, high-throughput sequencing and automated sequence identification. Arch Virol. 2012;157:907–15.Google Scholar
- 24.Seguin J, Rajeswaran R, Malpica-Lopez N, Martin RR, Kasschau K, Dolja VV, et al. De novo reconstruction of consensus master genomes of plant RNA and DNA viruses from siRNAs. PLoS One. 2014;9:e88513.Google Scholar
- 31.Jo Y, Choi H, Cho WK. Genome Sequence of Dengue virus 3 from the Pythium insidiosum Transcriptomes. Front Microbiol. 2016;7:926. https://doi.org/10.3389/fmicb.2016.00926.
- 34.Okada R, Kiyota E, Sabanadzovic S, Moriyama H, Fukuhara T, Saha P, et al. Bell pepper endornavirus: molecular and biological properties, and occurrence in the genus Capsicum. J Gen Virol. 2011;92:2664–73.Google Scholar
- 36.Kil E-J, Kim S, Lee Y-J, Byun H-S, Park J, Seo H, et al. Tomato yellow leaf curl virus (TYLCV-IL): a seed-transmissible geminivirus in tomatoes. Scientific reports. 2016;6Google Scholar
- 41.Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–52.Google Scholar
- 43.Milne I, Stephen G, Bayer M, Cock PJ, Pritchard L, Cardle L, et al. Using Tablet for visual exploration of second-generation sequencing data. Briefings Bioinf. 2013;14(2):193–202. https://doi.org/10.1093/bib/bbs012.
- 44.Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–29. https://doi.org/10.1093/molbev/mst197.
- 45.Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, et al. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011;39:D225–9.Google Scholar
- 47.Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.Google Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.