Introduction

Tomato yellow leaf curl virus (TYLCV) (genus Begomovirus, family Geminiviridae) [1] causes significant damage to tomato in both quantitative and qualitative terms. Symptoms associated with TYLCV infection in tomato include chlorotic leaf edges, upward leaf cupping, leaf mottling, reduced leaf size, and flower drop. Symptoms on the fruit make the produce unmarketable. Thus, TYLCV can have a severe impact on tomato production. Tomato plants infected at an early stage are severely stunted and do not bear fruit, and 100% crop loss is not uncommon in such early infections. The virus is generally transmitted by the whitefly Bemisia tabaci, and the vector has developed resistance to many insecticides, which limits the vector control options [2,3,4,5,6].

The virion of TYLCV is composed of twin icosahedral capsids. The viral genome consists of a single-stranded circular DNA of about 2.7 kb, which encodes six open reading frames (ORFs): two (V1 and V2) in the viral sense and four (C1-C4) in the complementary sense. All the proteins encoded by begomoviruses are multifunctional. The protein encoded by ORF V1 is involved in encapsidation of the genome into geminate particles, insect transmission, and long-distance movement. The protein derived from ORF V2 plays a role in cell-to-cell movement and suppression of host-mediated RNA silencing. The C1 protein is involved in viral DNA replication, while the ORF C2-derived protein activates viral DNA transcription and is also a suppressor of host RNA silencing. The protein encoded by ORF C3 enhances viral DNA replication, and the ORF C4-encoded protein is known to be an RNA silencing suppressor [7, 8]. Besides these ORFs, the TYLCV genome has a 300-nucleotide (nt) non-coding intergenic region (IR), which possesses a bidirectional promoter and a conserved core nonanucleotide sequence (TAATATTAC). The nonanucleotide sequence is recognized by the C1 protein and serves as the origin of replication. A great deal of information has been generated over the years about the diversity, epidemiology, and molecular biology of the disease and the virus [9,10,11].
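As an illustration of how the conserved nonanucleotide can be located in a circular genome, the following is a minimal Python sketch. The function name `find_motif_circular` and the toy sequence are hypothetical and are not part of this study; the sketch only assumes that a linearised genome sequence may have been cut inside the motif, so the search wraps around the junction.

```python
NONANUCLEOTIDE = "TAATATTAC"  # conserved motif at the origin of replication


def find_motif_circular(genome: str, motif: str = NONANUCLEOTIDE) -> list[int]:
    """Return 0-based start positions of `motif` in a circular sequence."""
    seq = genome.upper()
    if not seq or len(motif) > len(seq):
        return []
    # Append the first len(motif) - 1 bases so that a motif spanning the
    # linearisation point (end of string back to start) is also found.
    extended = seq + seq[: len(motif) - 1]
    hits = []
    start = extended.find(motif)
    while start != -1:
        hits.append(start)
        start = extended.find(motif, start + 1)
    return hits


# Toy sequence (not a real TYLCV genome): the motif is split across the ends.
toy = "ATTACGGGCCCGGGTAAT"
print(find_motif_circular(toy))
```

In the toy example the motif begins four bases before the end of the string and continues at its start, which a plain linear search would miss.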

Management of viral diseases, especially those caused by TYLCV, is challenging and expensive. The whitefly vector has a wide host range, and several weeds serve as reservoirs ("green bridges") that allow TYLCV to survive between crop seasons. Plant health management approaches include the use of resistant cultivars carrying resistance (R) genes, RNA interference (RNAi), vector management with insecticides, and cultural practices such as adjusting the planting date and maintaining a crop-free period [1, 12,13,14]. In tomato, host plant resistance to TYLCV has mainly been based on the use of Ty genes [15]. Nevertheless, the genomic features of TYLCV are altered by multiple factors, including virus recombination, genetic mutations, inclusion of satellite components, and invasion of exogenous whitefly species [16, 17]. Hence, TYLCV has the ability to overcome both endogenous and genetically engineered disease control measures, and there is a continuous need to develop novel virus management strategies that are effective and durable.

Viral diseases of plants are widespread in Kuwait and cause significant economic losses, reaching up to 95% in many vegetable crops [18]. These crops comprise the majority of greenhouse and open-field agriculture in Kuwait. Mosaic, abnormal leaf color, abnormal vein patterns, mottling, spotting, abnormal leaf shape, leaf curling, and yellowing were observed on plants grown in Al Wafra and Abdally during the 2007, 2010, and 2014 seasons [19]. Recent studies showed that TYLCV infection causes major economic losses in tomato, of up to 90%. TYLCV has been reported as a major tomato-infecting virus, but it has not been fully characterized at the molecular level, and little is known about the genetic diversity of TYLCV populations in Kuwait. Whiteflies are the main vector of TYLCV [20, 21], and considerable research has been done on the management of whitefly vectors with the goal of reducing the impact of begomoviruses [20, 22, 23]. Considering the economic importance of the disease caused by TYLCV in tomato and the frequent outbreaks inflicting severe crop losses [18,19,20,21], this study was designed to characterize the genome sequences of TYLCV infecting tomato in Kuwait and to perform a comprehensive genomic analysis. We report the molecular characterization of several TYLCV isolates collected from northern and southern Kuwait, the extent of genetic diversity among the isolates characterized here and those reported earlier from Kuwait, the role of virus recombination events, and the inferences from molecular phylogeny, with a view to deciphering the evolutionary genomics of the virus.

Materials and methods

Sample collection

Two commercial tomato farms (Abdally in northern Kuwait and the Al Wafra region in southern Kuwait) were surveyed, and leaf samples from plants that displayed symptoms suggestive of TYLCV infection (640 samples) were collected, brought to the Environmental and Life Science Research Center, Kuwait Institute for Scientific Research (KISR), Safat, 13109, Kuwait, and stored at 4 °C until further use. The virus-infected samples were collected during 2020 and 2021.

Total DNA extraction and TYLCV detection

Total genomic DNA was extracted using the CTAB method and tested for the presence of TYLCV by PCR using the primer pair PTYc787 (GTTCGATAATGAGCCCAG) and PTYc1121 (ATGTAACAGAAACTCATG) [21]. The samples positive for TYLCV were used for full-length genome sequencing, which was carried out at the Department of Plant Pathology, Washington State University, Pullman, USA, following the rolling circle amplification (RCA) method. Briefly, the RCA reaction was prepared by mixing the following components: genomic DNA (50 ng), 1 μL of exo-resistant random primers (500 μM, 35 optical density units (OU)/mL), 2 μL of 10X reaction buffer, and water to bring the volume to 10.2 μL. The RCA reaction mixture was heated at 95 °C for 5 min in a water bath, followed by chilling on ice for 2 min. Following this, 2 μL of 10 mM dNTP, 1.6 μL of phi-29 DNA polymerase, and 0.2 μL of inorganic pyrophosphatase (0.1 U/μL) were added, and the mixture was incubated at 30 °C for 18 h in a thermal cycler. The reaction was stopped by heating at 65 °C for 10 min. The RCA products were resolved by agarose gel electrophoresis, and the amplicons were directly sequenced using the Sanger DNA sequencing protocol.

In silico analysis

Sequence comparisons of the fifteen Abdally TYLCV isolates, sixteen Al Wafra isolates, and three previously reported KISR isolates were performed using the Sequence Demarcation Tool (SDT) [24]. The sequence alignment was generated using the ClustalW algorithm. A molecular phylogenetic tree was constructed from these sequences, along with TYLCV genome sequences available in GenBank, using the maximum-likelihood method in MEGA X (default parameters with 1000 bootstrap replicates) [25]. Phylogeny was inferred using the General Time Reversible (GTR) model with G + I (gamma-distributed rates with invariant sites) [25]. Potential recombination events were detected with Recombination Detection Program 4 (RDP4 Beta 4.16) [26]. All the complete TYLCV genome sequences reported from Kuwait and available in GenBank were used in the analysis; the sequences were aligned using ClustalW in MEGA X [25], and the aligned sequences were used for recombination detection. For identifying recombination events, step-down correction with a highest acceptable p-value of 0.05 was used, along with the other default settings, for all nine methods (RDP, Chimaera, BootScan, 3Seq, GENECONV, MaxChi, SiScan, LARD, and PhylPro) available in RDP4 [26].
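To make the idea of a pairwise identity score concrete, the following is a minimal Python sketch that computes percent identity between two already-aligned sequences, counting only columns in which neither sequence has a gap. It is an illustration under that stated assumption and not the SDT implementation; the function name `pairwise_identity` and the toy sequences are hypothetical.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences (invented for illustration, not study data).
print(pairwise_identity("ACGT-ACGT", "ACGTTACGA"))
```

Applying such a function to every pair of rows in an alignment yields the kind of identity matrix that tools like SDT display as a colour-coded plot.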

Results and discussion

Molecular genomic analysis of geminiviruses has been the mainstay in deciphering the genomic diversity of the virus species infecting economically important crops and in devising suitable disease control measures [1, 2, 14]. Hence, in this study, the complete genomes of thirty-one TYLCV isolates from Kuwait were sequenced and deposited in the GenBank database (Table 1). The nucleotide identity analysis of these isolates is shown along with that of the three previously reported isolates (KISR, KISR2, and KISR3) from Kuwait [18, 19] (Fig. 1). The nucleotide sequence identity among the Kuwaiti isolates of TYLCV ranges from 91.20 to 100%. Compared to all fifteen Abdally TYLCV isolates, the three previously reported KISR isolates and six of the sixteen Al Wafra isolates have 19 extra nucleotides (TTCTTTCTAGGTGTGCCCC) in the intergenic region (Fig. 2). There is also variation in the four nucleotides immediately preceding the 19 extra nucleotides: the three previously reported isolates and the six Al Wafra isolates had “G/CCTT” before the 19 extra nucleotides, whereas the others possessed “AAA(A)”.
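The presence of the 19-nucleotide stretch and the four nucleotides preceding it can be checked with a simple string search. The following Python sketch is illustrative only: the function name `insert_context` and the toy sequence are hypothetical, and it assumes a linear sequence string in which the 19-nucleotide stretch is not split across the linearisation point.

```python
EXTRA_19NT = "TTCTTTCTAGGTGTGCCCC"  # 19 extra nucleotides reported in the text


def insert_context(sequence: str, insert: str = EXTRA_19NT, flank: int = 4):
    """Return the `flank` bases preceding `insert`, or None if it is absent."""
    seq = sequence.upper()
    idx = seq.find(insert)
    if idx == -1:
        return None  # isolate lacks the extra stretch
    # Slice up to `flank` bases upstream; fewer if the insert is near the start.
    return seq[max(0, idx - flank):idx]


# Toy sequence (invented for illustration, not a real isolate).
toy_with_insert = "GGGGCCTT" + EXTRA_19NT + "AAAA"
print(insert_context(toy_with_insert))
print(insert_context("GGGGAAAAAAAA"))
```

Run over a set of genome sequences, this kind of check separates isolates that carry the extra stretch from those that do not and reports the upstream motif for the former.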

Table 1 List of whole genome sequences of tomato yellow leaf curl virus isolates reported from Kuwait and their NCBI GenBank accession details

Fig. 1 Comparative sequence analysis based on the whole genome sequences of tomato yellow leaf curl virus isolates from two different regions of Kuwait

Fig. 2 Multiple sequence alignment of all tomato yellow leaf curl virus isolates reported from Kuwait. Three previously reported KISR isolates and six out of sixteen Al Wafra isolates had 19 extra nucleotides (highlighted in yellow) near the 5′-end

Phylogenetic analyses were conducted to compare the complete nucleotide sequences with those from other parts of the world and to infer the molecular evolution of the current isolates (Fig. 3). A total of eighty-seven TYLCV isolates were analyzed, which formed two distinct clades. All 34 Kuwaiti isolates (31 from this study and three previously reported) grouped into one clade along with isolates from China, Iran, Israel, Japan, Jordan, Mexico, Oman, Portugal, Turkey, and the US (Fig. 3). TYLCV isolates reported from Cuba, Japan, Lebanon, Morocco, the Netherlands, and Spain formed a distinct clade (Fig. 3). An earlier study on the molecular analysis of TYLCV in the Arabian Peninsula and Iran suggested that these regions are the centre of diversity [11].

Fig. 3 Phylogenetic tree based on the whole genome sequences of tomato yellow leaf curl virus isolates reported from Kuwait and other parts of the world. Evolutionary history was inferred using the Maximum Likelihood method and the Tamura-Nei model. The phylogenetic tree was rooted to a Pepper golden mosaic virus isolate (AY928514). Three previously reported Kuwaiti isolates are shown in red boxes. The sequences of Al Wafra isolates are highlighted in yellow and Abdally isolates are highlighted in green

To identify putative recombination events, an analysis was performed using the RDP4 program. Fourteen recombination events were detected by at least four methods in RDP4 among the 87 full-length TYLCV sequences from around the world (Table 2). The isolates Abdally 6A (OL890669) and Abdally 3B (OL890670) reported in this study were identified as potential recombinants. In addition, TYLCV isolates Abdally 11B (OL890676), Abdally 13B (OL890678), and Abdally 3B (OL890670) served as major parents in the generation of different recombinants. These results suggest the occurrence of microevolution within the TYLCV populations in Kuwait (Table 2 and Fig. 4). Further, recombinant TYLCV isolates have been shown to have a selective ecological advantage over their parental viruses [17].
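The criterion of support by at least four detection methods can be expressed as a simple filter. The Python sketch below is a hypothetical illustration: the record layout, the function name `supported_events`, and the p-values are invented for demonstration and do not reproduce RDP4 output or the values in Table 2.

```python
def supported_events(events, min_methods=4, alpha=0.05):
    """Return IDs of events significant (p <= alpha) in >= min_methods methods."""
    kept = []
    for event in events:
        n_significant = sum(
            1
            for p in event["p_values"].values()
            if p is not None and p <= alpha  # None means the method found nothing
        )
        if n_significant >= min_methods:
            kept.append(event["id"])
    return kept


# Invented example records (not study data).
example_events = [
    {"id": 1, "p_values": {"RDP": 1e-5, "GENECONV": 2e-4, "BootScan": 1e-3,
                           "MaxChi": 4e-3, "Chimaera": 0.2}},
    {"id": 2, "p_values": {"RDP": 0.01, "GENECONV": None, "BootScan": 0.3,
                           "MaxChi": 0.04, "Chimaera": 0.02}},
]
print(supported_events(example_events))
```

Requiring agreement among several methods is a common way to reduce false positives, because each method detects recombination signals in a different way.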

Table 2 Potential genetic recombination events identified in tomato yellow leaf curl virus genomes as detected by Recombination Detection Program (RDP)

Fig. 4 Recombinants detected by Recombination Detection Program (RDP v 4) among various tomato yellow leaf curl virus isolates. The event numbers are shown (details of the major and minor parents and p-values are provided in Table 2)

Comparative sequence analysis of the Kuwaiti TYLCV isolates reveals > 90% sequence identity, suggesting that these isolates may have derived from a few parental strains originally imported into the country (Fig. 1). The additional 19 nucleotides observed in nine Kuwaiti isolates indicate that these isolates might have resulted from a single recombination/insertion event. The further integration of the “CCTT” motif could represent a second mutation event that occurred at a later time (Fig. 2). The phylogenetic analysis of TYLCV sequences reported worldwide suggests that TYLCV undergoes relatively rapid evolution and exhibits significant genetic variation. The isolates Al Wafra 1, Al Wafra 7, Al Wafra 2, Al Wafra 16, Al Wafra 3, and Al Wafra 6 were genetically distinct and formed a separate cluster. However, some TYLCV isolates, viz., Al Wafra 17, Al Wafra 23, Al Wafra 24, and Abdally 15, shared a phylogenetic lineage with TYLCV isolates reported from Iran and Oman (Fig. 3). On the other hand, all the Abdally isolates, excluding Abdally 15, showed a monophyletic origin along with TYLCV isolates reported from Oman (Fig. 3). The phylogenetic relatedness of the Kuwaiti TYLCV isolates to those from Iran and Oman implies cross-border movement of virus isolates among the countries of the Persian Gulf region (Fig. 3). Whether the observed sequence diversity in selected isolates has any role in modulating TYLCV-tomato interactions remains to be seen, and the current study lays the groundwork for further investigations.

In conclusion, we report here the complete genome sequences of 31 isolates of TYLCV circulating in the tomato fields of northern and southern Kuwait. A comprehensive comparative genomic analysis of the novel TYLCV isolates, along with those reported earlier, identified recombination events that may be involved in the evolution of TYLCV and the generation of novel variants of concern. The information presented in this article will be useful for understanding TYLCV biology and epidemiology and for developing disease control measures.

Limitations

No limitations were encountered during the study. Our findings form the basis for further studies on the functional significance of the extra 19 nucleotides found near the 5′ end of the genomic sequences of the three KISR isolates and six Al Wafra isolates.