Cotton (Gossypium hirsutum L.) is an important fibre and cash crop of India, and it plays a major role in the industrial and agricultural economy of the country. It accounts for ~4 per cent of the country’s gross domestic product and is often termed ‘white gold’ by farmers for bringing higher income. India has been the largest producer of cotton in the last 4 years and contributes about 26 per cent of the world’s total production (Anonymous 2020). However, in the last two decades cotton leaf curl disease (CLCuD) has significantly limited cotton production (Monga et al. 2005; Rajagopalan et al. 2012; Sattar et al. 2013). CLCuD is caused by a complex of geminiviruses in the genus Begomovirus, commonly referred to as CLCuD-associated begomoviruses (CABs), which are transmitted by the whitefly (Bemisia tabaci) (Briddon and Stanley 2006; Kumar et al. 2010; Rajagopalan et al. 2012; Sattar et al. 2017).

The genome of CABs consists primarily of a monopartite, single-stranded circular DNA known as DNA-A and is frequently found to be associated with non-viral circular ssDNA satellite molecules, viz. betasatellites and alphasatellites, forming a disease complex (Briddon and Stanley 2006; Sattar et al. 2013; Zerbini et al. 2017). DNA-A, with a size of ~2.8 kb, encodes the coat protein (AV1/CP) and pre-coat protein (AV2) on the virion strand, along with the replication initiation protein (Rep), replication enhancer protein (REn), transcriptional activator protein (TrAP) and the C4 protein on the complementary strand (Briddon et al. 2001, 2002; Amrao et al. 2010). Betasatellites are about half the size of their helper virus, i.e. 1.4 kb, and contain a single ORF (βC1), a satellite conserved region (SCR) and an A-rich region (Gnanasekaran et al. 2019). βC1 appears to be the determinant of pathogenicity, suppressing the host defense mechanism and contributing to virus accumulation (Briddon et al. 2001; Qazi et al. 2007; Iqbal et al. 2012). Betasatellites depend on their helper virus for movement, encapsidation and replication. In contrast, alphasatellites are self-replicating molecules, around 1.4 kb in size, that encode a Rep protein. Nevertheless, they depend on their helper viruses for movement and encapsidation (Zhou 2013). Interestingly, Rep proteins encoded by some alphasatellites have demonstrated the ability to suppress RNA silencing and thus overcome the host defense mechanism (Nawaz-ul-Rehman et al. 2010).

CLCuD-infected cotton plants show characteristic symptoms of vein thickening (small and main vein thickening), upward or downward curling of leaves, formation of small leaf-like structures (enations) on the abaxial side of leaves, and stunting. In the Indian subcontinent, CLCuD was noticed for the first time in the Multan region of Pakistan in 1967 (Hussain and Ali 1975; Amrao et al. 2010), and in New Delhi (India) in 1989 (Rishi and Chauhan 1994). Due to the mobility of the vector, the disease has spread throughout the adjoining cotton growing states/areas.

Since the first epidemic of CLCuD during the 1990s, at least nine distinct begomovirus species have been found to be associated with the disease in the Indian subcontinent (Yousaf et al. 2013). Later, at the beginning of the 21st century, a new recombinant begomovirus strain, i.e. cotton leaf curl Burewala virus (CLCuBuV), was reported from the Punjab province of Pakistan (Amrao et al. 2010; Ullah et al. 2015). Recently, Datta et al. (2017) noticed a significant change in the epidemiology of CLCuD and reported the rebound of CLCuMuV in the Punjab region. Biswas et al. (2020) also reported the predominance of the CLCuMuV-Rajasthan strain in the North-Western cotton growing plains of India.

The viral complex is still evolving owing to a high probability of recombination, which could lead to resistance breaking in varieties, production of diverse symptoms and adaptation to newer hosts (Saunders et al. 2001; Seal et al. 2006). Because of multiple virus strains and the complex interactions between the plant host, whitefly vector and begomoviruses, conventional breeding programs have not been very effective against the rapidly evolving CABs (Chaparro-Garcia et al. 2015). Understanding the continuous evolution and recombination events of begomovirus-satellite complexes, and the application of new technologies like CRISPR-Cas9, will facilitate the development of resistance sources against this disease. During 2015–2016, Balram et al. (2017) surveyed the cotton belt of Northern India and observed the maximum CLCuD incidence (70.3 to 72.3%) in Punjab in comparison to Haryana (42.2 to 61.0%) and Rajasthan (40.1 to 42.5%). Many changes have been noticed in the occurrence of CLCuD in the Punjab region (Datta et al. 2017), and Sattar et al. (2017) have predicted an upcoming third epidemic in the Indian subcontinent. Hence, this study was conducted to determine the prevalence, diversity and recombination of CABs and their satellite molecules in the cotton growing region of the Indian Punjab.

Materials and methods

Field survey and sample collection

During Kharif 2019–2020, a roving survey of cotton fields was undertaken in the cotton growing districts of Punjab, India, viz. Bathinda, Mansa, Faridkot, Fazilka, Sri Muktsar Sahib and Ludhiana (Fig. 1). Leaves showing the characteristic symptoms of CLCuD, i.e. upward and downward curling, small and main vein thickening, enation on the abaxial side and stunting, were collected and preserved at −80 °C until use.

Fig. 1

Map displaying the geographical region of the cotton growing districts (colored), viz. Bathinda, Mansa, Faridkot, Sri Muktsar Sahib, Fazilka and Ludhiana, of Punjab, India

Total DNA extraction and viral genome enrichment

Total DNA was extracted using the cetyl trimethyl ammonium bromide (CTAB) method (Doyle and Doyle 1987). Isolated DNA was quantified using an Eppendorf Bio-Spectrometer (Eppendorf AG, Germany). Subsequently, circular viral genomes in the total DNA were enriched by rolling circle amplification (RCA) using Φ29 DNA polymerase as per the manufacturer’s protocol (GE Healthcare UK Limited).

PCR amplification

The RCA products were used as templates for polymerase chain reaction (PCR) assays with universal degenerate primers (Rojas 1993; Deng et al. 1994; Wyatt and Brown 1996). Primer pairs Beta 01/02 (Briddon et al. 2002) and Alpha UN 101/102 (Bull et al. 2003) were used to amplify the betasatellite and alphasatellite, respectively. The PCR reactions were carried out using the EmeraldAmp® GT PCR Master Mix (2× Premix) (Takara Bio India Private Limited). All PCR assays were carried out in a thermocycler (Eppendorf vapo.protect) under different PCR conditions depending on the primer set (Table 1). The amplified PCR products were visualized on a 1% agarose gel stained with ethidium bromide. Desired amplicons were purified using the NucleoSpin Gel and PCR Clean-up kit (Macherey–Nagel, Germany). Purified PCR products were outsourced to AgriGenome Labs Pvt. Ltd., Kerala (India) for bi-directional Sanger sequencing.

Table 1 PCR conditions for different universal primers

Sequence analysis

The raw reads obtained via Sanger sequencing were analyzed and assembled using Lasergene (DNAStar Inc., Madison, USA). Overlapping contigs were assembled and the sequences were submitted to the GenBank database of NCBI. Accession numbers were obtained for sequences representing DNA-A (MW836808, MW836809, MW921474, MW921475), betasatellite (MW855104, MW855105, MW855106, MW855107) and alphasatellite (MW921476, MW921477). The sequences of the top BLAST (Basic Local Alignment Search Tool for nucleotides) hits were retrieved and aligned using the MUSCLE (Multiple Sequence Comparison by Log-Expectation) algorithm available in MEGA version 7.0 (Kumar et al. 2016). The phylogenetic tree was constructed using the Neighbor-Joining method with 1000 bootstrap replicates. Pairwise nucleotide identity was determined using the Sequence Demarcation Tool (SDT) version 1.2 (Muhire et al. 2014).
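As an illustration of what a pairwise nucleotide identity score represents, the following minimal Python sketch computes percent identity between two already aligned sequences. It is not the SDT implementation; it assumes that alignment columns in which either sequence carries a gap are excluded from the comparison, and the function name `pairwise_identity` and the two toy sequences are hypothetical and are not data from this study.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences of equal length.

    Assumption of this sketch: columns where either sequence has a gap
    are skipped, so identity is matches / ungapped columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ignore columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not sequences from this study.
aligned_a = "ATGTC-AAGCGA"
aligned_b = "ATGTCGAAGAGA"
print(f"{pairwise_identity(aligned_a, aligned_b):.2f}% identity")
```

In practice the aligned sequences would come from the MUSCLE alignment, and a matrix of such scores for all sequence pairs is what a colored identity chart displays.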

Results

Roving survey

During the survey, CLCuD-affected plants were observed and the variability in their symptoms was recorded. The majority of the diseased plants exhibited enation with vein thickening on the underside of the leaves. The symptomatology of CLCuD varied with disease severity. In a few plants, chlorosis and stunting were also observed, and the number of flowers and boll formation were affected in some cases.

Confirmation of CABs and associated satellites molecules

To detect the presence of begomoviruses, the CP gene was amplified with three universal primer pairs, viz. Deng A/Deng B (Deng et al. 1994), PALIc 1960/PARIv 722 (Rojas 1993) and AV 494/AC 1048 (Wyatt and Brown 1996), whereas the betasatellites and alphasatellites were amplified with the Beta 01/02 (Briddon et al. 2002) and Alpha UN 101/102 (Bull et al. 2003) primer pairs, respectively. Two universal primer pairs (Deng A/Deng B and AV 494/AC 1048) amplified the CP gene and yielded amplicons of 530 bp (Fig. 2a) and 575 bp (Fig. 2b), respectively. The primer pair PALIc 1960/PARIv 722 amplified the CP gene and Rep gene and yielded an amplicon of 1.2 kb (Fig. 2c). Both satellite primer pairs, Beta 01/02 and Alpha UN 101/102, yielded major and minor bands (Fig. 2d). Beta 01/02 amplified the βC1 gene with products of 1350 bp (minor) and 600–700 bp (major). Similarly, Alpha UN 101/102 amplified two types of amplicons, 1200 bp (minor) and 800 bp (major). The smaller products result from the amplification of defective DNA molecules (deletion mutants) or from recombination (Briddon et al. 2002). Of the sixty-five samples tested, only twenty-eight showed amplification of the CP gene, confirming the presence of begomoviruses, whereas thirty-seven samples did not show any amplification.
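The sample counts reported above can be expressed as a detection rate. The short Python sketch below only restates the arithmetic on the three counts given in the text; the percentages are derived here and are not quoted from the original analysis, and the helper name `percent` is hypothetical.

```python
# Counts taken from the text: 65 samples tested, 28 CP-positive, 37 negative.
total_samples = 65
cp_positive = 28
cp_negative = 37

# Sanity check: the two groups should account for every tested sample.
assert cp_positive + cp_negative == total_samples


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


positive_rate = percent(cp_positive, total_samples)
negative_rate = percent(cp_negative, total_samples)
print(f"CP-positive: {positive_rate:.1f}% of samples")
print(f"No amplification: {negative_rate:.1f}% of samples")
```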

Fig. 2

Agarose gel electrophoresis showing PCR amplification with the different universal primer sets: a Deng A/Deng B, b AV 494/AC 1048 and c PALIc 1960/PARIv 722; and amplification of the associated satellite molecules: d Beta 01/Beta 02 and e Alpha UN 101/Alpha UN 102

Sequence and phylogenetic analysis of CABs

The pairwise sequence similarity analysis was done with SDT v1.2 (Muhire et al. 2014) for species demarcation, and a colored matrix chart was generated. The twenty-eight partial CP gene sequences of CLCuD-associated begomovirus isolates amplified in this study were analysed and compared with 18 other isolates reported by Brown et al. (2015) to determine the species of the CLCuD complex. The 18 reference sequences include six isolates of CLCuMuV (EU365613, AJ002447, JN807763, AJ132430, EU365616, AF363011), five isolates each of Cotton leaf curl Alabad virus (AJ002452, GU112081, GU112004, FJ210467, EU384575) and Cotton leaf curl Kokhran virus (AJ496286, AM421522, HF549182, GU385879, FN552001), and one each of Cotton leaf curl Gezira virus (AF260241) and Cotton leaf curl Bangalore virus (AY705380). Recently reported isolates (Datta et al. 2017; Biswas et al. 2020) were also included in the analysis. Two isolates from Bathinda (MW836809 and MW921474) and one from Abohar (MW921475) in this study shared 89.39 to 98.13% nt similarity among themselves and only 54.36 to 56.34% pairwise similarity with the Mansa isolate (MW836808). The Abohar and Bathinda isolates (MW921475, MW836809 and MW921474) shared a maximum pairwise nt identity of 92.72 to 97.81% with the CLCuMuV-Rajasthan strain (AJ132430), whereas the isolate from Mansa (MW836808) shared 92.21 to 92.36% similarity with isolates of the CLCuMuV-Pakistan (EU365616) and CLCuMuV-Hisar (AJ132430) strains.

Based on the standardized demarcation thresholds for species (cut-off value ≥ 91%) and strain (cut-off value ≥ 94%) (Brown et al. 2015), the begomoviruses characterized in this study were considered isolates of the CLCuMuV species. The Bathinda isolates (MW836809 and MW921474) were closely related to the CLCuMuV-Rajasthan strain. In the NJ phylogenetic analysis, the isolates in this study were divided into two groups (Fig. 3): two isolates from Bathinda (MW836809 and MW921474) and one from Abohar (MW921475) formed a cluster with previously reported CLCuMuV-Raj strain isolates from Pakistan and India, while the isolate from Mansa (MW836808) formed a distant cluster with other CLCuMuV isolates (AJ132430 and JF502358).
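The demarcation rule described above can be sketched as a simple threshold comparison. The Python snippet below is an illustrative reading of the two cut-off values cited from Brown et al. (2015), not a reproduction of the SDT workflow; the function name `classify` and its return labels are hypothetical, and the example identities are the upper-bound values quoted in the text.

```python
# Cut-off values as cited in the text (Brown et al. 2015).
SPECIES_CUTOFF = 91.0  # percent identity at or above which isolates share a species
STRAIN_CUTOFF = 94.0   # percent identity at or above which isolates share a strain


def classify(identity_percent: float) -> str:
    """Classify an isolate relative to a reference from pairwise identity."""
    if identity_percent >= STRAIN_CUTOFF:
        return "same strain"
    if identity_percent >= SPECIES_CUTOFF:
        return "same species, different strain"
    return "different species"


# Upper-bound identities quoted in the text.
examples = {
    "Bathinda/Abohar vs CLCuMuV-Rajasthan (97.81%)": 97.81,
    "Mansa vs CLCuMuV-Pakistan (92.36%)": 92.36,
}
for label, identity in examples.items():
    print(f"{label}: {classify(identity)}")
```

Under this reading, an isolate at 97.81% identity falls within the same strain as its reference, while one at 92.36% falls within the same species but below the strain threshold.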

Fig. 3

Phylogenetic analysis of the CABs from this study and selected reference viruses. The sequences generated in this study are indicated by red triangles. The tree was generated using the Neighbor-Joining (NJ) method in MEGA 7. The reference sequences described by Brown et al. (2015) are marked with a star (*)

Sequence and phylogenetic analysis of betasatellite molecules associated with CABs

The present betasatellite isolates showed 58.8–87.6% nucleotide sequence similarity among themselves and were considered to be the same species. They are related to cotton leaf curl Multan betasatellite (CLCuMuB) by 75–98% nt identity (Fig. 4a). Based on the recommended species demarcation cut-off value (≥ 78%) (Briddon et al. 2008), the betasatellite isolates in the present study are considered members of CLCuMuB, which reveals the occurrence of a single betasatellite species, CLCuMuB, in association with CLCuD in Punjab.

Fig. 4

Phylogenetic analysis of the betasatellites (a) and alphasatellites (b) from this study and selected reference isolates. The sequences generated in this study are indicated by red triangles. The tree was generated using the Neighbor-Joining method in MEGA 7

In the phylogenetic analysis, the present betasatellite isolates were divided into three groups (Fig. 4a). The isolates from Faridkot (MW855107) and Bakainwala (MW855106) formed the first group, the Ludhiana isolate (MW855104) clustered with the Pakistan isolates of CLCuMuB (LN831965, MH550634 and LT549459) in the second group, and the Mansa isolate (MW855105) formed the third group with Indian isolates of CLCuMuB (GQ39730, KP015741, HF549187 and HM146308).

Sequence and phylogenetic analysis of alphasatellite molecules associated with CABs

Two alphasatellite isolates, from Bakainwala (MW921476) and Muktsar (MW921477), shared 92% nt identity. The Bakainwala isolate showed the following nucleotide similarity with other isolates: cotton leaf curl Multan alphasatellite (CLCuMuA) (MK357290, MF14173 and KY783480) (92–98%), Gossypium darwinii symptomless alphasatellite (GDrSLA) (MF141741) (95.65%), and Ageratum conyzoides symptomless alphasatellite (ACSLA) (94.47%). The Muktsar isolate showed a maximum nt identity of 91–95% with GDrSLA isolates (KX656837, MF929023 and MF141741) and 94% with a cotton leaf curl Burewala alphasatellite (CLCuBA) isolate (FN658729.1). The alphasatellite isolate from Bakainwala formed a cluster with Indian isolates of CLCuMuA (MF141734 and KY83480) and GDrSLA (MF141741), while the isolate from Sri Muktsar Sahib formed a single separate branch (Fig. 4b).

Discussion

The present study revealed the occurrence of CLCuD in the South Western districts of Punjab, India. During the survey, symptoms of CLCuD, viz. small vein thickening, curling (upward and downward), enation (small leaf-like outgrowth), reduced internodes and stunting of plants, were observed in this region. These symptoms are generally associated with CABs, as reported by several workers (Kumar et al. 2010; Sohrab et al. 2014; Balram et al. 2017; Bhattacharyya et al. 2017; Khare et al. 2018). CLCuD has become a major threat to cotton production in the northwestern region of India and presents a great challenge to achieving the full yield potential of cotton varieties (Sain et al. 2020). The disease causes losses of 30 to 80 per cent in seed cotton yield (Arora and Singh 2019). In this study, CABs were characterized on the basis of CP gene sequences obtained with three sets of universal begomovirus primer pairs. Although CP gene sequences are not sufficient for the specific determination of begomoviruses (Biswas et al. 2020), Bhattacharyya et al. (2017) have shown that the patterns in phylogenetic analyses based on CP gene sequences and on complete genome sequences are mostly similar.

Since begomoviruses are prone to recombination and mutation, new variants/species are emerging and could cause resistance breaking (Rajagopalan et al. 2012; Godara et al. 2015; Chakrabarty et al. 2020). In the present study, we report the presence of the CLCuMuV species and its Rajasthan strain in the Punjab region. This report is supported by many previous studies. Recently, Biswas et al. (2020) reported CLCuMuV-Rajasthan as the most prominent strain in the northwestern cotton growing states: Punjab, Haryana and Rajasthan. Datta et al. (2017) have also reported the re-emergence of CLCuMuV in the Punjab region. Pathogenicity determinants, i.e. the betasatellites and alphasatellites associated with CABs, are also evolving with their helper viruses and undergoing recombination (Kumar et al. 2017; Zubair et al. 2021).

A diverse group of alphasatellites, including cotton leaf curl Multan alphasatellite (CLCuMuA), Gossypium darwinii symptomless alphasatellite (GDarSLA), cotton leaf curl Shadadpur alphasatellite (CLCuShA), cotton leaf curl Burewala alphasatellite (CLCuBuA), Gossypium davidsonii symptomless alphasatellite (GDavSLA) and Gossypium mustelinum symptomless alphasatellite (GMusSLA), has been reported in cotton (Nawaz-ul-Rehman et al. 2012; Sattar et al. 2017; Zubair et al. 2017). In contrast to the divergence of CABs and alphasatellites, a single betasatellite, CLCuMuB, has been reported as the predominant betasatellite associated with CABs in this region (Sattar et al. 2013; Sohrab et al. 2014; Saleem et al. 2016). In the present study, we found the single betasatellite CLCuMuB and two types of alphasatellites, i.e. CLCuMuA and GDrSLA, in the cotton growing region of Punjab, India. The occurrence of major recombination events in betasatellite molecules is an indication of a possible outbreak of CLCuD and of resistance breaking in cotton hybrids in the near future.