Introduction

Baculoviruses have a circular double-stranded DNA genome and are among the most successful insect viruses used in biological control, where they have been applied for many years (Herniou et al. 2011). They are environmentally friendly microbial agents that do not harm organisms other than insects and can therefore be used safely in biological control (Krieg et al. 1980; Entwistle et al. 1983). In addition, one of their most important features is their use as expression vectors, through bacmid technology (bacterial artificial chromosomes containing the baculovirus genome), to produce recombinant proteins for disease diagnosis and vaccine development (O’Reilly et al. 1992; Miller et al. 1998).

Baculoviruses are classified according to both their structure and the host group they infect. Baculoviruses with polyhedrin protein are called nucleopolyhedroviruses, and those with granulin protein are called granuloviruses. Nucleopolyhedroviruses are further described as having a single or multiple nucleocapsid structure on the basis of transmission electron microscopy examinations (Ackermann and Smirnoff 1983). Although the majority of baculoviruses infect Lepidoptera (Alphabaculoviruses and Betabaculoviruses), there are also isolates that infect Hymenoptera (Gammabaculoviruses) and Diptera (Deltabaculoviruses) (Jehle et al. 2006).

In recent years, baculovirus isolates have been compared at the whole genome level. In particular, next-generation sequencing and bioinformatics analysis have made it possible to understand genomic differences between baculovirus isolates obtained from different geographical regions. This work has provided detailed information about the genome features of baculoviruses and has shown that their genomes collectively contain approximately 900 different genes (Ferrelli et al. 2012). There are 38 conserved genes common to all baculoviruses (Garavaglia et al. 2012; Javed et al. 2017). However, other genes and repeat regions in the genome cause significant differences between isolates. In particular, geographic variants isolated from the same insect species differ in virulence and host spectrum. The reasons for these differences can only be revealed by performing genome analyses and comparing the isolates with each other at the genome level. In addition, genome analyses of baculoviruses will enable the identification of previously unknown baculovirus genes (Van Oers et al. 2005).

Multiple genotypic variants occur in baculoviruses, even within individually infected larvae. Restriction fragment length polymorphism (RFLP) analysis, which has long been in use, is a very convenient method for detecting geographic variation (Erlandson 2009). In addition, the new, reliable, and inexpensive DNA sequencing strategies that have emerged in recent years allow genotypic variants among and within geographic and temporal isolates of baculovirus species to be detected and characterized in silico more easily and more reliably. Comparative analyses of geographic strains have shown that homologous repeat regions and baculovirus repeated ORF (bro) genes in particular give rise to potential recombination events that cause genetic variability (Eroglu et al. 2020).

Spodoptera exigua is an important agricultural pest that originates from Southeast Asia and has spread all over the world owing to its high capacity for migration and adaptation (Xia-Ling et al. 2011). During the larval period, its polyphagous feeding causes great economic loss in more than 50 plant species. It especially prefers important crop plants such as sugar beet, corn, cotton, lettuce, sunflower, and tomato. The larvae consume the leaves until only the veins remain, and if they are not controlled, crop loss can reach 100% (Smits et al. 1987; Eroglu 2022).

In our previous studies, a baculovirus isolate was detected during the examination of S. exigua larvae collected from sugar beet cultivation areas in Iran (Darsouei et al. 2017). To date, whole genome analyses have been performed only for S. exigua baculovirus isolates from the USA, Spain, UK, Korea, and China. SeMNPV-IR, however, was isolated from a very different locality (the Razavi Khorasan region, located between Southwest Asia and Central Asia) than the other SeMNPV strains for which complete genome analysis has been performed. In this study, we aimed to perform a whole genome analysis of this geographically distinct isolate (SeMNPV-IR) and to compare it with all other Spodoptera nucleopolyhedrovirus genomes in the database.

Materials and methods

Virus source and DNA extraction

The virus isolate was obtained from Spodoptera exigua larvae found in sugar beet plantations in Mashhad, Iran between 2014 and 2015 (Darsouei et al. 2017). After the infected larvae were collected from the field by Darsouei et al. and brought to the laboratory, the virus was propagated in healthy S. exigua larvae. The virus was purified from infected larvae by the classical cheesecloth and sucrose gradient method (Munoz et al. 1997; Eroglu et al. 2018, 2019). The virus particles must be released from the polyhedral inclusion bodies (PIB) before DNA extraction. First, 300 µl of pure virus suspension (1 × 10⁸ PIB/mL) and 300 µl of 3 × DAS buffer (0.3 M Na2CO3, 0.5 M NaCl, 0.03 M EDTA; pH 10.5) were mixed and rotated for 30 min to release the virus particles from the PIBs. The mixture was then centrifuged at 20,300 × g and 25 °C for 30 min, and the supernatant was removed. DNA was extracted from the naked virus solution according to the instructions of the EcoPURE Genomic DNA kit. The DNA concentration measured with a NanoDrop spectrophotometer was 68 ng/µl, with an A260/280 purity ratio ≥ 1.8.

Genomic DNA sequencing and genome assembly

Genomic DNA was sequenced at MG Bioinformatic company on an Illumina NovaSeq system (2 × 150 bp paired-end reads). First, raw sequence reads were checked using the FastQC program (Andrews 2010). More than 90% of the bases in all sequence reads met the Q30 criterion (error rate 0.001), and no positional deviation in nucleotide content was observed between sequence reads. Since the sequencing results were of sufficient quality and depth to assemble the genome sequence, low-quality-score sequences were not filtered from the data. Two different methods were used together to select and assemble the genome sequences de novo from the high-volume sequence data after this preliminary quality analysis. NOVOPlasty 2.7.2 (Dierckxsens et al. 2017), which is used to assemble the sequences of the genome from the whole data set, was run on the TRUBA computer cluster. Both methods were successful in acquiring genome data de novo, resulting in almost identical genome sequences. The resulting genome was transferred to the Geneious Prime (Biomatters Ltd.) program, and its accuracy was checked by mapping the raw sequences back to it with the “Map to Reference” feature.
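The Q30 criterion quoted above follows from the Phred scale, in which a quality score Q corresponds to a base-call error probability of 10^(-Q/10). The short Python sketch below illustrates that conversion and the fraction of bases at or above Q30. It is not part of the FastQC workflow used in this study; the function names, the Phred+33 encoding, and the toy quality string are assumptions made for illustration.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to a base-call error probability."""
    return 10 ** (-q / 10)


def decode_fastq_quality(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def fraction_at_least(qualities: list[int], threshold: int = 30) -> float:
    """Fraction of base calls whose Phred score is at least `threshold`."""
    if not qualities:
        raise ValueError("no quality scores given")
    return sum(q >= threshold for q in qualities) / len(qualities)


if __name__ == "__main__":
    # Q30 corresponds to one expected error in 1000 base calls.
    print(phred_to_error_probability(30))
    # Hypothetical 10-base read: nine bases at Q40 ('I') and one at Q20 ('5').
    toy_read = decode_fastq_quality("IIIIIIII5I")
    print(fraction_at_least(toy_read))
```

Applied to every read in a FASTQ file, the same fraction gives the per-run Q30 percentage that sequencing reports usually quote.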

Genome annotation and genomic relationship analyses

Accurate determination of gene boundaries (annotation) in the genome formed by combining contigs is required for further analysis. Annotation is the process of defining the start, the end, and the coding strand (forward or reverse) of gene regions, the locations of repetitive regions, and structural features such as the origins of transcription and replication.
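As an illustration of what defining the start, end, and strand of a gene region involves, the following Python sketch scans both strands of a sequence for open reading frames. It is a simplified stand-in and not the Benchling-based procedure used in this study: it assumes ATG start codons only, treats the sequence as linear (ORFs spanning the origin of a circular genome are missed), and uses a hypothetical `min_codons` cutoff that counts the stop codon.

```python
from typing import NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    start: int   # 1-based leftmost coordinate on the forward strand
    end: int     # 1-based inclusive rightmost coordinate on the forward strand
    strand: str  # "+" (forward) or "-" (reverse)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def _scan_strand(seq: str, min_codons: int) -> list[tuple[int, int]]:
    """Return 0-based half-open (start, end) ORF spans on one strand."""
    spans = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    spans.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return spans


def find_orfs(seq: str, min_codons: int = 50) -> list[Orf]:
    """Find ORFs on both strands, reported in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [Orf(s + 1, e, "+") for s, e in _scan_strand(seq, min_codons)]
    for s, e in _scan_strand(reverse_complement(seq), min_codons):
        # Map reverse-strand positions back onto the forward strand.
        orfs.append(Orf(n - e + 1, n - s, "-"))
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CCATGAAATAGGG"  # hypothetical sequence with one short ORF
    print(find_orfs(toy, min_codons=3))
```

Real annotation additionally relies on homology to genes in reference genomes, which a pure ORF scan cannot provide.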

For the genome annotation of SeMNPV-IR, all genes and homologous repeat sequences in the genome were detected using the Benchling Biology Software (https://benchling.com) as described in our previous study (Eroglu et al. 2020). SeMNPV (Accession number: AF169823), the first Spodoptera exigua nucleopolyhedrovirus whose genome was analyzed, was chosen as the reference genome (Ijkel et al. 1999). The amino acid sequences encoded by the 38 core genes of all Spodoptera nucleopolyhedrovirus genomes and of the SeMNPV-IR genome were aligned in BioEdit (7.1.3.0). For the phylogenetic analysis, a phylogeny was generated in the MEGA11 program by the Maximum Likelihood method with the Jones-Taylor-Thornton (JTT) model and 1000 bootstrap replicates. The restriction endonuclease profile of SeMNPV-IR was compared in silico with those of other Spodoptera exigua NPV genomes (SeMNPV, SeMNPV-K1, SeMNPV VT-SeAl2, and SeMNPV VT-SeOx4) for the StuI and SacII enzymes by using the Benchling tool (https://benchling.com).

Results

Genome organization of SeMNPV-IR

The whole genome of the SeMNPV-IR isolate was sequenced and deposited at NCBI (Accession number: OP562161). The genome size of SeMNPV-IR was determined to be 135.764 kb, with a GC content of 43.92%. There are a total of 136 ORFs in the genome, of which 75 are oriented clockwise and 61 counterclockwise. All ORFs of SeMNPV-IR were compared with those of reference baculovirus genomes (Autographa californica MNPV, Cydia pomonella GV, and Spodoptera exigua MNPV) and of the genomes most closely related to it according to the phylogenetic analysis (SeMNPV-K1, SeMNPV VT-SeOx4, and SeMNPV VT-SeAl2). The locations of the ORFs in the SeMNPV-IR genome and their amino acid similarity ratios with the homologs in these genomes are given in Table 1. In addition, the 38 core genes found in baculoviruses are indicated in light blue in Table 1.
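The GC content reported above is the percentage of G and C bases in the genome sequence. A minimal Python sketch of that calculation is given below; the function name is hypothetical, and ignoring ambiguous bases such as N is an assumption rather than a documented choice of this study.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases of `seq`."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Hypothetical toy sequence; a real run would read the genome from FASTA.
    print(round(gc_content("ATGCGCATTA"), 2))
```

Running the function on the sequence deposited under accession OP562161 should give a value comparable to the figure reported above, provided the same treatment of ambiguous bases is used.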

Table 1 Composition of protein-coding regions and homologous repeat regions of the SeMNPV-IR genome

ORF content

The ORFs in the genome of baculoviruses are divided into seven main classes according to their replication, oral infectivity, transcription, apoptosis, auxiliary, structural, and unknown functions (Herniou et al. 2003). In the genome map of the SeMNPV-IR isolate detailed in Fig. 1, the functions of the genes are represented by different colors.

Fig. 1 Circular whole genome map of SeMNPV-IR

Eleven genes are responsible for replication (me-53, helicase, ie-1, dutpase, dnapol, dbp, lef-1, lef-2, lef-3, lef-7, and lef-11), twelve genes for transcription (lef-4, lef-5, lef-6, lef-8, lef-9, lef-10, met, vlf-1, p47, 39k, ie-0, and pkip-1), nineteen genes for structural functions (polyhedrin, p10, odv-ec27, odv-e18, gp-41, vp-39, p78, p6.9, pk-1, fusion protein, p24, gp16, gp37, calyx, odv-e25, odv-e28, two odv-e66, and vp1054), nine genes for auxiliary functions (cathepsin, chitinase, egt, alk-exo, fgf, sod, ptp-2, arif-1, and ubiquitin), ten genes for oral infectivity (pif0-pif9), and three genes for apoptosis (iap-2 and two iap-3); thirty-two genes have unknown functions and forty genes are hypothetical. Baculovirus repeated ORF (bro) genes are absent from the SeMNPV-IR genome, as in other SeMNPV genomes.
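The class counts listed above can be checked against the total of 136 ORFs reported for the genome. The Python sketch below does this tally and expresses each class as a share of the total; the dictionary simply restates the numbers given in the text, and the variable and function names are illustrative.

```python
# Gene counts per functional class, as stated in the text.
FUNCTIONAL_CLASSES = {
    "replication": 11,
    "transcription": 12,
    "structural": 19,
    "auxiliary": 9,
    "oral infectivity": 10,
    "apoptosis": 3,
    "unknown": 32,
    "hypothetical": 40,
}

TOTAL_ORFS = 136  # total number of ORFs reported for SeMNPV-IR


def class_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Return each class as a percentage of all counted ORFs."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # The per-class counts should add up to the reported ORF total.
    assert sum(FUNCTIONAL_CLASSES.values()) == TOTAL_ORFS
    for name, share in class_percentages(FUNCTIONAL_CLASSES).items():
        print(f"{name}: {share:.1f}%")
```

The counts add up to 136, so every ORF is assigned to exactly one class.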

In addition to protein-coding genes, homologous repeat (hr) sequences are very common in baculovirus genomes. These sequences, whose number varies between genomes, are generally associated with enhanced gene expression and with DNA replication (Cochran et al. 1982; Kool et al. 1995). SeMNPV-IR has 7 repeat regions in its genome, ranging in length from 96 to 806 base pairs (Table 1).

Genomic relationship analyses

To determine the phylogenetic relatedness of SeMNPV genomes isolated from different geographical regions, the amino acid sequences of the 38 core genes of these genomes were obtained from the NCBI database. The localities and accession numbers of the genomes used in this analysis are given in Table S1. The analysis showed that the SeMNPV isolate obtained from the Mashhad region of Iran is most closely related to the SeMNPV genomes isolated from the USA and Korea (Fig. 2).

Fig. 2 Phylogenetic relationship analysis of SeMNPV-IR as per core genes of baculoviruses

In silico restriction fragment length polymorphism (RFLP)

To compare the genomes of geographic SeMNPV isolates, in silico analysis was performed using the Benchling online tool. Digestion of the SeMNPV-IR genome with the StuI and SacII restriction endonucleases produced 6 and 11 fragments, respectively. The RFLP results showed that the genome of the Iranian isolate differed from those of the USA, UK, Spain, and Korea isolates (Fig. 3; Table 3). RFLP profiles typically differ among geographic isolates of baculoviruses. These differences are usually due to the presence or absence of non-protein-coding repeat sequences in the genome (Smith and Crook 1988; Munoz et al. 1999).
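An in silico digest of a circular genome can be sketched in a few lines of Python. The example below is not the Benchling implementation used in this study. It assumes the standard recognition sites of the two enzymes (StuI: AGG^CCT, SacII: CCGC^GG), searches only the forward strand because both sites are palindromic, and extends the sequence past its end so that sites spanning the origin are found. The function names and the toy sequence are hypothetical.

```python
# Recognition site and top-strand cut offset for each enzyme.
ENZYMES = {
    "StuI": ("AGGCCT", 3),   # AGG^CCT, blunt cut
    "SacII": ("CCGCGG", 4),  # CCGC^GG
}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Return sorted 0-based cut positions on a circular sequence."""
    seq = seq.upper()
    n = len(seq)
    # Append the first bases again so sites spanning the origin are detected.
    extended = seq + seq[:len(site) - 1]
    cuts = set()
    start = extended.find(site)
    while start != -1:
        cuts.add((start + offset) % n)
        start = extended.find(site, start + 1)
    return sorted(cuts)


def circular_fragments(seq: str, site: str, offset: int) -> list[int]:
    """Return fragment lengths, largest first, after a complete digest."""
    n = len(seq)
    cuts = cut_positions(seq, site, offset)
    if not cuts:
        return [n]  # the circle stays uncut
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(n - cuts[-1] + cuts[0])  # fragment that wraps the origin
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Hypothetical 30-bp circle with two StuI sites and one SacII site.
    toy = "AA" + "AGGCCT" + "AAAA" + "CCGCGG" + "AAAA" + "AGGCCT" + "AA"
    for name, (site, offset) in ENZYMES.items():
        print(name, circular_fragments(toy, site, offset))
```

On a circular genome the number of fragments equals the number of cut sites, which is why the fragment counts and cut-site counts reported for each enzyme coincide.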

Table 2 Diversity of homologous repeat (hr) regions in SeMNPV-IR and other SeMNPV genomes

Fig. 3 Restriction endonuclease profiles as per StuI and SacII enzymes, generated in silico from the genome sequences with the Benchling program. Ladder: lambda DNA/HindIII, 1: SeMNPV-IR, 2: SeMNPV, 3: SeMNPV-K1, 4: SeMNPV VT-SeOx4, 5: SeMNPV VT-SeAl2

Discussion

The next-generation sequencing methods that emerged after the completion of the Human Genome Project have allowed the genomes of many organisms to be elucidated. Viruses, which lie on the fringes of life, diverge quite easily at the genome level, and there can be great variation even among geographic isolates of the same virus. Variation between different geographic baculovirus isolates from the same insect has been a topic of interest for virologists for many years. The fact that these differences can now be analyzed easily at the genome level has allowed such studies to be taken much further. Most studies of geographic genetic diversity among baculoviruses have been done for Alphabaculoviruses that infect lepidopteran larval populations (Erlandson 2009). Comparison of geographic isolates, and the band differences seen in RFLP, often reveal differences in virulence towards native and alternative hosts (Laviña-Caoili et al. 2001).

Spodoptera exigua larvae are a major agricultural pest that damages many crops and causes economic loss all over the world (Lasa et al. 2007). Although many studies have been conducted on the use of baculoviruses to control this pest (Takatsuka et al. 2003; Nathan and Kalaivani 2006; Widiawati et al. 2021), the search for local isolates with higher virulence and effectiveness continues. Kondo et al. (1994) examined 11 nucleopolyhedrovirus-infected S. exigua larvae collected from Shiga, Japan, and divided the isolates into two groups according to the shape of their polyhedra (cubic and icosahedral). The RFLP results also supported dividing the isolates into two groups. They reported that the RFLP profiles of the isolates in the first group were similar to that of Autographa californica NPV, whereas the bands of the isolates in the second group were similar to those of Spodoptera exigua NPV California isolates. Murillo et al. (2006) isolated seven S. exigua NPVs with both phenotypic and genotypic variation from greenhouse fields in Spain. Polymerase chain reactions showed that a single genotype was dominant in some isolates, while two or three different genotypes were mixed in others. They reported that this affects both the killing rate and the pathogenicity of the isolates. Geographic isolates of baculoviruses generally differ in host spectrum, virulence, and larval mortality rate (Allaway and Payne 1984). These differences are attributed to variations in the genome (Cory et al. 2005).

To date, genome analysis of SeMNPV isolates from the USA, Korea, UK, Spain, and China has been performed (Ijkel et al. 1999; Theze et al. 2014). In this study, the genome of SeMNPV-IR, isolated in Mashhad in Razavi Khorasan Province in eastern Iran, was analyzed. This is an interesting geographical region located where three countries meet: Iran, Turkmenistan, and Afghanistan. The genomic features of SeMNPV-IR were described in detail and compared with those of other Spodoptera exigua NPV genomes (SeMNPV, SeMNPV-K1, SeMNPV VT-SeAl2, and SeMNPV VT-SeOx4). The SeMNPV-IR genome has 40 hypothetical genes, and most of them have homologs in other SeMNPV genomes. However, two of them (Orf 83 and Orf 104) are not found in the UK and Spain isolates (Theze et al. 2014), while they are present in the genomes of the USA (Ijkel et al. 1999) and Korea isolates (unpublished) (Table 1). The similarity rate between Orf 83 in the SeMNPV-IR genome and the homologous gene in the genomes of the USA and Korea isolates is 98%, while this rate is 89% for Orf 104.
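Similarity rates such as those quoted for Orf 83 and Orf 104 are derived from pairwise alignments of the homologous sequences. The Python sketch below shows one common definition, percent identity over the aligned columns that contain no gap. The exact measure used to compute the values in Table 1 is not stated, so this definition, the function name, and the toy alignment are assumptions for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # columns with a gap in either sequence are skipped
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned peptide fragments, not taken from the genomes.
    print(percent_identity("MKV-LA", "MKVALT"))
```

Other conventions, for example counting gapped columns as mismatches or scoring conservative substitutions as similar, give different percentages for the same alignment.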

Homologous repeat regions in baculoviruses act both as transcriptional enhancers and in progeny virus production, and they vary widely among isolates (McClintock and Dougherty 1988; Sun 2015). The A + T ratio in these regions is quite high, and they are absent from some baculovirus genomes while present in high numbers in others (Luque et al. 2001; Wang et al. 2016). SeMNPV-IR has 7 homologous repeat regions (hr1-hr7). Hr1 (96 bp) is not found in other SeMNPV genomes. In addition, hr2 (355 bp) is considerably shorter than in other SeMNPVs (1131 and 1347 bp), with a similarity rate of 92% (Table 2).

Table 3 Restriction endonuclease profiles of SeMNPV isolates as per StuI and SacII enzymes. 1: SeMNPV-IR (Iran), 2: SeMNPV (USA), 3: SeMNPV-K1 (Korea), 4: SeMNPV VT-SeOx4 (UK), 5: SeMNPV VT-SeAl2 (Spain)

Phylogenetic relationship analysis demonstrated that the Iranian isolate of SeMNPV clustered near the isolates from the USA and Korea.

The genomes of the SeMNPV isolates from Iran, the USA, Korea, the UK, and Spain were digested in silico with the StuI and SacII enzymes (Fig. 3; Table 3). These genomes contain 6, 8, 7, 7, and 7 cut sites for StuI and 11, 12, 12, 12, and 12 cut sites for SacII, respectively. For both enzymes, the Iranian isolate had the fewest cut sites.

Comparison of all protein-coding regions and homologous repeat regions revealed significant differences between SeMNPV genomes, especially in two hypothetical regions (Orf 83 and Orf 104) and two homologous repeat regions (hr1 and hr2). Genomic variation among geographic variants isolated from the same insect species is thought to be largely due to hr regions.

In this study, whole genome analysis of SeMNPV-IR was performed. The two hypothetical genes identified in the genome of this isolate, which originates from a location (Razavi Khorasan) very different from those of the previously sequenced SeMNPV samples, provide a basis for further investigation. Clarifying the functions of these genes in future studies will contribute greatly to current knowledge of baculovirus genomes.