Background

Members of the family Baculoviridae are rod-shaped viruses with circular, covalently closed, double-stranded DNA genomes [1]. This family includes four genera: Alphabaculovirus (lepidopteran-specific nucleopolyhedroviruses (NPVs)), Betabaculovirus (lepidopteran-specific granuloviruses), Gammabaculovirus (hymenopteran-specific NPVs) and Deltabaculovirus (dipteran-specific NPVs) [2]. To date, 54 baculovirus genomes have been sequenced (http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10442), including 37 from Alphabaculovirus, 13 from Betabaculovirus, three from Gammabaculovirus and one from Deltabaculovirus. Nucleopolyhedrovirus (NPV) and granulovirus (GV) are distinguished from each other by their occlusion body morphology. The NPVs produce large, polyhedron-shaped occlusion bodies, called polyhedra, which contain many virions, whereas the GVs have smaller occlusion bodies, called granules, which normally contain a single virion. The NPVs are further designated as single-nucleocapsid (S) or multinucleocapsid (M), depending on the potential number of nucleocapsids packaged in an envelope of the virion.

The cotton bollworm, H. armigera, is a serious pest that causes economic losses to over 60 vegetable and field crops throughout the world [3]. H. armigera larvae are significantly resistant to chemical insecticides; therefore, baculovirus pesticides have been recognized as one of the most promising agents to control such pests [4]. HzSNPV was registered as one of the first commercial baculovirus pesticides (Virion-H, Biocontrol-VHZ, Elcar) in the 1970s, and has been used extensively to control the cotton bollworm in the USA [5]. HearSNPV was also the first commercial baculovirus pesticide used to control H. armigera in China, and has been extensively used for the control of the pests in vegetable crops [6].

The DNA genomes of HearSNPV-G4 [7], HearSNPV-C1 [8], HearNPV-NNg1 [9], and HzSNPV [10] have been sequenced. HearSNPV-G4 and HearSNPV-C1 were isolated in China, HearNPV-NNg1 in Kenya, and HzSNPV in the United States. Comparative genomic analyses showed that overall gene content and arrangement in these four viruses are highly conserved, and they are considered variants of the same NPV species [9]. In addition, the nucleotide sequence of the HearGV genome has been reported [11]. Multinucleocapsid NPVs isolated from H. armigera (HearMNPV), which produce ODV virions with multiple nucleocapsids per envelope, have also been identified [12, 13]. Sequences of the lef-8, lef-9, and polyhedrin genes from 18 other HearMNPV isolates from H. armigera have been reported [14].

In this study, a new nucleopolyhedrovirus isolated from H. armigera was examined by electron microscopy (EM), which indicated that it is a multinucleocapsid NPV. Experimental infection of insect larvae indicated that the host range of HearMNPV differs from that of HearSNPV and that the cytopathological effects of the two viruses also differ. This report describes the sequence and organization of the HearMNPV genome and compares it with sequence data from other baculoviruses, such as HearSNPV and MacoNPV-B.

Methods

Viruses and insects

HearMNPV was originally isolated from naturally infected H. armigera in Shanghai, China, in the 1970s. The virus was propagated in laboratory stocks of healthy third instar H. armigera larvae by per os infection. Laboratory stocks of eastern armyworm, cotton leaf worm, and beet armyworm were reared at 26°C with a 16:8 h light:dark cycle on a semi-synthetic diet.

Virus purification, DNA extraction, and construction of genomic DNA libraries

To generate a large number of polyhedra, healthy third instar H. armigera larvae were inoculated, and hemolymph from the infected larvae was collected on ice and centrifuged for 10 min at 4°C. The precipitate was washed several times with distilled water and re-suspended in 0.1% SDS for 30 min at room temperature. After centrifugation, the clean polyhedra were re-suspended in 200 μl TE buffer (10 mM Tris–HCl, pH 8.0, 1 mM EDTA) [15].

The genomic DNA of HearMNPV was purified according to the following protocol: about 5 × 10^8 polyhedra were dissolved in 0.1 M Na2CO3, 0.15 M NaCl, pH 10.4, on ice for 10 min; SDS was then added to a final concentration of 0.5%, and the solution was kept on ice for another 10 min. The genomic DNA was extracted twice with an equal volume of phenol (pH 8.0) and once with chloroform. The DNA was precipitated with two volumes of ethanol, washed with 70% ethanol, and dissolved in 0.1 × TE buffer (pH 8.0) [16]. The quantity and quality of the isolated DNA were determined spectrophotometrically and by electrophoresis on 0.7% agarose.

A random genomic library of HearMNPV, containing 2.0 to 5.0 kbp fragments in the vector pUC19, was constructed according to the “partial filling-in” method [15, 16]. DNA fragments for sequencing were prepared from 527 recombinant plasmids. The recombinant plasmids were sequenced from both strands with plasmid-specific primers and by 'primer nesting', using BigDye Terminator v3.1 (ABI) on a 3130XL Genetic Analyzer (ABI). The combined sequence generated from these clones represented six-fold genomic coverage. The gaps were filled by PCR.

The insect cell lines and infection

The Hz-AM1 cell line and HaBacHZ8-GFP were gifts from Dr. Fei Deng of Wuhan Institute of Virology, Chinese Academy of Sciences. HaBacHZ8 is a bacmid of HearNPV that lacks the polyhedrin gene. An enhanced GFP gene was introduced into HaBacHZ8 using the HearSNPV bac-to-bac system [17, 18], generating the bacmid HaBacHZ8-GFP [19]. The QB-Ha-E-5 cell line, a gift from Dr. Guiling Zheng of Shandong Agricultural University, was established from the embryonic tissue of H. armigera (Lepidoptera: Noctuidae). The cell line had been subcultured for over 60 passages in TNM-FH medium supplemented with 10% fetal bovine serum and can be infected by H. armigera single nucleopolyhedrovirus (HaSNPV) [20]. The Hz-AM1 and QB-Ha-E-5 cells were cultured at 27°C in TNM-FH insect medium (Sigma, USA) supplemented with 10% (v/v) heat-inactivated fetal bovine serum (Gibco-BRL, Gaithersburg, USA). Hz-AM1 and QB-Ha-E-5 cells were infected with HearMNPV at a multiplicity of infection (MOI) of 5. For coinfection, QB-Ha-E-5 cells were infected simultaneously with HearMNPV and HaBacHZ8-GFP at an MOI of 5 for each virus. The cells were examined using Nikon-Ts100 and Leica TCS SP5 II microscopes.

Scanning electron microscopy

Polyhedra were fixed in 2.5% glutaraldehyde at 4°C for 2 h. The fixed sample was dehydrated through a serial ethanol gradient, and then embedded in Epon-Araldite resin. A diamond knife was used to cut ultrathin sections on a Reichert OMU3 Ultramicrotome. The sections were stained with 2% aqueous uranyl acetate, followed by lead citrate. Micrographs of the Polyhedra were taken with a Hitachi S3400N transmission electron microscope at 80 kV.

Transmission electron microscopy

Polyhedra were fixed in 2.5% glutaraldehyde in 0.05 M cacodylate buffer at 4°C for 2 h and post-fixed in 1% osmium tetroxide in the same buffer for 2 h at room temperature. Fixed samples were dehydrated through a graded series of ethanol solutions and embedded in Spurr’s resin. Sections were cut, stained with uranyl acetate and lead citrate, and examined under a JEM-1230 transmission electron microscope (TEM) at an accelerating voltage of 80 kV.

Quantitative PCR (qPCR)

Third-instar larvae were starved for 12 h at 26°C before inoculation. H. armigera test larvae were allowed to ingest diet soaked with a 10 μl drop containing an estimated 10^7 OBs; control larvae ingested diet soaked with a 10 μl drop of ddH2O containing no OBs. After a 2 h adsorption period at 26°C, the diet soaked with OBs or ddH2O was replaced with fresh diet containing no OBs. Time zero of the infection was defined as the time when the soaked diet was removed from the culture boxes. Larvae were sacrificed at various time points ranging from 4 to 96 h post-inoculation (p.i.). At each collection, ten larvae were ground to a powder under liquid nitrogen using a mortar and pestle. A solution of 0.1 M Na2CO3, 0.15 M NaCl, and 1% NP-40 was added to the powder to a total volume of 700 μl. Total DNA was then extracted twice with an equal volume of phenol (pH 8.0) and once with chloroform. The DNA was precipitated with two volumes of ethanol, washed with 70% ethanol, and dissolved in ddH2O. The quantity and quality of the isolated DNA were determined spectrophotometrically.

HearMNPV DNA copy number was determined by real-time qPCR with primers specific to the rr2b gene. The viral copy number was then normalized against the host-genome copy number, determined by qPCR with primers specific to the host actin gene [21]. The rr2b and actin genes were amplified by PCR and cloned into pGEM-T. Recombinant plasmid DNA concentrations were quantified using a spectrophotometer, and dilution standards were generated. For each standard dilution, three independent qPCRs were performed using rr2b- or actin-specific primers, and standard curves were generated. For each larval DNA extract, three independent qPCRs were performed using rr2b- and actin-specific primers. The mean HearMNPV DNA copy number was determined, and the number of rr2b amplicons was normalized against the number of host actin genes to derive the mean number of viral copies per host actin gene copy. The specific primers were as follows:

actin-F 5' CTCTTCCAGCCCTCATTCTTG 3'

actin-R 5' TTCTGCATACGGTCAGCGATA 3'

rr2b-F 5' AGCAACAAGACTTAATACTCAACGC 3'

rr2b-R 5' AATATGGCTGCAAAGCTACCG 3'
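The normalization described above can be sketched as follows. This is a minimal illustration, assuming a log-linear standard curve (Ct against log10 copy number) fitted by ordinary least squares; the dilution series, Ct values, and function names are hypothetical and are not data or software from this study.

```python
import math

def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate a copy number from a Ct value."""
    return 10 ** ((ct - intercept) / slope)

def viral_copies_per_actin(rr2b_cts, actin_cts, rr2b_curve, actin_curve):
    """Mean rr2b copy number divided by mean actin copy number (replicate qPCRs)."""
    rr2b = [copies_from_ct(ct, *rr2b_curve) for ct in rr2b_cts]
    actin = [copies_from_ct(ct, *actin_curve) for ct in actin_cts]
    return (sum(rr2b) / len(rr2b)) / (sum(actin) / len(actin))

# Hypothetical plasmid dilution standards (copies per reaction) and Ct values.
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_cts = [30.0, 26.7, 23.4, 20.1, 16.8]
curve = fit_standard_curve(standard_copies, standard_cts)

# Hypothetical triplicate Ct values for one larval DNA extract; the same
# curve is reused for both genes here only to keep the example short.
ratio = viral_copies_per_actin([20.1, 20.1, 20.1], [23.4, 23.4, 23.4], curve, curve)
print(f"viral copies per actin copy: {ratio:.2f}")
```

In practice each gene would have its own standard curve, and the ratio could be rescaled (for example per 10^5 actin copies) for reporting.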

DNA sequence analysis

Restriction fragments from recombinant plasmids were sequenced and assembled into contigs using SeqMan 5.0 from the DNASTAR software package. After preliminary assembly, PCR was used to generate fragments spanning gaps and regions of low-quality data. Open reading frames (ORFs) were identified using ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) [22]. The criterion for defining an ORF was a size of at least 150 nt (50 aa) with minimal overlap. Promoter motifs present upstream of the putative ORFs were screened as described previously [23]. Homology searches were done through the National Center for Biotechnology Information (NCBI) website using BLAST [24]. Multiple alignments were performed and percentage identities were calculated using Clustal W. The Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html) was used to locate and analyze the homologous regions (hrs) [25]. GeneParityPlot analysis was performed as described previously [26]. A phylogenetic tree was inferred from amino acid sequences by neighbor-joining (NJ) and maximum parsimony (MP) analyses using MEGA, version 5.0 [27]. Bootstrap analyses with 1000 replicates were performed to evaluate the robustness of the phylogenies for both NJ and MP analyses.
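The ORF criterion above (ATG-initiated, at least 150 nt) can be illustrated with a simple six-frame scan. This is a minimal sketch and not the ORF Finder algorithm: it assumes that the 150 nt threshold excludes the stop codon, reports only the longest ORF for each stop codon, and does not apply the "minimal overlap" filter; the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_orfs(seq, min_nt=150):
    """Return (strand, start, end, length_nt) for ATG-initiated ORFs.

    Coordinates are 0-based and end-exclusive on the scanned strand;
    the length excludes the stop codon (an assumption of this sketch).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if i - start >= min_nt:
                        orfs.append((strand, start, i, i - start))
                    start = None
    return orfs

# Toy sequence: one ORF of exactly 50 codons (ATG + 49 x GCT) before a stop.
toy = "CC" + "ATG" + "GCT" * 49 + "TAA" + "GG"
print(find_orfs(toy))
```

A production annotation would additionally resolve overlapping candidates and handle the circular genome, which this sketch ignores.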

Results and discussion

Electron microscopy observation

Scanning electron microscopy revealed that the purified occlusion bodies (OBs) of the NPV originating from infected cotton bollworm have irregular shapes, with diameters of about 2 ± 0.3 μm (Figure 1A). Transmission electron microscopy showed multiple rod-shaped nucleocapsids of about 230 nm in length and 50 nm in width embedded in each OB, with multiple nucleocapsids packaged within the envelope of each virion (Figure 1B). These results indicated that the virus is a typical multinucleocapsid NPV. By contrast, transmission electron microscopy showed that HearSNPV virions contain a single nucleocapsid (Figure 1C). Therefore, the isolate was termed H. armigera multinucleocapsid nucleopolyhedrovirus (HearMNPV).

Figure 1

Electron micrographs of polyhedra from HearMNPV and HearSNPV. A. Scanning electron micrograph of HearMNPV (12,000×) B. Transmission electron micrograph of HearMNPV (80,000×) C. Transmission electron micrograph of HearSNPV (20,000×).

HearMNPV infected insect larvae and cells

Experimental infection of insect larvae showed that HearMNPV can infect the eastern armyworm (Pseudaletia separata), but cannot infect either the cotton leaf worm (Spodoptera litura) or the beet armyworm (Spodoptera exigua). By contrast, HearSNPV cannot infect P. separata. These results indicate that the host range of HearMNPV differs from that of HearSNPV. Moreover, HearMNPV-infected Hz-AM1 cells produced no polyhedra and showed no typical cytopathic effects (CPE), even at 96 h post-infection (p.i.) (Figure 2A). However, HearMNPV-infected QB-Ha-E-5 cells produced polyhedra (Figure 2B). It was previously reported that Hz-AM1 cells are permissive to HearSNPV-G4 [8, 17, 18]. When QB-Ha-E-5 cells were infected with HaBacHZ8-GFP [18, 19], which was constructed from HearSNPV, green fluorescence was observed under fluorescence microscopy (Figure 2C). These results indicate that both the host range and the cell lines infected by HearMNPV differ from those of HearSNPV.

Figure 2

Cytopathic effects in infected cells. Panel A: Hz-AM1 cells uninfected (A) and infected with HearMNPV (MOI:5) at 72 hpi (B). Panel B: QB-Ha-E-5 cells uninfected (C) and infected with HearMNPV (MOI:5) at 72 hpi (D). Panel C: QB-Ha-E-5 cells co-infected with HearMNPV and HaBacHZ8-GFP (MOI:5) at 72 hpi. Cells were viewed using confocal laser fluorescence microscopy. Arrows 1 and 2 indicate polyhedra-containing fluorescent cells, arrow 3 indicates polyhedra-containing cells, and arrow 4 indicates fluorescent cells. Magnification: ×400.

Coinfection of QB-Ha-E-5 cells with HaBacHZ8-GFP and HearMNPV showed that some cells exhibited green fluorescence (under fluorescence microscopy), some cells produced polyhedra, and some cells exhibited both green fluorescence and polyhedra (Figure 2C). These results indicate that, in cells coinfected with two distinct viruses, the viruses are able to coexist, replicate, and package themselves independently. Cydia pomonella granulovirus (CpGV) is one of the most successful commercial baculovirus insecticides; however, resistance of the codling moth (C. pomonella) to commercially applied CpGV has occurred in orchards in Germany and France [28]. Therefore, alternating virus treatments or using a mixture of HearMNPV and HearSNPV could delay the development of resistance in H. armigera, helping to improve both the prevention and control of H. armigera in the field.

HearMNPV virus DNA replication in vivo

HearMNPV is a potentially new isolate infectious for H. armigera based on the analysis of host range and morphology. Thus, qPCR was used to determine the replication efficiency of HearMNPV in H. armigera in vivo. Figure 3 shows the quantity of HearMNPV viral DNA in infected larvae during the decreasing phase (0–4 h), latent phase (4–12 h), exponential phase (12–48 h), and stationary phase (48–96 h). Initially, the number of viral DNA (vDNA) copies appeared to decrease between 0 h post-infection (p.i.) and 4 h p.i. before increasing by 6.97 times between 4 h p.i. and 12 h p.i. The number of vDNA copies increased dramatically between 12 h p.i. and 48 h p.i., from 452 per 10^5 actin copies to 2.02 × 10^11 per 10^5 actin copies, an increase of 4.46 × 10^8-fold. These results indicated that the viral DNA underwent about 29 doublings, taking about 1.24 h to generate another vDNA copy. This trend continued into the stationary phase, to a lesser degree: vDNA increased by 4.82 times between 48 h p.i. and 96 h p.i., and there were about 1.17 × 10^12 copies per 10^5 actin copies at 96 h. These results suggested that H. armigera could be infected efficiently by HearMNPV and that the replication kinetics conformed to what has previously been described for other baculoviruses [29].
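The doubling estimate for the exponential phase can be reproduced from the two reported copy numbers. The sketch below assumes simple exponential growth between 12 h and 48 h p.i. (each round of replication doubles the vDNA); the variable names are illustrative.

```python
import math

# Values reported in the text (vDNA copies per 10^5 actin copies).
copies_12h = 452
copies_48h = 2.02e11
elapsed_h = 48 - 12

fold_increase = copies_48h / copies_12h        # overall fold change
doublings = math.log2(fold_increase)           # number of doublings
hours_per_doubling = elapsed_h / doublings     # mean time per doubling

print(f"fold increase:      {fold_increase:.3g}")
print(f"doublings:          {doublings:.1f}")
print(f"hours per doubling: {hours_per_doubling:.2f}")
```

Without intermediate rounding this gives slightly under 29 doublings and roughly 1.25 h per doubling; rounding to 29 doublings gives 36/29 ≈ 1.24 h, the figure quoted above.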

Figure 3

Quantification of HearMNPV DNA copy number in infected H. armigera larvae. qPCR was used to determine the number of rr2b gene copies relative to the actin gene at various times following infection. The plotted points indicate the average number of rr2b gene copies relative to the actin gene (performed in triplicate). Error bars represent standard deviations.

Nucleotide sequence of the HearMNPV genome

The HearMNPV genome consists of 154,196 bp (GenBank accession no. NC_011615), which is similar in size to the genomes of MacoNPV-A (155,060 bp) and MacoNPV-B (158,482 bp). The HearMNPV genome has a G + C content of 40.07%, which lies within the range for baculovirus genomes, from 32.7% (ChocGV) to 58% (LdMNPV), and is similar to that of MacoNPV-B (40%), AcMNPV (40.7%), BmNPV (40.4%), and EppoNPV (40.7%). According to the adopted convention, the adenine residue of the translation initiation codon of the polyhedrin gene was taken as the zero point of the HearMNPV physical map, and the polyhedrin gene was designated ORF 1 (Table 1). A total of 162 putative ORFs and four homologous regions (hrs) were detected in the HearMNPV genome, using computer-assisted analysis to select ORFs that start with a methionine codon (ATG), encode at least 50 amino acids (aa), and have minimal overlap with other ORFs [30, 31]. All 162 ORFs are shown in Table 1 by location, orientation, size, and potential baculovirus homologs.

Table 1 List of ORFs in HearMNPV and their Homologous ORFs in the MacoNPV-B, MacoNPV-A, HearSNPV(G4), AcMNPV, AgSeNPV and HearGV

HearMNPV ORFs had an average length of 870 bp, with ORF85 (helicase) being the largest (3,627 bp) and ORF99 (ctl, conotoxin-like protein) being the smallest (150 bp). The 162 predicted ORFs encode 46,677 aa. The total coding sequence and intergenic regions were 139,026 and 15,170 bp, which represented 90.16% and 9.84% of the genome, respectively. The four hrs were distributed along the genome, with sizes ranging from 724 to 1,766 bp, and their total sequence was 4,749 bp, accounting for 3.08% of the genome. Thirty-eight ORFs overlapped with adjacent ORFs by between 1 and 244 bp, with a total of 1485 bp.
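As a quick consistency check, the percentages quoted above follow directly from the reported base-pair counts. The sketch below uses only figures stated in the text; the helper name `percent` is illustrative.

```python
# Base-pair counts as reported in the text.
genome_bp = 154_196
coding_bp = 139_026
intergenic_bp = 15_170
hr_total_bp = 4_749

def percent(part, whole):
    """Share of `whole` made up by `part`, as a percentage rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)

coding_pct = percent(coding_bp, genome_bp)
intergenic_pct = percent(intergenic_bp, genome_bp)
hr_pct = percent(hr_total_bp, genome_bp)

print(f"coding: {coding_pct}%  intergenic: {intergenic_pct}%  hrs: {hr_pct}%")
print("coding + intergenic equals genome length:",
      coding_bp + intergenic_bp == genome_bp)
```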

Of the 162 ORFs identified in HearMNPV, 21 possessed only a consensus early promoter motif (a TATA box followed 20 to 40 bp downstream by a CAGT or CATT motif, located within 180 bp upstream of the initiation codon). Seventy-one ORFs contained only a late promoter motif ((A/T/G)TAAG within 180 bp upstream of the initiation codon), and nine had both early and late promoter motifs, which might allow transcription during both the early and late stages of infection. Sixty-one ORFs lacked any recognizable consensus early or late promoter motif within 180 bp upstream of the ATG. Eighty-six ORFs (53%) were oriented in a clockwise direction and 76 ORFs (47%) in a counterclockwise direction, relative to the transcriptional orientation of the polyhedrin gene.
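The promoter screen described above can be approximated with regular expressions. This is a minimal sketch, not the procedure of [23]: it assumes that the TATA box is the literal tetranucleotide TATA, that the 20 to 40 bp spacing is counted between the end of TATA and the start of CAGT/CATT, and that the input is the sequence immediately upstream of the ATG; the function name and test sequences are hypothetical.

```python
import re

# Early motif: TATA, then 20-40 nt of any base, then CAGT or CATT.
EARLY_MOTIF = re.compile(r"TATA[ACGT]{20,40}CA[GT]T")
# Late motif: (A/T/G)TAAG.
LATE_MOTIF = re.compile(r"[ATG]TAAG")

def classify_promoter(upstream, window=180):
    """Classify the region immediately 5' of an ATG as early, late, both, or none."""
    region = upstream.upper()[-window:]  # keep only the last `window` nt
    early = EARLY_MOTIF.search(region) is not None
    late = LATE_MOTIF.search(region) is not None
    if early and late:
        return "early+late"
    if early:
        return "early"
    if late:
        return "late"
    return "none"

# Synthetic upstream regions (not HearMNPV sequence).
early_only = "GG" + "TATA" + "C" * 25 + "CAGT" + "CCC"
late_only = "CCCATAAGCCC"
both = early_only + "ATAAG"
too_close = "TATA" + "C" * 5 + "CAGT"  # spacing below 20 nt

for name, seq in [("early_only", early_only), ("late_only", late_only),
                  ("both", both), ("too_close", too_close)]:
    print(name, "->", classify_promoter(seq))
```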

Comparison of HearMNPV ORFs to other baculoviruses

The overall gene arrangement and the homology between genes of HearMNPV and other baculovirus genomes were compared using Identity-GeneParity analysis [26]. The gene content and organization of HearMNPV were compared with a group I NPV (AcMNPV [32]), group II NPVs (MacoNPV-B [33], MacoNPV-A [34], HearSNPV-G4 [7], and AgseNPV [35]), and a GV (HearGV [11]). HearMNPV shares 117 ORFs with AcMNPV, 161 ORFs with MacoNPV-B, 159 ORFs with MacoNPV-A, 123 with HearSNPV-G4, 147 with AgseNPV, and 89 with HearGV. The average amino acid sequence identities between HearMNPV and AcMNPV, MacoNPV-B, MacoNPV-A, HearSNPV-G4, AgseNPV, and HearGV were 44.7%, 98.5%, 90.2%, 41.0%, 58.5%, and 38.6%, respectively.

Comparison of the gene order between HearMNPV and MacoNPV-B revealed a significant gap between orf52 and orf53 in the HearMNPV genome. The gap corresponds to a region of MacoNPV-B comprising orf54, 55, 56, 57, 58, 59, and 60. In addition, orf66 and orf17 of HearMNPV are homologous to orf18 and orf117 of MacoNPV-B, with 99% and 41.2% aa identity, respectively (Table 1); however, the locations of these homologues are not conserved. Relative to each other, HearMNPV (x-axis) and MacoNPV-B (y-axis) contain 1 and 8 unique genes, respectively. Apart from these differences, HearMNPV and MacoNPV-B maintain perfect collinearity in gene content and arrangement (Figure 4A).

Figure 4

Identity-Gene Parity Plots of HearMNPV with MacoNPV-B (A), MacoNPV-A (B), HearSNPV-G4 (C), AcMNPV (D), AgseNPV (E), and HearGV (F). Amino acid identity (%) of individual homologous ORFs of HearMNPV compared to other baculoviruses are shown in various colors. ORFs unique to each virus are placed on the x-axis and y-axis, respectively (black diamonds).

The gene arrangement of HearMNPV was also completely collinear with that of MacoNPV-A. The result of the Identity-GeneParity analysis showed that relative to each other HearMNPV (x-axis) and MacoNPV-A (y-axis) contain 2 and 10 unique genes, respectively. There was also high collinearity between HearMNPV and MacoNPV-A (Figure 4B).

In terms of gene content, arrangement, and homology level, HearMNPV is distant from HearSNPV-G4, although both infect the same host, H. armigera. Relative to each other, HearMNPV (x-axis) and HearSNPV-G4 (y-axis) contain 38 and 18 unique genes, respectively, and these genes are distributed throughout the genomes (Figure 4C). The ‘left’ part of the HearMNPV genome (ORF5–69) displayed a high degree of gene scrambling in the gene parity plot analysis. The homologous ORFs from HearMNPV ORF70 to ORF160 are approximately collinear with HearSNPV-G4 ORFs 8 to 96; however, the direction of the diagonal indicates that these regions are inverted relative to each other, except for HearMNPV ORF102–107 (corresponding to HearSNPV-G4 ORF62–67) (Figure 4C).

Relative to each other, HearMNPV (x-axis) and AgseNPV (y-axis) contain 15 and 11 unique genes, respectively. The collinearity between HearMNPV and AgseNPV was higher than that between HearMNPV and HearSNPV-G4, and lower than that between HearMNPV and MacoNPV-B or MacoNPV-A (Figure 4E).

The collinearity between HearMNPV and AcMNPV from Group I was lower than those between HearMNPV and NPVs from group II (Figure 4D); the parity analysis of HearMNPV and HearGV ORFs displayed a much more dispersed pattern (Figure 4F).

Phylogenetic analysis

Based on 29 concatenated, conserved genes [36], a phylogenetic tree was estimated for 54 baculoviruses. The results reflected the current systematic assignment of the viruses (Figure 5), indicating that HearMNPV and MacoNPV-B group together and are distinct from HearSNPV-G4, HearSNPV-C1, and HearSNPV-NNg1. In addition, the phylogenetic analysis of three highly conserved genes (lef-8, lef-9, and polh) indicated that the HearMNPV sequences were separated from those of the other eighteen HearMNPV isolates [14]. These results imply that HearMNPV is a new isolate that differs from HearSNPV.

Figure 5

Phylogenetic analysis of concatenated amino acid sequence alignments, showing bootstrap values >50% for NJ and MP trees at each node (NJ/MP). The location of HearMNPV is shown in bold. The GenBank accession numbers of each virus are listed after the names.

Genomic Comparison between HearMNPV and MacoNPV-B: HearMNPV lacks a 5.4-kb fragment that contains five ORFs

Compared with the MacoNPV-B genome, the HearMNPV genome lacks a 5.4-kb fragment that contains ORF54, 55, 56, 57, and 58 (Figure 6). Nucleotides 1–633, 639–896, and 853–1050 of HearMNPV orf52 share 98%, 98%, and 95% nucleotide identity with MacoNPV-B orf53, orf59, and orf60, respectively. Amino acids 147–349 of the protein encoded by HearMNPV orf52 are 100% identical to those of the protein encoded by MacoNPV-B orf53, and amino acids (aa) 1–65 of the protein encoded by HearMNPV orf52 are 86% identical to the amino acid sequence of MacoNPV-B orf60 (Figure 6, indicated by the gray parts in the circles and arrows). However, there was no aa sequence identity between the proteins encoded by HearMNPV orf52 and MacoNPV-B orf59 (Figure 6, indicated by the black parts in the circles and arrows). The MacoNPV-A genome also lacked the 5.4-kb fragment, suggesting that an insertion in the genome might have led to the division of ORF59 in MacoNPV-A [33].

Figure 6

Comparison of the genome structure of HearMNPV and MacoNPV-B. The left and right arrows represent ORFs in the HearMNPV and MacoNPV-B genomes, respectively. The numbers above the arrows are the names of the ORFs in the HearMNPV and MacoNPV-B genomes. The lines between the names of the ORFs represent homologies between the HearMNPV and MacoNPV-B genomes. The black regions of the arrows in the circle or box represent nucleotide sequence homologies, and the gray regions represent amino acid sequence homologies. The down arrow indicates the site where the 5.4-kb fragment is inserted. The letter A indicates that the ORF of HearMNPV has no homolog in the corresponding position of MacoNPV-B. The letter B represents the ORF unique to HearMNPV. Double vertical lines represent ORFs that are not in the HearMNPV and MacoNPV-B genomes.

According to the sequence analysis of 54 whole baculovirus genomes, the 5.4-kb fragment present in MacoNPV-B but not in HearMNPV shares homologous sequences with XecnGV [30] and HearGV [11], in reverse orientation (Table 2). This was not observed in any of the other genomes. Combined with the phylogenetic analysis (Figure 5), the results suggest that the 5.4-kb fragment was gained during the evolution of MacoNPV-B, and thus that the common ancestor of HearMNPV ORF52 evolved into MacoNPV-B ORF53, ORF59, and ORF60 through the gain of the 5.4-kb fragment, together with subsequent nucleotide mutations, deletions, and insertions (Figure 6). For recombination to occur, the different virus species have to coinfect the same host at the same time. A relatively recent recombination event between ancestors of MacoNPV-B and XecnGV resulted in the insertion of a 5.4-kb fragment from an ancestor of XecnGV into the genome of an ancestor of MacoNPV-B, suggesting that these lineages were capable of infecting the same host species at some point during their history [33]. HearMNPV and HearGV can infect the same host, the cotton bollworm H. armigera, which provides an opportunity for natural recombination between the two viruses. However, HearMNPV did not gain the 5.4-kb fragment from HearGV by recombination.

Table 2 Comparison of ORFs aa identity from 5.4-kb fragment of MacoNPV-B, XecnGV and HearGV

HearMNPV ORF66

The nucleotide sequence of ORF66 has high similarity to MacoNPV-B ORF17 and ORF18. Presumably, a mutation gave rise to the division of the HearMNPV ORF66 homolog into two open reading frames in MacoNPV-B.

HearMNPV ORF66, located between ORF65 (dutpase) and ORF67 (bro-b), is 1779 bp long and encodes a protein of 592 aa. The 301 aa encoded by nucleotides 874–1779 of HearMNPV ORF66 share 99% aa sequence identity with MacoNPV-B ORF18 (301 aa). However, the genomic locations of HearMNPV ORF66 and MacoNPV-B ORF18 are not collinear (Figure 4A, Figure 6).

Nucleotides 676–968 of HearMNPV ORF66 are 95% identical to MacoNPV-B ORF17; however, the amino acids encoded by this nucleotide sequence did not share amino acid identity with the protein encoded by MacoNPV-B ORF17 because of frameshifts and other mutations of a few nucleotides. There is also no sequence similarity between the N-terminal region (1–675 bp) of HearMNPV ORF66 and MacoNPV-B, either at the nucleotide or amino acid level (shown in the boxes of Figure 6).

The HearMNPV ORF66 protein is 92% identical to the five homologous proteins hr1, hr2, hr3, hr4, and hr5, each approximately 608 aa in size, of Heliothis virescens ascovirus-3e (HvAV-3e) [37], and 85% identical to the proteins encoded by orf34 (564 aa) and orf77 (606 aa) of Spodoptera frugiperda ascovirus-1a (SfAV-1a) [38]. The comparison showed that these homologous ORFs have four conserved cysteine domains, suggestive of a zinc-binding domain that is hypothesized to be a DNA-binding domain. This putative domain is found at the C terminus of a large number of transposase proteins, indicating that it might be related to gene duplication in the genome.

Interestingly, we found that the left and right flanking DNA sequences of HearMNPV orf66 contain two perfect inverted terminal repeats (ITRs) of 13 nucleotides. Moreover, the tetranucleotide 5′-TTAA-3′, which is the characteristic target site of the TTAA family of transposable elements, is duplicated at the ends of this element [39, 40] (Figure 7). This indicated that the element could insert exclusively at this insertion site (TTAA). However, ORF66 has no amino acid identity with the piggyBac transposase [41] by blastp analysis. Sequence analysis showed that the left and right flanking DNA sequences of HvAV-3e hr1, hr2, hr3, hr4, and hr5 also have two perfect ITRs of 13 nucleotides. However, the left flanking DNA sequence of MacoNPV-B orf18 lacked the sequence CCTCCTAAGACCC. These results indicated that the homolog of HearMNPV orf66 in MacoNPV-B was split into MacoNPV-B orf18 and orf17 during evolution.
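The flank structure described above (a duplicated TTAA site enclosing a pair of 13-nt inverted terminal repeats) can be checked programmatically. The sketch below is illustrative only: it assumes that CCTCCTAAGACCC is one arm of the ITR and that the element is laid out as TTAA, ITR, internal sequence, reverse complement of the ITR, TTAA; the internal filler sequence and the function names are hypothetical and do not come from the HearMNPV genome.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def has_ttaa_flanked_itrs(element, itr_len=13, site="TTAA"):
    """True if `element` reads site + ITR + ... + revcomp(ITR) + site."""
    element = element.upper()
    if len(element) < 2 * (len(site) + itr_len):
        return False
    if not (element.startswith(site) and element.endswith(site)):
        return False
    inner = element[len(site):-len(site)]  # strip the duplicated target site
    left_arm = inner[:itr_len]
    right_arm = inner[-itr_len:]
    return right_arm == reverse_complement(left_arm)

ITR = "CCTCCTAAGACCC"            # 13-nt sequence quoted in the text
FILLER = "ACGTACGTAC" * 3        # hypothetical internal sequence

intact = "TTAA" + ITR + FILLER + reverse_complement(ITR) + "TTAA"
missing_left_arm = "TTAA" + FILLER + reverse_complement(ITR) + "TTAA"

print("intact element:", has_ttaa_flanked_itrs(intact))
print("left arm missing:", has_ttaa_flanked_itrs(missing_left_arm))
```

The second case mimics, in simplified form, a flank that has lost one ITR arm, as reported for the left flank of MacoNPV-B orf18.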

Figure 7

Diagram of the region with two DNA sequences flanking a putative transposase ORF (HearMNPV orf66) of 1,779 bp encoding a protein of 592 amino acids. The duplicated tetranucleotide TTAA is characteristic of a transposition event by a transposable element. ITR represents the inverted terminal repeats.

Searching for homologs of HearMNPV ORF66 among the baculoviruses revealed that only HearMNPV ORF66 (592aa), MacoNPV-B ORF18 (301aa), HearGV ORF53 (572aa), ORF157 (572aa), ORF157 (576aa), and PsunGV ORF39 (571aa) are homologous ORFs. The phylogenetic analysis indicated that HearGV ORF53, ORF157, and PsunGV ORF39 belong to the same phylogenetic branch, while HearMNPV ORF66, MacoNPV-B ORF18, HvAV-3e hr1–hr5, and SfAV-1a ORF34 and ORF77 belong to the same phylogenetic clade (Figure 8). HearMNPV and HvAV-3e were both isolated from cotton bollworms; HearMNPV ORF66 and HvAV-3e hr1–hr5 share a flank structure and have the highest amino acid identity among the homologous genes in baculoviruses and ascoviruses to date (excluding unreleased relevant data). These data indicated that these genes might have been exchanged among species and genera.

Figure 8

Phylogenetic analysis of the HearMNPV ORF66 amino acid sequence. The phylogenetic tree shows bootstrap values >50% for NJ and MP trees at each node (NJ/MP). The location of HearMNPV ORF66 is shown in bold. The sequences used are from Mamestra configurata NPV-96B (ORF18), Helicoverpa armigera GV (HearGV ORF53 and ORF157), Pseudaletia unipuncta GV (ORF39), Heliothis virescens ascovirus 3e (HvAV-3e hr1–hr5), and Spodoptera frugiperda ascovirus 1a (SfAV-1a ORF34 and ORF77).

The genomic differences between HearMNPV and MacoNPV-B are mainly located between hr1 and hr2, including the deletion of the 5.4 kb fragment in HearMNPV and the changes in ORF66, both of which were close to a bro gene (Figure 6).

HearMNPV ORF17

The locations of HearMNPV ORF17 and its homologue in the MacoNPV-B genome are not conserved. HearMNPV ORF17 has only 41.2% aa identity to MacoNPV-B ORF117 (e = 6e-46, with 98% query coverage), whereas HearMNPV ORF110 is collinear with MacoNPV-B ORF117 and shares 97.2% aa identity with it, indicating that HearMNPV ORF17 has no significant collinearity with the homologous ORF of MacoNPV-B.

HearMNPV unique ORF

HearMNPV ORF139 is 264 bp long and encodes a protein of 87 aa. There is an early promoter CATT motif in the 180 bp region upstream of the start codon. Using both BLASTX and BLASTP searching, no homologous protein was found among baculoviruses.

bro genes

The occurrence of the baculovirus repeat ORF (bro) gene family is a striking feature in many baculovirus genomes [42]. bro genes are associated with regions of viral genome rearrangement [43]. BmNPV BRO proteins have nucleic acid binding activity that influences host DNA replication and transcription [44]. BRO proteins function as nucleocytoplasmic shuttling proteins that utilize the CRM1-mediated nuclear export pathway [45]. We identified six bro genes dispersed among the genome of HearMNPV and named them bro-a to bro-f, according to the order of their appearance on the linearized genome. There are eight and seven bro genes in MacoNPV-A and MacoNPV-B, respectively. The bro genes are classified into four groups, based on the similarity of the 41-amimo acid core domain sequences used for LdMNPV BRO protein classification [46]. HearMNPV bro-c, bro-d, and bro-e belong to group I bro genes, bro-a and bro-b belong to group II bro genes, bro-f belongs to group IV There is no bro gene corresponding to MacoNPV-B bro-b, which belongs to group III. The HearMNPV genome also lacks homologs of the MacoNPV-A bro-a (group I) and bro-c (groupIII) genes.

The HearMNPV bro-a, -b, -c, -d, -e, and -f genes showed aa identities of 83%, 77.5%, 98.3%, 89.6%, 97.4%, and 98.8% to MacoNPV-B bro-a, -c, -d, -e, -f, and -g, respectively. MacoNPV-B bro-b is located within the 5.4 kb fragment of MacoNPV-B that is absent from the HearMNPV genome.

The HearMNPV bro-a gene has an N-terminal region, from aa 1 to aa 134, with aa identities of 63% and 95% to MacoNPV-B bro-a and MacoNPV-A bro-b, respectively. The C-terminal region, from aa 135 to aa 331, has aa identities of 98% and 93% to MacoNPV-B bro-a and MacoNPV-A bro-b, respectively. This suggests that the bro-a C-terminal regions are the most highly conserved portions of this gene in these three virus genomes.
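A region-by-region identity comparison of this kind can be sketched as follows. This is a minimal illustration, assuming two sequences that have already been aligned (gaps written as "-") and one common definition of percent identity (identical residues divided by aligned columns, gaps included in the denominator). The source does not state which alignment tool or identity definition was used, and the toy sequences below are hypothetical, not BRO proteins.

```python
from typing import Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def regional_identity(a: str, b: str, split: int) -> Tuple[float, float]:
    """Identity for alignment columns [0, split) and [split, end)."""
    return percent_identity(a[:split], b[:split]), percent_identity(a[split:], b[split:])


# Hypothetical toy alignment: divergent start, conserved remainder.
seq1 = "MKTAYLQWERTY"
seq2 = "MRSAYLQWERTY"
n_term, c_term = regional_identity(seq1, seq2, split=4)
print(n_term, c_term)
```

For bro-a, the split would correspond to the alignment column of residue 134; note that alignment columns and residue numbers diverge once gaps are present.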

HearMNPV bro-f shows high homology to a hypothetical protein P20 [47] from Leucania separata NPV (LeseNPV) and MacoNPV-A bro-h, which both encode 179 aa proteins with amino acid identities of 95% and 98%, respectively. HearMNPV bro-f shows the highest homology to MacoNPV-B bro-g, with an amino acid identity of 98.8%. However, amino acids 1–17 of HearMNPV bro-f are not found in MacoNPV-B bro-g.

When the bro genes of HearMNPV are compared with those of MacoNPV-B, the lowest aa identity is between HearMNPV bro-b and MacoNPV-B bro-c, at 77.5%. The HearMNPV ORF66 gene is adjacent to HearMNPV bro-b and differs considerably from ORF17 and ORF18, which are closest to MacoNPV-B bro-c.

The differences between HearMNPV bro-c, -d, and -e and their homologs in MacoNPV-B consist of minor nucleotide insertions, deletions, and substitutions.

The bro genes of HearMNPV differ from those of MacoNPV-B in both sequence and number, which indicates that the bro gene regions are among the most important sites of genomic variation in baculoviruses. The differences between HearMNPV and MacoNPV-B (the 5.4 kb fragment and the location of ORF66) were found in the vicinity of a bro gene. These differences indicate that bro genes might play a role in gene exchange and, consequently, in viral virulence and host range.

hrs

Variable numbers of hr sequences, composed of direct repeats containing a “core” imperfect palindrome and dispersed unevenly throughout the genome in AT-rich intergenic regions, have been identified in most baculovirus genomes [48]. The baculovirus hrs act as enhancers of RNA polymerase II-mediated transcription of baculovirus early promoters [49], and function as origins of DNA replication in transient replication assays [50, 51]. They are also sites of frequent recombination and rearrangement in baculovirus genomes [52, 53]. Four hrs were identified in the HearMNPV genome, with sizes of 1185 bp (hr1), 1766 bp (hr2), 1074 bp (hr3) and 724 bp (hr4). The hrs are distributed throughout the HearMNPV genome: hr1, hr2, hr3, and hr4 lie between orf14 and orf15, orf63 and orf64, orf130 and orf131, and orf138 and orf139, respectively. Sequence analysis confirmed that the four hrs comprise two apparent domains: perfect or near-perfect 40 bp palindromes (designated type A) and 31 bp flanking repeats (designated type B) on one or both sides of the palindromes (Figure 9A). This two-domain organization is similar to that described for MacoNPV-A and MacoNPV-B [33, 34]. However, the number of repeat units in each hr of HearMNPV differs from that in MacoNPV-A and MacoNPV-B.
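The notion of a perfect or near-perfect palindrome in double-stranded DNA can be made concrete with a small sketch: a sequence is palindromic when it equals its own reverse complement, and "imperfect" when a few base pairs break that symmetry. The functions, the mismatch threshold, and the 6 bp test sequences below are illustrative assumptions and do not reproduce the 40 bp type A repeats or the method used in the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatched_pairs(seq: str) -> int:
    """Count symmetric position pairs that break the palindrome.

    Position i is compared with the complement of position n-1-i; only the
    first half is scanned so each pair is counted once.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    half = len(seq) // 2
    return sum(1 for i in range(half) if seq[i] != rc[i])


def is_near_palindrome(seq: str, max_mismatched_pairs: int = 2) -> bool:
    """True if seq is a palindrome allowing a small number of broken pairs.

    The default threshold is an arbitrary illustrative choice.
    """
    return mismatched_pairs(seq) <= max_mismatched_pairs


print(mismatched_pairs("GAATTC"))  # perfect palindrome (EcoRI site)
print(mismatched_pairs("GAATTG"))  # one broken pair

# hr sizes reported for HearMNPV (bp) and their combined length.
hr_sizes = {"hr1": 1185, "hr2": 1766, "hr3": 1074, "hr4": 724}
print(sum(hr_sizes.values()))
```

Scanning a genome for such repeats would slide a 40 bp window along the sequence and keep windows that pass the threshold; dedicated repeat-finding tools are normally used for this step.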

Figure 9

Comparison of the hr regions between HearMNPV and MacoNPV-B. Panel A: a deduced consensus sequence of domains A and B from each hr region was used for this alignment. Conserved sequences are indicated with different shading: black indicates 100% conservation, gray >70% conservation, and no shading <70% conservation. Panel B: arrows represent the direction and positions of the repeat-A and repeat-B regions, black boxes represent type A repeats and blank boxes represent type B repeats.

The four hrs of HearMNPV are located at similar positions in the genome as those of MacoNPV-B and MacoNPV-A. Sequence alignment between HearMNPV and MacoNPV-B hrs indicated that these four homologous regions have insertions/deletions of different sizes, giving rise to identities of 92.0%, 92.3%, 86.6%, and 81.9%, respectively. hr1 has three insertions: two, of 52 bp and 43 bp, that contain only a type A repeat, and one of 147 bp that contains type A and type B repeats. hr2 has two insertions (209 bp and 81 bp) that contain both type A and type B repeats. hr3 has the largest deletion (489 bp) and another large deletion of 136 bp, both of which contain type A and type B repeats. hr4 has a small deletion of 69 bp that also contains both type A and type B repeats (Figure 9B). The HearMNPV hr4 (724 bp) is shorter than the MacoNPV-B hr4 (1178 bp), probably because of the presence of HearMNPV ORF139, which is adjacent to HearMNPV hr4. HearNPV NNg1 contains five hrs (hr1-hr5), similar to HearNPV C1, G4, and HzNPV. The arrangement of these hrs on the genome is almost the same in HearNPV C1, G4 and HzNPV, and it is possible that variability in the hr sequences affects not only progeny virus production, but also the insecticidal activity of the Helicoverpa spp. NPVs [9]. The homologous regions are also suggested to be responsible for the difference in virulence between two Mamestra configurata NPV-A variants, v90/4 and v90/2 [54], indicating that the difference in the organization of the homologous regions of HearMNPV and MacoNPV-B is possibly associated with mechanisms of recombination.

Conclusion

HearMNPV differs significantly from HearSNPV not only in biological properties and morphology, but also in gene content, arrangement, and homology level based on genome sequence comparison; the two are therefore considered different viruses, not variants of the same virus. Although the average amino acid sequence identity between HearMNPV and MacoNPV-B is 98.5%, their effective host ranges differ. Moreover, a 5.4 kb segment of the MacoNPV-B genome, which is the apparent result of recombination with an ancestor of XecnGV, is absent from the HearMNPV genome, suggesting that the recombination event responsible for this segment occurred after the divergence of MacoNPV-B and HearMNPV. The location and length of HearMNPV orf66 and MacoNPV-B orf18 differ in their respective genomes. Phylogenetic analysis indicated that these events may have occurred after MacoNPV-B and MacoNPV-A separated from their ancestor. These distinct differences between HearMNPV and MacoNPV-B may account for their different host ranges.