Background

Protease inhibitors are classified either by the type of proteases they inhibit, such as serpins, cystatins, and metalloprotease inhibitors, or according to the presence of structural motifs such as Kunitz, Kazal, or Bowman-Birk (Song et al. 2008). Peptides containing a trypsin inhibitor-like (TIL) domain usually comprise 56–84 amino acid residues and are known to play a key role in various physiological processes by inhibiting proteinases (Nie et al. 2012). Generally, a TIL-type protease inhibitor contains a single TIL domain with 10 cysteine residues, which form 5 disulfide bridges (Bania et al. 1999), and shows inhibitory activity against trypsin, cathepsin, chymotrypsin, or elastase. The few TIL domain peptides reported so far include BmKAPi from Mesobuthus martensii Karsch (Zeng et al. 2002), BMSI7 from Boophilus microplus (Fogaca et al. 2006), SjAPI from Scorpiops jendeki (Chen et al. 2013), APim6 from Apis mellifera (Michel et al. 2012), AcCI from Apis cerana (Kim et al. 2013), and BtCI from Bombus terrestris (Qiu et al. 2012). The TIL domain peptide identified from the venom of the scorpion Scorpiops jendeki is a dual-function peptide with α-chymotrypsin- and elastase-inhibiting properties (Chen et al. 2013). Similarly, BMSI isolated from hemocytes of B. microplus was reported to inhibit chymotrypsin and elastase (Fogaca et al. 2006). In the frog Lepidobatrachus laevis, a TIL domain cysteine-rich peptide was isolated from the skin and showed inhibitory activity against trypsin (Wang et al. 2015). In A. cerana, recombinant AcCI demonstrated inhibitory activity against chymotrypsin (Kim et al. 2013). In Bombyx mori, mutations at the second and sixth cysteine residues were reported to dramatically reduce inhibitory activity against microbial proteases (Li et al. 2016).
Although a few studies are available on TIL domain-containing cysteine-rich peptides from insects, to the best of our knowledge, no information is available on such peptides in H. armigera.

H. armigera is one of the most destructive polyphagous cosmopolitan insect pest species (Noor-ul-Ane et al. 2018). Over recent decades, several pests have developed resistance against various families of synthetic insecticides (Shakeel et al. 2017). Similarly, H. armigera has evolved resistance to several insecticides, and this resistance might be a factor accountable for its pest status (Wu 2007). This has prompted the development of alternative control tactics (Sun 2015). Therefore, it is imperative to identify genes that might play a critical role in various physiological functions of insects and to understand their molecular mechanisms.

Considering the importance of TIL domain cysteine-rich peptides in physiological processes, as shown by previous reports, the present study aimed to identify a cysteine-rich peptide of H. armigera and to characterize it through bioinformatic analysis, cloning, isolation of genomic DNA, and investigation of its expression pattern in the fat body, especially at the pupal stage. Furthermore, the phylogenetic relationship of the H. armigera TIL domain cysteine-rich peptide to TIL domain peptides from other insects was studied.

Materials and methods

Insect culture

A population of H. armigera was reared on an artificial diet (Abbasi et al. 2007) under laboratory conditions of 25 ± 2 °C, 70% RH, and a photoperiod of 16:8-h L:D.

Identification of HaTIL2 cDNA and nucleotide sequencing

Sequence-specific primers were designed according to the cDNA sequence of HaTIL2 to amplify the open reading frame (ORF) (Table 1). An automated DNA sequencer (Perkin-Elmer Applied Biosystems, Foster City, CA, USA) was then used to sequence the plasmid DNA. The sequence was compared using DNASIS and the BLAST databanks of NCBI. The amino acid sequence was aligned with MacVector (ver. 6.5, Oxford Molecular Ltd.).

Table 1 Sequences of PCR and qPCR primers used in study of Helicoverpa armigera

Biochemical properties and phylogenetic tree construction

NCBI Basic Protein BLAST was used to identify homologous TIL domain sequences from different insects, with the H. armigera HaTIL2 sequence as the query. PEPSTATS was used to compute the primary sequence composition of HaTIL2 (Rice et al. 2000). The physicochemical parameters of HaTIL2 were predicted using the ProtParam tool (Gasteiger et al. 2005). The SignalP 4.1 server was used to identify the possible secretory signal peptide of HaTIL2 (Petersen et al. 2011). To determine the evolutionary position of HaTIL2 and its phylogenetic similarity to other insect TIL domain peptides, multiple sequence alignment (MSA) of selected amino acid sequences was carried out. To better understand the evolutionary relationship of HaTIL2 with other TIL domain peptides, sequences from H. armigera, Tribolium castaneum, Musca domestica, Drosophila miranda, Drosophila suzukii, A. florea, Bombus ignitus, B. terrestris, B. impatiens, and A. cerana were used to construct a phylogenetic tree using the maximum parsimony (MP) method in MEGA (Tamura et al. 2013). Tree #1 of the 2 most parsimonious trees (length = 223) is shown. For parsimony-informative sites, the consistency index was 0.870968, the retention index was 0.800000, and the composite index was 0.696774; the composite index for all sites was 0.699552. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches. The MP tree was obtained using the Subtree-Pruning-Regrafting (SPR) algorithm with search level 1, in which the initial trees were obtained by the random addition of sequences (10 replicates). The analysis involved 11 amino acid sequences. All positions containing gaps and missing data were eliminated, leaving 70 positions in the final dataset. Finally, iTOL was used to render the final image (Letunic and Bork 2016).
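As a consistency check on the reported tree statistics, the composite (rescaled consistency) index is the product of the consistency index and the retention index. The following minimal Python sketch applies that relationship to the values reported above for parsimony-informative sites; the function and variable names are illustrative and are not part of MEGA.

```python
def composite_index(ci: float, ri: float) -> float:
    """Composite (rescaled consistency) index: the product of CI and RI."""
    return ci * ri


# Values reported in the text for parsimony-informative sites.
ci_informative = 0.870968
ri_informative = 0.800000

rc_informative = composite_index(ci_informative, ri_informative)
print(f"composite index = {rc_informative:.6f}")  # prints: composite index = 0.696774
```

The product agrees with the reported composite index of 0.696774 for parsimony-informative sites.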

Genomic DNA isolation and PCR

The FGENESH program (Salamov and Solovyev 2000) was used to predict the number of potential exons and introns in the HaTIL2 genomic DNA. For experimental verification, the whole bodies of H. armigera (n = 3) were used to isolate DNA (Wizard Genomic DNA Purification Kit, Promega®). The cDNA sequence of HaTIL2 was used to design the amplification primers. The verification of PCR products was done by DNA sequence analysis.

Cloning of genes from H. armigera

To amplify the target gene, cDNA and a set of gene-specific sense and antisense primers were used in PCR (Table 1). For the digestion reaction, a BamHI restriction site was incorporated into the sense primer and a KpnI restriction site into the antisense primer. The PCR amplification had the following steps: denaturation at 95 °C for 3 min, then 35 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 2 min, with a final extension at 72 °C for 5 min. The PCR product was visually examined on a 1% (w/v) agarose gel stained with ethidium bromide using the Bio-Rad imaging system. A gel extraction kit (Omega) was used to purify the amplified target gene product. The purified product was then ligated into the pGEM-T Easy vector (Takara) and transformed into Escherichia coli DH5α. A positive clone was selected on LB agar plates containing 50 μg mL−1 ampicillin after overnight incubation at 37 °C. The resulting clones were sequenced by Shanghai Sunny Biotech Co., Ltd.
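The cycling program described above can be written down as a small data structure, which makes the total programmed hold time easy to check. This is only a sketch in Python: the step labels "annealing" and "extension" are the conventional interpretation of the 55 °C and 72 °C steps, the class and function names are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


# Program as stated in the text; labels are an interpretation, not from the source.
INITIAL = Step("initial denaturation", 95, 3 * 60)
CYCLE = [
    Step("denaturation", 95, 30),
    Step("annealing", 55, 30),
    Step("extension", 72, 2 * 60),
]
N_CYCLES = 35
FINAL = Step("final extension", 72, 5 * 60)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + N_CYCLES * per_cycle + FINAL.seconds


print(total_hold_seconds() / 60, "min")  # prints: 113.0 min
```

The programmed holds therefore add up to 113 min; the actual run time on a given thermocycler will be longer because of ramping.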

Collection of tissues

Healthy H. armigera larvae were dissected on ice under a stereomicroscope (Zeiss, Jena, Germany). The fat body was collected and washed with PBS (140 mM NaCl, 27 mM KCl, 8 mM Na2HPO4, and 1.5 mM KH2PO4, pH 7.4). The fat body tissues were stored at −80 °C for RNA extraction.

Total RNA preparation and synthesis of cDNA template for real-time qPCR

Total RNA was isolated using TRIzol reagent following the manufacturer’s protocol. RNA quality was assessed by 1% (w/v) agarose gel electrophoresis, and RNA concentration was measured with a UV spectrophotometer. Contaminating DNA was removed with DNase I (Fermentas). cDNA was synthesized using RevertAid™ Reverse Transcriptase (Fermentas) in a 20-μL reaction. The resultant product was stored at −80 °C.

Fluorescence real-time quantitative PCR analysis of gene expression

Real-time qPCR was performed on a Bio-Rad iQ2 cycler, with reactions carried out in PCR strips. The amplification signal was detected using SYBR Green chemistry. Triplicate first-strand cDNA aliquots for each sample served as templates for RT-qPCR using the SsoFast™ EvaGreen® Supermix (Bio-Rad, Hercules, CA, USA) with an iQ2 Optical System (Bio-Rad). Each reaction was carried out in a 20-μL volume containing cDNA and 100 nM of each primer, in iQ™ 96-well PCR plates covered with adhesive seals (Bio-Rad). The thermal program was as follows: denaturation at 95 °C for 30 s, then 40 cycles of 95 °C for 5 s and 60 °C for 10 s. A melting curve analysis was then carried out for the amplified product. Relative expression was analyzed by the 2^−ΔΔCT method (Livak and Schmittgen 2001), with ribosomal protein L28 (RPL28) used as the internal control for normalization (Chandra et al. 2014; Shakeel et al. 2015, 2018). The primers for RT-qPCR are listed in Table 1.
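The relative quantification step of Livak and Schmittgen (2001) can be illustrated with a short Python sketch. The target gene is first normalized to the reference gene (RPL28 in this study) within each sample, and the sample is then compared with a calibrator. The CT values below are hypothetical numbers chosen for illustration only and are not data from this study; the function name and the choice of calibrator are likewise assumptions.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of the target gene in a sample relative to a calibrator.

    Each argument is a list of technical-replicate CT values.
    """
    # Delta CT: target normalized to the reference gene within each condition.
    delta_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_ct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    # Delta-delta CT: sample relative to calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical triplicate CT values (NOT measured data).
fold = relative_expression(
    ct_target_sample=[22.0, 22.5, 21.5],      # target gene, stage of interest
    ct_ref_sample=[18.0, 18.0, 18.0],         # reference gene, stage of interest
    ct_target_calibrator=[24.0, 24.0, 24.0],  # target gene, calibrator stage
    ct_ref_calibrator=[18.0, 18.0, 18.0],     # reference gene, calibrator stage
)
print(fold)  # prints: 4.0
```

With these illustrative values, ΔCT is 4 in the sample and 6 in the calibrator, so ΔΔCT is −2 and the target is estimated to be fourfold more abundant in the sample. The method assumes that the amplification efficiencies of the target and reference genes are approximately equal and close to 100%.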

Results and discussion

The TIL domain found in protease inhibitors has been documented to inhibit chymotrypsin, elastase, cathepsin, and trypsin, and thus plays an important role in various biological processes. Several protease inhibitors with a TIL domain have been identified previously (Zeng et al. 2002; Michel et al. 2012; Qiu et al. 2012; Chen et al. 2013; Kim et al. 2013). However, to our knowledge, there are no other publications on the identification and expression analysis of H. armigera TIL domain cysteine-rich peptides; thus, this work represents the first report on the identification, cloning, and expression analysis of a novel TIL domain cysteine-rich peptide, designated HaTIL2.

Sequence analysis of HaTIL2 and cDNA cloning

In the present study, a 470-bp fragment was amplified with the sequence-specific primers for HaTIL2; it contained an ORF of 240 nucleotides encoding 80 amino acid residues. Analysis of the deduced amino acid sequence showed that HaTIL2 had a predicted molecular weight of 8.632 kDa and an isoelectric point of 4.41. The molecular weight of HaTIL2 was similar to that of other documented TIL domain peptides (Zeng et al. 2002; Michel et al. 2012; Qiu et al. 2012; Chen et al. 2013; Kim et al. 2013).

Analysis of HaTIL2 genomic DNA sequence

It has been documented that introns play a vital role in gene expression and that introducing an intron into a gene can elevate its level of expression (Le Hir et al. 2003; Nott et al. 2003). In the present study, genomic DNA sequence analysis was conducted to determine the number of introns and exons in the HaTIL2 gene of H. armigera. For this purpose, the FGENESH software was used to predict the HaTIL2 gene structure. The prediction indicated that the full-length genomic DNA sequence of HaTIL2 has two exons and one intron. To confirm this, genomic DNA was isolated from H. armigera and used as a template for PCR. The amplification yielded an amplicon of 574 nucleotides. The PCR products were verified by DNA sequence analysis, which also demonstrated one intron and two exons in HaTIL2. TIL domain protease inhibitors might have gained an intron during evolution (Zeng et al. 2014). As TIL peptides are also involved in the immune response of insects, the gain of an intron might give rise to a stronger immune response. The TIL family is categorized as a group of low-molecular-weight inhibitors with a TIL domain containing a high proportion of cysteine residues (Rosengren et al. 2001). A previous study documented that trypsin sequences contained 7 cysteine residues (Bown et al. 1997), fewer than the 11 cysteine residues found in the HaTIL2 peptide in this study. The presence of an odd number of cysteine residues might indicate a difference in inhibitory activity (Li et al. 2016).

Biochemical properties of HaTIL2

The biochemical properties of HaTIL2 predicted by ProtParam revealed Cys (13.8%) as the most abundant amino acid, followed by Val (8.8%), Ser (8.8%), Asp (7.5%), and Ala (6.2%); Trp (0.0%) and Met (1.2%) were the least abundant amino acids in the HaTIL2 sequence (Table 2). The isoelectric point (pI) of HaTIL2 was approximately 5, indicating that it might be soluble in acidic buffer. The instability index (Ii) is used to estimate in vivo half-life, and a protein with an Ii value lower than 40 is predicted to be stable (Rogers et al. 1986). The Ii of HaTIL2 was computed to be 58.89, indicating that it might be unstable. Moreover, a high aliphatic index (80.38) was obtained for the HaTIL2 protein. The grand average of hydropathicity (GRAVY) was calculated to assess the hydrophobicity of HaTIL2; a positive GRAVY value indicates a hydrophobic nature and a negative value a hydrophilic nature (Kyte and Doolittle 1982). The GRAVY index for HaTIL2 was 0.089 (Table 2), a value close to zero that indicates HaTIL2 is slightly hydrophobic. The SignalP 4.1 server identified a signal peptide at the N-terminus of the protein, indicating that HaTIL2 is a secretory protein that might play a critical role in the inhibition of extracellular serine proteases (SPs). Previously, several serine protease inhibitors (SPIs) have been reported to contain a potential signal peptide and a mature peptide (Bania et al. 1999; Cierpicki et al. 2000; Kim et al. 2013).
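Several of the parameters discussed above follow directly from the amino acid composition. The Python sketch below shows how residue percentages, the GRAVY score (mean Kyte-Doolittle hydropathy per residue), and the aliphatic index (mole percent Ala + 2.9 × Val + 3.9 × (Ile + Leu)) can be computed from a sequence. It is an illustration of the underlying formulas, not a reimplementation of ProtParam, and the demo sequence is a hypothetical placeholder, not the HaTIL2 sequence.

```python
from collections import Counter

# Kyte and Doolittle (1982) hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition_percent(seq: str) -> dict:
    """Percentage of each residue type present in the sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    pct = composition_percent(seq)
    return (pct.get("A", 0.0)
            + 2.9 * pct.get("V", 0.0)
            + 3.9 * (pct.get("I", 0.0) + pct.get("L", 0.0)))


# Hypothetical placeholder peptide (NOT the HaTIL2 sequence).
demo = "MKCVSACDLV"
print(composition_percent(demo)["C"], gravy(demo), aliphatic_index(demo))
```

As a simple cross-check of the reported values, a Cys content of 13.8% in an 80-residue peptide corresponds to 0.138 × 80 ≈ 11 cysteine residues, which matches the 11 cysteines noted for HaTIL2.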

Table 2 Biochemical properties of the HaTIL2 protein sequence of Helicoverpa armigera

Multiple sequence alignment of HaTIL2

Multiple sequence alignment showed that HaTIL2 had 83.33% identity with HaTIL3 (H. armigera), 36% identity with AfCI (Apis florea), 34% identity with BiCI and BiVSPI (Bombus ignitus), 32% identity with BtCI (Bombus terrestris), and 31% identity with AcCI (Apis cerana) (Fig. 1). The alignment thus indicated a low level of identity (31–36%) with TIL domain peptides from other species; this is not surprising, as such heterogeneity is common among serpins. For example, six serpins of Manduca sexta exhibited a very low level of amino acid identity (40%) within the species (Tong and Kanost 2005).
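Percent identity values such as those above are computed from aligned sequence pairs. The Python sketch below shows one common convention, in which identical residues are counted over the columns where neither sequence has a gap. Alignment programs differ in the denominator they use, so this is an assumption and not necessarily the convention behind the values reported here; the function name and the example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns without a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments (NOT sequences from this study).
print(percent_identity("ACDC-G", "ACEC-G"))  # prints: 80.0
```

In this toy example, five columns are gap-free and four of them match, giving 80% identity.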

Fig. 1

Alignment of the amino acid sequences between HaTIL2 and other known serine protease inhibitors is shown. The sources of the aligned sequences were Helicoverpa armigera TIL2 (this study, GenBank accession no. MN836536), H. armigera TIL3 (AHX25885.1), A. florea CI (XP_003696076), Bombus ignitus VSPI (AGY95442.1), B. terrestris CI (AFX62368.1), B. impatiens CI (XP_003484766.1), and A. cerana CI (AGB06350.1)

Phylogenetic tree analysis of HaTIL2

To gain insight into the evolutionary relationships of HaTIL2 with cysteine-rich peptides from other insects, a phylogenetic tree was constructed using maximum parsimony analysis. The tree showed that HaTIL2 is in the same clade as HaTIL3 (H. armigera), DmCEI (D. miranda), DsCtAPI (D. suzukii), TcIMI (T. castaneum), and MdCEI (M. domestica) (Fig. 2).

Fig. 2

Phylogenetic relationship of HaTIL2 with cysteine-rich protease inhibitors of other insects. The sources of the sequences were Helicoverpa armigera TIL2 (this study, GenBank accession no. MN836536), H. armigera TIL3 (AHX25885.1), Tribolium castaneum IMI (XP_008201444.1), Musca domestica CEI (XP_005177283.1), Drosophila miranda CEI (XP_017134624.1), Drosophila suzukii CtAPI (XP_016937015.1), A. florea CI (XP_003696076), Bombus ignitus VSPI (AGY95442.1), B. terrestris CI (AFX62368.1), B. impatiens CI (XP_003484766.1), and A. cerana AcCI (AGB06350.1)

The mRNA expression profile of HaTIL2 in the fat body

To characterize the biological role of HaTIL2 in H. armigera, its mRNA expression pattern was investigated in the fat body at different developmental stages, especially the pupal stage. For this purpose, SYBR Green real-time PCR was performed on cDNA from H. armigera fat body tissue to determine the mRNA expression of HaTIL2. The RPL28 gene was used as an endogenous control. HaTIL2 transcripts were expressed constitutively in the fat body at the larval, prepupal, pupal, and adult stages (Fig. 3). This consistent expression indicates that HaTIL2 is a serine protease inhibitor derived from the H. armigera fat body and suggests that the fat body is a major site of expression of serine protease inhibitor genes. Our results are in accordance with the studies of Chamankhah et al. (2003), Li et al. (2012), and Liu et al. (2015), in which serine protease inhibitor genes were highly expressed in the fat body. HaTIL2 showed high expression at the prepupal stage and on the 5th, 8th, 9th, and 10th days of the pupal stage, whereas expression was low on the other days of the pupal stage and at the adult stage (Fig. 3), suggesting that it might play an important role during the pupal stage.

Fig. 3

Expression profiles of HaTIL2 in Helicoverpa armigera fat body tissue are shown. The expression profile of HaTIL2 in the fat body during H. armigera development was analyzed by RT-qPCR. The ribosomal protein L28 gene was used as the internal control for RT-qPCR

Conclusion

This is the first study to identify a cysteine-rich serine protease inhibitor from the fat body of H. armigera. The results are of particular interest because they open a new avenue of research towards functional studies of protease inhibitors. Further studies should be conducted to determine the detailed functional role of protease inhibitors in H. armigera.