Introduction

Seed storage proteins (SSPs) are a set of proteins that accumulate to high levels in seeds during later stages of seed development. During seed germination, these proteins are degraded, and the resulting amino acids are utilized by developing seedlings as a nutritional source. SSPs, the major proteins of grains, are the plant proteins most abundantly consumed by humans. The accumulation of SSPs is therefore a closely regulated biosynthetic process of great agronomic and economic importance. SSPs of most dicotyledonous species are synthesized exclusively in developing embryos, and their biosynthesis represents an excellent system for the investigation of tissue-specific and developmentally controlled gene expression [1].

SSPs are classified into four groups: globulins, albumins, prolamins, and glutelins [2]. The globulins, which predominate in legumes, are further divided into two subgroups: 11S legumin-type and 7–8S vicilin-type [3]. Legumin is a storage protein (11S globulin) originally found in the family Leguminosae [3–7] but later found in a diverse array of higher plants, including monocots [6–10].

Legumin is a histidine- and glutamine-rich protein and is synthesized as a single polypeptide of 50–55 kDa. Prolegumin is generated when the α- and β-subunit regions are linked through disulfide bonds between cysteine residues as the nascent preprolegumin is inserted into the endoplasmic reticulum lumen, followed by cleavage of the signal sequence. The prolegumins are assembled into trimers, transported in vesicles to the storage vacuole, and cleaved into α- and β-subunits to form a hexamer [8]. Molecular masses of α- and β-subunits are generally 30–40 and 18–20 kDa, respectively [3–5, 7, 8]. The amino acid sequences of β-chains are more homogeneous than those of the α-chains, which vary considerably in length because of different numbers of repeats in the C-terminal region [3, 7, 8].

The 11S globulins are collectively called legumins, but in many plants, trivial names are derived from the genus of the plant; for example, the 11S globulins of soybean (Glycine max), Douglas fir, sunflower, and rapeseed are referred to as glycinin, pseudotsugin, helianthinin, and cruciferin, respectively. In addition to the 11S globulins (legumin), there is another closely related storage protein, the 7S/8S globulin, commonly called vicilin, which forms a trimer linked by disulfide linkages [8, 11, 12].

The study of gene expression in plants is directly linked to the goal of manipulating its control. Cloned cDNAs corresponding to mRNAs encoding 7S and 11S storage globulins have allowed analysis of the primary amino acid sequences of storage proteins, the number of genes encoding them, the expression of these genes during seed development, and the isolation and structural determination of the genes themselves. Globulin SSPs are encoded by gene families varying from a few to as many as 20 or more members. Group 1 glycinins of soybean are encoded by three or four genes [13], while in pea, it was estimated that there are eight genes for legumin, 11 for vicilin, and one for convicilin [14]. The oat 12S globulin gene family appears to contain six to eight members [15]; however, no report was found in the literature on the isolation of the legumin gene from pigeonpea or its heterologous expression.

Recent advances in the transgenic transfer of storage protein genes have opened up possibilities for examining the regulatory sequences that control the developmental expression of SSPs. In most cases, storage protein genes have been expressed only in seed tissues and not in others. Further research is needed to delineate the cis-acting sequences regulating the expression of these genes, which would help in the hyperexpression of desired storage protein genes [16, 17]. Attention has now shifted toward the incorporation of useful genes for reducing yield losses and for quality improvement.

The ultimate aim of the present study is to develop crop varieties that combine high yield with high quality, including a balanced essential amino acid composition. Seed proteins are therefore one of the targets for improving nutritional quality, and attempts are being made through manipulation of the native legumin genes. Studies of the structure of SSPs such as legumin, and of their interactions, have been limited by the difficulty of isolating subunits in large amounts from the complex mixture of the seed endosperm; in addition, mature SSPs prepared from seeds are difficult to crystallize because of the heterogeneity of molecular species [18]. One way to overcome this problem is to express the legumin gene in heterologous systems. These systems have the additional advantage that specific gene modifications can be made and the new gene constructs can be expressed quickly. Our previous research has shown that the Escherichia coli system is very efficient in expressing large amounts of individual, biologically active SSPs derived from a variety of eukaryotic organisms [19]. Previously, we isolated and characterized the vicilin gene of pigeonpea [20]; in the present study, the legumin protein was extracted from pigeonpea seeds at different developmental stages for characterization, and the legumin gene was isolated from a cDNA library constructed from immature pigeonpea seeds 18 days after flowering (DAF). The legumin gene was further characterized by DNA blotting, and its expression in E. coli was detected by immunoblotting using a polyclonal antibody.

Materials and Methods

Plant Material

Mature seeds of pigeonpea (Cajanus cajan L.) var. UPAS-120 were obtained from the Department of Plant Breeding, G.B. Pant University of Agriculture and Technology, Pantnagar, India, and the variety H 82-1 was obtained from the Indian Institute of Pulse Research, Kanpur, India. The moisture content was determined on 1.0 g of seeds of variety UPAS-120 collected at 5, 10, 15, 20, and 25 DAF, using the method of Iwabuchi and Yamauchi [21].

Seed Storage Protein Fractionation and Analysis

Total SSPs of mature seeds were isolated using the method of Wright and Boulter [4] and estimated by the biuret method [22]. The extracted SSPs were precipitated with ammonium sulfate and separated into two fractions, soluble albumins and insoluble globulins, by dissolving in sodium citrate buffer (pH 4.7). Both fractions were lyophilized and quantified, and the globulins were further fractionated on a Sephadex G-50 column using the zonal isoelectric precipitation method of Shutov and Vaĭntraub [23]. The procedure of Laemmli [24] was used to check the protein-banding pattern, total number of polypeptides, and molecular weight of each fraction. Slab gel electrophoresis was performed on the albumins and globulins (vicilins and legumins) of pigeonpea var. UPAS-120 and H 82-1. The albumins and globulins of both varieties were resolved by 10% polyacrylamide gel electrophoresis (PAGE) and sodium dodecyl sulfate (SDS)-PAGE [25], and the gels were analyzed with a Molecular Imager System (Bio-Rad, USA). The molecular weights were calculated, and the banding pattern was analyzed, using Quantity One 1-D analysis software (Bio-Rad).
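
The internal algorithm of the Quantity One software is not described here. As a rough illustration of how band molecular weights are commonly estimated from SDS-PAGE, the sketch below fits log10(molecular weight) against relative mobility (Rf) for a marker ladder and interpolates an unknown band. The marker values and function names are hypothetical and are not taken from this study.

```python
import math


def fit_log_mw_calibration(markers):
    """Least-squares fit of log10(MW) = slope * Rf + intercept.

    markers: list of (relative_mobility, molecular_weight_kDa) pairs.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(rf, slope, intercept):
    """Molecular weight (kDa) predicted for a band with relative mobility rf."""
    return 10 ** (slope * rf + intercept)


# Hypothetical marker ladder (Rf, kDa); illustrative values only.
MARKERS = [(0.10, 97.4), (0.25, 66.0), (0.45, 43.0), (0.65, 29.0), (0.85, 14.3)]

slope, intercept = fit_log_mw_calibration(MARKERS)
print(round(estimate_mw(0.50, slope, intercept), 1))
```

A log-linear calibration of this kind is only reliable within the range spanned by the markers, so bands far outside the ladder should be interpreted with caution.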

Preparation of Polyclonal Antibodies and Immunological Study

Polyclonal antibodies against the legumin 11S polypeptide were raised in a hyperimmunized New Zealand white rabbit using the method of Lee et al. [26]. The cross-reactivity of the polyclonal antibodies was checked by dot-blot analysis and indirect enzyme-linked immunosorbent assay (ELISA). Dot blotting is a rapid, semiquantitative technique for protein detection, while indirect ELISA is both quantitative and qualitative.

cDNA Library and Cloning of Legumin Gene

Total RNA was isolated from immature pigeonpea seeds at 18 DAF using the guanidinium thiocyanate method [27], electrophoresed in a denaturing 1.5% agarose gel [25, 28] prepared in 3-(N-morpholino)propanesulfonic acid buffer (0.1 M 3-(N-morpholino)propanesulfonic acid, 40 mM sodium acetate, and 5 mM ethylenediamine tetraacetic acid, pH 7.0), and quantified spectrophotometrically (optical density at 260 nm, OD260). mRNA was purified using an oligo dT column (New England Biolabs, UK) and used to construct a representative cDNA library of SSP genes with the cDNA synthesis system (Promega, USA). The resulting cDNA was cloned into the λgt-11 vector.
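
The OD260 quantification mentioned above is conventionally based on the rule that one absorbance unit at 260 nm (1 cm path length) corresponds to about 40 μg/ml of single-stranded RNA. The sketch below applies that standard conversion; the absorbance reading and dilution factor in the example are hypothetical, as the actual readings are not reported.

```python
# Standard conversion for single-stranded RNA at 260 nm, 1 cm path length.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate RNA concentration (ug/ml) in the undiluted sample from A260."""
    if a260 < 0 or dilution_factor <= 0 or path_length_cm <= 0:
        raise ValueError("absorbance must be >= 0; dilution and path length must be > 0")
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / path_length_cm


# Hypothetical reading: A260 of 0.25 measured on a 1:100 dilution.
print(rna_concentration_ug_per_ml(0.25, dilution_factor=100))
```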

The cloned cDNA was packaged into bacteriophage heads using the Ready-to-go packaging extract (Pharmacia Biotech, UK). The extract had been prepared from an E. coli strain Y1090 RM+ lysogen in which the prophage carried a cos mutation. The cos mutation is a deletion in the cos site that prevents the endogenous prophage from being packaged, while exogenous recombinant DNA is packaged efficiently. After packaging of the cDNA cloned in the λgt-11 vector, recombinant and nonrecombinant clones were distinguished by blue/white plaque selection with isopropyl-β-d-thiogalactopyranoside (IPTG)/X-gal.

Screening of cDNA Library for Legumin Gene

The nonradioactive digoxigenin (DIG)-labeled legumin cDNA probe (PRC 924), a gift from R. Casey, was used to screen the library for the legumin gene. In situ plaque hybridization was carried out using an N+ nylon membrane and the DIG DNA detection kit (Boehringer Mannheim, Germany), following the manufacturer’s protocol.

Subcloning and Sequencing of Legumin Gene

Lambda phage (recombinant λgt-11 clones) was isolated from all positive legumin plaques [25] and subjected to polymerase chain reaction (PCR; initial denaturation at 94 °C for 5 min, followed by 40 cycles of denaturation at 94 °C for 2 min, annealing at 55 °C for 2 min, and extension at 72 °C for 2 min) using specific λgt-11 forward (5′-GGT GGC GAC GAC TCC TGG AGC CCG-3′) and reverse (5′-TTG ACA CCA GAC CAA CTG GTA ATG-3′) primers to determine the insert size. PCR amplicons were purified (Qiagen, USA) and digested with EcoRI (the insert is flanked by EcoRI sites introduced with adapters during cDNA synthesis and cloning). The digested insert was gel eluted, purified (Qiagen), and ligated to dephosphorylated pUC18 vector arms. The ligation mix was transformed into E. coli strain DH5α, and positive colonies were preliminarily screened by blue/white selection with IPTG/X-gal, followed by secondary screening by PCR (both colony and plasmid) and restriction digestion; thereafter, putative legumin clones were sequenced on an automated DNA sequencer. Clone sequences were analyzed by BLAST and other online bioinformatics tools and submitted to the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov) gene data bank.
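
As a simple cross-check of the PCR set-up described above, the following sketch computes the length and GC fraction of the two λgt-11 primers and the total programmed cycling time. The helper names are illustrative; ramp times and any hold steps, which are not reported, are ignored.

```python
# Primer sequences as given in the text, with spaces removed.
LAMBDA_GT11_FORWARD = "GGTGGCGACGACTCCTGGAGCCCG"
LAMBDA_GT11_REVERSE = "TTGACACCAGACCAACTGGTAATG"


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cycling_minutes(initial_min, cycles, step_minutes):
    """Total programmed time: initial step plus cycles * sum of per-cycle steps."""
    return initial_min + cycles * sum(step_minutes)


for name, primer in [("forward", LAMBDA_GT11_FORWARD), ("reverse", LAMBDA_GT11_REVERSE)]:
    print(name, len(primer), "nt", f"GC = {100 * gc_fraction(primer):.1f}%")

# 5 min initial denaturation, then 40 cycles of 2 + 2 + 2 min.
print(cycling_minutes(5, 40, [2, 2, 2]), "min")
```

The two primers differ noticeably in GC content, which is one reason a single annealing temperature is usually chosen empirically rather than from sequence alone.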

Characterization of Legumin Gene (DNA Blotting)

Plasmid DNA from positive legumin clones was isolated and digested with the EcoRI restriction endonuclease enzyme to excise the insert. The intact plasmid and digested insert were electrophoresed on 1.2% agarose gel, and the DNA was transferred to a N+ Nylon membrane. After transferring, it was hybridized with nonradioactive DIG-labeled legumin probes using the DIG DNA detection kit (Boehringer Mannheim, Germany) following the manufacturer’s protocol.

Legumin Protein Analysis

Legumin protein analysis was performed computationally. Amino acid sequences were deduced from nucleotide sequences and imported to the ExPASy server (http://www.expasy.org/tools/) for primary analysis and to the P-val and PSIPRED server (http://bioinf.cs.ucl.ac.uk/psipred/psiform.html) for secondary structure prediction and analysis. Legumin amino acid sequences were also subjected to NCBI BLAST for comparison with the Protein Data Bank (PDB) database. The PDB database was explored for a comparative study of the legumin protein domain and its residue interactions with other homologous domains using the online Cn3D 4.1 software (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml).

Small-scale Expression of Legumin Gene in E. coli

The coding sequence (CDS) of the legumin gene was amplified from the pUC18-cloned legumin cDNA using the primer pair F: 5′-CTC TCA TAT GAG AGT ACA GGC ACA GC-3′ and R: 5′-CTC TCT CGA GTC AGC ATA GTT TTG TG-3′. The NdeI (CATATG) and XhoI (CTCGAG) restriction sites were incorporated in the forward and reverse primers, respectively, for directional, in-frame cloning. The PCR reactions were heated at 94 °C for 2 min to denature the DNA, cycled 35 times at 94 °C for 1 min, 62 °C for 1 min, and 72 °C for 1.5 min, and finally incubated at 72 °C for 7 min. PCR products were purified, digested with NdeI and XhoI, ligated into the expression vector pET-24a (Novagen, USA), and transformed into E. coli BL21 (DE3) host cells. A single bacterial colony containing the recombinant legumin gene in pET-24a was inoculated into 50 ml of 2YT medium containing 50 μg ml−1 kanamycin and 100 mM IPTG and incubated at 37 °C with gentle shaking until the OD600 was approximately 0.6. The cells from 50 ml of the culture were collected by centrifugation at 10,000 × g for 1 min and resuspended in lysis buffer (0.1 M MgCl2, 4% [w/v] SDS, 10% [v/v] glycerol, 5% [v/v] 2-mercaptoethanol, 0.01% [w/v] bromophenol blue, 100 mM Tris–HCl, pH 6.8) [19, 29], and the proteins were analyzed by SDS-PAGE and protein gel blotting using the polyclonal antibody. Legumin and purified globulin proteins isolated from pigeonpea were used as controls.
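
The primer design described above can be checked with a short script. The sketch below locates the NdeI and XhoI sites in the two primers, reads the forward primer in frame from the ATG contained in the NdeI site, and reverse-complements the reverse primer to show the sense-strand sequence next to the XhoI site. Treating the NdeI ATG as the start codon is an assumption based on the usual pET cloning strategy, and whether the TGA triplet upstream of the XhoI site is an in-frame stop codon cannot be confirmed without the full CDS.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Primers as given in the text, with spaces removed.
FORWARD = "CTCTCATATGAGAGTACAGGCACAGC"
REVERSE = "CTCTCTCGAGTCAGCATAGTTTTGTG"
NDEI, XHOI = "CATATG", "CTCGAG"


def translate(dna):
    """Translate complete codons from the first base until the first stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def reverse_complement(dna):
    """Reverse complement of an unambiguous DNA sequence."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


ndei_pos = FORWARD.find(NDEI)
xhoi_pos = REVERSE.find(XHOI)
start = ndei_pos + 3  # the ATG inside CATATG (assumed start codon)

print("NdeI site at index", ndei_pos, "| XhoI site at index", xhoi_pos)
print("N-terminal residues encoded by forward primer:", translate(FORWARD[start:]))
print("Sense strand at 3' end:", reverse_complement(REVERSE))
```

Both sites sit behind a four-nucleotide 5′ extension, which is a common way to give the restriction enzymes enough flanking sequence to cut efficiently near the end of a PCR product.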

Results and Discussion

SSPs are found only in seeds, and their accumulation in legume seeds shows sigmoidal kinetics, closely following dry weight (D.W.) gain, and is associated with the cell expansion phase of seed development [30]. Detailed analysis of seed protein accumulation has shown that each storage protein usually accumulates at a different rate during seed development [31, 32]. Studies have shown that moisture content decreases continuously in developing seeds, which is necessary for the inactivation of metabolic enzymes; hence, dry and mature seeds contain very little water [33]. In the present study, moisture content was determined on 1.0 g of seeds of variety UPAS-120 at 5, 10, 15, 20, and 25 DAF and was found to be 90.9%, 80.7%, 78.7%, 73.02%, and 57.51%, respectively, while the moisture content of mature seeds was 9.1%.

Seed Storage Proteins Fractionation and Analysis

The nutritional quality of pigeonpea in terms of its chemical constituents, proteins, amino acids, and digestibility has received increasing attention. Protein quality is of prime importance in pigeonpea products used for human food [34]. The total seed protein content of the two pigeonpea varieties, UPAS-120 and H 82-1, was determined by the biuret method [22] and found to be 29.27% and 23.15% on a dry-weight basis (d.b.), respectively (Table 1). Hulse [35] found that the protein content of pigeonpea seed samples ranged between 18.5% and 26.3%, while Singh and Jambunathan [36] found that the protein content of 43 commonly cultivated varieties of pigeonpea ranged from 21.1% to 28.1% for dhal samples. Mehta et al. [37] also reported protein contents of legume seeds in the range of 20–40% on d.b. The amounts of storage proteins extracted with phosphate buffer were 25.78% in UPAS-120 and 21.78% in H 82-1 (D.W.), which account for 88.08% and 94.08% of total proteins, respectively.
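
The extraction efficiencies quoted above follow directly from the ratio of extracted storage protein to total seed protein. A minimal sketch of that arithmetic, using the values reported in the text (the function and variable names are illustrative):

```python
def percent_of_total(extracted_pct, total_pct):
    """Extracted protein expressed as a percentage of total seed protein."""
    return 100.0 * extracted_pct / total_pct


# (extracted storage protein, total seed protein), both in % of dry weight.
SAMPLES = {"UPAS-120": (25.78, 29.27), "H 82-1": (21.78, 23.15)}

for variety, (extracted, total) in SAMPLES.items():
    print(variety, round(percent_of_total(extracted, total), 2))
```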

Table 1 Amount of SSPs extracted from pigeonpea (Cajanus cajan L.) seeds variety UPAS-120 and H 82–1.

The total extracted proteins were subjected to 0–100% ammonium sulfate precipitation. Only 88–94% of the total protein was extracted, of which 79–91% (of extracted protein) was precipitated by ammonium sulfate. The major fractions, globulins and albumins, were separated with sodium citrate buffer (pH 4.7) and found to be 62–63% and 37% of the precipitated proteins, respectively. The ratio of globulins to albumins in mature seeds was approximately 2:1 (Table 1). Similar results were obtained in a quantitative characterization of the starch and protein contents and globulin composition of seeds from 59 pea (Pisum sativum L.) lines and related taxa of wide geographical origin [38].

Globulins and albumins of developing seeds, collected at 5, 10, 15, 20, and 25 DAF, were also determined. In the early phases of seed development, cell division occurs and intermediates are built up [39]. The rate of protein synthesis remains low, with very little protein being synthesized initially. During the later stages and maturation of seeds, food reserves in the form of lipids, carbohydrates, and proteins are laid down in the cotyledons. Ultrastructural studies of Phaseolus vulgaris [40], Vicia faba [41], and P. sativum [42] showed a pattern of multivesiculation in the cotyledon cells. The albumin fraction increased continuously from the 5th to the 25th day (DAF), whereas globulin synthesis started later, from the 10th day, and reached a maximum at the 20th day, after which it decreased gradually. The same pattern was also observed in several other legume crops [39, 43–45].

Globulin fractions from mature seeds of pigeonpea varieties UPAS-120 and H 82-1 were further fractionated on a Sephadex G-50 column and eluted with citrate buffer (pH 4.7). The elution profile produced two peaks in both varieties: the first contained soluble and the second insoluble globulins at pH 4.7, i.e., vicilin (7S) and legumin (11S), respectively. Wright and Boulter [4] also found a third storage protein, named convicilin [46], during zonal isoelectric precipitation of pea globulins. They proposed that convicilin was associated with the legumin and vicilin protein fractions at ionic strengths of 0.15 and 0.25 M (or higher), respectively. Croy et al. [46] separated convicilin from vicilin and legumin by zonal isoelectric precipitation with a linear salt gradient at pH 4.7 and found a third overlapping peak between peaks I and II. The convicilin was eluted from the interphase of these peaks by running on a Sephadex G-100 column. Fractions containing convicilin were pooled, applied directly to a column of hydroxyapatite, and eluted with a linear concentration gradient of potassium phosphate buffer (pH 8.0). Wright and Boulter [4] also reported that zonal isoelectric precipitation is an extremely effective method for obtaining crude legumin and vicilin (about 95% pure) in a single step. In the present study, the interphase containing the convicilin fraction was discarded, and the remaining two fractions, vicilin and legumin, were collected through ammonium sulfate precipitation. The ratio of legumin to vicilin in the two varieties of pigeonpea was 39:59 (UPAS-120) and 39.5:59.5 (H 82-1).

For further characterization, all fractions were pooled, lyophilized, and subjected to 10% PAGE and SDS-PAGE. A similar globulin-banding pattern was observed in both varieties, but many differences were observed in the albumins, which indicates that globulins are more conserved than albumins. The two globulin fractions, legumin and vicilin, showed much homology between the varieties; however, legumin and vicilin differed from each other in solubility and in banding pattern on SDS-PAGE. These results suggest that the two proteins originated from different groups of genes [47], even though an evolutionary study [48] indicated that globulin proteins, mainly legumin, originated from a single cluster of genes conserved from archaebacteria to plants, with considerable homology among them. The globulin pattern remained quantitatively almost the same throughout development.

PAGE analysis of globulins of varieties UPAS-120 and H 82-1 showed differences in some low- and high-molecular-weight bands: bands of 93.12, 90.76, and 84.51 kDa were present in UPAS-120 but absent in H 82-1. Similarly, low-molecular-weight bands of 44.23 and 26.72 kDa were present in UPAS-120 only, while 105.84-, 42.64-, and 35.32-kDa bands were present in H 82-1 only. All other bands were the same in both varieties. On SDS-PAGE, the globulin of UPAS-120 had distinct bands of 56.08, 30.95, and 15.75 kDa, and the globulin of H 82-1 had distinct bands of 97.98, 96.43, and 14.65 kDa, while the remaining bands were the same.

Globulins showed different banding patterns when electrophoresed in the presence and absence of β-mercaptoethanol. In the presence of β-mercaptoethanol, bands of 93.12, 90.47, 83.54, 57.01, 44.23, and 32.77 kDa of UPAS-120 globulins disappeared, and some new bands became sharp (42.68, 38.07, 35.01, 33.45, 31.95, 29.29, 22.82, 21.84, 20.72, 17.47, 16.59, 15.75, 8.58, 4.32, and 1.52 kDa) in a 1D scan of the gel; similarly, in H 82-1, the 105.84- and 54.11-kDa bands disappeared, and some new bands appeared (38.62, 35.28, 23.76, 23.05, 22.10, 21.69, 21.01, 14.65, 9.98, 5.08, and 1.88 kDa); other bands were unchanged. A similar result was observed for pea globulin composition when the protein was electrophoresed with and without β-mercaptoethanol [38].

Mehta et al. [37] revealed extensive α- and β-polypeptide heterogeneity in dissociated legumins. These proteins are mainly composed of two types of subunit: two acidic α-subunits (40 kDa) and six different basic β-subunits (20 kDa). The 20- and 40-kDa subunits are cross-linked by disulfide bonds [49]. Gupta et al. [50] found 19 polypeptides ranging from 13.8 to 92.26 kDa in lentil (Lens culinaris M), while Sital and Narang [51] observed three subunits, with molecular weights of 16, 50, and 56 kDa, for mung bean (Vigna radiata L.). In the present study, the globulin fraction of peak II (legumin, 11S) showed nine bands with the same banding pattern in both varieties. These studies indicate that globulin genes are highly conserved and belong to a multigene family [48, 52, 53].

Immunological Study

Globulin fractions were again fractionated by zonal isoelectric precipitation into legumin (11S) and vicilin (7S) proteins. Polyclonal antibodies were raised against purified protein legumin 11S, and the titer was found to be approximately 1:10,000. Cross-reactivity was also checked for polyclonal antibodies by dot-blot analysis, and it was found that legumin antisera showed no cross-reactivity with vicilin and albumins. Legumin protein contents were determined by the indirect ELISA method, and it was found that the seed harvested at 5, 10, 15, 20, and 25 DAF and mature seeds contained 48.16, 54.23, 58.13, 62.98, 62.16, and 60.58 μg/mg legumin protein, respectively.

The ratio of vicilin to legumin in developing pigeonpea seeds from the 5th to the 25th day (DAF) was also calculated by the same indirect ELISA; it was much higher in the early maturation phase and decreased at later stages. Notably, vicilin synthesis started first and reached a maximum within 15 to 20 days, whereas legumin reached a maximum at 20 DAF. Similar results were also observed in several other legume seeds [51, 54]. These studies shed light on the nature of legumin and vicilin genes: the two gene types originated separately during evolution and show subunit heterogeneity, but the developmental study of the seeds showed that both are expressed at 15 to 20 DAF.

cDNA Library and Cloning of Legumin Gene

mRNA synthesis for storage proteins must be initiated very early during development [1, 55–57], at a slow rate [58], and the vicilin mRNA level decreases much earlier than that of legumin [59]. Saha and Koundal [60] used 18-DAF seeds of chickpea for the construction of a cDNA library. In the present study, legumin proteins were observed to be synthesized at the maximum rate at 18–20 DAF, and a previous study [20] indicated that isolation of the legumin (11S) gene from cDNA clones requires a cDNA library from immature pigeonpea seeds at 20 DAF. Total RNA/mRNA was isolated at 20 DAF and used for cDNA library preparation [1]. The library was preliminarily screened by blue/white color selection. The total number of plaques in the library was 2,268, of which 399 were nonrecombinant (blue), while the remaining 1,869 (82.40%) were recombinant (white plaques).
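
The recombinant fraction reported above can be reproduced from the plaque counts. The short sketch below does this arithmetic; the variable and function names are illustrative. Rounded to two decimals the value is 82.41%, which agrees with the reported 82.40% if that figure was truncated rather than rounded.

```python
TOTAL_PLAQUES = 2268
BLUE_PLAQUES = 399  # nonrecombinant


def recombinant_stats(total, nonrecombinant):
    """Return (recombinant count, recombinant percentage of all plaques)."""
    recombinant = total - nonrecombinant
    return recombinant, 100.0 * recombinant / total


white, pct = recombinant_stats(TOTAL_PLAQUES, BLUE_PLAQUES)
print(white, f"{pct:.2f}%")
```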

Domoney and Casey [14] obtained 120 recombinant clones when they used the largest cDNA fraction, fractionated on a sucrose gradient. Ishibashi and Minamikawa [61] and Fujino et al. [62] both obtained about 20,000 phages from 2 μg of poly A+ RNA, isolated from cotyledons of mature seeds, using λgt-10 vectors, while Saha and Koundal [60] obtained 1.5 × 10^6 PFU/μg of RNA, and their library contained nearly 40% recombinant plaques as found by blue/white color selection using IPTG/X-gal. After size-exclusion column purification through Sephacryl S-300 or other gradient methods, small cDNA clones (~300 bp) were found to be present in large numbers, but they did not contain a complete sequence representing a polypeptide or protein. Therefore, in the present study, all cDNAs smaller than ~300 bp were discarded, and only larger cDNAs were retained; this reduces the number of clones in the library but increases the chance of obtaining large cDNAs that may contain the complete coding sequence of a polypeptide. A total of 1.35 × 10^4 clones were obtained from 5 μg of mRNA, which suggests that the cDNA library contains all major SSP genes.

The cDNA clones were amplified in E. coli LE 392 host cells, and legumin clones were screened with the nonradioactive DIG-labeled legumin cDNA probe (PRC 924, a gift from R. Casey). All putative positive clones were reamplified and plated for secondary screening with the same probe. Finally, after the third screening, five positive legumin clones were obtained, and all were subjected to PCR amplification using specific λgt-11 forward and reverse primers, followed by restriction digestion with EcoRI to determine the insert size [63]. The putative legumin insert of 1.5 kb was ligated to the dephosphorylated pUC18 vector and transformed into E. coli DH5α. Positive clones were preliminarily screened by blue/white selection with IPTG/X-gal, followed by secondary screening by PCR (both colony and plasmid) and restriction digestion, and thereafter sequenced on an automated DNA sequencer. Clone sequences were analyzed by BLAST and other online bioinformatics tools and submitted to the NCBI gene data bank (accession number AF355403).

Pairwise homology of the nucleotide sequence of the isolated legumin gene with other known legumin genes was determined by the BLASTN 2.2.16 program of NCBI [64]. Close homology was observed with the P. sativum leg A class precursor gene. The overall homology of the isolated legumin gene was 61.76% with cDNA library genes and 62.35% with genomic library genes. The size of the legumin gene was 1.482 kb, which suggested that the clone may contain a complete expressed gene sequence (legumin mRNA). Saha and Koundal [60] obtained cDNAs ranging between 0.7 and 1.8 kb, while Domoney and Casey [14] observed legumin, vicilin, and convicilin seed storage genes of sizes ranging between 0.6 and 2.3 kb. Hager et al. [53] isolated cDNA clones of 1.4 to 1.8 kb, and Gijs [65] isolated a legumin-like storage protein gene of ~1.0 kb from flax (Linum usitatissimum L.). Ealing and Casey [66] reported two 3-kb cDNA clones of pea (P. sativum) seed lipoxygenase; however, Marraccini et al. [67] isolated a 1-kb cDNA clone of the complete 11S SSP gene of Coffea arabica. The NCBI BLAST homology search, comparative homology, and in silico studies also indicated that the isolated legumin cDNA contained a 5′ untranslated region, a signal peptide, an open reading frame (ORF; CDS), and a 3′ poly-A tail. It is therefore concluded that the 1,482-bp cDNA clone of the SSP gene legumin (leg) is full length and contains the complete expressed gene sequence with the CDS.

Characterization of Legumin Gene (DNA blotting)

Plasmid DNA from positive legumin clones was isolated, and the insert was excised by EcoRI restriction digestion. The intact plasmid and the insert were electrophoresed on a 1.2% agarose gel, and the DNA was transferred to an N+ nylon membrane. After transfer, it was hybridized with nonradioactive DIG-labeled legumin probes. Both intact and digested DNA gave positive signals, which confirmed that the legumin cDNA was cloned in the pUC18 vector (Fig. 1).

Fig. 1 DNA blot analysis of the legumin gene cloned in the pUC18 vector, using a nonradioactive DIG-labeled cDNA probe of legumin (PRC 924, a gift from R. Casey). Lane 1: EcoRI-digested leg insert of size 1.48 kb; lane 2: undigested vector pUC18 (size 4.2 kb) harboring the leg gene

Legumin Protein Analysis

The legumin gene of 1,482 bp contained an untranslated 5′ leader sequence (86 bp) and an ORF of 1,032 bp, coding for 343 amino acids. In silico translation and primary structure analysis of the legumin gene were done using online bioinformatics tools (the ExPASy server-based ProtParam [68]). The theoretical pI, aliphatic index, and grand average of hydropathicity of the legumin protein (comprising 343 amino acids, with Mr 39,495.9 Da) were found to be 9.70, 84.20, and −0.333, respectively. The extinction coefficient of the deduced legumin protein at 280 nm, measured in water, was found to be 51,045 M−1 cm−1 (assuming all Cys residues appear as half-cystines) and 50,420 M−1 cm−1 (assuming no Cys residues appear as half-cystines). The instability index was computed to be 58.92, which classifies the protein as unstable. The half-life of the protein was estimated to be 30 h (mammalian reticulocytes, in vitro), greater than 20 h (yeast, in vivo), and greater than 10 h (E. coli, in vivo). The secondary structure of the legumin protein (Fig. 2) was predicted using the PSIPRED protein structure prediction server [69, 70], and the significant PDB alignment of the legumin protein (Fig. 3) by BLASTP-2.2.16 [64, 71] was observed with the crystal structure of proglycinin chain A (A1aB1b homotrimer) of soybean (PDB no. 1FXZ [18]).
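
Two of the figures above can be cross-checked with simple arithmetic. The sketch below verifies that 343 codons plus a stop codon give an ORF of 1,032 bp, and applies the Trp/Tyr/cystine formula used by ProtParam for the 280 nm extinction coefficient. The residue counts (7 Trp, 8 Tyr, 5 cystines) are an assumption back-calculated from the two reported coefficients for illustration; they were not read from the deposited sequence.

```python
# Molar absorptivities at 280 nm in water (M^-1 cm^-1) used by ProtParam.
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125


def extinction_coefficient(n_trp, n_tyr, n_cystine=0):
    """Extinction coefficient at 280 nm from Trp, Tyr, and cystine counts."""
    return n_trp * EXT_TRP + n_tyr * EXT_TYR + n_cystine * EXT_CYSTINE


def orf_length_bp(n_residues, include_stop=True):
    """ORF length in base pairs for a protein of n_residues amino acids."""
    return 3 * (n_residues + (1 if include_stop else 0))


# Values reported in the text.
CDNA_BP, LEADER_BP, N_RESIDUES = 1482, 86, 343

orf_bp = orf_length_bp(N_RESIDUES)
downstream_bp = CDNA_BP - LEADER_BP - orf_bp  # 3' UTR plus poly-A tail

# ASSUMPTION: counts back-calculated from the reported coefficients.
N_TRP, N_TYR, N_CYSTINE = 7, 8, 5

print(orf_bp, downstream_bp)
print(extinction_coefficient(N_TRP, N_TYR, N_CYSTINE),
      extinction_coefficient(N_TRP, N_TYR))
```

Because several residue combinations could in principle give similar totals, the counts should be confirmed against the actual sequence before being relied upon.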

Fig. 2 Predicted secondary structure of the legumin protein using the PSIPRED server [69]

Fig. 3 BLAST tree showing the alignment of the legumin protein with PDB databases, generated by the fast minimum evolution method of BLASTP-2.2.16 [64, 71]

Legumin protein sequences (accession no. AF355403) were imported into Cn3D software and compared with the 3D structure of the soybean (Glycine max) SSP proglycinin (PDB no. 1FXZ, chain A; MMDB Id 17263). Proglycinin is a homotrimer of 1FXZ_A, 1FXZ_B, and 1FXZ_C, each containing two domains, d1 and d2. The legumin protein showed homology with domain d1 of 1FXZ_A. The alignment of the legumin protein (Fig. 4) with the 3D structure of proglycinin supported the identification of the correct ORF of the nucleotide sequence encoding the mature active protein.

Fig. 4 Structural alignment of the legumin protein of pigeonpea with the proglycinin homotrimer (PDB no. 1FXZ, MMDB Id 17263). a Cn3D structure of proglycinin (homotrimer, showing all chains and domains, [23]); b superimposition of the legumin protomer on proglycinin, showing identical domains (blue in color); c structural alignment of the legumin protein with proglycinin chain 1FXZ_A, domain 1; d legumin protein domains and e legumin protein residues, aligned with proglycinin

Small-scale Expression of Legumin Gene in E. coli

In an earlier study, the ORF encoding a functional common buckwheat legumin-like protein of 195 amino acids was expressed in the host strain E. coli BL21 (DE3) under IPTG induction, and the recombinant proteins were analyzed by protein gel blotting [72]. The production of recombinant 11S globulins in vitro and in vivo may facilitate the elucidation of various steps in their biogenesis, assembly, and deposition. The production of recombinant globulins in E. coli overcomes the difficulty of isolating globulin fractions (legumin, vicilin, etc.) in large amounts from the complex mixture of the seed endosperm for the study of protein structure and interactions.

Host cells harboring the recombinant legumin gene were grown in the presence of IPTG, and total bacterial protein was isolated. Previously purified globulin, legumin protein (from pigeonpea seed), and total bacterial protein were separated by SDS-PAGE, and the proteins were transferred onto a nitrocellulose membrane. The membrane was first incubated with antilegumin antiserum (the primary antibody; 1:500 dilution) for the antigen–antibody reaction and then with a secondary antibody conjugated with alkaline phosphatase. The membrane was developed with the alkaline phosphatase substrate (NBT/BCIP). Blue bands appeared where the primary legumin antibody had reacted. Globulin and legumin proteins produced several bands with sizes of approximately 62.22, 42.31, 37.28, 34.43, 32.56, 24.72, 23.19, 20.87, and 18.22 kDa; however, the signals were stronger in the globulin fraction. The cloned legumin gene expressed in E. coli host cells also produced several bands, with sizes of approximately 60.48, 40.83, 35.00, 34.19, 32.31, 22.81, 18.00, and 10.71 kDa (in clusters). Two bands (35.00 and 34.19 kDa) were observed in the total bacterial protein of recombinant cells (Fig. 5) but not in that of nonrecombinant host cells; these were the products of the legumin cDNA insert in the plasmid. The other bands may be host cell proteins with homology to legumin, since the legumin protein is reported to have evolved from bacteria to plants [48].
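The two band patterns reported above can be compared side by side. The sketch below pairs each recombinant band with the nearest seed-protein band and accepts the pair only within a relative size tolerance. The 5% tolerance and the function names are hypothetical choices made for illustration; similar apparent size on a blot does not by itself establish that two bands are the same polypeptide.

```python
# Apparent band sizes (kDa) as reported in the text.
SEED_BANDS_KDA = [62.22, 42.31, 37.28, 34.43, 32.56, 24.72, 23.19, 20.87, 18.22]
RECOMBINANT_BANDS_KDA = [60.48, 40.83, 35.00, 34.19, 32.31, 22.81, 18.00, 10.71]


def nearest_band(size, reference):
    """Return the reference band closest in size to `size`."""
    return min(reference, key=lambda ref: abs(ref - size))


def match_bands(query, reference, rel_tol=0.05):
    """Map each query band to its nearest reference band.

    A pair is kept only if the size difference is within `rel_tol`
    (an illustrative, hypothetical tolerance) of the reference size;
    otherwise the query band maps to None.
    """
    matches = {}
    for size in query:
        ref = nearest_band(size, reference)
        matches[size] = ref if abs(ref - size) <= rel_tol * ref else None
    return matches


matches = match_bands(RECOMBINANT_BANDS_KDA, SEED_BANDS_KDA)
for size, ref in matches.items():
    label = f"{ref:.2f} kDa" if ref is not None else "no seed band within tolerance"
    print(f"{size:6.2f} kDa -> {label}")
```

With this tolerance, the 35.00 and 34.19 kDa recombinant bands both pair with the 34.43 kDa seed band, and only the 10.71 kDa band has no counterpart of similar size among the seed bands.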

Fig. 5

Immunoblotting of the legumin protein. Lane 1—total globulin protein, lane 2—legumin protein, fractionated from total SSP of pigeonpea, and lane 3—total protein isolated from log phase of recombinant E. coli, harboring the pET-24a expression vector with the leg gene

The present study may be the first report of the heterologous expression of a legumin gene from a pulse (pigeonpea) in E. coli. There are few reports of the heterologous expression of legumin genes, viz., the expression of cDNA encoding proglycinin subunits of soybean in E. coli [73] and the expression of a legumin-like protein, the FA02 β-subunit, of common buckwheat (Fagopyrum esculentum) [62]; buckwheat is often counted among the cereals, for which heterologous expression of seed storage genes in E. coli has been studied extensively. The ability to produce legumin in heterologous systems, as well as to assemble it into oligomeric forms, will allow the study of these proteins and the analysis of their structure-function relationships.

Conclusion

The low sulfur content of legume seeds is their main nutritional limitation; hence, the isolation of a gene coding for a high-sulfur legumin is an important step toward crop improvement. The leg-3 gene, isolated from chickpea, contained coding sequences for five methionine and six cysteine residues [74]. The Leg A protein isolated from P. sativum also contained six methionine and cysteine residues [75] (NCBI accession no. AJ132614). In other legumin proteins, methionine and cysteine residues are present, but their number may vary from 2 to 10. The legumin protein has inter- and intrachain disulfide bonding between its acidic and basic polypeptides, which accounts for the higher number of cysteine residues. The isolated legumin gene encodes a good number of sulfur-containing amino acids (10 methionine and 11 cysteine), and it is efficiently expressed in a heterologous system (E. coli). This gene can thus be transferred in multiple copies for overexpression in pigeonpea as well as in other legume crops, so that the sulfur amino acid content may be increased and the nutritional quality of legumes improved. The expression of SSP genes in E. coli also suggests that they may be expressed in higher plants without modification and increases the certainty of expression in transgenics. This knowledge may open new avenues for the expression of other important SSPs and the improvement of legume quality by genetic engineering technology.