1 Introduction

Plant molecular pharming systems have been thoroughly explored over the last two decades. Molecular pharming has had a positive impact on the production of pharmaceutically important proteins. It has been shown that some pharmaceuticals displaying advantageous clinical properties over their native counterparts can be produced efficiently by molecular pharming technologies. As molecular pharming platforms, plants are excellent biofactories for the production of drugs, antibodies, and vaccines in various host systems such as whole transgenic plants, cell suspension cultures, hairy roots, and hydroponic cultures [1–3]. However, owing to differences in their biochemical pathways, plant species differ in their ability to express and accumulate recombinant proteins. Various plant species have already been investigated for their suitability for producing bioactive molecules. Table 1 summarizes some human antibodies and vaccines produced in different plant species and eukaryotic algae.

Table 1 Non-exhaustive list of recombinant proteins produced in plants, with the pathology and clinical status phases indicated for each recombinant protein

Plants have gained great importance in molecular pharming because, compared with other prokaryotic or eukaryotic systems, they offer (1) reduced contamination risks, (2) reduced costs, (3) scaling-up possibilities [4–6], and (4) the ability to synthesize large and complex proteins while retaining their activities (post-translational modifications) [7]. There is, however, a major drawback to plant molecular pharming systems: the low expression and accumulation levels of some foreign proteins. To overcome this shortcoming, several studies have sought to improve the factors that determine the yield of recombinant proteins in plants. Some of these studies have focused on the choice of plant host, others on bioengineering strategies for the target proteins and for the expression cassettes and their components, such as promoters, terminal sequences, epitope tags, and signal peptides. These aspects must be taken into account when attempting to maximize the expression and yield of target proteins. Intensified efforts have also been made to improve heterologous gene structures and codon optimization. Here, we review these aspects and report recent advances in plant molecular pharming aimed at increasing protein yield and accumulation, based on upstream and downstream processing studies and empirical assays.

2 Choice of the Host: The Appropriate Plant Platform

The choice of suitable plants for molecular pharming technology is an essential factor for the success of the plant molecular pharming approach. The choice of host plant depends on a broad range of criteria including the nature of the protein, ability for transformation and regeneration, post-translational modifications, scale-up of production and maintenance costs, span of production cycles, and the downstream processing requirements. A wide range of plant crops have thus been tested for molecular pharming purposes, including leafy crops, cereal and legume seeds, oil crops, plant cell suspensions, hairy roots, and microalgae.

Leafy crops are advantageous in terms of biomass yield and high soluble protein levels [8]. Additionally, leaves can be harvested without flowering, which significantly reduces the risk of contamination through pollen or seed dispersal [5]. However, the proteins expressed in leaves are often unstable because of proteolytic degradation as the leaves age. In fact, the instability of proteins in leaf cells, and in cells of other plant tissues, may begin during the translation of foreign proteins, which have a natural tendency towards structural heterogeneity in a heterologous environment [9]. Beyond this, one of the major causes of protein instability inside leaf cells is the presence of numerous proteolytic vacuoles in their cytoplasm. Mature leaves possess very large extracytoplasmic vacuolar compartments containing numerous active proteolytic enzymes that are involved in the degradation of native and foreign proteins, notably after harvesting or during downstream processing (extraction/purification from freshly collected leaves). One main life-supporting function of the vacuolar compartment is protein breakdown: vacuoles are major sites of cellular proteolysis and contribute largely to amino acid recycling in the cell in vivo [10]. To avoid this hurdle, the leaves must be processed immediately at the farm or transported as dried or frozen material [11]. Moreover, protein expression in aerial plant parts could affect the growth and development of the host plant. Tobacco is the most suitable leafy crop for many reasons, including high biomass yield, well-established technology for gene transfer and expression, year-round growth and harvesting, and the existence of large-scale infrastructure for processing. Furthermore, tobacco carries little risk of contaminating food or feed chains because it is neither a food nor a feed crop. 
Although many tobacco cultivars contain high levels of toxic alkaloids and phenolic substances [12], these compounds can be removed during the purification process. The use of transgenic tobacco chloroplasts as an alternative bioreactor offers important advantages, including high transgene copy number and high levels of accumulated protein with reduced toxicity for the host plant [13]. Many recombinant proteins have been produced at high levels in tobacco chloroplasts. For example, Oey et al. [14] produced a proteinaceous antibiotic in transformed tobacco chloroplasts at up to 70 % of the total soluble protein (TSP), which is the highest recombinant protein accumulation achieved so far in plants. More recently, a proinsulin has been expressed in tobacco and lettuce chloroplasts, which reduces cost and facilitates oral delivery [15].

Alternative leafy crops such as alfalfa, lettuce, and soybean are also being investigated as suitable hosts for molecular pharming [16]. Alfalfa and soybean produce less leaf biomass than tobacco, but they have the major advantage of fixing atmospheric nitrogen through symbiotic relations, which reduces fertilizer inputs [17]. Alfalfa is particularly useful because it can be harvested up to nine times a year and has a high leaf protein content (up to 30 mg total protein per gram of fresh weight). Medicago Inc. (http://www.medicago.com) has invested in the development of transformation methods for the production of recombinant proteins in alfalfa leaves in order to improve the yield of antibodies with human-like N-glycans [18].

Lettuce is also being investigated as a production host for edible recombinant vaccines and it has been used in a series of clinical trials for vaccine production against hepatitis B [19], pig edema disease [20], pneumonic plague [16], cholera [21], severe acute respiratory syndrome coronavirus [22], and porcine epidemic diarrhea viruses [23]. One disadvantage, however, of the chloroplast transgenic system is that plant plastids do not ensure some of the important post-translational modifications, such as glycosylation. The latter is required for the function of many human proteins; the only organelle that can achieve this process is the Golgi apparatus. However, there are differences between the models of human and plant glycosylation, and these differences can cause an immune response in humans [24]. Therefore, plastid-based expression technology is limited to the production of non-glycosylated products [25].

Seeds of cereals and legumes offer many advantages, in particular the long-term stability of their expressed proteins, because seeds provide an appropriate biochemical and physiological environment, characterized by dormancy and low water content, that promotes stable protein accumulation [26]. Indeed, water loss through desiccation changes the ambient pH, which decreases protease activity inside seed cells and protects the recombinant protein content. Hence, protein accumulation in the seed endosperm allows long-term storage for at least 3 years at room temperature with no detectable loss of protein activity [27]. Cereal seeds also lack the phenolic substances that are present in tobacco leaves, which increases the efficiency of downstream processing [7]. The high seed protein content, ranging from 7 to 10 %, translates into high expression of several seed-targeted recombinant proteins [28–31]. However, the overall yields of recombinant proteins in seed crops are much lower than in tobacco. Additionally, the transgenic plants must pass through a flowering cycle to produce seeds; therefore, contamination by pollen transfer is possible [5, 32].

Maize, rice, barley, and wheat are the most important platforms for seed-based recombinant protein production. Generally, maize seeds are excellent bioreactors because their kernels are large and consist of about 82 % endosperm, which is the principal site of protein accumulation [33, 34]. Maize also has several other advantages, such as the highest biomass yield among food crops and ease of transformation and scale-up [7]. The foremost disadvantage of using maize as a bioreactor is that it is a cross-pollinating plant, and thus there is a substantial risk of contamination of maize crops by their transgenic counterparts [35]. Like maize, rice is easy to transform and to scale up, but it is self-pollinating, which reduces the probability of horizontal gene flow [36]. Recent studies report that rice has the potential to offer an oral delivery system for vaccine antigens and therapeutic proteins [31, 32]. Like rice, barley is a self-pollinating crop and could be considered a better choice than maize [7]. However, barley is less widely grown than maize and harder to transform [37, 38]. Wheat shows the lowest yields per unit biomass, and its transformation efficiency is still being improved [39].

Legume seeds are also an interesting production system because they have exceptionally high protein content (20–40 %), which could produce high yields of recombinant proteins too. Soybean seeds offer the advantage of a low risk of contamination by pollen because soybean is largely self-pollinating, with a low production cost. Furthermore, soybean has been explored for its efficacy in expressing human growth hormone [40] and humanized antibody against herpes simplex virus [41]. Similar to soybean, peas present comparable advantages. However, only a few studies have reported on the use of pea seeds as a platform for commercial production of recombinant proteins. Transgenic pea seeds are used as bioreactors for the production of a single-chain Fv fragment (scFV) antibody [42]. On the other hand, Saalbach et al. [43] have reported a high level of expression of scFV antibody with 2 % of the total seed protein.

Oleosin fusion is emerging as a promising system for recombinant protein production because oil bodies have inherently low associated proteolytic activity and allow simplified protein isolation via oil body separation. The target recombinant protein is produced as a fusion with oleosin, accumulates in seed oil bodies, and can be purified using simple extraction and centrifugation procedures followed by release of the target protein by proteolytic cleavage [26, 44]. However, oleosin expression levels are still not high enough for economical production. Safflower seeds are also considered advantageous for their high protein yield, low acreage, and self-pollination [32].

Plant cell suspension culture has been recognized as a promising alternative biosynthetic platform for valuable proteins, and combines the qualities of whole-plant systems with those of microbial fermentation and mammalian cell culture [45, 46]. Suspension cells in particular increase safety compared with mammalian cell systems which can harbor human pathogens. Additionally, it is well-suited for product isolation when it is secreted into the culture medium. The first plant-produced commercial therapeutic protein for human infusion, glucocerebrosidase, was produced in carrot cells [47], though other recombinant proteins have previously been produced in plants for therapeutic purposes [48, 49].

Insertion into the plant nuclear genome of the T-DNA segment, carried on a voluminous and extrachromosomal Ri-plasmid (Ri for root-inducing) present in the phytopathogen Gram-negative soil bacterium, Agrobacterium rhizogenes [50, 51], leads to the emergence of neoplastic adventitious roots, referred to as “hairy root syndrome” [52, 53]. The tumorous roots, resulting from an A. rhizogenes-mediated transformation, can be grown indefinitely in vitro and have become a viable alternative production platform for heterologous proteins, as well as for other plant metabolites naturally biosynthesized in wild roots [2]. Similar to suspension cells, hairy roots can be axenically cultured in controlled and contained environments for the production of high-value pharmaceutical proteins. In addition, the possible extracellular secretion or rhizosecretion of expressed proteins from hairy root cultures [54] offers a simplified method for the recovery of foreign proteins from a low-cost process. The advantages of rapidly growing hairy roots over suspension cells include long-term genotype and phenotype stability, efficient productivity, and good growth on hormone-free media [50]. The Swiss biotechnology company Rootec (http://www.rootec.com) is currently focusing on the commercialization of plant-derived products with hairy roots grown in mist bioreactors. Hairy roots appear to be an attractive in vitro expression system with several advantages compared with field-grown plants and suspension cultured cells.

Other systems, such as eukaryotic algae and higher aquatic plants (e.g., duckweeds), have been envisaged for heterologous protein expression. The main advantages of these platforms are fast growth, easy harvest, and high protein content (up to 45 % of dry weight) [55]. Moreover, these eukaryotic systems are able to produce glycosylated proteins, unlike microbial cultures, and their production costs are lower than those of mammalian cells [56]. Biolex, Inc. (http://www.biolex.com) is working to develop a duckweed-based expression system to produce recombinant pharmaceuticals and proteins. The leading product of Biolex, a hepatitis C treatment, is a controlled-release interferon (IFN)-α2b that is currently on the market [57]. Monoclonal antibodies (mAbs), i.e., anti-CD20 mAb and anti-CD30 mAb, represent another range of products that are currently in preclinical development [58]. Microalgae also offer potential production systems for recombinant proteins because of their ease of genetic transformation, rapid growth, short life cycles, and cost effectiveness [59]. Currently, most of the work is carried out with one green unicellular alga, Chlamydomonas reinhardtii [11]. As a recombinant protein expression system, C. reinhardtii combines features of traditional bacterial fermentation and mammalian cell-culture methods. These microalgae offer post-translational modification systems that are not present in bacteria and are essential for the biological activity of many therapeutic proteins [60]. Furthermore, C. reinhardtii has a rapid growth rate, with a doubling time of 10 h under optimal growth conditions, and supports easy nuclear and chloroplast genome transformation. However, only the chloroplast is considered a realistic production system for high-level recombinant protein accumulation [61, 62]. 
However, as mentioned earlier, the principal limitation of chloroplast recombinant protein production is that the expressed proteins lack post-translational modifications such as glycosylation [11]. Despite this limitation, functional recombinant proteins, including antibodies and growth factors [63], vaccines [64], and autoantigens [13] have been expressed in the chloroplast of C. reinhardtii. Plant molecular pharming technology has attracted more attention in the corporate world and more international companies are specializing in this field. Table 2 provides the names and web addresses of biotech companies working on molecular pharming to develop relevant plant platforms.

Table 2 Website addresses of some molecular pharming companies

3 Structure and Function of the Expression Cassette in Molecular Pharming Systems

The high costs of the production of drugs in transgenic plants can be minimized by increasing the expression levels of the target molecules. To improve protein expression levels, it is necessary to understand the functional gene structure and all genetic and epigenetic parts in spatio-temporal expression context. The prerequisite for making the system effective in molecular pharming is to improve the expression level based on the expression cassette structures to enhance the transcription and translation of recombinant proteins. To optimize the transcription of recombinant proteins, some approaches have been envisaged with a focus on promoter elements, translation and stability of the transcripts, leader sequences and exogenous or endogenous signal peptides. Tandem repeats of recombinant proteins or peptides and RNA interference (RNAi) technology to suppress the proteinase activity have also been attempted in order to increase the yield of proteins of interest.

3.1 Promoters in Molecular Pharming

The gene promoter is an essential element in genetic engineering for plant molecular pharming. A promoter can drive ubiquitous expression (e.g., the 35S promoter) or spatiotemporally restricted expression, as does the deacetylvindoline 4-O-acetyl transferase gene promoter from the medicinal plant Catharanthus roseus [65]. Because of its strength and efficiency in expressing recombinant genes in living cells, the 35S promoter is widely used to drive the expression of various recombinant genes in the molecular pharming industry [5, 66–68]. The human growth factor placental lactogen (hPL) is one example of a gene transcribed from the 35S promoter in transgenic tobacco cells [69]. The 35S promoter is used in single or multiple copies to increase gene transcription efficiency. It is used, for example, as part of the vector pRJC-hTf to express human serum transferrin (hTf) in transgenic tobacco [70], and in Arabidopsis protoplasts and maize callus to study the localization of the Escherichia coli heat-labile enterotoxin B subunit (LT-B) [34]. A dual 35S promoter is used as part of the pZP200 vector to increase the expression level of the open reading frame (ORF) encoding the surface antigen 1 (SAG1; amino acids 78 to 322) of Toxoplasma gondii in tobacco leaves [71].

Although the 35S promoter is a good choice for molecular pharming systems, it gives only moderate expression levels in some seed crops because it lacks the cis-acting regulatory elements that are specific to the promoters of mature seed genes and to which specific transcription factors bind to activate or repress transcription. Moreover, the 35S promoter is not as efficient in monocots as in dicots. For this reason, monocot seed-adapted promoters have to be used instead. Among seed-specific promoters, the fatty acid elongase (FAD) promoter was employed to express a chimeric protein in canola seeds [72]. The maize 27 kDa γ-zein promoter with the γ-zein signal peptide was also used as part of recombinant gene cassettes to drive the expression of the cholera toxin B subunit (CT-B) in maize seeds [34, 73]. Another example of a seed-specific promoter is the barley hordein promoter (0.45 kb), which is employed to express and produce an active recombinant human hematopoietic growth factor FlT in pb22gl-12 in barley plants [74]. It was also demonstrated that extending the length of the globulin1 promoter from 1.4 to 3 kb, or using three tandem repeats of the 1.4 kb globulin1 promoter, was a successful strategy to increase the expression level of the hepatitis B surface antigen (HBsAg) in maize seeds. The yield of this antigen rose from 0.12 to 0.31 % of the TSP [75]. Inducible promoters, in turn, are a powerful means of restricting gene expression at the spatiotemporal level and expressing recombinant genes when and where they are needed. For example, the rice gene promoter α-amy8, which is strongly induced by sugar depletion, was successfully employed to produce the mouse granulocyte–macrophage colony-stimulating factor (mGM-CSF) in rice suspension culture with a final yield of up to 8 % of total secreted proteins [76].

The microalga C. reinhardtii has also been employed for the production of recombinant proteins based on either nuclear or chloroplast gene expression systems. For nuclear expression, a few promoters have been used, such as those of the photosystem I complex, ribulose bisphosphate carboxylase, and heat shock protein 70A genes. For chloroplast expression, the promoters of other genes are considered good choices, namely rbcL (ribulose bisphosphate carboxylase large subunit), atpA (alpha subunit of adenosine triphosphatase), psbD (photosystem II D2 protein), and psbA (photosystem II D1 protein). These promoters have been found to be more efficient when they are used with their 5′- and 3′-untranslated regions (5′UTR and 3′UTR) [77].

Chloroplast-based systems have the advantage of avoiding nuclear gene silencing and methylation, which increases the level of transgene expression. The tobacco chloroplast is envisioned as a relevant host to produce native and synthetic insulin-like growth factor-1 (IGF-1 and IGF-1s, respectively), employing its specific regulatory elements, including the 5′UTR and 3′UTR regions and the 16S rRNA (ribosomal RNA) promoter, as part of the expression cassette. The chloroplast-derived recombinant protein is fully functional, as shown by its ability to stimulate the growth and proliferation of human HU-3 cells [78]. Another example of using the chloroplast to produce a human recombinant protein is the expression of the IFN-α5 gene in tobacco chloroplasts [79]. In this study, the light-inducible chloroplast rbcL promoter gave a yield of up to 4,063 pg/g fresh weight [79].

Table 3 summarizes some promoters and other elements in the expression cassettes used in molecular pharming systems.

Table 3 Some promoters with their vectors used in plant molecular pharming to produce recombinant proteins in various host plants. Only transient or agroinfiltration transformation events are mentioned; all other events are stable transformation

3.2 Specific Peptides and Sequences to Enhance Molecular Pharming Yield and Productivity

In addition to promoter architecture, other strategies based on specific peptides at the N- and C-termini have been employed to enhance the expression level of recombinant proteins. These peptides are recognized by the host system and can increase the accumulation of recombinant proteins, with enhanced stability and protection against protease degradation, by targeting the recombinant protein through the secretory pathway to specific organelles [34]. The cereal α-amylase signal peptide is widely used in seed-specific vectors, e.g., to produce mouse mGM-CSF in a Gateway vector under the control of the α-amylase gene promoter [76]. Replacing the native signal with the α-amylase signal peptide did not have a significant effect on the expression and accumulation of the protein in transgenic tobacco and potato [80]. The signal peptide from the 2S2 seed storage protein was successfully used to direct human recombinant proteins into the seed secretory pathway in Arabidopsis, petunia, and tobacco [26, 81]. Another signal peptide, that of AP24 osmotin, is used to target the surface-expressed antigen 1 (SAG1) and granule dense 4 (GRA4) to the apoplast of tobacco leaves [71]. The bacterial LT-B from E. coli can drive the localization of recombinant proteins to the secretory pathway in Arabidopsis protoplasts and maize [34]. The sequence KDEL (Lys-Asp-Glu-Leu), an endoplasmic reticulum (ER) retention signal, was used to express mouse mGM-CSF in transgenic tobacco plants [82]. The Kozak consensus sequence plays an important role in eukaryotic gene expression systems [72]. The Kozak sequence and the KDEL retention peptide are widely exploited to enhance the expression, stability, and retention of recombinant proteins [26]. 
Ma and co-workers [80] showed that the expression level of the pleiotropic cytokine interleukin (IL)-4 under the control of the 35S promoter in transgenic tobacco and potato was increased when using constructs containing the ER retention signal SEKDEL (Ser-Glu-Lys-Asp-Glu-Leu) at the C-terminus. The HDEL (His-Asp-Glu-Leu) retention peptide was also shown to be crucial for ER targeting of the IgG–HDEL fusion in transgenic tobacco plants [83]. The histidine tag (his-tag) is widely used for targeting and facilitating the purification of recombinant proteins [80, 84]. The ZZ linker from Staphylococcus aureus is also used as an epitope tag in a chloroplast expression cassette to facilitate the purification of synthetic and native IGF-1 [78].

Other specific signals and peptides are used for their suitability to host plants and their positive roles in transcription and/or translation of the recombinant proteins. Table 4 summarizes some of these specific signals and their usage as parts of the expression cassettes.

Table 4 Examples of signal peptides, peptide tags, vectors, recombinant genes, and plant host systems employed in molecular pharming studies

3.3 Codon Optimization and Chimeric Gene Strategies in Molecular Pharming

Re-engineering recombinant genes can generate active proteins with increased expression levels. Fusing several genes, using linkers to assemble amino acid sequences or ORFs into a chimeric structure, is a successful strategy in molecular pharming. Using codons that match the endogenous codon usage of the host plant can significantly increase the expression level of recombinant proteins. Amani et al. [72], for example, employed linkers to create a chimeric structure of three virulence factors from E. coli O157:H7: EspA, Intimin, and the Tir receptor. The designed vaccine was able to reduce bacterial infection and colonization in mice. Another example was constructed from the heavy- and light-chain sequences of the LO-BM2 antibody (a therapeutic IgG) and expressed in tobacco bright yellow (BY)-2 cells [85]. The two chains were arranged in tandem or in an inverted convergent orientation and expressed as part of an expression cassette under the control of the PMA4 promoter with two 35S enhancer sequences. In this work, the tandem construct had a much higher level of expression than the inverted one in tobacco suspension cells and transgenic tobacco plants [85]. A number of chimeric proteins have been constructed from multiple HIV proteins to be evaluated as HIV vaccines. Targets for HIV vaccines included the different categories of HIV proteins: structural proteins (e.g., gag, env, glycoprotein [gp] 120, gp41, gp160); enzymes (pol); and regulatory proteins (nef, tat, rev, vpr). Candidate plant-based vaccines for humans are reviewed in more detail by Soria-Guerra et al. [6].

Protein accumulation can also be increased by duplicating a small peptide to form a large multimeric protein. For example, ten sequential tandem repeats of the small peptide glucagon-like peptide-1 (GLP-1) were combined into one large synthetic gene that was introduced into tobacco [86]. Western blot analysis revealed that the multimerized GLP-1 protein accumulated up to 0.15 % of the TSP in transgenic tobacco leaves. Additionally, Cerovska et al. [87] constructed a chimeric version of an epitope derived from the human papillomavirus type 16 L2 protein (amino acids 108 to 120), fused to the coat protein of potato virus X (PVX CP). The chimeric construct was then transiently expressed in tobacco, with a resulting yield of the fused protein of 170 mg/kg of fresh leaf tissue.
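The multimerization idea can be illustrated with a minimal Python sketch that joins several in-frame copies of a short coding unit between a start and a stop codon. The function name and the four-codon unit are hypothetical placeholders chosen for illustration; they are not the GLP-1 sequence or the design procedure used in [86].

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_tandem_repeat_orf(unit_cds: str, copies: int) -> str:
    """Return ATG + `copies` in-frame repeats of `unit_cds` + a TAA stop codon."""
    unit = unit_cds.upper()
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if len(unit) == 0 or len(unit) % 3 != 0:
        raise ValueError("coding unit must be a whole number of codons")
    if set(unit) - set("ACGT"):
        raise ValueError("coding unit must contain only A, C, G, T")
    # Reject units that would terminate translation inside the multimer.
    codons = [unit[i:i + 3] for i in range(0, len(unit), 3)]
    if STOP_CODONS & set(codons):
        raise ValueError("coding unit contains an in-frame stop codon")
    return "ATG" + unit * copies + "TAA"


# Placeholder unit encoding Gly-Ser-Ala-Lys; NOT the real GLP-1 coding sequence.
PLACEHOLDER_UNIT = "GGTTCTGCTAAA"
orf = build_tandem_repeat_orf(PLACEHOLDER_UNIT, copies=10)
print(len(orf), "nt")
```

A real design would also need to consider whether the repeats are separated by cleavage sites or spacers, which this sketch deliberately leaves out.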

Codon optimization of the surface antigen SAG1 (amino acids 78 to 322) from Toxoplasma gondii (a protozoan parasite) was investigated in transgenic tobacco leaves [71]. The synthetic SAG1 ORF was assembled from overlapping oligonucleotides, with optimized codons, and combined with an ER signal peptide. The synthetic SAG1 was then introduced into the plant binary vector pZP200 and expressed in tobacco leaves. Tobacco leaves transformed with the unmodified SAG1 gene accumulated five- to tenfold more protein than leaves transformed with the codon-optimized SAG1 gene [71]. This may be due to the ER localization signal, which allowed high accumulation levels of the native SAG1. While this study shows a negative impact of codon optimization on the expression of a synthetic, optimized gene, Daniell et al. [78] found that IGF-1s was expressed, and fully functional, in tobacco chloroplasts at levels similar to those of the native IGF-1 protein. In a comparison of the expression levels of the native IGF-1 and IGF-1s, the authors found that the yield of the optimized IGF-1 was 11.3 % of TSP, while that of the native IGF-1 was 9.5 % of TSP. Moreover, expression of IGF-1s was detected by western blots in E. coli, while no native IGF-1 protein was observed, suggesting that the translation machinery in the chloroplast is more flexible than that of E. coli with regard to codon optimization and usage [71].
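As a rough illustration of what codon optimization does at the sequence level, the following Python sketch recodes a short coding sequence with one preferred codon per amino acid while leaving the encoded peptide unchanged. The preference table is a hypothetical example, not measured tobacco or chloroplast codon usage, and the input sequence is an arbitrary toy ORF; as the SAG1 result above shows, such recoding does not guarantee higher accumulation.

```python
# Subset of the standard genetic code covering the codons used in this toy example.
CODON_TO_AA = {
    "ATG": "M", "AAA": "K", "AAG": "K",
    "CTT": "L", "CTG": "L", "TTA": "L",
    "TCT": "S", "AGC": "S",
    "GGT": "G", "GGC": "G",
    "GAA": "E", "GAG": "E",
    "TAA": "*", "TGA": "*",
}

# HYPOTHETICAL host preference: one codon per amino acid, for illustration only.
# A real design would derive this from codon-usage data of the chosen host.
PREFERRED_CODON = {
    "M": "ATG", "K": "AAG", "L": "CTT", "S": "TCT",
    "G": "GGT", "E": "GAG", "*": "TAA",
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence using CODON_TO_AA."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str) -> str:
    """Replace every codon by the preferred synonymous codon."""
    return "".join(PREFERRED_CODON[aa] for aa in translate(cds))


native = "ATGAAATTAAGCGGCGAATGA"   # toy ORF: M K L S G E stop
optimized = recode(native)
print(optimized)
print(translate(native) == translate(optimized))
```

Choosing a single preferred codon per residue is the simplest possible scheme; it ignores mRNA structure, GC content, and rare-codon pausing, all of which may matter in practice.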

3.4 Linkers in Molecular Pharming for Recombinant Protein Fusions

In plant molecular pharming, linkers are important elements for producing multimeric proteins and introducing multiple and more complex traits. Linkers are short peptide sequences, often composed of flexible residues such as glycine and serine, placed between adjacent domains to ensure that these domains do not sterically interfere with each other. They allow several proteins, ORFs, or peptides to be assembled without disturbing the structure of the fused units or impairing the activity and stability of the assembled parts in the new structure. Various short peptides have been exploited as linkers in genetic engineering. For example, the linker EAAAK was used to design a new synthetic gene encoding the carboxy-terminal fragment of Intimin, the middle region of the Tir receptor, and the carboxy-terminal part of the EspA factor of E. coli [72, 88]. The new synthetic gene was expressed in transgenic canola and tobacco plants with a yield of 0.2–0.33 % TSP in transgenic canola seeds and 0.1–0.3 % TSP in transgenic tobacco. Another example is the simple alanine-glycine linker (AG linker), consisting of six amino acids with three AG repeats (AGAGAG). This linker was used as part of the expression cassette in the pLM03, pLM08, and pLM09 vectors and shifted the yield from undetectable to 0.059 % in extracts from maize kernels [34]. As part of a cleavable polyprotein precursor processed by native kex2p-like protease activity, the kex2p linker (IGKRGK) 3EF (amino acids Leu/Glu and Glu/Phe) was also successfully used to fuse red fluorescent protein (DsRed) or human cytokine (GMCSF) to the green fluorescent protein (GFP) in the viral vector TTOSA1-103SPEKCD43, and the two fusions were expressed in tobacco cells (referred to as RKG and GKG cells). The expression levels of RKG and GKG were 0.5 and 0.2 % of TSP, respectively [89]. 
To test the effect of linker length on the accumulation of recombinant proteins, Plchova et al. [90] compared two linkers of different lengths. The first linker was four amino acids long (GPGP) and the second 15 amino acids long, (SGGGG) × 3. Both were employed to link different mutagenized versions of the human papillomavirus E7 oncoprotein to the N- or C-terminus of PVX CP [90]. The resulting fusions were then expressed in E. coli and tobacco (Nicotiana benthamiana), but no significant difference in accumulation was observed between the two linker lengths. However, the N-terminal fusions resulted in a twofold higher protein yield than the C-terminal fusions (approximately 2.80 mg/mL of culture medium versus 1.35 mg/mL, respectively). More information about linkers and linker choice is provided by Soria-Guerra et al. [6] and Crasto and Feng [91].
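The linker sequences named in this section can be collected in a small Python sketch that assembles fusion proteins at the amino acid level and makes the N-terminal versus C-terminal orientation explicit. The two domain sequences are short hypothetical placeholders, not E7, PVX CP, or any other protein from the cited studies, and the dictionary keys are labels invented for this example.

```python
# Linker sequences as quoted in the text (one-letter amino acid code).
LINKERS = {
    "EAAAK": "EAAAK",
    "AG": "AG" * 3,          # AGAGAG, six residues
    "GPGP": "GPGP",          # four residues
    "SG4x3": "SGGGG" * 3,    # (SGGGG) x 3, fifteen residues
}


def fuse(domains: list[str], linker: str) -> str:
    """Join protein domains, N- to C-terminus, with `linker` between neighbours."""
    if len(domains) < 2:
        raise ValueError("need at least two domains to build a fusion")
    return linker.join(domains)


# Hypothetical placeholder domains; real constructs would use full protein sequences.
EPITOPE = "MKT"
CARRIER = "VLS"

# Epitope fused to the N-terminus of the carrier, short linker.
n_terminal_fusion = fuse([EPITOPE, CARRIER], LINKERS["GPGP"])
# Epitope fused to the C-terminus of the carrier, long linker.
c_terminal_fusion = fuse([CARRIER, EPITOPE], LINKERS["SG4x3"])

print(n_terminal_fusion)
print(c_terminal_fusion)
```

Keeping the linker as an explicit parameter makes it straightforward to generate the matched short-linker and long-linker variants of the same fusion for side-by-side comparison.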

3.5 Gateway Vectors in Plant Molecular Pharming

The Gateway recombination system is a relatively recent and powerful tool for molecular cloning and gene expression studies. Based on specific recombination sequences, the Gateway system allows DNA fragments to be cloned and transferred between different expression vectors in a high-throughput manner while maintaining the orientation of the cloned reading frame [92]. A DNA fragment, or a complete ORF, is inserted between two flanking recombination sequences in a specific plasmid vector to create an “entry clone”. Once an entry clone is created, the cloned DNA can be sub-cloned into any Gateway-designed expression vector, independent of the vector function or the host background [92]. Owing to its advantages as a high-throughput cloning tool, Gateway technology is suitable for creating ‘molecular archives’ containing all the predicted ORFs (the ORFeome) of an organism [93]. Several ORFeome collections have thus been generated using Gateway technology for protein expression and functional analysis, such as the yeast ORFeome [94] and the human ORFeomes [95]. The Gateway ORFeomes also allow straightforward expression of native proteins or of N-terminal and/or C-terminal fusion proteins [96]. The use of Gateway technology in molecular pharming is emerging at a great pace. A recent study supports this trend and reports the suitability of Gateway-based vectors for transforming plant chloroplasts [97]. The authors of this study converted a standard transformation vector into a Gateway destination vector, into which they cloned GFP associated with a ribosome binding site, T7g10. They then transformed tobacco chloroplasts by the biolistic method and found that GFP was successfully integrated into the plastid genome, with GFP protein accumulating up to 3 % of the TSP.
PCR testing and confocal laser scanning microscopy confirmed the presence of GFP in the chloroplast, suggesting that the Gateway system is suitable for plant pharming in this organelle. In another recent study, Buntru et al. [98] reported that multigene constructs carrying seven transgenes, up to 26 kb in size, were successfully transformed into tobacco cells using Gateway technology and were stably inherited for at least two generations. One implication of these findings is the availability of a powerful and efficient tool for multigene transformation that may facilitate genetic engineering for the production of pharmaceutical compounds at an industrial scale.

Kagale et al. [99] have also constructed a series of tobacco mosaic virus (TMV)-based vectors that are compatible with Gateway technology and intended for ORFeome exploitation, allowing N- or C-terminal fusions with a wide range of epitope tags. Similarly, Liu et al. [76] have recently constructed a Gateway-compatible binary T-DNA destination vector to produce mGM-CSF in rice suspension cell culture under the control of the rice α-amy8 promoter with its signal peptide. A binary Gateway vector, pK7WG2, has been successfully used to produce human glutamic acid decarboxylase (GAD65), an autoantigen implicated in type 1 diabetes mellitus, in transgenic tobacco; the protein accumulated up to 2.2 % of TSP while retaining its autoantigenicity [100]. Morandini et al. [101] also investigated the use of Gateway vectors such as pPhasGW for seed-specific transgene expression. The pPhasGW vector was successfully employed to express several human recombinant proteins, such as a chimeric version of GAD67/65, an isoform of the glutamic acid decarboxylase (65 kDa), the murine anti-inflammatory cytokine IL-10, and proinsulin. These proteins were produced in Arabidopsis and tobacco under the control of the seed-specific phaseolin promoter. The chimeric GAD67/65 reached a high yield in Arabidopsis seeds (7.7 % of TSP), while IL-10 and proinsulin reached up to 0.70 and 0.007 % of TSP, respectively [101]. The pPhasGW vector was also a good choice for expressing other constructs in which single-chain variable fragments were fused to the crystallizable fragment of the antibody (scFv-Fc) for two antiviral mAbs (2G12 and HA78). These constructs were expressed in transgenic Arabidopsis seeds and leaves under the control of the seed-specific β-phaseolin promoter from Phaseolus vulgaris; the maximal expression levels ranged from 0.8 to 9.4 mg recombinant protein per gram dry seed weight [81].

More information about Gateway-compatible plants can be found elsewhere [102, 103].

3.6 RNA Interference, Gene Silencing, and Suppression of Protease Activity in Molecular Pharming

The limitations to producing a high yield of recombinant proteins are sometimes related to the plant host system and its metabolism. This obstacle can be overcome by inhibiting some of the major metabolic activities in the plant host. For example, the production and accumulation of the recombinant human granulocyte colony-stimulating factor was significantly increased in transgenic rice suspension culture by an RNAi approach designed to suppress cysteine proteinase gene expression or by the inhibition of proteinase [104, 105]. The inactivation of ATP/ADP transporters in potato tubers by RNAi increased the accumulation of the human recombinant mAb scFv L1G6 in the tubers up to twofold, with a 30 % increase in tuber biomass and 50 % more soluble protein than in wild-type tubers [106]. In rice, co-transformation of cell suspension culture with recombinant human granulocyte-macrophage colony-stimulating factor (rhGM-CSF) and a serine proteinase inhibitor II gene reduced protease activity and doubled the yield of rhGM-CSF compared with cells expressing rhGM-CSF only [105]. Suppressors of gene silencing have also been applied in a few systems to enhance gene expression and recombinant protein accumulation. For example, derivatives of the pEAQ-HT vector, which carry a suppressor of gene silencing, were used to transiently express the human papillomavirus under the control of the 35S promoter in agro-infiltrated N. benthamiana leaves [107]. The anthrax receptor fusion protein co-expressed with viral gene silencing suppressors in agro-infiltrated tobacco leaves showed a tenfold increase in the recombinant protein compared with expression in the absence of suppression, particularly with the p1 suppressor [108].
Rice seeds are an efficient model for producing pharmaceuticals for human health, particularly when the storage proteins (13 kDa prolamin and glutelin) have been silenced by RNAi; with this approach, the yield of human growth hormone polypeptides reached 470 μg/g dry weight [109].

Protease activities, in turn, can be exploited to increase the level of recombinant proteins. Endogenous protease activity was used to express multiple secretory proteins from a single transgene, as fusions between red fluorescent protein (DsRed), or human cytokine (GMCSF), and downstream GFP, by means of cleavable polyprotein precursors containing kex2p linkers such as LEAGG(IGKRGK)3EF [89]. The targeted use of protease inhibitors has also shown potential to increase recombinant protein yield. The transient expression of a tomato cathepsin D inhibitor, which inhibits A1 and S1 proteinases, improved the accumulation and protection of the murine diagnostic antibody C5-1. It altered the A1/S1 protease activities in the targeted cell compartment (leaf apoplast) in N. benthamiana and increased the protein content by 40 % [110].

4 Glycosylation Patterns in Plants Producing Pharmaceutical Proteins

Glycosylation, the covalent linkage of an oligosaccharide side chain to a protein, is the most important post-translational modification of natural and biopharmaceutical proteins [111]. Shared by all eukaryotic organisms, this fundamental mechanism significantly modulates the folding, activity, yield, and immunogenicity of therapeutic glycoproteins [112]. Nevertheless, there are interkingdom differences in glycosylation patterns, particularly in the N-glycoform design. N-glycosylation starts in the ER and continues in the Golgi apparatus, with the addition of α(1,3)-fucose and β(1,2)-xylose residues to N-glycans in plants and with the addition of α(1,6)-fucose moieties, glucose, and sialic acid residues to N-glycans of glycoproteins in mammals [17]. The plant-specific residues are likely associated with allergenic responses, which limits the use of therapeutic glycosylated proteins by parenteral administration. To overcome such hurdles, plant-made glycoproteins must be humanized. In recent years, promising glycoengineering strategies have made it possible to humanize plant N-glycosylation [113], either by inactivating the host plant’s endogenous glycosyltransferases (α1,3-fucosyltransferases and β1,2-xylosyltransferase) or by de novo expression of heterologous glycosyltransferases similar to those of mammals (β(1,4)-galactosyltransferase and sialyltransferase) [114]. In addition, proteins resident in the ER lumen possess high-mannose (high-Man)-type N-glycans with structures common to plants and mammals [114]. Adding the H/KDEL sequence to the C-terminal end of a secreted protein therefore retains it in the ER and limits its glycosylation to high-mannose-type N-glycans [111]. Another approach to non-immunogenic glycosylation, based on RNA interference, modulates the levels of xylosyl and fucosyl glycans and prevents the addition of undesirable sugar residues to biotherapeutic proteins [17].

5 Conclusion and Perspectives

Many genetic, upstream, and downstream aspects of molecular pharming technology must be taken into account when we attempt to maximize the expression and translation of the recombinant proteins of interest. Various plant species, organs, tissues, cells, and subcellular organelles have been extensively studied to optimize the yield of pharmaceutical recombinant proteins, with varying degrees of success. However, genetic transformation and regeneration systems are still not feasible or are unavailable for many other plant species. This narrows the range of plant hosts available for exploring new approaches and tools to enhance the potential of drug design, discovery, and production.

The design of the expression cassette in molecular pharming has profound effects on gene expression and protein accumulation. It is thus of high interest to elucidate the mechanisms underlying the transcription and translation of recombinant proteins. In this review, we attempted to bring into focus the involvement of some transcription and translation constituents. Although only a limited number of studies report applications of Gateway vectors in molecular pharming, these vectors make cloning and expression faster than classical cloning techniques. Epigenetic RNAi or gene silencing suppressors can also have significant impacts on whole-plant metabolism and on the expression and accumulation of heterologous proteins. The obstacle posed by plant glycosylation features can be overcome by humanization of plant-made glycoproteins.

The plant-made pharmaceuticals industry will flourish in the near future when it is able to provide pharmaceutical products of high quality and in high quantity at low cost and with basic infrastructure. It is currently in dire need of a standard regulatory system to address environmental and healthcare concerns (biosafety and regulatory issues). Recent progress in molecular biology, genomic studies, and molecular cloning techniques should yield more effective expression and purification systems for the plant molecular pharming platform.