Structural basis for the unique specificity of medaka enteropeptidase light chain

Dear Editor, 
 
Enteropeptidase (enterokinase, EC 3.4.21.9) is a serine protease that cleaves its substrates specifically on the C-terminal side of the recognition sequence (Asp)4-Lys (Zheng et al., 2009). Because of this unique specificity, enteropeptidase can be used as a tool for cleaving recombinant fusion proteins. In particular, the recombinant enteropeptidase light chain (EPL), which contains the catalytic domain, is of great interest for applications in the biopharmaceutical industry (Lu et al., 1997). 
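As a minimal illustration of the cleavage rule described above, the Python sketch below scans a protein sequence for the (Asp)4-Lys motif and reports where enteropeptidase would cut, on the C-terminal side of Lys. It also labels the flanking residues with Schechter-Berger subsite names (P4 to P1 before the scissile bond, P1' onward after it). The function names and the example fusion sequence are hypothetical. Real enteropeptidases, notably the bovine enzyme, can also cleave at non-canonical sites that this exact-match scan ignores.

```python
import re

# Canonical enteropeptidase recognition motif (Asp)4-Lys. A zero-width
# lookahead lets finditer report every start position, including overlaps.
D4K_MOTIF = re.compile(r"(?=DDDDK)")


def find_d4k_cleavage_sites(sequence: str) -> list[int]:
    """Return 0-based indices of the residue just after each D4K motif.

    Cleavage is assumed to occur on the C-terminal side of Lys (P1), so each
    returned index is the P1' residue, the first residue of the released
    product. Only exact D4K matches are reported; secondary sites are ignored.
    """
    seq = sequence.upper()
    return [m.start() + 5 for m in D4K_MOTIF.finditer(seq)]


def label_subsites(sequence: str, cut: int, n: int = 4) -> dict[str, str]:
    """Label residues flanking a scissile bond with Schechter-Berger names.

    `cut` is the index of the P1' residue (as returned above); P1..Pn lie
    N-terminal to the bond and P1'..Pn' lie C-terminal to it.
    """
    labels: dict[str, str] = {}
    for i in range(1, n + 1):
        if cut - i >= 0:
            labels[f"P{i}"] = sequence[cut - i]
        if cut + i - 1 < len(sequence):
            labels[f"P{i}'"] = sequence[cut + i - 1]
    return labels


if __name__ == "__main__":
    # Hypothetical tag-linker-target fusion, for illustration only.
    fusion = "MHHHHHHGSDDDDKMASTEQ"
    for site in find_d4k_cleavage_sites(fusion):
        print(site, label_subsites(fusion, site))
```

Using a lookahead rather than a plain match keeps the scan correct for runs of five or more Asp residues, where the motif begins partway into the run.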
 
Enteropeptidase has been cloned from several species, including bovine (Kitamoto et al., 1994), porcine (Matsushima et al., 1994), human (Kitamoto et al., 1995), mouse (Yuan et al., 1998) and rat (Yahagi et al., 1996). Because of its high availability, recombinant bovine enteropeptidase light chain (BEPL) is currently the most commonly used. However, BEPL is known not to be highly stringent in its specificity for the canonical target sequence D4K (Liew et al., 2007). For instance, Shahravan et al. showed that BEPL also cleaved at unexpected DR and SR sites in the AhR6-C/EBP protein (Shahravan et al., 2008). Recently, medaka enteropeptidase has been reported to be a more effective tool because its specificity for the D4K sequence is much stricter than that of its mammalian counterparts (Ogiwara and Takahashi, 2007). We therefore solved the crystal structure of the medaka enteropeptidase light chain (MEPL) to gain insight into the determinants of its stricter specificity. 
 
MEPL shares high sequence similarity with the EPLs of other species whose structures have been reported previously (Fig. S1). The crystal structure of MEPL was therefore determined by molecular replacement using the bovine enteropeptidase light chain (PDB entry 1EKB, 53.7% amino acid identity) as the search model and refined to 2.0 Å resolution. MEPL adopts a typical α/β trypsin-like serine protease fold (Fig. 1A). It consists of two six-stranded β-barrels (β1–β6 and β7–β12), each making up about half of the molecule. Both β-barrels are arranged in a Greek-key pattern, with α-helices located in the middle of each barrel and a third α-helix at the C-terminus. The surface electrostatic potential of MEPL shows an even distribution of charged residues over the protein surface, although the region near the active center is predominantly negative (Fig. 1B). 
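Identity figures such as the 53.7% quoted above depend on how alignment gaps are handled. A minimal sketch, assuming identity is counted only over alignment columns in which neither sequence has a gap; the function name and the short aligned fragments are hypothetical, and the alignment program and convention behind the reported values are not stated in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Convention assumed here: identity is counted only over columns in which
    neither sequence has a gap. Dividing by the full alignment length or by
    the shorter sequence length instead gives different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real EPL sequences.
    print(percent_identity("IVGG-SDAK", "IVGGTAEAK"))
```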
 
 
 
Figure 1 
 
Structure of MEPL. (A) Overall structure of MEPL. The structure of MEPL is shown as ribbon diagrams in two orientations: front view, looking into the catalytic center; side view, beside the catalytic center. The MEPL molecules are colored blue-white ... 
 
 
 
Superposition of MEPL on the bovine enteropeptidase light chain resulted in an rmsd of 0.79 Å for the Cα atoms (Fig. 1C). The differences are located mainly in the loop regions, such as the so-called '131-loop' connecting strands β7 and β8. Relative to the bovine enzyme, MEPL also contains an additional small 3₁₀-helix between strands β4 and β5. Furthermore, the catalytic triad of the unliganded medaka enzyme superimposes well on that of the bovine enzyme in complex with a trypsinogen-activation peptide analogue (Lu et al., 1999). Because the active centers of the two enzymes show no obvious differences in secondary structure, the stricter specificity of MEPL may depend on its distinct amino acid sequence and unique intramolecular interactions. 
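An rmsd such as the 0.79 Å above is computed over matched Cα pairs after optimal rigid-body superposition. The text does not state which program performed the fit, so the sketch below simply illustrates the standard Kabsch algorithm with NumPy. It assumes the residue correspondence has already been fixed (for example from a sequence alignment). Some programs also iteratively discard poorly fitting pairs before reporting the rmsd, which yields lower values than a single fit over all pairs.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimally
    superimposing P onto Q with the Kabsch algorithm.

    Row i of P and row i of Q must be equivalent atoms (e.g. aligned C-alpha
    atoms); establishing that correspondence is outside this sketch.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering each set on its centroid.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the centered P onto Q and take the root-mean-square deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

A rotated and translated copy of the same coordinates gives an rmsd of essentially zero, which is a quick sanity check before applying the function to real Cα sets.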
 
Comparison of the catalytic centers of MEPL, BEPL and HEPL (human enteropeptidase light chain, PDB entry 4DGJ, 55% amino acid identity) provides insight into how these enzymes achieve their different substrate specificities (Fig. 2A and 2B). 
 
 
 
Figure 2 
 
Comparison of the catalytic centers of MEPL, BEPL and HEPL, and specific activity assays of MEPL variants. (A and B) Detailed views of two regions considered to play a crucial role in the specificity mechanism. Key residues are shown as colored sticks (blue-white ... 
 
 
 
In the structure of BEPL in complex with a trypsinogen-activation peptide (VD4K), the side chain of the P1 Lys inserts deeply into a specificity pocket, at the bottom of which Asp181 neutralizes the terminal amino group (amino acid residues of peptidyl substrates are customarily numbered P1, P2, P3, etc., counting from the scissile bond toward the N-terminus (Schechter and Berger, 1967)). The architecture of this cleft means that only lysine or arginine can be accommodated at the P1 position (Lu et al., 1999). The pocket is formed by three elements: strand β11, the so-called "174-loop" connecting strands β9 and β10, and the "208-loop" connecting strands β11 and β12. The main differences are located at the N-terminus of the "208-loop", whereas the sequences of strand β11 and the "174-loop" are highly conserved. Compared with BEPL and HEPL, MEPL shows much lower activity toward peptide substrates with arginine at P1, indicating that it restricts the entry of arginine residues. Based on the structural details, we propose that the residues H24, E136, V209, G210 and R213 of MEPL may play a crucial role in altering the tension of the loops surrounding the pocket of the catalytic center.
 
In the MEPL structure, E136 forms three hydrogen bonds with R213, whereas no equivalent interactions exist in BEPL, which has Y136 and L213 at these positions; the same is true of HEPL (Fig. 2A). Although these interactions lie some distance (about 8 Å) from the catalytic center, the hydrogen bonds may help anchor the "208-loop" and thereby restrict the size of the substrate side chain that the pocket can accept. In addition, V209 and G210 of MEPL, which have short side chains, are covered and held in place by the side chains of four residues (E136, E163, R213 and R216), reducing the mobility of the "208-loop" and strand β11 in MEPL. In contrast, the residues at the equivalent positions in BEPL and HEPL are exposed on the protein surface and are therefore more flexible (Figs. 2A and S2). Moreover, the imidazole group of H24 in MEPL can form a hydrogen bond with the main-chain oxygen atom of G185, which may increase the rigidity of the "174-loop" (Fig. 2B). No such interaction exists in BEPL or HEPL, which have Q24 and L24 at this position, respectively. Overall, these unique residue interactions in MEPL may reduce the flexibility of the specificity pocket and restrict entry of the bulkier arginine side chain.
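Hydrogen-bond arguments like those above usually start from interatomic distances. The sketch below assumes a simple distance-only criterion (donor-acceptor distance of at most 3.5 Å, a commonly used cutoff) and omits angle checks. The atom names and coordinates are hypothetical placeholders, not values from the MEPL model, and the text does not state which criterion or program was used.

```python
import math

# Hypothetical atom names and coordinates (angstroms), for illustration only;
# they are not taken from the MEPL model.
DONORS = [("R213 NH1", (0.0, 0.0, 0.0)), ("R213 NH2", (1.0, 2.2, 0.0))]
ACCEPTORS = [("E136 OE1", (2.9, 0.0, 0.0)), ("E136 OE2", (6.0, 0.0, 0.0))]


def candidate_hbonds(donors, acceptors, cutoff: float = 3.5):
    """Return (donor, acceptor, distance) for every donor/acceptor pair
    within `cutoff` angstroms.

    Distance-only screen: a fuller analysis would also check D-H...A angles.
    """
    hits = []
    for d_name, d_xyz in donors:
        for a_name, a_xyz in acceptors:
            dist = math.dist(d_xyz, a_xyz)
            if dist <= cutoff:
                hits.append((d_name, a_name, round(dist, 2)))
    return hits


if __name__ == "__main__":
    for hit in candidate_hbonds(DONORS, ACCEPTORS):
        print(hit)
```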
 
To study the effects of these residues on MEPL-substrate interactions, three residues in MEPL were individually replaced by their BEPL counterparts (H24Q, E136Y and R213L), and the double mutant E136Y/R213L was also constructed. The specific activities of the four variants were then assayed using the specific substrate GD4K-βNA and the non-specific substrates Boc-E(OBzl)-AR-MCA and Z-FR-MCA. Three mutants (H24Q, R213L and E136Y/R213L) showed significantly increased activity toward the MCA-containing substrates while retaining GD4K-βNA hydrolyzing activity (Fig. 2C and 2D). This result indicates the importance of the two interactions (H24-G185 and E136-R213) for the specificity of enteropeptidase. Nevertheless, these mutations did not raise the activity of MEPL toward non-specific substrates to the level of BEPL, indicating that other factors also affect the substrate selectivity of MEPL. The low activity of the E136Y mutant toward both GD4K-βNA and the MCA-containing substrates suggests that the residue at position 136 may affect the overall activity of enteropeptidase, which is also supported by the kinetic studies (Table S3).
 
Taken together, our data on the cleavage of peptide substrates, supported by the crystal structure, present a remarkable example of the fine-tuning of enzyme specificity. Notably, these findings could be applied directly to the reengineering of other enteropeptidases, such as BEPL and HEPL, to improve their specificity.


Supplementary data
Figure S1. Amino acid sequence alignments of EPLs from different sources. Amino acid residues are numbered based on the sequence of MEPL (top number). Secondary structure elements of MEPL are labeled above the sequence. The active-site residues (H41, D92 and S187) are indicated by blue stars. The residues expected to be the determinants of the stricter specificity of MEPL are indicated by green triangles.

Rfree is the R-factor for a selected subset (5%) of reflections that was not included in prior refinement calculations. Numbers in brackets are the values for the outer resolution shell.

Table S2. PCR primers used for mutant MEPL production by site-directed mutagenesis.

Mutations | Primers

Construction of recombinant Escherichia coli for expression of MEPL
The MEPL-encoding DNA fragment was synthesized and cloned into the expression vector pET-32 (Novagen) downstream of the gene for the fusion partner thioredoxin (Trx), following the sequence encoding D4K. The expression plasmid pET-32-MEPL was transformed into E. coli BL21(DE3).

Protein expression and purification
The medaka enteropeptidase light-chain sequence was taken from the UniProt database (ID A4UWM5; residues 795-1036). The nucleotide sequence was synthesized and cloned into the expression vector pET-32 (Novagen) at the KpnI and BamHI restriction sites. The final construct encoded an N-terminal Trx tag, a 5-amino-acid linker containing an enteropeptidase cleavage site, and MEPL.
The protein was overexpressed in E. coli BL21(DE3) as inclusion bodies.
Expression was induced by adding 0.5 mM IPTG when the OD600 reached 0.6. Cells were harvested by centrifugation, resuspended in 20 mM Tris (pH 8.0) and lysed by sonication. The lysate was then centrifuged to collect the insoluble fraction.
The inclusion bodies were then solubilized in a solution containing 8.5 M urea and 20 mM β-mercaptoethanol for 3 h and ultracentrifuged at 10,000 × g for 1 h. Refolding was performed by rapid dilution into 20 mM NH3·H2O and 10% glycerol at pH 10.5. After 72 h at 4 °C, the refolding solution was dialyzed against 20 mM Tris (pH 8.0) at room temperature to facilitate autocatalytic activation by cleavage of the Trx tag at the D4K sequence.
MEPL was further purified by affinity chromatography on STI-agarose (Sigma). The column was equilibrated with 20 mM Tris (pH 8.0) and 50 mM NaCl, and MEPL was eluted with 50 mM HCOONa (pH 3.0). The final sample was dialyzed against 10 mM Tris-HCl, 150 mM NaCl, pH 8.0, and concentrated to 20 mg mL⁻¹. Protein concentration was determined spectrophotometrically from the absorbance at 280 nm, using a molar extinction coefficient of 59,970 M⁻¹ cm⁻¹ calculated from the amino acid sequence.
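The concentration measurement above follows the Beer-Lambert law, c = A280 / (ε·l). A minimal sketch using the stated extinction coefficient of 59,970 M⁻¹ cm⁻¹ and an assumed 1 cm path length. Converting to mg mL⁻¹ requires the molecular mass, which the text does not give, so the 26,620 Da used below is a hypothetical estimate (242 residues × ~110 Da).

```python
EPSILON_280 = 59_970.0  # M^-1 cm^-1, value stated for MEPL in the text
MW_DA = 26_620.0        # hypothetical: 242 residues x ~110 Da, not from the text


def molar_from_a280(a280: float, epsilon: float = EPSILON_280, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, c = A / (epsilon * l), returned in mol/L."""
    return a280 / (epsilon * path_cm)


def mg_per_ml(molar: float, mw_da: float = MW_DA) -> float:
    """Convert mol/L to mg/mL (mol/L x g/mol = g/L, numerically mg/mL)."""
    return molar * mw_da


if __name__ == "__main__":
    c = molar_from_a280(1.0)
    print(f"{c * 1e6:.2f} uM, {mg_per_ml(c):.3f} mg/mL")
```

Under these assumptions an A280 of 1.0 corresponds to about 16.7 μM, or roughly 0.44 mg mL⁻¹, so a 20 mg mL⁻¹ stock would need substantial dilution to be read within the linear range of a spectrophotometer.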

Crystallization
The concentrated protein sample (20 mg mL⁻¹ in 10 mM Tris-HCl, 150 mM NaCl, pH 8.0) was screened for crystallization using commercially available screening kits.
After extensive optimization, crystals were grown at 291 K by the hanging-drop vapor-diffusion method. Drops were prepared by mixing 1 μl of protein solution with 1 μl of reservoir buffer containing 20% polyethylene glycol (PEG) 3,350, 0.1 M cadmium chloride and 0.1 M sodium acetate (pH 4.6), and equilibrated against 500 μl of reservoir solution. The best-diffracting crystals reached final dimensions of 100 × 100 × 100 µm³ within one week.
For data collection, crystals were soaked in cryoprotectant (20% PEG 3,350, 0.1 M cadmium chloride, 0.1 M sodium acetate pH 4.6 and 5% glycerol) and flash-cooled in liquid nitrogen. The crystals were then placed in a dry nitrogen stream at 100 K for X-ray data collection.

Data collection, structure determination, and refinement
X-ray diffraction data were collected at beamline BL17A (Photon Factory, Japan) to a resolution of 2.0 Å. Data were processed, integrated and scaled using the HKL2000 program package (Otwinowski and Minor, 1997). The crystals belonged to space group P2₁2₁2₁, with unit-cell dimensions a = 48.5 Å, b = 71.65 Å and c = 134.5 Å.
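The unit-cell dimensions above allow a quick check of how many molecules fit in the asymmetric unit via the Matthews coefficient, V_M = V_cell / (Z · n · M), with solvent fraction approximately 1 - 1.23/V_M. A minimal sketch, assuming Z = 4 asymmetric units for P2₁2₁2₁ and a hypothetical molecular mass of 26,620 Da for MEPL (the text does not state the mass).

```python
def matthews(a: float, b: float, c: float, z_asu: int, n_mol: int, mw_da: float):
    """Matthews coefficient V_M (A^3/Da) and solvent fraction.

    a, b, c: cell edges in angstroms; the volume a*b*c is valid only for
    orthogonal cells such as P2(1)2(1)2(1).
    z_asu: asymmetric units per cell (4 for P2(1)2(1)2(1)).
    n_mol: molecules per asymmetric unit.
    """
    volume = a * b * c
    vm = volume / (z_asu * n_mol * mw_da)
    solvent = 1.0 - 1.23 / vm  # Matthews (1968) approximation
    return vm, solvent


if __name__ == "__main__":
    MW_DA = 26_620.0  # hypothetical MEPL mass (242 residues x ~110 Da)
    for n in (1, 2):
        vm, solv = matthews(48.5, 71.65, 134.5, z_asu=4, n_mol=n, mw_da=MW_DA)
        print(f"n={n}: V_M={vm:.2f} A^3/Da, solvent={solv:.0%}")
```

With these assumptions, two molecules per asymmetric unit give V_M of about 2.19 Å³/Da and roughly 44% solvent, whereas one molecule gives about 4.39 Å³/Da; the former is closer to values typically observed for protein crystals.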
The structure of MEPL was solved by molecular replacement with PHASER (McCoy et al., 2005), using the crystal structure of BEPL (PDB entry 1EKB) (Lu et al., 1999) as the search model. Clear solutions in both the rotation and translation functions indicated two molecules per asymmetric unit, consistent with the Matthews coefficient and solvent content (Matthews, 1968). Residues that differed from the search model were manually rebuilt in Coot under the guidance of Fo-Fc and 2Fo-Fc electron density maps (Emsley and Cowtan, 2004). After refinement in PHENIX (Adams et al., 2002), the working R-factor and Rfree dropped from 0.42 and 0.48 to 0.19 and 0.24, respectively, for all data from 50.0 to 2.0 Å. Refinement was monitored by calculating Rfree from a subset containing 5% of the total reflections. Model geometry was verified using PROCHECK (Laskowski et al., 1993). Data collection and refinement statistics are detailed in Table S1. All structure figures were prepared using PyMOL (DeLano, 2002).
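The working R-factor and Rfree reported above are both computed as R = Σ||Fo| - |Fc|| / Σ|Fo|; they differ only in which reflections are summed, Rfree using a test set (here 5%) withheld from refinement. A minimal NumPy sketch, assuming Fc is already scaled to Fo. The random free-set selection shown is a simplification, and PHENIX's own flag assignment may differ.

```python
import numpy as np


def r_factor(f_obs, f_calc) -> float:
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|), assuming Fc is already scaled to Fo."""
    fo = np.abs(np.asarray(f_obs, dtype=float))
    fc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.sum(np.abs(fo - fc)) / np.sum(fo))


def free_set_mask(n_reflections: int, fraction: float = 0.05, seed: int = 0) -> np.ndarray:
    """Flag a random `fraction` of reflections as the test set excluded from refinement."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_reflections, dtype=bool)
    n_free = max(1, round(fraction * n_reflections))
    mask[rng.choice(n_reflections, size=n_free, replace=False)] = True
    return mask


if __name__ == "__main__":
    # Synthetic amplitudes, for illustration only; no refinement has been run,
    # so R_work and R_free come out similar here.
    rng = np.random.default_rng(1)
    fo = rng.uniform(10.0, 100.0, size=2000)
    fc = fo * rng.normal(1.0, 0.2, size=fo.size)
    free = free_set_mask(fo.size)
    print("R_work", r_factor(fo[~free], fc[~free]), "R_free", r_factor(fo[free], fc[free]))
```

In a real refinement the gap between the two values is the useful signal: a working R that keeps falling while Rfree stalls or rises suggests over-fitting.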

Site-directed mutagenesis
Site-directed mutagenesis was carried out by overlap PCR to produce the corresponding fragments for the following mutants: H24Q, E136Y, R213L and E136Y/R213L. Wild-type (WT) MEPL DNA was used as the template, and the primers used to construct the MEPL variants are listed in Table S2. The PCR products were digested with KpnI and BamHI (Takara Biotech), gel purified, and ligated into the expression vector pET-32. All mutations were confirmed by DNA sequencing, and plasmids with the correct sequences were transformed into E. coli BL21(DE3).
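Because mutation names encode the wild-type residue, its position and the replacement, a quick in-silico check can confirm that a designed variant matches the template before primers are ordered. A minimal sketch, assuming 1-based numbering that matches the residue numbers used here; the function name and the four-residue toy sequence are hypothetical, and the MEPL sequence itself is not reproduced.

```python
import re

# Point mutation in one-letter notation: wild-type residue, position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(sequence: str, spec: str, offset: int = 1) -> str:
    """Apply a mutation spec such as 'E136Y/R213L' to a protein sequence.

    `offset` is the residue number of the first character of `sequence`.
    Raises ValueError if a stated wild-type residue does not match the
    template, which catches reversed notation (e.g. 'Y136E' on an E136 template).
    """
    residues = list(sequence)
    for token in spec.split("/"):
        m = MUTATION.match(token.strip())
        if m is None:
            raise ValueError(f"cannot parse mutation {token!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        idx = pos - offset
        if not 0 <= idx < len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[idx] != wt:
            raise ValueError(f"{token}: expected {wt} at {pos}, found {residues[idx]}")
        residues[idx] = new
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical toy sequence numbered 1-4, for illustration only.
    print(apply_mutations("AEKR", "E2Y/R4L"))
```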

Enzyme assays
The activities of the recombinant MEPL variants and BEPL toward the specific substrate GD4K-βNA (Sigma, St. Louis, MO) were determined as described previously (Ogiwara and Takahashi, 2007). Activities toward the non-specific substrates Boc-E(OBzl)-AR-MCA and Z-FR-MCA (Peptide Institute, Osaka, Japan) were measured by the method of Barrett (Barrett, 1980). These assays were repeated three times.

Kinetic Studies
The kinetic assay of enteropeptidase with GD4K-βNA was performed at 30 °C in 100 μl of buffer containing 0.04-0.8 mM GD4K-βNA, 25 mM Tris (pH 8.3), 2.5% dimethyl sulfoxide and 2 mM CaCl2. The reaction was initiated by adding MEPL (final concentration 18.7 nM). The reaction rate was determined from the continuous increase in fluorescence (λex = 337 nm, λem = 420 nm) over 3 min. Km and kcat were determined from Lineweaver-Burk plots. These assays were repeated three times.
Kinetic assays for the 4-methylcoumaryl-7-amide (MCA)-containing peptide substrates were performed in 100 μl of buffer containing 25 mM phosphate buffer (PB, pH 7.5), 2.5% dimethyl sulfoxide and 2 mM disodium ethylenediaminetetraacetate (EDTA), with substrate concentrations ranging from 0.08 to 0.8 mM. Reactions were started by adding MEPL (final concentration 37 nM), and the fluorescence (λex = 380 nm, λem = 460 nm) of the released MCA was monitored continuously for 3 min. Km and kcat were determined from Lineweaver-Burk plots. These assays were repeated three times.
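Km and kcat from a Lineweaver-Burk plot come from a straight-line fit of 1/v against 1/[S]: the intercept is 1/Vmax, the slope is Km/Vmax, and kcat = Vmax/[E]. A minimal NumPy sketch, assuming initial rates have already been converted from fluorescence units to concentration per second (the calibration step is not described in the text). The substrate concentrations span the GD4K-βNA range given above and the 18.7 nM enzyme concentration is taken from the text, but the rate values are synthetic. Because the double-reciprocal transform heavily weights the lowest-concentration points, a direct nonlinear fit to the Michaelis-Menten equation is often preferred when data are noisy.

```python
import numpy as np


def lineweaver_burk(s_mM, v_uM_per_s, enzyme_nM: float):
    """Estimate Km (mM), Vmax (uM/s) and kcat (1/s) from a double-reciprocal fit.

    1/v = (Km / Vmax) * (1/[S]) + 1/Vmax, so a straight-line fit of 1/v
    against 1/[S] gives intercept 1/Vmax and slope Km/Vmax.
    """
    x = 1.0 / np.asarray(s_mM, dtype=float)
    y = 1.0 / np.asarray(v_uM_per_s, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest-degree coefficient first
    vmax = 1.0 / intercept                  # uM/s
    km = slope * vmax                       # mM
    kcat = vmax / (enzyme_nM * 1e-3)        # enzyme nM -> uM, so kcat in 1/s
    return km, vmax, kcat


if __name__ == "__main__":
    # Synthetic Michaelis-Menten rates (Km = 0.2 mM, Vmax = 0.5 uM/s), not measured data.
    s = np.array([0.04, 0.08, 0.16, 0.32, 0.64, 0.8])
    v = 0.5 * s / (0.2 + s)
    km, vmax, kcat = lineweaver_burk(s, v, enzyme_nM=18.7)
    print(f"Km={km:.3f} mM, kcat={kcat:.1f} 1/s, kcat/Km={kcat / km:.1f} 1/(mM s)")
```

For these noise-free synthetic rates the fit recovers Km = 0.2 mM and kcat of about 26.7 s⁻¹; the ratio kcat/Km is the usual basis for comparing specific and non-specific substrates.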