Predominant cleavage of proteins N-terminal to serines and threonines using scandium(III) triflate

Abstract Proteolytic digestion prior to LC–MS analysis is a key step for the identification of proteins. Digestion of proteins is typically performed with trypsin, but certain proteins or important protein sequence regions might be missed using this endoproteinase. Only few alternative endoproteinases are available and chemical cleavage of proteins is rarely used. Recently, it has been reported that some metal complexes can act as artificial proteases. In particular, the Lewis acid scandium(III) triflate has been shown to catalyze the cleavage of peptide bonds to serine and threonine residues. Therefore, we investigated if this compound can also be used for the cleavage of proteins. For this purpose, several single proteins, the 20S immune-proteasome (17 proteins), and the Universal Proteomics Standard UPS1 (48 proteins) were analyzed by MALDI–MS and/or LC–MS. A high cleavage specificity N-terminal to serine and threonine residues was observed, but also additional peptides with deviating cleavage specificity were found. Scandium(III) triflate can be a useful tool in protein analysis as no other reagent has been reported yet which showed cleavage specificity within proteins to serines and threonines. Graphic abstract Electronic supplementary material The online version of this article (10.1007/s00775-019-01733-7) contains supplementary material, which is available to authorized users.


Introduction
Site-selective cleavage of proteins is an important tool for the analysis of protein sequences. Trypsin cleaves proteins C-terminal to arginine and lysine residues, and is by far the most commonly used endoproteinase in bottom-up proteomics [1]. However, protein sequences can contain either large sequence regions without arginines and lysines or where these amino acids are too close together. In such cases, peptides suitable for analysis by mass spectrometry cannot be obtained using trypsin. Other endoproteinases cleave either at basic (Arg-C, Lys-C, Lys-N, and LysargNase), acidic (Asp-N and Glu-C), or hydrophobic (chymotrypsin) amino acids [2][3][4]. Only few chemical reagents for the selective cleavage of proteins have been reported [5]. The most often used chemical reagent, cyanogen bromide, selectively cleaves peptides bonds on the C-terminal side of methionines, but the reagent is volatile and toxic and thus only occasionally used [6]. Metalcatalyzed specific hydrolysis of peptide bonds has been described, but their practical use in protein sequencing is still in infancy [7]. The site-selective cleavage of peptides and proteins primarily at X-Y bonds in X-Y-His and X-Y-Met sequences by Pd(II) complexes and Met-Z bonds by Pt(II) complexes was reported [8,9]. Asparagine-selective bond cleavage of N-terminal protected peptides using diacetoxyiodobenzene under mild conditions has been reported as well [10]. Selective cleavage of peptide bonds N-terminal to serines has been described using N,N′-disuccinimidyl carbonate [11], and a water-solubleorganoradical conjugate [12]. Truncation of long-lived proteins at the N-terminal side of serines and threonines has been observed [13]. Zinc is a major trace metal in biological systems and it could be shown that the presence of Zn(II) catalyzes this truncation process [14,15]. Different metal complexes were compared and revealed that Sc(III) triflate is particularly effective in cleaving peptide bonds at serine and threonine residues [16]. Peptides with up to 42 amino acids were investigated and revealed fragments due to cleavage at serine and threonine residues. In this report, we evaluated the use of Sc(III) triflate for the cleavage of proteins (10-80 kDa) using MALDI-MS and LC-MS. The obtained mass spectrometrical data allowed a more detailed analysis of the cleavage specificity. Predominant cleavage only N-terminal to serine and threonine residues was determined, but also many peptide fragments were identified due to semi-and non-specific cleavage.

LC-MS
LC-MS was performed as previously described [18]. An LC/MS system consisting of a Dionex Ultimate 3000 RSLCnano-LC system (Sunnyvale CA, USA) connected to a linear quadrupole ion trap-Orbitrap (LTQ-Orbitrap XL) mass spectrometer (ThermoElectron, Bremen, Germany) equipped with a nanoelectrospray ion source was used to analyze the tryptic peptides. For liquid chromatography separation, an Acclaim PepMap 100 column (C18, 3 µm, 100 Å) (Dionex, Sunnyvale CA, USA) capillary of 25 cm bed length was used with a flow rate of 300 nL/ min. Two solvents A (0.1% formic acid) and B (aqueous 90% acetonitrile in 0.1% formic acid) were used to eluate the peptides from the nano column. The gradient went from 3 to 35% B in 40 min and from 35 to 50% B in 3 min and finally to 80% B in 2 min. The mass spectrometer was operated in the data-dependent mode to automatically switch between Orbitrap-MS and LTQ-MS/MS acquisition. Survey full-scan MS spectra (from m/z 300 to 2000) were acquired in the Orbitrap with resolution R = 60,000 at m/z 400 and allowed the sequential isolation of the top six ions, depending on signal intensity, for fragmentation on the linear ion trap using collision-induced dissociation at a target value of 10,000 charges.

Data analysis
LC-MS data were acquired using Xcalibur v2.5.5 and raw files were processed to generate peak list in Mascot generic format (*.mgf) using ProteoWizard release version 3.0.331. Database searches were performed using Mascot in-house version 2.4.0. A fragment ion mass tolerance of 0.5 Da, parent ion tolerance of 10 ppm, oxidation of methionines, and acetylation of the protein N-terminus as variable modifications were considered as search parameters for all analyses. For full-and semi-specific cleavage at serine and threonine residues, two missed cleavage sites were applied, whereas no missed cleavage site was used for the database search without enzymatic cleavage specificity. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [19] partner repository with the data set identifier PXD014918.

Predominant cleavage of proteins N-terminal to serines and threonines using Sc(III) triflate
Different reaction conditions using Sc(III) triflate for the digestion of proteins were tested first with two proteins [alpha-1-glycoprotein (bovine), and beta-casein (bovine)]. Out of the different tested temperatures (30 °C, 40 °C, 50 °C, 60 °C, and 70 °C), reaction times (1 h, 4 h, 16 h, and 40 h), buffers (100 mM ammonium acetate, 100 mM ammonium bicarbonate, 100 mM ammonium citrate, 100 mM potassium dihydrogen phosphate, 100 mM sodium carbonate, 100 mM MES, 100 mM phosphate-buffered saline, 100 mM triethylammonium carbonate, 0.1% trifluoroacetic acid, and water), and Sc(III) triflate molarity (10 mM, 100 mM, 1 M), highest scores after database searches using Mascot were obtained for 60 °C/16 h/water/100 mM Sc(III) triflate. To further investigate if Sc(III) triflate can cleave proteins in general, these reaction conditions were applied to ten other proteins (alpha-casein (bovine), concanvalin (Jack bean), alpha-crystallin (bovine), cytochrome c (horse), glyceraldehyde-3-phosphate dehydrogenase (rabbit), beta-lactoglobulin (bovine), myoglobin (horse), ribonuclease A (bovine), thioredoxin (E. coli), and transferrin (human); all from Sigma-Aldrich). The reaction products of 1 pmol starting material were analyzed by MALDI-MS and LC-MS, respectively. The peptide mass fingerprints recorded by MALDI-MS revealed that the most intense peaks were almost exclusively cleavage products due to N-terminal cleavage at serine and threonine residues with full or semi-specificity ( Fig. 1 and supplementary Fig. 1). Strikingly, several overlapping sequences were observed in most of the MALDI-MS spectra (Supplementary Fig. 1). A possible explanation for this observation could be that the proteins were first cleaved specifically and then further degraded. Searching the LC-MS data with three different cleavage specificities (no specificity, semi-specific, and full-specific cleavage at N-terminal serine and threonine residues) disclosed that the cleavage products were not exclusively products of N-terminal cleavage to serines and threonines. Actually, higher identification scores were achieved at semi and no enzyme specificity compared to full N-terminal cleavage at serines and threonines (Table 1). However, favored cleavage at these sites was observed considering the identified 2403 unique peptides as revealed by motif analysis using GibbsCluster ( Supplementary Fig. 2) [20]. In addition, the percentages of the amino acids around the cleavage sites were calculated (Fig. 2). The proteins were cleaved in around 35% of identified peptides N-terminally to serine and threonine residues and none of the other amino acids showed preferred cleavage. As a note, cleavage C-terminal to aspartic acid was observed for Aβ1-42 as minor intense fragments, suggesting that a hard carboxylate functional group directs Sc(III) for cleavage of the peptide bonds. Our data set agrees with this observation, because C-terminal cleaved peptides at aspartic acids were observed twice as often as for N-terminal cleavages. As with MALDI-MS, the LC-MS signals detected with highest intensities belonged to peptides cleaved N-terminal to serine and threonine residues (Fig. 3). Bottom-up analysis is typically performed to confirm the protein sequence of therapeutic proteins. High sequence coverage of the protein is desirable to verify the identity of the sequence or modifications of it. However, high sequence coverage is dependent on the protein sequence and the available endoproteases. Recombinant granulocyte colony-stimulating factor (G-CSF) produced in E. coli (Filgrastim) is in clinical use since 1991 to reduce the duration and severity of neutropenia in patients undergoing myelosuppressive chemotherapy and to mobilize peripheral blood progenitor cells. Using trypsin, only a sequence coverage of 20% was obtained, because this protein with 174 amino acids contains a long region (42-147) without any arginine or lysine, respectively. Cleavage with Sc(III) triflate revealed a sequence coverage of 95% with semi-specific N-terminal serine/threonine cleavage (individual Mascot ion scores > 13). Table 1 Single proteins were incubated with 100 mM Sc(III) triflate One pmol of starting material was analyzed by nanoUHPLC-MS using a 1 h LC gradient and an LTQ-Orbitrap XL mass spectrometer. Mascot ion scores (Score) and sequence coverages (SC) are displayed for the individual proteins searching the same data set with full N-terminal cleavage at serines and threonine (Full), semi-specific cleavage (Semi), and no enzyme (no). Highest score and sequence coverages are displayed in bold for the individual proteins  and at all 20 amino acids (AAs) including previous and next amino acid are presented Fig. 3 Degree of cleavage specificity of Sc(III) triflate. The box plot at the top shows the relation of the specificity of peptides concerning their intensity (intense outliers were excluded for visibility reasons). The bar plot in the middle displays the absolute distribution of all identified peptides (2403 peptides). In the box plot at the bottom, the 10% most intense peptides (240 peptides) were considered and the distribution is clearly biased towards peptides specifically cleaved N-terminal to serine and threonine residues

Cleavage of more complex protein samples
The immune-20S proteasome is a 700 kDa complex and contains 17 different protein subunits. Database search analysis of the immune-20S proteasome (from UBPBio) after exposure to Sc(III) triflate using Mascot against human Swiss-Prot database identified 13 proteins with ion scores above identity or extensive homology with full N-terminal serine/ threonine specificity, 12 proteins with semi-N-terminal serine/threonine specificity, and 10 proteins with no enzymatic specificity ( Table 2). Sixteen proteins of the immune-20S proteasome were previously identified after tryptic digestion using the same LC-MS system, but after separation of the proteins using 2D gel electrophoresis [21]. This result showed that Sc(III) triflate can perform similarly well as trypsin for the analysis of small protein complexes. Next, we analyzed 1 µg of the Universal Proteomics Standard (UPS1 from Sigma-Aldrich) which corresponds to 0.83 pmol each of 48 human proteins. Using trypsin, 44 proteins were identified, whereas 32 proteins with ion scores above identity or extensive homology with full N-terminal serine/threonine specificity, 27 proteins with semi-N-terminal serine/threonine specificity, and 25 proteins with no enzymatic specificity were found using Sc(III) triflate. This result revealed that Sc(III) triflate might be less suitable with increased complexity because of the limited cleavage specificity. We also tried to identify proteins after separation using SDS-PAGE, but have not been able to identify any protein. To explain this result, we combined an unstained gel piece and added one of the standard proteins (human transferrin) and incubated it with Sc(III) triflate and again could not identify any peptide. Therefore, it can be concluded that Sc(III) triflate is inactivated by the gel and is not applicable for in-gel digests.

Conclusions
Sc(III) triflate is able to cleave proteins in-solution with predominant cleavage N-terminal to serine and threonine residues. The reagent worked with all proteins under investigation under more moderate temperatures and reduced reaction times than previously reported for the cleavage of peptides.
Notably, the reagent is stable, relative cheap in comparison to endoproteinases, and the results were highly reproducible. Therefore, Sc(III) triflate can be a useful regent for the analysis of proteins, in particular for single protein analysis where classical endoproteinases as trypsin give limited sequence coverages because of the lack of cleavage sites.