Introduction

Proteins can be modified with different molecules from small to large ones. It is well-known that protein post-translational modifications (PTMs) play extremely important roles in many biological events, including cell signaling, and the regulation of protein interactions, stability, and degradation [1]. Many types of modifications, including glycosylation [2,3,4], phosphorylation [5,6,7], and methylation [8], are regulated by enzymes. However, some non-enzymatic modifications can take place in cells and their formation is based on the chemical reactivity of certain amino acid residues on proteins such as cysteine, lysine, and arginine. These non-enzymatic PTMs impact protein structures and activities, and are involved in many cellular processes [9, 10]. They are also related to human aging and diseases, especially chronic ones including cardiovascular diseases and diabetes. Therefore, it is of great importance to systematically study non-enzymatic PTMs, which will deepen our understanding of their critical roles in biological systems.

Glycation is one type of non-enzymatic modification, in which reducing sugars or sugar-derived metabolites are covalently attached to primary amine or guanidino groups of proteins through the Maillard reaction [11]. The reaction starts with the condensation between the carbonyl group of a reducing sugar and an amine group of a protein to produce an unstable Schiff base, which can undergo an intramolecular rearrangement to form the Amadori product. The relatively stable Amadori product on modified proteins can further undergo a series of reactions, including rearrangement and dehydration, to form advanced glycation end products (AGEs) [12].

Protein glycation has recently attracted increased attention because it is closely associated with aging and diabetes, and could be a hallmark of other diseases, such as neurodegenerative, cardiovascular, and metabolic diseases [13,14,15]. Protein glycation is correlated with the concentration of glucose. In normal cells, glucose is tightly regulated in a narrow concentration range, while cancer cells take in more glucose for the glycolysis in order to provide sufficient energy and intermediates for their proliferation [16, 17], which could result in the difference of protein glycation from normal cells.

Despite its importance, systematic analysis of protein glycation is extraordinarily challenging and understudied due to the low abundances of many glycated proteins and the typical sub-stoichiometry of glycation. Antibody-based analysis, including immunoassay and Western blot, has provided valuable information on protein glycation [18, 19]. However, the low throughput and the issues related to antibodies, such as the non-specificity and high cost, restrict its applications. With the development of the instrument and computational technology in recent years, mass spectrometry (MS) has become a very powerful tool and provided a unique opportunity to study protein PTMs on a large scale [20,21,22,23,24,25,26,27,28,29]. However, due to the complexity of biological samples and the typically low stoichiometry of protein glycation, the enrichment of glycated proteins/peptides is indispensable before MS analysis. Boronate affinity enrichment, which relies on the reversible covalent interactions between the hydroxyl groups of the glycan and boronic acid under basic conditions, has been employed to enrich glycated proteins/peptides [11, 30, 31]. Recently, our group developed a method based on dendrimer-conjugated boronic acid derivative (DBA), which can dramatically enhance the interactions between glycans and boronic acid benefiting from synergistic interactions, to efficiently capture low-abundance N- and O-glycopeptides [32]. The synergistic interactions can also facilitate the enrichment of glycated peptides.

In this work, we systematically and site-specifically analyzed protein glycation in three types of human cells: Jurkat, HEK293T, and MCF7. Several hundred glycated proteins were identified, and the results indicated that this non-enzymatic modification was not entirely random. Proteins in the extracellular regions and the nucleus were more frequently glycated. The formation of protein glycation was related to protein sequences and secondary structures. Interestingly, almost all enzymes involved in the glycolytic pathway were glycated. In addition, we found that many glycation sites were also reported as ubiquitination and acetylation sites, which suggested that protein stability and the regulation of gene expression may be disturbed by protein glycation. Systematic analysis of protein glycation in human cells provides a better understanding of this non-enzymatic modification.

Experimental

Sample Preparation and MS Analysis

As discussed above, benefitting from synergistic interactions with glycans, the DBA method is highly effective at capturing glycopeptides in complex biological samples. Using this method, we globally analyzed protein N-glycosylation in human cells, yeast cells, and mouse brain tissues [32]. In addition, we applied the DBA method to analyze O-GlcNAcylated proteins in human cells. Normally, there are no cis-diols in GlcNAc and glucose, and therefore the interactions between boronic acid and GlcNAc or glucose are weak. Because multiple hydroxyl groups of these sugars participate in the synergistic interactions, the DBA method was also effective at enriching O-GlcNAcylated peptides for their global analysis.

In brief, HEK293T, Jurkat, and MCF7 cells were cultured, and after cell lysis, proteins were extracted. Proteins were digested with trypsin, and purified peptides were treated with PNGase F for the removal of N-glycans, and then we enriched glycopeptides with the DBA beads. Enriched glycopeptides were fractionated and analyzed using an online LC-MS/MS system. The resolution was set as 70,000 for full MS and 35,000 for MS2. The detailed experimental procedure is available in the previous paper [32]. The raw files for protein O-GlcNAcylation analysis may also contain much information about protein glycation. Therefore, benefitting from the raw files collected previously [32], we performed further database search for protein glycation. Glycated proteins were analyzed using different bioinformatic methods, and a possible mechanism for the formation of protein glycation was proposed.

Database Search and Data Filtering

The raw files were first converted to mzXML format and then searched using SEQUEST (version 28) [33]. The spectra were searched against a database containing sequences of all human proteins (Homo sapiens) downloaded from UniProt. The following parameters were used for the peptide search: 10 ppm precursor mass tolerance; 0.025 Da fragment ion mass tolerance; fully digested with trypsin; up to three missed cleavages; variable modifications: oxidation of methionine (+15.9949 Da), glycation of lysine or arginine (+162.0528 Da); fixed modification: carbamidomethylation of cysteine (+57.0214 Da). The raw files are publicly accessible at http://www.peptideatlas.org/PASS/PASS01344.
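As a small worked illustration of two of these search parameters, the sketch below recomputes the +162.0528 Da glycation shift and checks a precursor against the 10 ppm tolerance. It assumes that the shift corresponds to a hexose minus one water (C6H10O5), which is consistent with the stated mass but is not spelled out in the text; the function names are hypothetical and are not part of SEQUEST.

```python
# Standard monoisotopic masses of the elements, in Da.
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def composition_mass(formula):
    """Monoisotopic mass of an elemental composition, e.g. {"C": 6, "H": 10, "O": 5}."""
    return sum(MONO[element] * count for element, count in formula.items())


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the observed value lies within +/- tol_ppm of the theoretical value."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Assumption: glycation by glucose adds C6H12O6 minus H2O = C6H10O5.
HEXOSE_ADDUCT = {"C": 6, "H": 10, "O": 5}
glycation_shift = composition_mass(HEXOSE_ADDUCT)
print(round(glycation_shift, 4))  # 162.0528
```

At m/z 1000, a 10 ppm window corresponds to ±0.01, so `within_tolerance(1000.005, 1000.0)` holds while `within_tolerance(1000.02, 1000.0)` does not.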

The target-decoy method was used to evaluate the false discovery rates (FDRs) of glycated peptide identifications [34]. Each sequence from a protein in the protein database was listed in both forward and reverse orders. The quality of glycated peptides was evaluated and controlled by linear discriminant analysis (LDA) integrating several parameters including Xcorr, charge state, and precursor mass accuracy. Peptides with fewer than seven amino acids were removed and the glycated peptide spectral matches were filtered to be less than 1% FDR. The dataset was restricted to only glycated peptides while determining the FDRs.
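The target-decoy idea can be sketched as follows. This is a simplified illustration, not the LDA-based pipeline described above: it assumes each peptide spectral match carries a single score (higher is better), estimates the FDR as the ratio of decoy to target matches above a cutoff, and ignores tied scores. The function names are hypothetical.

```python
def make_decoy(sequence):
    """Reverse a protein sequence to create its decoy counterpart."""
    return sequence[::-1]


def score_threshold_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms is a list of (score, is_decoy) pairs. The FDR at a cutoff is estimated
    as (# decoy matches) / (# target matches) among matches scoring at or above
    the cutoff. Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff
```

Matches scoring at or above the returned cutoff would then be kept, with decoy matches discarded afterwards.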

Glycation Site Localization

The confidence of glycation site localization was calculated based on the fragment ions using a probabilistic algorithm similar to Ascore, which considers all possible glycation sites in a peptide and the presence of experimental fragment ions unique to each site [35]. The resulting ModScore indicates the confidence of the site assignment, and the sites with a ModScore > 13 (P < 0.05) were considered to be well-localized.

Results and Discussion

Identification of Glycation Sites and Glycated Peptides from MCF7 Cells

Orbitrap MS has become very powerful for bottom-up proteomics [36, 37] and both MS1 and MS2 were recorded in the Orbitrap cell with high resolution and high mass accuracy for glycated peptide identifications. Two examples of tandem mass spectra for glycated peptide identifications are shown in Figure 1a, b. The glycated peptide YDDMAACMK#SVTEQGAELSNEER (# represents the glycation site) from 14-3-3 protein zeta/delta (YWHAZ) was identified with Xcorr of 4.5 and the mass accuracy of 0.66 ppm. The glycation site was well-localized at K27. Another example is ALSDHHIYLEGTLLK#PNMVTPGHACTQK, which was confidently identified with Xcorr of 5.1 and the mass accuracy of 0.47 ppm, and the modified site was well-localized at K230. The peptide is from fructose-bisphosphate aldolase A, which plays a key role in glycolysis and gluconeogenesis.

Figure 1

Tandem mass spectrum of YDDMAACMK#SVTEQGAELSNEER (# represents the glycated site) (a) and ALSDHHIYLEGTLLK#PNMVTPGHACTQK (b). Identification of glycated sites (c) and glycated proteins (d) from biological duplicate experiments of MCF7 cells. Identification of glycated proteins from HEK, Jurkat, and MCF7 cells in parallel experiments (e)

As with phosphorylated peptides, neutral losses can occur for glycated peptides. Previously, the pyrylium (loss of three water molecules) and furylium ions (loss of 3 × H2O and HCHO) were reported under collision-induced dissociation (CID) [38]. When manually checking many tandem mass spectra, we barely found these fragments related to the water loss. Instead, some fragments in the low mass range appeared in almost every MS2 spectrum that we checked, such as the fragments at m/z 161.0403, 136.0752, and 110.0717. The peak at m/z 161.0403 may be from the glucose residue, which was cleaved from glycated peptides. Other peaks may also be related to the glucose residue, and remain to be investigated. The difference in the neutral loss may be attributed to the different activation methods. In this work, we used higher-energy collisional dissociation (HCD) to fragment peptides, and although HCD is also a type of CID, it can produce different fragments. It has been well-documented that the neutral loss from glycopeptides may be different with different activation methods [39, 40]. Different neutral losses were observed for glycopeptides even with varying normalized collision energies (NCE) under HCD [40].
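A manual check of this kind can be approximated with a simple peak lookup. The sketch below is illustrative only: it takes the three low-mass m/z values reported above, borrows the 0.025 Da fragment tolerance from the database search settings as an assumed matching window, and uses a hypothetical function name.

```python
# Low-mass fragment m/z values reported in the text.
DIAGNOSTIC_MZ = (161.0403, 136.0752, 110.0717)


def matched_diagnostic_ions(peak_mzs, targets=DIAGNOSTIC_MZ, tol_da=0.025):
    """Return the target m/z values that have at least one peak within tol_da."""
    return [
        target
        for target in targets
        if any(abs(mz - target) <= tol_da for mz in peak_mzs)
    ]


# Hypothetical peak list for one MS2 spectrum.
example_peaks = [110.07, 161.05, 500.3]
print(matched_diagnostic_ions(example_peaks))  # [161.0403, 110.0717]
```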

In biological duplicate experiments in MCF7 cells, we identified 325 glycation sites on 205 proteins and 436 sites on 271 glycated proteins, respectively, and all identified sites are listed in Table S1. The overlap of glycation sites and proteins (234 and 163, respectively) is reasonably good between the biological duplicate experiments (Figure 1c, d), which indicates that protein glycation, even though it is a non-enzymatic modification, did not occur randomly. Here, ModScore was used to evaluate the confidence of glycation site localization. Of all the identified glycation sites from MCF7 cells, most (94.7%) had ModScore values larger than 13 and were considered well-localized, and over 93% had a ModScore larger than 19 (corresponding to a P value of less than 0.01) (Figure 2a).
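The duplicate overlap can be put into proportion with a few lines of arithmetic. The sketch below uses only the counts reported above; the function name and the choice of summary ratios are illustrative assumptions.

```python
def overlap_summary(n_first, n_second, n_shared):
    """Summarize the overlap between two replicate identification lists."""
    union = n_first + n_second - n_shared
    return {
        "union": union,
        "shared_of_first": n_shared / n_first,
        "shared_of_second": n_shared / n_second,
        "jaccard": n_shared / union,
    }


# Counts reported for the two MCF7 replicates.
sites = overlap_summary(325, 436, 234)
proteins = overlap_summary(205, 271, 163)

print(sites["union"], proteins["union"])  # 527 313
```

Under these counts, 72% of the sites from the first replicate (234 of 325) were also found in the second.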

Figure 2

(a) The ModScore distribution for the identified glycation sites in MCF7 cells. (b) The number of glycation sites identified per glycated protein. (c) The relationship between the number of glycation sites and the length of glycated proteins

Of the identified glycated proteins, most (66.8%) carried only one glycation site and 20 proteins had more than three sites (Figure 2b). For example, 10 glycation sites were identified on heat shock protein 60, which is a mitochondrial chaperonin responsible for the transport and refolding of proteins from the cytoplasm into the mitochondria. This protein was reported to be correlated with diabetes, cancer, and immunological disorders [41]. Another heat shock protein, HSP90A, which assists protein folding, transport, maintenance, degradation, and cell signaling [42], was found to possess eight glycation sites. We also attempted to investigate the relationship between the protein length and the number of glycation sites, but could not find an obvious correlation between them (Figure 2c). However, it seemed that proteins with more glycation sites (greater than 3) were likely to be shorter (fewer than 1000 amino acids). Moreover, we evaluated the effect of protein abundance on glycation. The comparison between the abundance distribution of the glycated proteins identified here and all proteins from an online database (PaxDb) [43] is displayed in Figure S1. The result demonstrated that protein glycation was not biased toward highly abundant proteins.

Identification of Glycated Peptides in HEK293T, Jurkat, and MCF7 Cells

Protein glycation has been reported to be related to chronic diseases such as diabetes [44]. Glycated proteins in human plasma and erythrocytes were investigated previously because of their association with the glucose concentration in the blood. For example, Zhang et al. analyzed protein glycation in normal and diabetic plasma and erythrocytes, and found that several amino acid residues including alanine (A), valine (V), and glutamic acid (E) appeared more frequently in the vicinity of the glycated lysine residues [31]. Keilhauer et al. analyzed protein glycation in the HeLa lysate and surprisingly found that over 50% protein glycation sites were modified on arginine (R) instead of lysine (K). In addition, they also concluded that HCD fragmentation is well-suited for analyzing glycated peptides [45].

In this work, we systematically studied protein glycation in three different types of human cells. The numbers of glycated proteins identified in each cell line are displayed in Figure 1e. We identified 166, 115, and 205 glycated proteins from HEK293T, Jurkat, and MCF7 cells, respectively (Table S2), and 123 proteins were identified in at least two types of cells. We also investigated the distribution of the glycation sites on lysine (K) and arginine (R) in the three cell lines and found that nearly all the identified sites were on lysine; only 4, 1, and 3 glycation sites were identified on arginine from HEK293T, Jurkat, and MCF7 cells, respectively (Figure S2).

The glycated proteins identified in the three cell lines were clustered using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [46]. Proteins located in extracellular vesicles were the most highly enriched, with P values of 1.2 × 10^-37, 1.2 × 10^-40, and 1.2 × 10^-60 for Jurkat, HEK293T, and MCF7 cells, respectively (Figure S3A). Some examples of the glycated proteins located in extracellular vesicles are listed in Table 1. Since the glucose concentrations in the standard culture media (4.5 g/L in DMEM for HEK293T and MCF7 cells, and 2 g/L in RPMI 1640 for Jurkat cells) are higher than the physiological concentration in the body (0.75 g/L) [47], it is reasonable that proteins located in the extracellular region are more likely to react with glucose in the media. Meanwhile, proteins located in membrane-bounded organelles, the nucleus, and ribonucleoprotein complexes were also highly enriched, which might be because proteins in those regions were long-lived and/or were exposed to activated glucose-derived metabolites [48, 49]. All glycated proteins were also categorized according to biological process using DAVID. The highly enriched categories included protein localization to organelle, protein folding, and translation, which indicated that protein glycation might affect these biological processes (Figure S3B).
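For readers unfamiliar with how such enrichment P values arise, the sketch below computes a plain one-sided hypergeometric tail probability. This is a stand-in for illustration: DAVID reports a modified Fisher's exact statistic, so the numbers would not match exactly, and the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw.

    population: number of proteins in the background
    annotated:  background proteins carrying the annotation term
    sample:     number of proteins in the query list
    hits:       query proteins carrying the annotation term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, chosen only to show the call.
p_value = hypergeom_enrichment_p(population=20000, annotated=2500, sample=200, hits=90)
```

A very small value indicates that the term occurs in the query list far more often than expected from the background alone.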

Table 1 Examples of Identified Glycation Sites and Glycated Proteins Located at the Extracellular Vesicles

Protein Glycation and Protein Structure

Most studies investigated protein glycation on one or several proteins, while large-scale analysis may provide more valuable information such as the site preference of glycation. Protein glycation may be attributed to many factors [50]. The type of amino acids near lysine is highly important for glycation. Hydrophobic and acidic amino acids including alanine (A), leucine (L), and glutamic acid (E) frequently appear around the glycated lysine residues. In contrast, histidine (H) and cysteine (C) seem not to be among the major amino acids in the neighborhood of the glycated lysine [31, 51, 52]. In addition, it has been proposed that pKa values and the microenvironment caused by the 3D structure near the lysine residue may also play important roles in lysine glycation [53]. However, it should be noted that the results from different reports may contradict each other [54, 55], which further complicates the understanding of protein glycation.

Here, we studied the relationship between protein glycation and protein structure based on the results from the three cell lines. First, we calculated the frequency of five amino acids flanking each side of the glycated lysine. The frequency of each amino acid is displayed in Figure 3a and further normalized by its corresponding natural abundance (Figure 3b). Glutamic acid (E), isoleucine (I), and methionine (M) were found to be the most frequent near the glycated lysine, and compared to their natural abundances, the frequencies of all three amino acids increased by over 60% in the three types of cells. In the current results, acidic and hydrophobic amino acids were found to promote protein glycation, while cysteine (C) and basic amino acids (histidine (H), lysine (K), and arginine (R)) were underrepresented in the proximity of the glycation sites (Figure 3b and Figure S4).
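The flanking-residue analysis can be sketched as below. This is a minimal illustration, assuming 0-based site indices and a user-supplied table of background amino acid frequencies; the function names are hypothetical. The example uses the glycated peptide from Figure 1a for brevity, whereas the analysis above would use the full protein sequences so that the window is not truncated at peptide ends.

```python
from collections import Counter


def flanking_counts(sequence, site_index, window=5):
    """Count residues within `window` positions on each side of a 0-based site.

    The modified residue itself is excluded; the window is truncated at the
    sequence ends.
    """
    start = max(0, site_index - window)
    end = min(len(sequence), site_index + window + 1)
    flank = sequence[start:site_index] + sequence[site_index + 1:end]
    return Counter(flank)


def normalized_frequencies(sites, background):
    """Observed flanking frequency of each residue divided by its background frequency.

    sites:      iterable of (sequence, 0-based site index)
    background: dict mapping residue -> natural frequency (as a fraction)
    """
    counts = Counter()
    for sequence, site_index in sites:
        counts.update(flanking_counts(sequence, site_index))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: (counts[aa] / total) / freq for aa, freq in background.items()}


# Glycated lysine is the ninth residue of this peptide (0-based index 8).
peptide = "YDDMAACMKSVTEQGAELSNEER"
counts = flanking_counts(peptide, 8)
print(counts["M"], sum(counts.values()))  # 2 10
```

A normalized value above 1 means the residue is overrepresented near the glycation sites relative to its background frequency.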

Figure 3

Frequency (a) and normalized frequency (b) of amino acid residues near the identified lysine glycation sites (five residues flanking the central lysine)

Next, we investigated the effect of protein structure on glycation. NetSurfP [56] was employed to predict the solvent accessibility of the glycated lysine residues and of all the lysine residues from the glycated proteins identified here, and the results are shown in Figure 4a. As expected, most of the glycated lysine residues (around 80% for all three cell lines) are exposed to solvent. Interestingly, 20%, 20%, and 22% of the glycated lysine residues from HEK293T, Jurkat, and MCF7 cells are buried in the proteins, and these numbers increase by 8%, 7%, and 9% compared to the fractions of all the buried lysine residues from the identified glycated proteins. One possibility is that some proteins might not fold properly or might be unfolded under certain conditions, which allows the buried lysine residues to be glycated more easily. Furthermore, protein structures are dynamic, which can make buried residues accessible to solvent and glucose under certain conditions.

Figure 4

Distribution of the solvent accessibility (a) and predicted structure (b) of each glycated lysine site and all lysine residues from the glycated proteins identified in HEK293T, Jurkat, and MCF7 cells

Glycation may result in local distortion of proteins and change the overall protein structures. For human serum albumin (HSA), glycation leads to a higher propensity for the formation of β-sheets, which eventually causes protein aggregation [57]. Here, we studied the location of the glycated lysine residues in the secondary structures of proteins. NetSurfP was also used to predict the secondary structures of the glycated proteins. Among all the lysine residues, most were found in helix and coil structures, while the percentage of lysine in the β-strand structure was the lowest (Figure 4b). This is consistent with the previous result that hydrophilic lysine occurs less frequently in the β-strand [58]. The trends were the same as those of the glycated lysine residues in the three cell lines. The fractions of the glycated lysine residues from HEK293T, Jurkat, and MCF7 cells in the β-strand were 0.14, 0.14, and 0.15, respectively, which were slightly higher than the fractions of all lysine residues (0.09, 0.08, and 0.09) from the identified glycated proteins (Figure 4b). The solvent accessibility differs among the three types of secondary structures, and the amino acid residues in the β-strand are the least accessible to solvent compared to those in the helix and the coil [59]. Previously, it was believed that glycation may cause protein structure damage [57, 60], but the above results might indicate that proteins with “damaged” structures were more easily glycated further, which may account for the increased fraction of the glycated lysine residues in the β-strand. These results further demonstrated that protein glycation was complicated.

Protein Glycation and Glycolysis

In cells, glucose participates in glycolysis and can be converted to pyruvate, with energy released through sequential enzymatic transformations [61, 62]. This process generates carbonyl-containing intermediates, including glyceraldehyde-3-phosphate and dihydroxyacetone phosphate. They can be converted to methylglyoxal, which is reactive and may modify proteins. Therefore, an increase in the concentration of glycolytic aldehydes and methylglyoxal promotes non-enzymatic protein modifications. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is a very important enzyme in the glycolytic pathway, and it was reported that glycation on GAPDH would reduce its catalytic activity [63]. Methylglyoxal as a non-enzymatic modification reagent for GAPDH was also reported previously [64]. However, the glycation of the enzymes in the glycolytic pathway by glucose itself has rarely been investigated.

We systematically analyzed the glycation of the enzymes involved in the glycolytic pathway using glucose as the glycation reagent, and surprisingly, nine of the ten enzymes, including GAPDH, were glycated; the only exception was phosphofructokinase-1 (Figure 5 and Table S3). In cells, glucose is generally thought to be ineffective at glycating proteins because of its relatively low concentration [65]. However, we found that the majority of the glycolytic enzymes were glycated by glucose in cells. As discussed above, the frequencies of acidic amino acids near the glycated lysine residues were relatively high. We checked the glycated peptides from the glycolytic enzymes and also found that aspartic acid (D) and glutamic acid (E) occurred frequently in the proximity of the glycated lysine residues.

Figure 5

Identified glycated proteins (highlighted) in the glycolytic pathway

Here, we proposed a possible mechanism to explain why nearly every enzyme in the glycolytic pathway was glycated (Figure 6). Uridine diphosphate glucose (UDP-glucose) is the activated form of glucose, participating in the glycolytic pathway, and the concentration of UDP-glucose normally is higher in the region where the glycolysis happens [66]. UDP-glucose may participate in the formation of protein glycation involved in the glycolytic pathway. A double displacement mechanism involving the formation of a covalent intermediate between an acidic residue and the substrate of glycosyltransferase was proposed previously [67, 68]. Similarly, UDP-glucose may be “hijacked” by an acidic amino acid near the lysine residue through an SN2 reaction [67]. Then the amine group of the nearby lysine residue could attack the “trapped” glucose to form a Schiff base. With the help of the proximity effect [69], glucose may become more reactive towards lysine. Another possible way was to generate an oxocarbenium cation-like species as a short-lived intermediate, which can be captured by lysine. The function of the acidic amino acid residues near lysine was to deprotonate and activate lysine [68].

Figure 6

A proposed mechanism for protein glycation

Another key step that the acidic amino acids may participate in is the rearrangement of the Schiff base, which is normally considered as the rate-determining step of the Maillard reaction. Theoretical calculation indicated that in aqueous solution, water (H2O) could form a hydrogen bond bridge between the α-hydrogen atom and the imine, which lowers the energy barrier of this step [70]. As shown in Figure 6, carboxylic acid may facilitate the hydrogen bond formation to lower the energy barrier and further promote the rearrangement. Integrating the two factors, the proposed mechanism may be able to account for why the enzymes in the glycolytic pathway were prone to protein glycation. Overall, these current results provide new insights into the glycation on the proteins involved in glycolysis.

Protein Glycation and Other Lysine Modifications (Acetylation and Ubiquitination)

PTMs regulate protein structures, spatial localizations, and interactions, and thus control their activities. Over 400 PTMs are listed in the UniProt database, with acetylation, ubiquitination, and phosphorylation being the most studied and most common ones [71]. The crosstalk between different modifications has been recognized as an important way for the regulation of cellular events, and numerous studies of well-characterized proteins like p53 and tau have demonstrated that the PTM crosstalk regulates the activities of these proteins [72, 73].

Generally, the crosstalk may be divided into two categories, i.e., positive and negative forms [74]. In the positive crosstalk, one PTM may trigger the addition or removal of another modification, such as enhancing the binding affinity of the protein towards an enzyme catalyzing another modification. For the negative crosstalk, one modified group may compete against another group on a single site or prevent the modification of the protein through masking the binding site. Hart and colleagues studied the competitive crosstalk between O-phosphorylation and O-linked N-acetylglucosamine (O-GlcNAc) [75]. Recently, it was reported that O-GlcNAcylation was also involved in the crosstalk with other PTMs such as acetylation and methylation [75, 76]. The PTM crosstalk of histone was reported to have a variety of PTMs involved, including acetylation, methylation, phosphorylation, and ubiquitination [77, 78], which tightly regulated gene expression. However, the correlation between protein glycation and other PTMs has not been systematically studied yet.

We studied the relationship between protein glycation and the two most common lysine modifications, i.e., acetylation and ubiquitination. The ubiquitination sites were from previous reports [79, 80] and the acetylation sites were downloaded from the Protein Lysine Modifications Database (PLMD) [81], in which the acetylation sites from the literature are archived. It was found that 67.9%, 72.6%, and 67.2% of the identified glycation sites from HEK293T, Jurkat, and MCF7 cells, respectively, were sites that were reported to be ubiquitinated (Figure 7a). This strongly suggested that glycation may prevent proteins from ubiquitination, and thus affect their degradation by the proteasome.

Figure 7

Overlap between the glycation sites identified here and the reported acetylation or ubiquitination sites (a). Protein clustering based on biological process (b) and molecular function (c) for the proteins with glycation and acetylation

The overlaps between the identified glycation sites and the acetylation sites were even higher, i.e., 76.0%, 81.5%, and 73.5% for HEK293T, Jurkat, and MCF7 cells, respectively (Figure 7a), which indicated that glycation may disturb gene expression by interfering with protein acetylation. Gene Ontology (GO) analysis of the overlapping proteins showed a strong enrichment of biological processes associated with translation, gene expression, regulation of mRNA stability, and DNA metabolic process (Figure 7b). For molecular function analysis, proteins involved in RNA binding, histone deacetylase binding, and single-stranded DNA binding were highly enriched (Figure 7c). These results further supported the idea that protein glycation may interfere with gene expression and protein translation.

Conclusions

Compared to enzymatic modifications of proteins, non-enzymatic ones have been dramatically understudied. Glycation, as a non-enzymatic modification, is related to the development and progression of chronic diseases, especially diabetes. In this work, we systematically analyzed protein glycation in HEK293T, Jurkat, and MCF7 cells. Proteins at some cellular components such as the extracellular regions and the nucleus were more frequently glycated. Both protein primary sequences and secondary structures affected the formation of protein glycation. Interestingly, nearly every enzyme in the glycolytic pathway was glycated and a possible mechanism was proposed. In addition, we found that many glycation sites were also reported as the ubiquitination and acetylation sites, which strongly suggested that glycation may disturb protein degradation and the regulation of gene expression. The systematic analysis of glycated proteins provides valuable information for further clinical and biomedical investigation of protein glycation.