1 Introduction

With the advent of soft ionization sources, especially electrospray ionization (ESI) [1] and matrix-assisted laser desorption/ionization (MALDI) [2], mass spectrometry (MS) has become an increasingly powerful tool for investigating biomacromolecules [3–8]. MS-based proteomics techniques offer a unique opportunity to study proteins systematically [9, 10], which is beyond the reach of conventional biochemical methods. Classic antibody-based protein identification methods depend entirely on the availability and quality of antibodies. In addition, antibodies are expensive, and the experimental procedures are typically time-consuming and labor-intensive. In contrast, MS-based techniques enable the confident, global identification and quantification of proteins without the use of antibodies [11–13].

Protein modifications are essential in biological systems and are involved in nearly every cellular event [14–18]. There has been great interest in the comprehensive and quantitative analysis of modified proteins to better understand protein functions and disease mechanisms, which could lead to the discovery of effective biomarkers and drug targets [19–21]. Although MS-based proteomics techniques have been applied to globally identify modified proteins, pinpoint modification sites, and quantify abundance changes [22–26], achieving these goals in complex biological samples remains extraordinarily challenging for multiple reasons [27, 28]. Many modifications are substoichiometric and occur on low-abundance proteins present at only a few copies per cell, such as glycosylated receptor proteins. In addition, reversible modifications are highly dynamic, and labile modification groups are unstable in the gas phase or during sample preparation. Finally, modification groups differ in structure; thus, no universal method is suitable for the analysis of all protein modifications. Therefore, effective separation and enrichment methods are mandatory for the large-scale analysis of modified proteins.

Protein N-glycosylation is one of the most common and important modifications; it frequently initiates cell signal transduction and regulates cell–cell communication and cell–matrix interactions [29, 30]. Based on predictions and computational results, about half of all mammalian proteins are glycosylated at any given time [31]. Additionally, the vast majority of cell surface and secreted proteins are glycosylated. Aberrant glycosylation is often correlated with human diseases, including cancer and infectious diseases [21, 32]. Because they are readily accessible to both small molecules and macromolecules, cell surface glycoproteins can serve as effective biomarkers for early disease detection and as excellent drug targets for disease treatment. However, cell surface glycoproteins have not been comprehensively identified, especially compared with intracellular proteins. Enriching cell surface glycoproteins is even more challenging than typical proteomics experiments and magnifies the difficulties associated with modified proteins mentioned above. Several years ago, an elegant cell surface capturing method was reported for the analysis of cell surface glycoproteins [33]. Glycoproteins on living cells were oxidized, conjugated to biotin through a bifunctional linker molecule, biocytin hydrazide, and enriched with avidin for MS analysis. However, the main drawback of this method is that the oxidation reaction is neither very fast nor very efficient, and it can also act as an external stimulus to cells. Effective methods would profoundly advance the analysis of cell surface glycoproteins and provide insight into glycoprotein function.

In this work, we have developed an effective MS-based method to identify cell surface glycoproteins comprehensively and site-specifically. Following a previously reported method [34], cells were fed a sugar analog containing a biologically inert but chemically functional azido group to label cell surface glycoproteins. Surface glycoproteins bearing this functional group were subsequently conjugated to biotin through copper-free click chemistry under mild physiological conditions. Separation and enrichment based on the strong interaction between biotin and avidin then allowed the global analysis of cell surface glycoproteins.

2 Experimental

2.1 Cell Culture and Labeling

HEK293T cells were grown in Dulbecco’s modified Eagle medium (DMEM) containing 10% fetal bovine serum (FBS). Once the cells reached 10% confluency, the medium was changed to DMEM containing 10% FBS and 50 μM N-azidoacetylgalactosamine (GalNAz). Cells were incubated for 3 d until confluency reached ~80%. After metabolic labeling, cells were washed twice with phosphate-buffered saline (PBS), and 100 μM dibenzocyclooctyne (DBCO)-sulfo-biotin in PBS was added to the culture flasks. Cells were incubated for 1 h at 4°C with gentle agitation and then harvested by scraping and centrifugation at 300 g for 5 min. The supernatant was discarded, and the cell pellet, containing about 4 × 10⁷ cells, was washed twice with PBS containing 10 mM dithiothreitol (DTT).

2.2 Cell Lysis and Membrane Protein Extraction

The cell pellet was incubated in a buffer containing 150 mM NaCl, 50 mM 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid (HEPES) (pH = 7.4), 25 μg/mL digitonin, and Roche (Indianapolis, IN, USA) protease inhibitor (1 tablet per 10 mL) with end-over-end rotation for 10 min at 4°C. After incubation, samples were centrifuged at 2000 g for 10 min. Cell pellets were washed twice with the same buffer and subsequently lysed with a MiniBeadbeater (Biospec, Bartlesville, OK, USA) in a buffer containing 10 mM HEPES, 1.5 mM MgCl₂, and 10 mM KCl. The lysates were centrifuged at 2500 g for 10 min, and the supernatant was collected and centrifuged at 16,000 g for 30 min. The membrane-rich pellet was collected and washed twice with the lysis buffer. The pellet was then incubated in 0.1 M sodium carbonate solution containing 1 mM ethylenediaminetetraacetic acid (EDTA) on ice for 30 min, followed by centrifugation at 16,000 g for 15 min. The sodium carbonate incubation and subsequent centrifugation were repeated once. The membrane-rich pellets were incubated with shaking in a buffer containing 4 M urea, 100 mM NaCl, 10 mM HEPES, and 1 mM EDTA for 30 min at room temperature, and the samples were then centrifuged at 16,000 g for 15 min. The urea buffer wash was repeated once. Solubilization buffer containing 100 mM PBS and 1% NP-40 was added to the pellets, which were incubated end-over-end overnight at room temperature.

2.3 Glycopeptide Preparation and Enrichment

Solubilized samples were centrifuged at 16,000 g for 15 min, and the supernatants were collected. Disulfide bonds were reduced with 5 mM DTT (56°C, 25 min) and subsequently alkylated with 14 mM iodoacetamide (room temperature, 30 min in the dark). After reduction and alkylation, proteins were purified by methanol–chloroform precipitation [35]. Four volumes of methanol, one volume of chloroform, and three volumes of water were added to one volume of protein sample, and the mixture was vortexed. The sample was centrifuged at 5000 g for 20 min, and the precipitated proteins collected at the interface between the upper aqueous methanol layer and the lower chloroform layer. The upper layer was removed, four volumes of methanol were added, and the mixture was vortexed again. The sample was centrifuged again at 5000 g for 15 min. The supernatant was removed without disturbing the pellet, and the pellet was dried.
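The precipitation volumes scale linearly with the sample volume. As a minimal Python sketch, assuming the 4:1:3 methanol:chloroform:water additions and the second 4-volume methanol wash described above (the function and field names are hypothetical), the bookkeeping for one sample, including the total volume the tube must hold during the first spin, can be written as:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecipitationVolumes:
    """Solvent volumes, in the same unit as the sample volume."""
    methanol: float          # first addition, before the first centrifugation
    chloroform: float
    water: float
    methanol_wash: float     # second addition, after the upper layer is removed
    first_spin_total: float  # sample + methanol + chloroform + water


def methanol_chloroform_volumes(sample_volume: float) -> PrecipitationVolumes:
    """Scale the 4:1:3 methanol:chloroform:water additions to one sample volume."""
    if sample_volume <= 0:
        raise ValueError("sample volume must be positive")
    methanol = 4 * sample_volume
    chloroform = 1 * sample_volume
    water = 3 * sample_volume
    return PrecipitationVolumes(
        methanol=methanol,
        chloroform=chloroform,
        water=water,
        methanol_wash=4 * sample_volume,
        first_spin_total=sample_volume + methanol + chloroform + water,
    )


# Example with a hypothetical 200 μL protein sample.
print(methanol_chloroform_volumes(200.0))
```

Because the first spin holds nine sample volumes, a 200 μL sample already needs 1.8 mL of tube capacity.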

The resulting ~2 mg protein samples were digested overnight with trypsin at a protein:trypsin ratio of ~100:1 in 2 mL of buffer containing 50 mM HEPES (pH = 8.5), 0.1 M urea, and 5% acetonitrile (ACN). The next day, the digestion was quenched with trifluoroacetic acid (TFA), and the digest was purified on a 200 mg Sep-Pak C18 cartridge. Purified samples were dried and incubated end-over-end with 200 μL of NeutrAvidin bead slurry for 30 min at 37°C. After enrichment, the peptide sample was transferred to a spin column, and the beads were washed 10 times with 400 μL of PBS and once with water. Peptides were eluted from the beads by two 2-min incubations with 200 μL of 8 M guanidine (pH = 1.5) at 56°C. The combined eluates were purified on a 50 mg Sep-Pak C18 cartridge. Enriched peptides were dried thoroughly before enzymatic deglycosylation with eight units of peptide-N-glycosidase F (PNGase F; Sigma-Aldrich, St. Louis, MO, USA) in 40 μL of buffer containing 40 mM NH₄HCO₃ in H₂¹⁸O for 3 h at 37°C. The deglycosylation reaction was quenched with formic acid (FA), and the peptides were purified with the StageTip method. Peptides were eluted as three fractions using 20%, 50%, and 80% ACN, each containing 1% HOAc.

2.4 LC-MS/MS Analysis

Purified samples were dried and resuspended in a solvent containing 5% ACN and 4% FA, and 4 μL was loaded onto a C18-packed microcapillary column (Magic C18AQ, 5 μm, 200 Å, 100 μm × 16 cm) using a WPS-3000TPL RS autosampler (Thermostatted Pulled Loop Rapid Separation Nano/Capillary Autosampler; Dionex, Sunnyvale, CA, USA). Peptides were separated by reversed-phase chromatography using an UltiMate 3000 binary pump with a 110-min gradient that differed among the three fractions: 4%–25% ACN (0.125% FA) for the first, 10%–38% ACN (0.125% FA) for the second, and 15%–50% ACN (0.125% FA) for the third. Peptides were analyzed on a hybrid dual-cell quadrupole linear ion trap-Orbitrap mass spectrometer (LTQ Orbitrap Elite; ThermoFisher, Waltham, MA, USA) using a data-dependent Top 20 method. Each cycle included one full MS scan in the Orbitrap (resolution 60,000; automatic gain control (AGC) target 1 × 10⁶), followed by up to 20 MS/MS scans of the most intense ions in the LTQ. Selected ions were excluded from further sequencing for 90 s. Singly charged ions and ions with unassigned charge states were not fragmented. Maximum ion accumulation times were 1000 ms for each full MS scan and 50 ms for MS/MS scans.
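The precursor-selection rules of the Top 20 method can be illustrated with a simplified model. This Python sketch is not the instrument firmware: it only encodes the rules stated above (intensity-ranked Top N, no fragmentation of singly charged or unassigned ions, 90 s dynamic exclusion). The 10 ppm m/z window used to decide whether a new ion matches an excluded one is an assumption, since the text gives only the exclusion duration, and real instruments apply further rules (for example intensity thresholds) that are not modeled here. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Precursor:
    mz: float
    intensity: float
    charge: Optional[int]  # None = charge state could not be assigned


def select_for_msms(
    survey: List[Precursor],
    now_s: float,
    exclusion: Dict[float, float],
    top_n: int = 20,
    exclusion_s: float = 90.0,
    exclusion_ppm: float = 10.0,  # assumed matching window, not from the text
) -> List[Precursor]:
    """Choose up to top_n precursors from one survey (full MS) scan.

    `exclusion` maps previously selected m/z values to the time (s) they were
    selected; it is updated in place so that state carries over between scans.
    """

    def excluded(mz: float) -> bool:
        return any(
            now_s - t0 < exclusion_s and abs(mz - ex_mz) / ex_mz * 1e6 <= exclusion_ppm
            for ex_mz, t0 in exclusion.items()
        )

    # Singly charged and unassigned ions are never fragmented.
    candidates = [p for p in survey if p.charge is not None and p.charge > 1]
    candidates.sort(key=lambda p: p.intensity, reverse=True)

    selected: List[Precursor] = []
    for p in candidates:
        if len(selected) == top_n:
            break
        if excluded(p.mz):
            continue
        selected.append(p)
        exclusion[p.mz] = now_s
    return selected


scan = [
    Precursor(500.25, 1.0e6, 2),
    Precursor(600.30, 5.0e5, 1),     # singly charged: skipped
    Precursor(700.35, 8.0e5, None),  # unassigned charge: skipped
    Precursor(800.40, 2.0e5, 3),
]
excl: Dict[float, float] = {}
print([p.mz for p in select_for_msms(scan, 0.0, excl)])   # [500.25, 800.4]
print([p.mz for p in select_for_msms(scan, 30.0, excl)])  # []
print([p.mz for p in select_for_msms(scan, 95.0, excl)])  # [500.25, 800.4]
```

Replaying the same survey scan shows the exclusion at work: both ions are skipped 30 s later and become eligible again once 90 s have elapsed.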

2.5 Database Searches and Data Filtering

The raw files were converted into mzXML format. Precursors selected for MS/MS fragmentation were checked for incorrect monoisotopic peak assignments while precursor ion mass measurements were refined. The SEQUEST algorithm [36] (ver. 28) was used to search all MS/MS spectra against a database containing all protein sequences in the UniProt Human (Homo sapiens) database (downloaded in February 2014) together with common contaminants such as keratins. Each protein sequence was listed in both forward and reverse orientations to estimate the false discovery rate (FDR) of peptide identifications [37, 38]. A 20 ppm precursor mass tolerance and a 1.0 Da product ion mass tolerance were used. Other parameters were fully tryptic digestion; up to two missed cleavages; methionine oxidation (+15.9949 Da) and the ¹⁸O tag on Asn (+2.9883 Da) as variable modifications; and cysteine carbamidomethylation (+57.0214 Da) as a fixed modification.
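The modification masses used in the search follow directly from elemental compositions. The Python sketch below, which uses standard monoisotopic isotope masses (the helper names are hypothetical), rederives them and shows the 20 ppm precursor tolerance check. The ¹⁸O tag is simply the Asn-to-Asp conversion with the incoming oxygen being ¹⁸O: +¹⁸O, −N, −H.

```python
# Monoisotopic isotope masses (u).
H1 = 1.00782503207
C12 = 12.0
N14 = 14.0030740048
O16 = 15.99491461956
O18 = 17.9991596129


def composition_mass(**atoms: int) -> float:
    """Sum monoisotopic masses for an elemental composition; counts may be negative."""
    table = {"H": H1, "C": C12, "N": N14, "O": O16, "O18": O18}
    return sum(table[element] * count for element, count in atoms.items())


oxidation = composition_mass(O=1)                        # Met + O
carbamidomethyl = composition_mass(C=2, H=3, N=1, O=1)   # Cys + C2H3NO
deamidation = composition_mass(O=1, N=-1, H=-1)          # Asn -> Asp, NH replaced by 16O
asn_18o = composition_mass(O18=1, N=-1, H=-1)            # Asn -> Asp, NH replaced by 18O


def within_ppm(observed: float, theoretical: float, tol_ppm: float = 20.0) -> bool:
    """True if the observed mass lies within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


print(f"{oxidation:.4f} {carbamidomethyl:.4f} {deamidation:.4f} {asn_18o:.4f}")
# 15.9949 57.0215 0.9840 2.9883
```

The computed values agree with the search parameters to within 0.0001 u; the +57.0214 quoted for carbamidomethylation is the same mass (57.02146 u) truncated rather than rounded.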

To evaluate and control the FDR of glycopeptide identifications, the target-decoy method was employed [37, 38]. Linear discriminant analysis (LDA) was used to distinguish correct from incorrect peptide identifications using multiple parameters, including XCorr, ΔCn, and precursor mass error [39]. A separate linear discriminant model was trained for each raw file, using forward and reverse peptide sequences as positive and negative training data, respectively; this approach is similar to other methods in the literature [40, 41]. After scoring, peptides shorter than six amino acids were discarded, and peptide-spectrum matches were filtered to an FDR below 1%, estimated from the number of decoy sequences in the final dataset. The dataset was restricted to glycopeptides when FDRs were determined.
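The final filtering step can be sketched as follows. This is a minimal Python illustration of target-decoy thresholding, assuming each peptide-spectrum match already carries a single discriminant score (higher is better) and a decoy flag; it does not implement the LDA training itself. To mirror the restriction to glycopeptides, only glycopeptide matches would be passed in. The FDR at a threshold is estimated as decoys divided by targets among matches scoring at or above it, and the lowest threshold that satisfies the limit is kept. Function names are hypothetical.

```python
from typing import List, Optional, Tuple

PSM = Tuple[float, bool]  # (discriminant score, is_decoy); higher score = better


def score_cutoff_at_fdr(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score threshold whose estimated FDR (decoys / targets above it) <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff: Optional[float] = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Evaluate only at the end of a group of tied scores.
        tie_continues = i + 1 < len(ranked) and ranked[i + 1][0] == score
        if not tie_continues and targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def accepted_targets(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Target (forward-sequence) matches passing the FDR-derived score threshold."""
    cutoff = score_cutoff_at_fdr(psms, max_fdr)
    if cutoff is None:
        return []
    return [p for p in psms if not p[1] and p[0] >= cutoff]
```

Taking the lowest qualifying threshold, rather than stopping at the first one that fails, keeps the largest set whose estimated FDR is still within the limit, because the decoy-to-target ratio is not monotonic in the score.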

2.6 Glycosylation Site Localization

To localize glycosylation sites and assess the confidence of each localization, we applied a probabilistic algorithm [42] that considers all potential glycosylation sites on a peptide and uses the presence or absence of experimental fragment ions unique to each site to calculate a ModScore. The ModScore, which is similar to Ascore [42], indicates the likelihood that the best site assignment is correct compared with the next-best assignment. If only one glycosylation site is possible, the site is assigned a score of 1000. Sites with a score ≥19 (P ≤ 0.01) were considered confidently localized.
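The probability principle behind Ascore-type scores can be sketched in a few lines. In this simplified Python illustration, which is not the authors' ModScore code, each candidate site is summarized by how many of its site-determining fragment ions were matched (k of n), the chance of matching at least k by luck is a binomial tail with an assumed per-ion random-match probability p, each site is scored as −10·log10(P), and the localization score is the gap between the best and second-best sites. Choosing the site-determining ions and p (which in Ascore depends on peak depth and fragment tolerance) is outside this sketch; all names are hypothetical.

```python
import math
from typing import List, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of matching k or more of n ions at random."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def site_score(matched: int, possible: int, p: float) -> float:
    """-10*log10 of the chance-match probability for one candidate site."""
    if not 0 <= matched <= possible:
        raise ValueError("need 0 <= matched <= possible")
    return -10 * math.log10(binomial_tail(matched, possible, p))


def localization_score(evidence: List[Tuple[int, int]], p: float) -> float:
    """Score gap between the best and second-best candidate sites.

    evidence: (matched, possible) site-determining ion counts per candidate site.
    """
    if len(evidence) == 1:
        return 1000.0  # only one possible site, following the convention in the text
    scores = sorted((site_score(k, n, p) for k, n in evidence), reverse=True)
    return scores[0] - scores[1]


def score_to_p(score: float) -> float:
    """Invert score = -10*log10(P)."""
    return 10 ** (-score / 10)
```

Inverting the score gives score_to_p(19) ≈ 0.013 and score_to_p(13) ≈ 0.050, which is how the cutoffs of 19 and 13 correspond, approximately, to the P ≤ 0.01 and P ≤ 0.05 levels.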

3 Results and Discussion

3.1 Labeling and Enrichment Method

Sugar analogs have frequently been used to discover glycosyltransferase inhibitors [43]. Some sugar analogs can be used by glycosyltransferases to modify glycans on mammalian proteins. Taking advantage of this property, researchers have extensively investigated culturing cells with sugar analogs bearing various functional groups, including azido, alkyl, and aldehyde groups [34, 44–46]. The azido group in the sugar analog serves as a chemical handle for click chemistry reactions, which can provide further insight into protein function and cellular activities.

In recent years, metabolic labeling has been employed in in vivo imaging experiments, yielding valuable information about the location of sugar analogs and, from changes in fluorescence intensity, their relative abundance changes [45, 47]. In previous studies, GalNAz was used in mammalian cells to modify proteins, producing azido-containing proteins. Here, this strategy was applied to label glycoproteins that contain N-acetylgalactosamine (GalNAc): GalNAz was added to DMEM and incubated with HEK293T cells to metabolically label glycoproteins [48].

Cell surface glycoproteins bearing the azido group were selectively conjugated to biotin through copper-free click chemistry, as shown in Figure 1a. The copper-free click reaction between DBCO and the azido group is very specific, rapid, and efficient [49, 50]. More importantly, the reaction can occur under physiological conditions, which allows glycoproteins on the surface of living cells to be labeled while minimizing external stimuli. After labeling, cells were lysed, and proteins were extracted and digested. Exploiting the strong interaction between biotin and avidin, biotin-containing glycopeptides were selectively enriched by incubation with NeutrAvidin-conjugated agarose beads. The beads were then washed to remove non-biotinylated peptides, and the enriched glycopeptides were eluted.

Figure 1

Principle of the cell surface glycoprotein enrichment method including (a) the metabolic labeling and click chemistry reaction, and (b) glycopeptide separation and analysis

Enriched glycopeptides were dried for at least 24 h in a vacuum concentrator and then treated with PNGase F in heavy-oxygen water (H₂¹⁸O). This enzyme cleaves N-glycans and, in the process, converts the glycosylated asparagine (Asn) to aspartic acid (Asp). Deamidation of Asn can also occur in vivo and during sample preparation. Because deglycosylation was performed in H₂¹⁸O, Asp formed by PNGase F carries an ¹⁸O label, which distinguishes bona fide glycosylation sites from those caused by non-enzymatic deamidation [51, 52]. As a result, N-glycopeptides and their corresponding glycosylation sites were confidently identified and localized.
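The ¹⁸O logic reduces to comparing an observed Asn mass shift with two expected values: about +0.984 u for ordinary deamidation and about +2.988 u for PNGase F conversion in H₂¹⁸O. This Python sketch uses hypothetical names and an assumed tolerance in u. It also shows a known pitfall: a deamidated peptide whose monoisotopic peak was picked two ¹³C isotope peaks too high appears shifted by about 2.991 u, only ~2.5 mDa from the ¹⁸O tag, which is one reason the monoisotopic-peak check described in the Experimental section matters.

```python
# Mass shifts (u) on Asn, from monoisotopic isotope masses.
DEAMIDATION_16O = 0.98402  # Asn -> Asp by chemical deamidation (NH replaced by 16O)
PNGASE_18O = 2.98826       # Asn -> Asp by PNGase F in H2(18)O (NH replaced by 18O)
C13_SPACING = 1.00336      # 13C - 12C, the spacing between isotope peaks


def classify_asn_shift(delta: float, tol: float = 0.005) -> str:
    """Assign an observed Asn mass shift (u) to its most likely origin."""
    if abs(delta - PNGASE_18O) <= tol:
        return "18O-labeled: enzymatic deglycosylation (putative N-glycosite)"
    if abs(delta - DEAMIDATION_16O) <= tol:
        return "16O: non-enzymatic deamidation"
    return "unassigned"


# A deamidated peptide whose monoisotopic peak was picked two 13C peaks too high:
misassigned = DEAMIDATION_16O + 2 * C13_SPACING
print(round(misassigned - PNGASE_18O, 4))  # 0.0025
print(classify_asn_shift(misassigned))     # falls inside the 18O window at tol=0.005
```

With a tolerance wider than a few mDa, this mis-assigned case is indistinguishable from a true ¹⁸O tag, so correct monoisotopic peak assignment has to come first.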

3.2 Glycopeptide Identification and Site Localization

Using this strategy, 144 unique N-glycosylated peptides were identified. Figure 2 shows the tandem mass spectra of three identified N-glycopeptides, EN#TSDPSLVIAFGR, GHTLTLN#FTR, and YSVQLMSFVYN#LSDTHLFPN#ASSK (# denotes the glycosylation site), from lysosome-associated membrane glycoprotein 1 (Lamp1), a single-pass type I membrane protein that presents carbohydrate ligands to selectins and is involved in tumor cell metastasis [53]. These peptides were confidently identified with XCorr values of 4.11, 2.45, and 4.50, respectively, and with high mass accuracy (precursor mass errors of 0.84, 0.14, and −0.16 ppm, respectively). The corresponding glycosylation sites were confidently localized at N84, N103, N121, and N130, each with a ModScore of 1000. Additionally, all sites lie within the consensus motif Asn-X-Ser/Thr, where X is any amino acid except proline.
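Checking that an identified site sits in the Asn-X-Ser/Thr (X ≠ Pro) sequon is a simple pattern match. This Python sketch uses a regular expression with a lookahead so that overlapping sequons are all found, and reads site positions from the "#" notation used above; the function names are hypothetical.

```python
import re
from typing import List

# N-X-S/T with X != P. The lookahead consumes only the Asn, so overlapping
# sequons such as "NNST" are both reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(seq: str) -> List[int]:
    """1-based positions of Asn residues that begin an N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


def marked_sites(marked: str) -> List[int]:
    """1-based positions of residues followed by '#' in a peptide string."""
    sites: List[int] = []
    for i, ch in enumerate(marked):
        if ch == "#":
            # Subtract the '#' characters already seen to get the residue index.
            sites.append(i - len(sites))
    return sites


peptide = "YSVQLMSFVYN#LSDTHLFPN#ASSK"
print(marked_sites(peptide))                         # [11, 20]
print(sequon_positions(peptide.replace("#", "")))    # [11, 20]
```

A sequon can span a tryptic cleavage site (a peptide ending in N-K or N-R whose neighbor begins with Ser or Thr), so such checks are most reliable against the full protein sequence rather than the peptide alone.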

Figure 2

Tandem mass spectra of three peptides from the Lamp1 protein, including (a) EN#TSDPSLVIAFGR, (b) GHTLTLN#FTR, and (c) YSVQLMSFVYN#LSDTHLFPN#ASSK (# denotes the glycosylation site). The complete protein sequence with the highlighted identified peptides is shown in (d)

The high resolution and mass accuracy of the Orbitrap mass spectrometer allowed very confident N-glycopeptide identifications. The mass accuracy distribution of all identified N-glycopeptides is displayed in Figure 3a; the vast majority have mass errors within 3 ppm, reflecting the high confidence of the glycopeptide identifications. This is further supported by the XCorr values shown in Figure 3b: most are greater than 3, and only a few are less than 2. We manually inspected the glycopeptide identifications with relatively low XCorr values; all were relatively short peptides whose fragments matched the corresponding theoretical spectra very well. Short peptides receive low XCorr values because XCorr is biased toward longer peptides, which yield more potential fragment ions and therefore more possible matches [54].
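The two distributions in Figure 3 rest on simple computations: a signed ppm error per identification and a fixed-width histogram of XCorr values (bin size 0.5, per the caption). This Python sketch is illustrative only; it assumes bins are keyed by their lower edge and include that edge, and it uses the three Lamp1 values reported above as example inputs. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def fraction_within(errors_ppm: Iterable[float], limit_ppm: float) -> float:
    """Fraction of identifications whose absolute mass error is <= limit_ppm."""
    errs: List[float] = list(errors_ppm)
    if not errs:
        raise ValueError("no mass errors given")
    return sum(abs(e) <= limit_ppm for e in errs) / len(errs)


def histogram(values: Iterable[float], width: float = 0.5) -> Dict[float, int]:
    """Counts per fixed-width bin, keyed by the bin's lower edge."""
    counts = Counter(math.floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))


print(histogram([4.11, 2.45, 4.50]))              # {2.0: 1, 4.0: 1, 4.5: 1}
print(fraction_within([0.84, 0.14, -0.16], 3.0))  # 1.0
```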

Figure 3

Distribution of (a) ppm and (b) XCorr values assigned for each glycopeptide identification. The bin size for XCorr analysis is 0.5

Based on the same probabilistic principle as Ascore [42], ModScore evaluates the confidence of each site localization [55, 56]. A score greater than 19 corresponds to a confidence level higher than 99%, and a score greater than 13 corresponds to a confidence level higher than 95%. Among all identified sites, 87% have a ModScore greater than 19, and only 8% have a ModScore less than 13, as shown in Figure 4c.

Figure 4

Number of N-glycosylation sites identified in (a) peptides and (b) proteins. The ModScore distribution for each site localization is shown in (c)

Overall, 152 sites were identified on 110 cell surface glycoproteins in HEK293T cells; these are listed in Supplemental Table 1. The major advantage of the current method is that cell surface glycoproteins containing the azido group can be labeled under mild physiological conditions without the toxic heavy metal ions (Cu(I) and Cu(II)) frequently used as catalysts in the traditional click chemistry reaction, copper-catalyzed azide-alkyne cycloaddition (CuAAC) [57]. One limitation is that metabolic labeling is not suitable for clinical tissue samples, although it can be employed for cultured cells and for model animals such as mice [58] and zebrafish [47]. In addition, this method can only identify N-glycopeptides containing GalNAc; even with this limitation, more than 150 unique glycosylation sites were identified on over 100 cell surface proteins. The number of glycoproteins identified with the current method is comparable to previous results, in which 110 glycoproteins in Jurkat T-lymphocyte cells were identified with the cell surface capturing method [33].

The majority of identified N-glycopeptides (95%) contained only one glycosylation site. Figure 4a shows the distribution of glycosylation sites per peptide: only six peptides have two sites, and only one contains three. Similarly, most proteins (72%) are singly glycosylated, and 21% contain two N-glycosylation sites (Figure 4b). The protein with five glycosylation sites was identified as nicastrin (Ncstn), an essential subunit of the gamma-secretase complex that is widely expressed in many different tissues [59].
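The per-peptide counts can be cross-checked arithmetically. This short Python sketch uses only numbers stated in the text (144 peptides, six with two sites, one with three) to derive the number of single-site peptides, their percentage, and the total number of site occurrences. The agreement with the 152 sites reported overall assumes that no site is counted on two different peptides (for example, missed-cleavage variants).

```python
from collections import Counter

# Peptides per number of glycosylation sites, as stated in the text:
# 144 peptides in total, six with two sites, one with three.
peptides_by_sites = Counter({2: 6, 3: 1})
peptides_by_sites[1] = 144 - sum(peptides_by_sites.values())

total_peptides = sum(peptides_by_sites.values())
site_occurrences = sum(n * count for n, count in peptides_by_sites.items())
single_site_pct = 100 * peptides_by_sites[1] / total_peptides

print(peptides_by_sites[1], site_occurrences, round(single_site_pct, 1))  # 137 152 95.1
```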

3.3 Cell Surface N-Glycoprotein Clustering

All glycoproteins identified in this work were clustered using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [60]. Membrane proteins were highly enriched (P = 1.37 × 10⁻²²): of the 110 proteins identified, 104 were categorized as membrane proteins. The six proteins not annotated as membrane proteins in the DAVID analysis could reflect incorrect annotations, contamination, and/or experimental error. Alternatively, they could be non-membrane proteins tethered to the cell surface through an anchor such as glycosylphosphatidylinositol (GPI) or other modified lipid groups.
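Term enrichment P values of this kind come from a one-sided test of whether a list contains more annotated genes than expected by chance. DAVID reports a modified, more conservative Fisher exact test (the EASE score); the Python sketch below computes only the plain one-sided Fisher exact (hypergeometric) tail for one annotation term, is not DAVID's implementation, and will not reproduce the P value above. The function name and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric (Fisher exact) P(X >= k).

    N: background genes, K: background genes with the annotation,
    n: genes in the submitted list, k: list genes with the annotation.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when asked for more items than exist.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this final division


# Hypothetical example: 104 of 110 list proteins annotated, if 5,000 of
# 20,000 background genes carried the annotation.
print(enrichment_p(104, 110, 5000, 20000))
```

Keeping the binomial coefficients as exact Python integers and dividing only once avoids overflow and loss of precision even when the resulting P value is extremely small.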

All identified glycoproteins were also categorized by biological process using DAVID. The most highly enriched biological processes include cell adhesion, cell motion, structure morphogenesis, transport, interspecies interaction, positive immune regulation, cell projection organization, stimulus response, and cell recognition (Figure 5a), all of which are consistent with the biological processes associated with cell surface proteins. Clustering by molecular function showed that signal transducer activity, transporter activity, and carbohydrate binding were the most highly enriched functions, as shown in Figure 5b. Cell surface proteins are known to participate in these biological functions and processes [61].

Figure 5

Clustering of glycoproteins based on (a) biological process and (b) molecular function

3.4 N-Glycosylation Sites on Cell Surface Transporters

Membrane transporter proteins constitute a large group of cell surface proteins and regulate the input and output of many molecules and ions. About 10% of human genes are transporter-related, reflecting the biological significance of transporters and their roles in cell homeostasis [62]. These membrane transporters can serve as potential biomarkers and therapeutic targets. Ten transporter proteins were identified in this dataset; they are listed in Table 1.

Table 1 N-Glycosylation Sites in Transporter Proteins Identified in the Current Cell Surface Experiment

The protein solute carrier family 3 member 2 (Slc3a2), also designated cluster of differentiation 98 (CD98), is the 4F2 cell-surface antigen heavy chain and is required for the function of light-chain amino acid transporters [63]. In this experiment, three N-glycopeptides were confidently identified from Slc3a2 (DASSFLAEWQN#ITK, LLIAGTN#SSDLQQILSLLESNK, and SLVTQYLN#ATGNR), with the corresponding N-glycosylation sites at N365, N381, and N424.

N-glycosylation may play a determining role in trafficking these proteins to the cell surface, since glycosylation is known to regulate the classical secretory pathway [64]. Glycans can significantly increase the overall hydrophilicity of proteins and affect their interactions with small molecules and macromolecules. Reversible glycosylation of these cell surface transporters may also participate in regulating molecular transport across the cell membrane.

3.5 N-Glycosylation Sites on Cluster of Differentiation Proteins

Molecules presented on the cell surface can be used to differentiate and classify cell types. Cluster of differentiation (CD) molecules are a group of cell surface molecules used to distinguish cell types. Traditionally, CDs are targets for immunophenotyping [65], and they have a wide variety of functions; for example, some CDs are receptors or ligands, and some contribute to cell adhesion. To date, around 350 CDs have been designated for human cells. In this experiment, 35 N-glycosylation sites were located on 24 CDs in HEK293T cells (Supplemental Table 2), and selected sites are listed in Table 2.

Table 2 Selected CD Proteins Identified and Their Corresponding N-Glycosylation Sites

Integrins are cell adhesion receptors that are conserved across species and play critical roles in developmental and pathological processes. The integrin family mediates the attachment of cells to the extracellular matrix and also takes part in specialized cell–cell interactions [66]. Integrin alpha-1 (Itga1), designated CD49a, acts as a receptor for laminin and collagen and participates in the anchorage-dependent negative regulation of epidermal growth factor-stimulated cell growth. Integrin alpha-2 (Itga2), CD49b, is a receptor for laminin, collagen, collagen C-propeptides, fibronectin, and E-cadherin. It is also responsible for the adhesion of platelets and other cells to collagen, the modulation of collagen and collagenase gene expression, and the organization of newly synthesized extracellular matrix. Integrin alpha-5 (Itga5), CD49e, is a receptor for fibronectin and fibrinogen. Glycosylation sites were identified on each of these three cell surface integrins in HEK293T cells.

CD molecules are crucial for classifying cells, but their glycosylation remains largely unstudied. Given the importance of these molecules and of glycosylation on the cell surface, information on CD glycosylation can provide insight into their functions. More broadly, cell surface glycosylation analysis can offer valuable information that could lead to the identification of potential drug targets and disease biomarkers.

4 Conclusions

N-glycoproteins on the cell surface are essential for a wide range of cellular events, and their global analysis is exceptionally challenging. We have developed a novel method that integrates metabolic labeling, copper-free click chemistry, and MS-based proteomics to comprehensively analyze cell surface N-glycoproteins. Labeling proteins with the azido group enables the selective enrichment and separation of cell surface glycoproteins, while the copper-free click reaction tags them with biotin under physiological conditions. In this experiment, 152 N-glycosylation sites were identified by MS on 110 cell surface proteins in HEK293T cells. The main functions of the identified cell surface N-glycoproteins, including signal transducer, transporter, binding, and catalytic activity, are consistent with the documented functions of cell surface proteins. This strategy provides an effective method for large-scale analysis of the cell surface N-glycoproteome and can be broadly applied in further cell surface studies.