Protein glycosylation

It is a well-established concept that gene expression and protein expression are not the sole factors responsible for phenotype determination. The discovery of the varying roles of post-translational modifications (PTMs) of proteins has identified another level at which functional information is stored. Of the more than 200 different types of protein PTMs, glycosylation is a frequently occurring and particularly important one [1–4]. Glycosylation has been shown to have an important role in a number of physiological processes, including protein folding and trafficking, cell-cell and cell-matrix interaction, cellular differentiation, fertilization, and the immune response [5–9]. Approximately half of all mammalian proteins are glycosylated, with an estimated 3,000 different glycan structures recorded (not including all variants resulting from differences in glycan linkages and anomers), which can vary to a large degree, based on differences in tissue, cell type, and disease state [10, 11]. It is estimated that 250 to 500 genes are involved in the protein glycosylation process [12]. Carbohydrate molecules on proteins can be attached to asparagine residues within the N-X-S/T consensus sequence when X is not a proline (N-glycosylation), or to serine or threonine residues (O-glycosylation). This occurs during or after translation as the nascent protein is shuttled through the endoplasmic reticulum (ER) and subsequent organelles in the classical secretory pathway (Figure 1). However, glycosylation is not a template-based process such as DNA, RNA, or protein synthesis, but is rather based on the balance achieved by the expression and activity levels of the different glycan attachment and processing enzymes involved in trimming and addition of monosaccharides, and on the availability of precursor monosaccharide molecules, which in turn is dependent on nutrient resources and expression of other metabolic enzymes responsible for their synthesis and interconversion [7, 8, 13].
This greatly increases the complexity of the protein glycosylation process, resulting in extensive molecular microheterogeneity of glycoproteins, and thus the requirement for a specialized set of tools for their study.
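The N-X-S/T rule described above lends itself to a simple sequence scan. The following is a minimal Python sketch rather than a validated predictor: the function name find_sequons and the example peptide are hypothetical, chosen only for illustration. A matching sequon marks a candidate site only, because actual occupancy depends on site accessibility and on the enzyme and precursor balance discussed above.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# The lookahead lets overlapping sequons (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each candidate sequon."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical peptide, used only to exercise the pattern.
    for position, motif in find_sequons("MNGSANPTNNTS"):
        print(position, motif)
```

For this hypothetical peptide, the scan reports NGS at position 2, NNT at position 9, and NTS at position 10, and skips NPT at position 6 because proline occupies the X position.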

Figure 1

Life span of glycoproteins from translation to circulation. The translation of signal peptide-containing membrane and secreted proteins occurs on the surface of the endoplasmic reticulum (ER), with the growing peptide chain being shuttled through the translocon complex into the lumen of the ER. In the ER lumen, core N-glycosylation of accessible N-X-S/T sites is performed by the oligosaccharyltransferase component of the translocon complex while the nascent protein is being translated and folded. Following the completion of translation, folding, and core glycan processing, the protein is shuttled to the Golgi apparatus, where further N-glycosylation and O-glycosylation are performed by different glycosyltransferases. In the Golgi, glycoproteins are packaged into secretory vesicles bound for fusion with the plasma membrane, where the secreted proteins are released into the extracellular space and membrane proteins are presented on the surface of the cell, making them accessible for cleavage and release by proteolytic enzymes. Once in the extracellular space, these glycoproteins can then enter the circulation.

Glycosylation in cancer

Since the initial observation in 1969 showing that membrane glycoproteins of higher molecular weight were present in transformed mouse fibroblasts compared with their normal counterparts [14, 15], aberrant glycosylation patterns have been established as a common characteristic of oncologic malignancies. These patterns have been observed in almost all types of experimental and human cancers. Even under non-malignant conditions, individual glycoproteins are produced in a number of different glycoforms [16]. The differences in these forms can arise from differential occupancy of glycosylation sites or variability in attached glycan structures. This allows for great heterogeneity in glycosylation of single proteins even under normal physiological conditions. However, under these conditions, the distribution of glycoforms is stable and reproducible. Malignant transformation, which can bring about underexpression, overexpression, or neoexpression of glycan moieties, disturbs this balance and can expand the degree of pre-existing microheterogeneity of individual proteins [17]. In tumors, the changes in glycan structures most often arise from disturbances in the expression and activity levels of different glycosyltransferases and glycosidases along the secretory pathway, in the ER and Golgi of cancer cells [18–22]. This can lead to changes in the structures of both N-linked and O-linked glycans. For example, increased activity or expression of N-acetylglucosaminyltransferase V (MGAT5) has been shown in a number of tumors, resulting in increased glycan branching on proteins and increased tumor growth and metastasis [23–27]. Alterations in terminal glycan residues can also occur during malignancy, which is often the case with the upregulation of different sialyltransferase enzymes in tumors [28–33].
However, it must be noted that altered glycosylation does not only occur on proteins produced by the tumor itself, but may reflect the host's response to the disease. In patients with cancer, acute-phase proteins and IgGs have been shown to have glycosylation patterns distinct from those found under normal physiological conditions [18]. Therefore, the detection and quantification of the disturbances in protein glycosylation can aid in the screening and diagnosis of virtually all cancer types.

Glycoprotein cancer biomarkers

Some of the oldest and most common clinically utilized serological biomarkers for cancer diagnosis and monitoring of malignant progression are glycoproteins. These include prominent glycoprotein biomarkers that are widely monitored in patients with prostate cancer (prostate-specific antigen (PSA)), ovarian cancer (carcinoma antigen (CA)125, mucin 16), colon cancer (carcinoembryonic antigen (CEA)), and non-seminomatous testicular carcinoma (human chorionic gonadotropin β-subunit (hCG-β)) (Table 1). Although all of these proteins have been shown to have aberrant glycosylation patterns in malignancy [29–37], only their total protein levels are clinically monitored. Simultaneous measurement of their different glycoforms might increase the diagnostic potential of these molecules. For two other common tests, alpha-fetoprotein (AFP) for hepatocellular carcinoma and CA15-3 (mucin 1 epitope) for breast cancer, specific glycan structures on these proteins are monitored, as discussed below.

Table 1 List of common serological tumor markers in clinical use that contain a glycan component

Some of the most widely used discovery platforms for the identification of novel glycobiomarkers have been previously reviewed [17, 38–40]. The methods used for the characterization and analysis of glycan-based cancer biomarkers currently used clinically and for others in earlier stages of development have also been previously reviewed by Adamczyk et al. [41]. In the present review, we focus on the currently available and potential future techniques that can be used for the quantification of glycoprotein biomarkers in biological fluid or serum patient samples.

There are three general approaches, using a variety of techniques, by which glycoproteins or carbohydrate epitopes can be quantified. The most commonly used approach involves the measurement of total levels of a given glycoprotein biomarker. This usually involves the production of monoclonal antibodies against a given glycoprotein, facilitating the development of an assay capable of quantifying total protein levels in the biological fluid of interest. This is the case with PSA, CA125, hCG-β, and CEA quantification (Table 1). However, this type of methodology is not capable of detecting the changes occurring in the glycosylation patterns of the target glycoprotein as a result of malignant transformation, thus missing out on another level of information that could lead to improved diagnosis and monitoring of disease. Therefore, even though a glycoprotein is being measured, its glycan moiety is completely ignored.

Another approach involves the detection and quantification of a particular glycan structure shown to be associated with cancer, such as the antibody-based measurement of the sialylated blood group antigen Lewis a (sialyl Lewis a) in the CA19-9 assay [42]. This type of approach does not yield any information about the identity or quantity of the glycoprotein carrying the particular carbohydrate epitope, and thus also fails to capture the full scope of information that could lead to improved diagnosis, especially if the protein is produced directly by the tumor.

The third, most rarely used, and most difficult type of approach to develop allows for detection and quantification of both total protein levels and associated glycan structure(s), such as the measurement of the core-fucosylated species of AFP in hepatocellular carcinoma [43, 44]. This type of assay can yield the most information and overcomes the weaknesses of the other two approaches mentioned above. Therefore, the development of such a method would have the most diagnostic benefit.

The potential and the pitfalls

In the past decade or so, there have been significant advancements in the characterization of the glycosylation patterns of individual proteins and in the identification of glycoproteins in a number of complex biological fluids. This has occurred mostly through the development and refinement of mass spectrometry techniques and equipment, which, when used in concert with the traditional methods used for characterization of protein glycosylation, can provide a powerful complement of tools to tackle the problem of fully understanding the complexity and heterogeneity associated with protein glycosylation and applying the gained knowledge in a clinical setting. However, there has been limited progress in tapping the full potential of glycobiomarkers and their dual nature in order to develop an assay capable of simultaneously delivering information on the absolute quantity of the protein and of its associated glycan structures in complex matrices, such as serum, which is the preferred sample type for high-throughput clinical analysis.

Some of the best and most widely recognized cancer biomarkers are highly tissue-specific, such as PSA for prostate tissue, hCG for the placenta, and AFP for the developing fetus (Figure 2). Using such markers, malignant transformation of cells in a single organ that causes the overexpression or neoexpression of a protein can be detected and monitored more reliably, and earlier in the progression of the disease, than with a protein expressed ubiquitously or in multiple tissues. However, proteins with such characteristics are quite rare. Considering that glycosylation patterns of the same protein can differ both between tissues and between normal and transformed cells, the capability of detecting and quantifying these differences could confer tissue-/tumor-specific profiles on a large number of glycoproteins. The ability to perform such a task reliably, and in a routine fashion, could greatly expand the field of potential biomarkers and the chances of their application in the clinical setting.

Figure 2

Gene expression of alpha-fetoprotein (AFP), beta-human chorionic gonadotropin (beta HCG), and prostate-specific antigen (PSA) by tissue. Figure adapted and modified from the BioGPS Application [151], using the HG_U133A/GNF1H Gene Atlas [152].

However, there is a series of technical and biological obstacles to developing quantitative assays that reflect the full picture of the status of a glycoprotein biomarker. The majority of challenges preventing reliable, clinically applicable binary measurement of glycoprotein biomarkers are of a technical nature. More specifically, there is only a very limited set of tools capable of performing this task, each with its own set of associated limitations and difficulties. Currently, the options for concurrent quantification of a protein and its associated glycans are limited to a combination of antibody-mediated protein capture and detection with glycan-specific antibodies, lectins, or mass spectrometry. The advancement of these approaches is hindered by the absence of a suitable recombinant technology capable of reliable and convenient production of glycoproteins with the desired glycan structures, which would allow for more convenient and detailed studies. However, because protein glycosylation is not a template-driven linear sequence-based process, such as DNA or protein synthesis, a suitable solution to this problem does not seem to be on the horizon, even though some advancements have been made [45]. Owing to the large number of combinations of branched oligosaccharide structures that can be created from available monosaccharides in eukaryotic cells, and especially cancer cells, where target protein production and normal glycosylation processes are greatly disturbed, the staggering glycan microheterogeneity can significantly impede the precise binary measurement of individual glycobiomarkers [46]. That is why the majority of proteins for which the development of these types of assays has been attempted are high-abundance proteins themselves (for example, transferrin, haptoglobin, IgGs, and alpha-1-acid glycoproteins). 
Therefore, a quantitative detection system encompassing the heterogeneity of glycan structures of a single protein in a single output holds great potential for bringing the use of more glycobiomarkers to a respectable (clinically testable) level.

The majority of the top 22 high-abundance plasma proteins, which account for 99% of protein content in serum, are glycoproteins [47]. These include such proteins as the Ig family members, haptoglobin, antitrypsin, and transferrin, among others. However, the majority of potential biomarkers are found at significantly (several orders of magnitude) lower levels in the serum. Taking into consideration that a specific glycan profile on one protein might indicate a malignant condition, but the same profile on another protein (for example, one of the high-abundance proteins) might not, the specificity of detection of low-concentration serum glycoproteins by lectins or even glycan-specific antibodies can be hindered by high background levels of contaminating high-abundance glycoproteins. Thus, these methods of detection lag far behind the gold standard (sandwich ELISA) in sensitivity, especially when taking into account that only a subset of the total population of the target protein is being measured.

Therefore, in this review, we focus on the technologies with the capability or a strong potential for binary (protein and carbohydrate) measurement of glycoprotein cancer biomarkers in serum, and describe the challenges associated with the different approaches.

Lectin-based methods

The existence of lectins has been known for over 100 years, since the discovery of ricin by Stillmark in 1888 [48]. However, wider application in research did not occur until the early 1970s [49, 50]. Lectins are proteins with a proven affinity and selectivity for specific carbohydrate structures, to which they can bind in a reversible fashion. Lectins can recognize carbohydrates conjugated to proteins and lipids, or free monosaccharides or polysaccharides. In excess of 500 lectins have been discovered, mostly of plant origin, and over 100 are commercially available [48]. They have been used in a wide variety of technical formats, including lectin blots, immunohistochemistry, liquid chromatography, and lectin microarrays. Despite extensive characterization and the many years of experience with lectin research, there are only a few applications in which lectins have been used in a clinically applicable high-throughput fashion to detect and quantify serological biomarkers in cancer. Lectins are the oldest and most reliable tools for glycoprotein characterization, and are indispensable in any endeavor involving analysis of protein-associated glycans; however, the lectin journey from an analytical to a quantitative tool has been a long one, with many obstacles and few successes.

Enzyme-linked lectin approaches for detection of carbohydrates have been known and used for close to 3 decades [51, 52]. These types of quantitative assays have been ported into a high-throughput multiwell plate format, similar to the common ELISA technique in which a protein of interest is captured and/or detected by an antibody, but with lectins taking over the antibody roles. Over the years, there have been several types of assays, which can be grouped together under the common name of 'enzyme-linked lectin assay (ELLA)'. In one format, serum or cell-bound proteins are non-specifically immobilized, and the global levels of a particular glycan structure are detected using a specific lectin. This has been performed on the sera of patients with squamous cell carcinoma of the uterine cervix by measuring the levels of the Thomsen-Friedenreich antigen (T-Ag) using the peanut agglutinin (PNA) lectin for detection [53]. The reactivity of a number of lectins to serum glycoproteins from patients with lung cancer was also measured using this general approach [54]. It has also been used extensively for detection and differentiation of a number of species of bacteria [55–57]. In another use of lectins in an ELLA-type approach, an immobilized lectin is used to capture all glycoconjugates with a particular glycan structure from a complex biological sample, and the presence and quantity of a particular protein is then determined by antibody detection. An example of this approach was a study detecting wheat germ agglutinin (WGA)-bound mucins in the serum of patients with pancreatic cancer [58]. However, this approach requires the target glycoprotein to account for a significantly large proportion of the total glycoprotein content in the sample, which is often not the case. Another, more desirable approach involves the antibody-based capture of a single protein and subsequent detection of associated glycan components by lectins.
This approach has been used to measure sialylation of transferrin [59], fucosylation of PSA in patients with prostate cancer [60], sialylation of recombinant erythropoietin [61], WGA and ConA reactivity to p185 in the serum of patients with breast cancer [62], and fucosylation of haptoglobin in the serum of patients with pancreatic cancer [63].

It must be noted that the antibody-lectin sandwich approaches are plagued by a number of technical issues, which can be addressed with varying degrees of success. A major issue is the inherent glycosylation of the antibodies used to capture a specific glycoprotein, which can cause a non-specific background signal from lectin binding, often masking the signal from the glycoprotein of interest. This effect can be minimized by the enzymatic or chemical derivatization of the antibody-associated carbohydrates prior to use in the assay [59, 64, 65]. Another issue is the limited recognition range of any given lectin for a particular glycan structure, thereby preventing the detection of the full scope of the heterogeneity of glycosylation on any particular glycoprotein. Use of multiple lectins for detection in an array format can ameliorate this issue (see below). When considering serum as the analyte matrix, another significant source of background signal in this type of assay comes from the non-specific contamination by high-abundance glycoproteins. This often masks the signal from low-abundance glycoprotein analytes. This is not an issue when measuring other high-abundance serum glycoproteins, such as transferrin [59] or haptoglobin [63], as dilution of the serum sample can lower the background noise to a minimal level. For low-abundance glycoproteins, for which sample dilution is not an option, more rigorous washing and blocking steps are required [66].

The greatest success with use of lectins for the diagnosis of malignant conditions has been the discovery and quantification of the Lens culinaris agglutinin (LCA)-reactive species of alpha-fetoprotein (AFP-L3). This has been shown to improve the specificity for hepatocellular carcinoma (HCC) compared with total AFP levels, as the latter can be elevated in pregnancy, hepatitis, and liver cirrhosis [43, 44, 67, 68]. However, in an ingenious departure from the ELLA-type approach, in which a lectin replaces an antibody in an ELISA format, the AFP-L3 test relies on the liquid-phase capture of AFP reactive to LCA, and subsequent measurement of bound and unbound portions of the protein by an ELISA for total AFP. Therefore, the lectin is not used for detection but for fractionation of the AFP glycoprotein populations in the patient serum, and the quantification is performed by a standard ELISA developed with antibodies recognizing peptide (non-glycosylated) epitopes. It is highly fortuitous, given the microheterogeneity associated with AFP glycosylation in HCC, that only the core fucosylation status of the single N-glycosylation site of AFP, as detected by LCA, is sufficient for successful diagnosis [69, 70].
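Because the AFP-L3 measurement described above rests on fractionation followed by a total-AFP ELISA of each fraction, the reported quantity reduces to a simple proportion. The sketch below illustrates that arithmetic only; the function name lectin_reactive_percent and the example concentrations are hypothetical, and no clinical decision threshold is implied.

```python
def lectin_reactive_percent(bound, unbound):
    """Percentage of a glycoprotein found in the lectin-bound fraction.

    bound and unbound are concentrations (same units) measured by a
    total-protein ELISA in the lectin-bound and unbound fractions.
    """
    if bound < 0 or unbound < 0:
        raise ValueError("concentrations must be non-negative")
    total = bound + unbound
    if total == 0:
        raise ValueError("no protein detected in either fraction")
    return 100.0 * bound / total


if __name__ == "__main__":
    # Hypothetical values: 3 ng/ml LCA-bound and 17 ng/ml unbound AFP.
    print(lectin_reactive_percent(3.0, 17.0))
```

Expressing the result as a proportion of total AFP, rather than as an absolute lectin signal, is what allows the protein quantity and the glycoform information to be reported together.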

Over the past decade, a new role has been identified for lectins in the characterization and quantification of serum glycoproteins in malignant conditions. In a re-imagination of the ELLA approach, multiple lectins are now being used to simultaneously detect different carbohydrate structures on antibody-captured glycoproteins in a microarray format. Several groups have created methods in which an antibody is immobilized in an array format and lectins are used to measure glycosylation of the captured proteins [65, 71–73]. The major advantage of this approach is the ability to detect a glycan profile of any given glycoprotein, and to compare it between different samples in a high-throughput fashion. Aberrant glycosylation patterns of mucins, carcinoembryonic antigen-related cell adhesion molecule, and alpha-1-beta glycoprotein in clinical samples from patients with pancreatic cancer have been detected using similar methods by different groups [74–76]. This type of approach goes a long way toward detecting the heterogeneity of glycan structures of individual glycoproteins, but at the core, it is only multiplexing of the ELLA method, with its associated restrictions, which has been known and applied with limited success over the past 3 decades.

Mass spectrometry-based methods

Advancements in mass spectrometry (MS) have revolutionized the field of carbohydrate research, and led to the initiation of a large number of studies dealing with the identification, analysis, and quantification of glycoconjugates [17, 77]. With regard to glycosylated proteins, these studies range from inspections of individual glycoproteins to elucidation of whole glycoproteomes. Toward these ends, MS has been coupled to a number of well-established, as well as novel, technologies dealing with chemical modification, chromatographic separation, and affinity purification of glycans to achieve the best results. These studies have been conducted on multiple MS platforms, including ion trap (IT), linear trap quadrupole (LTQ), time of flight (TOF), quadrupole/triple quadrupole (Q), Orbitrap, and Fourier transform ion cyclotron resonance (FTICR) mass analyzers [39]. As a result of its proven utility, MS analysis has become an almost absolute requirement for any study dealing with the identification and analysis of protein glycosylation. MS-based approaches for glycoprotein identification, analysis, and characterization have been reviewed extensively in a number of publications [17, 39, 40, 77, 78]. Several major groups have focused on liquid chromatography (LC)-coupled MS methodologies for glycan analysis, using separation and enrichment of glycans by hydrophilic interaction liquid chromatography (HILIC), porous graphitized carbon (PGC), and reverse-phase (RP) liquid chromatography. Some examples include studies on HILIC for analysis of native and derivatized glycans [79–81]; the use of PGC for enrichment and separation of native glycans [82, 83]; and the work of Alley et al. and Mechref using RP LC [84, 85]. However, the quantification of glycoproteins and their associated glycans using MS techniques is at a nascent stage, with no clinical applications to date.
Similar to the strategies for identifying and characterizing protein glycosylation, MS can also be used to quantify glycoproteins only or glycoprotein-associated glycans only, or to simultaneously measure both the quantity of the protein and its associated carbohydrate structure. These quantification strategies have followed the same trend as the established MS-based techniques for quantifying proteins. These can be further separated into label-based or label-free approaches. Most of the common labeling methods have involved stable isotopic labeling techniques, such as 16O/18O, 12C/13C, stable isotope labeling with amino acids in cell culture (SILAC), isobaric tags for relative and absolute quantitation (iTRAQ), and isotope-coded affinity tags (ICAT) [39]. These strategies are regularly used for comparison and relative quantification of glycoprotein analytes between samples. Label-free approaches have included spectral counting, ion-intensity measurement, and multiple/selected reaction monitoring (MRM/SRM). However, as can be seen from the majority of the recent examples in the literature shown below, all of these approaches, and their combinations, have been limited to the quantification of either highly purified glycoproteins in background matrices much less complex than serum or other biological fluids of interest, or high-abundance proteins.

Beyond its routine use for identification and characterization purposes, an established application of MS in the glycomics field is the quantification of carbohydrates released, chemically or enzymatically, from individual or multiple glycoproteins. MALDI-MS instrumentation has been shown to be invaluable for this type of approach. This platform was used by two different groups to quantitate sialylated glycans enzymatically released from (PNGase F-treated) glycoproteins in a high-throughput fashion. For example, a MALDI-TOF-based methodology was developed for absolute and relative measurement of up to 34 major N-glycans released from (mostly high-abundance) serum proteins by optimization of glycan release conditions through development of novel detergent reagents [86]. The diagnostic and stage stratification potential of MS-based quantification of permethylated glycans from serum proteins of patients with breast cancer was shown by a study that was able to identify and quantify close to 50 different glycan structures [87]. Relative quantification of anthranilic acid-derivatized glycans enzymatically released from alpha-1-acid glycoprotein purified from serum in combination with linear discriminant analysis has been shown to have the potential to discriminate between normal individuals and patients with ovarian cancer and lymphoma [88]. Similar approaches have also led to the identification of serum haptoglobin glycans with diagnostic potential in lung cancer [89] and liver disease [90].

Quantification of proteins, including some glycoproteins, by MRM/SRM and LC-MS has been performed for a number of biological fluids [91–94]. Great advances have been made with approaches using immunoaffinity enrichment of peptides or proteins followed by MRM/SRM-based quantification, achieving levels of sensitivity applicable to the concentration range (ng/ml) at which low-abundance tumor biomarkers are found [95–99]. This type of methodology has also been used in combination with different types of glycan affinity enrichment strategies, thereby producing hybrid assays in which classic glycoprotein enrichment strategies are used for capture of specific glycoforms, and MS is used for detection and quantification of the protein in those subpopulations by monitoring the MS2 fragmentation of non-glycosylated tryptic peptides. One such example was the quantification of the phytohemagglutinin-L4 (L-PHA)-enriched fraction of tissue inhibitor of metalloproteinase 1 from the serum of patients with colorectal carcinoma and the supernatant of colon cancer cell lines [100, 101]. A number of high-abundance serum proteins were quantified recently in the serum of patients with HCC by the same group using a similar approach of glycoprotein enrichment by lectin and quantification by MRM [102]. Also, a method for the measurement of total glycosylated and sialylated PSA has been recently developed, in which periodate-oxidized PSA tryptic glycopeptides are captured using immobilized hydrazide, released by PNGase F, and quantified by MRM using a triple quadrupole LC-MS [103]. However, it must be noted that these types of studies do not exploit the full potential of MS in detection of the heterogeneity of glycan structures associated with any given glycoprotein, but rather use this technology solely for protein quantification, which could be performed more conveniently and reliably by classic methods such as ELISA.

The true potential of MS in the quantification of protein glycosylation lies in the measurement of total levels of the glycoprotein, while simultaneously measuring its heterogeneously glycosylated subpopulations. The ultimate goal is the development of site-specific label-free methods that are capable of simultaneously quantifying multiple glycopeptides encompassing multiple glycosylation sites and their different glycoforms, using a non-glycosylated peptide from the glycoprotein of interest or a labeled exogenous peptide standard, which could serve as an indicator of the total glycoprotein concentration. Considering that MRM assays have been developed for simultaneous measurement of dozens of tryptic (or other proteolytic) peptides from dozens of proteins, it is not inconceivable that a similar technique could be developed for glycopeptides with different glycan structures from a single, or even multiple proteins. A general schematic of a glycopeptide-targeted MRM from a single glycoprotein can be seen in Figure 3A. However, to improve the sensitivity of such assays, further development and technical advances will be required.
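The normalization idea outlined above, in which a non-glycosylated peptide stands in for total protein while individual glycopeptides report on glycoform subpopulations, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a published method: the function name normalize_glycoforms, the glycoform labels, and the peak areas are all hypothetical. Because glycopeptides and non-glycosylated peptides ionize with different efficiencies, the resulting ratios are suitable for comparison between samples rather than as absolute stoichiometries.

```python
def normalize_glycoforms(glycopeptide_areas, reference_area):
    """Normalize glycopeptide MRM peak areas for one glycosylation site.

    glycopeptide_areas: dict mapping a glycoform label to its peak area.
    reference_area: peak area of a non-glycosylated peptide from the same
        protein, used here as a proxy for total protein abundance.

    Returns (ratios, fractions):
        ratios    - each glycopeptide area divided by the reference area
        fractions - each glycopeptide area as a share of all glycoforms
    """
    if reference_area <= 0:
        raise ValueError("reference peptide area must be positive")
    ratios = {g: a / reference_area for g, a in glycopeptide_areas.items()}
    total = sum(glycopeptide_areas.values())
    fractions = (
        {g: a / total for g, a in glycopeptide_areas.items()} if total > 0 else {}
    )
    return ratios, fractions


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary units) for three glycoforms.
    areas = {"biantennary": 200.0, "triantennary": 100.0, "core-fucosylated": 100.0}
    ratios, fractions = normalize_glycoforms(areas, reference_area=1000.0)
    print(ratios)
    print(fractions)
```

Keeping the two outputs separate reflects the dual readout discussed in the text: the ratios track how much of each glycoform is present relative to total protein, whereas the fractions describe the glycoform distribution independently of protein concentration.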

Figure 3

Glycopeptide MRM/SRM. (A) General schematic representation of multiple reaction monitoring (MRM). Peptides and glycopeptides from a protease (normally trypsin)-cleaved glycoprotein are subjected to triple quadrupole mass spectrometry (MS). Only selected parent ions are fragmented, and the resulting fragment ion intensities are used for (glyco)peptide quantification. (B) Representative chromatogram of simultaneous MRMs of 25 pyridylaminated sialoglycopeptides found on 16 glycoproteins in mouse serum. Adapted and modified from Kurogochi et al. [109].

In addition to the general problems with quantifying glycoproteins described above, a number of technical limitations are currently preventing the application of this type of approach to glycoproteins found in samples of clinical interest. The major issue is the much lower ionization efficiency of glycopeptides compared with their non-glycosylated counterparts, generally following the trend that ionization efficiency decreases with glycan branching and sialylation [104, 105]. This can result in differences of several orders of magnitude in absolute signal values between glycopeptides and non-glycosylated peptides [104, 105]. Additionally, compared with the measurement of non-glycosylated peptides in the same quantity of the protein analyte, the MRM signal for any individual glycopeptide will be significantly lower, because it represents only a subset of the heterogeneous glycoform population present at any given glycosylation site of a glycoprotein. Major complications can also arise in developing a glycopeptide quantification method because of the absence of exogenous glycopeptide standards and incomplete proteolytic digestion caused by steric hindrance by the glycan chains [104, 106, 107].

Verification of candidate biomarkers in non-serological bio-fluids using MRM/SRM assays has become standard practice in biomarker discovery laboratories. The challenges associated with the development and optimization of MRM assays were significantly eased with the advent of MRM-transition-prediction and data analysis software such as Pinpoint (Thermo Fisher Scientific Inc., Rockford, IL, USA) and Skyline (open-source software, MacCoss laboratory, University of Washington, Seattle, WA, USA). Because no comparable tools exist for glycan-bound peptides, MRM development for this use is still a daunting task. However, the difficulties associated with the prediction of glycopeptide MRM transitions and their optimal collision energies can be overcome by monitoring the common oxonium ions and peptide + N-acetylhexosamine ions that occur during fragmentation [104, 108].
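
As a rough illustration of how such common fragment ions could be turned into candidate transitions, the following Python sketch computes the m/z of singly charged oxonium ions and of the peptide carrying a single N-acetylhexosamine (HexNAc) for a given glycopeptide. It is a sketch under stated assumptions, not the procedure used in [104, 108] or in any software package: the function names, the example peptide mass, and the glycan composition are hypothetical, and only the monoisotopic residue masses and the proton mass are standard values.

```python
from typing import Dict, List, Tuple

PROTON = 1.007276  # mass of a proton (Da)

# Monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASS = {
    "Hex": 162.052825,     # hexose, C6H10O5
    "HexNAc": 203.079374,  # N-acetylhexosamine, C8H13NO5
    "NeuAc": 291.095419,   # N-acetylneuraminic acid, C11H17NO8
}


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a protonated ion with the given neutral mass and charge."""
    return (neutral_mass + charge * PROTON) / charge


def glycan_mass(composition: Dict[str, int]) -> float:
    """Sum of residue masses for a glycan composition such as {'Hex': 5}."""
    return sum(RESIDUE_MASS[name] * count for name, count in composition.items())


def candidate_transitions(
    peptide_mass: float, composition: Dict[str, int], precursor_charge: int
) -> List[Tuple[str, float, float]]:
    """Return (label, precursor m/z, fragment m/z) tuples.

    peptide_mass is the neutral monoisotopic mass of the unmodified peptide
    (a hypothetical input supplied by the user).
    """
    precursor = mz(peptide_mass + glycan_mass(composition), precursor_charge)
    transitions = []
    for name, count in composition.items():
        if count > 0:
            # Singly charged oxonium ion of each monosaccharide present.
            transitions.append((f"{name} oxonium", precursor, mz(RESIDUE_MASS[name], 1)))
    # Singly charged peptide retaining one HexNAc.
    y1 = mz(peptide_mass + RESIDUE_MASS["HexNAc"], 1)
    transitions.append(("peptide+HexNAc", precursor, y1))
    return transitions


if __name__ == "__main__":
    # Hypothetical peptide (1000.0 Da) with a disialylated biantennary composition.
    for row in candidate_transitions(1000.0, {"Hex": 5, "HexNAc": 4, "NeuAc": 2}, 3):
        print(row)
```

The oxonium ions are shared by all glycopeptides containing the same monosaccharides, so they report on the glycan but not on the peptide; the peptide + HexNAc fragment adds sequence specificity. Collision energies are not addressed here and would still need empirical optimization.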

Despite these considerable obstacles, some proof-of-concept studies have been performed. For example, in a recent study by Song et al. [104], MRM assays were developed for the quantification of fetuin and alpha-1-acid glycoprotein glycopeptides applicable to serum samples. Kurogochi et al. were able to develop MRM assays for quantification of 25 glycopeptides from 16 glycoproteins found in mouse serum (Figure 3B) [109]. Specifically, sialic acid moieties on glycopeptides were oxidized with sodium periodate, enriched by hydrazide chemistry, and labeled with 2-aminopyridine, and the resulting labeled sialoglycopeptides were subjected to MS. Preliminary studies have also been performed with purified RNase B and asialofetuin [110]. Haptoglobin glycopeptides were characterized and relatively quantified in serum samples of patients with psoriasis [111] and patients with pancreatic cancer [112]. Ion-current intensities were used to quantify glycopeptides from alpha-1-acid glycoprotein [113]. The core-fucosylated subpopulations of several glycoproteins were quantified using partial deglycosylation with Endo F3 in conjunction with glycopeptide MRMs [114]. With the improvement and evolution of MS technology and sample-preparation techniques, these types of assays will play a more prominent role in the quantification of glycoproteins. In a futuristic scenario, these MRM-MS assays could be coupled to robotic immunoaffinity enrichment methods to construct high-throughput platforms for the verification of cancer-exclusive glycoforms [115].

Alternative strategies

Although lectin-based and MS-based approaches for quantification of glycoproteins are the most common, other technologies are also applied, and new ones are being developed, to be used alone or in combination with each other. Well-established liquid-chromatography techniques using HILIC or PGC are readily available for enrichment and separation of glycans and glycoconjugates in conjunction with other detection and quantification methods [116–118]. The most established affinity-binding agents for quantification of proteins and other molecules are antibodies, and the ELISA remains the gold standard for the clinical measurement of serological targets. However, glycan-specific antibodies are extremely rare compared with antibodies recognizing peptide epitopes, and their use in the field is limited compared with lectins. This is because carbohydrates have been shown to be poor immunogens, and the resulting antibodies have affinities comparable with those of lectins, but with a much more difficult development process. In addition, antibodies detecting an epitope that encompasses a part of a given protein's sequence, while at the same time recognizing its glycan structure, thereby giving site and glycoprotein specificity, are extremely rare. Therefore, the possible advantage of using a glycan-specific antibody over a comparable lectin is minor. The issue of cross-reactivity has been raised for Tn antigen-recognizing antibodies [119]. In a recent study, 27 commonly used carbohydrate-binding antibodies against histo-blood group, Lewis, and tumor antigens were examined for their specificity using a glycan/glycoprotein array [120]. Although some showed high specificity and affinity for their targets, almost half of them exhibited cross-reactivity for other glycan structures. In cancer research, the role of such antibodies has been mostly limited to indirect quantification by immunohistochemistry and blotting.
When considering applications of glycan-specific antibodies for serological markers of malignancy, the CA 19-9 and CA 15-3 tests stand out. By using a sandwich ELISA, the CA 19-9 test measures the serum levels of sialyl Lewis a antigen on glycoproteins and glycolipids, and is used for monitoring of pancreatic cancer progression and recurrence, and for differentiation of the cancer from pancreatitis [42, 121, 122]. The CA 15-3 test is used to quantify a sialylated O-glycosylation epitope on mucin 1 (MUC1), and is used for prognosis and monitoring of treatment for breast cancer [123, 124].

Chromatography-based strategies have also been used with some success. Ion-exchange chromatography is being used clinically for separation and quantification of serum transferrin glycoforms to test for congenital disorders of glycosylation [125, 126]. KLK6 glycoforms have also been measured in a number of biological fluids, including serum, from patients with ovarian cancer at low concentrations (down to 1 ng/ml) using strong anion exchange for separation and ELISA for quantification [127]. Novel strategies are also being used for the development of new carbohydrate-recognizing agents, which could be used in a quantitative fashion. Phage display technology has been used to improve and alter the binding properties of glycan-binding modules of glycan-processing enzymes and for development of carbohydrate-binding peptides [128–132]. The technique of systematic evolution of ligands by exponential enrichment (SELEX) has been applied to the development of aptamers (single-stranded DNA or RNA oligonucleotides), which have been tried as binding agents for a number of carbohydrate moieties [130, 133–137]. The more recent advancements and nascent technologies developed for carbohydrate detection, also referred to as glyco-biosensors, have been reviewed extensively [1, 138, 139]. Some of these include electrochemical impedance spectroscopy [140–143], molecular 'tweezers' [144], nanoparticle displacement methods [145], quartz crystal microbalance [146, 147], and surface plasmon resonance [148–150]. However, these technologies are geared towards highly controlled in vitro systems, and will require further testing before application in a clinical setting.

Conclusions and perspectives

The clinical potential of glycoprotein biomarkers in cancer is indisputable. Some valuable successes have been achieved in the field, yet there is much room for improvement. The majority of the tools currently available have proven their utility beyond doubt when used for qualitative and characterization purposes. However, for each of these technologies, the leap from analytical to quantitative applications has not been sufficiently successful.

Over the next decade, the major goal will be the reliable detection and quantification of the full scope of glycan heterogeneity on any particular glycoprotein of interest, and the ability to differentiate these patterns between homeostatic and disease conditions. When the recent literature is searched for 'glycosylation quantification', it quickly becomes obvious that MS-based approaches have almost become an absolute requirement. However, when the field is viewed as a whole, it becomes clear that MS advancements alone will not bring a major breakthrough. In the near future, the development of further glycan-recognition agents can be expected, such as novel naturally occurring or recombinant lectins, carbohydrate-recognizing antibodies, aptamers, and other glyco-biosensors. Progress is also being made in the engineering and synthetic production of protein glycosylation, which will greatly aid in the creation of standards and uniform model systems for development of precise quantitative assays. The predominance and expansion of immuno-based and lectin-based methods in practical applications of glycan quantification can also be expected in the near future, especially given the recent advances in microarray technology. MS is still hampered by a number of technical limitations, and significant technological progress will be required before it is sufficiently reliable and broadly applicable. Nonetheless, we believe that MS holds the greatest potential and is the most promising tool for detection and quantification of the full scope of protein-associated glycosylation down to the level of the single monosaccharide unit. The future appears bright, and progress in the field is inevitable; the only uncertainty is how long it will take.

Authors' information

At the time of submission of the manuscript, Uros Kuzmanov was a PhD student in the laboratory of Dr Eleftherios P. Diamandis, Professor of Laboratory Medicine and Pathobiology at the University of Toronto. Dr Hari Kosanam is a postdoctoral fellow in the same laboratory.