Deep undepleted human serum proteome profiling toward biomarker discovery for Alzheimer’s disease
Blood-based protein measurement is a routine practice for detecting biomarkers in human disease. Comprehensive profiling of the blood/plasma/serum proteome is challenging because of its extremely large dynamic range, which is dominated by a small subset of highly abundant proteins. Antibody-based depletion of these abundant proteins alleviates the problem but introduces experimental variation. We aimed to establish a method for direct profiling of undepleted human serum and to apply it toward biomarker discovery for Alzheimer’s disease (AD), the most common form of dementia, for which no blood-based biomarkers are available in the clinic.
We present an ultra-deep analysis of undepleted human serum proteome by combining the latest 11-plex tandem-mass-tag (TMT) labeling, exhaustive two-dimensional liquid chromatography (LC/LC) fractionation (the 1st LC: 3 h for 180 fractions, and the 2nd LC: 3 h gradient per fraction), coupled with high resolution tandem mass spectrometry (MS/MS). AD (n = 6) and control (n = 5) sera were analyzed in this pilot study. In addition, we implemented a multiplexed targeted LC–MS3 method (TOMAHAQ) for the validation of selected target proteins.
The TMT–LC/LC–MS/MS platform is capable of analyzing 4826 protein components (4368 genes), covering at least 6 orders of magnitude in dynamic range and representing one of the deepest serum proteome analyses. We defined intra- and inter-group variability in the AD and control groups. Statistical analysis revealed 30 differentially expressed proteins in AD (26 decreased and 4 increased). Notably, these altered proteins are enriched in the known pathways of mitochondria, fatty acid beta oxidation, and AGE/RAGE. Finally, we set up a TOMAHAQ method to confirm the decrease of PCK2 and AK2 in our AD samples.
Our results demonstrate an ultra-deep serum discovery study by TMT–LC/LC–MS/MS and a validation experiment by TOMAHAQ targeted LC–MS3. These MS-based discovery and validation methods are of general use for biomarker discovery from complex biofluids (e.g. the serum proteome). This pilot study also identified deregulated proteins in the AD serum samples, in particular proteins associated with mitochondrial function, which may serve as novel AD candidate biomarkers.
Keywords: Alzheimer’s disease; Biomarker; Human blood; Plasma; Serum; Mass spectrometry; Proteomics; Proteome; Tandem mass tag
Abbreviations
TMT: tandem mass tags
LC/LC–MS/MS: two-dimensional liquid chromatography coupled with tandem mass spectrometry
TOMAHAQ: triggered by offset multiplexed accurate mass high resolution absolute quantification
Cellular and biochemical components in blood play a central role in human physiology, and their dynamic levels are considered to correlate with an individual’s healthy and diseased states [1, 2]. Blood is an exceptionally complex fluid composed of cells (i.e. red and white blood cells and platelets) and plasma (the liquid part), from which serum is obtained after adequate coagulation removes the clotting factors. Human plasma/serum contains extraordinarily diverse proteins: those secreted from all types of cells and tissues for normal physiological function, those leaked from damaged cells and tissues, especially under disease conditions, and those released from infectious organisms. Measuring various protein concentrations in plasma/serum is routine in clinical practice. The concentration dynamic range spans at least 10 orders of magnitude, from the most abundant protein, albumin (~ 50 mg/ml), to low-abundance cytokines (e.g. 4.2, 7.4 and 11.2 pg/ml for interleukin-6, interleukin-17 and TNF-α, respectively) in normal individuals [3, 4]. This extremely high dynamic range poses a significant challenge for profiling the complete plasma/serum proteome with proteomics platforms, which are commonly based on liquid chromatography-tandem mass spectrometry (LC–MS/MS). Depletion of highly abundant plasma proteins is often used to alleviate the dynamic range challenge, as the top 22 abundant proteins account for approximately 99% of the total protein mass. The depletion may be achieved by affinity columns immobilized with antibodies against the top abundant proteins [5, 6, 7]. However, there are multiple caveats associated with the depletion method: (i) the antibodies are never completely specific and may remove other proteins nonspecifically; (ii) the depletion is performed under non-denaturing conditions, leading to co-immunoprecipitation and removal of antigen-bound proteins; and (iii) the depletion step introduces significant experimental variation.
Advances in mass spectrometry (MS)-based proteomics [8, 9], especially in LC separation power and MS resolution and scan rate, enable the profiling of more than 15,000 proteins (> 12,000 genes) from mammalian tissue samples [10, 11]. Protein quantification can be achieved through data-dependent acquisition (e.g. label-free methods and stable isotope labeling) as well as data-independent acquisition. Tandem mass tag (TMT) labeling is a commonly used stable isotope labeling method that allows up to 11-plex analysis [14, 15]. Although the accuracy of TMT measurement is often affected by ratio compression caused by co-eluting ions, this issue is largely addressed by the MS3 method or by the combination of extensive LC fractionation, MS optimization, and computational correction. Building on its success in tissue profiling [18, 19], we applied this latest TMT–LC/LC–MS/MS technology to blood-based complex biofluids for Alzheimer’s disease (AD) biomarker discovery.
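The direction of ratio compression can be illustrated with a toy model. This is an assumption made for illustration, not the correction method of the cited studies: a co-isolated interfering peptide with a 1:1 reporter ratio adds equal signal to both channels, pulling the observed ratio of a target with a true 4:1 ratio toward 1. Function names and intensities are hypothetical.

```python
def observed_ratio(true_num: float, true_den: float, interference: float) -> float:
    """Reporter-ion ratio when a co-isolated peptide adds `interference`
    intensity equally to both channels (toy model: interferer ratio is 1:1)."""
    return (true_num + interference) / (true_den + interference)


def compression_demo(true_ratio: float = 4.0):
    # Hypothetical target intensities: denominator channel fixed at 1.0.
    num, den = true_ratio, 1.0
    results = []
    for frac in (0.0, 0.25, 0.5):
        # Interfering signal per channel, relative to the mean target intensity.
        interference = frac * (num + den) / 2
        results.append((frac, observed_ratio(num, den, interference)))
    return results


if __name__ == "__main__":
    for frac, ratio in compression_demo():
        print(f"interference {frac:.2f}: observed ratio {ratio:.2f}")
```

As the interfering fraction grows from 0 to 0.5, the observed ratio falls from 4.0 toward 1, which is why extensive fractionation (fewer co-isolated ions) or MS3 (cleaner reporter spectra) improves accuracy.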
Following the discovery of putative biomarkers, these candidates must be validated in large clinical cohorts, usually by antibody-based approaches or targeted MS methods, such as selected, multiple, and parallel reaction monitoring (SRM, MRM, and PRM, respectively) [20, 21]. More recently, Triggered by Offset, Multiplexed, Accurate mass, high resolution, and Absolute Quantitation (TOMAHAQ) has been reported as an isobaric targeted method [22, 23]. For each targeted peptide, TOMAHAQ uses a synthetic, TMT0-labeled peptide to trigger MS3 quantification of the native target peptide, based on a pre-selected offset mass. During the generation of MS3 spectra, synchronous precursor selection (SPS) can improve quantification accuracy by selecting pre-defined b- or y-ions in MS2.
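The pre-selected offset is the mass difference between the two labels, multiplied by the number of tags on the peptide and divided by its charge. A minimal sketch, assuming the TMTzero (224.152478 Da) and TMT 6/10/11-plex (229.162932 Da) monoisotopic tag masses; the function names are hypothetical. With two tags at z = 2 it reproduces the ~ 5.01 m/z offset given in the TOMAHAQ Methods.

```python
# Monoisotopic mass shifts (Da) added per label; values assumed from the
# reagent chemistry (TMTzero vs. TMT 6/10/11-plex).
TMT0_MASS = 224.152478
TMT11_MASS = 229.162932


def tomahaq_offset_mz(n_tags: int, charge: int) -> float:
    """m/z offset between the TMT0-labeled trigger peptide and the
    TMT11-labeled native peptide with the same sequence."""
    if n_tags < 1 or charge < 1:
        raise ValueError("n_tags and charge must be positive")
    return n_tags * (TMT11_MASS - TMT0_MASS) / charge


def target_mz(trigger_mz: float, n_tags: int, charge: int) -> float:
    # The native peptide carries heavier tags, so it appears at a higher m/z.
    return trigger_mz + tomahaq_offset_mz(n_tags, charge)


if __name__ == "__main__":
    # Two tags (N-terminus + one Lys), doubly charged: about 5.01 m/z.
    print(round(tomahaq_offset_mz(2, 2), 4))
```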
Alzheimer’s disease is the most common form of dementia and the sixth-leading cause of death in the US, affecting more than 5 million Americans with a healthcare cost of $236 billion in 2016. By 2050, the number of AD patients is projected to reach 13.8 million in the US and 100 million worldwide. Currently, AD diagnosis is based on a patient’s symptoms, memory and behavior tests, brain imaging, as well as post-mortem brain pathological assays [26, 27]. Blood-based biomarkers, however, are not available for AD, and most proposed candidates are derived from known disease mechanisms, such as Aβ and tau [28, 29]. Here we present an unbiased, large-scale profiling of human serum specimens, revealing consistent mitochondrial protein changes between control and AD samples.
Patient sample description
Human blood sera were collected from control subjects (n = 5) and AD patients (n = 6) and provided by the Brain and Body Donation Program at Banner Sun Health Research Institute, with approval for this study. Clinical and pathological diagnoses were based on established criteria. All subjects consented to the study, and informed consent was obtained from each participant. After clotting and centrifugation, the sera were aliquoted into polyethylene tubes and stored at − 80 °C until use.
Serum protein extraction and quantification
Human serum proteins were extracted in fresh lysis buffer [50 mM HEPES, pH 8.5, 8 M urea, and 0.5% sodium deoxycholate with 1 × phosphatase inhibitor cocktail (PhosSTOP, Sigma-Aldrich)]. The protein concentration was measured by the BCA assay (Thermo Fisher Scientific) and confirmed by a Coomassie-stained short SDS gel as previously described. The protein lysates were stored in aliquots at − 80 °C before use.
Protein digestion and TMT labeling
The digestion and labeling were performed based on an optimized protocol [32, 33]. Quantified protein (~ 0.1 mg in the lysis buffer with 8 M urea) for each TMT channel was directly digested with Lys-C (Wako, 1:100 w/w) at 21 °C for 2 h, diluted four-fold to lower the urea concentration to 2 M, and further digested with trypsin (Promega, 1:50 w/w) at 21 °C overnight. The digestion was terminated by the addition of 1% trifluoroacetic acid (TFA), followed by centrifugation. The supernatant was desalted with a Sep-Pak C18 cartridge (Waters) and dried in a SpeedVac vacuum concentrator. Each sample was re-dissolved in 50 mM HEPES, pH 8.5, labeled with TMT reagents, pooled equally, and desalted again before LC/LC–MS/MS.
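The per-channel amounts implied by these ratios follow from simple arithmetic. The sketch below assumes exactly 0.1 mg of protein per channel (the text gives ~ 0.1 mg); the helper names are hypothetical.

```python
def enzyme_amount_ug(protein_mg: float, enzyme_to_protein: float) -> float:
    """Enzyme mass (µg) for a given w/w enzyme:protein ratio, e.g. 1/100."""
    return protein_mg * 1000.0 * enzyme_to_protein


def diluted_concentration(c_start: float, fold: float) -> float:
    # C1*V1 = C2*V2, so a k-fold dilution divides the concentration by k.
    return c_start / fold


protein_mg = 0.1                                   # ~0.1 mg per TMT channel
lys_c = enzyme_amount_ug(protein_mg, 1 / 100)      # Lys-C at 1:100 (w/w)
trypsin = enzyme_amount_ug(protein_mg, 1 / 50)     # trypsin at 1:50 (w/w)
urea_after = diluted_concentration(8.0, 4.0)       # 8 M urea diluted four-fold

print(f"Lys-C {lys_c:.1f} µg, trypsin {trypsin:.1f} µg, urea {urea_after:.1f} M")
```

That is, about 1 µg Lys-C and 2 µg trypsin per channel, with trypsin added only after the urea has been brought down to 2 M.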
Extensive LC/LC-MS/MS analysis
The pooled TMT-labeled peptides were resolved by offline basic pH reverse phase LC, followed by acidic pH reverse phase LC coupled with MS/MS analysis. The basic pH LC setup included an XBridge C18 column (3.5 μm particle size, 4.6 mm × 25 cm, Waters), buffer A (10 mM ammonium formate, pH 8.0), buffer B (95% acetonitrile, 10 mM ammonium formate, pH 8.0), and a 3 h gradient of 15–35% buffer B. Fractions were collected every minute, yielding a total of 180 fractions. In the acidic pH LC–MS/MS analysis, each fraction was analyzed on a column (75 µm × 25 cm, heated to 65 °C to reduce backpressure) coupled with a Q Exactive HF Orbitrap mass spectrometer (Thermo Fisher Scientific). Peptides were resolved by a 3 h gradient (buffer A: 0.2% formic acid, 5% DMSO; buffer B: buffer A plus 65% acetonitrile). MS settings included MS1 scans (60,000 resolution, 1 × 10⁶ AGC target, and 50 ms maximal ion time) and 20 data-dependent MS2 scans (410–1600 m/z, 60,000 resolution, 1 × 10⁵ AGC target, ~ 150 ms maximal ion time, HCD, 32% normalized collision energy, and ~ 15 s dynamic exclusion).
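The fractionation settings also fix the instrument time budget. A rough sketch, assuming a linear first-dimension gradient and ignoring loading, washing and equilibration overheads, which the text does not specify; the function name is illustrative.

```python
def fraction_plan(gradient_min: float, fraction_min: float,
                  b_start: float, b_end: float, ms_hours_per_fraction: float):
    """Summarize a 1st-dimension fractionation plan (assumes a linear gradient)."""
    n_fractions = int(gradient_min // fraction_min)
    b_per_fraction = (b_end - b_start) / n_fractions   # % buffer B spanned by each fraction
    total_ms_hours = n_fractions * ms_hours_per_fraction
    return n_fractions, b_per_fraction, total_ms_hours


n, b_step, hours = fraction_plan(180, 1, 15.0, 35.0, 3.0)
print(n, round(b_step, 3), hours, hours / 24)   # fractions, %B per fraction, MS hours, days
```

Under these assumptions each 1 min fraction spans about 0.11% buffer B, and 180 fractions at 3 h each amount to 540 h (22.5 days) of MS time; analyzing half of the fractions halves this budget.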
Identification and quantification of proteins by JUMP software suite
The identification data were processed with our recently developed JUMP search engine, which combines the advantages of pattern- and tag-dependent scoring to improve sensitivity and specificity. A composite target-decoy database was used to estimate the false discovery rate (FDR). The protein database was generated by combining the downloaded Swiss-Prot, TrEMBL, and UCSC databases and removing redundancy (human: 83,955 entries). Major parameters were precursor and product ion mass tolerance (± 15 ppm), full trypticity, a maximum of two missed cleavages, static mass shifts for TMT tags (+ 229.16293 on Lys and N-termini) and carbamidomethylation (+ 57.02146 on Cys), a dynamic mass shift for oxidation (+ 15.99491 on Met), and a maximum of three modification sites. The resulting PSMs were filtered by mass accuracy, then grouped by precursor ion charge state and filtered by cutoffs of JUMP-based matching scores (J-score and ΔJn) to reduce the protein FDR below 1%. When a peptide was shared by multiple homologous proteins, it was assigned to the protein with the most PSMs, according to the rule of parsimony. The quantification was performed as previously described.
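JUMP's J-score/ΔJn filtering is specific to that software. As a generic illustration of the target-decoy idea (not the JUMP implementation), the sketch below picks a single score threshold that keeps the estimated FDR, taken here as decoy hits divided by target hits, at or below 1%. This estimator is one common choice for concatenated searches, and the PSM list is hypothetical.

```python
from typing import List, Optional, Tuple


def score_cutoff_at_fdr(psms: List[Tuple[float, bool]],
                        max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score threshold whose estimated FDR (decoys / targets among
    PSMs at or above the threshold) stays within max_fdr.

    psms: (score, is_decoy) pairs from a concatenated target-decoy search.
    Returns None if no threshold satisfies the FDR limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only evaluate at the end of a group of tied scores, because a
        # threshold equal to `score` accepts every PSM with that score.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff
```

Scanning from the best score downward and keeping the lowest passing threshold retains the largest PSM set that still meets the FDR limit.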
Calculation of abundance index of identified proteins by PSMs
The absolute protein abundance index of the serum proteome was calculated based on previously reported methods [37, 38], using the total number of PSMs matched to a particular protein, normalized by the number of theoretically detectable peptides from that protein: abundance index = (number of PSMs / number of theoretically detectable peptides) × scale factor. The scale factor was set to 5000, which generated abundance indexes roughly equivalent to protein copy numbers per cell in deep proteomics analyses.
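The index is a direct translation of the formula above. How the theoretically detectable peptides are counted (for example, tryptic peptides within a length or mass window) is not specified here, so the count is taken as an input; the function name and counts are hypothetical.

```python
SCALE_FACTOR = 5000  # scale factor stated in the Methods


def abundance_index(n_psms: int, n_detectable_peptides: int,
                    scale: float = SCALE_FACTOR) -> float:
    """PSM-based abundance index: (PSMs / theoretically detectable peptides) x scale."""
    if n_detectable_peptides <= 0:
        raise ValueError("protein must have at least one detectable peptide")
    return n_psms / n_detectable_peptides * scale


# Hypothetical counts, for illustration only.
print(abundance_index(120, 40))   # 3 PSMs per detectable peptide
```

Dividing by detectable peptides corrects for protein size: a long protein yields more PSMs than a short one at the same molar abundance.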
Evaluation of sample variations and principal component analysis
The measurement variation was analyzed using intra- and inter-group replicates. The ratios of all proteins between samples were modeled with a Gaussian distribution to estimate the standard deviation (SD). Principal component analysis (PCA) was used to visualize the differences between the disease groups, with the relative expression of all proteins as input, using the R statistical package (version 3.4.0).
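The Methods do not specify the ratio transform or the fitting procedure. This sketch assumes log2 ratios and moment-based Gaussian estimates via the standard-library `statistics.NormalDist.from_samples`; the intensities and helper names are hypothetical, and PCA is not shown.

```python
import math
from statistics import NormalDist


def log2_ratios(sample_a, sample_b):
    """Per-protein log2(A/B) ratios from two intensity lists in the same protein order."""
    return [math.log2(a / b) for a, b in zip(sample_a, sample_b) if a > 0 and b > 0]


def ratio_sd(sample_a, sample_b) -> float:
    # Fit a Gaussian to the ratio distribution; NormalDist.from_samples uses
    # the sample mean and sample standard deviation as its estimates.
    return NormalDist.from_samples(log2_ratios(sample_a, sample_b)).stdev


# Hypothetical reporter intensities for four proteins in two channels.
a = [200.0, 100.0, 50.0, 100.0]
b = [100.0, 100.0, 100.0, 100.0]
print(round(ratio_sd(a, b), 3))
```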
Differential expression (DE) analysis, pathway enrichment and protein–protein interaction (PPI) analysis
DE proteins were determined by Student’s t test in the following steps: (i) calculating p values and applying a threshold of 0.05; (ii) requiring a change of at least 1.5 times the standard deviation in the analysis; and (iii) manually examining the remaining proteins to remove those quantified by only one peptide.
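The three steps can be expressed as a simple filter. The text does not state the scale of the fold-change criterion; this sketch assumes log2 ratios compared against 1.5 × the SD from the Gaussian fit. P values are taken as inputs (`scipy.stats.ttest_ind` with its default equal-variance setting is one way to obtain them). The class, function, and protein names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinStat:
    name: str
    p_value: float      # e.g. from a two-sample Student's t test
    log2_fc: float      # AD vs. control, log2 scale
    n_peptides: int     # peptides used for quantification


def de_filter(stats: List[ProteinStat], sd: float,
              p_cut: float = 0.05, sd_multiple: float = 1.5) -> List[str]:
    """Apply the three filtering steps in order; returns names of DE proteins."""
    hits = []
    for s in stats:
        if s.p_value >= p_cut:                     # (i) p value threshold
            continue
        if abs(s.log2_fc) < sd_multiple * sd:      # (ii) magnitude vs. 1.5 x SD
            continue
        if s.n_peptides < 2:                       # (iii) drop single-peptide proteins
            continue
        hits.append(s.name)
    return hits
```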
Pathway enrichment analysis was used to identify functional groups of proteins enriched in a given pathway. The analysis was performed using Fisher’s exact test with the Benjamini–Hochberg (BH) correction for multiple testing. Enriched pathways with BH-adjusted FDR < 0.05 were considered statistically significant.
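For over-representation, the one-sided Fisher’s exact test is equivalent to the upper tail of the hypergeometric distribution. A standard-library sketch of that test and of BH adjustment; the choice of background set (e.g. all quantified proteins) is an assumption not stated in the text, and the function names are hypothetical.

```python
from math import comb
from typing import List


def enrichment_p(total: int, in_pathway: int, n_de: int, overlap: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail):
    P(X >= overlap) when drawing n_de proteins from `total`, of which
    `in_pathway` belong to the pathway."""
    denom = comb(total, n_de)
    upper = min(in_pathway, n_de)
    return sum(comb(in_pathway, k) * comb(total - in_pathway, n_de - k)
               for k in range(overlap, upper + 1)) / denom


def bh_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):               # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Pathways whose adjusted value falls below 0.05 would then be reported as enriched.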
DE proteins were mapped onto a composite PPI database integrating STRING (v10), BioPlex, and InWeb_IM, which includes 18,515 proteins and 469,993 PPI connections. Modules in each protein cluster were defined as previously reported and annotated using Gene Ontology, KEGG, or Hallmark gene sets.
TOMAHAQ targeted LC–MS3 analysis
The TOMAHAQ assay was based on the originally reported protocol. Selected peptides were synthesized, purified (at least 95% purity), and dissolved in 20% acetonitrile. The peptides were labeled with a TMT0 reagent (Thermo Fisher Scientific), desalted, and spiked into the pooled TMT11-labeled samples. The amount of TMT0-labeled synthetic peptides was adjusted to ensure detection in MS1.
In the LC–MS3 analysis, the TMT0–TMT11 mixed samples were analyzed by reverse phase LC coupled with MS3 analysis. The setup included a C18 column (50 µm × 15 cm, 1.9 μm particle size, heated to 65 °C to reduce backpressure), buffer A (0.2% formic acid, 5% DMSO) and buffer B (buffer A plus 65% acetonitrile) in a 1 h gradient of 10–35% buffer B at 250 nl/min, and an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific). The TOMAHAQ workflow comprises a sequence of decisions that prompt quantitative SPS-MS3 across multiple scans. In scan 1, survey MS1 scans (mass range: ± 50 m/z of target peptides, 60,000 resolution, 1 × 10⁶ AGC target, and 100 ms maximal ion time) were used to detect a TMT0-labeled synthetic trigger peptide (± 15 ppm). If the intensity threshold (1 × 10⁵) was reached, the trigger peptide was fragmented in scan 2 (0.4 m/z isolation window, ~ 35 NCE in CID) and detected in the Orbitrap (15,000 resolution, 1 × 10⁵ AGC target, 50 ms maximal ion time). A “Product Ion Trigger” function was used to compare the trigger peptide MS2 spectrum to a pre-determined MS2 product ion list (± 10 ppm). If at least 6 product ions were matched, scans 3 and 4 were triggered to analyze the corresponding target peptide, using a pre-selected offset (peptide-specific, e.g. 5.01 m/z for z = 2 and two TMT tags). In scan 3, the target MS2 spectrum was collected (0.4 m/z isolation window, ~ 35 NCE in CID, 15,000 resolution, 1 × 10⁵ AGC target, 1000 ms maximal ion time). In scan 4, the target MS3 spectrum was collected based on the previous MS2 scan with additional MS3 settings: Precursor Ion Exclusion (Low = 70, High = 5), Isobaric Tag Loss Exclusion (Reagent Tag Type = TMT, to exclude “complement” MS2 ions), 0.4 m/z isolation window for 10 pre-defined MS2 product ions on a “Targeted Mass Inclusion List”, 55 NCE in HCD, 60,000 resolution, 1 × 10⁵ AGC target, and 2,500 ms maximal ion time.
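The scan 1 and scan 2 decisions can be modeled as plain functions. This is only a model of the instrument-method logic described above, with thresholds taken from the text; it is not Orbitrap Fusion firmware or method-editor code, and all function names are hypothetical.

```python
from typing import Iterable, List


def ppm_error(observed: float, expected: float) -> float:
    return (observed - expected) / expected * 1e6


def within_ppm(observed: float, expected: float, tol_ppm: float) -> bool:
    return abs(ppm_error(observed, expected)) <= tol_ppm


def count_matched_ions(observed_mz: Iterable[float], expected_mz: List[float],
                       tol_ppm: float = 10.0) -> int:
    """Number of expected product ions with at least one observed peak within tolerance."""
    peaks = list(observed_mz)
    return sum(any(within_ppm(o, e, tol_ppm) for o in peaks) for e in expected_mz)


def should_trigger_target_scans(ms1_mz: float, ms1_intensity: float, trigger_mz: float,
                                ms2_peaks: Iterable[float], expected_ions: List[float],
                                min_ions: int = 6) -> bool:
    """Scan 1 -> scan 2 -> scans 3/4 decision, using the thresholds from the Methods."""
    if not within_ppm(ms1_mz, trigger_mz, 15.0):   # scan 1: trigger precursor within +/-15 ppm
        return False
    if ms1_intensity < 1e5:                         # scan 1: intensity threshold
        return False
    # Scan 2: trigger MS2 must match >= min_ions product ions within +/-10 ppm.
    return count_matched_ions(ms2_peaks, expected_ions, 10.0) >= min_ions
```

Only when this returns True would the method move to scans 3 and 4 at the offset m/z of the native peptide.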
Results and discussion
Multiplexed quantitative analysis of undepleted human serum proteome
Estimation of the minimal fraction number to achieve high serum proteome coverage
In shotgun proteomics, longer analytical time generally yields higher peptide/protein coverage until a saturation point is reached. Indeed, until half of the fractions had been analyzed (n = 90, every other fraction), the number of identified peptides increased approximately linearly with the number of fractions (Fig. 2b). After 100 fractions, the slope appeared to decrease dramatically, indicating that the analysis was close to saturation. To increase the throughput of this platform, ~ 4000 proteins could be analyzed with half of these fractions, balancing coverage and MS usage.
Evaluation of sensitivity and dynamic range for the identified serum proteome
We next computed the abundance index based on PSMs after size normalization (see “Methods”) and evaluated the dynamic range of our dataset. The abundance index is consistent with known protein concentrations in the plasma database (R = 0.66, Fig. 3b, Additional file 1: Table S3). Within the covered dynamic range, serum albumin has the highest concentration (7.6 × 10¹⁰ ng/L) and cardiac-type troponin T2 (TNNT2) the lowest (3.0 ng/L), spanning more than 10 orders of magnitude. More conservatively, after excluding the top and bottom 5% of proteins, the estimated dynamic range is 3.6 × 10⁶ (Fig. 3b). These results indicate that the deep analysis covers a broad dynamic range.
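The span of detected concentrations can be expressed in orders of magnitude as log10(highest / lowest). The sketch below applies this to the two extremes reported for albumin and TNNT2 and adds a trimmed variant that drops the top and bottom 5% of values. The exact quantile procedure behind the 3.6 × 10⁶ estimate is not described, so the index-based trimming here is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def orders_of_magnitude(highest: float, lowest: float) -> float:
    return math.log10(highest / lowest)


def trimmed_dynamic_range(values: Sequence[float], frac: float = 0.05) -> float:
    """Ratio of the highest to lowest value after dropping the top and
    bottom `frac` of entries (simple index-based trimming)."""
    s = sorted(v for v in values if v > 0)
    k = int(len(s) * frac)
    if len(s) - 2 * k < 2:
        raise ValueError("too few values left after trimming")
    return s[len(s) - 1 - k] / s[k]


# Extremes reported in the text: albumin 7.6e10 ng/L and TNNT2 3.0 ng/L.
print(round(orders_of_magnitude(7.6e10, 3.0), 2))
```

The two extremes give about 10.4 orders of magnitude; trimming both tails makes the estimate less sensitive to a few outlying proteins.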
Quality control analysis and intra- and inter-group variations in AD-control serum proteomes
To fully compare intra- and inter-group variations, we obtained standard deviation values for all two-sample comparisons (n = 10 for the control group, n = 15 for the AD group, and n = 30 for the AD/control group). The averages of standard deviations in the control, AD, and AD/control comparisons were 0.75 ± 0.15, 0.73 ± 0.12, and 0.78 ± 0.14, respectively. Although the inter-group comparisons showed slightly larger variations than the intra-group comparisons, the difference was not statistically significant, which may be due to the small cohort size or large confounding factors, such as gender, age, genetic background, clinical treatment, and other pre- and post-sample collection variance [29, 45]. Nevertheless, three-dimensional principal component analysis (PCA) of all quantified proteins displayed a separation between control and AD cases (Fig. 4c), confirming the reproducibility of the analysis.
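The comparison counts follow from pairing samples within each group (5 choose 2 = 10 and 6 choose 2 = 15) and across groups (6 × 5 = 30). The sketch below enumerates them with `itertools`; the sample labels are hypothetical placeholders, and `summarize` simply reports the mean ± SD of whatever per-pair SDs are passed in.

```python
from itertools import combinations, product
from statistics import mean, stdev


def comparison_pairs(controls, ad_cases):
    """All two-sample comparisons within and between groups."""
    return {
        "control": list(combinations(controls, 2)),
        "AD": list(combinations(ad_cases, 2)),
        "AD/control": list(product(ad_cases, controls)),
    }


def summarize(sds):
    # Mean +/- SD of the per-pair standard deviations.
    return mean(sds), stdev(sds)


pairs = comparison_pairs([f"C{i}" for i in range(1, 6)], [f"AD{i}" for i in range(1, 7)])
print({group: len(p) for group, p in pairs.items()})
```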
Serum proteomics reveals deregulation of mitochondrial pathways in AD cases
Consistently, the 30 DE proteins were enriched in mitochondria-related pathways, as well as in fatty acid beta oxidation and AGE/RAGE signaling (Fig. 5d). Interestingly, several of these proteins (HSPA9, CYCS, DLD, and GATM) also map to various pathways related to Alzheimer’s disease [46, 47, 48, 49]. Finally, we superimposed the DE proteins onto the PPI network to extract functional modules, which are assembled by interacting proteins to form functional units at a systems level. The PPI network was curated from the most commonly used databases: STRING, BioPlex, and InWeb_IM. Computational analysis identified 3 PPI modules, all related to mitochondrial function: mitochondrial envelope (CYCS and GSTK1), intermembrane space (GATM and AGXT2), and matrix (AK2, DLD, HSPA9, HSD17B10, HSD17B8, and ECHDC2). Mitochondrial failure has long been proposed to play an important role in the development of Alzheimer’s disease [50, 51]. The master mitochondrial regulator PGC-1α was reported to be dysregulated in AD brain during the progression of neuropathology and dementia, leading to the downregulation of mitochondrial genes including PCK2, which supports our proteomic findings. Thus, comprehensive profiling of the serum proteome revealed changes in key mitochondrial proteins in AD that may be relevant to disease development.
In this deep proteomics analysis, we also detected tau and APP proteins in the samples. However, these proteins did not differ significantly between the control and AD samples, possibly owing to the limited sample size. Recently, Nakamura et al. developed an approach to measure plasma Aβ by immunoprecipitation (IP) and MS, and proposed an AD composite biomarker based on the APP669–711/Aβ1–42 and Aβ1–40/Aβ1–42 ratios. The composite biomarker showed high performance in predicting brain Aβ burden and high correlation with Aβ1–42 in cerebrospinal fluid. Without IP enrichment, this detailed ratio analysis could not be performed on our dataset. The IP-MS approach may be used to improve sensitivity for targeted biomarker candidates.
TOMAHAQ-based multiplexed approach for target validation in AD samples
We identified 4826 proteins and demonstrated high proteome coverage, sensitivity and reproducibility, as well as multiplexed targeted assays. Although extensive fractionation and long instrument time were used in this pilot study, we propose that similar coverage (~ 4000 proteins) can be achieved within a reasonable time frame. This extensive TMT–LC/LC–MS/MS platform should be broadly applicable to the measurement of complex clinical specimens. Remarkably, even in this small cohort, we identified consistent changes in 30 proteins in AD specimens compared with non-dementia controls, of which 12 proteins clustered in the mitochondria-related pathway. These novel protein signatures may be related to AD progression and could be followed up as biomarkers in a large-scale investigation, possibly using the TOMAHAQ-based LC–MS3 assay.
To our knowledge, this study (30,506 peptides from 4826 proteins) represents one of the deepest undepleted serum proteome profiling experiments on human biofluid. Previous studies usually increased serum/plasma proteome coverage by immunodepletion of abundant proteins and extensive separation. In 2006, the combination of immunodepletion, chemical fractionation (isolating cysteinyl peptides and glycopeptides) and LC/LC–MS/MS allowed the identification of 22,267 peptides from 3654 different proteins. In 2011, human plasma proteome datasets were compiled to produce a non-redundant list of 1929 high-confidence proteins (20,433 peptides). In 2015, with improved fractionation and instrumentation, about 4600 proteins were analyzed in human plasma by immunodepletion, isobaric labeling and LC/LC–MS/MS. In 2017, the human plasma proteome draft included 3509 proteins identified by at least two peptides, and about 1300 additional ambiguous proteins. The drawbacks of immunodepletion are the removal of non-targeted proteins, the associated quantitative variability, and the cost of the antibody cartridge. Our study demonstrates that deep analysis is possible without immunodepletion. However, all of these deep plasma/serum profiling experiments were time-consuming because of the large number of fractions, making them poorly suited to large clinical studies. Alternatively, a single-run, label-free protocol was introduced for rapid analysis of hundreds of plasma proteomes, and with additional pre-fractionation, analysis of 1000 proteins became possible. Other approaches, such as SWATH, have been used to quantify more than 300 plasma proteins in 232 plasma samples. Furthermore, the throughput of biofluid profiling can be increased by sample multiplexing, such as iTRAQ/TMT labeling. Here, we adapted the TMT-derived TOMAHAQ method for targeted protein analysis.
The integration of deep proteome coverage by extensive TMT–LC/LC–MS/MS in the discovery phase and targeted measurement by TOMAHAQ in the validation phase represents a balance between comprehensive profiling and analytical time.
JP, BB, PCC, KKD, and TGB contributed to the conception and design of the project. TGB provided the human specimens. KKD, MN, HT, ZP, AM and AAH performed the proteomics experiments. HW, KKD, BB, XW, YL, JHC, and JP analyzed and interpreted the data. JP, KKD, and HW wrote the manuscript. All authors read and approved the final manuscript.
The authors thank all other lab and facility members for helpful discussion, and Brian K. Erickson and Steven P. Gygi for TOMAHAQ consultation. The MS analysis was performed in the Center for Proteomics and Metabolomics; and the peptide synthesis was carried out in the Hartwell Center, both at St. Jude Children’s Research Hospital.
The authors declare that they have no competing interests.
Availability of data and materials
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD011482. Reviewer account details: Username: email@example.com, Password: Q9Yjbjnr.
Consent for publication
Ethics approval and consent to participate
All the participants have given the written informed consent. This research was approved by the Institutional Review Committee. All experiments were performed in accordance with the relevant guidelines and regulations.
This work was partially supported by National Institutes of Health grants R01AG047928 (J.P.), R01AG053987 (J.P.), U24NS072026 (T.G.B.), P30AG19610 (T.G.B.), Arizona Department of Health Services (contract 211002) (T.G.B.), the Arizona Biomedical Research Commission (contracts 4001, 0011, 05-901 and 1001) (T.G.B.), ALSAC (American Lebanese Syrian Associated Charities), and St Jude Children’s Research Hospital, partially supported by NIH Cancer Center Support Grant (P30CA021765).
- 11. Stewart E, McEvoy J, Wang H, Chen X, Honnell V, Ocarz M, et al. Identification of therapeutic targets in rhabdomyosarcoma through integrated genomic, epigenomic, and proteomic analyses. Cancer Cell. 2018;34:411–26.e19.
- 39. Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graph Stat. 1996;5:299–314.
- 58. Sathe G, Na CH, Renuse S, Madugundu AK, Albert M, Moghekar A, et al. Quantitative proteomic profiling of cerebrospinal fluid to identify candidate biomarkers for Alzheimer’s disease. Proteomics Clin Appl. 2018:e1800105.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.