Fast quantitative urinary proteomic profiling workflow for biomarker discovery in kidney cancer
- 208 Downloads
Urine has evolved as a promising body fluids in clinical proteomics because it can be easily and noninvasively obtained and can reflect physiological and pathological status of the human body. Many efforts have been made to characterize more urinary proteins in recent years, but few have focused on the analysis throughput and detection reproducibility. Increasing the urine proteomic profiling throughput and reproducibility is urgently needed for discovering potential biomarker in large cohorts.
In this study, we developed a fast and robust workflow for streamlined urinary proteome analysis. The workflow integrate highly efficient sample preparation technique and urinary specific data-independent acquisition (DIA) approach. The performance of the workflow was systematically evaluated and the workflow was subsequently applied in a proof-of-concept urine proteome study of 21 kidney cancer (KC) patients and 22 healthy controls.
With this workflow, the entire sample preparation process takes less than 3 h and allows multiplexing on standard centrifuges. Without pre-fractionation, our newly developed DIA method allows quantitative analysis of ~ 1000 proteins within 80 min of MS time (~ 15 samples/day). The quantitation accuracy of the whole workflow was excellent with median CV of 9.1%. The preliminary study on KC identified 125 significantly changed proteins.
The result suggested the feasibility of applying the high throughput workflow in extensive urinary proteome profiling and clinical relevant biomarker discovery.
KeywordsUrine Proteome profiling Data independent acquisition Biomarker discovery
a simple and integrated spintip-based proteomic technology
normalized collision energy
false discovery rate
variable window DIA
retinoic acid receptor responder protein 1
Food and Drug Administration
Kidney cancer (KC) accounts for about 3% of all adult cancers and is one of the most fatal diseases of the genitourinary system . Early diagnosis of KC is crucial for the patients as it associates with excellent 5-year survival rate. However, around 30% of KC patients are diagnosed at the metastatic stage and are characterized with poor prognosis [2, 3]. The pathogenesis of KC have not been fully elucidated and there is still no acknowledged biomarker which can be used for its diagnosis or prognosis . Discovery of KC related potential biomarkers has important clinical significance.
Urine is an ideal detection objective for identifying biomarkers of urological malignancies as it can be easily and non-invasively obtained and contains cells and proteins that originate from urogenital system [5, 6]. Furthermore, as a glomerular filtrate of plasma, the urine proteome can reflect physiological and pathological status of the human body . Recent studies show that urinary proteome has great potential in classification and diagnosis both urogenital and systemic diseases [8, 9, 10, 11, 12, 13, 14]. However, urine is also a difficult proteomic sample to work with, due to its low protein concentration, high dynamic range of protein expression and high inter-individual variability. Compared with human plasma and tissue proteome, the urinary proteome has been relatively less studied.
Mass spectrometry (MS) has evolved as the mainstay for high-throughput proteome profiling of complex biological samples recently. Many efforts have been made to characterize urinary proteomes using MS in recent years. The regular urinary sample preparation process usually include protein extracted by organic solvent precipitation, protein re-dissolving and overnight digestion. Furthermore, to increase the detection depth, protein or peptide fractionation prior to LC–MS-analysis is currently widely employed. These processes typically include gel electrophoresis, ion exchange chromatography or high-PH reversed phase (RP) chromatography [15, 16]. Using such strategies, it has now become possible to identify more than 1500 proteins in a single urine sample  and even more than 6000 proteins when using hundreds of fractions from multi-dimensional separation strategies and pooled urine from dozens of humans . However, the pre-fractionation steps are undesirable in clinical application because they raise pre-analytical perturbations and restrict throughput. Due to the exertive processes of sample preparation and limited throughput, small sample size is usually enrolled in the discovery phase of current urinary biomarker studies, which reduces the credibility of the discovered biomarkers . Up to now, few urinary biomarkers derived from ‘discovery’ studies have been successfully translated into clinical practice . Increasing the urine proteomic detection throughput and reproducibility is urgently needed for discovering and verifying biomarkers with large sample cohorts.
A high-throughput proteome profiling workflow should include both efficient sample preparation method as well as robust MS acquisition approach. For sample preparation, the development of integrated proteomics sample preparation technologies has attracted increasing attention recently. Mann’s group reported an in-StageTip approach for quickly processing of cell and plasma samples in less than 2 h [21, 22]. Recently, a simple and integrated spintip-based proteomic technology (SISPROT) was developed by our group and has been successfully used in proteomic profiling of cell , tissue , microbiome , and plasma . It could be of great significance to apply such technologies for urinary sample preparation. In terms of MS analysis, data-dependent acquisition (DDA) is remaining the most widely adopted MS approach for untargeted screening in discovery proteomics. However, the semi-stochasticity of precursor ion selection and non-uniformity of scan point leads to poor reproducibility and compromise the accuracy of quantitative analysis [27, 28]. Compared with DDA, data-independent acquisition (DIA) method is emerging as a powerful technology for quantitative proteomics recently . The DIA approach divides all the precursor ions into several consecutive windows for fragmentation, and acquires all the resulting fragments in each isolation window. It features with high-throughput, quantitative consistency, and traceable data, which is very suitable for proteomic biomarker discovery studies [30, 31].
The first-morning urine samples of 21 KC patients and 22 healthy controls were collected from First Hospital of Xiamen. The KC patients were diagnosed by histopathological examination and none had undergone nephrectomy before sample collection. Samples with hematuria or proteinuria were excluded from the study. The clinical information of the KC patients is provided in Additional file 1: Table S1. Samples were centrifuged at 2000×g for 10 min at 4 °C to sediment cellular fragments and then stored at − 80 °C before analysis. For construction of project specific spectral library, two pooled urinary samples were prepared by mixing equal volume of each sample in each condition.
Urine sample preparation
Urine samples were prepared with a centrifuge-based protocol, which combined ultrafiltration and recently developed SISPROT technology  with optimization for urine. 2 mL neat urine were concentrated using an ultrafiltration devices (Amicon® Ultra-4 10 K, Merck) according to the instructions. The protein concentration of each urine were measured using Bradford protein assay (Bio-Rad, USA). The brief procedure for SISPROT is as following. Firstly, the SISPROT tip was made by filling 5 pieces of C18 disk (3 M Empore, USA) and then 2 mg POROS SCX beads (Applied Biosystems, USA) into a standard 200 μL pipette tip. Secondly, the urine sample (~ 20 μg of protein) was acidified with 0.1% (v/v) formic acid to pH 2–3 before loading onto the SISPROT tip by centrifugation. After washing with 20% (v/v) acetonitrile (ACN) in 8 mM potassium citrate buffer (pH 3), the proteins were then reduced with 10 mM Tris (2-carboxyethyl) phosphine hydrochloride (TCEP) in 9 mM potassium citrate buffer (pH 3) for 15 min (at room temperature). Protein digestion was subsequently carried out by injecting into the tip with 2 μg/μL trypsin (Promega, Madison, WI) in 10 mM iodoacetamide (IAA), 100 mM Tris–HCl (pH 8), and incubating in darkness for 1 h (at room temperature). The peptides were subsequently transferred to the C18 disk with 200 mM ammonium formate (AF, pH 10). After washing with 5 mM AF (pH 10), the peptides were directly eluted by 40 μL of 80% (v/v) ACN in 5 mM AF (pH 10) or fractionated by a stepwise increasing gradient of ACN (3, 6, 9, 15, and 80%) in 5 mM AF (pH 10). Finally, the obtained peptides were dried and resuspended in 10 μL of 0.1% (v/v) formic acid with iRT peptides (Biognosys, Schlieren, Switzerland) adding into each sample according to manufacturer’s instructions for LC–MS analysis. The entire sample preparation process takes less than 3 h and can be multiplexed on centrifuges with excellent reproducibility.
DDA acquisition and data analysis
DDA analysis was performed on an Orbitrap Fusion mass spectrometer connected to an EASY-nLC 1000 system (Thermo Fisher Scientific). The peptide separation was performed on an integrated spray-tip analytical column (75 μm i.d. × 20 cm) packed with 1.9 μm ReproSil-Pur 120 Å C18 resins (Dr. Maisch GmbH, Ammerbuch, Germany). A binary buffer system of 0.1% (v/v) formic acid in water (buffer A) and 0.1% (v/v) formic acid in ACN (buffer B) was used for separation at a flow rate of 250 nL/min. The injection volume is 2 μL. A 80 min gradient was performed as follows: from 3 to 7% B in 2 min, from 7 to 22% B for 50 min, from 22 to 35% B for 10 min, from 35 to 90% B in 2 min, and held at 90% B for 16 min. The DDA method consisted of a full MS scan over m/z range of 350–1550 at a resolution of 120,000 in the Orbitrap mass analyzer followed by data dependent MS/MS scans with a Top Speed method (3 s). MS/MS was carried out in the Orbitrap mass analyzer with a resolution of 15,000 using an isolation window of 1.6 m/z and HCD fragmentation with normalized collision energy (NCE) of 30%. The dynamic exclusion time was set to 40 s.
The Sequest HT  node integrated within the Proteome Discoverer (PD) software (Version 2.1, Thermo Fisher Scientific) was applied to search the raw data against the human Uniprot fasta database (70,947 entries, downloaded on Mar 10, 2017) appended with the Biognosys iRT peptides sequence. A maximum missed cleavages of two were allowed. Carbamidomethylation of cysteine was chosen as static modification, while oxidation of methionine and deamidation of asparagine and glutamine were set as dynamic modifications. False discovery rate (FDR) was set to 1% for peptide spectrum matches and proteins. MaxQuant  (version 220.127.116.11) was applied for the label-free quantification (LFQ) analysis of proteins with default settings. The same protein sequence database and same modification parameters were used as the PD search. The FDR was controlled as 1% for both peptide spectrum matches and proteins.
Generation of project specific library
For generation of project specific library, the pooled urine samples were measured in two ways. The samples were fractionated into 5 fractions with high-PH RP fractionation strategy and all the fractions were measured by DDA. Unfractionated samples were analyzed with DDA in four technical replicates. The injection volume is 2 μL for unfractionated sample and 5μL for fractionated components. The acquired DDA data were searched with PD software and the search result was imported to Spectronaut 11.0 (Biognosys) for library generation. The default settings were used as follows: fragment ions were selected over m/z range of 300 to 1800, fragments per peptide were restricted from 3 to 6, and a FDR threshold was set as 0.01.
DIA sample acquisition and data analysis
The DIA experiments were carried out on the same instrument as DDA. The same LC separation conditions were applied. A variable window DIA approach (termed as vDIA) specific for urine samples was developed. The precursor m/z distribution was obtained according to the DDA experiment of the pooled urine samples and the window list was established based on the criteria of equalizing the number of parent ions in each isolation window . Each DIA cycle consisted of a full MS scan over m/z range of 350–1400 with a resolution of 60,000 in the Orbitrap mass analyzer followed by 30 vDIA scans with a resolution of 30,000 and NCE of 30%. The cycle time was 2.4 s. The classic DIA approach contains a full MS scan and 32 DIA scans with fixed isolation width. The full scan was set over m/z range of 395–1205 with a resolution of 60,000; followed by DIA scans with 26 m/z fixed isolation width, resolution of 30,000 and NCE of 30% . Details of the two methods are listed in Additional file 2: Table S2 and Additional file 3: Table S3.
DIA data were analyzed with Spectronaut 11.0 (Biognosys) with default settings as follows: peak detection, dynamic iRT; correction factor, 1; interference correction on MS2 level, enabled; and cross run normalization, enabled. Protein inference was performed with the ID picker algorithm  in Spectronaut. The FDR cutoff was set as 1% for both peptide and protein levels.
For statistical analysis, the data were further analyzed by SPSS (version 18.0). Student’s t test was used to calculate the significance of protein intensity changes. Gene ontology (GO) enrichment was performed using DAVID (version 6.8) [37, 38].
Development of urinary sample preparation procedure
In this study, we tried to apply the SISPROT technology for high efficient urine sample preparation. Due to the relatively low protein concentration feature of urine sample and the limited loading volume of SISPROT device, it is impractical to use SISPROT technology to process urine sample directly. As an attempt, ultrafiltration technology was combined with SISPROT for sample preparation in our study. The neat urine were rapidly concentrated and desalted by ultrafiltration device. After protein concentration measurement, ~ 20 ug of protein filtrate were used for subsequent protein digestion on SISPROT. The digestion efficiency of urine sample on SISPROT was evaluated by calculating the missed cleavage rates of our DDA data by performing an additional search in the PD software allowing for maximum missed cleavages of ≤ 5. As shown in the Additional file 4: Table S4, ~ 99% of PSMs have a missed cleavage number of ≤ 2, which indicates the high efficiency of the SISPROT technology in processing urine samples. The entire sample preparation process takes less than 3 h and allows multiplexing on standard centrifuges.
Development of urine specific DIA method
The reproducibility was evaluated by calculating the percentage of peptides and proteins that identified in common over three repeated acquisitions. For DDA method, 881 proteins and 4440 peptides were identified in total, and only 547 (62%) proteins and 2095 (47%) peptides identified in common cover three replicates. The identification reproducibility was poor due to the semi-stochastic nature of DDA and may also due to the high protein diversity in both abundance and composition of urine sample. The data reproducibility was greatly improved in both protein and peptide levels when using DIA methods. Comparing with classic DIA approach, the number of proteins and peptides identified by vDIA method have increased 15.2% and 20.8%, respectively, with improved reproducibility. The identification reproducibility is as high as 98% in protein level with our developed vDIA method.
In order to compare quantitative performance, the DDA data were analyzed using MaxQuant for label-free quantification (Fig. 2c, d) . As a result, 402 proteins were quantified with a median coefficients of variation (CV) of 9.1% for DDA. For classic DIA and vDIA methods, 855 proteins and 1008 proteins were quantified, respectively. Compared to the standard DDA method, the vDIA method quantified 2.1 times peptides and 2.5 times proteins with a much better quantitative precision. It is worth mentioning that median CV% of our vDIA method in protein quantification is as low as 4.8%, with 73% of proteins showing CVs of ≤ 10% and 87% of proteins showing CVs of ≤ 20%.
Use of resource libraries for analysis of DIA data
Performance of the fast profiling workflow
Our integrated workflow enables reproducibly quantification of 986 proteins with a 80 min single-shot DIA analysis. As illustrated in Fig. 4d, the dynamic range of quantified protein intensity cover nearly five orders of magnitude. Among them, uromodulin and albumin were the two most abundant proteins, and PERG-1 (Retinoic acid receptor responder protein 1) was the least abundant one (Additional file 8: Table S6). The concentrations of some of the proteins were reported in the tens of pg/mL level, such as Interleukin-1 receptor type 2 (110 pg/mL), 60 kDa heat shock protein (90 pg/mL) and fractalkine (40 pg/mL) .
GO analysis was performed to provide insight into the cellular component of the quantified proteins. As illustrated in Fig. 4e, GO terms related to extracellular proteins (such as extracellular exosome, extracellular space and extracellular region), plasma membrane and cytosol were most overrepresented, which was consistent with recent large-scale urinary proteomic studies [18, 42, 43]. As urine is a glomerular filtrate of plasma, we compared our urinary proteome data with a recently reported plasma proteome data to check how many plasma-related proteins could be detected in urine. The plasma proteome dataset has a comparable detection depth of 1040 proteins and was reported by Mann’s group through extensive peptide fractionation and 16 h of MS time . As shown in Fig. 4f, a total of 393 (41.5%) of the proteins identified in the deep plasma proteome were common to the proteins quantified by our fast urinary proteome profiling (40.6%). This result was largely consistent with previous report that approximately one-third of urinary proteins originate from the plasma proteins . We further searched for the presence of approved blood biomarkers in our urinary proteome dataset. Among the 109 Food and Drug Administration (FDA) approved blood biomarkers , 53 (~ 50%) of them were reproducibly quantified in our list (Additional file 8: Table S6). As urine can be obtained in a more convenient and non-invasive way compared with blood, it would be significant if these blood biomarkers could be confirmed as urinary biomarkers. In addition, as a biofluid closest to kidney, urine was thought to contain rich information which could reflect kidney function and an ideal source for discovering kidney disease biomarkers. Lately, Gao’s group reviewed 38 reported urinary candidate biomarkers of glomerular and tubular injury . As indicated in Table S4, 25 of those biomarkers can be detected in our list. All the above results indicate the great potential of our fast urinary proteomic profiling in disease relevant biomarker detection.
High-throughput urinary proteome profiling of KC
Many efforts have been made to characterize more urinary proteins in recent years, but few have focused on the analysis throughput and detection reproducibility. In this study, we aim to develop a fast and robust urinary proteome profiling platform that can be applied in large-scale clinical detection. We reasoned that the entire process of such a platform, including sample preparation, MS acquisition and data analysis, ought to be fast, reproducible and convenient. In this regard, sample preparation processes were tried to be simplified and pre-fractionation procedures were overleaped. According to the characteristics of urine samples, a centrifuge-based protocol, which combined ultrafiltration and the SISPROT technology, was developed for urine sample processing. Starting with 2 mL of neat urine, the entire sample preparation process takes less than 3 h and allows multiplexing on standard centrifuges. For MS analysis, a urine specific vDIA method of 80 min gradient was developed (~ 15 samples per day). The acquired DIA data was subsequently analyzed with commercial available Spectronaut software. The whole workflow takes less than 5 h (Fig. 1). The performance of the whole workflow was evaluated, and both the detection reproducibility and quantitative precision were found excellent. It is noteworthy that the quantification precision of a < 10% median CV of the workflow is much smaller than the biological CV (reported as 60–70%) , which allows us to omit technical replicates for both sample preparation and MS acquisition, and to focus on other biological samples, which greatly improves the analysis throughput.
The high throughput workflow allows quantitative analysis of ~ 1000 proteins in single urine sample, which covers low abundance urinary proteins with reported concentration in the pg/mL level. It is worth mentioning that, comparing with plasma, urine is easier to be profiled to a relatively deep protein depth. With our study, over one thousand proteins can be detected with 80 min of MS time. But for plasma proteome, due to the negative effects of high abundance blood proteins, a comparable detection depth can only be obtained through extensive peptide fractionation and consuming more than 10 h of MS time. It could be a distinct advantage of urine over blood as a non-invasive biofluid in future clinical proteomics.
The workflow was further applied in a proof-of-concept urine proteome study of KC. During the analysis of a total of 43 samples, the workflow presented satisfactory stability and reproducibility in protein detection. 125 proteins were found differentially expressed between the KC patients and healthy controls. Some of the proteins have been detected as potential biomarkers for KC or associated with tumor growth and proliferation in previous studies, such as Multimerin-2 [47, 48], transmembrane protein 106A [49, 50], apolipoprotein D , vasorin , ras-related protein Rab-14 , retinol-binding proteins 5 , neuronal growth regulator 1  and prolactin-inducible protein [56, 57].
However, these significantly changed proteins can’t yet be treated as potential biomarkers for KC in current preliminary study. Due to the unique characteristics of urine, such as high inter-individual variability and susceptibility to various condition of the urinary system, larger sample cohorts and patients with non-malignant kidney disease controls are needed in a comprehensive discovery study to obtain more conclusive result. Another problem with current MS-based biomarker discovery studies on urine is that there is a lack of consistency in normalization of protein concentration. In this study, to correct for the different dilution factors among urine samples, same protein amount were used for digestion for different samples. The potential problem with this widely used total protein normalization strategy is that samples with proteinuria or hematuria could weaken the differences in biomarker levels between the patients and controls . Urinary creatinine-based normalization is another widely applied strategy. The potential problem with the strategy is that the excretion of creatinine is susceptible to renal function, muscle mass and metabolism, and can be dependent on exogenous factors such as age, gender and mental state . Further studies needed to be carried out to compare those normalization methods. Due to the limit of current sample size and significantly more effort for developing commonly accepted normalization method, such comprehensive study on KC is out of the scope of this study and will be carried out in our future study. However, the current proof-of-concept study suggested the feasibility of our workflow in extensive urine proteome profiling and clinical relevant biomarker discovery.
In this study, we develop a fast and robust workflow for streamlined urinary proteome analysis. The workflow integrate highly efficient centrifuge-based sample preparation technology and urinary specific DIA approach, that enables reproducibly quantification of ~ 1000 proteins with 80 min MS time. Compared to the standard DDA strategy, our DIA method quantified 2.1 times peptides and 2.5 times proteins with better quantitation precision. In addition, we evaluated the feasibility of using resource libraries for DIA data analysis. Although the overall performance of the public resource libraries is still not as good as that of project specific library, they present great potential to be used in future data analysis. The workflow was subsequently applied to a urine proteomic study of KC. A total of 43 samples were analyzed by the integrated workflow within 3 days. 125 proteins were found differentially expressed in KC patients. The result suggested the feasibility of applying the workflow in extensive urinary proteome profiling and clinical relevant biomarker discovery.
LL and RT designed the study. LL and QY performed the experiments, carried out the data analysis and prepared the manuscript. JZ and ZC performed the sample collection and assisted with sample preparation. RT provided technical assistance in SISPROT approach. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Consent for publication
Ethics approval and consent to participate
Informed consent was obtained from all participants included in the study. The study was approved by the institutional reviewer board.
This work was supported by the National Natural Science Foundation of China (21605076 and 21575057), the Shenzhen Innovation of Science and Technology Commission (JCYJ20170412154126026 and JCYJ20160229153100269), and the Ministry of Science and Technology of China (2016YFA0501403 and 2016YFA0501404).
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 3.Masui O, White NMA, DeSouza LV, Krakovska O, Matta A, Metias S, Khalil B, Romaschin AD, Honey RJ, Stewart R, et al. Quantitative proteomic analysis in metastatic renal cell carcinoma reveals a unique set of proteins with potential prognostic significance. Mol Cell Proteomics. 2013;12:132–44.CrossRefGoogle Scholar
- 44.Pieper R, Gatlin CL, McGrath AM, Makusky AJ, Mondal M, Seonarain M, Field E, Schatz CR, Estock MA, Ahmed N, et al. Characterization of the human urinary proteome: a method for high-resolution display of urinary proteins on two-dimensional electrophoresis gels with a yield of nearly 1400 distinct protein spots. Proteomes. 2004;4:1159–74.CrossRefGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.