Introduction

Blood peptides

Blood peptides may be identified by C18 liquid chromatography, electrospray ionization and tandem mass spectrometry (LC-ESI-MS/MS) [1]. The endogenous peptides of human blood were first identified by MS/MS fragmentation, which demonstrated that a tryptic-like endoproteinase activity cleaves peptides from proteins while an exopeptidase activity degrades the peptides, creating a pseudo steady state [1,2,3,4,5]. Alternative RNA splicing of pre-, pro- or protein substrates, combined with complex pathways of post-translational processing, may result in the cleavage from circulating proteins of many peptides that may help mediate, or mark, important physiological processes [6]. Protein cleavage products from pro-opiomelanocortin, natriuretic peptides, insulin-like growth factors, coagulation factor XIII, proglucagon-derived peptides, human kallikrein-related peptidases, SERPINA1, ENOSF1, neurofilament medium polypeptide, circulating IGFBP-4 fragments and many others have been suggested to have diagnostic or mechanistic importance [7,8,9,10,11,12,13,14,15,16,17,18]. Multivariate analysis provided about the same statistical power as univariate ANOVA of the main feature(s) [1, 19, 20]. Random and independent sampling of the endogenous tryptic peptides from clinical plasma samples revealed individual analytes that vary significantly by standard statistical tests such as the Chi Square test and ANOVA [1, 2, 4, 21,22,23]. Pre-analytical variation was exhaustively studied by comparing fresh EDTA plasma samples kept on ice with plasma samples degraded for various lengths of time, to control for differences in sample handling and storage. The observation of peptides from many proteins may increase on average about twofold after incubation at room temperature [2,3,4], whereas Complement C3 and C4B vary sharply with incubation time [2, 4], in agreement with previous results [1].

Sample preparation

Without pre-fractionation, only peptides from a few high-abundance proteins may be observed by LC-ESI-MS/MS [24,25,26]. In contrast, with one-step sample preparation by partition chromatography or differential centrifugation, low-abundance proteins of ~ 1 ng/ml could be detected and quantified in blood samples by electrospray mass spectrometry [26,27,28]. Sensitive analysis of human blood fluids by LC–ESI–MS/MS depends on selective fractionation strategies, such as partition chromatography or organic extraction, that relieve suppression and competition for ionization, resulting in high signal-to-noise ratios and thus low error rates of identification and quantification [28]. Simple, single-use (i.e. disposable) preparative and analytical separation apparatus permits the identification and quantification of blood peptides and proteins with no possibility of cross-contamination between patients, which ensures that sampling is statistically independent [1, 2, 25,26,27]. Previously, precipitation followed by selective extraction of the pellet [5, 27, 29, 30] was shown to be superior to precipitation and analysis of the ACN supernatant [31], ultrafiltration [32], albumin depletion chromatography [33] or C18 partition chromatography alone [25]. Precipitating all of the polypeptides with 90% ACN, followed by step-wise differential centrifugation with mixtures of organic solvent and water, was the optimal method to sensitively detect endogenous peptides from cellular proteins in blood [24]. Here, a ten-step gradient of acetonitrile/water with differential centrifugation was used to extract 200 µl of EDTA plasma for analysis by LC–ESI–MS/MS; this showed a high signal-to-noise ratio [24] and resulted in the confident identification of tryptic peptides [2] from ovarian cancer versus normal control samples.

Computation

Partitioning each clinical sample into multiple selective sub-fractions, each of which must be separately resolved by analytical C18, provides sensitivity [24] but creates a computational challenge. Previously, 32-bit computers lacked the power to compare all the peptides of all the proteins from the many sub-fractions of each patient in a large experiment [34]. At present, the MS/MS spectra from random and independent sampling of peptides in thousands of LC–ESI–MS/MS runs may be fit to peptides on a 64-bit server and then compared across treatments using SQL Server/R, which provides excellent data compression, relation and analysis [2, 21]. The protein p-values and FDR q-values, as well as the peptide-to-protein distribution of the precursor ions of > 10,000 counts from organic extraction, were confirmed against a null (i.e. known false positive) model of noise or computer-generated random MS/MS spectra [2, 22, 35,36,37]. The standard SQL Server system permits direct interrogation of the related data by the open source R statistical system without proteomic-specific software packages. Here, for the first time, SQL/R permitted the detailed statistical analysis of randomly and independently sampled LC–ESI–MS/MS data from multiple clinical locations and treatments in parallel, as would be required for a multisite clinical trial.
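The relational step can be illustrated with a minimal sketch. It assumes a single hypothetical table of peptide-spectrum matches and uses Python's built-in sqlite3 module as a stand-in for SQL Server; the table name `psm`, its columns, the example rows and the higher-is-better score convention are all illustrative assumptions, not the authors' schema. The query keeps only the best-scoring fit for each MS/MS spectrum above the E4 intensity threshold and then counts observations per gene symbol and treatment, which is the kind of frequency table a Chi Square test consumes.

```python
import sqlite3

# Hypothetical peptide-spectrum match (PSM) table; SQLite stands in for SQL Server.
SCHEMA = """
CREATE TABLE psm (
    spectrum_id TEXT,     -- MS/MS spectrum identifier
    treatment   TEXT,     -- e.g. 'ovarian' or 'female_normal'
    gene_symbol TEXT,
    peptide     TEXT,
    charge      INTEGER,  -- +2 or +3
    score       REAL,     -- assumed: higher means a better fit
    intensity   REAL      -- precursor intensity, arbitrary counts
)
"""

# Keep the single best fit per spectrum (exact ties would both be kept), drop
# precursors at or below 1e4 counts, then count observations per gene and treatment.
QUERY = """
WITH best AS (
    SELECT spectrum_id, MAX(score) AS best_score
    FROM psm
    WHERE intensity > 1e4
    GROUP BY spectrum_id
)
SELECT p.treatment, p.gene_symbol, COUNT(*) AS n_obs
FROM psm AS p
JOIN best AS b
  ON p.spectrum_id = b.spectrum_id AND p.score = b.best_score
WHERE p.intensity > 1e4
GROUP BY p.treatment, p.gene_symbol
ORDER BY p.treatment, p.gene_symbol
"""


def count_best_fits(rows):
    """Load illustrative PSM rows into an in-memory database and return frequency counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO psm VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Made-up rows for illustration only.
rows = [
    ("s1", "ovarian", "HPR", "pep1", 2, 45.0, 2.0e5),
    ("s1", "ovarian", "HP", "pep2", 3, 30.0, 2.0e5),         # worse fit, same spectrum
    ("s2", "ovarian", "HP", "pep3", 2, 40.0, 5.0e4),
    ("s5", "ovarian", "HPR", "pep1", 2, 50.0, 1.5e6),
    ("s3", "female_normal", "HPR", "pep1", 2, 42.0, 8.0e3),  # below the E4 threshold
    ("s4", "female_normal", "HP", "pep3", 2, 38.0, 3.0e4),
]
print(count_best_fits(rows))
# [('female_normal', 'HP', 1), ('ovarian', 'HP', 1), ('ovarian', 'HPR', 2)]
```

Filtering to the best fit before counting means each spectrum contributes to at most one gene symbol, which is what keeps the resulting frequencies independent.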

Cancer proteins in blood fluids

Many non-specific (i.e. common) or so-called “acute phase” proteins, such as amyloids, complement components, haptoglobin, alpha 1 antitrypsin, clusterin (ApoJ), heat shock proteins, fibrinogens, hemopexin, alpha 2 macroglobulin and others, have been found to increase in blood fluids but may be of limited diagnostic value [28, 38, 39]. There is good evidence that cellular proteins may exist in circulation and even form supramolecular complexes with other molecules in the blood [40]. Proteins and RNA may be packaged in exosomes [41, 42] that are challenging to isolate, and it appears that supramolecular complexes of cellular proteins, including DNA/RNA binding proteins, may exist in circulation [40, 43, 44]. Apolipoprotein A IV (APOA4) and vitamin D binding protein (VDBP) significantly discriminated malignant from benign ovarian tumors but were not as good as CA125 for diagnostic accuracy [45]. A proteomic signature of ovarian cancer tumor fluid was identified and verified by targeted proteomics [46]. Protein Z was identified as a putative novel biomarker for early detection of ovarian cancer [47]. Cystatin B (CYTB) may be a potential diagnostic biomarker in ovarian clear cell carcinoma [48]. Here, the combination of step-wise organic partition [24], random and independent sampling by nano electrospray LC–ESI–MS/MS, and large-scale 64-bit computation with SQL Server/R [21] permitted the sensitive detection of peptides and/or phosphopeptides, and thus of the parent protein chains and complexes, in human plasma, so that variation in ovarian cancer patients versus controls could be compared by the classical statistical approaches of the Chi Square test followed by univariate ANOVA [1, 22, 23].

Materials and methods

Materials

The HPLC was an Agilent 1100 (Santa Clara, CA, USA). The linear ion trap mass spectrometer was an LTQ XL (Thermo Electron Corporation, Waltham, MA, USA). Anonymous human EDTA plasma (9–20 samples per disease or normal control) with no identifying information was obtained from multiple clinical locations: St Joseph’s Hospital of McMaster University, the Ontario Tumor Bank of the Ontario Institute for Cancer Research, St Michael’s Hospital Toronto, Amsterdam University Medical Centers, Vrije Universiteit Amsterdam, and IBBL Luxembourg, under Ryerson Ethics Review Board Protocol REB 2015-207. The arbitrarily selected disease population samples were from patients who received a confirmed diagnosis of the indicated disease at the source institution. The plasma samples were collected before therapeutic intervention, and no additional information about the samples was made available. C18 ZipTips were obtained from Millipore (Bedford, MA). C18 HPLC resin was from Agilent (Zorbax 300 SB-C18, 5 micron). Solvents were obtained from Caledon Laboratories (Georgetown, Ontario, Canada). All other salts and reagents were obtained from Sigma-Aldrich-Fluka (St Louis, MO) except where indicated.

Sample preparation

Human EDTA plasma samples (200 μl) were precipitated with 9 volumes of acetonitrile (90% ACN) [27], followed by selective extraction of the pellet using a step gradient to achieve selectivity across sub-fractions and thus greater sensitivity [24]. Disposable plastic 2 ml sample tubes and plastic pipette tips were used to handle samples. The acetonitrile suspension was separated in a centrifuge at 14,000 RCF for 5 min. The acetonitrile supernatant, which contains few peptides, was collected, transferred to a fresh sample tube and dried in a rotary lyophilizer. The organic precipitate (pellet), which contains a much larger total amount of endogenous polypeptides [27], was manually re-suspended using a step gradient of increasing water content to yield 10 fractions, from those soluble in 90% ACN to those soluble in 10% ACN, followed by 100% H2O and then 5% formic acid [24]. The extracts were clarified in a centrifuge at 14,000 RCF for 5 min. The extracted sample fractions were dried under vacuum in a rotary lyophilizer and stored at − 80 °C for subsequent analysis.
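The volumes in the precipitation step can be checked with simple arithmetic: 9 volumes of neat acetonitrile added to 200 µl of plasma gives 1800 µl of ACN in a 2000 µl total, i.e. 90% ACN, which matches the capacity of the 2 ml sample tubes. The sketch below assumes volumes are additive (acetonitrile/water mixtures actually contract slightly). The step-gradient helper `step_mix`, its 10-point decrements and the 100 µl resuspension volume are hypothetical illustrations, since the text does not give the resuspension volumes.

```python
def acn_fraction(sample_ul: float, acn_ul: float) -> float:
    """Volume fraction of acetonitrile after adding neat ACN to an aqueous sample.

    Assumes additive volumes, a simplification for ACN/water mixtures.
    """
    return acn_ul / (sample_ul + acn_ul)


def step_mix(target_pct: float, volume_ul: float) -> tuple[float, float]:
    """Hypothetical helper: (ACN, water) volumes in µl for one step of the gradient."""
    acn_ul = volume_ul * target_pct / 100.0
    return acn_ul, volume_ul - acn_ul


plasma_ul = 200.0
acn_ul = 9 * plasma_ul  # "9 volumes" of acetonitrile
print(acn_ul, plasma_ul + acn_ul, acn_fraction(plasma_ul, acn_ul))  # 1800.0 2000.0 0.9

# Assumed 10-point decrements from 90% to 10% ACN in a hypothetical 100 µl volume,
# before the 100% water and 5% formic acid steps described in the text.
for pct in range(90, 0, -10):
    print(pct, step_mix(pct, 100.0))
```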

Preparative C18 chromatography

The peptides of EDTA plasma precipitated in ACN and extracted from the pellet in a step gradient were then re-dissolved in 5% formic acid and collected over C18 preparative partition chromatography. Preparative C18 separation provided the best results for peptide and phosphopeptide analysis in a “blind” analysis [49]. Solid phase extraction with C18 for LC–ESI–MS/MS was performed as previously described [1, 25,26,27, 29]. The C18 chromatography resin (ZipTip) was wetted with 65% acetonitrile before equilibration in water with 5% formic acid. The plasma extract was dissolved in 200 μl of 5% formic acid in water. The resin was washed with at least five volumes of the same binding buffer. The resin was eluted with ≥ 3 column volumes of 65% acetonitrile (2 µl) in 5% formic acid. To avoid cross-contamination, the preparative C18 resin was discarded after a single use.

LC–ESI–MS/MS

To entirely prevent any possibility of cross-contamination, a new disposable nano analytical HPLC column and nano emitter were fabricated for recording each patient sample-fraction set. The ion traps were cleaned and tested for sensitivity with angiotensin and glu-fibrinogen prior to recordings. Each new column was conditioned and quality controlled with a digest of three non-human protein standards, Bovine Cytochrome C, Yeast alcohol dehydrogenase (ADH) and Glycogen Phosphorylase B, to confirm the sensitivity and mass accuracy of the system prior to each patient sample set [35]. The statistical validity of the linear quadrupole ion trap for LC–ESI–MS/MS of human plasma [24] was in agreement with the results from the 3D Paul ion trap [22, 35,36,37]. The stepwise extractions were collected and desalted over C18 preparative micro columns, eluted in 2 µl of 65% ACN and 5% formic acid, diluted tenfold with 5% formic acid and 5% ACN in water, and immediately loaded manually into a 20 μl metal sample loop before injection onto the analytical column via a Rheodyne injector. Endogenous peptide samples were analyzed over a discontinuous gradient generated at a flow rate of ~ 10 μl per minute with an Agilent 1100 series capillary pump, split upstream of the injector during recording to ~ 200 nl per minute. The separation was performed with a C18 (150 mm × 0.15 mm) fritted capillary column. The acetonitrile profile started at 5%, ramped to 12% after 5 min, increased to 65% over ~ 90 min, remained at 65% for 5 min, decreased to 50% for 15 min and then declined to a final proportion of 5% prior to injection of the next step fraction from the same patient. The nano HPLC effluent was analyzed by ESI with detection by MS and fragmentation by MS/MS in a linear quadrupole ion trap [50].
The instrument was set to collect the precursor for up to 200 ms prior to MS/MS fragmentation, with up to four fragmentations per precursor ion that were averaged. Individual, independent samples from disease, normal and ice cold controls were precipitated, fractionated over a step gradient and collected over C18 for manual injection.
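The acetonitrile profile described above can be expressed as a piecewise-linear function of time. The stated flows also imply a split of about 50:1 (10 µl/min at the pump, ~200 nl/min at the column), and a tenfold dilution of the 2 µl eluate gives 20 µl, matching the sample loop. In this sketch the breakpoints up to 100 min follow the text; the 1 min drop from 65% to 50% and the 5 min return to 5% are hypothetical, because the text does not give those transition times.

```python
from bisect import bisect_right

# (time in min, % ACN). Points up to 100 min follow the text; the transition
# times after 100 min (101, 116 and 121 min) are hypothetical.
PROFILE = [(0, 5), (5, 12), (95, 65), (100, 65), (101, 50), (116, 50), (121, 5)]


def acn_percent(t_min: float) -> float:
    """Linearly interpolate the %ACN of the gradient at time t_min."""
    times = [t for t, _ in PROFILE]
    if t_min <= times[0]:
        return float(PROFILE[0][1])
    if t_min >= times[-1]:
        return float(PROFILE[-1][1])
    i = bisect_right(times, t_min) - 1
    (t0, y0), (t1, y1) = PROFILE[i], PROFILE[i + 1]
    return y0 + (y1 - y0) * (t_min - t0) / (t1 - t0)


pump_nl_per_min = 10_000   # ~10 µl/min from the capillary pump
column_nl_per_min = 200    # ~200 nl/min through the column after the split
split_ratio = pump_nl_per_min / column_nl_per_min

eluate_ul = 2.0
loaded_ul = eluate_ul * 10  # tenfold dilution before loading the 20 µl loop

print(acn_percent(50), split_ratio, loaded_ul)  # 38.5 50.0 20.0
```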

Correlation analysis

In this study we accepted about 15 million precursor ions with intensity > E4 counts, a threshold previously shown to be at the 99th percentile of the noise distribution, with an average signal-to-noise ratio of approximately one hundred [2, 24]. Correlation analysis of ion trap data was performed with the X!TANDEM [51] and SEQUEST [52] algorithms to match tandem mass spectra to peptide sequences from a library of 158,071 unique Homo sapiens proteins, differing by at least one amino acid, from RIKEN, IMAGE, RefSeq, NCBI, Swiss-Prot, TrEMBL, Ensembl, UniProt and UniParc, along with available Gene Symbols, all previous accession numbers, description fields and any other available annotation, rendered non-redundant by protein sequence in SQL Server and last assembled in May 2015. Endogenous peptides with precursors > 10,000 (E4) arbitrary counts were searched as fully tryptic peptides and/or phosphopeptides, and the results were compared in SQL Server/R. The X!TANDEM default ion trap settings of ± 3 m/z for precursor peptides considered from 300 to 2000 m/z, with a fragment error tolerance of 0.5 Da, were used [22, 26, 36, 37, 51, 53]. The best fit of each MS/MS spectrum to fully tryptic and/or phospho-tryptic peptides at charge states of + 2 versus + 3 was accepted, allowing for acetylation, or oxidation of methionine, and possible loss of water or ammonia. The resulting accession numbers, actual and estimated masses, correlated peptide sequences, peptide and protein scores, protein sequences and other associated data were captured and assembled together in an SQL Server relational database [21].
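To make the precursor window concrete, the sketch below computes the monoisotopic mass of a peptide from standard residue masses, converts it to m/z at charge + 2 and + 3 as (M + z × 1.007276)/z, and keeps the charge states whose predicted m/z lies within 300–2000 and within ± 3 m/z of an observed precursor. It illustrates only the precursor filter, not how X!TANDEM or SEQUEST score fragment spectra. The peptide "SAMPLER" is an arbitrary example, cysteine is treated as unmodified, and phosphorylation (+79.96633 Da) is the only modification modeled.

```python
PROTON = 1.007276      # Da
WATER = 18.010565      # Da, added once per linear peptide (free termini)
PHOSPHO = 79.966331    # Da, HPO3 added per phosphorylated S/T/Y
RESIDUE = {  # monoisotopic residue masses (Da); Cys unmodified
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mass(seq: str, n_phospho: int = 0) -> float:
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER + n_phospho * PHOSPHO


def mz(mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + z * PROTON) / z


def candidate_charges(observed_mz, seq, n_phospho=0, tol=3.0, lo=300.0, hi=2000.0):
    """Charge states (+2, +3) whose predicted m/z is in range and within tol of observed."""
    mass = peptide_mass(seq, n_phospho)
    return [z for z in (2, 3)
            if lo <= mz(mass, z) <= hi and abs(mz(mass, z) - observed_mz) <= tol]


print(round(peptide_mass("SAMPLER"), 4), round(mz(peptide_mass("SAMPLER"), 2), 4))
print(candidate_charges(402.5, "SAMPLER"))  # [2]; the +3 ion (~268.47) is below 300
```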

Data sampling, sorting, transformation and visualization

The linear quadrupole ion trap provided the precursor ion intensity and m/z values plus the peptide fragment MS/MS spectra. The MS/MS spectra were redundantly correlated to specific tryptic peptide sequences by the X!TANDEM and SEQUEST algorithms. The MS and MS/MS spectra together with the results of the X!TANDEM and SEQUEST algorithms were parsed into an SQL Server database and filtered [21] before statistical and graphical analysis with the generic R data system [21,22,23, 35, 54]. The peptide to protein correlation frequency counts for each gene symbol were summed over ovarian cancer versus control to correct the observation frequency prior to the Chi Square test using Eq. (1):

$$\chi^{2} = \frac{(\text{Disease} - \text{Control})^{2}}{\text{Control} + 1} \tag{1}$$
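Eq. (1) can be applied directly to the summed, frequency-corrected counts for each gene symbol. The sketch below assumes a 1-degree-of-freedom comparison, as used in the Results, and converts χ2 to an upper-tail p-value with the identity P(χ2 > x) = erfc(√(x/2)) for 1 degree of freedom. The "+ 1" in the denominator keeps the statistic finite when a gene symbol is never observed in the control. The counts in the example are made up.

```python
import math


def chi_square_eq1(disease: float, control: float) -> float:
    """Eq. (1): (Disease - Control)^2 / (Control + 1) for one gene symbol."""
    return (disease - control) ** 2 / (control + 1)


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Made-up, already frequency-corrected counts for one gene symbol.
chi2 = chi_square_eq1(disease=40, control=9)
print(chi2, p_value_df1(chi2))
```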

The precursor intensity data for MS/MS spectra were log10 transformed, tested for normality and analyzed across institutions/studies and diseases versus controls by means, standard errors, quantile box plots and ANOVA [22, 23, 35]. The Chi Square test, and an entirely independent analysis of precursor intensity by ANOVA with the Tukey–Kramer HSD test versus multiple controls, were performed on a 64-bit R server.
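The intensity comparison starts from a one-way ANOVA on log10-transformed precursor intensities. The stdlib-only sketch below computes the F statistic from between- and within-group sums of squares; the intensities are invented. The Tukey–Kramer HSD step that follows in the study is not reimplemented here; in R, `aov` followed by `TukeyHSD` performs it, and `TukeyHSD` uses the Tukey–Kramer form for unequal group sizes.

```python
import math


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of raw intensities.

    Intensities are log10-transformed first. Returns (F, df_between, df_within).
    """
    logged = [[math.log10(x) for x in g] for g in groups]
    k = len(logged)
    n_total = sum(len(g) for g in logged)
    grand_mean = sum(sum(g) for g in logged) / n_total
    means = [sum(g) / len(g) for g in logged]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(logged, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(logged, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented precursor intensities (arbitrary counts) for two treatments.
control = [1e4, 1e4, 1e5]
ovarian = [1e5, 1e6, 1e6]
print(one_way_anova_f([control, ovarian]))  # approximately (8.0, 1, 4)
```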

Results

The aim of this study was to establish proof of concept for a method to compare the endogenous tryptic peptides of ovarian cancer plasma with those of plasma from multiple clinical locations, using random and independent sampling with a battery of robust and sensitive linear quadrupole ion traps, with the results compiled in a central SQL Server/R statistical system. The method shows great sensitivity and flexibility but relies on the fit of MS/MS spectra to assign peptide identity, and on statistical analysis of peptide observation frequency and intensity, and so is computationally intensive.

LC–ESI–MS/MS

The pool of endogenous tryptic peptides (TRYP) and/or tryptic phosphopeptides (STYP) was randomly and independently sampled without replacement by liquid chromatography, nano electrospray ionization and tandem mass spectrometry (LC–ESI–MS/MS) [2] from ovarian versus breast cancer, female normal, other disease and normal plasma, and ice cold controls that served as a baseline (see Additional file 1: Table S1). The raw correlations were filtered in SQL Server to retain only the best fit by charge state and peptide sequence, so that no MS/MS spectrum was used more than once. The filtered results were then analyzed by the generic R statistical system in a matrix of diseases and controls that revealed the set of blood peptides specific to each disease state. The statistical validity of the extraction and sampling system was previously established by computation of cumulative p-values and FDR-corrected q-values for each gene symbol by the method of Benjamini and Hochberg [55], and by frequency comparison to null (i.e. known false positive) noise or random MS/MS spectra [2, 24]. The experimental LC–ESI–MS/MS resulted in 15,968,550 MS/MS spectra, of which 1,916,672 (12%) were fit by X!TANDEM to distinct best-fit peptides; their p-values were combined to provide the cumulative p-value for each protein accession, resulting in over 14,000 protein gene symbols with p-values and FDR-corrected q-values of < 1/10,000 (q ≤ 0.0001).
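The Benjamini and Hochberg step can be sketched in a few lines: sort the p-values, scale each by m/rank, and enforce monotonicity from the largest rank down. This is the standard step-up procedure (the same adjustment as R's `p.adjust(p, method = "BH")`), shown with made-up p-values; how the study combined peptide p-values into a cumulative protein p-value is not reproduced here.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


# Made-up per-gene p-values.
print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```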

Frequency correction

A total of 269,371 tryptic (TRYP) and 274,356 phospho-tryptic (TRYP-STYP) MS/MS spectra were correlated to proteins from female normal plasma. Similarly, 660,251 (TRYP) and 667,467 (TRYP-STYP) MS/MS spectra were correlated to proteins from ovarian cancer plasma, and these sums were used to correct the observation frequency. The plot of the corrected frequency difference passed through 0 (no difference in observed frequency) at the 0 quantile (the mean of the difference distribution), indicating that the observation frequency values were proportionally corrected prior to the Chi Square comparison (Fig. 1).
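One way to apply the reported sums, assumed here because the text does not spell out the formula, is to scale each control count by the ratio of disease to control totals (660,251/269,371 ≈ 2.45 for TRYP and 667,467/274,356 ≈ 2.43 for TRYP-STYP) before taking the difference and applying Eq. (1). Under this proportional correction the corrected totals are equal, so the distribution of differences should centre near zero, as in Fig. 1. The per-gene counts below are hypothetical.

```python
# Totals of correlated MS/MS spectra reported in the text.
TOTALS = {
    "TRYP": {"female_normal": 269_371, "ovarian": 660_251},
    "TRYP-STYP": {"female_normal": 274_356, "ovarian": 667_467},
}


def corrected_control(control_count: float, analysis: str) -> float:
    """Assumed proportional correction: rescale a control count to the disease total."""
    totals = TOTALS[analysis]
    return control_count * totals["ovarian"] / totals["female_normal"]


def delta_and_chi2(disease_count: float, control_count: float, analysis: str):
    """Corrected frequency difference (Delta) and the Eq. (1) statistic for one gene."""
    control = corrected_control(control_count, analysis)
    delta = disease_count - control
    return delta, delta ** 2 / (control + 1)


# Hypothetical counts for one gene symbol.
print(delta_and_chi2(disease_count=60, control_count=10, analysis="TRYP"))
```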

Fig. 1

Quantile plots of the corrected difference in observation frequency (Delta) and Chi Square values of ovarian cancer (i.e. disease treatment) versus control, as indicated. On the quantile plot, the difference between ovarian cancer (n ≥ 10) and each female normal control (n ≥ 5) tended to zero (see red line). Similar results were obtained by comparison to breast cancer or other controls (not shown). a Tryptic peptide corrected difference (delta) in observation frequency; b tryptic peptide Chi Square χ2; c tryptic and/or STYP peptide corrected difference (delta) in observation frequency; d tryptic and/or STYP peptide Chi Square χ2

Comparison of ovarian cancer to female normal by Chi Square analysis

A set of ~ 500 gene symbols showed Chi Square (χ2) values of ≥ 15 between the ovarian cancer and the normal female samples. Ovarian-cancer-specific peptides and/or phosphopeptides from cellular proteins, membrane proteins, nucleic acid binding proteins, signaling factors, metabolic enzymes and others, including uncharacterized proteins, showed significantly greater observation frequency. In agreement with the literature, peptides from many common proteins, including acute phase response proteins such as Haptoglobin (HP) [39], Haptoglobin Related Protein (HPR), Alpha 1 Antitrypsin (SERPINA1) [15] and others, were more frequently observed in ovarian cancer samples [38] (Table 1). The Chi Square analysis showed some proteins with χ2 values that were far too large (χ2 ≥ 60, p < 0.0001, df 1) to have all resulted from random sampling error (Fig. 1). Many proteins showed an observation frequency that was significantly greater in ovarian cancer plasma, including ZNF91, ZNF254, F13A1, LOC102723511, ZNF253, QSER1, P4HA1, GPC6, LMNB2, PYGB, NBR1, CCNI2, LOC101930455, TRPM5, IGSF1, ITGB1, CHD6, SIRT1, NEFM, SKOR2, SUPT20HL1, PLCE1, CCDC148, CPSF3, MORN3, NMI, XTP11, LOC101927572, SMC5, SEMA6B, LOXL3, SEZ6L2 and DHCR24 (Table 1). The full list of Chi Square results is found in Additional file 2: Table S2.

Table 1 Ovarian cancer specific proteins detected by fully tryptic peptides and/or fully tryptic phosphopeptides that show a Chi Square (χ2) value of ≥ 60

Pathway and gene ontology analysis using the STRING algorithm

As a computationally independent check that the variation in proteins associated with ovarian cancer was not just the result of some random process, we analyzed the distribution of known protein–protein interactions, and of the cellular location, molecular function and biological processes of the identified proteins, with respect to a random sampling of the human genome. Many interactions were apparent between the proteins computed to be specific to ovarian cancer from fully tryptic (Fig. 2) and/or phospho-tryptic peptides (Fig. 3). The ovarian cancer samples showed statistically significant enrichment of protein interactions and Gene Ontology terms, consistent with structural and functional relationships between the proteins identified in ovarian cancer, compared to a random sampling of the human genome (Table 2).

Fig. 2

The Ovarian Cancer STRING network where Chi Square χ2 ≥ 15 from fully tryptic peptides. Ovarian Cancer tryptic peptide frequency difference > 15 and χ2 value > 15 at degrees of freedom of 1 (p < 0.0001). Network Stats: number of nodes, 173; number of edges, 260; average node degree, 3.01; avg. local clustering coefficient, 0.378; expected number of edges, 206; PPI enrichment p-value, 0.000175

Fig. 3

The Ovarian Cancer STRING network where Chi Square χ2 ≥ 15 from fully tryptic phospho peptides. Ovarian Cancer STYP, frequency difference > 15 and χ2 value > 15 at degrees of freedom of 1 (p < 0.0001). Network Information: number of nodes, 191; number of edges, 182; average node degree, 1.91; avg. local clustering coefficient, 0.335; expected number of edges, 152; PPI enrichment p-value, 0.00911
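The network statistics in the two captions are internally consistent with the usual definition for an undirected network, average node degree = 2 × edges / nodes, as this short check shows. The observed-to-expected edge ratio is printed only as an illustration of how far each network exceeds the STRING expectation; it is not a statistic reported in the captions.

```python
def average_degree(n_nodes: int, n_edges: int) -> float:
    """Mean number of interaction partners per protein in an undirected network."""
    return 2 * n_edges / n_nodes


# Values from the Fig. 2 (tryptic) and Fig. 3 (phospho-tryptic) captions.
for label, nodes, edges, expected in [("TRYP", 173, 260, 206), ("STYP", 191, 182, 152)]:
    print(label, round(average_degree(nodes, edges), 2), round(edges / expected, 2))
# TRYP 3.01 1.26
# STYP 1.91 1.2
```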

Table 2 The summary of STRING analysis with respect to a random sampling of the human genome for gene symbols that show a Chi Square (χ2) value ≥ 15 (see Additional file 1: Table S1, Additional file 2: Table S2)

ANOVA analysis across disease, normal and control plasma treatments

Many proteins that showed greater observation frequency in ovarian cancer also showed significantly greater precursor intensity by ANOVA compared to breast cancer, the female normal controls, and male and female EDTA plasma from other diseases and normals. The mean precursor intensity values of gene symbols that varied by Chi Square (χ2 > 15) were analyzed by univariate ANOVA followed by the Tukey–Kramer Honestly Significant Difference (HSD) test in R [1, 23] (Table 3, Figs. 4, 5 and 6). For example, the precursor intensity quantile plot of HPR showed a linear, Gaussian distribution that ranged from E4 to more than E6 (Fig. 4). The common acute phase proteins HP, HPR, HPX and SERPINA1 all showed significant increases with ovarian cancer (Fig. 5). Ovarian cancer showed higher intensity of cellular proteins, including Zinc Finger Protein 91 (ZNF91), the apparently extracellular protein LOC101930455 (XP_005275896, spidroin-1-like), Regulating Synaptic Membrane Exocytosis 1 (RIMS1), Transient Receptor Potential cation channel subfamily M member 5 (TRPM5), Chromodomain Helicase DNA Binding Protein 6 (CHD6), GTPase IMAP Family Member 4 (GIMAP4) and others, by ANOVA followed by the Tukey–Kramer HSD test (Fig. 6). However, many proteins, such as APOA1, showed no difference between the ovarian and breast cancer treatments (Fig. 6).

Table 3 The analysis of mean peptide intensity per gene symbol for Haptoglobin related protein by ANOVA with Tukey–Kramer multiple means comparison
Fig. 4

The quantile plot showing the normality of the Log10 peptide intensity values of HPR. The dashed red lines define an ideal Gaussian or Normal distribution

Fig. 5

The variation in known plasma proteins across the clinical treatments. Treatment ID numbers: 1, Alzheimer normal; 2, Alzheimer normal control STYP; 3, Alzheimer’s dementia; 4, Alzheimer’s dementia STYP; 5, Cancer breast; 6, Cancer breast STYP; 7, Cancer control; 8, Cancer control STYP; 9, Cancer ovarian; 10, Cancer ovarian STYP; 11, Ice Cold; 12, Ice Cold STYP; 13, Heart attack Arterial; 14, Heart attack Arterial STYP; 15, Heart attack normal control; 16, Heart attack normal control STYP; 17, Heart attack; 18, Heart attack STYP; 19, Multiple Sclerosis normal control; 20, Multiple Sclerosis normal control STYP; 21, Multiple Sclerosis; 22, Multiple Sclerosis STYP; 23, Sepsis; 24, Sepsis STYP; 25, Sepsis normal control; 26, Sepsis normal control STYP. The ANOVA of the proteins shown across treatments produced a significant F statistic, and means comparisons by the Tukey–Kramer HSD test showed significant differences between ovarian cancer or ovarian cancer STYP and the normal female control and/or breast cancer (see Additional file 1: Table S1, Additional file 2: Table S2 for Tukey–Kramer results for each protein shown). STYP: serine, threonine, tyrosine phosphorylation. Note that many proteins were not detected in the ice cold plasma

Fig. 6

The variation in apparently cellular proteins in plasma across the clinical treatments. Treatment ID numbers: 1, Alzheimer normal; 2, Alzheimer normal control STYP; 3, Alzheimer’s dementia; 4, Alzheimer’s dementia STYP; 5, Cancer breast; 6, Cancer breast STYP; 7, Cancer control; 8, Cancer control STYP; 9, Cancer ovarian; 10, Cancer ovarian STYP; 11, Ice Cold; 12, Ice Cold STYP; 13, Heart attack Arterial; 14, Heart attack Arterial STYP; 15, Heart attack normal control; 16, Heart attack normal control STYP; 17, Heart attack; 18, Heart attack STYP; 19, Multiple Sclerosis normal control; 20, Multiple Sclerosis normal control STYP; 21, Multiple Sclerosis; 22, Multiple Sclerosis STYP; 23, Sepsis; 24, Sepsis STYP; 25, Sepsis normal control; 26, Sepsis normal control STYP. The ANOVA of the proteins shown across treatments produced a significant F statistic, and means comparisons by the Tukey–Kramer test showed a significant difference for ovarian cancer or ovarian cancer STYP (see Additional file 1: Table S1, Additional file 2: Table S2 for Tukey–Kramer results for each protein shown). STYP: serine, threonine, tyrosine phosphorylation. Note that many proteins were not detected in the ice cold plasma

Discussion

Random and independent sampling of peptides from step-wise fractionation followed by LC–ESI–MS/MS is a time- and labor-intensive approach that is sensitive, direct, and rests on few assumptions [2, 56]. High signal-to-noise ratios for blood peptides depend on sample preparation that partitions the sample into many selective sub-fractions to relieve competition and suppression of ionization and thus achieve sensitivity [24,25,26], but this then requires large computing power to re-assemble, organize and analyze the sub-fractions together as samples within treatments for statistical analysis [21, 24,25,26, 56]. Here, three independent lines of evidence (Chi Square analysis of observation frequency, ANOVA of peptide intensity, and previously established structural/functional relationships from STRING) all agreed that there were significant differences in the peptides from specific proteins of ovarian cancer patients compared to controls. The previous careful study of pre-analytical variation over time, and under various storage and preservation conditions, seems to rule out pre-analytical variation as the most important source of variation between ovarian cancer and other disease and control treatments [2,3,4]. Together, the results amount to a successful proof of principle for the application of random and independent sampling of plasma from ovarian cancer versus multiple clinical treatments by LC–ESI–MS/MS to identify and quantify proteins and peptides that vary between sample populations.

Pre-analytical variation

Collecting blood plasma samples directly onto ice might prevent the secretion of enzymes or proteins from blood cells and the degradation of proteins by proteases ex vivo. The effect of ex vivo proteolysis on the endogenous peptides of blood samples can be prevented by acid quench, protease inhibitors, freeze drying or ice to preserve the sample [1, 2, 4, 5]. EDTA plasma from blood collected on ice was stable when freeze dried, with low peptide frequency and intensity, but liquid plasma slowly degrades at room temperature [2, 4, 5]. Blood fluid contains a net weak tryptic activity [57] that may cleave endogenous peptides in vivo (peptidome), and endogenous proteolytic activities generate high levels of some of these same peptides ex vivo (degradome) [58, 59]; these two pools show some overlap [2]. The frequency and/or intensity of peptide observations increased in samples incubated at room temperature compared to ice cold samples, which shared some peptides and proteins [1,2,3, 5, 24]. The increased frequency and average precursor intensity of cellular proteins across the clinical samples compared to the ice cold controls indicate that some of the peptides and/or proteins observed were released from cells, or degraded by proteases released or activated, ex vivo. There was apparently statistically significant variation in the cleavage of endogenous peptides from cellular proteins across the different disease and normal treatments, female samples and ice cold controls.

Chi Square analysis of ovarian cancer versus female normal

Specific endogenous tryptic peptides were detected from ovarian cancer versus the corresponding normal female or the other diseases and controls. The large differences in observation frequency support the existence of disease-specific peptides in the blood plasma of ovarian cancer patients. The results here with Haptoglobin (HP) in ovarian cancer agree with previous results [39]. Large increases in the frequency and intensity of Haptoglobin Related Protein (HPR), alpha 1 antitrypsin (SERPINA1), Hemopexin (HPX) and other proteins were observed, but the greater representation of these common acute phase response proteins is not likely to be highly specific to one disease [38]. Many of the proteins that were significantly increased in disease compared to the 6 sets of controls, including amyloids, complement, haptoglobin, IgG chains, IITI, antitrypsin, alpha 2 macroglobulin, fibrinogens, hemopexin and apolipoproteins, are elevated in more than one disease [38]. However, specific phosphorylations or other post-translational modifications of acute phase or other common blood proteins might provide greater utility than increases in these proteins alone [5, 60,61,62,63].
Many of the proteins that varied in ovarian cancer were previously shown to play a role in cancer biology, or were previously established tumor diagnostic or prognostic markers, and several have previously been detected in the plasma of cancer patients: coagulation factor XIII has been suggested as a biomarker for screening colorectal cancer [9]; P4HA1 is a prolyl 4-hydroxylase that may be a prognostic marker for glioma [64]; glypican has been localized to exosomes and previously implicated as a biomarker of cancer [42]; Lamin B2 promotes non-small cell lung cancer [65]; CSR1 is a tumor suppressor gene that activates CPSF3, preventing the interaction of XIAP with caspase [66]; MORN3 is a testis-cancer antigen that recruits the sirtuin deacetylase that modifies P53 [67]; SIRT1 (Sirtuin 1) is a histone deacetylase that may regulate tumor formation [68]; Cyclin I-like (CCNI2) plays a role in cell cycle progression and proliferation [69]; NMI, an N-MYC and STAT interactor, increases in protein expression with tumor grade and plays a role in cell cycle progression [70]; increased integrin beta 1 (ITGB1) has been associated with some, but not all, solid cancers [71]; a gene expression array identified NEFM as indicative of the risk of prostate cancer [72]; PLCE1 was shown to promote esophageal cancer cell progression by maintaining the expression of SNAIL [73]; SRGN was shown to be expressed in the exosomes of adenocarcinoma by LC–ESI–MS/MS [74]; DHCR24, a reductase of the cholesterol biosynthesis pathway, may play a role in cancer [75], and selective and potent DHCR inhibitors have been developed [76]; SMC5 complexes with MMS21, which acts as an E3 ligase required to avoid gross chromosomal rearrangements [77]; semaphorins such as SEMA6B were strongly down-regulated in breast cancer [78]; lysyl oxidase-like 3 was required for melanoma cell survival [79]; and seizure related 6 homolog like 2 (SEZ6L2) showed increased gene expression in primary lung cancer by RT-PCR and Western blot [80].

Pathway and gene ontology analysis by the STRING algorithm

The set of gene symbols that were significant by Chi Square analysis of the peptide frequency counts was independently evaluated by STRING analysis. The STRING network analysis indicated that the peptides and proteins detected were not merely a random selection of the proteins of the human genome: they showed statistically significant protein–protein interactions and significant enrichment of cellular components, biological processes and molecular functions associated with the biology of cancer. The significant STRING results suggest that at least some of the differences observed between ovarian cancer and the female normal controls could not have resulted from random sampling error. The previously established structural or functional relationships among the ovarian cancer specific gene symbols filtered by χ2 were consistent with the detection of bona fide variation specific to ovarian cancer. The STRING results suggest that specific protein complexes are released into the circulation of ovarian cancer patients [40].
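Enrichment of a gene ontology term in a gene list is commonly assessed with a one-sided hypergeometric (over-representation) test. The sketch below shows that generic calculation only; it is not STRING's implementation, STRING applies its own background set and multiple-testing correction, and the function name `hypergeom_enrichment_p` and the toy numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value P(X >= k) for a hypergeometric X.
    N: genes in the background (e.g. the annotated genome)
    K: background genes annotated to the term
    n: genes in the query list (e.g. chi-square-significant gene symbols)
    k: query genes annotated to the term"""
    upper = min(n, K)
    total = comb(N, n)
    # Sum the exact hypergeometric probabilities of k or more annotated hits;
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Hypothetical toy case: 3 of 4 query genes carry a term held by 5 of 20 background genes.
p = hypergeom_enrichment_p(3, 4, 5, 20)
print(f"p = {p:.4f}")
```

Because many terms are tested at once, the resulting p-values would normally be adjusted, for example by the Benjamini-Hochberg false discovery rate procedure.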

Ovarian cancer specific variation by ANOVA

After screening the discrete frequency data with the Chi Square (χ2) test, the significant protein gene symbols were analyzed by ANOVA of the continuous log10 intensity values, which are approximately normally distributed (Gaussian) [22, 23, 35]. A potential role has been suggested for ZNF91 in the pathogenesis of some cancers [81, 82], and zinc finger proteins may play a role in attenuating the cellular effects of viral genes [83]; viruses may account for some 15% of cancers [84]. Members of the large zinc finger superfamily, which may bind RNA and DNA, have been detected in human blood by partition chromatography, organic extraction of endogenous peptides and Western blot [25, 26, 30]. Regulation of the chromatin remodeling enzyme CHD6 was observed in the molecular analysis of urothelial cancer cell lines [85]. A novel LMBRD1-CHD6 translocation, t(6;20)(q13;q12), was observed in acute myeloid leukemia [86]. Dysregulation of CHD6 was also observed in models of colorectal cancer [87]. Sirtuin 1 (SIRT1) may promote cellular proliferation, migration and invasion in epithelial ovarian cancer [88] and inhibits p53-dependent apoptosis in human melanoma cells [89]. Hemopexin is expressed in a woodchuck model of hepatitis B-associated hepatocellular carcinoma [90]. In contrast, there is no previous study of LOC102723511 (adhesive plaque matrix protein-like), which remains a hypothetical protein. Similarly, the glycine-rich unknown protein XP_005275896, encoded by LOC101930455, may show some cryptic sequence homology to bacterial proteins and general features consistent with extracellular structural proteins that might be important for biochemical marker development [62]. In general, many of the proteins that showed greater peptide frequency and/or intensity in ovarian cancer plasma were consistent with the previously established role of those proteins in cancer or tumor biology.
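The second stage, a one-way ANOVA on log10-transformed intensities, can be sketched as follows. This is an illustration only, assuming each treatment contributes a list of raw, positive peak intensities for one gene symbol; the function name `one_way_anova_log10` and the values are hypothetical and do not reproduce the R/SQL Server analysis used here.

```python
import math

def one_way_anova_log10(groups):
    """One-way ANOVA F statistic on log10-transformed intensities.
    groups: list of lists of raw (positive) peak intensities, one list per treatment.
    Returns (F, df_between, df_within)."""
    # Log10 transform to make intensity distributions closer to Gaussian.
    logged = [[math.log10(x) for x in g] for g in groups]
    all_values = [v for g in logged for v in g]
    n_total = len(all_values)
    k = len(logged)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in logged]
    # Between-treatment and within-treatment sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(logged, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(logged, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical intensities for one gene symbol in two treatments.
cancer = [1e3, 1e4, 1e5]
control = [1e2, 1e3, 1e4]
f_stat, df1, df2 = one_way_anova_log10([cancer, control])
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

The p-value follows from the F distribution with (df_between, df_within) degrees of freedom, for example with `scipy.stats.f.sf`, and `scipy.stats.f_oneway` performs the same test on already log-transformed values.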

Ovarian cancer EDTA plasma peptides and proteins

It is not clear whether the observed variation between proteins and sample treatments results from greater expression of the specific proteins, expression of proteases that target them, greater susceptibility to endoproteolytic attack, greater resistance to exopeptidase activity, or a combination of these. It should be possible to specifically compare and confirm the levels of disease-specific peptides and parent proteins by automatic targeted proteomics [4] after extraction of peptides in one step [30], or after collection of the intact protein chains over the best partition chromatography resin [26] followed by tryptic digestion and analysis. For example, C4B peptides discovered by random and independent sampling were shown to be a marker of sample degradation by automatic targeted assays [2,3,4]. Automatic targeted analysis of peptides in independent samples provided relative quantification that rapidly confirmed the potential utility of the C4B peptide as a marker of sample degradation [4]. There is strong evidence that disease-specific tryptic endoproteinase activity cleaves specific peptides in blood fluids that may sensitively reflect changes in the corresponding parent proteins [1]. We cannot rule out that at least some of the endogenous peptides detected more specifically in ovarian cancer reflect an increased concentration of the parent protein [38]. Approaches that first deplete blood proteins and digest them with trypsin, and then separate the peptides by strong cation exchange and C18 chromatography, cannot be used to focus on one protein in a targeted manner [91]. In contrast, separating the proteins first by partition chromatography, followed by tryptic digestion of the enriched fraction and C18 separation of the peptides, may permit efficient, automated, targeted assays of specific proteins without the use of immunological reagents [26].
Traditional partition chromatography over disposable preparative micro-chromatography resins (quaternary amine, propyl sulfate, concanavalin A, heparin or DEAE), followed by trypsin digestion and LC–ESI–MS/MS, robustly identified at least 4396 blood proteins by X!TANDEM [25, 26]. Thus one-step organic extraction [27] and/or partition chromatography of the parent proteins followed by tryptic digestion [25, 26] may be used to automatically confirm the peptides and proteins and to provide relative quantification by ANOVA [35]. Subsequently, the best performing peptides and proteins may be absolutely quantified using external or internal isotopic standards [92].
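Absolute quantification with an internal isotopic standard rests on a simple ratio: a known amount of heavy-isotope-labelled peptide is spiked into the sample, and the endogenous (light) amount is the light/heavy peak-area ratio times the spiked amount. The sketch below assumes equal ionization response for light and heavy forms and complete recovery; the function names and numbers are hypothetical, and the 200 µl volume simply mirrors the plasma volume extracted in this study.

```python
def isotope_dilution_amount(light_area, heavy_area, heavy_spike_fmol):
    """Endogenous (light) peptide amount in fmol, assuming the heavy-labelled
    standard co-elutes and ionizes with the same response as the light peptide."""
    return light_area / heavy_area * heavy_spike_fmol

def concentration_pM(amount_fmol, plasma_volume_ul):
    """Convert fmol recovered from a plasma volume (µl) to pmol/L (pM).
    1 fmol/µl = 1 nmol/L = 1000 pmol/L; assumes complete recovery."""
    return amount_fmol / plasma_volume_ul * 1000.0

# Hypothetical peak areas with 50 fmol of heavy standard spiked into 200 µl of plasma.
amount = isotope_dilution_amount(light_area=2.0e6, heavy_area=4.0e6, heavy_spike_fmol=50.0)
print(f"{amount:.1f} fmol, {concentration_pM(amount, 200.0):.1f} pM")
```

With an external standard the same arithmetic applies, but the response factor comes from a separately run calibration curve rather than from a co-analyzed heavy peptide.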

Conclusion

The stepwise organic extraction of peptides [24] enriched endogenous tryptic peptides with a high signal to noise ratio for random sampling [4] across disease and control (normal) treatments. A large amount of proteomic data from multiple diseases, controls and institutions may be stored, related and statistically analyzed in 64-bit SQL Server and R. The random and independent sampling of plasma endogenous tryptic peptides by LC–ESI–MS/MS identified many new blood proteins that were previously associated with the biology of cancer or that have been shown to be biomarkers of solid tumors by genetic or biochemical methods. The striking agreement between the results of random and independent sampling of plasma by mass spectrometry and those from cancer tissues and cells seems to indicate that clinical discovery in plasma by LC–ESI–MS/MS will be a powerful tool if it can be applied at a larger scale. Automating the discovery and confirmation process for clinical applications would require modifying the existing method [24] to use a larger extraction scale and a larger C18 preparative bed volume, creating a sample concentrated enough to fill and saturate the surface of an auto-sampling vial. C4B peptides previously discovered as markers of sample degradation by random and independent sampling of tryptic peptides were subsequently confirmed by automatic targeted analysis of independent samples [2,3,4], which strongly indicates that a similar workflow could be applied to disease versus normal samples.