1 Introduction

As more and more alternative treatments become available for breast carcinoma, there is a need to stratify patients reliably [1]. Novel cancer biomarkers promise to achieve early diagnosis of cancer subtypes, optimized personalized therapy for individual patients, and assessment of risk factors for recurrence of disease [2]. To this end, markers and/or signatures consisting of multiple markers are needed that reflect not only the molecular specificities of the tumor subtype, i.e., its enzymatic make-up, but also the (genetic) specificities of the individual patient who is involuntarily hosting the tumor under treatment.

Protein markers for breast carcinoma have been amply suggested [3–6], and several protein marker candidates have been named from proteome analyses [7–15]. Established biomarkers, such as estrogen receptor and progesterone receptor, already play a significant role in the selection of patients for endocrine therapy. Human epidermal growth factor receptor 2 (HER2) is recognized as a strong predictor of response to trastuzumab [16] whereas, more recently, the role of estrogen receptor and HER2 as negative and positive indicators for chemotherapy has also been explored [17, 18]. While tumor subtype-specific markers may suggest which target(s) would be the best ones to tackle in order to eliminate the tumor most efficiently, the (genetic) host signature may give indications about individual potential side effects that have to be considered during therapy. The latter aspect is addressed, e.g., in so-called responder/non-responder analyses [19, 20].

In recent publications, emphasis has been placed on signatures [21–24] rather than single markers for disease diagnostics [22], prognostics [25], and patient stratification [26, 27]. Signatures of markers have been assembled from RNA expression profiles, e.g., the Oncotype DX assay [28, 29] and the MammaPrint profile [30, 31], which are currently used in clinical trials. Recently, investigations with differential proteome analysis [26] and immunohistochemistry [25] have suggested protein-based signatures with high potential for clinical application.

The reliability of a “proteome signature” in general depends on (1) the “ease” with which this signature can be screened in large(r) cohorts of patients, and (2) whether the suggested protein expression differences can be observed by using different technology platforms. When fulfilling the above mentioned prerequisites, a breast carcinoma proteome signature that is described in a multiparametric study ought to be able to reliably differentiate two different tissue states by predefined means in a robust manner [32, 33]. But perhaps most importantly, in order to increase the accuracy of a proteome signature, one has to take into consideration that proteins may be structurally modified upon disease-related insults. Hence, in order to investigate structural details from proteins that are suggested to contribute to our proteome signature, we analyzed protein spots in 2D gels using mass spectrometry.

Here we show that mass spectrometry enabled refining our proteome signature by identifying distinct isoforms and/or truncations of proteins as well as determining specific phosphorylation status of proteins that contribute to the proteome signature. We found that the refined proteome signature [26] can be used as well with 2D gel based differential proteome analysis of protein extracts from tumor and gland tissue. Multiparametric, quantitative monitoring of protein expression in breast carcinoma samples shows that even a set of 20 defined protein entries is capable of separating tumor samples from control tissue with high accuracy.

2 Experimental

2.1 Human Tissues and Samples

The study was approved by the Institutional Review Board. Tumor samples of invasive breast carcinoma and control samples, originating from the same breast, were taken from six postmenopausal women after modified radical mastectomy at the Women’s Hospital, Suedstadt Clinical Center, Rostock, Germany, after informed consent was given. No preoperative core biopsy was performed, and axillary lymph node involvement was detected in three cases. Tumor size ranged between 14 and 46 mm. All tumor samples had a histologic grading of 2 or 3. All tumors were classified as invasive ductal carcinoma (patients A–F); one also contained a tumor of lobular origin (patient E; see Table 1). Tissue pieces were immediately shock-frozen in liquid nitrogen and kept at – 80 °C until further work-up.

Table 1 Clinical and pathologic parameters of sample group

2.2 Protein Extraction and Sample Preparation

Proteins from fresh frozen tissues were extracted as described [34]. Briefly, frozen tissue samples were ground to powder with pestle and mortar and cooled in a liquid nitrogen bath. The powder was mixed with pre-frozen lysis buffer containing 2 M thiourea, 7 M urea, 4% CHAPS, 70 mM DTT, 0.5% Servalyte 3-10 (Serva, Heidelberg, Germany), Complete with EDTA (Roche Diagnostics, Mannheim, Germany) as Protease Inhibitor Cocktail, PMSF/Pepstatin A (Sigma, Munich, Germany), and Phosphatase Inhibitor Cocktail 1 and 2 (Sigma). Pellet suspensions were homogenized with glass beads, sonicated, and stirred six times on ice before centrifugation for 20 min at 4 °C and 13,000 rpm. Supernatants were aliquoted and stored at – 80 °C for further use. Protein concentration was determined using the Bio-Rad Protein Assay (Bio-Rad, Munich, Germany) [35, 36].

2.3 Two-dimensional Gel Electrophoresis and Gel Image Analysis

From each sample, two gels were generated. Two-dimensional gel electrophoresis was carried out as described [37]. In short, 400 μg of protein dissolved in lysis buffer was mixed with at least a 2-fold excess of rehydration buffer (8 M urea, 2% CHAPS, 16 mM DTT, 0.5% IPG buffer (pH 3–10), Complete with EDTA for protease inhibition, PMSF/Pepstatin A, and bromophenol blue) to obtain a final sample volume of 350 μL, which was loaded onto one strip. The first dimension (IEF) was operated on Immobiline Dry strips (pH 3–10, NL) in an IPGphor system (GE Healthcare/Amersham Biosciences, Freiburg, Germany). Before the second dimension, strips were first exposed for 15 min in 5 mL equilibration buffer (50 mM TRIS HCl, pH 8.8, 6 M urea, 30% glycerol, 2% SDS, bromophenol blue) containing 1% (wt/vol) DTT for reduction of disulfide bonds. Subsequently, alkylation of free sulfhydryl groups was performed for 15 min in equilibration buffer, which was supplemented with 4% (wt/vol) IAA. For the second dimension, SDS polyacrylamide gels (12% T) were run for 18 h at 100 V in a Hoefer DALT Vertical Electrophoresis System (GE Healthcare/Amersham Biosciences, Freiburg, Germany) applying the Laemmli continuous buffer system [38]. Gels were fixed and stained with colloidal Coomassie brilliant blue G-250 [39, 40]. Stained gels were scanned with the Umax Mirage II Scanner (Umax Data Systems, Willich, Germany). Progenesis PG200, ver. 2006 (Nonlinear Dynamics Ltd., Newcastle upon Tyne, UK) was used for image analysis and matching. After spot detection and matching, the spot volume distributions in each gel were examined. Gels were considered to be of good quality when a Gaussian distribution was obtained. Only gels that met this Gaussian spot volume distribution criterion were used for differential analysis. Inclusion criteria for differential 2D gel analysis were set using expression data (spot volumes) and the number of matched spots from the patients A–C.
First, those spots were included that fulfilled the two criteria: (1) at least 2-fold differentially expressed between tumor (T) and gland (G), and (2) at least three technical replicates (matched spots) in one of the groups. Second, in the case that a spot did not fulfill the above mentioned criteria of being at least 2-fold differentially expressed, we subtracted the number of technical replicates (matched spots) of one group from the other, and a spot was included when the difference of the number of matched spots between both groups was at least three.
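The two inclusion rules can be written down as a small decision function. The following is only a minimal sketch under stated assumptions: the function name `include_spot` and its parameters are hypothetical, the fold change is assumed to be the ratio of the group means of the matched spot volumes (the averaging scheme is not specified above), and all spot volumes are assumed to be positive.

```python
from statistics import mean


def include_spot(tumor_volumes, gland_volumes, min_fold=2.0, min_matches=3):
    """Decide whether a spot enters the differential 2D gel analysis.

    Each argument lists the spot volumes from the gels of one group
    (tumor or gland) in which the spot was matched, so the list length
    is the number of technical replicates (matched spots).
    """
    n_tumor, n_gland = len(tumor_volumes), len(gland_volumes)

    # Rule 1: at least 2-fold differential expression AND at least
    # three matched spots in one of the two groups.
    if n_tumor and n_gland:
        ratio = mean(tumor_volumes) / mean(gland_volumes)  # assumed fold-change definition
        fold = max(ratio, 1.0 / ratio)  # direction-independent fold change
        if fold >= min_fold and max(n_tumor, n_gland) >= min_matches:
            return True

    # Rule 2: the numbers of matched spots differ by at least three
    # between the two groups.
    return abs(n_tumor - n_gland) >= min_matches


# Hypothetical spot volumes, for illustration only.
print(include_spot([10, 12, 11], [4, 5, 5]))     # rule 1 applies
print(include_spot([10, 10, 10, 10, 10], [9, 9]))  # rule 2 applies
print(include_spot([10, 10, 10], [9, 9, 9]))     # neither rule applies
```

A spot that is matched in only one group has no defined fold change in this sketch and is therefore evaluated by the second rule alone.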

2.4 Mass Spectrometric Peptide Fingerprinting

Protein spots of interest were excised automatically using the Flexys Proteomics Picker (Genomic Solutions, Ann Arbor, MI, USA). Excised gel plugs were subjected to in-gel digestion with trypsin (Promega) [41]. Sample preparation of peptide mixture was performed on an AnchorChip 600/384 target plate [42] using α-cyano-4-hydroxy cinnamic acid (CHCA) as matrix. Peptide mixtures were analyzed with a Reflex III MALDI TOF mass spectrometer (Bruker Daltonik, Bremen, Germany) equipped with the SCOUT source and delayed extraction and operated in positive ion mode using an acceleration voltage of 20 kV [43]. Spectra were externally calibrated using a commercially available Peptide Calibration Standard (Bruker Daltonik), as well as internally recalibrated using the following peptide ion signals derived from trypsin autoproteolysis: [M + H]+ 842.51, [M + H]+ 1045.54, [M + H]+ 2211.10, [M + H]+ 2807.39. Mass spectra were further processed and analyzed with the FlexAnalysis 2.4 and BioTools 3.0 software. A peak list of singly charged peptide ion signals was generated by manual labeling without previous smoothing. Database searches were performed against an in-house SWALL database (92,353 entries, UniProt release 13.6 that consists of Swiss-Prot release 55.6 and TrEMBL release 38.6) using the Mascot software ver. 2.2.03 (Matrix Science, London, UK) with the following search parameters: taxonomy: homo sapiens, peptide tolerance: 80 ppm, fixed modifications: carbamidomethylation of cysteines, variable modifications: oxidation of methionines, 1 missed cleavage. Signals from trypsin autoproteolysis and matrix as well as tryptic peptides of common keratins were excluded from the search.
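For orientation, the relative peptide tolerance of 80 ppm used in the database search translates into an absolute mass window that grows with m/z. The sketch below illustrates this relation only; the function names and the example m/z values are hypothetical and are not taken from the search software.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative deviation of an observed m/z from its theoretical value, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=80.0):
    """True when the observed signal lies inside the ppm window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def window_da(mz, tol_ppm=80.0):
    """Half-width of the tolerance window in Da at a given m/z."""
    return mz * tol_ppm / 1e6


# Hypothetical values: at m/z 2000 the 80 ppm window is +/- 0.16 Da.
print(window_da(2000.0))
print(within_tolerance(2000.15, 2000.0))  # 75 ppm deviation, accepted
print(within_tolerance(2000.17, 2000.0))  # 85 ppm deviation, rejected
```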

2.5 Phosphopeptide Enrichment

Phosphorylated peptides were enriched using published protocols [44] with slight modifications [45]. In short, porous titanium dioxide immobilized to the inner surface of pipette tips (NuTip; Glygen, Columbia, MD, USA) was conditioned five times with 40% ACN, 0.1% TFA. Five μL of peptide mixture were mixed with 2.5 μL of 60 mg/mL 2,5-dihydroxy benzoic acid (DHB) in 80% ACN, 0.1% TFA, and loaded onto the titanium dioxide material by 50 cycles of aspiration and dispensing. Then, the column was washed once with 10 μL of 20 mg/mL DHB in 40% ACN, 0.1% TFA and once with 10 μL of 40% ACN, 0.1% TFA. Bound peptides were eluted in 3 μL of 0.25% NH4OH by 20 cycles of aspiration and dispensing.

2.6 Mass Spectrometric Peptide Sequencing

For mass spectrometric peptide sequencing 0.5 μL of the peptide mixture were prepared on an AnchorChip 600/384 target plate using 2.0 μL of DHB as matrix [5 mg/mL DHB in ACN/0.1% TFA in water (33/67 vol/vol)]. MS/MS spectra were acquired on an Axima MALDI QIT TOF mass spectrometer (Shimadzu Biotech, Manchester, UK) in positive ion mode utilizing a 337 nm nitrogen laser, and a 3D quadrupole ion trap supplied by a pulsed helium flow for cooling and argon gas to cause collisionally induced dissociation [46]. Spectra were externally calibrated with a manually mixed peptide standard consisting of bradykinin (1-7), [M + H]+ 757.39; angiotensin II, [M + H]+ 1046.53; angiotensin I, [M + H]+ 1296.68; bombesin, [M + H]+ 1619.81; N-acetyl renin substrate, [M + H]+ 1800.93; ACTH (1-17), [M + H]+ 2093.08; ACTH (18-39), [M + H]+ 2465.19; somatostatin, [M + H]+ 3147.46; insulin (oxidized β chain), [M + H]+ 3494.64. About 1000 profiles were summed and further processing and analysis of the MS/MS spectra was performed with the Launchpad software, ver. 2.8.4 (Shimadzu Biotech).

2.7 Western Blot Analysis

Western blotting was performed as previously described [26]. Here, 35 μg of protein in lysis buffer was loaded onto a NuPAGE Novex 12% Bis-Tris Gel (Invitrogen, Carlsbad, CA, USA) and separated by SDS-PAGE. After electrophoretic separation, proteins were blotted onto a PVDF membrane (Immobilon; Millipore, Schwalbach, Germany) by semi-dry blotting. Blotting took place with 1.2 mA per cm2 gel for about 1.5 h. After blocking (TBS, 5% non-fat dry milk powder, 1% BSA, 0.02% thimerosal) for 2 h at room temperature, blotted proteins were probed with monoclonal mouse antibodies against nucleoside diphosphate kinase A (1:100; AbD Serotec, Duesseldorf, Germany), pyruvate kinase isozyme M2 (1:1000; ScheBo Biotech AG, Giessen, Germany), glyceraldehyde 3-phosphate dehydrogenase (1:5000; Abcam, Cambridge, MA, USA), and β-actin (1:4000; Sigma-Aldrich, Munich, Germany) for 15 h at 4 °C, respectively. Washing (TBS, 0.05% Tween 20, 0.1% BSA, 0.02% thimerosal) was followed by the second antibody reaction [biotin-SP-conjugated goat anti-mouse IgG + IgM (1:20,000; Jackson ImmunoResearch Laboratories, Inc., West Grove, PA, USA)] for 1 h at room temperature. Another washing step was followed by reaction with streptavidin peroxidase (1:10,000; Jackson ImmunoResearch Laboratories Inc./Dianova) for 1 h at room temperature. Detection was performed using the SuperSignal West Pico Chemiluminescent Substrate (Thermo Fisher Scientific/Pierce).

2.8 Bioinformatic Analysis

Unsupervised principal component analysis (PCA) [47] was performed with Matlab ver. 7.3.0 (R2006b), The MathWorks, Inc., Natick, MA, USA, as previously described [26]. Briefly, the log2-transformed normalized values were used after setting missing values to the smallest detected concentration in those cases where a protein was not identified in the sample. The first two PCs were selected to project the data into a subspace useful for visualization. Graphical representation was realized using the Origin software (ver. 6.1; OriginLab Corporation). Normalized values of selected spots are graphically represented in heat maps. Hierarchical clustering was performed based on the complete linkage method and Spearman's correlation coefficient as a measure of similarity. Signal intensities were centered and scaled row-wise for visualization purposes [48].
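The preprocessing and projection steps can be illustrated with a short NumPy sketch. This is not the Matlab code used in the study: the function name `pca_scores` is hypothetical, the data are invented toy values, missing values are assumed to be replaced by the smallest detected value of the whole data matrix, and PCA is computed by singular value decomposition of the column-centered matrix. Hierarchical clustering is not covered here.

```python
import numpy as np


def pca_scores(values, n_components=2):
    """Project samples onto the first principal components.

    values: rows are samples, columns are proteins (or spots);
            NaN marks a protein that was not detected in a sample.
    Returns the sample scores and the explained variance fractions.
    """
    x = np.asarray(values, dtype=float)
    # Assumption: missing entries get the smallest detected value overall.
    x = np.where(np.isnan(x), np.nanmin(x), x)
    x = np.log2(x)
    # Center every protein (column) before the decomposition.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Invented toy data: three tumor samples followed by three gland samples.
toy = [
    [800, 1600, 100, 50],
    [900, 1500, 120, np.nan],
    [850, 1700, 90, 60],
    [100, 200, 110, 55],
    [120, np.nan, 95, 45],
    [90, 210, 105, 50],
]
scores, explained = pca_scores(toy)
print(scores[:, 0])  # PC1 scores; tumor and gland samples carry opposite signs
print(explained)     # fraction of variance captured by PC1 and PC2
```

The sign of a principal component is arbitrary, so only the separation of the two groups along PC1 is meaningful, not which group lies on the positive side.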

3 Results

3.1 Patients and Samples

Tissue samples from healthy breast (gland) and tumor were collected from six patients with breast cancer upon full removal of the gland. Histologic tumor grade was determined using an established scoring system (Table 1) [49]. Patients A–F were diagnosed with invasive ductal carcinoma. Patient E suffered from lobular carcinoma simultaneously. Patient C showed involved axillary lymph nodes with an underlying triple-negative tumor [50]. Axillary lymph node involvement was detected also in patients D and E. Perioperative diagnostic procedures ruled out distant metastatic disease for all patients.

As tumors were of different sizes at the time of surgery, we decided to cut the tissue samples into pieces of approximately 10 mm³ after pathologic evaluation. Such tissue pieces from tumor and control samples were then subjected to protein extract preparation using standardized procedures.

3.2 Global Proteome Analysis Results and Confirmation by Western Blotting

Two-dimensional gel analysis of the protein extracts of patient samples A–C, the same patient samples that were subjected to LC-MSE experiments previously [26], led to the assignment of 1203 well-reproducing spots in duplicate gels per sample (total number of gels was 12). These spots were assembled in a reference gel (Supplemental Figure 1, which can be found in the electronic version of this article). Next, spots were selected for peptide mass fingerprint analysis such that differentially expressed spots (reproducibly differential in at least two patient sample pairs) as well as so-called landmark spots were included. These analyses amounted to 208 non-redundant protein identifications, which were present in 348 spots (Supplemental Figure 1, Supplemental Table 1). In selected cases, mass spectrometric partial amino acid sequence analysis was performed as well (see below).

Comparing the protein identification list of the 2D-gel analysis with our previously published proteome signature showed that 20 of the 208 proteins were contained in both lists (Table 2). These 20 proteins were found in 38 spots (Figure 1), which were (1) differential in abundance between the tumor and control samples, and (2) followed the inclusion criteria for differential 2D gel analysis (see the Experimental section).

Table 2 Protein identification results of differentially abundant spots belonging to proteins from the predefined proteome signature
Figure 1

Image of a 2-DE reference gel from tumor tissue. About 1200 separated spots are visualized. Labeled spots are a subset of excised and in-gel digested spots that belong to proteins from the proteome signature [26]. For protein identification see Table 2. 400 μg protein was loaded on Immobiline strips (pH 3–10 NL) that were used for the first dimension. SDS-PAGE (12% T) was applied for the second dimension. Coomassie brilliant blue (CBB) staining was used

For example, the differential abundance of the proteins glyceraldehyde-3-phosphate dehydrogenase (GAP-DH), pyruvate kinase M2 (PKM2), nucleoside diphosphate kinase A (NDKA), and nucleophosmin was shown in patients A–C by 2D gel analysis, consistent with previous label-free quantitation analyses using nano-LC-MSE [26]. In accordance with the analyses with samples from patients A–C, matching protein spot abundance differences were observed in 2D gel pairs derived from samples of patients D–F, again confirming the previous findings.

Next we intended to confirm protein expression differences of proteins of interest, for which suitable antibodies were available, using immunoanalytical methods. Western blot analyses with three selected proteins were performed with materials from patients A–F (Supplemental Figure 3). The antibody for nucleophosmin was not suitable for Western blot analysis. For all three proteins (NDKA, GAP-DH, and PKM2) the Western blots showed higher expression in the tumor samples than in the gland tissue (control). The only protein for which Western blot bands were observed in the control samples was GAP-DH, but generally as weaker bands than those in the tumor samples.

It should be noted that Western blot analysis is not quantitative and the presence of post-translational modifications or structural differences due to alternative splicing etc. are normally not displayed by these experiments. By contrast, 2D gel image analysis in combination with mass spectrometric peptide mapping of differentially expressed proteins gives, all in all, more information on protein structures and/or structural differences. Due to the fact that one protein can migrate to more than one location in the 2D gel (e.g., GAP-DH, see above) mass spectrometric peptide mapping allows the analysis of the respective isoforms or post-translational modifications.

3.3 Mass Spectrometric Protein Structure Characterization

For protein structure analysis of 2D gel separated proteins, we applied mass spectrometric peptide mapping, focusing on the four proteins mentioned above as their protein spots contributed significantly to our pre-established proteome signature (cf. Supplemental Figure 2). For example, the MALDI spectrum (Figure 2a) of the peptide mixture derived from protein in spot 492 (cf. Figure 1) showed intense peptide ion signals that were assigned to the respective amino acid sequence of nucleoside diphosphate kinase A (NDKA; P15531). NDKA-specific ion signals were found at m/z 1149.64 [amino acid (aa) 57-66] and at m/z 2082.02 (aa 40-56). These ion signals matched only to the NDKA sequence but not to the nucleoside diphosphate kinase B (NDKB; P22392) sequence, a related protein with high sequence homology to NDKA. The sequence coverage for NDKA was 85%, resulting in an identification score of 229. Clearly, NDKA was identified in spot 492 that has been found to be differentially regulated (cf. Supplemental Figure 2 and Table 2) such that it was well presented in the gels from the tumor samples but very faintly seen in those from the control (gland) tissue.

Figure 2

MALDI TOF MS peptide mass fingerprint (A) of NDKA. Selected peptide ion signals are labeled. Numbers in parentheses indicate partial sequence ranges. Peptide ion signals that are distinctive for NDKA are printed in bold. T: signals of trypsin autoproteolysis products. M: signals from matrix. + Na: sodium adducts. Mox: oxidation at methionine residues. Peptide ion with signal at m/z 2082.02 was subjected to MALDI QIT TOF MSn sequencing (B). The determined partial amino acid sequence is depicted and was assigned to a NDKA peptide comprising amino acids 40–56. Amino acid residues are depicted in single letter code. The mass spectrometric fragment ions from the Y"-type ion series and the B-type ion series as well as selected internal fragments and typical signals for Mox residues are indicated. Magnification factor is given. DHB was used as matrix

To verify peptide assignments of selected ion signals to partial sequences, these peptides were subjected to mass spectrometric sequencing, which confirmed their assignment (Figure 2b). The ion signal at m/z 2082.65 yielded intense B-type and Y"-type ion series that confirmed the peptide to originate from NDKA. The respective sequence part (aa 40-56: FMQASEDLLKEHYVDLK) is unique for this protein and clearly differentiates it from the respective partial sequence derived from NDKB. Sequencing also confirmed the oxidized methionine residue in this peptide.

The presence of pyruvate kinase M2 (PKM2; P14618) but not pyruvate kinase M1 (P14618) in spot 624 was proven by the MALDI peptide mapping result (Figure 3a), as some ion signals [e.g., ion signal at m/z 1931.95 (aa 384-399) and ion signal at m/z 2088.05 (aa 384-400)] could only be attributed to the respective partial sequences of PKM2.

Figure 3

MALDI TOF MS peptide mass fingerprint (A) of PKM2. Selected peptide ion signals are labeled. Numbers in parentheses indicate partial sequence ranges. Peptide ion signals that are distinctive for PKM2 are printed in bold. T: signals of trypsin autoproteolysis products. K: signals from known keratin contamination. Mox: oxidation at methionine residues. *Unidentified ion signal. Peptide ion with signal at m/z 1931.99 was subjected to MALDI QIT TOF MSn sequencing (B). The determined partial amino acid sequence is depicted and was assigned to a PKM2 peptide comprising amino acids 384–399. Amino acid residues are depicted in single letter code. The mass spectrometric fragment ions from the Y"-type ion series and the B-type ion series are indicated. DHB was used as matrix

Again, mass spectrometric sequencing of the respective ion signals confirmed correct assignments, as shown for the sequencing result of ion signal at m/z 1931.95 (Figure 3b). Intense B-type and Y"-type ion series confirmed that the peptide originated from PKM2 as this sequence part (aa 384-399: EAEAAIYHLQLFEELR) is unique for this protein form and clearly differentiates it from the respective partial sequence of PKM1.

Note that spot 624 was higher in abundance in the tumor compared with the control samples (cf. Supplemental Figure 2 and Table 2). It is of importance to make precise protein isoform assignments in differentially abundant protein spots as in some cases only the respective protein isoforms have been reported to be differentially expressed and, more importantly, show different functions in tumor (see the Discussion section).

Similarly, molecular details of more abundant proteins, such as glyceraldehyde-3-phosphate dehydrogenase (GAP-DH, P04406), also enhance the understanding of different disease states. Glyceraldehyde-3-phosphate dehydrogenase (GAP-DH, P04406) was identified in three neighboring spots (cf. Figure 1 and Supplemental Figure 2; spots 348, 350, and 373) showing differential expression between tumor and control tissue. Mass spectrometry revealed that the separation of spot 373 from spots 348/350 in the second dimension is caused by the loss of the N-terminal part. Ion signals at m/z 805.43 (aa 6-13) and 2114.08 (aa 62-80), covering this sequence part, were clearly identified in spectra from spots 348 and 350 (Figure 4a) but were missing in the spectra from spot 373 (Supplemental Table 2). The most N-terminal peptide ion signal in this spot was found at m/z 1613.92 (aa 67-80). This and further ion signals were present in all three spots in which GAP-DH was identified. Loss of the N-terminus results in an observable migration shift to lower apparent masses, consistent with the mass spectrometric results (Supplemental Figure 2).

Figure 4

MALDI TOF MS peptide mass fingerprint (A) of GAP-DH from spot 350. Selected peptide ion signals are labeled. Numbers in parentheses indicate partial sequence ranges. Peptide ion signals that show unusual isotopic patterns are indicated. The inserts show blow-ups for ion signals at m/z 1613.92 (left) and m/z 1763.83 (right). T: signals of trypsin autoproteolysis products. Mox: oxidation at methionine residues. *Unidentified ion signal. Peptide ion pair with signals at m/z 1613.92/1614.93 was subjected to MALDI QIT TOF MSn sequencing. Selected Y"-ion signals of MALDI QIT TOF MSn sequencing of signal at m/z 1613.92/1614.93 of GAP-DH in spot 350 (B) and of signal at m/z 1613.92 of GAP-DH in spot 373 (C). Fragment ion signals are labeled. DHB was used as matrix

Perhaps even more interesting, inspection of the isotopic patterns of some ion signals revealed further differences in the peptide maps between the protein spots. Ion signals in spectra from spots 348 and 350 at m/z 1613.92 and 1763.83 showed isotope patterns that did not match the calculated isotopic distribution and indicated the presence of a mixture of peptides with a mass difference of 1 u. The ion signal at m/z 1613.92 (LVINGNPITIFQER) was accompanied by an overlapping ion signal at m/z 1614.93. As the respective sequence contains two asparagine residues (N70 and N72), a single deamidation event could give rise to either LVINGDPITIFQER or LVIDGNPITIFQER. Similarly, the ion signal at m/z 1763.83 was associated with an ion signal at m/z 1764.83. Inspection of the respective partial sequences suggested deamidation of asparagine residue 316 to aspartic acid.

Mass spectrometric sequencing was used for identification of the presumed deamidation sites. Hence, when spot 350 was analyzed, the ion signal pair at m/z 1613.92/1614.93 yielded MS/MS fragmentation spectra in which the Y"10 ion at m/z 1174.6 showed no abnormalities in the isotope distribution (Figure 4b). This result is consistent with the presence of an asparagine residue in position 72 of the GAP-DH sequence (LVIN / DGNPITIFQER). This fragment ion was also seen in fragmentation spectra with the ion at m/z 1613.92, belonging to the sequence (LVINGNPITIFQER), as precursor when spot 373 was investigated (Figure 4c). It is of note that this fragment ion signal was by far the most intense ion signal, which is consistent with the fact that the D70-G71 bond broke. In the alternative case, a bond breakage between N70 and G71 should not lead to a fragment ion with very high intensity in comparison to the other ions in the spectrum.

By contrast, in material from spot 350, the Y"11 ion at m/z 1288.78 was found to be accompanied by a fragment ion signal at m/z 1289.74 (Figure 4b). This result showed the presence of two related sequences: sequence LVINGNPITIFQER was accompanied by sequence LVIDGNPITIFQER. Comparison with the fragmentation spectrum derived from spot 373 showed that the Y"11 ion at m/z 1289.69 showed normal isotope distribution and, therefore, the presence of just one sequence (LVINGNPITIFQER).
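The expected m/z values behind this reasoning can be recomputed from monoisotopic residue masses. The sketch below is an illustration only, not part of the original analysis: the function names are hypothetical, the residue masses are standard monoisotopic values rounded to five decimals, and only singly protonated ions are considered.

```python
# Monoisotopic residue masses (Da) for the residues occurring in the peptide.
RESIDUE_MASS = {
    "G": 57.02146, "P": 97.05276, "V": 99.06841, "T": 101.04768,
    "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def y_ion_mz(peptide, n):
    """m/z of the singly protonated Y''-type ion with the n C-terminal residues."""
    return sum(RESIDUE_MASS[aa] for aa in peptide[-n:]) + WATER + PROTON


def peptide_mz(peptide):
    """[M + H]+ of the intact peptide (equals the Y'' ion of full length)."""
    return y_ion_mz(peptide, len(peptide))


native = "LVINGNPITIFQER"      # aa 67-80, N70 intact
deamidated = "LVIDGNPITIFQER"  # N70 converted to D70

for name, seq in (("native", native), ("deamidated", deamidated)):
    print(name,
          round(peptide_mz(seq), 2),    # precursor: approx. 1613.90 vs 1614.88
          round(y_ion_mz(seq, 10), 2),  # Y''10: approx. 1174.62 in both cases
          round(y_ion_mz(seq, 11), 2))  # Y''11: approx. 1288.66 vs 1289.65
```

Because Y"10 (GNPITIFQER) does not contain residue 70, it is identical for both sequences, whereas the precursor and Y"11 shift by about 0.98 u upon deamidation of N70. This is the pattern described above for spot 350.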

Similar results were also found for the other deamidation events in GAP-DH showing partial deamidation of asparagine residues at positions 9, 64, and 316 (cf. Supplemental Table 2) with different abundances in the respective protein spots. GAP-DH in spot 348 was partially deamidated at positions 9, 64, 70, and 316, whereas deamidation of GAP-DH in spot 350 was observed at positions 9, 70, and 316. Note that the GAP-DH in spot 373 lacked asparagine residues 9 and 64 and showed no evidence for deamidation at any of the above mentioned asparagine residues.

Protein spot-derived proteolytic peptides also harbor the information for determining post-translational modifications, such as phosphorylation. As an example, the phosphorylation status of a nucleophosmin-derived peptide was determined and showed the presence of two partially phosphorylated serine residues (Figure 5a).

Figure 5

MALDI TOF MS peptide mass fingerprint (A) of nucleophosmin after phosphopeptide enrichment. Selected peptide ion signals are labeled. Numbers in parentheses indicate partial sequence ranges. Ion signals that correspond to phosphopeptides are printed in bold. T: signals of trypsin autoproteolysis products. –NH3 indicates loss of ammonia. Peptide ion with signal at m/z 3362.30 was subjected to MALDI QIT TOF MSn sequencing (B). The determined partial amino acid sequences are depicted and were assigned to a nucleophosmin peptide comprising amino acids 104–134 with partial phosphorylation at serine residues 106 and 125, respectively. Amino acid residues are depicted in single letter code. dA marks a dehydroalanine residue. The mass spectrometric fragment ions from the B-type ion series as well as selected internal fragments and typical losses of water are indicated. *Partial amino acid sequence with dA in position 125. DHB was used as matrix

The mass spectrum of the peptide mixture from the protein (nucleophosmin) in spot 328 upon TiO2 chromatography for enrichment of phosphopeptides showed peptide ion signals that were present either due to phosphorylation or due to the acidic character of the peptides. Interestingly, in the MALDI QIT TOF mass spectrum (Figure 5a), the peptide ion signal at m/z 3460.28 (aa 104-134) was accompanied by an ion signal at m/z 3362.30, indicating loss of phosphoric acid (neutral loss; – 98 Da) and formation of a dehydroalanine residue (dA). Both peptide ion signals showed, in addition, loss of ammonia (–17 Da) in this analysis. As the MS/MS spectrum of the ion signal at m/z 3460.30 gave only very few fragment ion signals, the ion signal at m/z 3362.30 was fragmented as well.

Now, mass spectrometric sequencing (Figure 5b) was possible due to intense B-type ion signals. The fragment ion series revealed that the precursor ion signal harbored two closely related sequences that differed only in the position of the dehydroalanine residue. The first partial sequence reads CGdAGPVHISGQHLVAVEEDAESEDEEEEDVK with the dehydroalanine residue in position 106. The second sequence shows the dehydroalanine residue in position 125, reading CGSGPVHISGQHLVAVEEDAEdAEDEEEEDVK (marked with * in Figure 5b). The most indicative ion signal for the presence of two related partial sequences was the B21 ion signal pair that differed in mass by 18.08 Da. Note that B22 and higher numbered B-type ions have the same mass for both sequences whereas the B-type ions with lower numbers than B21 are all expected to be 18 Da apart from each other. These ion doublets were observed down to the B14 ion (marked with *). The MS/MS analysis clearly showed that partial phosphorylation took place at serine residues 106 and 125 but not at serine residue 112. Functional consequences of these phosphorylation events in nucleophosmin were not investigated. Phosphorylation on serine 70 could not be confirmed by MS/MS analysis because fragmentation of the ion signal at m/z 2225.08 (cf. Figure 5a) gave too few fragment ion signals. Hence, the appearance of the ion signal at m/z 2225.08 upon phosphopeptide enrichment remains the only piece of evidence for phosphorylation at Ser-70 in our experiments.
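The expected pattern of B-ion doublets follows directly from the two candidate positions of the dehydroalanine residue. The following sketch recomputes it under stated assumptions: the helper names are hypothetical, cysteine is taken as carbamidomethylated (the fixed modification used in the database search), and dehydroalanine is modeled as serine minus water.

```python
# Monoisotopic residue masses (Da); "C" is carbamidomethylated cysteine and
# "dA" is dehydroalanine (serine minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "I": 113.08406, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "H": 137.05891,
    "C": 160.03065, "dA": 69.02147,
}
PROTON = 1.00728
PEPTIDE = "CGSGPVHISGQHLVAVEEDAESEDEEEEDVK"  # nucleophosmin aa 104-134
FIRST_RESIDUE = 104


def with_dehydroalanine(position):
    """Return the residue list with the serine at `position` replaced by dA."""
    residues = list(PEPTIDE)
    index = position - FIRST_RESIDUE
    if residues[index] != "S":
        raise ValueError(f"residue {position} is not a serine")
    residues[index] = "dA"
    return residues


def b_ion_mz(residues, n):
    """m/z of the singly protonated B-type ion with the n N-terminal residues."""
    return sum(RESIDUE_MASS[r] for r in residues[:n]) + PROTON


variant_106 = with_dehydroalanine(106)
variant_125 = with_dehydroalanine(125)

# Mass gap between the two variants for every B ion.
doublet_gap = {
    n: b_ion_mz(variant_125, n) - b_ion_mz(variant_106, n)
    for n in range(1, len(PEPTIDE))
}
for n in (14, 21, 22):
    print(f"B{n}: gap = {doublet_gap[n]:.2f} Da")
```

In this calculation, B ions that contain residue 106 but not residue 125 (B3 to B21) form doublets about 18.01 Da apart, whereas B22 and larger ions coincide. This agrees with the doublets observed from B21 down to B14.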

Post-translational modifications, such as phosphorylation, acetylation, and deamidation, as well as the presence of different isoforms, caused mainly by alternative splicing or by truncations, can alter the electrophoretic mobility of a protein significantly. As we showed here, analyzing the mass spectra of a protein in its related 2D gel spots in detail led to the detection of such protein modifications, which mostly remain unresolved in global proteome analyses. Precise knowledge of protein structure details is, however, of importance when defining proteome signatures that may be applied for patient screening.

3.4 Patient Sample Classification via Protein Spot Abundance Differences

To test whether the sub-signature of 38 protein spots (20 proteins) that resulted from the 2D gel analysis performed well in differentiating tumor from gland tissue, we first performed a principal component analysis (PCA) using the respective LC-MSE data set (patients A–C, Supplemental Figure 4a). The result resembled that of the original signature (which contained peptides from 60 proteins) quite well. Again, PCA separated the tumor samples from the controls (gland) along the PC1 axis, indicating that the major discriminating proteins were maintained in this subset of proteins. Also, tumor samples clustered more closely together than control samples, suggesting that the remaining protein expression values were more tumor-related and contained less information on inter-individual differences.

More importantly, similar results were obtained with the 2D gel data set, i.e., using the spot volumes of the 38 spots (which represented the 20 proteins of the sub-signature) from samples belonging to patients A–C (Supplemental Figure 4b). Although the samples appeared slightly more dispersed, tumor and control samples were still clearly separated along the PC1 axis.
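The kind of PCA described above can be illustrated with a small, self-contained sketch. It does not reproduce the analysis or the data of this study: the spot volumes are synthetic placeholder values drawn from a seeded random generator, the assumption that half of the 38 spots differ between tumor and gland is arbitrary, and only the first principal component is computed (by power iteration). All names are hypothetical.

```python
import random


def center_columns(matrix):
    """Subtract the column mean from every entry (mean-centering per spot)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def first_principal_component(centered, iterations=200):
    """Power iteration on X^T X; returns (loading vector, PC1 scores).

    The all-ones start vector is a simplification; it would fail only if
    it were exactly orthogonal to the true first component.
    """
    n_features = len(centered[0])
    loading = [1.0 / n_features ** 0.5] * n_features
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, loading)) for row in centered]
        updated = [sum(s * row[j] for s, row in zip(scores, centered))
                   for j in range(n_features)]
        norm = sum(v * v for v in updated) ** 0.5
        loading = [v / norm for v in updated]
    scores = [sum(x * w for x, w in zip(row, loading)) for row in centered]
    return loading, scores


# Synthetic placeholder data (NOT measured values): 6 tumor (T) and
# 6 gland (G) gels with 38 spots each; 19 spots are shifted in tumor.
rng = random.Random(0)
N_SPOTS = 38
labels = ["T"] * 6 + ["G"] * 6
data = []
for label in labels:
    row = [rng.gauss(0.0, 0.3) for _ in range(N_SPOTS)]
    if label == "T":
        row = [v + 2.0 if j < 19 else v for j, v in enumerate(row)]
    data.append(row)

centered = center_columns(data)
loading, pc1_scores = first_principal_component(centered)

# Fraction of the total variance captured by PC1.
total_ss = sum(v * v for row in centered for v in row)
pc1_fraction = sum(s * s for s in pc1_scores) / total_ss

tumor_scores = [s for s, l in zip(pc1_scores, labels) if l == "T"]
gland_scores = [s for s, l in zip(pc1_scores, labels) if l == "G"]
separated = (max(gland_scores) < min(tumor_scores)
             or max(tumor_scores) < min(gland_scores))
print(f"PC1 explains {pc1_fraction:.0%} of variance; separated: {separated}")
```

In practice, a full PCA implementation from an established statistics package would be preferable; the sketch only shows why a consistent abundance shift in a subset of spots places tumor and control samples at opposite ends of the PC1 axis.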

In agreement with the PCA results, hierarchical clustering clearly separated the tumor from the control samples using both the LC-MSE data with 20 protein entries (Supplemental Figure 5a) and the 2D gel spot volumes with 38 entries (Supplemental Figure 5b).
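The principle of hierarchical clustering can be sketched as follows. The distance metric and linkage rule are not stated in this section, so Euclidean distance and average linkage are assumptions made for illustration only. The four toy samples and their three "spot volumes" are invented placeholder values, and all names are hypothetical.

```python
def euclidean(a, b):
    """Euclidean distance between two abundance vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage_clusters(samples, n_clusters=2):
    """Agglomerative clustering with average linkage.

    `samples` maps a sample label to its abundance vector. Clusters are
    merged pairwise until `n_clusters` remain; returns a list of label lists.
    """
    clusters = [[label] for label in samples]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                dists = [euclidean(samples[i], samples[j])
                         for i in clusters[a] for j in clusters[b]]
                mean_dist = sum(dists) / len(dists)
                if best is None or mean_dist < best[0]:
                    best = (mean_dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy placeholder data (NOT measured values): T = tumor, G = gland.
toy_samples = {
    "T1": [5.0, 5.2, 1.0],
    "T2": [5.1, 4.9, 1.2],
    "G1": [1.0, 1.1, 4.8],
    "G2": [0.9, 1.3, 5.1],
}

clusters = average_linkage_clusters(toy_samples, n_clusters=2)
print(clusters)
```

Cutting the tree at two clusters corresponds to asking whether the top-level split coincides with the tumor/control labels, which is the criterion applied to the clustering results reported here.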

The retrospectively designed proteome signature derived from the LC-MSE platform was at least in part transferable to the 2D gel-based spot analysis technology, and the resulting sub-signature performed equally well in separating the tumor from the control samples. These findings encouraged us to include more samples. The following investigations on additional samples can be regarded as prospective because the separation rules were applied unchanged.

The three additional samples, derived from patients diagnosed with invasive ductal carcinoma (patients D–F), were worked up and analyzed by 2D gel electrophoresis in exactly the same way as the samples from the first three patients. Protein spot matching showed that these latter 2D gels were well comparable to the first set of 2D gels. The required 38 protein spots were found in all gels, showing satisfactory reproducibility. Hierarchical clustering (Figure 6) of the spot volumes from the 38 selected protein spots for all patients (A–F) was performed with the duplicate gel information and again showed that the separation of tumor and control samples remained accurate, with only one exception.

Figure 6

Hierarchical clustering analysis with spot volumes of 38 protein spots from 2D gel data that belong to the proteome signature for patients A–F with invasive ductal carcinoma. The duplicate spot volume values from the 2D gels yield clear separation of the tumor samples from the control (gland) tissues for invasive ductal carcinoma, except for sample FG2. Tumor samples (T) and control samples (G) are shown in single lanes. Protein spot numbers are indicated at the right.

Note that all tumor samples were clustered correctly in the tumor group. Only the sample duplicate FG2 was erroneously sorted to the tumor samples. All other control (gland) measurements clustered in the control group. Such high accuracy of separation is considered satisfactory with respect to identifying tumor samples and again shows the reliability of the sub-signature.

4 Discussion

To our knowledge, this is the first report of a follow-up study estimating the robustness of a predefined proteome signature for invasive ductal carcinoma. At the same time, we show that the information contained in a sub-signature of 20 proteins maintains strong separation power, even when a different methodology is used for determining differential protein abundance. Comparing the list of proteins from our proteome signature with those published as "repeatedly identified proteins" [51, 52], it can be seen that only five of the 20 proteins of the sub-signature fall into the categories of proteins whose abundance differences are thought to be merely "stress-related" (Table 3).

Table 3 List of proteins from predefined signature found in differential 2D gel analysis

It has to be mentioned that in this comparison, information on protein structure alterations is not considered. The fact that one and the same protein was identified in more than one spot and that only some of these spots showed differences in abundance (according to our criteria for differential expression) was clearly pointing towards the presence of distinct protein forms and/or modifications.

Today, most biomarker studies put emphasis on high-throughput methods for identification of proteins from complex protein mixtures for setting up proteome signatures. Protein structure details are often not determined due to a lack of modern database designs and/or sophisticated search software. More advanced tools are needed that contain information about PTMs in such a way that this information is easily and automatically accessible to the MS user. In the majority of cases, it is still necessary to manually analyze the spectra for interpretation of unassigned signals and to perform further mass spectrometric analysis using MS/MS methods to identify or validate PTMs and/or protein isoforms. In addition, information on such modifications gets lost, particularly if the different isoforms do not have their own entries in the databases. However, a detailed analysis of single proteins concerning PTMs and isoforms is absolutely necessary because they can change protein function and physicochemical properties, for example, in a tumorous tissue compared with the normal state.

Some protein examples from our proteome signature for invasive ductal carcinoma are discussed in the literature in the context of their potential roles as risk markers for primary tumor aggressiveness, metastasis power, and for disease prognosis. Notably, the M2 splice isoform from pyruvate kinase has been reported to be important for cancer metabolism and tumor growth [53, 54]. Its phosphorylation through tyrosine kinases has been demonstrated [53, 55, 56] as being crucial in this respect. Its quantities have been measured and were found to be up-regulated in tumors [57], which is consistent with our findings. Phosphorylated pyruvate kinase M2 is responsible for metabolic changes in tumor cells such that glucose metabolites are directed away from the citrate cycle into anabolic processes (Warburg hypothesis) [58, 59]. It has also been reported that pyruvate kinase M2 plays an important role in caspase-independent cell death of tumor cells, providing a novel target for cancer therapy [60].

Similarly, nucleophosmin is frequently overexpressed, mutated, rearranged, and deleted in human cancer. Traditionally regarded as a tumor marker and a putative proto-oncogene, it has now also been attributed with tumor-suppressor functions [61]. Nucleophosmin has been reported as up-regulated in anti-estrogen-resistant breast tumor cells (MCF-7/LCC9), where it functions as a nucleolar phosphoprotein inhibitor of the putative tumor suppressor interferon regulatory factor-1 [62, 63]. Nucleophosmin expression blocks cellular growth and proliferation, whereas phosphorylation of Thr198 is not essential for nucleophosmin's capacity to drive cell cycle progression and proliferation [64]. Interestingly, anti-nucleophosmin autoantibody levels increase significantly between diagnosis and 6 months before recurrence in recurrent patients, suggesting that anti-nucleophosmin autoantibody levels may function as a simple serum biomarker for predicting the timing of recurrence and for monitoring response to endocrine manipulations in breast cancer patients [65].

Another protein that has been found consistently up-regulated in tumor samples was GAP-DH, a glycolytic enzyme that is also involved in the early steps of apoptosis, nuclear tRNA export, DNA replication, DNA repair, and transcription [66]. This protein, when considered as a “housekeeping” protein, might serve as a useful marker for a tissue with high metabolism as opposed to a tissue with low overall metabolism. Clearly, mammary gland tissue decreases its metabolic activity after menopause and stands in contrast to tumor tissue in this respect. However, GAP-DH expression has been described as being dynamic, e.g., affected by hypoxic stress [67]. It also was reported that GAP-DH accumulates in the nucleus after cells were treated with genotoxic drugs, such as mercaptopurine or thioguanine [68, 69]. Interestingly, when overexpressed, GAP-DH has been found to function as interaction partner of NDKA causing a change in NDKA substrate specificity. The newly formed protein complex functions as a histidine kinase on a number of proteins that participate in signal transduction pathways [70, 71]. It remains to be elucidated whether the literature reports refer to full-length GAP-DH or to a GAP-DH with altered length as was found in our studies.

NDKA expression was observed to be up-regulated above the threshold of detection in the tumor samples, in agreement with literature reports [72, 73]. However, NDKA expression was found not to correlate with primary tumor growth but with potential for metastasis [74, 75]. An inverse relation between NDKA amount and risk of metastasis development has been described [74, 76]. It has been found that lack of NDKA expression in tumor cells increases cell motility and, as a consequence, metastasis potential [77]. At least three mechanisms are thought to contribute to the metastasis-suppressive effects of NDKA: (1) its histidine kinase activity; (2) binding proteins that titer out “free” NDKA and inhibit its ability to suppress metastasis; and (3) altered gene expression downstream of NDKA [75, 78–80]. For instance, regulation of p53 mediated by NDKA [81] and, vice versa, regulation of NDKA by p53 [81] have been reported to occur in the tumor, ultimately influencing its metastasis potential. The intriguing roles of metastasis suppressor proteins and the signal pathways they are involved in have led to various attempts to potentiate their effects. Unfortunately, histidine phosphorylation is poorly studied in mammalian cells because histidine phosphates are labile, owing to their high-energy bonds and fast kinetics, which renders them invisible in most analytical assays. In addition, no antibody against phosphohistidine is available [75]. By contrast, histidine kinases are well known in bacteria and lower eukaryotes, in which they form a major signal transduction pathway responding to extracellular events (two-component signal transduction systems) [82, 83]. In model studies with human breast carcinoma cells, it could be shown that the metastasis potential was decreased upon treatment with compounds that were capable of increasing NDKA expression [84]. Recently, pharmaceutical companies have identified histidine kinase inhibitors as a next-generation approach to antibiotic-resistant bacterial infection [85], and these technologies and agents may be readily transferable to the cancer field once anti-metastatic activity is proven. Note that some small molecules have already been developed with the intention to restore the expression or mimic the function of metastasis-suppressor genes, such as NDKA, and some of them are in clinical trials [86].

Pyruvate kinase M2 as well as GAP-DH and lactate dehydrogenase from our signature were considered good prognostic markers for breast cancer disease progression upon immunologic investigations of biopsy material [87]. Due to advanced capabilities in the determination of protein structure details, such as pinpointing the site of phosphorylation, further knowledge shall be added to the proteome signature to increase its value. Such advanced knowledge shall enable us to better understand the pathophysiologic roles of a given protein in the context of the disease. As has been requested [32], the available methodology fosters the combination of detailed clinicopathologic information with experimental data from proteome analyses, which has been regarded as necessary for uncovering candidates that may serve as key markers for diagnostics and/or prognostics, or function as future therapeutic targets in clinical applications. As the robustness of our proteome signature has now been demonstrated successfully, future investigations shall focus on the applicability of the described proteins as markers for early detection or as potential targets for patient-tailored therapy. In ongoing data analysis, we screen for a specific pattern among carcinoma subgroups with respect to axillary nodal status. A tumor protein signature predictive of negative axillary status could lead to a less invasive surgical approach.