Introduction

Phosphorylation is one of the most widespread posttranslational modifications of proteins and also occurs in the organic matrix of biominerals [1, 2]. Protein FAM20C has recently been identified as a kinase involved in phosphorylation of such secreted proteins [3, 4], but other kinases may also be involved [5, 6]. In a few cases experimental evidence indicated an important function for phospho groups in biomineral matrix proteins. The best-examined matrix phosphoprotein in this respect is mammalian osteopontin, first described as a major non-collagenous bone protein. Among the many functions suggested for this protein since its discovery (reviewed, for instance, in [7, 8]) is also phosphorylation-dependent inhibition of mineralization processes [9]. Removal of phospho groups by alkaline phosphatase significantly reduces its inhibitory potential in in vitro crystallization assays [10] and un-phosphorylated recombinant osteopontin, but not in vitro phosphorylated osteopontin, fails to inhibit mineralization of human smooth muscle cell cultures serving as a model for human vascular calcification [11]. A crucial role of phosphorylated residues in the interaction with mineral is also reported for dentin matrix protein 1 and dentin phosphophoryn [12, 13]. The only invertebrate example so far is orchestin, a major matrix protein from crustacean calcium storage structures. Phosphorylation of orchestin is necessary for calcium binding of the protein [14].

The recently published genomes of biomineralizing organisms enable high-throughput mass spectrometry-based analysis of biomineral proteomes and phosphoproteomes, thus facilitating the fast identification of phosphoproteins and phosphorylation sites [15, 16]. In the present study we add the phosphoproteome of the Lottia gigantea shell matrix to the recently published Lottia shell proteomes [17, 18]. Furthermore, we have re-quantitated the Lottia shell proteome using the iBAQ (intensity-based absolute quantification) method [19] as implemented in MaxQuant. This showed that 57 proteins make up 98% of the total identified proteome. We suggest that quantitation allows the identification of major proteins, which are the most likely candidates for functional shell proteins, while retaining information about minor proteins, irrespective of whether these minor proteins play a role in mineralization or not, and irrespective of whether they occur intra- or extra-crystalline.

Materials and methods

Matrix and phosphopeptide preparation

Lottia shell matrix was prepared as previously described [17] using method B for shell cleaning (2 h sodium hypochlorite incubation with 2 × 5 min ultrasound treatment). Reduction, carbamidomethylation and enzymatic cleavage of matrix proteins were performed using a modification of the FASP (Filter-aided sample preparation) method [20] as outlined below. Two-mg aliquots of acid-soluble or acid-insoluble shell matrix were suspended in 300 μl of 0.1 M Tris, pH 8, containing 6 M guanidine hydrochloride and 0.01 M dithiothreitol (DTT). This mixture was heated to 56°C for 60 min, cooled to room temperature, and centrifuged at 13000 rpm in an Eppendorf bench-top centrifuge 5415D for 15 min. The supernatant was loaded into an Amicon Ultra 0.5 ml 30 K filter device (Millipore; Tullagreen, Ireland). DTT was removed by centrifugation at 13000 rpm for 15 min and washing with 2 × 1 vol of the same buffer. Carbamidomethylation was done in the device using 0.1 M Tris buffer, pH 8, containing 6 M guanidine hydrochloride and 0.05 mM iodoacetamide and incubation for 45 min in the dark. Carbamidomethylated proteins were washed with 0.05 M ammonium hydrogen carbonate buffer, pH 8, containing 2 M urea, and centrifuged as before. Trypsin (20 μg, Sequencing grade, modified; Promega, Madison, USA) was added in 40 μl of 0.05 M ammonium hydrogen carbonate buffer containing 2 M urea and the devices were incubated at 37°C for 16 h. Peptides were collected by centrifugation and the filters were washed twice with 40 μl of 0.05 M ammonium hydrogen carbonate buffer. The peptide solution was acidified to pH 1–2 with trifluoroacetic acid (TFA) and peptides were vacuum-dried in an Eppendorf concentrator.

Phosphopeptides were enriched by reversible binding to TiO2 beads (Titansphere 10 μm, GL Sciences, Japan) following established protocols [21] but substituting 2,5-dihydroxybenzoic acid in the loading buffer by 6% trifluoroacetic acid (TFA) [22]. Briefly, beads were washed first in 80% acetonitrile containing 0.1% TFA (washing buffer), then in 80% acetonitrile containing 6% TFA (binding buffer). Peptides were dissolved in binding buffer (200 μl/peptides of 2 mg matrix) and added to approximately 5 mg of loosely pelleted TiO2 beads. The mixture was incubated on a rotating wheel for 45 min. After centrifugation the supernatant was again incubated with fresh TiO2 beads as before. The beads were then washed twice with 200 μl of binding buffer followed by 2 × 200 μl of washing buffer. Finally the loaded beads were filled into C8 Stage Tips and phosphopeptides were eluted with 2 × 100 μl of a solution containing 40% acetonitrile and 15% ammonia. The eluate was vacuum-dried in an Eppendorf concentrator to ~20 μl and acidified with TFA. The peptides were purified on C18 Stage Tips [23] after dilution to 200 μl with 0.5% acetic acid.

LC-MS analysis

Phosphopeptide-enriched samples were analysed on a Q Exactive high-performance Quadrupole Orbitrap mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) [24] connected to an Easy-nLC 1000 nanoflow HPLC system (Thermo Fisher Scientific). Peptides were separated on a 50 cm column with an inner diameter of 75 μm filled with 1.8 μm C18 beads (Reprosil-AQ Pur, Dr. Maisch GmbH, Ammerbuch, Germany) prepared as described [25]. Peptides were eluted with acetonitrile in 0.1% formic acid using a gradient of 5-30% acetonitrile in 95 min, 30-60% in 30 min and 60-95% in 8 min at a flow of 250 nl/min and a column temperature of 50°C [25]. Mass spectra were acquired in a data-dependent manner by automatically switching between MS and MS/MS in a top 10 approach. The resolution was 70000 for full spectra and 17500 (both at m/z 200) for HCD-derived fragments. The dynamic exclusion time was 30 sec.

Data analysis

To estimate the percentage of each protein in the total identified shell proteome, raw-files used in a previous study [17; method B] were re-analysed using the iBAQ (intensity-based absolute quantification) method [19] as implemented in MaxQuant version 1.3.9.21. Carbamidomethylation was set as fixed modification; variable modifications were acetyl (protein N-term), oxidation (M), pyro-Glu (Q,E) and phospho (STY). Maximal FDR for peptide spectral match, proteins and site was set to 0.01. The maximal peptide PEP was 0.01. Minimal peptide length was 7 amino acids. The minimal score for modified peptides was 50 and the minimal delta score for modified peptides was 17. A minimum of two sequence-unique peptides was required for identification, except for proteins that were identified with two or more unique peptides previously in separately analysed acid-soluble and acid-insoluble fractions [17]. In very few cases new proteins were accepted with one unique peptide if this peptide occurred several times in different fractions and with an abundance of >0.01. The second peptide option was activated to enable identification of co-eluting peptides with very similar mass [26]. Two missed cleavages were allowed. The databases used were Lottia FilteredModels (Lotgi1_GeneModels_FilteredModels1_aa.fasta.gz) and Lottia AllModels (Lotgi1_GeneModels_AllModels_20070424_aa.fasta.gz) [27] downloaded from (http://jgi.doe.gov/), and a LOTGI subset of UniProtKB v2013_7 entries downloaded from http://www.uniprot.org/. These were supplemented with the reversed sequences and common contaminants automatically and used for quality control and FDR setting by MaxQuant. Phosphopeptides were accepted if they occurred at least twice or were confirmed by analysis of phosphopeptide-enriched samples.

Peptide mixtures for enrichment of phosphopeptides were prepared from three biological replicates prepared according to method B of [17]. The acid-soluble and the acid-insoluble matrix of each biological replicate were used to prepare five technical replicates, resulting in 30 raw files that were evaluated together using MaxQuant [26, 28] version 1.3.9.21 with the same settings as above with a minimum of one sequence-unique phosphopeptide only, but sequenced at least twice and in different replicates. The decoy mode was set to reward in MaxQuant. Phosphopeptide spectra were validated using the MaxQuant Expert system, which provides additional fragment annotations not included in the routine annotation [29]. Criteria were the assignment of major peaks, occurrence of uninterrupted y- or b-ion series of at least four consecutive amino acids, preferred cleavages N-terminal to proline bonds, the possible presence of a2/b2 ion pairs, the presence of immonium ions, and mass accuracy. In general only phosphopeptide identifications with a localization probability of ≥0.75 were accepted. However, in some cases adjacent residues, such as X(n)-S-S-X(n), could not be resolved with the fragmentation pattern of the respective phosphopeptides, making it impossible to exactly localize the phosphorylation site. As a result, lower localization probability scores were attributed to several residues. Such phosphopeptides were also accepted. Phospho sites were searched for known kinase motifs using Phosida Motif Matcher (http://www.phosida.com/) [30, 31] and PhosphoMotif Finder (http://www.hprd.org/PhosphoMotif_finder) [32]. Most sequence-unique peptides were identified several times and site occupancy of phospho sites was estimated by comparing the number of unmodified to the number of phosphorylated forms of individual peptides.
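The count-based estimate of site occupancy and the localization cut-off described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: occupancy is taken here as the fraction of phosphorylated identifications among all identifications of a peptide, which is one plausible reading of the comparison of unmodified and phosphorylated forms. The function names and the example counts are hypothetical and are not part of MaxQuant.

```python
def site_occupancy(n_phospho, n_unmodified):
    """Approximate occupancy as phosphorylated / (phosphorylated + unmodified).

    Assumption: identification counts of both peptide forms are used as a
    rough proxy for their relative amounts.
    """
    total = n_phospho + n_unmodified
    if total == 0:
        raise ValueError("peptide was not identified in either form")
    return n_phospho / total


def passes_localization(probability, cutoff=0.75):
    """Apply the general acceptance rule: localization probability >= 0.75."""
    return probability >= cutoff


# Hypothetical peptide seen 3 times phosphorylated and 9 times unmodified
print(site_occupancy(3, 9))
print(passes_localization(0.999), passes_localization(0.5))
```

Sites on adjacent residues that cannot be resolved by the fragmentation pattern would fall below the cut-off in such a filter and therefore need the manual exception described above.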

Sequence similarity searches were performed with FASTA (http://www.ebi.ac.uk/Tools/sss/fasta/) [33] against current releases of the Uniprot Knowledgebase (UniProtKB). Other bioinformatics tools used were Clustal Omega for sequence alignments (http://www.ebi.ac.uk/Tools/msa/clustalo/) [34], InterPro (http://www.ebi.ac.uk/interpro) [35] for domain predictions, and SignalP 4.1 (http://www.cbs.dtu.dk/services/SignalP/) [36] for signal sequence prediction. Amino acid composition and theoretical pI were determined using the ProtParam tool provided by the Expasy server (http://web.expasy.org/protparam/) [37]. Intrinsically disordered protein structure was predicted using IUPred (http://iupred.enzim.hu/) [38] and methods provided by the PredictProtein 2013 server (https://www.predictprotein.org/) [39, 40]. GO categories for subcellular location were derived from UniProt and Lottia database entries, signal sequence predictions and similarity to known proteins.

Results and discussion

Re-analysis and re-quantitation of Lottia shell proteins with MaxQuant-implemented iBAQ

In search of the reasons for apparent differences in previously published Lottia shell proteomes [17, 18] we noticed that database searches were done using the AllModels database in [18] while [17] used the FilteredModels database containing entries supported by EST sequences. Therefore we re-analyzed the raw-files produced previously for acid-soluble and acid-insoluble matrix prepared according to method B [17] (also used to identify phosphoproteins in the present report) using a combination of both databases and a subset of Uniprot containing Lottia + gigantea entries. Furthermore, to determine the approximate abundances of the identified proteins, the iBAQ (intensity-based absolute quantification) method [19] as implemented in more recent MaxQuant versions was enabled in this search. The previously used [17] emPAI method [41] belongs to the spectral count methods based on counting the number of identified unique parent ions per protein. In contrast, iBAQ and similar algorithms are called intensity-based because they calculate the sum of parent ion intensities of identified peptides per protein. In both types of methods, the number of theoretically possible peptides per protein for the protease used in sample preparation enters the equation to account for different protein lengths and for the distribution and frequency of cleavage sites. Comparison of the two types of methods shows a higher accuracy of the intensity-based methods, including iBAQ (for instance [42]), indicating that they should be given preference. Furthermore, the emPAI method in its original form [41] as we used it has become somewhat obsolete because of the recent progress in technology. For instance, modern mass spectrometers and the associated software provide high-confidence identifications of much longer peptides than previously possible. These long peptides are not included in emPAI calculations [41], but are included in iBAQ calculations.
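The difference between the two families of methods can be illustrated with a minimal Python sketch. It assumes the commonly cited definitions, emPAI = 10^(N_observed/N_observable) − 1 [41] and iBAQ = summed peptide intensities divided by the number of theoretically observable peptides [19]. The function names and all numbers are hypothetical; they are not taken from our data and do not reproduce the MaxQuant implementation.

```python
def empai(n_observed, n_observable):
    """Spectral-count style index: 10**(observed / observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def ibaq(peptide_intensities, n_theoretical):
    """Intensity-based value: summed intensities / theoretical peptide count."""
    if n_theoretical <= 0:
        raise ValueError("n_theoretical must be positive")
    return sum(peptide_intensities) / n_theoretical


# Two hypothetical proteins, each with 3 identified peptides out of
# 10 theoretical ones, but with very different parent ion intensities.
weak = [1.0e6, 2.0e6, 1.0e6]
strong = [1.0e9, 2.0e9, 1.0e9]

print(empai(len(weak), 10), empai(len(strong), 10))  # same count-based value
print(ibaq(weak, 10), ibaq(strong, 10))              # intensity-based values differ
```

In this toy case the count-based index cannot distinguish the two proteins, whereas the intensity-based value reflects the difference in signal.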

Irrespective of the quantitation method accurate quantitation certainly also depends on the quality and completeness of the available sequence databases. Sequences not contained in the database can be neither identified by high-throughput mass spectrometry-based proteomic analysis nor quantitated. The same applies to sequences having no cleavage sites for the protease used in sample preparation. Faulty combination of sequences belonging to different proteins into one database entry or unnoticed faulty allocation of fragments of one protein to different database entries can all bias quantitation results. Finally, the abundance of proteins bearing many posttranslational modifications will be underestimated if the modification is not included in the analysis. In spite of these caveats we believe that routine quantitation of proteins in in-depth proteomic studies may be a useful tool to identify possible functionally important proteins for further study. We express the abundances as percentage of the identified proteome, obtained by normalizing the iBAQ intensities to the sum of all intensities. While the decision what to count as a major protein or a minor protein still remains arbitrary, it may now be more comprehensible to the reader and will possibly facilitate the decision of which proteins to choose for further studies.
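The normalization described above, and the use of a percentage threshold to delimit major proteins, can be sketched as follows. This is a minimal illustration; the function names, protein labels and intensity values are hypothetical and do not correspond to entries in our tables.

```python
def percent_of_proteome(ibaq_by_protein):
    """Express each iBAQ intensity as a percentage of the summed intensities."""
    total = sum(ibaq_by_protein.values())
    if total <= 0:
        raise ValueError("summed iBAQ intensity must be positive")
    return {name: 100.0 * value / total for name, value in ibaq_by_protein.items()}


def major_fraction(percentages, threshold):
    """Return the number of proteins at or above `threshold` percent
    and the share of the identified proteome they represent together."""
    selected = [p for p in percentages.values() if p >= threshold]
    return len(selected), sum(selected)


# Hypothetical iBAQ intensities for four proteins
example = {
    "protein_A": 8.0e9,
    "protein_B": 1.5e9,
    "protein_C": 4.9e8,
    "protein_D": 1.0e7,
}
percentages = percent_of_proteome(example)
count, share = major_fraction(percentages, threshold=1.0)
print(count, round(share, 1))
```

The threshold remains an arbitrary choice; the sketch only makes explicit how the count of major proteins and their cumulative share follow from it.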

The results of this new search (Additional file 1: Table S1) now include all proteins published by [18] and contain 496 proteins/protein groups. Of these, 382 protein/protein group identifications were accepted (Additional file 2: Table S2) according to the rules stated in the Materials and Methods section. Twenty-three proteins were identified in the AllModels database only or in combination with the UniProt entries, including several very abundant ones (Table 1). Many groups contained several AllModels entries, testifying to the high redundancy in this database. The corresponding MaxQuant table with protein data is contained in Additional file 1 (Additional file 1: Table S1), which also includes identifications not accepted. These were, for instance, identifications with only one single peptide with low scores or insufficient sequence coverage. The peptide data of the more than 4000 sequence-unique peptides, including peptide sequences and scores, are shown in Additional file 3 (Additional file 3: Table S3).

Table 1 Fifty-seven proteins, each with an individual percentage equal to or larger than 0.1%, constitute 98% of the total identified proteome

Quantitation with iBAQ showed that only 18 proteins/protein groups of a percentage of more than 1% of the identified proteome already constituted approximately 82% of the entire identified proteome (Table 1). This group comprised two very abundant (>1%) proteins not contained in the FilteredModels database, the Asp-, Gly-, Lys- and Ser-rich peroxidase-like protein-1 (DGLSP_LOTGI/Lotgi1|162078) and the Gly- and Ser-rich protein-1 (GSP1_LOTGI/Lotgi1|239214) [18]. If a percentage of larger than 0.1% was chosen as a threshold, a total of 57 proteins (Table 1) amounted to approximately 98% of the total identified proteome. These included CCD2 (coiled-coil domain-containing protein 2; Lotgi1|234936), the perlwapin-like protein PWAP_LOTGI/Lotgi1|239121, and the EGF-like domain-containing protein 2 (ELDP2/Lotgi1|167423) [15], which were contained in the AllModels database but not in the FilteredModels database. Almost all proteins also identified in [18] were contained in this fraction of the proteome. Exceptions were the EF-hand calcium-binding domain-containing protein 1 and 2 (EFCB1/B3A0Q5, EFCB2/B3A0R9), and Threonine-rich protein LUSP-15/TRP/B3A0R4, which apparently belonged to the minor components of the identified proteome (Additional file 2: Table S2). However, we also identified several entries with a high similarity to EFCB2 based on sequence overlaps with sequence identities of 43-90% (Figure 1). Taken together, this protein family constituted slightly more than 0.1% of the identified proteome.

Figure 1

Alignment of EFCB2 to similar sequences. Sequences covered by MS/MS-sequenced peptides are shown in red. Slashes in the sequence of Lotgi1|239519 indicate an insert between signal peptide and the EFCB2-like sequence that does not occur in the other entries. All shown entries were part of protein groups containing other similar sequences due to the high redundancy of the AllModels database.

In agreement with a previous study [18] the major proteins comprised three peroxidase-like proteins (Table 1) including the most abundant protein Lotgi|162078/DGLSP_LOTGI. Peroxidases are a large and widespread family of enzymes catalysing redox reactions using a variety of electron donors and acceptors, including organic molecules. Peroxidases have been implicated previously in mollusc shell formation [43]. Possibly they are responsible for the sclerotization of the periostracum [44–46], a proteinaceous layer confining the mantle cavity before the start of mineralization. As discussed previously [18] one may hypothesize that peroxidases function in stabilization of the newly secreted matrix by cross-linking some of its components. Another major protein, the abundance of which was noticed only using the AllModels database because the FilteredModels only contained a small fragment, was Lotgi1|166131. In this protein a long stretch of sequence with predicted disordered structure is followed by a predicted superoxide dismutase domain. Superoxide dismutases are a family of enzymes with widespread subcellular distribution that remove superoxide, a normal aerobic metabolite. One reaction product of superoxide dismutases is H2O2, a substrate of peroxidases.

In general, very little is known about the possible functions of shell matrix proteins, but in some cases similarities to known proteins and predicted domain structures may provide some clues for further studies. Predicted domain structures, GO terms for subcellular location, unusual amino acid composition features (amino acids representing ≥ 10% of the sequence) and theoretical isoelectric point for major identified Lotgi entries are included in Table 1. Extremely acidic matrix proteins (pI below 4.5) have attracted much interest in biomineralization research because of the possibility of direct interaction with the positively charged biomineral cations and have been hypothesized to act as nucleation sites involved in crystal formation [47]. The group of 57 proteins with an abundance of >0.1% includes eight such uncharacterized, unusually acidic proteins (Table 1) that may deserve to be studied in more detail. Many proteins isolated from biominerals contain sequence regions of intrinsically disordered structure, a feature that is implicated in protein-protein interaction and mineral binding [48, 49]. Table 1 includes several proteins with extended sequence regions of predicted disordered structure, such as the peroxidase-like protein-1 (DGLSP_LOTGI), the methionine-rich protein MRP_LOTGI, peroxidase-like protein 3 (PLSP3_LOTGI), and the uncharacterized proteins in Lotgi1|163637, 159331, 235610, 234884, 171084, 158316, 236690, and 239574. In two sequences both features, unusual acidity and predicted long-range structural disorder, coincide (Lotgi|159331, 171084). However, like all predicted features, predicted structural disorder needs experimental validation before far-reaching conclusions can be drawn.

Sometimes predicted domains strongly indicate involvement of the respective protein in biomineralization events. The putative carbonic anhydrases encoded in Lotgi|238082/CAH1 and Lotgi|239188/CAH2 and discussed previously [18] may be important for carbonate ion delivery. Also of special interest are proteins containing chitin-binding domains, such as Lotgi1|226726, 228264, and 239574. Many mollusc shells contain chitin-based extra-crystalline scaffolds and chitin-binding proteins may be important for organizing such scaffolds or may mediate interactions between chitin and the calcified matrix [50]. However, for most proven and putative shell matrix proteins the function remains unknown at present.

Most of the identified proteins were only minor, or trace, components that may not have a function in biomineralization. However, it should be emphasised that there may be exceptions. For example, protein FAM20C (0.006% of the Lottia shell proteome; Additional file 2: Table S2), was recently identified as a Golgi apparatus kinase responsible for the phosphorylation of many secreted proteins, including proteins important for biomineralization [3, 4]. This kinase is also secreted to some degree, may be active in the extracellular space [5], and may enter biominerals in the company of its substrates. Of course this does not imply any function within the matrix but may explain its presence there. Other examples of the possible importance of trace components for biomineral formation are the sea urchin spicule proteins P58-A and P58-B. The extracellular domains of these predicted transmembrane proteins were detected as minor components in sea urchin spicule matrix [51] and both were subsequently shown by knock-down experiments to play an essential role in sea urchin larval skeletogenesis [52]. Also among the trace components are proteins known to have a predominantly intracellular location, such as cytoskeletal components and cytosolic enzymes (Additional file 2: Table S2). We think that these proteins do not have a function in biomineralization. However, even trace components with a well-defined intracellular role, such as ubiquitin (now also known to occur in the extracellular space, however [53]) may have a true role in biomineralization, such as in the matrix of the Pinctada fucata shell prismatic layer [54]. Finally it should be considered that the number of up-regulated genes, for instance after shell damage [55], is usually much larger than the number of major proteins identified in shell matrices. Possibly many of the trace proteins reflect regulatory or catalytic processes involved in the mineralization event at some point.

The phosphoproteome

Because of the low number of different proteins in the shell matrix and because the HCD (higher energy collisional dissociation) fragmentation method used in the previous shell proteome analysis [17] enables phosphopeptide analysis at high resolution and mass accuracy in the LTQ Orbitrap Velos [56, 57] without the need for neutral loss-dependent MS3 or multistage activation [58] used previously with CID fragmentation, we included phosphorylation as a variable modification in this re-analysis. The results indicated (Additional file 1: Table S1) that several major and a few minor proteins were phosphorylated to a variable extent. These preliminary results were validated by analysis of phosphopeptide-enriched samples of shell matrix proteins (Additional file 4: Table S4). Thirteen of these were confirmed by analyzing phosphopeptide-enriched fractions. Three more were identified only in phosphopeptide-enriched samples (Additional file 4: Table S4), yielding a total of 20 phosphoproteins. The MaxQuant phosphopeptide output table is shown in Additional file 5: Table S5. Nine major proteins with a percentage of more than 1% of the identified proteome and five with a percentage between 0.1% and 1% (Table 1) were identified as phosphoproteins. Simultaneous determination of phosphorylated and non-phosphorylated versions of the phosphopeptides in the general survey without prior enrichment enabled an approximate estimation of site occupancy (Additional file 4: Table S4), which was very low in most cases. Site occupancy in the group of major proteins was highest in GEPRP/B3A0P5 and the uncharacterized protein of Lotgi1|154020. While GEPRP contained only two closely spaced phosphorylation sites, Lotgi1|154020 contained four sites in three peptides (Additional file 4: Table S4). This high site occupancy strongly indicates that phosphorylation of these proteins may be functionally important.
Three proteins, DGLSP/B3A0P1, PLSP2/B3A0P3 and CCD1/B3A0Q3 yielded more than three phosphopeptides with variable site-occupancy (Additional file 4: Table S4). Of these, Coiled-coil domain-containing protein 1 (CCD1)/B3A0Q3 was already shown to be extremely acidic previously [18], a feature that is enhanced by phosphorylation. This may be taken as a further indication of a very important, but as yet not understood, role of this protein in Lottia shell assembly.

Taking into account the number of phosphorylation sites and site occupancy, CCD1/B3A0Q3 may be considered the major phosphoprotein of the Lottia gigantea shell matrix. We want to point out, however, that densely phosphorylated proteins with highly repetitive sequences, such as dentin phosphophoryn, which contains almost exclusively aspartic acid, asparagine and phosphoserine [2], require special techniques to be identified and may be missing from our analysis.

A search for sequences including phospho sites for known kinase motifs indicated that approximately one third (16 of 46) of the unique S/T phospho sites comply with the FAM20C recognition site S-x-E or related motifs (S/T-x-E/D/pS/pT) [3, 4]. This percentage is in good agreement with the approximately 24% of human secreted phosphoproteins modified at the serine of the canonical FAM20C motif S-x-E [6]. However, much less is known about phosphorylation in invertebrate secreted proteins and the kinases involved. Therefore it is unknown whether these recognition sites are conserved between vertebrates and invertebrates. Five of the sites identified are in agreement with the typical casein kinase 2 motif S-x-x-E, also modified in the mammalian mineralization-inhibiting protein osteopontin, and ten sites comply with the casein kinase 1 motif (D/E)n-x-x-S/T [1], indicating that secreted or membrane-bound kinases with casein-kinase-like activity are involved. Evidence for such kinases is summarized in [5, 6].
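How a phospho site is compared with the motifs named above can be sketched in Python. This is not the algorithm of Phosida Motif Matcher or PhosphoMotif Finder; it is a simplified position check under stated assumptions: the site is given as a 0-based index into a one-letter sequence, other phosphorylated positions are supplied separately, and (D/E)n is read as at least one acidic residue at position −3. The function name and the test sequences are hypothetical.

```python
def classify_site(seq, pos, phospho_positions=frozenset()):
    """Return the names of the motifs matched by the S/T site at index `pos`."""
    if seq[pos] not in ("S", "T"):
        raise ValueError("site must be serine or threonine")

    def residue(offset):
        # Residue at pos + offset, or "" when outside the sequence
        i = pos + offset
        return seq[i] if 0 <= i < len(seq) else ""

    hits = set()
    plus2 = residue(2)
    # Canonical FAM20C motif S-x-E
    if seq[pos] == "S" and plus2 == "E":
        hits.add("S-x-E")
    # Related motifs S/T-x-E/D/pS/pT (pS/pT requires a known phospho site at +2)
    if plus2 in ("E", "D") or (plus2 in ("S", "T") and pos + 2 in phospho_positions):
        hits.add("S/T-x-E/D/pS/pT")
    # Casein kinase 2 motif as given in the text: S-x-x-E
    if seq[pos] == "S" and residue(3) == "E":
        hits.add("S-x-x-E")
    # Casein kinase 1 motif (D/E)n-x-x-S/T, simplified to an acidic residue at -3
    if residue(-3) in ("D", "E"):
        hits.add("(D/E)n-x-x-S/T")
    return hits


# Hypothetical sequences, not taken from the Lottia proteome
print(sorted(classify_site("AASGEAA", 2)))
print(sorted(classify_site("DAATAAA", 3)))
```

A site may match more than one motif, which is why the function returns a set rather than a single label.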

Conclusions

Our approach to proteomes of invertebrate biominerals consists of washing the biominerals with hypochlorite in a less stringent way than proposed recently [59] to preserve extra-crystalline matrix components, and to identify as many proteins as possible after in-gel digestion of slices of the entire gel [17] irrespective of staining intensity, or after in-solution digestion using filter-aided sample preparation (FASP) [20]. Included in protein identification is quantitation, which was done using exponentially modified protein abundance index (emPAI) [41] previously [17], but was recently superseded [60] in favor of the more accurate automated iBAQ method [19] as implemented in more recent versions of MaxQuant. We believe that this approach is well suited to identify candidates for functional matrix proteins, most likely found among the most abundant components, while retaining all of the information about trace components, irrespective of whether these may have a function in biomineralization or not, and irrespective of whether they are intra-crystalline or belong to the extra-crystalline matrix. Proteins predominantly located intracellularly, such as cytoskeletal components, ribosomal proteins, proteasome subunits or cytoplasmic enzymes, belong to the minor components of the Lottia shell proteome (Additional file 2: Table S2) constituting only an insignificant fraction of the total. However, the identification and quantitation of such proteins may also depend in some way on the biomineral examined, the instrumentation used, and the washing procedures applied to the shell, and we agree with others [59, 61] that the mere presence of such proteins in the matrix sample certainly does not imply a function. The group of major proteins also contains several phosphoproteins. Those yielding high-occupancy phospho sites and/or many phosphorylated sequence-unique peptides were already identified without prior phosphopeptide enrichment in a general survey.
However, subtleties such as the occurrence of different sites with high localization probability within one peptide sequence (Figure 2) are more likely detected with the higher copy numbers usually provided by phosphopeptide-enriched samples. Nevertheless, inclusion of phosphorylation among the variable modifications in general studies of low complexity proteomes may give an overview of what to expect with phosphopeptide-enriched samples and may provide a rough estimate of phospho site occupancies.

Figure 2

An example of different partially occupied phospho sites in one sequence. This peptide occurs in the sequence of DGLSP/B3A0P1/Lotgi1|162078 (Aspartate-, glycine-, lysine- and serine-rich protein, aa324-335). A, peptide variant with phosphotyrosine identified by an uninterrupted series of y-ions for the rest of the sequence and the very intense diagnostic pY immonium ion at m/z 216.042. Expert annotations [29] were omitted, except for the major peak at m/z 120.0809 (phenylalanine immonium ion), to keep the spectrum clear. The doubly charged peptide ion was measured with a mass error of −0.014 ppm. PEP and phosphorylation site localization probability were calculated by MaxQuant to be 8.96e-93 and 0.999. B, this time S4 was determined as the phosphorylation site in an uninterrupted series of y-ions from y1 to y11. The mass error was −0.490 ppm, PEP was 1.16e-54 and the localization probability was 1.00. Major peaks at m/z 120.0809 and 136.0756 were annotated by the MaxQuant Expert system as the phenylalanine immonium ion and the a1-ion. A major peak at m/z 192.1016 was not annotated. Expert annotations of most of the minor peaks are omitted for clarity. C, a third phosphorylation site at S8 was detected with a localization probability of 1.00 in still another variant of this peptide measured with a mass error of 0.531 ppm and with a PEP of 3.28e-164. Again, most expert annotations are omitted. *, ions showing a loss of H3PO4 from phosphoserine. Y-ions are shown in red, b-ions are shown in blue, b- or y-ions with a loss of ammonia or water are in orange, the a2 ion is shown in light blue, and black identifies ions without annotation unless the annotation is shown on top of the peak.