Background

The sea urchin is an important model organism for developmental biology, and in particular for skeletogenesis, providing insight into common principles of biomineralization [1-3]. Like other biominerals, sea urchin skeleton elements are composite materials containing, in addition to the mineral component, a small percentage of biopolymers, the organic matrix. This network of organic molecules, pervading the mineral, controls the formation of biominerals and contributes to their final properties [4-6]. Research on the sea urchin model was boosted by the recent publication of the complete Strongylocentrotus purpuratus genome [7, 8]. The genome sequence enabled the search for potential novel biomineralization-related proteins and their transcriptional regulation [9]. It also made possible the direct identification of matrix proteins by mass spectrometry-based proteomics, revealing an unexpected complexity of test (shell), spine and tooth proteomes [10, 11]. However, proteomes cannot be considered complete without determination of post-translational modifications. One of the most widespread post-translational modifications occurring in proteins of biominerals is phosphorylation [12]. Organic matrices of biominerals as diverse as mammalian tooth and bone [13], chicken eggshell [14] or mollusk shell [15] contain phosphoproteins. In a few cases, such as crustacean orchestin [16], mammalian phosphoryn [17] or osteopontin [18, 19], phosphorylation was shown to be crucial for proper folding, calcium binding, and other function-related properties of these proteins. Phosphorylation of mammalian extracellular proteins involves casein kinase-like enzymes of the endoplasmic reticulum and Golgi apparatus [12, 20, 21] and membrane-bound ectokinases [22, 23]. Although the sea urchin is a major model organism for biomineralization studies, data about phosphoproteins in sea urchin skeletal elements are scarce.

At least 353 putative protein kinases were predicted to be encoded in the S. purpuratus genome [24]. Some of these kinases apparently play a role in skeleton formation, as shown by kinase inhibition studies in embryos and cultured spicule-producing mesenchymal cells [25-28], where specific kinase inhibitors prevented formation of skeletal elements. However, only one study reported the presence of phosphorylated proteins in a sea urchin mineralized structure, the tooth of Lytechinus variegatus [29], and no phosphorylation sites were identified. In test (the commonly used name for the sea urchin shell), phosphate groups were reported to be attached to unidentified matrix molecules [30].

Mass spectrometry has become the method of choice for detection of protein phosphorylation during the last decade because the method is highly sensitive, does not need radioactive labeling, enables determination of the modified amino acids, and does not require purification of the proteins [31]. MS-based phosphoproteomics involves the enzymatic cleavage of mixtures of proteins, fractionation of the peptides by nanoscale liquid chromatography, and analysis by mass spectrometry. However, because phosphopeptides usually constitute only a minor fraction of the peptide mixture used in automated high-throughput analysis of complex mixtures, phosphopeptides are usually enriched and specialized mass spectrometric methods, such as neutral loss-dependent MSn, have to be used for their detection [32]. The phosphoproteomes of sea urchin test (shell) and tooth organic matrix were analyzed using enrichment of phosphopeptides by reversible adsorption to TiO2 and neutral loss-triggered multistage activation [33] in LTQ-Orbitrap and LTQ FT Ultra mass spectrometers. Data were analyzed using MaxQuant [34], a novel integrated suite of algorithms developed for the analysis of high-resolution mass spectrometry data. Using this toolkit, 21 phosphoproteins were identified in test matrix and 15 in tooth matrix.

Materials and methods

Preparation of organic matrix

Sea urchins were killed by freezing. The shells (also called tests) were cut into two halves and emptied. The skeleton elements were soaked in sodium hypochlorite solution (6-14% active chlorine; Merck, Darmstadt, Germany) for 1 h with four changes of solution and ultrasonic treatment (Branson Sonifier model 1200) for 5 min after every change. The cleaned, disconnected calcified elements were washed with water, air dried, and collected separately. Teeth were powdered with mortar and pestle and the powder was washed with sodium hypochlorite again [11]. The water-washed, air-dried test plates and tooth powder were separately demineralized in 50% acetic acid (20 ml/g of dry biomineral) for 14-16 h at 4-6°C. The turbid suspension was dialyzed successively against 2 × 10 vol. 10% and 2 × 10 vol. 5% acetic acid at 4-6°C (Spectra/Por 6 dialysis membrane, molecular weight cut-off 1000; Spectrum Europe, Breda, The Netherlands). The precipitate, which formed during dialysis, and the clear supernatant were lyophilized together.

Preparation of peptides and enrichment of phosphopeptides

Test and tooth matrix proteins were carbamidomethylated as before [35] using iodoacetamide instead of iodoacetic acid. Reagents were removed by dialysis against 5% acetic acid. The protein concentration in carbamidomethylated organic matrix was determined by amino acid analysis in a Biotronik LC3000 analyzer after hydrolysis of the sample in 6 M HCl for 24 h at 110°C.

Organic matrices extracted from a total of nine tests were pooled. Aliquots of 2 mg of carbamidomethylated test matrix were suspended in 10 mM Tris buffer, pH 8, containing 6 M urea and 2 M thiourea, and cleaved with lysyl endopeptidase (Wako Chemicals, Neuss, Germany). After 8 h the reaction mixture was diluted with 0.05 M ammonium hydrogen carbonate to 2 M urea/thiourea, and trypsin (sequencing grade modified trypsin, Promega, Mannheim, Germany) was added for 14 h of incubation. In a second approach, lysyl endopeptidase cleavage products of 2 mg test matrix were further cleaved with endoproteinase Asp-N (sequencing grade; Roche, Mannheim, Germany) for 14 h in 2 M urea. Finally, 2 mg matrix samples were successively cleaved with lysyl endopeptidase (14 h), trypsin (8 h), and endoproteinase Asp-N (14 h). The enzyme to matrix ratio (w/w) in proteolytic digests was 1:50 for lysyl endopeptidase, 1:100 for trypsin and 1:300 for endoproteinase Asp-N. Incubations were performed at room temperature. Reactions were stopped by addition of trifluoroacetic acid to pH ~2. Insoluble material was removed by centrifugation and the supernatant was dried by vacuum centrifugation. Each combination of enzymes was applied to three or four 2 mg aliquots.
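The enzyme amounts implied by the stated w/w ratios can be worked out with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes the ratios refer to the mass of matrix in the aliquot, as the text states.

```python
# Enzyme:matrix ratios (w/w) from the text, stored as the denominator of 1:N.
RATIO_DENOMINATOR = {
    "lysyl endopeptidase": 50,
    "trypsin": 100,
    "endoproteinase Asp-N": 300,
}


def enzyme_amount_ug(matrix_mg: float, enzyme: str) -> float:
    """Micrograms of enzyme for a matrix aliquot given in milligrams."""
    return matrix_mg * 1000.0 / RATIO_DENOMINATOR[enzyme]


def dilution_factor(start_molar: float, target_molar: float) -> float:
    """Fold dilution needed to bring a solute from start to target molarity."""
    return start_molar / target_molar


if __name__ == "__main__":
    for name in RATIO_DENOMINATOR:
        print(f"{name}: {enzyme_amount_ug(2.0, name):.1f} ug per 2 mg aliquot")
    # 6 M urea diluted to 2 M before trypsin addition
    print(f"urea dilution: {dilution_factor(6.0, 2.0):.0f}-fold")
```

For a 2 mg test matrix aliquot this corresponds to 40 μg of lysyl endopeptidase, 20 μg of trypsin and about 6.7 μg of endoproteinase Asp-N; the 1 mg tooth matrix aliquots would need half of these amounts.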

Tooth matrix extracted from 300 teeth from 60 sea urchins was combined. Aliquots of 1 mg carbamidomethylated matrix were cleaved with combinations of lysyl endopeptidase and trypsin or lysyl endopeptidase (8 h), Asp-N (14 h) and trypsin (8 h), as described for test matrix. Each cleavage method was performed with three aliquots of tooth matrix.

Phosphopeptides were enriched by reversible binding to TiO2 beads (GL Sciences, 10 μm beads) in a batch protocol [14, 36, 37]. Briefly, 2,5-dihydroxybenzoic acid was added to the acidified peptide mixture to a final concentration of 5 mg/ml and the mixture was incubated with 10 mg of washed beads/sample for 3 h at room temperature. The mixture was then briefly centrifuged to sediment the beads and the supernatants were incubated for another 3 h with a fresh batch of beads. The loaded TiO2 beads were washed with 50% acetonitrile in 0.1% trifluoroacetic acid, and the bound peptides were eluted with 15% NH4OH. The eluted peptide mixtures were acidified to pH~2 with trifluoroacetic acid and cleaned with C18 Stage (stop and go extraction) Tips [37].

LC-MS and data analysis

C18 reversed phase LC and mass spectrometric analysis were performed using a Proxeon Easy-nLC (Proxeon Biosystems, Odense, Denmark; software version 2.0) coupled to an LTQ-Orbitrap or LTQ-FT Ultra mass spectrometer (Thermo Fisher Scientific) via a nanoelectrospray ion source (Proxeon Biosystems). Full scans were recorded in the Orbitrap analyzer at a resolution of 60,000 or in the FT-ICR with a resolution of 100,000 (at m/z = 400) followed by MS/MS of the ten most intense peptide ions in the LTQ analyzer. Neutral loss-triggered multistage activation (Pseudo MSn [33]) for simultaneous fragmentation of neutral loss product and precursor was enabled at -97.97, -48.99 and -32.66 Th relative to the precursor ion, corresponding to a neutral loss of phosphoric acid from singly, doubly and triply charged ions.
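The three trigger values follow from dividing the monoisotopic mass of H3PO4 by the charge of the precursor ion. The following sketch reproduces them from standard monoisotopic atomic masses; the function name is hypothetical and the code is not part of the instrument method itself.

```python
# Monoisotopic atomic masses in unified atomic mass units.
MASS_H = 1.00782503
MASS_O = 15.99491462
MASS_P = 30.97376200

# Phosphoric acid, H3PO4, lost from phosphoserine/phosphothreonine peptides.
MASS_H3PO4 = 3 * MASS_H + MASS_P + 4 * MASS_O


def neutral_loss_offset(charge: int) -> float:
    """m/z shift (Th) of the neutral-loss product relative to the precursor."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return -MASS_H3PO4 / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"z = {z}: {neutral_loss_offset(z):.4f} Th")
```

The computed offsets agree with the values quoted above to within 0.01 Th.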

Data analysis was performed using MaxQuant v1.0.12.33 [34] (http://www.maxquant.org/), a software package making use of the Mascot search engine (Matrix Science, London, UK; version 2.2.04) for database searches. The database used consisted of the Strongylocentrotus purpuratus annotated gene models (Glean3) protein sequence database (ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Spurpuratus/fasta/Annotation [7]; see also http://goblet.molgen.mpg.de/cgi-bin/seaurchin-genombase.cgi for further information about Glean [38]), the corresponding reversed database, and the sequences of common contaminants including human keratins from IPIhuman (a total of 58052 sequences). Carbamidomethylation was set as fixed modification. Variable modifications were oxidation (M), N-acetyl (protein), pyro-Glu/Gln (N-term) and phospho (STY). The initial peptide mass tolerance was set to 7 ppm and the MS/MS tolerance was set to 0.5 Da. Two missed cleavages were allowed. The minimal length required for a peptide was seven amino acids. The peptide and protein false discovery rates (FDR) were set to 0.01. The maximal posterior error probability (PEP), which is the probability of each peptide to be a false hit considering identification score and peptide length [34], was set to 0.01. At least one MS2 spectrum of each identified peptide was manually validated considering the assignment of major peaks, occurrence of uninterrupted y- or b-ion series of at least 3 consecutive amino acids, preferred cleavages N-terminal to proline bonds and C-terminal to Asp or Glu bonds, the possible presence of a2/b2 ion pairs, the presence of neutral losses from fragments, and mass accuracy. The ProteinProspector MS-Product program (http://prospector.ucsf.edu/) was used to calculate the theoretical masses of fragments of identified peptides for manual validation. Localization probability values for phosphorylation sites were derived as described [39].
In addition to kinase motif prediction comprised in MaxQuant, we used NetPhos (http://www.cbs.dtu.dk/services/NetPhos/) [40], NetPhosK (http://www.cbs.dtu.dk/services/NetPhosK/) [41], and Phosida (http://www.phosida.com) [42] for phosphorylation site and kinase motif identification and prediction.
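Two of the search parameters above, the precursor tolerance in ppm and the FDR estimated from a reversed database, can be illustrated with a short sketch. It shows the generic target-decoy idea only and is not MaxQuant's implementation, which additionally uses posterior error probabilities [34]; the function names and the `(score, is_decoy)` input format are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def score_cutoff_for_fdr(
    matches: Iterable[Tuple[float, bool]], max_fdr: float = 0.01
) -> Optional[float]:
    """Lowest score at which the estimated FDR (decoys / targets) is still
    at or below max_fdr. Each match is (score, is_decoy), where decoys are
    hits to the reversed database. Returns None if no cutoff qualifies."""
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all matches scoring at least `score`.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    toy = [(30.0, False), (20.0, False), (15.0, True), (10.0, False)]
    print(score_cutoff_for_fdr(toy, max_fdr=0.01))
    print(abs(ppm_error(1000.007, 1000.0)) <= 7.0 + 1e-6)
```

In the toy list the single decoy at score 15 pushes the estimated FDR above 1% for everything scoring 15 or lower, so only the two best target matches would be accepted.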

BLAST analysis was performed with the program provided by NCBI (http://www.ncbi.nlm.nih.gov/blast), searching against the non-redundant database for all organisms. FASTA and MPsrch search programs were used as provided by the European Bioinformatics Institute (EBI, http://www.ebi.ac.uk), searching against UniProt Knowledgebase and UniProtKB/Swiss-Prot protein sequence databases. Domains were predicted with NCBI Conserved Domain Search [43] and the MotifScan program (http://www.expasy.org/tools).

Results and Discussion

Test (shell) matrix phosphoproteins

The carbamidomethylated matrix of hypochlorite-washed test plates contained 15-20% protein (w/w) as determined by amino acid analysis. Not all of the material was soluble in 8 M urea/thiourea, which was used to suspend the matrix before proteolytic degradation. Residual insoluble material was sedimented after proteolysis and contained 10-15% of the total protein by amino acid analysis. These observations agreed with earlier results indicating that a large part of the sea urchin test matrix is not protein [30, 44]. The highest number of peptides was obtained by cleavage with lysyl endopeptidase and subsequent treatment with trypsin. The other protease combinations yielded only a few new peptides and no typical endoproteinase Asp-N-derived peptide (Table 1). Altogether 27 unique phosphopeptides from 21 phosphoproteins were identified. These proteins contained 37 phosphorylation sites, the majority of which were identified with a localization probability p > 0.75 [39] (Table 1; Additional file 1: Test matrix protein phosphorylation sites; Additional file 2: Selected spectra of test matrix phosphopeptides).

Table 1 Test matrix phosphopeptides

Only two of the test matrix phosphoproteins (P19, SM30-E; Table 1) were identified in recent proteomic studies of spine and test matrix [10] or tooth matrix [11]. This indicates that most of the newly identified proteins were minor components in the matrix of hypochlorite-washed test plates and were detected in the present analysis due to the enrichment for phosphopeptides. The low abundance of phosphoproteins may explain the results of a previous study, which failed to identify any phosphorylated protein in test matrix [30]. Only two of the test matrix phosphoproteins were also identified in tooth matrix (Table 1, Table 2).

Table 2 Tooth matrix phosphoproteins

Test matrix phosphoproteins related to biomineralization

The only spicule matrix protein found to be phosphorylated was SM30-E (Table 1, Fig. 1), a major component of the test matrix [10]. However, the analyzed peptide mixture contained more non-phosphorylated copies of this peptide than phosphorylated ones, indicating that the protein was only partially modified at this single site. The test (shell) phosphoproteome (Table 1) also included P19, which was previously tentatively associated with mineralization processes because of its specific expression in spicule-forming primary mesenchymal cells of the sea urchin embryo [45]. P19 was recently identified as a phosphoprotein in Lytechinus variegatus teeth, probably as an intracellular component. The phosphorylation sites were, however, not determined [46]. This protein was also detected in the proteome of S. purpuratus tooth tissue previously, but was apparently not an intra-crystalline component because it disappeared from the organic matrix proteome after powdering of the teeth and hypochlorite treatment of the powder to remove residual cellular debris and extracellular matrix [11]. The single phosphorylation site, contained in an acidic peptide, was identified from a complex spectrum of product ions resulting from multiple ion activation events (Fig. 2). The site was not completely occupied since non-phosphorylated versions of this peptide were also identified. The powdered tooth matrix phosphoproteome did not contain this protein (Table 2).

Figure 1

The phosphorylation site of protein SM30-E. This peptide was also identified in a non-phosphorylated version in the present and previous studies [10, 11], indicating that this site is only partially modified. The spectrum shows an uninterrupted series of y ions (y2-y7). This sequence tag, supplemented by some b ions, and the accurate mass of the complete peptide measured in the Orbitrap, allowed the identification of this peptide by database searches. The most intense ions, y7 and y4, are due to preferential cleavage N-terminal of proline residues in positions 7 and 10 of the peptide sequence. This is a well-known feature of Pro-containing peptides frequently used for manual validation of peptide assignments. Loss of H3PO4, indicated by -P, is first observed in b9 and y5, indicating phosphorylation of Thr in position 9 of the peptide sequence. Loss of NH3, indicated by -17, frequently occurs upon fragmentation of Asn-containing peptides. Cyclization of N-terminal Gln to pyroglutamate is common in peptides with N-terminal Gln.

Figure 2

The phosphorylation site of protein P19. This peptide was also identified in a non-phosphorylated version in the present and a previous study [11], indicating that the site is only partially modified. P19 was implicated in biomineralization events previously [45] and was identified as a phosphoprotein in L. variegatus tooth tissue by phospho-specific staining [46]. However, previous results also indicated that this protein was at best a very minor component of the intracrystalline matrix [11]. A high match of observed to theoretically expected fragments, including a sequence tag of y3-y7, together with the accurate measurement of the peptide mass, allowed the identification of this peptide. Loss of H3PO4, indicated by -P and first observed with y10, indicated the presence of a phospho group at the only serine in the peptide sequence. Loss of NH3 and water is indicated by -17 and -18, respectively. These neutral losses are frequently observed in peptides containing Glu and Gln.

Phosphorylated novel test matrix proteins

High-quality spectra were obtained for peptides derived from a novel protein (hypothetical protein [Glean3:20139]) (Fig. 3 and Additional File 2: Figs. S1.16, S1.17 and S1.18). One phosphorylated region, comprising three phosphorylation sites, was sandwiched between an extended proline- and threonine-rich motif and a 70-aa-long glutamine-rich motif (Fig. 4). Another phosphopeptide was found in the N-terminal part of the Gln-rich motif (Fig. 4). No known domain signatures or similarities were identified for this sequence in database searches. Another hypothetical protein with multiple phosphorylation sites was encoded by [Glean3:16285/16293] (Table 1). The 1285 amino acid-long, moderately acidic (calculated pI 5.2) sequence contained a LIM domain (aa50-110), two proline-rich motifs (aa145-186, aa773-789) and a bipartite nuclear localization signal (aa773-789).

Figure 3

Identification of the phosphorylation site of peptide NVAEAAGLSSNEVTQVK. Similar to the spectrum in Fig. 2, this spectrum shows the complexity to be expected from fragmentation of a relatively long peptide by multistage activation. However, the presence of extended series of y and b ions and the accurate mass of the intact peptide measured in the Orbitrap enabled the unequivocal identification of the sequence. The phosphorylation site is identified by the increase of 80 Da in the y-ion series starting with y8 and the absence of such an increase in b-ions up to b9. Loss of H3PO4, H2O, and NH3 is indicated by -P, -18, and -17, respectively. The neutral loss of water and NH3 is frequently observed upon fragmentation of peptides containing Asn, Gln, Glu, Ser and Thr, all of which occur in this peptide.
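The localization logic described in this legend can be reproduced from theoretical fragment masses. The sketch below computes singly charged b and y ions for NVAEAAGLSSNEVTQVK with and without a phospho group (+79.966 Da, HPO3) and reports which fragments carry the shift. It is a simplified illustration using standard monoisotopic residue masses of unmodified amino acids; the function names are hypothetical, and it does not replace the ProteinProspector calculations used for validation.

```python
# Monoisotopic residue masses (Da) of the unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
PHOSPHO = 79.96633  # HPO3 added to Ser, Thr or Tyr


def fragment_mz(peptide, phospho_pos=None):
    """Singly charged b and y ion m/z values, keyed by ion number.
    phospho_pos is the 1-based position of the phosphorylated residue."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    if phospho_pos is not None:
        masses[phospho_pos - 1] += PHOSPHO
    n = len(masses)
    b = {i: sum(masses[:i]) + PROTON for i in range(1, n)}
    y = {i: sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


def shifted_ions(peptide, phospho_pos):
    """Ion numbers of b and y fragments that contain the phospho group."""
    b0, y0 = fragment_mz(peptide)
    b1, y1 = fragment_mz(peptide, phospho_pos)
    b_shift = [i for i in b0 if b1[i] - b0[i] > 1.0]
    y_shift = [i for i in y0 if y1[i] - y0[i] > 1.0]
    return b_shift, y_shift


if __name__ == "__main__":
    pep = "NVAEAAGLSSNEVTQVK"
    for site in (9, 10):
        b_shift, y_shift = shifted_ions(pep, site)
        print(f"site {site}: first shifted b{min(b_shift)}, y{min(y_shift)}")
```

Only a phospho group on Ser10 gives a y series shifted from y8 onwards while leaving b ions up to b9 unshifted; modification of the neighbouring Ser9 would already shift b9 and would shift the y series only from y9.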

Figure 4

Analysis of the [Glean3:20139] sequence. The Thr- and Pro-rich sequence (25% Thr, 26% Pro) is shaded light grey and the Gln-rich sequence (63% Gln) is shaded dark grey. Identified phosphopeptides are underlined, phosphorylated Ser are in bold italics. A phosphorylated region comprising three phospho sites is sandwiched between these two domains, while one phospho site is in the N-terminal region of the Gln-rich domain.

Test matrix phosphoproteins with high similarity to vertebrate proteins with known function or activities

The test matrix contained several phosphoproteins with known counterparts in vertebrates (Table 1). In many cases the phosphorylation sites of the sea urchin proteins were equivalent to those of mammalian proteins. This is illustrated for a sea urchin protein similar to pleckstrin homology domain-containing F2 [Glean3:18649] in Fig. 5. The human counterpart, PKHF2_HUMAN, has an overall sequence identity of ~63% to the sea urchin protein and was reported to be phosphorylated at Ser239 [39] and 248 [47]. While Ser239 was conserved and phosphorylated in the sea urchin protein, Ser 248 of the human protein was not present in the sea urchin protein. Another test matrix protein containing a conserved phosphorylation site was the src family kinase SFK3 [Glean3:05419]. The phosphorylated Tyr362 of this kinase is equivalent to phosphorylated Tyr426 of human proto-oncogene tyrosine-protein kinase YES (YES_HUMAN, Uniprot: P07947), which has an overall sequence identity of ~60% to the sea urchin protein. This tyrosine is a target of autophosphorylation in mammalian c-YES kinase [48]. The members of the S. purpuratus Src family have been cloned recently and their recombinantly produced SH2 domains were shown to be necessary for Ca2+ release at egg fertilization [49]. Also included in this group of proteins with conserved phosphorylation sites were two proteins similar to acidic ribosomal proteins P0/P1/P2 ([Glean3:15870/27123] and [Glean3:21739]). Both peptides were from the acidic C-terminus of these proteins which contains two phosphoserines as part of casein kinase target sites [50]. Entry [Glean3:08678] encoded a protein with similarity to vertebrate protein kinase C-delta 1. The identified phosphopeptide derived from the sea urchin protein matches a region of human protein kinase C type delta (KPCD_HUMAN, [Uniprot:Q05655]), which contains several known phosphorylation sites [51]. 
Thus, phosphorylated Ser736 of the sea urchin protein is equivalent to phosphorylated Ser654 of the human protein. However, we could not unequivocally localize the second phosphorylated residue of this peptide (Table 1). Finally, phosphorylated Ser38 of the sea urchin protein similar to translation initiation factor 3, subunit 8, [Glean3:26863] was equivalent to phosphorylated Ser39 of the human protein [SwissProt: Q99613] [52].

Figure 5

Alignment of [Glean3:18649] (Similar to pleckstrin homology domain-containing protein) phosphopeptide to its human counterpart. Phosphoserine 239 [39, 47] of the human pleckstrin homology domain-containing family F member 2 protein is also phosphorylated in the homologous sea urchin protein. The phosphorylated serine 248 of human PKHF2 [47] is not conserved in the sea urchin protein.

Tooth matrix phosphoproteins

Amino acid analysis of carbamidomethylated tooth matrix indicated a protein content of 30-40% (w/w). The tooth phosphoproteome isolated from this matrix comprised 15 proteins (Table 2; Additional file 3: Tooth matrix protein phosphorylation sites; Additional file 4: Selected spectra of tooth matrix protein phosphopeptides), eight of which were already identified previously in proteomic analyses of sea urchin skeletal elements [10, 11] by means of non-phosphorylated peptides. These included proteins encoded in entries [Glean3:17588] and [Glean3:17590/22278], which belonged to the group of tooth matrix proteins with Ala- and Pro-rich and acidic Gly-rich motifs described previously [11], and [Glean3:13763], which contained a sequence similar to dentin matrix protein-4(DMP-4)/FAM20C, with up to 50% identity to vertebrate proteins. The acidic C-terminal peptide (Table 2) leading to the identification of this protein was, however, not part of the sequence region of [Glean3:13763] matching to FAM20C/DMP-4 proteins. DMP-4 is a secreted calcium-binding protein abundantly present in dentin and bone [53]. The mouse protein was also reported to be phosphorylated, but at a different site (Phosida database, http://www.phosida.com/). The protein was previously tentatively identified in sea urchin tooth matrix with a different, non-phosphorylated, peptide [11] located in the DMP-matching region, but sequence conservation at that particular site was low. The only protein possibly belonging to the SM30 family and found to be phosphorylated was a protein "similar to SM30" [Glean3:00164] (Table 2). This protein was previously tentatively identified in tooth matrix as a minor protein, but was, to the best of our knowledge, not previously mentioned by others as a spicule matrix protein [9]. Only two of the 15 tooth phosphoproteins were also contained in the test phosphoproteome (Tables 1 and 2).

Phosphodontin, the major acidic phosphoprotein of tooth matrix

The most interesting phosphoprotein, from the point of view of biomineralization, was encoded as a hypothetical protein in entry [Glean3:18919] and yielded 14 unique phosphopeptides. This protein, for which we propose the name phosphodontin, was already previously identified in a proteomic survey of the tooth matrix by means of non-phosphorylated peptides [11], but in that study it appeared as a minor component. In retrospect it is likely that the concentration of this protein was considerably underestimated before, because most of the possible tryptic peptides were phosphorylated. Phosphorylated peptides were, however, not searched for in the previous study and were therefore not included in the calculation of the exponentially modified protein abundance index (emPAI) used as quantification method [11]. The protein sequence of phosphodontin contained a predicted secretion signal sequence (Fig. 6) which was followed by a region composed of thirty-five 11-12 amino acid-long repeats of various modifications of the sequence EISSGEGEQPK. Most repeats coincided with single tryptic peptides, but in two cases the lysine was substituted by Asn, creating tryptic peptides containing two repeats. Another repeat was interrupted by an Arg (aa86-96), and still another one was part of a longer peptide (aa25-51). The most C-terminal repeat was truncated. These three latter peptides were not identified by MS/MS. The high percentage of Glu (25%) in the sequence and a C-terminal domain consisting almost entirely of aspartic acid conferred to this protein a theoretical pI of 3.9, which would be further lowered by phosphorylation. The modified site in each phosphorylated repeat was one of the two serines in position 3 and 4. In general, the evidence (Additional File 3: Tooth matrix protein phosphorylation sites; Additional File 4: Selected spectra of tooth matrix phosphopeptides) favored the second Ser in each repeat as phosphorylation site. 
However, we cannot exclude that the first Ser was phosphorylated in peptide variants of the same sequence and composition. While the first serine of most repeats would be classified as part of a CK2 target site, the second serine is in most cases not predicted to be part of any kinase target site by MaxQuant (Table 2). The determination of phosphorylation sites was complicated by the frequent presence of Glu as N-terminal amino acid and Met adjacent to the phosphorylation site in many peptides. Glu frequently occurred either in linear form or as pyroGlu, while Met was present unmodified or oxidized. A few peptides were also identified in non-phosphorylated versions in this (Table 2) or a previous study [11]. With the present set of data we cannot determine the total number of phosphates per protein molecule. This is because several repeats occur more than once in the sequence, giving rise to identical peptides. Thus, for instance, repeat EISSGEGGEQPK was contained eight times in the sequence of [Glean3:18919]. However, the number of identified peptides did not correlate with this high frequency (Table 2). Peptides arising from this repeat were also identified in unmodified form in this and a previous study [11]. Therefore it is not clear at present how many, and which, of these identical repeats contained occupied phosphorylation sites. The acidity and the high degree of phosphorylation of the protein are reminiscent of some mammalian tooth matrix phosphoproteins, such as dentin phosphoryn, dentin sialoprotein, or dentin matrix protein-1 [12], but there is no obvious sequence similarity of phosphodontin to any of these mammalian proteins. The lack of predicted structure indicated that this novel tooth matrix phosphoprotein belongs to the growing group of biomineral matrix proteins without a defined structure in the absence of a ligand [54]. 
Such intrinsically disordered proteins were shown to be frequent targets of kinases and the phosphorylation sites were less well conserved than phosphorylation sites in well-structured sequence regions of known function [55].
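The statement that only the first serine of a repeat lies in a CK2 target site can be checked against the minimal CK2 consensus S/T-x-x-D/E. The sketch below is an assumption-laden illustration: it uses this textbook consensus as a regular expression, which may differ from the motif definitions built into MaxQuant, and the function name is hypothetical.

```python
import re

# Minimal CK2 consensus: Ser/Thr followed by an acidic residue at position +3.
# The lookahead allows overlapping candidate sites to be reported.
CK2_MOTIF = re.compile(r"(?=[ST]..[DE])")


def ck2_candidate_sites(sequence: str) -> list:
    """1-based positions of Ser/Thr residues matching S/T-x-x-D/E."""
    return [m.start() + 1 for m in CK2_MOTIF.finditer(sequence)]


if __name__ == "__main__":
    for repeat in ("EISSGEGEQPK", "EISSGEGGEQPK"):
        print(repeat, ck2_candidate_sites(repeat))
```

For both repeat variants quoted in the text, only the serine in position 3 has an acidic residue (Glu6) at +3, whereas the serine in position 4 is followed by Gly at +3 and therefore does not match this consensus.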

Figure 6

The sequence of phosphodontin ([Glean3:18919]), the major phosphoprotein of tooth matrix. The predicted signal sequence and the Asp-rich (72% Asp) C-terminus are doubly underlined. Alternating sequence repeats are shaded light and dark grey. Possible phosphorylation sites derived from experimental data are printed in bold and underlined. Because several repeats occur in multiple identical copies and also non-phosphorylated copies were detected for several of them, it was not possible to determine the extent of phosphorylation of this protein. Furthermore, obviously only one Ser of each repeat can be phosphorylated at a time. Experimental data favor the second Ser of each repeat as the phosphorylation site, but do not completely exclude modification of the first one. Peptides sequenced by MS/MS are printed in italics.

Novel tooth matrix phosphoproteins

Tooth matrix phosphoproteins not previously detected in sea urchin skeletal elements or the test matrix phosphoproteome included a protein similar to nucleobindin 2a/b [Glean3:02353], dickkopf protein-1 (dkk1; [Glean3:03345]), a protein with similarity to pecanex-like protein 1/transmembrane protein 26 [Glean3:10288], a protein similar to reticulocalbin [Glean3:14308] and a protein with similarity to multiple coagulation factor deficiency protein 1 [Glean3:14805]. These proteins were probably minor components of the tooth matrix, which became detectable after depletion of major non-phosphorylated proteins.

Vertebrate nucleobindins are calcium-binding phosphorylated proteins of the nuclear envelope and the endoplasmic reticulum, but secreted forms were reported to occur in bone matrix and tooth matrix [56, 57]. The sea urchin protein sequence of [Glean3:02353] was, however, only half the length of vertebrate nucleobindins and contained only one predicted EF hand motif instead of two. Sequence identity to vertebrate proteins was approximately 30% in an overlapping region, but the phosphorylation site was not part of the overlap.

Entry [Glean3:03345] contained amino acids 106-339 of Uniprot entry [A1XR81:STRPU], likely representing Dickkopf protein (Dkk)-1. Dickkopf proteins are secreted inhibitors of the Wnt signaling pathway, which was also implicated in sea urchin development [38]. Human Dkk-1 [39], mouse Dkk-3 [58, 59] and chicken eggshell Dkk-3 [14] were reported previously to be phosphorylated at positions found in the same sequence region as the phosphorylation sites of the presumed sea urchin Dkk-1.

Entry [Glean3:14308] encoded a sequence with similarity to vertebrate reticulocalbin/calumenin, calcium-binding, EF-hand-containing residents of the ER and Golgi apparatus. The sequence identity of the sea urchin protein to the vertebrate proteins was approximately 30%. However, the sea urchin protein sequence contained a predicted secretory signal sequence and no ER retention signal, such as a C-terminal HDEL, indicating that it was a secreted member of this functionally diverse family of proteins [60].

Concluding remarks

Using state-of-the-art proteomics instrumentation and software we have detected 21 phosphoproteins in sea urchin test matrix and 15 in tooth matrix. Considering the overlap of two phosphoproteins between these two compartments, we have identified 34 phosphoproteins with 53 unique phosphopeptides containing 64 phosphorylation sites, most of which could be attributed to a single amino acid. Twenty-four phosphorylation sites were not part of a known kinase target site (Tables 1 and 2). Most of the assigned kinase target sites were casein kinase sites, in agreement with earlier reports indicating that casein kinase-like proteins were largely responsible for the phosphorylation of extracellular proteins. All other assigned sites were part of well-known kinase target sequences, such as EGFR, NEK6 or PKA sites (Tables 1 and 2). This was in accordance with a recent report predicting almost the same set of kinases as in humans from in silico analysis of the S. purpuratus genome [24].

Test matrix contained 21 phosphoproteins, only two of which were previously identified as matrix components by means of non-phosphorylated peptides (Table 1). This indicated that all other phosphoproteins were less abundant than the low abundance proteins of the previous proteomic analysis of test matrix [10]. Probably peptides became analyzable only due to the specific enrichment of phosphopeptides. Such minor components likely do not play a role as structural elements but may play a role in signal transduction chains or processing of bulk matrix components. In contrast, eight of the 15 tooth matrix phosphoproteins were previously identified as tooth matrix proteins by means of non-phosphorylated peptides (Table 2). One of them, encoded in entry [Glean3:18919], most probably belongs to the major tooth matrix proteins. In a recent proteomic survey this protein was calculated to be of low abundance only. However, the present study indicated that most of the possible peptides were not taken into account in our previous proteomic survey [11] because the search for phosphorylated peptides was not included. The number of identified and accepted phosphopeptides indicated that this protein, for which we propose the name phosphodontin, would most probably have occupied position two in the abundance ranking [11]. To further characterize phosphodontin it will be necessary to isolate it from tooth matrix in sufficient amounts. Phosphodontin did not show sequence similarity to the better characterized mammalian tooth phosphoproteins. However, properties such as the high degree of phosphorylation and high percentage of acidic amino acids may indicate that this protein has similar functions and importance as the mammalian dental phosphoproteins, and may therefore be an interesting target for future research.