Background

Selaginella (Selaginella moellendorffii) is a lycophyte believed to have originated from the earliest vascular plants approximately 410 million years ago[1]. Although lycophytes have existed twice as long as angiosperms, they have not evolved flowers and seeds since their divergence from other plant lineages. For this reason, Selaginella has been selected as a model plant to understand the early evolution of developmental and metabolic processes that are unique to vascular plants[2]. After a bacterial artificial chromosome library was constructed from clonally propagated plants[3], the complete Selaginella genome sequence was released in 2007[4]. Subsequently, a number of investigations on Selaginella were launched in different areas including gene evolution[5-10], pathway conservation[11-16], genomic DNA composition and methylation[17-19], sRNA functions and RNA editing[20, 21], and transposons[22]. Interestingly, Selaginella was found to use genes significantly different from those of flowering plants to generate secondary metabolites with potential for pharmaceutical applications[23-26]. By contrast, proteomic investigations on this non-seed vascular plant model have not been extensively reported. A two-dimensional electrophoresis-based approach was recently employed to explore the desiccation tolerance mechanism in the resurrection plant Selaginella tamariscina[27].

Post-translational modifications (PTMs) play important roles in the regulation of protein functions and occur at distinct amino acid side chains or peptide linkages. It has been estimated that more than 200 types of PTMs exist in proteins[28]. Protein phosphorylation, principally on serine, threonine or tyrosine residues, is one of the most important and well-investigated PTMs. It represents a reversible molecular switch controlled by protein kinases and protein phosphatases, either activating or inactivating the target proteins[29]. Approximately one-third of all proteins in eukaryotic cells are estimated to be phosphorylated at any given time[30]. In plants, protein phosphorylation plays a central role in virtually all cellular processes, including carbon and nitrogen metabolism, growth and development, transcription and translation, responses to abiotic and biotic stresses, cell cycle, and apoptosis[31]. Therefore, the identification of protein kinases and phosphatases, their substrates, and the phosphorylation sites involved is crucial for the understanding of many fundamental processes in plants. Interestingly, Arabidopsis contains over 1000 protein kinases[32], about twice as many as in humans, although the two genomes contain a similar number of genes[33]. Hence, protein phosphorylation events in plants appear to be very different from, and more complicated than, those in mammals. In fact, a number of plant protein kinases implicated in early events of signal transduction are unique, with no mammalian orthologs[34].

Phosphoproteomic investigations in plants were initiated in recent years following the completion of different genome sequencing projects. The highly abundant ribulose-1,5-bisphosphate carboxylase/oxygenase (RUBISCO) protein, which accounts for about 50% of total soluble proteins[35], has hindered the detection of low-abundance proteins, including many phosphoproteins[36]. Polyethylene glycol (PEG) fractionation has been used as a cost-effective and contaminant-free procedure to remove RUBISCO for improved detection of low-abundance proteins[37-39]. In addition, phosphopeptide enrichment procedures, such as immobilized metal ion affinity chromatography (IMAC), are necessary to reduce the complexity of proteolyzed lysates for mass spectrometry analysis. IMAC is based on affinity purification through metal complexation with the phosphate group in phosphopeptides[40] and has been adopted in phosphoproteomic analyses of different plant systems[41-46].

In the present study, we used the PEG fractionation approach followed by the IMAC procedure to prepare Selaginella samples for phosphoproteome profiling and identified 1588 unique phosphorylation sites. Our dataset revealed features that are consistent with the Arabidopsis phosphoproteome. We further identified phosphorylation events that are conserved between Selaginella and angiosperm orthologous sequences. Novel and unique phosphosites were detected in several photosynthesis-related proteins in Selaginella. Phosphorylation motifs recognized by known protein kinase classes were revealed for both evolutionarily conserved and Selaginella-specific proteins.

Results and discussion

General features of the Selaginella phosphoproteome dataset

We employed PEG fractionation (Additional file 1: Figure S1) and IMAC enrichment to isolate phosphopeptides from wild-growing Selaginella (Selaginella moellendorffii) for LC-MS/MS analysis. A total of 1593 unique phosphopeptides containing 1588 non-redundant phosphosites were discovered in our study (Additional file 2: Table S1). Among them, 1104 were identified with high confidence of localization (localization probability ≥ 95%), 116 with medium confidence of localization (80% ≤ localization probability < 95%), and 368 with low confidence of localization (localization probability < 80%). Phosphosites with high confidence of localization were categorized into pSer (86.2%), pThr (13.3%), and pTyr (0.5%). The relative distribution of the three phosphorylated residues is consistent with previous reports for different flowering plant species[45, 46]. As Ser/Thr kinases are commonly encoded in plant genomes, more frequent Ser and Thr phosphorylation events are expected. On the other hand, while typical Tyr-specific kinases are absent in plant genomes, a few plant kinases with dual specificity are believed to phosphorylate Tyr residues in proteins[47].
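The three localization-confidence tiers used above can be written as a short Python sketch. The function names and tier labels are hypothetical and are not part of Scaffold PTM; the sketch only assumes that each phosphosite carries a localization probability expressed as a percentage.

```python
from collections import Counter


def localization_tier(probability: float) -> str:
    """Assign a confidence tier from a localization probability in percent.

    Thresholds follow the text: >= 95 is high, 80 to < 95 is medium,
    and < 80 is low. The labels themselves are illustrative.
    """
    if not 0.0 <= probability <= 100.0:
        raise ValueError("probability must be a percentage between 0 and 100")
    if probability >= 95.0:
        return "high"
    if probability >= 80.0:
        return "medium"
    return "low"


def tier_counts(probabilities):
    """Count how many phosphosites fall into each confidence tier."""
    return Counter(localization_tier(p) for p in probabilities)
```

Applied to the full dataset, such a tally would be expected to reproduce the 1104, 116 and 368 sites reported for the three tiers, which together account for all 1588 phosphosites.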

The 1104 confirmed phosphosites correspond to a total of 716 Selaginella proteins, 665 of which can be assigned to orthologous protein groups (Additional file 3: Table S2) using the OrthoMCL algorithm with an e-value cut-off of E-5 and 50% sequence match[48, 49]. Forty-two proteins are considered Selaginella-specific since they could not be assigned to any OrthoMCL group or do not have any matching sequences in the OrthoMCL database. These proteins may have evolved in lycophytes after their separation from other vascular plants, including ferns and seed plants.

Analysis of phosphosite locations in Selaginella proteins

To analyze the locations of the identified phosphorylation sites, protein sequences were divided into 5% fractions and the number of phosphorylation events was counted within each fraction. As shown in Figure 1A, the highest number of phosphosites is found in the last fraction, i.e. the C-termini of proteins. We performed a parallel analysis using an Arabidopsis phosphoproteome dataset (retrieved from P3DB) and found a very similar distribution pattern for the phosphosites (Figure 1B). Such a phenomenon was also described in a phosphoproteomic study of mouse liver[50]. Hence, more frequent C-terminal phosphorylation appears to be a common feature of proteins in different organisms, including plants and animals. The C-terminal region was suggested to be more exposed and flexible for protein phosphorylation[50].
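The positional binning described above can be illustrated with a minimal Python sketch. The exact rule used to assign a residue to a 5% fraction is not stated in the text, so the rule below (each residue goes to the fraction containing its relative end position, so that the final residue always falls in the last fraction) is an assumption, and the function names are hypothetical.

```python
def fraction_index(site_position: int, protein_length: int, n_bins: int = 20) -> int:
    """Return the 0-based fraction holding a 1-based residue position.

    With n_bins = 20, each fraction covers 5% of the protein length.
    Assumed rule: index = ceil(position * n_bins / length) - 1,
    computed with integer arithmetic to avoid rounding surprises.
    """
    if protein_length < 1 or not 1 <= site_position <= protein_length:
        raise ValueError("site_position must lie within the protein")
    return (site_position * n_bins - 1) // protein_length


def positional_histogram(sites, n_bins: int = 20):
    """Count phosphosites per fraction.

    `sites` is an iterable of (site_position, protein_length) pairs.
    """
    counts = [0] * n_bins
    for position, length in sites:
        counts[fraction_index(position, length, n_bins)] += 1
    return counts
```

Under this rule, a site at residue 96 of a 100-residue protein is counted in the last (95-100%) fraction, which is the fraction where the highest number of phosphosites was observed.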

Figure 1

The distribution of phosphorylation sites along the phosphoproteins of (A) Selaginella and (B) Arabidopsis. The Arabidopsis dataset was retrieved from P3DB.

Functional categorization of the identified Selaginella phosphoproteins

To understand the functional distribution of the unique Selaginella phosphoproteins identified in this study, their cellular localization, molecular function, and biological processes were analyzed and compared with those of 2400 Selaginella proteins identified after LC-MS/MS analysis of PEG-fractionated samples without the IMAC enrichment procedure. Based on the comparison of Gene Ontology (GO) term annotations (Figure 2), the 3 most over-represented categories for the identified phosphoproteins in each GO vocabulary are: nucleus, plasma membrane and cytosol for “cellular component”; DNA/RNA binding, kinase activity, and transferase activity for “molecular function”; protein modification, phosphorus metabolic process, and transcription for “biological process”.

Figure 2

The GO annotation distribution of Selaginella whole proteins (before IMAC enrichment) and phosphoproteins in the categories of (A) cellular component, (B) molecular function and (C) biological processes.

Location of phosphosites in characterized protein domains

To determine whether the Selaginella phosphosites are located in known structural and/or functional protein domains, a Pfam database search (Wellcome Trust Sanger Institute) was performed to extract domain information for our identified phosphoproteins. A total of 594 proteins with domain information were retrieved. Among the 893 phosphosites in these proteins, only 201 (22.5%) were located inside protein domains (Table 1). Our findings are consistent with observations from Arabidopsis phosphoproteome analyses, which suggest that phosphorylation events may not have a significant impact on domain-associated functions[51, 52].
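The domain-location tally can be sketched in Python as a simple interval test. The data layout (a phosphosite position paired with a list of inclusive domain start and end coordinates) and the function names are assumptions made for illustration; they do not reflect the Pfam output format.

```python
def site_in_domain(site: int, domains) -> bool:
    """True if a 1-based site lies inside any (start, end) domain, inclusive."""
    return any(start <= site <= end for start, end in domains)


def count_sites_in_domains(records):
    """Return (sites inside a domain, total sites).

    `records` is an iterable of (site_position, domains) pairs, where
    `domains` lists the annotated domain boundaries of the parent protein.
    """
    inside = 0
    total = 0
    for site, domains in records:
        total += 1
        if site_in_domain(site, domains):
            inside += 1
    return inside, total


def percent_inside(inside: int, total: int) -> float:
    """Percentage of sites inside domains, rounded to one decimal place."""
    return round(100.0 * inside / total, 1)
```

With the counts reported above, `percent_inside(201, 893)` gives 22.5.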

Table 1 Location of phosphorylation sites on characterized protein domains

Phosphorylation motif analysis

A phosphorylation motif search was performed on our phosphopeptide dataset (localization probability ≥ 95%) using the Motif-X algorithm[53]. Peptide sequences were aligned with their length adjusted to ±7 residues from the central phosphosite for data submission. Over-represented patterns of amino acid sequences were generated with a minimum occurrence of 20 and a significance value of 10^-6. Altogether, we obtained a total of 11 phosphorylation motifs (9 Ser and 2 Thr) containing at least one fixed amino acid aside from the central phosphorylated residue (Figure 3A). Both Thr motifs are Pro-directed (TP and PXTP) and there are 3 Pro-directed Ser motifs (SP, PXSP, SPXR). All these motifs are possible substrates of glycogen synthase kinase 3, cyclin-dependent kinase, and mitogen-activated protein kinase. In addition, 3 basophilic motifs (LXRXXS, RXXS, KXXS) likely to be associated with the activities of Ca2+-dependent protein kinase (CPK), Ca2+/calmodulin-dependent protein kinase, or protein kinase A were identified. Furthermore, 3 acidic motifs (SDXE, SXD, and SE) potentially recognized by casein kinase II were generated. We also performed a parallel Motif-X analysis using Arabidopsis phosphopeptides retrieved from P3DB and those obtained by Wang et al. (2013). One of the Selaginella motifs, KXXS, was not generated from the Arabidopsis analysis. Thirty-two (out of 38) occurrences of this motif correspond to proteins assigned to an OrthoMCL group with e-value < E-50 (Additional file 4: Table S3), indicating that this basophilic motif is primarily associated with evolutionarily conserved proteins in Selaginella. Analysis of the 107 phosphosites in the Selaginella-specific proteins (those without any assigned OrthoMCL groups) revealed that they are more enriched in Pro-directed motifs when compared to all the identified phosphosites (49% vs 35%) (Figure 3B). On the other hand, the basophilic motifs are under-represented in the Selaginella-specific proteins when compared to all proteins identified (23% vs 38%). Consistently, a single SP motif with 36 occurrences was generated by Motif-X analysis for the 107 phosphosites found in the Selaginella-specific proteins. Taken together, most of the Selaginella phosphorylation events identified in this study are likely to be catalyzed by known classes of protein kinases in plants.

Figure 3

Phosphorylation motifs in Selaginella phosphoproteins. (A) Motif-X analysis of phosphopeptides with localization probability ≥ 95%. (B) Motif class distribution of Selaginella (Sm)-specific phosphorylation sites (P-sites).

Phosphorylation events in evolutionarily conserved proteins

To identify phosphorylation events highly conserved between Selaginella and flowering plants, our identified phosphopeptides with confirmed phosphosites were clustered with phosphopeptides of Arabidopsis, rice, rapeseed, soybean and Medicago truncatula (retrieved from P3DB) by CD-HIT[54] using a sequence identity cutoff of 0.6 and an alignment bandwidth of 5[44]. A total of 107 Selaginella phosphopeptides harboring 115 Ser/Thr phosphosites were found to cluster with phosphopeptides from the other plants. More than 80% (97/123) of those Selaginella phosphosites were found to have equivalent phosphosites in at least one other species. The majority (90/106) of the Selaginella proteins harboring the conserved phosphosites are evolutionarily conserved proteins belonging to orthologous groups identified with e-values < E-50 (Additional file 5: Table S4). Many of these proteins are involved in primary metabolism (e.g. Calvin cycle, glycolysis, TCA cycle, lipid biosynthesis), RNA processing, transcriptional regulation, cell cycle, protein phosphorylation (kinases), and signaling (e.g. G proteins, 14-3-3 protein, LRR-containing kinases). On the other hand, 17 of these highly conserved phosphorylation events are found in proteins with unknown functions. Selected phosphopeptide alignments containing highly conserved phosphosites in multiple plant species are shown in Table 2.

Table 2 Selected conserved phosphorylation sites within the orthologous proteins

Furthermore, we closely examined the phosphorylation events in Selaginella photosynthesis-related proteins. The molecular machinery of photosynthesis has been highly conserved during plant evolution. Among our identified phosphoproteins with confirmed phosphosites, seven are involved in photosystem II (PSII) and two are involved in photosystem I (PSI) (Figure 4A). To reveal possible evolutionary significance, sequences were aligned with orthologs from Arabidopsis, rice, and Physcomitrella patens (moss), representing the diverse lineages of dicots, monocots, and bryophytes, respectively (Figure 4B and Additional file 6: Figure S2). In all cases, phosphorylation information is only available for the Arabidopsis proteins. Sequences of rice and moss are included for examination of phosphorylatable residues at equivalent sites.

Figure 4

Selaginella phosphoproteins involved in photosynthesis. (A) Graphical representation of the photosynthesis machinery using the KEGG classification system. Different proteins participating in light-dependent reactions are shown. Circles filled with red denote phosphoproteins with confirmed phosphosites identified in this study. UniProtKB accession numbers of Selaginella proteins are shown underneath the corresponding photosynthetic proteins. (B) Alignment of the identified Selaginella photosynthesis phosphoproteins (selected regions) with orthologous sequences from Arabidopsis, rice (ORYSA) and P. patens (PHYPA). Phosphosites identified in this study and in Arabidopsis are highlighted in red and yellow, respectively. Complete alignments of these proteins are available in Additional file 6: Figure S2. Phosphorylation information for the rice and P. patens sequences is not available.

As the first link in the chain of light-dependent reactions, PSII captures photons and uses the energy to extract electrons from water molecules. The light-harvesting chlorophyll a/b protein complex LHCII in PSII is composed of three proteins, namely Lhcb1, Lhcb2, and Lhcb3[55]. Non-phosphorylated LHCII functions as an antenna for PSII, but it migrates to come into contact with PSI following light-dependent phosphorylation of Lhcb1 and Lhcb2[56], which is likely to occur at N-terminal Thr residues[57]. Although no explicit phosphosites were identified[55-57], STN7 kinase was strongly suggested to be required for phosphorylation of Lhcb1 and Lhcb2 to achieve state transitions between PSII and PSI[56]. In Selaginella, we detected N-terminal Thr phosphosites in D8QN27 (Lhcb1: Thr-44) and D8SUF1 (Lhcb2: Thr-42) (Figure 4B). Both phosphorylation events are conserved in Arabidopsis Lhcb1 and Lhcb2[58, 59]. Equivalent phosphorylatable residues are also found in rice Lhcb1 and Lhcb2 (Figure 4B). As these Thr residues are preceded by a basic residue (Lys or Arg), they represent potential signatures recognized by STN7[60]. In fact, Thr-40 in Arabidopsis Lhcb2 is phosphorylated in the wild type but not in the stn7 mutant[59], further suggesting that it is a target of STN7. In D8QN27 (Lhcb1), we also identified the Pro-directed Ser-54 phosphosite, which is conserved in Arabidopsis Lhcb1[43] and corresponds to an equivalent Ser residue in rice Lhcb1 (Figure 4B). Interestingly, none of the N-terminal Ser/Thr residues mentioned above is conserved in P. patens Lhcb1 and Lhcb2, and they probably evolved only after the emergence of vascular plants. On the other hand, the Ser-48 and Ser-49 phosphosites in D8QN27 (Lhcb1) are located in a region not conserved with the Arabidopsis and rice sequences, but equivalent Ser residues are identified in moss Lhcb1. They may represent phosphorylation events that were lost in the angiosperm lineage.

Lhcb4, a minor chlorophyll-binding protein, was found to be phosphorylated in maize upon exposure to high light intensity for protection against cold stress[61]. The phosphosite Thr-112, a potential casein kinase II target, was identified in maize Lhcb4[62]. This residue is not conserved in Selaginella Lhcb4 (D8RTB9) but is present in Arabidopsis (pThr-109), rice (Thr-111), and moss (Thr-119) (Figure 4B). On the other hand, 2 consecutive phosphosites (Ser-57 and Ser-58) were detected in D8RTB9, and the equivalent Ser residues are found only in moss but not in Arabidopsis or rice. While both of them are located in basic motifs, Ser-58 may also represent a target for acidic casein kinase II.

The PSII core proteins PsbA, PsbD and PsbC are also known to undergo a strong and dynamic redox-regulated phosphorylation cycle[63-65]. STN8-dependent phosphorylation of PSII proteins is required for rapid turn-over of photo-damaged PSII complexes and it is highly important during prolonged exposure of the photosynthetic apparatus to excess light[66]. As determined by its structure, STN8 kinase was reported to have a peculiar substrate specificity restricted to the very N-terminal Thr residue of PsbA, PsbD and PsbC[60]. For example, the phosphosite Thr-2 in Arabidopsis PsbD is phosphorylated by STN8[42, 58]. The same phosphorylation event is detected in Selaginella C7B2K2 (PsbD) while an equivalent Thr residue is found in rice (Figure 4B). On the other hand, while no N-terminal Thr phosphosites were identified in Selaginella C7B2K3 (PsbC), its Thr-346 phosphorylation is conserved in Arabidopsis PsbC[42, 52] and equivalent Thr residues are found in rice and moss. This site may represent a substrate of acidic or basic motif recognizing kinases, indicating the possibility of cross-talk between kinases as suggested previously[60].

The oxygen-evolving complex (OEC) consists of PsbO, PsbP and PsbQ. PsbO stabilizes the manganese cluster, which is the primary site of water splitting. In addition, PsbO regulates dephosphorylation and turnover of the PSII reaction center protein PsbA[67, 68]. However, no phosphorylation events in PsbO have been reported previously in any plant. In Selaginella D8TBN9 (PsbO), we identified a unique Pro-directed Ser-219 phosphosite. The equivalent residues in the other PsbO sequences examined are all Lys, which is non-phosphorylatable (Figure 4B). PsbQ is required for PSII assembly, stability, and photoautotrophic growth under low light conditions[69]. The Selaginella PsbQ (D8S1M9) was found to be phosphorylated at the Ser-61 residue, which is a potential target of Pro-directed kinases. Equivalent pSer and Ser residues are found in Arabidopsis and moss PsbQ sequences, respectively (Figure 4B).

PsaC and PsaF are components of PSI, which performs the light-induced electron transfer from plastocyanin or cytochrome c6 (Cytc) to ferredoxin. As a chloroplast-encoded PSI subunit, PsaC binds the two terminal electron acceptors (FA and FB). No phosphorylation of PsaC has been reported previously in any plant. PsaC is extremely conserved among the four plant species examined here, with most of the residues identical (Additional file 6: Figure S2). Intriguingly, the phosphorylation event occurs at a unique residue (Ser-71) in Selaginella PsaC (C7B2J3). The equivalent residues in the other plant sequences are all non-phosphorylatable. The nuclear-encoded subunit PsaF provides a docking site for plastocyanin and Cytc on the lumenal side of PSI. In Arabidopsis, PsaF was reported to be phosphorylated at Ser-94, Ser-95, Tyr-99, and Ser-103[42, 52]. Most of the equivalent residues in Selaginella PsaF (D8QPQ3) are conserved except for Ser-95. On the other hand, the Ser-184 phosphosite in D8QPQ3 is located in the very C-terminal region, which is absent in Arabidopsis and rice. The same residue was identified in the moss PsaF sequence, suggesting that the Ser-184 phosphorylation event might have been lost during the evolution of flowering plants.

Overall, several phosphorylated residues in the Selaginella photosynthesis proteins are conserved with equivalent phosphorylation in Arabidopsis and/or phosphorylatable residues in most of the plants examined, including Lhcb1: Thr-44, Lhcb2: Thr-42 and Thr-46, PsbD: Thr-2, PsbC: Thr-346, and PsbQ: Ser-61. The phosphorylation of Thr-46 in Lhcb2 was first identified in Selaginella and the equivalent residues in other plant sequences are likely to be phosphorylated. We also identified unique phosphorylated residues within highly conserved regions in Selaginella PsbO (Ser-219) and PsaC (Ser-71). On the other hand, phosphorylation events with equivalent residues only in moss were detected in Selaginella Lhcb1, Lhcb4 and PsaF. These phosphosites are located in low-homology regions when compared with the Arabidopsis and rice sequences, implying that they were lost in the flowering plants during evolution. It will be very interesting to investigate how the different unique phosphorylation events are involved in light reactions in Selaginella.

Conclusions

Our work generates the first large-scale atlas of phosphoproteins in Selaginella, which occupies a unique position in the evolution of terrestrial plants. By combining PEG fractionation with IMAC enrichment, a total of 1593 unique phosphopeptides (1588 individual phosphosites) representing 851 unique phosphoproteins were retrieved. An overview of the Selaginella phosphoproteomics data revealed general features which are largely consistent with the dicot model Arabidopsis. Known plant Ser/Thr phosphorylation motifs were extracted from total and Selaginella-specific phosphopeptides, implying the conservation of phosphorylation machineries during vascular plant evolution. In fact, 97 highly conserved phosphorylation events were identified among Selaginella and flowering plant homologs. In PSII proteins, we identified conserved residues which are potential targets of STN7 and STN8 kinases. On the other hand, several phosphosites unique to Selaginella were detected in the highly conserved PSI and PSII proteins. Future research into the functional roles of Selaginella-specific phosphorylation events in photosynthesis and other processes may offer insight into the molecular mechanisms leading to the distinct evolution of lycophytes.

Methods

Protein extraction and PEG fractionation

Two grams of aerial tissue of wild-growing Selaginella moellendorffii collected from Victoria Peak in Hong Kong were ground to a fine powder in liquid nitrogen. The powder was homogenized in 10 mL of ice-cold Mg/NP-40 extraction buffer containing 0.5 M Tris-HCl (pH 8.3), 20 mM MgCl2, 2% v/v NP-40, 2% v/v β-mercaptoethanol, 1 mM phenylmethylsulfonyl fluoride and 1% w/v polyvinylpolypyrrolidone using the Tissue-Tearor (BioSpec) operated at maximum speed for 1 min on ice[70]. After centrifugation at 12000 × g for 15 min at 4°C, the supernatant was treated with 15% PEG-4000 and incubated on ice for 30 min, followed by centrifugation at 1500 × g for 10 min at 4°C. The pellet was washed sequentially with ice-cold 10% trichloroacetic acid/acetone, ice-cold 100% methanol containing 0.1 M ammonium acetate, and ice-cold 100% acetone. The supernatant was precipitated by adding four volumes of ice-cold acetone and then incubated at -20°C for 2 h. After centrifugation at 12000 × g for 5 min at 4°C, the pellet was rinsed as described above. For the plant debris left after the initial Mg/NP-40 extraction, residual protein was extracted with 4% SDS. After centrifugation, the supernatant was precipitated with ice-cold acetone, followed by sequential rinsing of the pellet.

Protein digestion and phosphopeptide enrichment

The pellets obtained from each of the above steps were re-suspended in a solution containing 0.2 M Tris-HCl (pH 8.0), 8 M urea and 4 mM CaCl2. Dissolved protein samples were reduced with 10 mM dithiothreitol for 30 min at 56°C, and then alkylated with 40 mM iodoacetamide for 30 min at room temperature in the dark. Protein concentration was measured with the Bio-Rad Protein Assay kit. Afterwards, trypsin (Worthington) was added at a 1:50 (enzyme:protein) w/w ratio and the mixture was incubated overnight at 37°C. Trypsinized peptides were loaded onto a 1 g Sep-Pak C18 column (Waters), washed twice with 10 mL 1% acetic acid, eluted with 7 mL 80% acetonitrile containing 0.1% acetic acid, dried under speed-vacuum, re-suspended in 400 μL 1% acetic acid, and then loaded onto a mini-column of 40 μL IMAC resin prepared as described previously[71]. The IMAC mini-column was rinsed twice with 40 μL wash buffer containing 25% v/v acetonitrile, 100 mM NaCl and 0.1% v/v acetic acid, then washed once each with 40 μL 1% v/v acetic acid and 20 μL double-distilled water, eluted with 120 μL 6% w/v NH3·H2O, and dried under speed-vacuum. IMAC-enriched phosphopeptides derived from the different PEG-fractionated samples (Additional file 1: Figure S1) were subjected to LC-MS/MS analysis.

RPLC-ESI-MS/MS detection

The Triple TOF 5600 mass spectrometer (AB SCIEX), a hybrid quadrupole TOF platform[72], was coupled with a Nano-LC system (Agilent) utilizing a Nanospray III ion source (AB SCIEX). Mobile phase A (2% ACN, 0.1% formic acid) and mobile phase B (98% ACN, 0.1% formic acid) were used to establish a 120 min gradient comprising 80 min (5-30% B), 12 min (30-60% B), 6 min (60-90% B), 10 min (90% B), and 12 min (90-5% B). The flow rate was 300 nL/min. Peptides were separated on a fused silica capillary emitter (New Objective) packed in-house with 5 μm C18 resin (New Objective), and analyzed in positive ion mode by electrospray ionization. For information-dependent acquisition, each survey scan was acquired in 250 ms followed by 20 product ion scans collected in 50 ms per scan.
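The gradient programme above can be encoded as a small Python table to check that the segments add up to 120 min and to look up the nominal mobile phase B percentage at a given time. Linear interpolation within each segment is an assumption (the text gives only the end points), and the names `GRADIENT`, `total_minutes` and `percent_b` are hypothetical.

```python
# Each segment: (duration in min, %B at segment start, %B at segment end).
GRADIENT = [
    (80.0, 5.0, 30.0),
    (12.0, 30.0, 60.0),
    (6.0, 60.0, 90.0),
    (10.0, 90.0, 90.0),  # hold at 90% B
    (12.0, 90.0, 5.0),   # return to starting conditions
]


def total_minutes(segments=GRADIENT) -> float:
    """Total run time of the gradient programme."""
    return sum(duration for duration, _, _ in segments)


def percent_b(t: float, segments=GRADIENT) -> float:
    """Nominal %B at time t (min), assuming linear change within a segment."""
    if t < 0 or t > total_minutes(segments):
        raise ValueError("time is outside the gradient programme")
    elapsed = 0.0
    for duration, start, end in segments:
        if t <= elapsed + duration:
            fraction = (t - elapsed) / duration
            return start + (end - start) * fraction
        elapsed += duration
    return segments[-1][2]
```

For example, `percent_b(40)` returns 17.5, the midpoint of the 5-30% B segment.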

Database searching of MS/MS spectra

For proteome analysis, raw data from the Triple TOF 5600 were searched with ProteinPilot software (version 4.0, AB SCIEX) against the UniProt Selaginella moellendorffii complete proteome database (downloaded in April 2011, 33195 sequences) using the following parameters: Sample Type (Identification), Cys Alkylation (Iodoacetamide), Digestion (Trypsin), Search Effort (Rapid). The false discovery rate (FDR) analysis was performed using the tool integrated in ProteinPilot. All data were filtered at 1% FDR.

For phosphoproteome analysis, raw MS/MS data (.wiff files) were converted to .mgf files and searched with the Mascot software (version 2.2, Matrix Science)[73] against the Selaginella proteome database using the following parameters: the fixed modification was set to carbamidomethylation of cysteine; variable modifications were set to oxidation of methionine and phosphorylation of serine, threonine and tyrosine; peptide and MS/MS fragment tolerances were set to 20 ppm and 0.2 Da, respectively; trypsin was selected as the digestion enzyme; and up to two missed cleavages were allowed. All .mgf files were merged into one file before database searching.
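The two mass tolerances quoted above are of different kinds: the peptide tolerance is relative (parts per million) whereas the fragment tolerance is absolute (Da). A minimal Python sketch of both checks follows. It illustrates the arithmetic only and is not Mascot's implementation; the function names are hypothetical.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def precursor_within_tolerance(observed_mass: float,
                               theoretical_mass: float,
                               tol_ppm: float = 20.0) -> bool:
    """Relative check, using the 20 ppm peptide tolerance as the default."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def fragment_within_tolerance(observed_mass: float,
                              theoretical_mass: float,
                              tol_da: float = 0.2) -> bool:
    """Absolute check, using the 0.2 Da fragment tolerance as the default."""
    return abs(observed_mass - theoretical_mass) <= tol_da
```

At a peptide mass of 1000 Da, 20 ppm corresponds to 0.02 Da, so an observed mass of 1000.01 Da would pass the peptide check while 1000.03 Da would not.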

Post-search data processing and phosphosite localization

The Mascot search result was first loaded into Scaffold (version 3.0, Proteome Software) for further analysis. In order to screen phosphopeptides with high confidence, “Min Protein” (protein identification probability), “Min # Peptide” (the number of unique peptides on which a protein identification is based) and “Min Peptide” (peptide identification probability) were adjusted to 20%, 1 and 95% respectively[74, 75]. Afterwards, the mzIdentML file generated by Scaffold was loaded into Scaffold PTM (version 1.1, Proteome Software) to determine the localization probability of phosphosites[76].

Gene ontology annotations

Gene ontology (GO) annotations of all identified proteins and phosphoproteins in Selaginella categorized into 3 classifications (Cellular Component, Molecular Function and Biological Process) were batch-retrieved from the Protein Information Resource (http://pir.georgetown.edu/pirwww/search/batch.shtml).

Analysis of phosphorylation site conservation

The Selaginella phosphopeptides were clustered with different plant phosphopeptides retrieved from the Plant Protein Phosphorylation Database (P3DB; http://www.p3db.org/) using the CD-HIT web server[54] (http://www.bioinformatics.org/cd-hit/). All phosphopeptide sequences were combined into a single FASTA file for data upload. Default parameters were adopted together with a 60% similarity cutoff and a bandwidth of 5. Conservation of phosphorylation sites among different plant species was then identified by manual inspection of the sequence alignment in each cluster.

Phosphorylation motif analysis

Sequences were centered on each phosphosite and extended to 15 amino acids (±7 residues). Phosphosites that could not be extended because of proximity to the N- or C-terminus were excluded from motif analysis. Only phosphosites with localization probability above 95% were used. General phosphorylation motif classes were assigned as defined previously[50]: P at +1 (Pro-directed); D/E at +1/+2 or +3 (Acidic), 5 or more D/E at +1 to +6 (Acidic); K/R at -3 (Basic), 2 or more K/R at -6 to -1 (Basic); otherwise (Others). Specific motifs were extracted from the data set using the Motif-X algorithm (http://motif-x.med.harvard.edu/motif-x.html)[53]. The Selaginella proteome database in FASTA format was retrieved (http://www.phytozome.com/) and uploaded as background. The significance threshold was set to 10^-6 and the minimum number of motif occurrences was 20.
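The window extraction and general motif class assignment described above can be illustrated with a minimal Python sketch. Two points are assumptions rather than statements from the text: the rules are applied in the order listed, with the first match deciding the class, and "D/E at +1/+2 or +3" is read as an acidic residue at any of positions +1, +2 or +3. The function names are hypothetical, and this sketch is not the Motif-X algorithm itself.

```python
def extract_window(sequence: str, site: int, flank: int = 7):
    """Return the 15-residue window centred on a 1-based phosphosite.

    Returns None when the window would extend past the N- or C-terminus,
    mirroring the exclusion of such sites from motif analysis.
    """
    centre = site - 1
    if centre - flank < 0 or centre + flank >= len(sequence):
        return None
    return sequence[centre - flank: centre + flank + 1]


def motif_class(window: str, flank: int = 7) -> str:
    """Assign a general motif class to a centred window (first match wins)."""
    if len(window) != 2 * flank + 1:
        raise ValueError("window must contain 2 * flank + 1 residues")

    def at(offset: int) -> str:
        # Residue at a position relative to the central phosphosite.
        return window[flank + offset]

    if at(1) == "P":
        return "Pro-directed"
    if any(at(k) in "DE" for k in (1, 2, 3)):
        return "Acidic"
    # Kept for fidelity to the listed rules; under the assumed reading,
    # any window meeting this rule already matched the rule above.
    if sum(at(k) in "DE" for k in range(1, 7)) >= 5:
        return "Acidic"
    if at(-3) in "KR":
        return "Basic"
    if sum(at(k) in "KR" for k in range(-6, 0)) >= 2:
        return "Basic"
    return "Others"
```

A window such as `AAAARAASAAAAAAA`, which carries Arg at position -3, is classed as Basic under these rules, matching the RXXS pattern.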