Introduction

Following translation at the ribosome, the N-terminus of a protein is subjected to a plethora of modifications among which are proteolytic processing and the addition of moieties such as acetyl, methyl, or other functional groups (Meinnel and Giglione 2008; Fortelny et al. 2015). In turn, the N-terminus enables subcellular targeting and determines protein half-life (Varshavsky 1996, 2019; Kunze and Berger 2015; Armenteros et al. 2019). Modifications of the N-terminus are introduced in a co- or a post-translational manner with co-translational acetylation and methionine-excision being among the most abundant modifications in eukaryotes (Ree et al. 2018; Giglione and Meinnel 2021). In plants, however, N-terminal acetylation also occurs in a post-translational manner on plastid stromal proteins after import and cleavage of their targeting peptide (Giglione and Meinnel 2021). In contrast to proteolytic trimming, amino acids can also be added to the apparent N-terminus of a protein in a ribosome-independent manner (Varshavsky 1996; Tasaki et al. 2012) as part of the N-degron pathway for targeted proteolysis. Various methods such as COFRADIC (Staes et al. 2011), TAILS (Kleifeld et al. 2010), or HUNTER (Demir et al. 2022) have been established and permit the characterization of proteases, high-throughput degradomics and profiling of N-terminal acetylation. In turn, N-terminomics data are available from public databases such as TopFIND (https://topfind.clip.msl.ubc.ca) for various organisms including human, mouse, yeast, and Arabidopsis.

In contrast, almost no N-terminomics data were available for the model plant Physcomitrella (Physcomitrium patens; Lueth and Reski 2023). This moss is a versatile model system for evo-devo studies (Horst et al. 2016), plant physiology (Decker et al. 2017; Wiedemann et al. 2018), and evolution of metabolic pathways (Renault et al. 2017; Knosp et al. 2024) due to its interesting evolutionary position at the early divergence of land plants (Rensing et al. 2008). It has further proven to be a valuable system for proteomic and proteogenomic research due to its easy and axenic culture conditions (Hohe et al. 2002) enabling highly reproducible and even GMP-compliant culture conditions (Sarnighausen et al. 2004; Heintz et al. 2006; Mueller et al. 2014; Hoernstein et al. 2016, 2018; Fesenko et al. 2019, 2021). Besides broad application in basic research, Physcomitrella is employed as a production platform for recombinant biopharmaceuticals in GMP-compliant bioreactors (Decker and Reski 2020; Ruiz-Molina et al. 2022; Tschongov et al. 2024). Furthermore, genomic and transcriptomic resources are well established and publicly available (Lang et al. 2018; Perroud et al. 2018; Fernandez-Pozo et al. 2020; Bi et al. 2024).

Here, we provide a snapshot of the N-terminome of the moss Physcomitrella with a focus on the cleavage of N-terminal targeting sequences, N-terminal acetylation, and N-terminal monomethylation. The data were compiled using 24 datasets from various experimental setups and subsequent N-terminal peptide enrichment using a modified TAILS approach. We reveal apparent N-terminal methylation not only of plastid and cytosolic proteins but also of mitochondrial proteins. Furthermore, we provide a list of confirmed targeting peptide cleavage sites along with a candidate list of proteins which are dually targeted to plastids and mitochondria as well as to mitochondria and the cytosol. With this, we provide a resource for basic research as it contains information about translation of splice variants as well as post-translational and post-transcriptional processing of proteins. Moreover, targeting of recombinant proteins to plastids of Nicotiana benthamiana (Maclean et al. 2007) or the extracellular space in Physcomitrella (Schaaf et al. 2005) enabled high yields of the desired recombinant product. Consequently, our present data also provide a comprehensive resource for further customized recombinant protein production and targeting in Physcomitrella.

Results and discussion

Overview of identified N-termini

The present data provide a qualitative overview of the N-terminome of the moss Physcomitrella using a compilation of 24 datasets from N-terminal peptide enrichments from different tissues, treatments, and different sample processing protocols. The datasets were obtained during method establishment for various purposes not related to the analysis performed in this study, and hence the data were only assessed qualitatively and will not allow any cross-sample comparison. A table providing details about the sample type, tissue employed, and other experimental parameters is available from Supplemental Table S1.

Enrichment of N-terminal peptides was performed as described in Hoernstein et al. (2018) with modifications. Free amino groups in the protein sample were blocked by reductive dimethylation according to Kleifeld et al. (2010) and depletion of internal peptides after proteolysis was performed according to McDonald and Beynon (2006). Mass spectrometry (MS) measurements were performed on an LTQ-Orbitrap Velos Pro (ThermoScientific, Waltham, MA, USA), and raw data were processed and searched with Mascot (Matrix Science, Chicago, IL, USA). All database search results were loaded in Scaffold5™ (V5.0.1, https://www.proteomesoftware.com) software and proteins were accepted with a ProteinProphet™ (Nesvizhskii et al. 2003) probability of at least 99% and a minimum of 1 identified peptide. Peptides were accepted at a PeptideProphet™ (Keller et al. 2002) probability of at least 95% and a Mascot ion score of a least 40 (Supplemental Table S2). Using these settings, from a total of 24 datasets, we identified 11,533 protein N-termini using 32,213 spectra corresponding to 4517 proteins (3,920 protein groups) with a decoy FDR (false discovery rate) of 0.4% at the protein level and 0.08% at the peptide level (Supplemental Tables S2, S3). Approximately 20% of the identified N-termini represented either the initiator methionine (start index 1, Fig. 1A) or the subsequent amino acid after cleavage of the initiator methionine (start index 2). For approximately 40%, a start index between 2 and 100 was identified, indicating proteolytic processing and cleavage of subcellular targeting sequences. A single experimentally determined N-terminus was observed in approximately 70% of all cases whereas for approximately 30% of the identified proteins, two or more N-termini were observed (Fig. 1B). This is in strong contrast to previous findings from proteins in Physcomitrella bioreactor supernatants (Hoernstein et al. 2018) where at least two distinct N-termini were observed for approximately 80% of all identified proteins. Further, we analyzed the presence of N-terminal modifications with a focus on N-terminal acetylation, monomethylation, and presence of pyro-glutamate (pyroGlu) at the N-terminus. The latter modification can occur spontaneously or via enzymatic catalysis on N-terminal glutamine residues (Schilling et al. 2008). Since pyro-glutamate formation can also occur following proteolysis during sample processing (Purwaha et al. 2014), only peptides where the preceding P1 amino acid did not match the specificity of the employed protease (e.g., no peptides with K or R as preceding amino acid in the case of trypsin digests) were considered here.

Fig. 1
figure 1

Overview of identified N-termini, N-terminal modifications, and identified cleavage sites of targeting peptides. A Frequency of identified N-terminal positions per identified protein accession. The start index represents the position number of the identified N-terminal amino acid in the corresponding protein model. B Frequency of the number of identified N-termini per protein. C Bar chart depicting the distribution of identified N-terminal modifications. Peptides bearing N-terminal pyro-glutamate (pyroGlu) were only counted if the preceding amino acid did not match the specificity of the applied protease (e.g., peptides identified with K|R in the P1 position were rejected in the case of trypsin digests)

Approximately 25% of all identified N-termini were acetylated, 71% had no modification and 4% were either methylated or had N-terminal pyroGlu (Fig. 1C). The actual level of N-terminal pyroGlu occurrence is likely higher, but due to specificity ambiguity with the experimentally employed proteases, this cannot be analyzed further. At the protein level, we found approximately 76% of the nuclear-encoded proteins having either the retained or the cleaved initiator methionine (1436 protein groups, Supplemental Table S3) to be N-terminally acetylated (1097 protein groups, Supplemental Table S3). This degree is slightly below the estimated degree of around 90% of N-terminal acetylated proteins in plants (Bienvenut et al. 2012; Linster and Wirtz 2018).

Post-import trimming of plastid proteins

Cleavable N-terminal sequences are required for subcellular and extracellular targeting of nuclear-encoded proteins. Their cleavage via specific proteases after translocation across a respective organellar membrane generates a new N-terminus that represents either the final N-terminus of the translocated protein or a new site for further proteolytic processing by organellar proteases. For Physcomitrella, a total of 8681 cleavable N-terminal targeting sequences are predicted (Supplemental Fig. S1) and here we compared our experimentally observed N-termini to these predictions allowing a tolerance window of ± 5 amino acids around a predicted targeting peptide cleavage site (Fig. 2A). In the following, a difference of 0 indicates agreement of an observed N-terminal amino acid with a predicted cleavage site (predicted P1 amino acid, Fig. S2A). Within this range, we confirm the predicted cleavage sites of 748 plastid targeting signals (cTP), of 57 thylakoid luminal targeting signals (luTP), of 154 mitochondrial presequences (mTP), and of 185 secretory signal peptides (SP) using our present N-terminomics data (Fig. S2B). This data is compiled in Supplemental Table S4.

Fig. 2
figure 2

Comparison of experimentally observed N-termini with predicted organellar targeting peptide cleavage sites. Depicted are frequencies of identified N-termini around a predicted targeting peptide cleavage site. A difference of 0 indicates an identified N-terminal amino acid which corresponds to the P1’ amino acid of a predicted cleavage site. Cleavages of plastid (A), thylakoid lumen (B), mitochondrial (C), and secretory (D) targeting signals were predicted with TargetP2.0. All data are available from Supplemental Table S4. (E) Bar chart depicting the distribution of plastid protein isoforms with confirmed cleavage of a plastid targeting peptide and their identified N-terminal modifications. Percentages are related to the total number of identified proteins with a cleaved N-terminal plastid targeting peptide (748) within a window of ± 5 amino acids around a predicted cleavage site. All data are available from Supplemental Table S4. Frequency of identified N-termini around a predicted plastid transit peptide cleavage site being either acetylated (F), unmodified (free, (G)), or monomethylated (H). Cum. [%]: cumulative percentage (red points)

Among the confirmed plastid proteins, we find approximately 42% to be N-terminally acetylated (317 protein isoforms, Supplemental Table S4). Apparently, N-termini identified around a plastid transit peptide cleavage site only matched the predicted cleavage site exactly for approximately 47% (Fig. 2A). This percentage is strikingly higher in all other cases with almost 70% for thylakoid luminal transit peptides (Fig. 2B) and around 65% for mitochondrial presequences and secretory signal peptides (Fig. 2C, D). Further, approximately 43% of the N-termini of plastid proteins within the chosen difference window deviate by one to five amino acids upstream of the predicted cleavage site with decreasing frequency. This effect is less apparent for luminal targeting sequences (approximately 33%), mitochondrial presequences (approximately 22%), and secretory signal peptides (approximately 32%). Consequently, the distribution of differences between predicted and observed plastid transit cleavage site (Fig. 2A) indicates a successive proteolytic post-processing pattern of plastid proteins after cleavage of their transit peptide. A similar scenario with multiple cleavage sites around predicted plastid transit peptide cleavage sites has also been observed in Arabidopsis (Bienvenut et al. 2012; Rowland et al. 2015).

N-terminal modifications of plastid and mitochondrial proteins

Among the proteins with an identified plastid transit peptide cleavage site, we found 42% protein isoforms with an acetylated N-terminus and almost 10% with a monomethylated N-terminus (Fig. 2E). Strikingly, plastid N-termini being acetylated and non-modified (free), both show this successive cleavage pattern whereas monomethylated N-termini do not show this pattern (Fig. 2F–H).

This raises the question whether this processing occurs on both, acetylated and free N-termini, or whether only free N-termini are processed and subsequently acetylated. One explanation would be that Nα-acetylation of plastid proteins is incomplete and affects only a fraction of each protein isoform. Effectively, many N-termini of plastid proteins were identified in this study in a dual state, being acetylated and free (e.g., Pp3c18_19140V3.1, Pp3c15_7750V3.4; Supplemental Table S4). In this case, Nα-acetylation would prevent N-terminal trimming, whereas the fraction with an unmodified N-terminus would be proteolytically processed to different levels and subsequently acetylated. This scenario may be supported by the fact that Nα-acetylated and free N-termini share a similar preference of N-terminal amino acids (Fig. 3), with alanine and serine being the most prominent ones. Apparently, the relative amino acid frequency of the N-terminally acetylated plastid proteins is strikingly similar to the relative frequency in Arabidopsis (Huesgen et al. 2013). Although the plastid protease inventory is under active investigation (Meinnel and Giglione 2022; van Wijk 2024), a specific protease for such N-terminal trimming has not yet been identified. Aminopeptidases identified in Arabidopsis were recently proposed to also confer trimming functions (Rowland et al. 2015; Meinnel and Giglione 2022), but their activity was investigated on released plastid transit peptides in conjunction with presequence proteases (Teixeira et al. 2017), and not on the protein N-terminus of the corresponding protein.

Fig. 3
figure 3

Sequence logos of identified N-termini with different modification states of plastid and mitochondrial proteins. “Dif” indicates the position difference upstream of a predicted plastid or mitochondrial transit peptide cleavage site. Transit peptide cleavage sites were predicted with TargetP2.0. The prediction data are available from Supplemental Table S4. “n” represents the total number of non-redundant sequences. Sequences were aligned at the identified N-terminal amino acid

On the other hand, a cleavage of an Nα-acetylated amino acid may also be possible, although until now no protease has clearly proven activity on acetylated N-termini of intact proteins. However, acylamino acid-releasing enzyme (AARE), a bifunctional serine protease (Tsunasawa et al. 1975; Fujino et al. 2000; Shimizu et al. 2003; Nakai et al. 2012; Hoernstein et al. 2023), has a proven activity on Nα-acetylated oligopeptides and activity on intact proteins is repeatedly considered (Tsunasawa et al. 1975; Arfin and Bradshaw 1988; Adibekian et al. 2011). Moreover, one Physcomitrella AARE isoform and the Arabidopsis AARE are localized not only to the cytoplasm but also to plastids and mitochondria (Hoernstein et al. 2023). This renders AARE an interesting new candidate for the processing of plastid proteins, especially since this protease shows a strong substrate preference for Ac-Ala (Yamauchi et al. 2003; Hoernstein et al. 2023), the most frequent Nα-acetylated amino acid of plastid proteins observed here (Fig. 3).

We also identified several mitochondrial proteins that are N-terminally acetylated around a predicted transit peptide cleavage site (Supplemental Tables S4, S5). Although Nα-acetylation of plastid proteins is well known, this modification has not yet been detected to a similar extent on mitochondrial proteins, and its apparent presence is not clear (Giglione and Meinnel 2021). The N-terminally acetylated amino acids were A, T, S, and in all cases were preceded by a methionine, which may also indicate alternative translation initiation. Consequently, we did not consider this to be a previously undiscovered modification of mitochondrial proteins, but rather a co-translational modification of a shorter, e.g., cytoplasmic isoform, resulting from alternative translation initiation or alternative splicing. In Physcomitrella, both mechanisms are known to target protein isoforms to distinct subcellular localizations (Kiessling et al. 2004; Hoernstein et al. 2023). Apart from these two scenarios, dual targeting of proteins to plastids and mitochondria via ambiguous targeting signals is also considerable. In this case, the N-terminally acetylated protein would represent the plastid-localized variant. Hence, we investigated those N-terminally acetylated and potentially mitochondria-localized proteins with a focus on alternative translation initiation sites or splice variants that would give rise to shorter, possibly cytoplasmic, protein isoforms (Supplemental Table S5). Subcellular targeting predictions were performed with Localizer, and the presence of ambiguous targeting signals was predicted with ATP2. In two cases (Pp3c13_17110V3.1, Pp3c21_2600V3.1), potential dual targeting to plastids and mitochondria was predicted by Localizer and ATP2, but alternative translation initiation from the downstream methionine was also likely (Supplemental Table S5). In most other cases, we found either a potential alternative translation initiation site (Pp3c15_21480V3.1, Pp3c4_3210V3.1) or alternative splice variants (Pp3c22_8300V3.1, Pp3c7_24050V3.2) that facilitate translation of a shorter open reading frame. For one protein (Pp3c9_14150V3.1, Pp3c9_14150V3.2), the situation remains unclear. Nevertheless, the present data indicate that the observed N-terminally acetylated proteins are localized in plastids or the cytoplasm rather than in mitochondria.

Besides acetylation, we also observed N-terminal methylation on plastid proteins and on mitochondrial proteins. Strikingly, monomethylation on plastid proteins was also identified predominantly not only on N-terminal alanine and serine but also on N-terminal methionine (Fig. 3). The apparent absence of a successive cleavage pattern similar to that observed for free or acetylated N-termini may indicate a stabilizing effect of monomethylation on the modified protein. This modification is found in prokaryotes and eukaryotes (Stock et al. 1987) but is poorly investigated in plants. It has been proven for the small subunit of Rubisco (RbcS) in pea, spinach, barley, and corn (Grimm et al. 1997). Accordingly, we found the N-terminal methionine of RbcS to be monomethylated at its N-terminal methionine (after transit peptide removal, Pp3c12_19890V3.4, Supplemental Fig. S3) in Physcomitrella, suggesting that this modification of RbcS is evolutionary conserved.

Apart from those plastid proteins, we also identify 49 mitochondrial proteins being monomethylated at their N-terminus, matching the predicted presequence cleavage site (Supplemental Table S4), including cytochrome C subunit 5B (COX5B, Pp3c19_11870V3.1, Fig. S4). A strong overrepresentation of serine as N-terminally monomethylated amino acid was observed (Fig. 3), whereas alanine and serine were equally frequent on non-modified N-termini of mitochondrial proteins. This specificity of methylated proteins in plastids and mitochondria resembles only partially the specificity of human NTM1A (Alpha N-terminal protein methyltransferase 1A; UNIPROT: Q9BV86) (Schaner Tooley et al. 2010; Wu et al. 2015) which methylates N-terminal alanine and serine when followed by a proline and a lysine. In our data, proline and lysine were not frequently observed as subsequent amino acids (Fig. 3). Intriguingly, the observed amino acid frequency of methylated plastid and mitochondrial proteins rather resembles the situation in yeast (Chen et al. 2021). Despite RbcS, N-terminal methylation of plant proteins was reported only on cytosolic and plastid ribosomal subunits and histones (Carroll et al. 2008; Webb et al. 2010). Again, the N-terminal amino acid sequences (Webb et al. 2010) share almost no homology with the methylated N-termini observed in our data. Nevertheless, we identified the cytosolic ribosomal subunit RPL19 to be methylated at its mature N-terminus (Pp3c18_14440V3.1, Fig. S5), but also other likely cytosolic proteins (Supplemental Table S3).

Finally, we also identified several proteins that were methylated at their N-terminus after cleavage of a predicted secretory signal peptide (Supplemental Table S4). Most seemed to be false positive identifications due to isotope peak errors and were not considered further.

Knowledge of N-terminal methylation of proteins, especially in plants, is scarce, and until now the methylation of RbcS was considered an exception (Grimm et al. 1997; Petkowski et al. 2013). In contrast, our data reveal that this modification affects several plastid and mitochondrial proteins with similar specificity of the methylating enzyme. In humans, two N-terminal methyltransferases are known so far, NTM1 and NTM2 (Schaner Tooley et al. 2010; Petkowski et al. 2013). To investigate whether homologues in Physcomitrella exist, we used the sequence of human NTM1 (UNIPROT: Q9BV86) as a query for a BlastP search (Altschul et al. 1997) against all Physcomitrella V3.3 protein models (Lang et al. 2018) using the Phytozome database (https://phytozome-next.jgi.doe.gov) (Goodstein et al. 2012) and identified a single protein (Pp3c22_8670V3.1; identity 36%, alignment length 74 amino acids) sharing the same protein family annotations as human NTM1A (InterPro: Alpha-N-methyltransferase NTM1 IPR008576; S-adenosyl-L-methionine-dependent methyltransferase IPR029063). In a reciprocal BlastP search using Pp3c22_8670V3.1 (hereafter referred to as PpNTM1) as a query, human NTM1 appeared as best Blast hit, confirming the orthology. A full InterPro search against all Physcomitrella V3.3 protein sequences (Lang et al. 2018) did not reveal any further hits with this protein family annotation. Consistent with this, PpNTM1 was also the only Blast hit when using the sequence of NTM2 (UNIPROT: Q5VVY1), a human homologue of NTM1, as a Blast query.

In Physcomitrella, PpNTM1 is expressed in all major tissues at moderate levels (Supplemental Fig. S6). We also found only a single Arabidopsis homologue (AT5G44450.1) which is predicted to localize to plastids by both predictors. Interestingly, PpNTM1 is predicted via TargetP2.0 to localize to mitochondria, whereas plastid localization is predicted by Localizer (Supplemental Table S6). Moreover, a potential alternative translation initiation site might be at M56 (Kozak Similarity Score ≥ 0.7 and < 0.8; Gleason et al. 2022a, b) which would not interfere with the predicted domain structure and enable cytosolic localization. We further investigated the conservation of residues with known catalytic function in the human isoform (Dong et al. 2015; Wu et al. 2015) in plant homologues with a focus on bryophytes, and mosses in particular, including two other species from the same family as Physcomitrella. Surprisingly, only one of seven known catalytic sites is conserved in Physcomitrella (Supplemental Figure S7A), while all but one are conserved in Arabidopsis. Notably, Arabidopsis NMT1 has a three amino acid long “EPV” motif where the human isoform has the motif “DIT” (Supplemental Fig. S7A). In turn, the EPV motif appears to be conserved in all other plant species analyzed here, except the chlorophytic alga Volvox carteri which features an S instead of V, and Physcomitrella which deviates completely, even from the orthologues of its closest relatives (Supplemental Fig. S7A). Thus, we performed phylogenetic reconstruction of the aligned protein sequences by calculating a maximum likelihood tree (Supplemental Fig. S7B). The phylogeny further indicates that the accumulation of changes in Physcomitrella NTM1 is species-specific. Finally, we checked the structure predictions from AlphaFold (Jumper et al. 2021; Varadi et al. 2024). While human and Arabidopsis NTM1 have obvious structural similarities (Supplemental Fig. S7C), the predicted structure of Physcomitrella NMT1 is different but of poor prediction quality (Supplemental Figure S7C). Nevertheless, the search for similar structures of PpNTM1 with Foldseek (van Kempen et al. 2023) again revealed sequences of NTM1 isoforms from other species such as rice (UNIPROT: Q10CT5).

Therefore, it is not yet entirely clear whether PpNTM1 is a methyltransferase responsible for the monomethylation observed here, and whether, at least in Physcomitrella, it could be targeted to both plastids and mitochondria. The present data do not provide evidence for predicted transit peptide cleavages or alternative translation initiation for this protein. Hence, further research is required to investigate the molecular function and localization of PpNTM1. Two scenarios are currently conceivable: (i) The deviant Physcomitrella NMT1 is responsible for the monomethylation of N-termini observed here. A targeted gene ablation based on highly efficient homologous recombination (Hohe et al. 2004) would result in knockout mutants with no or drastically reduced monomethylated N-termini. (ii) In addition to NMT1, at least one other enzyme is responsible for the observed monomethylation of N-termini in Physcomitrella and possibly also in other plants.

Conclusion

In the present study, we used a compilation of 24 proteomic datasets obtained from different experiments to gain first insights into the N-terminome of the model plant Physcomitrella. We found that the percentage of N-terminal acetylation of cytosolic proteins appears slightly lower than the estimated percentage in Arabidopsis. Our data allow the confirmation of hundreds of predicted targeting peptide cleavage sites localizing proteins to plastids. These data can now be used to optimize computational targeting predictors, for customized protein fusions and their targeted localization in biotechnology, and provide new insights into the potential dual targeting of proteins. Furthermore, we show that N-terminal monomethylation is a previously unknown modification of mitochondrial proteins. The function and effects of this modification need to be further analyzed, but we propose PpNTM1 as a candidate for protein methylation in plastids, mitochondria, and the cytosol.

Methods

Cell culture

For all experiments, Physcomitrella WT (new species name: Physcomitrium patens (Hedw.) Mitt.; Medina et al. 2019) ecotype “Gransden 2004” available from the international Moss Stock Center (IMSC, www.moss-stock-center.org, #40001) was used. Cultivation was performed using Knop medium (Reski and Abel 1985) containing 250 mg/l KH2PO4, 250 mg/l KCl, 250 mg/l MgSO4 × 7 H2O, 1,000 mg/l Ca(NO3)2 × 4 H2O and 12.5 mg/l FeSO4 × 7 H2O (pH 5.8). According to Egener et al. (2002) and Schween et al. (2003) 10 mL of a microelement solution (309 mg/l H3BO3, 845 mg/l MnSO4 × 1 H2O, 431 mg/l ZnSO4 × 7 H2O, 41.5 mg/l KI, 12.1 mg/l Na2MoO4 × 2 H2O, 1.25 mg/l CoSO4 × 5 H2O, 1.46 Co(NO3)2 × 6 H2O) was added per liter of medium. Gametophores were either cultivated on plates containing 12 g agar per liter liquid medium or on hydroponic ring cultures as described in Erxleben et al. (2012) and Hoernstein et al. (2016). Hydroponic gametophore cultures were started from protonema culture that was dispersed weekly using an Ultra Turrax (IKA, Staufen, Germany) at 18,000 rpm for 90 s. All cultivation was done at 25 °C in a day/night cycle of 16 h light with a light intensity of 70 µmol/sm2 and 8 h dark.

Treatments

Treatment with the proteasome inhibitor epoxomicin (Peptide Institute Inc., Osaka, Japan) was done using gametophores cultivated on agar plates. Gametophores were harvested and incubated in 10 mL Knop medium containing 20 µM epoxomicin for 24 h (enrichment II, Supplemental Table S1). Red-light treatment (enrichment III, Supplemental Table S1) was done using hydroponic gametophore cultures. Cultures were incubated for 3 days in a red-light chamber at 650 nm. In addition, 50 µM of the proteasome inhibitor MG132 (Selleckchem, Houston, TX, USA) were applied in the culture medium at the beginning of the treatment. Dark treatments (enrichment V and VI, Supplemental Table S1) were done by wrapping the entire boxes of hydroponic gametophore cultures in aluminum foil for the indicated time and wrapped boxes were cultivated further at the same conditions as before. Proteasome inhibition of gametophores during dark treatment (enrichment VI, Supplemental Table S1) was done by submerging a hydroponic ring culture entirely in Knop medium containing 100 µM MG132. The box was wrapped in aluminum foil and incubated for 24 h.

Enrichment of nuclei from gametophores

Eighteen g fresh weight (FW) gametophores were harvested from hydroponic culture and chopped in buffer I containing 1 M 2-methyl-2,4-pentandiol, 10 mM HEPES pH 7.5, 10 mM KCl 10 mM DTT, 0.1% PVP40, 0.1% PPI (P9599, Sigma-Aldrich, St. Louis, MO, USA) according to Nelson et al. (1994) using a custom 4 razorblade chopping device. The homogenate was successively filtered through a 40-µm and a 20-µm sieve and the flow-through was centrifuged for 30 min at 300×g at 2 °C. The supernatant was discarded, and the pellets were carefully dissolved in buffer II containing 110 mM KCl, 15 mM HEPES, pH 7.5, 5 mM DTT and 0.1% PPI. The enriched nuclei were further purified using three-step Percoll gradients (100%/60%/30%, 17-0891-01, GE Healthcare, Solingen, Germany) modified after Marienfeld et al. (1989). The Percoll gradients were centrifuged at 200×g at 2 °C for 30 min. The interface between 100 and 60% was recovered as well as the pellet at the top of the gradient attached to the tube wall. Both fractions were strongly enriched in nuclei, and thus pooled for further experiments. The samples were combined and washed with buffer II and centrifuged again for 10 min at 300×g at 2 °C. The pellet containing enriched nuclei was stored at − 20 °C until further use.

Sequential protein extraction from nuclei

Pellets containing enriched nuclei were dissolved in 400 µL 50 mM Tris–HCl, pH7.6, 1% PPI (P9599, Sigma-Aldrich) and sonicated (Sonopuls HD2070, Bandelin, Berlin, Germany) three times for 20 s with an amplitude between 60 and 90%. Fifty µL DNAse buffer and 50 µL DNase I (EN0521, Thermo Scientific, Waltham, USA) were added and the samples were incubated for 1 h at 37 °C. After centrifugation at 20,000×g for 20 min at 4 °C, the supernatant (Tris extract) was recovered and directly precipitated with 5 vol ice-cold acetone containing 0.2% DTT overnight at − 20 °C. The remaining pellet was dissolved in 400 µL 50 mM Tris–HCl, pH 7.6, 2% Triton X-100, 1% PPI, and again sonicated three times as before. Again, the sample was centrifuged, and the supernatant (Triton extract) was also acetone-precipitated overnight. The remaining pellet was dissolved in 50 mM Tris–HCl, pH 7.6, 4% SDS, 1% PPI, 50 mM DTT and incubated at 95 °C for 10 min. The sample was centrifuged, and the supernatant was acetone-precipitated overnight. All acetone precipitations were centrifuged at 20,000×g at 0 °C for 15 min. The supernatant was discarded, and the remaining protein pellet was washed for 1 h with 1 vol ice-cold acetone without DTT. The centrifugation step was repeated, and the supernatant was discarded afterward. The remaining protein pellets were air dried and stored at −20 °C for further experiments.

Sequential protein extraction from gametophores

One to two g FW of gametophores were ground in liquid nitrogen for 10–15 min. The fine powder was dissolved in Tris buffer containing 40 mM Tris–HCl, pH 7.6, 0.5% PVPP, and 1% PPI. The homogenate was sonicated for 15 min and afterward centrifuged at 20,000 × g at 4 °C for 30 min. The supernatant (Tris extract) was recovered. The remaining pellet containing cell debris was dissolved in 40 mM Tris–HCl pH 7.6, 2% Triton X-100, 1% PPI and again sonicated for 15 min. Again, centrifugation was performed at 20,000×g at 4 °C for 30 min and the supernatant (Triton extract) was recovered. Protein concentrations of the extracts were directly determined via the Bradford assay (Bradford 1976) and aliquots corresponding to 100 µg protein were precipitated with acetone containing DTT as described before.

Enrichment of N-terminal peptides

The dimethylation reaction was carried out according to Kleifeld et al. (2010) with some modifications. Protein pellets were dissolved in 100 mM HEPES–NaOH pH 7.5, 0.2% SDS. Reduction of cysteine residues was carried out using Reducing Agent (NP0009, Life Technologies™, Carlsbad, USA) 1:10 at 95 °C for 10 min or Bond-Breaker® (77,720, Thermo Scientific) 1:100 at 28 °C for 30 min. Alkylation was performed at a final concentration of 100 mM iodoacetamide for 20 min at RT. The dimethylation reaction was carried out by adding 2 µL of a 4% formaldehyde solution (Formaldehyde 13C, d2 solution, 596,388, Sigma-Aldrich or Formaldehyde-D2, DLM-805-PK, Cambridge Isotope Laboratories Inc.) and 2 µL of a 500 mM NaCNBH3 solution per 100 µL sample at 37 °C for 4 h. The same volumes of formaldehyde and NaCNBH3 were added again to the sample, and the reaction was carried out overnight at 37 °C. The dimethylation reaction was stopped by adding 2 µL of a 4% NH4OH solution per 100 µL sample for 1 h at 37 °C. Afterward, the samples were precipitated as described before using acetone without DTT for at least 3 h at − 20 °C. The final enrichment was modified according to McDonald and Beynon (2006).

SDS-based enrichment

The dried protein pellets were dissolved in binding buffer according to McDonald and Beynon (2006) containing 20 mM NaH2PO4, 150 mM NaCl pH 7.5 with 0.2% SDS and in solution digest using either trypsin (V5280, Promega, Madison, USA), GluC (90,054, Thermo Scientific) or chymotrypsin (V1062, Promega) was performed at an enzyme-to-substrate ratio of 1:25 for 4 h at 37 °C (trypsin, GluC) or 25 °C (chymotrypsin). Then the ratio was increased to 1:20 and the reaction was carried out overnight. Enrichment of N-terminal labeled peptides was carried out using 200 µL NHS-sepharose slurry (17-0906-01, GE Healthcare, Solingen, Germany) per 100 µg protein. The slurry was centrifuged for 30 s at 200×g. The supernatant was discarded and 400 µL ice-cold 1 mM HCl was added. The slurry was centrifuged again, and the supernatant was discarded. Afterward, the sepharose was washed with 1 mL binding buffer without SDS. The samples were applied to the prepared sepharose and incubated for 4 h at RT. The sepharose was again centrifuged, and the supernatant was transferred to a new tube containing freshly prepared sepharose. The used sepharose was washed with 20 µL binding buffer and the supernatant was also added to the freshly prepared sepharose. The enrichment reaction was carried out overnight at 4–8 °C. The enriched peptides were desalted using 200 µl C18 StageTips (SP301, Thermo Scientific) that were supplemented with an additional layer of Empore™ SPE Disk C18 material (66883-U, Sigma-Aldrich). The tips were washed prior to use with 100 µl 0.1% TFA and subsequently with 100 µl 80% ACN, 0.1% TFA. The tips were again equilibrated with 100 µl 0.1% TFA and the samples were loaded afterward. The remaining sepharose was washed with 50 µl binding buffer and the supernatant was also transferred to the tip. The tips were washed with 100 µl binding buffer and the retained peptides were eluted with 300 µl 80% ACN, 0.1% TFA. The eluate was vacuum dried and the samples were stored at −20 °C until further analysis.

RapiGest-based enrichment

The dried protein pellets were dissolved in 50 mM HEPES–KOH, 0.1% RapiGest surfactant (RPG, 18,600,186, Waters, Milford Massachusetts, USA). Proteolytic digest was performed as described before. After digestion, the RPG surfactant was cleaved by acidifying the sample to pH 2 using TFA as recommended by the manufacturer. The cleavage was performed at 37 °C for 45 min. Insoluble RPG remnants were removed by centrifugation at 13,000 rpm for 10 min at RT. The peptide-containing supernatant was subjected to solid-phase extraction using SampliQ C18 cartridges (1 ml, 100 mg, 5982–1111, Agilent, Santa Clara, USA). Prior to the extraction, the cartridge was washed successively with 1 ml 0.1% TFA,0.1% TFA in 80% ACN, 0.1% TFA. Then the peptide solution was applied and washed with 1 ml 0.1% TFA. The peptides were eluted in 600 µl 0.1% TFA in 80% ACN and vacuum dried. The dried peptides were stored at − 20 °C until further use. For enrichment of the N-terminal peptides, the dried and purified peptides were dissolved in binding buffer as described before without SDS, and the enrichment using NHS-sepharose was performed accordingly. Finally, the enriched N-terminal peptides were desalted again using SampliQ C18 cartridges as described before. The dried peptides were stored at − 20 °C until mass spectrometric analysis.

Mass spectrometry

NanoLC–MS/MS analyses were performed on an LTQ-Orbitrap Velos Pro (ThermoScientific) equipped with an EASY-Spray Ion Source and coupled to an EASY-nLC 1000 (Thermo Scientific). Peptides were loaded on a trapping column (2 cm × 75 µm ID. PepMap C18 3 µm particles, 100 Å pore size, Dionex, Thermo Scientific) and separated either on a 25 cm EASY-Spray column (25 cm × 75 µm ID, PepMap C18 2 µm particles,100 Å pore size) with a 30 min linear gradient from 3 to 30% ACN (V5280, Promochem) and 0.1% FA (56302, Thermo Scientific), or on a 50 cm EASY-Spray column (50 cm × 75 µm ID, PepMap C18 2 µm particles, 100 Å pore size, Dionex, Thermo Scientific) with a 360 min linear gradient from 3 to 30% ACN and 0.1% FA in the case of in solution digested proteins such as the enriched N-terminal peptides. MS scans were acquired in the Orbitrap analyzer with a resolution of 30,000 at m/z 400, MS/MS scans were acquired in the Orbitrap analyzer with a resolution of 7500 at m/z 400 using HCD fragmentation with 30% normalized collision energy. A TOP5 or TOP10 data-dependent MS/MS method was used. Dynamic exclusion was applied with a repeat count of 1 and an exclusion duration of 30 s or 2 min in the case of long gradients. Singly charged precursors were excluded from selection. Minimum signal threshold for precursor selection was set to 50,000. Predictive AGC was used with a target value of 106 for MS scans and 5 × 104 for MS/MS scans. Lock mass option was applied for internal calibration using background ions from protonated decamethylcyclopentasiloxane (m/z 371.10124). Electron-transfer dissociation (ETD) fragmentation was performed with 35% normalized collision energy. A TOP5 data-dependent MS/MS method was used. Dynamic exclusion was applied with a repeat count of 1 and an exclusion duration of 30 s. Singly charged precursors were excluded from selection. Minimum signal threshold for precursor selection was set to 75,000. Predictive AGC was used with AGC target a value of 106 for MS scans and 5 × 104 for MS/MS scans. ETD activation time was set to 250 ms for doubly charged precursors, 166 ms for triply charged precursors and 125 ms for quadruply charged precursors, and the AGC target was set to 200,000 for fluoranthene. Lock mass option was applied for internal calibration in all runs using background ions from iron(III) citrate (m/z 263.956311).

Raw data processing and database search

Raw data were processed with Mascot Distiller (V2.8.3.0, https://www.matrixscience.com/) and database searches were performed using Mascot Server (V2.7.0, https://www.matrixscience.com) against a database containing all V3 Physcomitrella protein models (Lang et al. 2018) as well as their reversed sequences as decoys. In parallel, a search was performed against a database containing the sequences of known contaminants, such as keratin (269 entries, available on request). For all samples, semi-specific protease specificities were chosen and in the case of tryptic digests, the specificity was set to semi-ArgC. Variable modifications were Gln > pyro Glu (N term Q) − 17.026549 Da, oxidation (M) + 15.994915 Da, acetyl (N-term) + 42.010565 Da, 13C,D2 dimethyl (N-term) + 34.063117 Da, hybrid-methylation (N-term) + 31.047208 (13CD2CH2) Da and deamidation of asparagine (N) + 0.984016 Da. Fixed modifications were carbamidomethyl (C) + 57.021464 Da and 13C,d2 dimethyl (K) + 34.063117 Da. In the case of samples dimethylated with D2 formaldehyde, the dimethylation (K, N-term) mass shift was + 32.056407 Da and hybrid methylation (N-term) was set to + 30.043854 Da (C2D2H2). A precursor mass tolerance of ± 8 ppm and a fragment mass tolerance of ± 0.02 Da were specified. Search results were loaded in Scaffold5™ software (V5.0.1, https://www.proteomesoftware.com/) using the high mass accuracy scoring and independent sample grouping method.

Computational analysis

The presence of cleavable N-terminal targeting signals (plastid, mitochondrion, secretome) was performed with TargetP2.0 (Armenteros et al. 2019) and in selected cases with Localizer (Sperschneider et al. 2017). Ambiguous targeting to plastids and mitochondria in Physcomitrella was predicted with ATP2 (Fuss et al. 2013). Potential alternative translation initiation was predicted with TIS (https://www.tispredictor.com/) (Gleason et al. 2022a, b). All plots and tables were created using custom PERL scripts and R (R Core Team 2024).

Multiple sequence alignment and phylogenetic reconstruction

The amino acid sequence of human NTM1 (UNIPROT: Q9BV86) was used as the query in BLASTP-like searches with DIAMOND (Buchfink et al. 2021) in “ultra-sensitive” mode against the proteomes of Anthoceros angustus (Zhang et al. 2020), Amborella trichopoda (Amborella Genome Project et al. 2013), Arabidopsis thaliana (Cheng et al. 2017), Calohypnum plumiforme (Mao et al. 2020), Ceratodon purpureus (Carey et al. 2021), Funaria hygrometrica (Kirbis et al. 2020), Marchantia polymorpha (Bowman et al. 2017), Oryza sativa (Ouyang et al. 2007), Physcomitrella (Lang et al. 2018), Physcomitrellopsis africana (Vuruputoor et al. 2024), Selaginella moellendorffii (Banks et al. 2011), Sphagnum fallax (Healey et al. 2023), Sphagnum magellanicum (Healey et al. 2023), Takakia lepidozioides (Hu et al. 2023), and Volvox carteri (Prochnik et al. 2010). Best hits were validated by reciprocal BLAST and aligned with MAFFT (Katoh and Standley 2013) in “localpair” mode with a maximum of 1000 iterations. Phylogenetic reconstruction was performed using RAxML-NG (Kozlov et al. 2019) using the “JTT-DCMUT + G4” model and 1000 bootstrap replicates, rooted at the split between human and plants and visualized using R (R Core Team 2024) and the ggtree package (Yu et al. 2017).