Targeting of proteins to the cell wall of the diatom Thalassiosira pseudonana

Diatoms are unicellular phototrophic organisms with huge ecological impact. Characteristic for these organisms is their peculiar cell wall, which is composed of inorganic and organic components. Cell wall formation is a highly complex and orchestrated process that has been studied intensively in recent years, including at the molecular level. Here, we review the cell wall proteins of diatoms, with a focus on the species Thalassiosira pseudonana. We report on the expression patterns of these proteins in synchronized cultures, as well as their modifications and intracellular targeting.


Background
Diatoms are eukaryotic, unicellular, photosynthetic organisms which play fundamental roles in the global ecology and biogeochemical cycling [1][2][3][4]. One of the diatoms' most remarkable characteristics is their cell wall, the frustule. It is made of amorphous silica (SiO2) and characterizes each single species thanks to its typical micro- and nano-scaled patterns [5].
Given its peculiar structure, biologists like to describe the frustule by analogy to a petri dish (see Fig. 1): it is composed of two half shells of different size named thecae, the larger one (the epitheca) overlapping a slightly smaller one (the hypotheca). Each theca is further subdivided into a valve (the lid of the dish) and a cingulum (the side of the dish); therefore, the epitheca is composed of the epivalve and the epicingulum, whereas the hypotheca is composed of the hypovalve and the hypocingulum. The two valves face each other at opposite sides of the frustule and are joined together by the epi- and hypocingulum. The two cingula together form the girdle, consisting of a series of circular, overlapping structures called girdle bands [6][7][8]. The girdle band proximal to the valve is called the valvocopula [6,8], whereas the most distal one, which covers the region where the epitheca overlaps the hypotheca, is called the pleural band [9].
Fig. 1 The Thalassiosira pseudonana cell and its frustule (panels A, B, C). The cell wall, especially the region of the valve, is decorated with silica ribs and pores, and some protruding structures called fultoportulae.

During vegetative growth diatom cells divide mitotically. Directly after cytokinesis the protoplasts of the newly formed daughter cells are still retained inside the mother cell's frustule and, before being able to separate, each cell must synthesize a new theca (Fig. 2). Because of this constraint, each daughter cell inherits from the parental cell only one theca and synthesizes the other one de novo [5]; therefore the inherited parental theca becomes the epitheca, while the new one becomes the hypotheca (Fig. 2). Afterwards, the two daughter cells separate from each other and each synthesizes new girdle bands, thereby increasing the valve-to-valve distance and its cell volume [10]. For most diatoms, this peculiar pattern of division leads to a stepwise reduction in the average cell size of the population as mitotic divisions proceed [5]. However, some extensively studied species (Thalassiosira pseudonana, Phaeodactylum tricornutum and Cylindrotheca fusiformis) show no size reduction when grown in laboratory conditions, but the mechanism that they use to avoid shrinking is still unknown [10].
The cell wall of diatoms consists of biosilica, an organic-inorganic hybrid material, which is generally composed of nano- and micro-patterns of pores and ribs, organized in a hierarchical manner (e.g. [11,12]). Considerable effort has gone into characterizing the cell wall-associated organic fraction at the molecular level, particularly for the species Thalassiosira pseudonana. This diatom was the first to have its genome sequenced [13] and became the model organism for studying biosilica composition and generation [14][15][16].
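The stepwise size reduction described above can be illustrated with a minimal sketch. It assumes an idealized population in which every cell divides synchronously, each division shifts exactly one daughter down by one fixed size step, and no size restoration occurs; the function names and the notion of a numbered "size class" are hypothetical and introduced only for this illustration.

```python
from collections import Counter


def divide_population(size_classes: Counter) -> Counter:
    """Apply one round of synchronous division to a population.

    Keys are size classes (0 = largest cell, k = k reduction steps smaller),
    values are cell counts. A cell in class k yields one daughter that keeps
    class k (it inherits the parental epitheca) and one daughter in class
    k + 1 (it inherits the smaller parental hypotheca).
    """
    next_generation = Counter()
    for size_class, count in size_classes.items():
        next_generation[size_class] += count
        next_generation[size_class + 1] += count
    return next_generation


def mean_size_class(size_classes: Counter) -> float:
    """Average number of reduction steps per cell in the population."""
    total_cells = sum(size_classes.values())
    weighted = sum(k * n for k, n in size_classes.items())
    return weighted / total_cells


# Start from a single cell of maximal size and follow four divisions.
population = Counter({0: 1})
for generation in range(1, 5):
    population = divide_population(population)
    print(generation, dict(sorted(population.items())), mean_size_class(population))
```

Under these assumptions the size classes follow a binomial distribution and the mean class index grows by half a step per division round, which is the population-level shrinking that T. pseudonana avoids by a still unknown mechanism.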
When the biosilica of diatoms is treated with a mildly acidic solution of ammonium fluoride, a soluble and an insoluble fraction can be separated, each containing different organic molecules [11,14,17,18]. In the soluble fraction obtained by the extraction of T. pseudonana's cell wall, long-chain polyamines (LCPA) are present [19] as well as highly post-translationally modified proteins such as silaffins or silacidins [14,18,20]; the insoluble fraction is composed of polysaccharides and proteins, such as cingulins [11,17,21]. The insoluble polysaccharides include chitin, which is arranged in fibers extending from the theca through specialized pores, called fultoportulae, present within the silica (e.g. [21]) (see Fig. 1); furthermore, a chitin-based network-like scaffold (resembling the shape and dimensions of the biosilica) was extracted from T. pseudonana's cell wall [22]. The biosynthesis of the cell wall of diatoms is the focus of many scientific projects, and several in vitro assays have contributed substantially to our understanding of biomineral generation (see for example: [23,24]). These studies have highlighted the importance of a specialized compartment for biosilica synthesis, the silica deposition vesicle [25,26].

Fig. 2 … [45]. The steps are: (1) the mother cell starts a new cell cycle. (2) Following DNA replication (S phase) the cell enters the G2+M phase; after cytokinesis (end of M phase) the two daughter cells' protoplasts have formed but are still retained inside the parental frustule. (3) In each protoplast, a valve-SDV is formed and the synthesis of the building blocks for valve generation starts within it. (4) When the process is completed, the content of the SDVs is exocytosed. (5) The daughter cells are not encased in the parental frustule any longer and they separate; cell A inherits the parental epitheca and it is therefore larger than cell B, which instead inherits the hypotheca. (6a, b) Analogously to what happens for valve synthesis, the building blocks for girdle band formation are synthesized inside the girdle band-SDV. (7) When the process is completed, the content of the girdle band-SDV is exocytosed; (8) by stepwise synthesis of new girdle bands (only the formation of the first girdle band is depicted), the cells increase the girdle length and their cell volume. By a yet unknown mechanism, the smaller cell restores the original size, and the average size of the population does not decrease with the mitotic divisions. When the cells have reached the maximum number of girdle bands they can enter a new round of the cell cycle. For simplicity, no organelle is shown except for the plasma membrane, the nucleus (in 1 and 2) and the SDV. The newly synthesized biosilica is in cyan; the parental frustule is in grey; the cytoplasm membrane, nuclear membrane and the SDV membrane are depicted in dark blue. The relative lengths of the cell cycle phases are shown in the center.

Review
Discover Materials (2021) 1:5 | https://doi.org/10.1007/s43939-021-00005-z

The silica deposition vesicle
The silica deposition vesicle (SDV) is the organelle where the synthesis of the new components of the diatom frustule takes place. The SDV is a cisternal compartment that underlies the cytoplasmic membrane when fully expanded and is surrounded by its own membrane (the silicalemma). The building blocks necessary for cell wall formation (i.e. valves and girdle bands) are first generated within the SDV, and then exocytosed outside of the plasma membrane [27]. According to the components of the frustule that are synthesized inside the vesicle, two different SDVs exist (valve-SDVs and girdle band-SDVs), with their specific positioning and orientation within the cell being presumably ensured by the cytoskeleton (summarized in [23]).
In T. pseudonana, several silicalemma proteins have been identified: the silicanin Sin1 [28] and the silicalemma-associated proteins SAP1 and SAP3 [29]; their molecular characterization and in vivo localization confirmed them to be membrane proteins with a longer, luminal, N-terminal domain separated from a short, cytosolic, C-terminal domain by a single trans-membrane domain. Furthermore, knockdown experiments with SAP1 and SAP3 [29] as well as knockout experiments with Sin1 [30] highlighted a prominent role of these proteins in silica morphogenesis and, in the case of Sin1 only, also an influence on the strength of the silica structure [30].
The SDV has some similarities to the vacuole of higher plants: both are, for example, acidic compartments (e.g. [31]), and a recent study described a V-type H+-ATPase (VHA) as being responsible for the acidification of the SDV [32]. Depending on the ongoing cell wall synthesis step (and therefore on the cell cycle phase), VHA is translocated between the silicalemma, the tonoplast (i.e. the vacuole-surrounding membrane), and the plasma membrane (see Fig. 3). In addition, VHA appears to be fundamental in regulating the fusion of the SAP1-containing vesicles with the SDV. In any case, vesicles targeted to the SDV provide not only proteins for silica generation but also lipids for expansion of the SDV.

Transport of proteins to the cell wall
A study from 2010 [33] identified an ER-resident kinase that was shown to phosphorylate recombinant silaffins in vitro. Based in part on these results, a model was proposed [34] in which proteins of the cell wall are synthesized at the ER and afterwards directed to the Golgi (see Fig. 3). Cell wall-associated proteins not involved in silica biogenesis (such as frustulins and pleuralins) are proposed to be transported from the trans-Golgi network to the cytoplasmic membrane by secretory vesicles, whereas cell wall-associated proteins involved in frustule morphogenesis (silaffins, silacidins, cingulins, silicanins) might be transported to the SDV by a different type of vesicle. However, with the exception of three proteins (Sin1, SAP1 and SAP3), there are no published data showing an SDV transit of a frustule-associated protein.
At least for the SDV route, a receptor-like protein within the ER, the Golgi, or both should be responsible for specific recognition of the transport substrates and also be involved in vesicle generation. For fusion of the vesicles with the cytoplasmic membrane or the SDV, further specifying factors are necessary, such as SNAREs and Rab proteins. Proteins directed to the vacuole are transported via vesicles from the trans-Golgi/early endosome [35]; however, another transport pathway is known (e.g. [35]) in which vesicles of ER origin are directly targeted to the vacuole, bypassing the Golgi apparatus. As already mentioned, similarities between the vacuole and the SDV exist; therefore, it's possible that an ER-vacuole-SDV transport pathway (bypassing the Golgi) exists as well (Fig. 3).
After the synthesis of the new cell wall component within the SDV is completed, it is delivered to the extracellular side of the cytoplasmic membrane by exocytosis. For the latter, four possible mechanisms have been hypothesized (detailed in [28]), which are briefly presented here: (A) the proximal silicalemma fuses with the already existing plasma membrane (becoming the "new" plasmalemma), whereas the distal silicalemma and the "old" plasmalemma are discarded; (B) the fusion mechanism between the plasmalemma and the silicalemma is identical to the one described in (A), but the old plasmalemma and the distal silicalemma are recycled to form an organic coat around the valve; (C) the old plasmalemma and the entire silicalemma are recycled to form the organic coat around the valve, and the new plasmalemma is formed from an extension of the remaining old plasmalemma, or from the ER; (D) the plasmalemma and the distal silicalemma fuse at the center of the valve, and the pulling of the membranes allows the SDV content to be exocytosed; the distal silicalemma is then taken up into the cytosol and recycled, whereas the proximal silicalemma becomes the new plasmalemma. In any case, recent studies have shown that the membrane of the SDV fuses with the plasma membrane [28,32] and that hypothesis D is the only one fully consistent with the results obtained [28].

From transport routes to protein structures and targeting
Although hypotheses on the general transport pathways of cell wall-associated proteins have already been made, further speculations on the topic could be based both on studying the proteins' structure and their targeting signals, and on analyzing the correlation between their in vivo location and the expression pattern of the genes encoding them. For silaffins, it appears that their amino acid composition and related post-translational modifications (PTM), rather than the protein sequence itself, are important for their function [14,36]. All the known silaffins from both T. pseudonana and Cylindrotheca fusiformis, in fact, carry similar modifications: phosphorylation of hydroxy-amino acids; polyamine-type, and mono-, di- and trimethyl modifications of lysines; hydroxylation of proline and lysine residues; complex glycosylation and sulfation [10]. Silacidins also undergo extensive post-translational processing during maturation of the pro-peptides; these proteins, in fact, have many phosphorylated serine residues and contain several copies of the spacer sequence RRL, which is the target site of proteolytic cleavage [18]. All silaffins have at least one R-X-L motif and the same is true for the known silicalemma membrane proteins (silicanins and SAPs), which have more than one R-X-L motif and several predicted PTM sites [28,29]. No detailed information about post-translational modification of cingulins is available to date [34], but all of them (except for CinY4) carry similar motifs [11,17,34]. Reports on enzymes post-translationally modifying silica-related biomolecules, as well as on their cellular localization, are rare. However, as mentioned above, a silaffin-kinase was found associated with the ER [33]. It would be useful to identify the remaining enzymes and especially their cellular location, which in case of a putative SDV location would imply functions in post-translational modifications of proteins in the SDV. The localization of cingulins and silaffins has been investigated [11,17,34,37] by expressing the corresponding genes using both their native promoters and non-native, inducible ones [from the fucoxanthin chl a/c binding protein (fcp) and/or the nitrate reductase (NR) genes]; these studies have proven to be very useful for investigating the regulation of the cell wall biogenesis (see later and Fig. 4). Cingulins (CinW1 to CinW3 and CinY1 to CinY3), when expressed as GFP fusion proteins under the control of the NR promoter, were located in the girdle, although with remarkable differences among them concerning the number of girdle bands labeled and their position [17] (see below).

Fig. 3 … For simplicity all organelles and cell structures are omitted, except for the nucleus, the cytoplasm membrane, the endoplasmic reticulum (ER), the Golgi apparatus, the silica deposition vesicle (SDV) and the vacuole. All known cell wall-associated precursor polypeptides contain a signal peptide for cotranslational import into the ER, therefore the four pathways are depicted as starting from the ER. Pathway A: secretory proteins, which are not targeted to the SDV, are first directed to the Golgi apparatus and then transported from the trans-Golgi network to the cytoplasmic membrane by transport vesicles. Proteins which are instead targeted to the SDV could reach this organelle via vesicles departing from the trans-Golgi network (pathway B) or directly from the ER, therefore bypassing the Golgi apparatus (pathway C). We cannot exclude that a still unknown SDV-targeting pathway involves the vacuole and the plasma membrane (pathway D; [32]). The inset at the bottom right of the figure specifies the cargos, vesicle coat proteins and SNAREs (re-drawn and extended from Poulsen et al. [34]).
Regarding the silaffins Sil1, Sil3 and Sil4, different localizations were obtained: Sil1-GFP (when using the fcp promoter) is specific for the valve [34]; Sil4-GFP (when using the NR promoter) was identified in the valve and the more proximal girdle bands [34]; Sil3-GFP (when using the fcp promoter) showed either localization over the entire frustule [34,37] or, in another experiment using the same promoter [38], both in the valve and in the girdle, but excluding the pleural bands.
Targeting proteins to specific cellular localizations requires targeting signals. A detailed analysis of the protein Sil3 [34] demonstrated that, in addition to a canonical signal peptide (SP), the presence of a short amino acid stretch containing five non-consecutive lysines, a so-called pentalysine cluster (PLC), is important for efficient silica targeting of Sil3 [34]. Modification of the isolated PLC motif led to mis-targeting of Sil3-derived peptides fused to GFP, with the fluorescent signal mostly found at intracellular destinations. As all silaffins known to date contain an SP and at least one PLC, it is possible that this motif plays a similar role in the other known silaffins as well. This speculation might additionally be supported by the results of experiments in which individual PLCs from Sil1 (PLC1), from Sil3 (PLC3ct) or an artificially created one (PLCart) were localized as GFP fusion proteins. In all three cases silica targeting was achieved, although with different efficiency and localization, confirming that PLCs act as silica-targeting motifs in the silaffins from T. pseudonana [34].
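The sequence features discussed in this section, R-X-L motifs and lysine-rich stretches, lend themselves to a simple scan of a protein sequence. The following is a minimal sketch only: the function names are hypothetical, the toy sequence is made up and is not a real silaffin, and the window length and lysine threshold are illustrative assumptions rather than a published PLC definition. In particular, the window scan only counts lysines and does not check that they are non-consecutive.

```python
import re


def find_rxl_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based position, tripeptide) for every R-X-L match.

    A lookahead is used so that overlapping matches are also reported.
    """
    return [(m.start(), m.group(1)) for m in re.finditer(r"(?=(R.L))", protein)]


def lysine_rich_windows(protein: str, window: int = 12, min_lys: int = 5) -> list[int]:
    """Return start positions of windows containing at least `min_lys` lysines.

    `window` and `min_lys` are assumptions chosen for illustration only.
    """
    return [
        start
        for start in range(len(protein) - window + 1)
        if protein[start:start + window].count("K") >= min_lys
    ]


# Made-up toy sequence used purely to exercise the two functions.
toy_sequence = "MASRRLSKAKSKGKSKAARELT"
print(find_rxl_motifs(toy_sequence))      # [(3, 'RRL'), (18, 'REL')]
print(lysine_rich_windows(toy_sequence))  # [4, 5, 6, 7]
```

A scan of this kind can only flag candidate regions; whether a flagged stretch actually functions as a silica-targeting motif has to be established experimentally, as was done for Sil3.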

Expression of genes encoding cell-wall associated proteins
The expression of genes encoding proteins involved in frustule synthesis might be cell cycle-regulated and, if so, cell cycle-specific expression patterns should be expected in a synchronized culture. Synchronous growth of T. pseudonana cells can be achieved by first switching from a silicon-replete medium to a silicon-free one, resulting in an arrest in the G1 phase of the cell cycle [39]; following the re-addition of silicon, the cells overcome the arrest, conclude the G1 phase and proceed further in the cycle. This synchronization method has allowed researchers to enrich cultures for cells undergoing the same biological process, enabling detailed investigations of specific aspects of cell wall synthesis [12,29,30,32,34,39,40]. Synchronized cultures have been investigated for transcript level changes [41,42] and the results have been reanalyzed for genes encoding SDV/cell wall-specific proteins [23]. Here we refer to the data published in [41] and re-interpreted in [23].
According to the composition of the proteins, cingulins were classified either as W-type or Y-type cingulins [17], both showing girdle band-specific localizations (see above). In synchronized cells, however, when fold changes were plotted against the time after Si addition [23,41], their expression patterns differed and the transcript levels were not comparable at the same time points (Fig. 4). Whereas some W-type cingulin genes showed either no significant differences between the studied time points (cinW1 and cinW3) or slight upregulation at 7-8 h (the time of valve generation, cinW2), the Y-type cingulin genes (cinY1, cinY2, cinY3 and cinY4) were upregulated between 2 and 4 h (in the late G1/S phase), when the last girdle bands are synthesized and well before valve formation is initiated [24]. Thus, at least the transcriptional regulation indicates that the different protein composition of the W- and Y-type cingulins is reflected in their timing of expression: the Y-type genes show a similar expression pattern, with a peak before valve synthesis, whereas some W-type genes are constitutively expressed and show no, or very limited, regulation.
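The comparison of fold changes over time after Si addition, as described above, can be sketched in a few lines. This is an illustration under stated assumptions: the source does not specify how fold changes were computed, so the choice of a log2 ratio relative to the first time point is an assumption, the function names are hypothetical, and the numbers are placeholders rather than measured transcript levels.

```python
import math


def log2_fold_changes(levels: dict[float, float], reference_time: float = 0.0) -> dict[float, float]:
    """Express each transcript level as a log2 ratio to the reference time point.

    Keys are hours after silicon re-addition, values are transcript levels.
    Using t = 0 h as the reference is an assumption made for this sketch.
    """
    reference = levels[reference_time]
    return {t: math.log2(value / reference) for t, value in sorted(levels.items())}


def peak_time(fold_changes: dict[float, float]) -> float:
    """Return the time point with the highest fold change."""
    return max(fold_changes, key=fold_changes.get)


# Placeholder values only, not data from any of the cited studies.
placeholder_levels = {0.0: 10.0, 2.0: 40.0, 4.0: 20.0, 8.0: 10.0}
changes = log2_fold_changes(placeholder_levels)
print(changes)
print(peak_time(changes))
```

Comparing the peak times of different genes in this way corresponds to the distinction drawn in the text between genes with a defined expression peak and genes with little or no regulation across the time course.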
By analyzing the data on cingulins, one further aspect is remarkable: all the tested cingulins, when expressed as GFP fusion proteins via the NR promoter (which was constitutively functional under the experimental conditions used), localized exclusively in the girdle bands although being expressed in phases in which a valve-SDV only, or even no SDV, is present. Moreover, as these targeting experiments of cingulins by the use of the NR promoter led to a girdle localization, the signals determining the targeting pathway are expected to be located either within the protein or in the coding regions of the mRNAs, but not in the native 5′- and 3′-untranslated regions. Additionally, the locations of the GFP fusion proteins using the native versus NR promoters for gene expression were studied for two cingulin representatives [11].

Fig. 4 … For all the proteins located in the girdle, the GFP signal is obviously absent from the surface of the valve, therefore resulting in a ring-like GFP signal when viewed from the top of the valve. The first image on the left of the upper row shows the wild type color images used as templates for this representation; since both of them are cross-sections of the cell, the lower image does not really show the surface of the valve but it is anyway used to specify the location of the GFP signal. For Sil3 (fcp) we refer to [34,37]; for Sil3 (native), CinW1, CinW3, CinY1 and CinY3 we refer to [17]; for CinW2 (NR) and (native), for CinY2 (NR) and (native) and CinY4 (native) we refer to [11]; for Sil1 (fcp) and Sil4 (NR) we refer to [34].
In these experiments the CinW2-GFP fusion protein was located in all parts of the girdle, regardless of the promoter used. In contrast, CinY2-GFP was located in the pleural bands only when using the native promoter (therefore cell cycle-specific), whereas the use of the NR promoter (constitutively expressed) led to a more widespread localization in the girdle. Thus, at least for CinY2, timing of expression is involved in establishing the correct location.
With respect to the transcriptional regulation of the silaffin genes, a similar picture emerges when fold changes are plotted against the time after Si addition to the medium [23,43]: sil4 showed only a very slight transcriptional level change in the different cell cycle phases [23], whereas sil3 had a maximum in the phase of valve generation together with a minor peak in the time frame of girdle band synthesis [23,43]; sil1 showed a maximum during the first 2-4 h after silicon replenishment, followed by downregulation [43]. As already mentioned, when expressing Sil3-GFP under the control of the fcp promoter, the fusion protein localized either in the entire frustule [34,37] or in the valve and in the girdle without the pleural bands [38]. The latter location was the same as for a Sil3-GFP fusion protein expressed under control of the native promoter [17]. Finally, a putative further level of frustule synthesis control was identified by analyzing the transcriptional expression pattern of the silacidin gene, which was slightly, if at all, down-regulated at the time of valve synthesis (Fig. 4) [23]. Surprisingly, translational downregulation of the silacidin gene led to an increase of the cell size [44], indicating that silacidins might directly or indirectly regulate the cell size by counteracting cell expansion.

Conclusions
Frustule synthesis in diatoms is a sophisticated process. In the model species Thalassiosira pseudonana, in vitro and in vivo experiments indicated several levels of regulation, which cover not only gene expression, protein synthesis and biosilica generation, but also the specific signals determining the intracellular targeting of proteins involved in silicification. In this respect, transcriptomic data might be useful because the mRNA expression pattern can be compared to in vivo protein localization data under the likely assumption that the transcriptional regulation is mirrored on the protein level. In any case, one should expect a delay of unknown duration between mRNA synthesis and accumulation of the respective proteins. The sil3 gene has been used as marker for the correlation between gene expression and frustule synthesis in Thalassiosira pseudonana, because its major expression peak marks the time of valve formation [23,41,43].
Comparing the localization of proteins when using the native or the non-native inducible promoters is an interesting approach but, given the low number of studied proteins, it is difficult to generalize hypotheses based on the obtained results. The same experimental design applied to a higher number of candidate proteins would result in a larger dataset and, therefore, in a more detailed discussion about the correlation between the transcriptional regulation of the genes and the correct cell wall targeting of the proteins they encode.
Due to the high experimental accessibility of Thalassiosira pseudonana, new techniques such as super-resolution microscopy or gene editing will boost our knowledge. In addition, applying state-of-the-art protein isolation and analytical technologies might help in determining the types and quantities of proteins of the cell wall, which is an important step towards artificial design of the cell wall of diatoms.