Structural characterization of the extracellular stalk material of the diatom Didymosphenia geminata

This study presents a new bioanalytical characterization of the mainly organic components of the poorly investigated extracellular polymeric substances (EPS) of the enigmatic diatom Didymosphenia geminata, an invasive species that is expanding worldwide and endangering diverse ecosystems. This microalga attaches its siliceous cells to rocky substrates using fibrous stalks, which are made of an EPS-based matrix stabilized by crystalline calcite. The EPS were analyzed using selected microscopic, spectroscopic, and spectrometric techniques. We identified diverse types of biomolecules. The presence of lipids, condensed aromatics, and heteroaromatic compounds in the EPS was confirmed using high-resolution mass spectrometry (HR-MS). Additionally, both sulfur-containing functionalities and carboxylic acids were identified using infrared (IR) spectroscopy and nuclear magnetic resonance (NMR) spectroscopy. For the first time, lignin compounds have been detected as a component of the EPS of D. geminata, using HR-MS and fluorescence microscopy (FM) in combination with specific staining techniques. By increasing the understanding of the chemistry and structural features of the stalks, we aim to develop potential applications and methods for removing these stalks from affected regions in the future or, alternatively, to use them as a large-scale source of sustainable biocomposite material. Supplementary Information: The online version contains supplementary material available at 10.1007/s00216-024-05370-1.


S1. Solid-state 13C-MAS-NMR

NMR spectroscopic investigations were performed on the stalks of Didymosphenia geminata. For the untreated stalks and the demineralized sample, single pulse excitation (SP) spectra were recorded using the hpdec pulse program (Figure S1 and Figure S2). The crystalline calcite has a long relaxation time; therefore, the repetition time was set to 60 s to achieve complete relaxation of all 13C nuclei in the calcite. However, the long recycle delays lead to long experimental times, and thus only 4 k scans could be recorded in a reasonable time (approximately 3 days). By treating the stalks with a 6 M hydrochloric acid solution, the calcite was dissolved and could be removed from the samples. As a result, and owing to the presence of paramagnetic ions in the material, the repetition time could be reduced to 1 s and more scans could be recorded. In addition to the SP/MAS-NMR spectra, CP/MAS-NMR spectra were recorded, which gave a better signal-to-noise ratio (SNR). However, the calcite signal cannot be detected by this method. The different regions in the spectra can be assigned to typical structural groups. The regions used here for the analysis of all spectra are summarized in Table S1. All spectra were recorded on a 400 MHz WB spectrometer in 4 mm ZrO2 rotors at a rotation frequency of 10 kHz and with tppm15 decoupling. The results are presented in Figure S1 to Figure S5.

The aforementioned compound classes can complex paramagnetic ions such as iron ions, which impairs the magnetization transfer in the CP/MAS-NMR experiment because of fast relaxation and consequently reduces the signal intensity. Removing paramagnetic compounds may result in increased signal intensity. However, as previously mentioned, the use of EDTA as a complexing agent presents new issues: it cannot be fully removed from the sample, so its signals overlap with those of the sample.

All three samples show signals at g-factors of approximately 2.0 and 4.3. The g-factor of a free electron is 2.00232 [1]. Generally, the signal at a g-factor of 2.0 cannot be assigned unambiguously, because a variety of compounds and ions generate a signal in this region, e.g. organic radicals or a wide range of transition metal ions. However, it might originate from octahedrally coordinated iron(III). The signal at 4.3 can be assigned to iron(III) with tetrahedral coordination [2].

To further confirm the complexation of iron ions, ICP-OES analyses were performed after microwave digestion of the samples using HF. The results for the sample extracted with EDTA (Stalks_EDTA) are presented in Table S2. This sample contained the lowest content of transition metal ions owing to the extraction with EDTA. As the results in Table S2 show, the content of most transition metal ions is below the LOQ; hence, these elements cannot be quantified. The silicon, titanium, and aluminum in the sample probably originate from the cell bodies (in the case of silicon) as well as from ground rock that is still present in the sample. Even after extraction with EDTA, the iron content remains high. Thus, we conclude that iron is complexed by the carboxylic acid groups as well as by S-containing groups such as sulfonic acid or sulfate ester groups. Therefore, the signal detected in the EPR spectra of all three samples presented here is assigned to tetrahedral iron(III) ions.

In Figure 3 in the main part of this paper, the total count of molecular formulae that can be assigned to each heteroatomic class is presented. Furthermore, for each heteroatomic class, the relative abundance of molecular formulae can be shown alongside the absolute count (see Figure S11). Here, the intensity of a peak assigned to a specific molecular formula influences the plot. If an ion has a high intensity in the spectrum, it results in a high relative abundance, even if this molecular formula is the sole molecular formula detected in this class. This can be illustrated by referencing Figure S12 and Figure S13.

A reversed trend is visible in the heteroatomic classes O3 (red) and O4 (green) in the negative ion mode (Figure S13). Here, both classes contain a high number of molecular formulae; however, their relative intensity in the mass spectrum is low in comparison to other oxygen-containing classes such as O9 (violet) and O12 (light blue).
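The difference between the number of assigned molecular formulae and the relative abundance of a heteroatomic class can be sketched in a few lines of Python. This is only an illustration: the peak list contains invented intensities (not measured data), the function name `class_statistics` is hypothetical, and relative abundance is assumed here to be the summed intensity of a class divided by the summed intensity of all assigned peaks.

```python
from collections import defaultdict


def class_statistics(peaks):
    """Return {class: (formula_count, relative_abundance_percent)}.

    `peaks` is an iterable of (heteroatom_class, intensity) pairs,
    one pair per assigned molecular formula.
    Assumption: relative abundance = class intensity / total intensity.
    """
    counts = defaultdict(int)
    summed = defaultdict(float)
    for hetero_class, intensity in peaks:
        counts[hetero_class] += 1
        summed[hetero_class] += intensity
    total = sum(summed.values())
    return {c: (counts[c], 100.0 * summed[c] / total) for c in counts}


# Invented example values, chosen only to show the effect.
example_peaks = [
    ("N1O4", 900.0),                                   # one intense peak
    ("O3", 20.0), ("O3", 30.0), ("O3", 25.0), ("O3", 25.0),  # many weak peaks
]
stats = class_statistics(example_peaks)
for hetero_class, (count, abundance) in stats.items():
    print(f"{hetero_class}: {count} formulae, {abundance:.1f} % relative abundance")
```

In this toy data set, a class with a single intense peak dominates the relative abundance, while a class with several weak peaks has the higher formula count, which mirrors the two trends described above.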

Figure S12: Section of the GALDI(+) mass spectrum, m/z 490 to 550; peaks assigned to the N1O4 class are colored in red.

The nC-DBE plots for each heteroatomic class analyzed in S4 are displayed in Figure S14 and Figure S15. The DBE was calculated using equation (2) for a compound with the following composition of

The data show significant differences between the negative and positive ion modes. In the negative ion mode, mainly compounds from heteroatomic classes that contain only oxygen as a heteroatom are detected. As previously stated in the main part of this paper, molecular formulae with two to three oxygen atoms can be assigned to fatty acids and fatty acid esters. In the heteroatomic classes O5 to O12 in Fig. S12, ions with a DBE greater than 10 were detected. In combination with the high oxygen content, these structures are assigned to lignin-like molecules. A single monolignol has a DBE of at least 5 (the aromatic ring structure and a single double bond in the aliphatic propanoid residue). Compounds in these oxygen classes are ionized more efficiently in the negative ion mode because lignin-like molecules are easily deprotonated. The oxygen-containing classes O6 to O10 included the highest number of lignin-like structures, and molecular formulae with a DBE greater than 10 and a carbon number greater than 18 indicate oligomers containing two to three monolignol units.

The peak at 397.135042 Da (C15H26O12) can be assigned to a modified disaccharide. However, mass spectrometry cannot distinguish between different monosaccharides or disaccharides with identical molecular formulae, because the stereoisomers differ only in the configuration of their hydroxy groups. Here, fragmentation of different monosaccharide or disaccharide reference compounds would be necessary to obtain their characteristic fragmentation patterns.
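The DBE values discussed above can be checked with a short calculation. The sketch below assumes the commonly used expression DBE = c - h/2 + n/2 + 1 for a neutral composition CcHhNnOoSs, in which oxygen and sulfur do not contribute; whether equation (2) has exactly this form is an assumption. The function name `dbe` is hypothetical, and coniferyl alcohol (C10H12O3) is used as a generic monolignol example that is not taken from the measured data.

```python
def dbe(c, h, n=0):
    """Double bond equivalents of a neutral formula CcHhNnOoSs.

    Assumed standard expression: DBE = c - h/2 + n/2 + 1.
    Oxygen and sulfur (divalent) do not change the result.
    """
    return c - h / 2 + n / 2 + 1


# Coniferyl alcohol, a monolignol (C10H12O3): aromatic ring (4) + one C=C (1)
print(dbe(10, 12))       # 5.0

# Modified disaccharide formula from the text (C15H26O12)
print(dbe(15, 26))       # 3.0

# N1O4 compound from the positive ion mode (C33H31N1O4)
print(dbe(33, 31, n=1))  # 19.0
```

Under this assumption, the monolignol example gives the minimum DBE of 5 stated above, while the saccharide-type formula stays well below the DBE of 10 used to identify lignin-like oligomers.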
Using the positive ion mode, almost no molecular formulae containing only oxygen as a heteroatom could be detected. However, as mentioned in the main part of the paper, many of the detected compounds contain nitrogen and, due to the high DBE, heteroaromatic compounds are most likely present in the sample. Molecular formulae containing sulfur are ionized and detected using both GALDI(+)- and GALDI(−)-MS.

S6. Fluorescence microscopy investigations
Different experiments were performed to investigate the fluorescence emission of the stalks after staining. Here, the two samples Stalk_raw and Stalk_EDTA as well as the lignin reference Indulin were stained with Safranin O (SO) and Acridine Orange (AO). Two AO solutions of different concentration were used (0.01 M and 10^-6 M) because, depending on its concentration, AO forms either monomers or dimers (see main part of this paper). The structural formula of each dye is shown in Figure S16. The sample weights used for the experiments are summarized in Table S3.

Table S3: Masses of all samples for each experiment.

Sample        SO        AO (high)   AO (low)
Stalk_raw     2.9 mg    3.0 mg      2.7 mg
Stalk_EDTA    5.8 mg    5.3 mg      5.5 mg
Indulin       5.8 mg    6.8 mg      5.4 mg

SO selectively stains lignin, which is indicated by red fluorescence without green or blue emission. The results for the samples stained with SO are summarized in Figure S17 to Figure S19. Indulin and Stalk_raw show similar results, which indicates the presence of lignin in the stalks. The demineralized and extracted sample Stalk_EDTA lost its fibrous structure. Additionally, the digital microscopic image in Figure S19 a) shows a fiber that does not belong to the stalk, as is evident from its different surface structure. The fiber has a prominent blue fluorescence that distinguishes it from the sample.
Two different concentrations of AO were used in the next experiment. Li and Reeve [3] noted that concentrations below 10^-6 mol/l lead to the formation of AO monomers only, resulting in green emission when lignin is present in a sample. Increasing the concentration of AO results in the formation of AO dimers that emit red fluorescence. When cellulose is present instead of lignin, isolated AO can be adsorbed, resulting in green emission even at higher AO concentrations [4]. With the 10^-6 M AO staining solution, green fluorescence is visible in both the Indulin reference and the untreated stalks (Stalk_raw). This confirms the hypothesis mentioned above that lower concentrations lead to the adsorption of monomers and to green fluorescence. Dimers are probably also formed, leading to the red coloring that is visible in Figure S20 d), Figure S21 d) and Figure S22 d). When the concentration is increased to 10^-2 M, no green fluorescence is observed, so only dimers are formed, which interact with the lignin (Figure S23 to Figure S25).

Figure S10: GALDI-MS mass spectra of Stalk_raw in negative (left) and positive (right) ion mode.
In the positive ion mode, the heteroatom class N1O4 has only five molecular formulae assigned to it, making it the class with the fewest nitrogen-containing compounds. However, the relative abundance of the N1O4 class is the highest, surpassing 14%. This implies that at least one of the five compounds has a high signal intensity in the mass spectrum. An in-house Matlab script was used to visualize the overall data set and the peaks of the N1O4 class. The mass spectrum shows two signals with m/z values of 506.232445 Da (C33H31N1O4) and 534.263685 Da (C35H35N1O4), both of which belong to compounds in the N1O4 class and are the most intense signals in the GALDI(+) mass spectrum (Figure S12).
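The assignment of a molecular formula to a measured m/z value can be illustrated with a small calculation of the theoretical m/z and the mass error in ppm. The sketch below is not the in-house Matlab script. It assumes singly charged protonated ions, [M+H]+, in the positive ion mode and deprotonated ions, [M-H]-, in the negative ion mode; the ion type is not stated in the text. The function names and the rounded monoisotopic masses are likewise choices made for this illustration.

```python
import re

# Monoisotopic masses in u (rounded); only the elements needed here.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727647  # mass of a proton in u


def neutral_mass(formula):
    """Monoisotopic mass of a neutral formula such as 'C33H31N1O4'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def mz_singly_charged(formula, mode):
    """m/z of [M+H]+ (mode '+') or [M-H]- (mode '-'); assumed ion types."""
    if mode == "+":
        return neutral_mass(formula) + PROTON
    if mode == "-":
        return neutral_mass(formula) - PROTON
    raise ValueError("mode must be '+' or '-'")


def ppm_error(measured, calculated):
    """Relative deviation of the measured from the calculated m/z in ppm."""
    return (measured - calculated) / calculated * 1e6


for measured, formula in [(506.232445, "C33H31N1O4"), (534.263685, "C35H35N1O4")]:
    calculated = mz_singly_charged(formula, "+")
    print(formula, round(calculated, 6), round(ppm_error(measured, calculated), 2))
```

Under the [M+H]+ assumption, both reported values deviate from the calculated m/z by less than 1 ppm, which is consistent with the formula assignments given above.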

Table S1: Assignments of the important regions in the solid-state 13C-MAS-NMR spectra.

Table S2: Results of the ICP-OES analysis of the sample Stalks_EDTA after microwave digestion using HF (three-fold determination). Elements with a content lower than the limit of quantification (LOQ) are colored red.

Compounds with higher oxygen numbers and nC values, but lower DBE values, can also be found in the classes O11 and O12. For instance, a compound with an m/z of 341.108826 Da and a calculated molecular formula of C12H22O11 is found in O11 and is characteristic of a disaccharide. Similar compounds are present in the O12 oxygen class. The peak at m/z 383.119477 Da can be assigned to C14H24O12, which is probably an acetylated disaccharide. Similarly, the peak at 397.135042 Da (C15H26O12