Introduction

Mycosporine like amino acids (MAAs) are a group of small secondary metabolites (usually ≤ 400 Da) featuring a cyclohexenone or cyclohexenimine backbone with amino acids or amino alcohol substituents. The class is known for its strong absorption in the UV range from 310 to 360 nm (Conde et al. 2003; Parailloux et al. 2020) and, due to their additional efficient radiationless decay, suggested to function as a natural UV filter. Up to now, over 70 MAAs are described from diverse organisms such as algae, cyanobacteria, and lichens (Ngoennet et al. 2018; Orfanoudaki et al. 2019a; Figueroa 2021). The highest diversity and contents have been described in red algae, while in green algae the occurrence correlates with their habitat; for example, in terrestrial species the MAA contents were comparable to those of marine red algae. However, in brown algae MAAs are missing (Hartmann et al. 2016; Orfanoudaki et al. 2019a). Due to their distribution and UV-filtering effect, MAAs are treated as potential chemotaxonomic markers and promising cosmeceutical ingredients (Schmid et al. 2006; Corinaldesi et al. 2017; Lawrence et al. 2018), which triggers a constant research interest.

Since their discovery in 1960 (Raj et al. 2021), significant efforts have been undertaken to improve their isolation and analysis. However, selective and sensitive methods for the qualitative and quantitative determination of these unique bioactive compounds are required to ensure insight into the MAA patterns, especially as literature reports geographic, taxonomic and seasonal variations in their content (Guihéneuf et al. 2018; Orfanoudaki et al. 2019a, 2021; Nishida et al. 2020; Parailloux et al. 2020; Gambichler et al. 2021). Therefore, the analysis of the specific MAA profiles is not only of ecological and pharmaceutical relevance, but may also contribute to chemotaxonomic investigations and the discovery of new bioactive metabolites. Several RP-HPLC methods (Chaves-Pena et al. 2019; Orfanoudaki et al. 2019a, 2020; Nishida et al. 2020), as well as HILIC methods (Hartmann et al. 2015; Parailloux et al. 2020) were described for their analysis, but also alternative techniques, namely capillary electrophoresis, have been applied (Hartmann et al. 2017). We have recently reported on the first comprehensive ultra-high-performance liquid chromatography (UHPLC) approach and provided a selective and fast methodology for the evaluation of MAAs (Zwerger and Ganzera 2022). Nevertheless, also current methods fail to detect and assign minor UV absorbing constituents in crude algal extracts (Bayona et al. 2022; Zwerger et al. 2022). In addition, other metabolites without chromophore are usually not addressed, though they might be relevant for taxonomic classification (Orfanoudaki et al. 2020).

More complex modern analytical approaches combine state-of-the-art separation and mass spectrometric techniques with sophisticated chemoinformatic analyses. Thus, they might be more suitable tools to visualize crude algal extracts. The so-called feature-based molecular network (FBMN) analysis, for example, utilizes the power of UHPLC separation with high-resolution tandem mass spectrometry (HRMS2) analysis to visualize the complex metabolomic space of extracts after a semi-automatic data treatment process (Allard et al. 2018; Nothias et al. 2020). The latter is based on the assumption that chemically similar molecules exhibit comparable MS2 fragmentation patterns.

Initially, such molecular network analyses were developed for dereplication. Due to their immense visualization power, however, FBMN analysis evolved further into a potent tool for metabolomic and ecologic studies (Bayona et al. 2022). The technique utilizes community-shared databases on the Global Natural Products Social (GNPS) molecular network platform (Wang et al. 2016; Bayona et al. 2022) with network visualization via Cytoscape (Shannon et al. 2003). In this way, informative layers (e.g., bioactivity or genetic relationships) can be integrated into the metabolic network, enhancing the interpretation and assignment of metabolites' action. This approach has already been introduced to natural product research and, e.g., used not only for the investigation of anthraquinones in Cortinariaceae (Hammerle et al. 2021), drug discovery in Euphorbia dendroides (Nothias et al. 2020) or bioactive products from Stryphnodendron pulcherrimum (Gomes et al. 2021), but also for several metabolomic studies of marine organisms like sponges (Scarpato et al. 2020; Bayona et al. 2022; Freire et al. 2022; Liu et al. 2022). There is also one publication covering antimicrobial from a brown alga (Fucus vesiculosus) (Buedenbender et al. 2020), yet the here presented study is the first application of FBMN in respect to MAAs.

The present work investigated polar algal extracts by means of FBMN analysis to test the hypotheses that this tool is suitable for (I) comprehensively visualizing the extensive amount of data generated by an approach such as UHPLC-HRMS2, identifying (II) known chemotaxonomic/bioactive markers (focus on MAAs), (III) other metabolites of algae as well as (IV) unknown MAAs, and to (V) study the distribution of MAAs in different species and provenances.

Materials and methods

Algal material and preparation of crude extracts

Several Nori (Porphyra spp.) samples were purchased at local supermarkets in Innsbruck (Austria) in 2021. Other algal species were available in house, collected in 2018 in Brittany (France) and morphologically identified by Prof. U. Karsten from the University of Rostock (Germany), or had been used for previous projects (Orfanoudaki et al. 2019a, 2020); voucher specimens of all samples are deposited at the Department of Pharmacognosy, University of Innsbruck, Austria. For an overview of all evaluated species please refer to Online Resource Table S1. The investigated extracts were prepared according to literature (Zwerger and Ganzera 2022).

Chemicals and reagents

MS-grade acetonitrile was acquired from Merck (Germany), ammonium formate from Serva (Germany). MS-grade formic acid as well as all other used chemicals were purchased from VWR International (Austria). Ultra-pure water was prepared in house with a Sartorius Arium 611 UV purification system (Sartorius, Germany). Standards of MAAs with confirmed chemical structure and purity were available in house from previous projects (Orfanoudaki et al. 2019a; Zwerger et al. 2022). Additionally, a standard mix from highly pre-purified MAA fractions was added to the sample set to allow the detection of minor MAA-compounds that could not be isolated in sufficient quantity till date.

Analytical conditions

UHPLC-HRMS2 Analysis

UHPLC experiments were performed on a Vanquish system (Thermo Scientific, USA) consisting of quaternary pump, autosampler, column oven, and variable wavelength detector connected to a Thermo Scientific Exploris 120 Orbitrap HRMS unit. Separation was carried out on a Phenomenex Luna Omega C18 100 Å column (100 mm × 2.1 mm; particle size 1.6 µm) protected by a SecurityGuard ULTRA guard C18 pre-column. The mobile phase comprised water with 0.25% formic acid and 20 mM ammonium formate (A) and acetonitrile (B). The applied gradient yielding the best results was as follows: from 0 to 12% B in 7 min, after this step changing to 90% B for one minute. Finally, the column was re-equilibrated with the initial solvent composition for 4 min. The flow rate was adjusted to 0.3 mL min−1. Column temperature was set at 11 °C and injection volume to 1 µL, respectively. The detection wavelength was 330 nm.

The instrument was controlled by Thermo Scientific Xcalibur 4.4 software. Calibration of the mass analyzer was done via the Thermo Scientific proprietary calibration mix and the respective automatic calibration function. The mass spectrometric parameters were as follows: Heated-ESI ionization source, positive polarity with static spray voltage (3400 V), sheath gas (N2): 30 arbitrary units, auxiliary gas (N2): 17.5 arbitrary units, sweep gas (N2): 1 arbitrary unit. Temperature of the ion transfer tube and vaporizer was adjusted to 370 and 420 °C, respectively. MS data (range 70–1000 m/z) was recorded from 0 to 8 min with a resolution of 60,000 FWHM for MS1.

Data dependent experiments were conducted with gradually increasing collision energies (HCD energy 15, 30 and 45 eV) at a resolution of 15,000 FWHM. The number of dependent scans was set to 3. The following selection of filters was employed: intensity threshold filter, dynamic exclusion (parent ions were placed in the exclusion list for 2 s after detection), isotope exclusion, charge state and apex filter. Additionally, a targeted mass exclusion was performed.

Mass spectrometry data pretreatment

Xcalibur software yielded data in.RAW format, which was converted to MZmine compatible.mzXML by MSConvert (Chambers et al. 2012). Subsequently, the data set was transferred and treated in MZmine v2.53 (Pluskal et al. 2010). For mass detection (positive mode, centroid data acquisition) a background noise filter of 1.0E5 and 0.0E0 was applied for MS1 and MS2 levels. Algorithm of the ADAP chromatogram builder was set to 1.0E5 for both the minimum group intensity and minimum group intensity threshold; minimum group size of scans was 5 and m/z tolerance 8.0 ppm. Hence, the chromatograms were deconvoluted via the ADAP algorithm (wavelets). The signal to noise (S/N) window was used as estimator with a threshold set to 80. The minimum feature height value was 1.0E5, the coefficient area threshold 100, peak duration range and retention time (RT) wavelet range varied from 0.01 to 0.80 min and 0.01 to 0.04 min, respectively. Isotopes were detected using the isotopes peak grouper with a m/z tolerance of 8.0 ppm and a RT tolerance of 0.05 min (absolute); the isotope selected was the most intense one and the maximum charge set to 1. Alignment of peaks was done applying the join aligner algorithm, with a m/z tolerance of 8.0 ppm, absolute RT tolerance of 0.05 min, weight for m/z 70 and weight for RT 30. The resulting aligned peak list was filtered using the peak-list rows filter option in order to remove any features not associated with MS2 scans and to select features eluted between 0.6 and 8 min only. No gap-filling was applied in this assay. Finally, utilizing an in-house database of MAAs established as described in the work of Geraldes et al. (2020) and references therein (for all library entries see Online Resource Table S2), dereplication was done whereby a m/z tolerance of 8.0 ppm was permitted. The resulting peak list, comprising 2973 individual features, was exported to a.mgf file and.csv quantitation table for submission to the GNPS platform.

Feature-based molecular network generation

Based on the above described.mgf file obtained with MZmine, a network could be generated using the feature-based molecular network workflow (Nothias et al. 2020) on the open source GNPS platform (Wang et al. 2016).

The data were filtered by removing all MS/MS fragment ions within ± 17 Da of the precursor m/z. MS/MS spectra were filtered by choosing only the top 6 fragment ions in the ± 50 Da window throughout the spectrum. The precursor ion mass tolerance and the MS/MS fragment ion tolerance were both set to 0.02 Da. Subsequently, a molecular network (MN) was created filtering edges by a cosine score above 0.6 and more than 5 matching peaks. Furthermore, edges between two nodes were kept in the network if and only if each of the nodes appeared among each other’s respective top 10 most similar hits. Lastly, a molecular family was created with a maximum size of 100; edges with lower scores were removed to achieve this upper threshold. Spectra in the network were then matched against GNPS spectral libraries (Horai et al. 2010; Wang et al. 2016), whereby library spectra were filtered in the same way as the input data. Only network and library spectra that coincided with a score above 0.6 and at least 5 matched peaks were considered. This job is available online on the GNPS platform following the hyperlink: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=acac2223c9cf4e0185f4ec66c06cb2c2. The FBMN was then visualized in Cytoscape v3.8.1 software (Shannon et al. 2003).

UV-signal workflow

For each of the investigated samples the UV trace at 330 nm was extracted. This wavelength is a common MAA denominator and has even been the applied consensus for the quantitative evaluation of MAAs in the past (Chaves-Pena et al. 2019; Orfanoudaki et al. 2020; Zwerger and Ganzera 2022). By using Origin 2020, all chromatograms were integrated, and the start and endpoint of each peak defined. In the combined feature list, the sample with the maximum abundance of each feature was determined. Subsequently, the features’ retention time was correlated to the retention time ranges of UV peaks in the specific extract. Every time a feature matched an UV signal (at 330 nm), it was given a positive value “1”. In case this assignment was not possible, the value “0” was added. A.csv table of the results obtained was exported and integrated into the Cytoscape network.

In silico annotation

The obtained MS2 data was exported from MZmine and processed via the in silico annotation tool SIRIUS 4 (Duhrkop et al. 2019), which represents a straightforward method for the fast and easy assignment for features of special interest in selected clusters. Based on the combination of high resolution isotope pattern determination and fragmentation trees for structure elucidation, a holistic assessment of molecular structures based on MS2 data is provided via the prediction of a molecular fingerprint as well as a molecular formula. This is achieved in a three-step process: (I) The molecular formula is predicted by SIRIUS; (II) A structure is assigned with CSI:FingerID; (III) The compound class is evaluated by CANOPUS. The following parameters were used for the calculation of molecular formulas: possible adducts: [M + H]+, [M + K]+, [M + Na]+; instrument: Orbitrap; mode: MS2; mass deviation: 5 ppm; molecular formula candidates: 10; candidates per ion: 1; consider only formulas in DBs: all included DBs. The CSI:FingerID algorithm (Duhrkop et al. 2015) was utilized: Search in DBs: All DBs, Fallback Adducts: [M + H]+, [M + K]+, [M + Na]+. Furthermore, via the integrated CANOPUS workflow, the chemical ontology of various structural components could be classified due to their fragmentation tree.

In addition to that, the recent MSNovelist software was employed, which de novo generates the molecular structure from the tandem-MS spectral data independently from the comparison to known compounds, therefore not being reliant on the quality of databases. In brief, this is done via an integration of fingerprint prediction with an encoder-decoder neural network. The inventors showed that compared to the GNPS library, 25% of structures were correctly identified on first rank, and 45% of structures were overall retrieved (Stravs et al. 2022), thus it seemed a good ad-on for this study where a lot of unknown compounds were expected.

Results and discussion

Study overview

To explore the visualization power of FBMN analysis in the field of phycology, a set of 33 samples (Table S1) was collated. We included 31 red algae, one green alga (Cladophora sp.) and a lichen (Lichina pygmaea), as well as several MAA standards in our set (I) to test the algorithm´s power to assign MS-features, (II) to see if the recorded features cluster in specific groups suitable for taxonomic assignments, and (III) to mine the network for undiscovered MAAs.

Although being also described in green algae, lichens, fungi, as well as cyanobacteria, MAAs are known to occur predominantly in red algae (Orfanoudaki et al. 2019a; Geraldes et al. 2020). Therefore, as our approach was focusing on the visualization of this group of compounds the chosen biomaterial was mainly belonging to the phylum “Rhodophyta”, covering a representative array of orders (e.g. Gracilariales, Halymeniales, Gigartinales, Ceramiales, Bangiales), families (e.g. Gracilariaceae, Corallinaceae, Bangiaceae, Rhodomelaceae, Ceramiaceae) and genera. Of some algae (e.g. Jania rubens, Pyropia plicata, Bostrychia scorpioides) multiple collections were available and thus included to highlight possible geographical and seasonal differences. Additionally, the sample set comprised a lichen as well as a green alga to indicate that their easy and correct differentiation from Rhodophyta would be possible.

As depicted in Fig. 1, in a first step the extracts were prepared. After sample analysis by UHPLC-DAD-HR-MS2 and output conversion (Chambers et al. 2012) (see Section 2.3.1), the obtained data set was pretreated in MZmine (Pluskal et al. 2010) and subsequently uploaded to the GNPS platform (Nothias et al. 2020), where a molecular network was created. The general terminology used throughout for describing the network is well explained in literature (Quinn et al. 2017; Aron et al. 2020). Next, the nodes were organized following compound class, taxonomy retention time, similarity as well as UV-response, to enable correct feature annotation.

Fig. 1
figure 1

Workflow of the study indicating the applied techniques and FBMN graphical filter options to profile a set of 33 samples, mainly algae, with special focus on MAAs

Feature-based molecular networking

Initial interpretation

The holistic network generated from the 33 sample extracts, a mixture of standard compounds (see Fig. S1), and a solution derived from highly enriched pre-purified MAA fractions exhibited 2973 individual nodes (genuine standards were flagged with a star in the network) and 3139 edges (Fig. 2). Out of these nodes, 2387 were self-loops (singletons). The remaining 586 nodes were gathered in different sized clusters (55 clusters comprising two nodes, 58 comprising three or more nodes). An in-house database with over 70 MAA-related entries was propagated on basis of the work of Geraldes et al. (2020) to enable the instant peak annotation of the standards based on MS1 spectra. To comprehensibly present the occurrence of selected analyte nodes in extracts, their presence was displayed as a pie chart. As it visualizes relative distribution, a semiquantitative evaluation is possible and algae exhibiting high concentrations of a certain target compound can be observed at first glance. The node color setting can be utilized to display taxonomic hierarchy in various quickly interchangeable layers, thereby indicating if a node is specific not only to a certain family but maybe is a feature of a specific subclass or order.

Fig. 2
figure 2

Detail of the Feature-Based Molecular Network derived from the UHPLC-HRMS2 analysis of all samples. Filters were applied for additional information layers: node size expresses UV-response, node frame thickness is proportional to polarity. Number in node specifies exact mass, width of edge indicates similarity of compounds and shows possible fragmentation pathway. MAA standards are indicated with a red star. Color coding according to taxonomic level “order”. In detail discussed clusters are framed

Highlighting the composition of the nodes according to taxonomic hierarchy clearly showed that phylum specific clusters exist, which allow the easy discrimination of the investigated red algae from the green alga and lichen in the sample set. For the green alga, one specific UV-negative cluster as well as several single nodes were identified, the extract of L. pygmaea exhibited several bigger and smaller clusters, which featured mainly/only metabolites specific for this sample (see supplementary Fig. S2).

Delving deeper into taxonomic levels yielded several order/family-specific clusters, e.g. on the family level big clusters of Rhodomelaceae, Gracilariaceae, and Bangiaceae were observed. For more detailed information see Online Resource Fig. S3.

By a closer inspection it was noticed that some clusters are not belonging to an entire family, but rather to a single genus or species, as for example in the case of Rhodomelaceae, more specifically the Bostrychia cluster. Sample-specific clusters were found for Gracilaria specimens and Mastocarpus stellatus. In the Bangiaceae-specific nodes, one of the respective clusters belongs solely to Phorphyra umbilicalis, the other investigated Porphyra species were not included. This could be hinting towards a species-specific marker cluster or a location specific one, as this specific species was the only one collected in the Baltic Sea, a semi-enclosed sea, in our sample set. This points to the evolution of species-specific metabolites in the class of Rhodophyta, and thus presents FBMN as a good visualization tool to access the metabolome of a single species and identify taxonomic markers on different hierarchic levels, which is, for example, beneficiary for large ecological/chemotaxonomical studies.

Extending the visual representation power of the generated network

To test the second hypothesis, i.e., that FBMN can be used to conveniently annotate and depict the metabolome of algae, the visualization of the network was enhanced by 4 steps to increase the retrievable information.

First, the analytes’ relative polarities were visualized by adding the retention time as frame width to the nodes. The broader the line, the later the peak was eluted, thus the less polar the analyte is.

Second, the features’ ability to absorb UV-light was included as binary value. In case the MS-signal co-eluted with a UV-signal the respective feature was visualized as big node, when it lacks this property it is small. For the compound class of interest, MAAs, strong UV-absorption is highly characteristic. Thus, this variable was intended to highlight MAA clusters and facilitate the identification of species with unknown MAAs.

Third, features representing standard compounds were highlighted with a star symbol, indicating an unambiguous identification. They represented a corroborated landmark for further structure prediction by databases or MS2 fragmentation patterns. Furthermore, nodes were also labeled with the exact precursor ion mass, and possible fragmentation patterns indicated on the edge between conjoined nodes.

This finally offered the possibility to display the similarity amongst different members of a cluster (fourth step). The relation between neighboring nodes was expressed by their cosine score, which determines the comparability of individual MS2 profiles. By plotting this parameter as width of the respective connective edge, the likeness of two nodes as well as the whole respective cluster could be immediately evaluated (Fig. 3).

Fig. 3
figure 3

Selected informational layers to visualize the FBMN approach, based on two examples/clusters

Feature classification and annotation

As a next step an ontologic evaluation – i.e., the classification of the features according to their chemical backbones – of all generated clusters was propagated via MolNetEnhancer (Ernst et al. 2019) with integrated ClassyFire (Djoumbou Feunang et al. 2016) algorithm. This was done to determine whether this editing offers additional information for the identification of metabolites in algae. As the specific class of MAAs is not featured in the algorithm, which may be due to their limited occurrence, the corresponding clusters could not be explicitly entitled. Nevertheless, other elements of the network could be putatively classified as “phenethylamines”, “amino acids, peptides, and analogues”, “amines”, “benzoic acids and derivatives”, “fatty alcohols”, and “fatty acid esters” (see supporting information Fig. S4). These compound classes can be expected in polar algal extracts, for example amino acids are well-known metabolites in algae (Fowden 1951; Nasr et al. 1967). However, these metabolites are usually not addressed in MAA focused studies, yet their presence could have a taxonomic relevance.

Several nodes could be characterized using online databases (GNPS). These libraries yielded 38 hits including metabolites, which are well known to be present in algae, e.g. betaine (Kato et al. 1996), D-( +)-galactose, 2-deoxy-D-glucose, acetyl-arginine, methionine and hordenine (Barwell and Blunden 1981; Yalçın et al. 2007). One MAA, namely mycosporine-serinol (ID2695), could be successfully annotated by the GNPS library with all fragment ions being consistent. The analyte was only detected in the extract of L. pygmaea and not part of any other cluster (i.e. represented as self-loop/single node). This is in good agreement with literature, as mycosporine-serinol was described for this specific species only (Roullier et al. 2009). However, the ten MAA standards were initially not assigned. This is due to missing publicly available MS2 data for MAAs or limited comparability of the deposited spectra. Thus, the GNPS library (accessed 5 Aug 2022) was manually screened for MAAs and shinorine, palythine and asterina-330 were found. As the library spectra had been recorded with different instruments (ion-trap, etc.) the spectral profiles did not match. Rodrigues et al. reported a similar problem and could show that after uploading suitable reference spectra for more than thirty N-Acyl-L-homoserine lactone standards, 96% of them could be assigned in a sample set (Rodrigues et al. 2022). Thus, we also added the MS2 spectra for MAA standards recorded by orbitrap MS to the online database, and all standard nodes were correctly retrieved by the algorithm. This shows the importance of data-sharing in the tandem-MS community, which was also expressed by Wang and colleagues (Wang et al. 2016).

Manual MAA-MS fragmentation analysis

The observed MS fragmentation spectra of the injected standards were analyzed in comparison to already published ones of other MAAs (Cardozo et al. 2008; Parailloux et al. 2020). Thereby the following general fragmentation route was identified, as exemplarily shown for shinorine (Fig. 4): A loss of 15 mass units is usually the first step and resembles the elimination of a methyl radical from the methoxy group on the ring scaffold. The elimination is initiated by the protonation of the adjunct carbon atom of the main ring atom, resulting in a weakening of the O–CH3 bond due to an increased bond length. The elimination of NH (also 15 m/z) was refuted by Cardozo and coworkers, as they utilized the electrophilic Fukui function to identify the C atom bearing the methoxy group as most prone to positive ionization, followed by both nitrogen atoms (Cardozo et al. 2008). In a second step, several neutral losses in the sidechain follow, i.e. losses of water (e.g. from m/z 318.1048 to m/z 300.0944 or m/z 274.1156 to m/z 256.1045), decarboxylations (e.g. from m/z 318.1048 to 274.1156) or dealkylations (m/z 333.1282 to m/z 303.1282, etc.), yielding even and odd mass fragments. Also, the subsequent loss of water and an intramolecular H rearrangement have been reported, which can be followed by aromatization of the ring system and repeated losses of water or NH3, finally resulting in masses with m/z values of 137 and 138, respectively (Cardozo et al. 2008).

Fig. 4
figure 4

MAA-typical MS-fragmentation tree as well as the exemplary interpretation of the recorded MS-spectrum of shinorine. Assignment of fragments according to literature data (Cardozo et al. 2008; Parailloux et al. 2020)

In silico annotation via SIRIUS, CANOPUS and MSNovelist

Different software options were evaluated for their suitability to annotate MAAs, whereby the available standard compounds were used as models. In SIRIUS, for six out of the ten MAAs the correct molecular formula was proposed as rank 1 in the hit list, for aplysiapalythine B (rank 3), palythine (4) and aplysiapalythine A (10) the match was less precise. Even not perfect, this proved that SIRIUS is well applicable for the determination of the molecular formula of known and the search for unknown MAAs. A meaningful result is expectable as for nine out of ten standard MAAs the correct molecular formula was in the top 10 hits and was therefore easy to identify. Without influencing the search algorithm, the correct structure was never found as hit one. Nevertheless, for five of the compounds a correct structure was propagated in the top 8 hits. Compounds without meaningful molecular formula prediction understandably also lacked the CSI:FingerID match. However, by refining the search parameters the results could probably be improved.

Using the integrated CANOPUS software, structural parts could be linked to groups at least down to level 5. In most cases, different types of amino acids could be assigned. Notably, only mycosporine-methylamine-threonine was wrongly classified as polyol (for SIRIUS/CANOPUS results see Fig. S5-S11). Therefore, by combining the results obtained through SIRIUS (molecular formula) and CANOPUS (chemotaxonomic classification), a significant benefit concerning the analysis, structure prediction and categorization of unknown compounds can be expected.

Subsequently, the most recent tool MSNovelist was employed for comparing both the standard compounds and the Bostrychia cluster that featured interesting and putatively novel MAAs. Unfortunately, this approach was not yielding meaningful results for MAAs and will probably need some improvement before being suitable for application in this field of research.

Analysis of promising/interesting MAA-containing clusters

To indicate the secondary metabolites of highest interest, the network’s clusters were –after activating the variable of the UV signal– reanalyzed (Fig. 2). Overall 18 clusters with more than three 330-positive nodes were observed. Notably, the standard MAAs were all located in one large cluster and three smaller clusters.

In cluster 1 (see Fig. 5), four of the five nodes could be successfully annotated to MAAs of the standard mixture, namely aplysiapalythine A (ID952) and B (ID603), palythine (ID12) and asterina-330 (ID132). The observed clustering effect is due to all analytes’ structural relation to palythine (ID12), i.e. they exhibit a cyclohexenimine scaffold with glycine substitution. This close structural similarity is also expressed through a high cosine score. It could be well traced by the broad edge between aplysiapalythine A (= N-propanolpalythine) (ID952) and asterina-330 (ID132), which both only differ by one methyl group in the side chain. These two nodes were linked by a narrower edge to the rest of the palythine-family, featuring a shorter or no sidechain. This cluster only comprised MAAs with a glycine substitution in the first and no acidic function in the second sidechain. Additionally, one node (ID1606) could be putatively associated to a palythine isomer, as this isobaric node’s connectivity was very strong expressing congruency of MS2 spectra. The retention time shift (0.2 min) refuted, however, an annotation as palythine. As the cosine score already indicated, MAA-typical fragment ions (e.g. m/z 230.0889, 186.0993 or 137.0705), were found and congruency to the recorded spectrum of palythine was identified. ID1606 could be identified in the extract of Chondrus crispus, but not in the standard mix or in samples of the Bangiales order or Rhodomelacea family, where only palythine (ID12) was present. Thus, the abundant alga C. crispus might be a promising source for the isolation of this unknown compound, possibly a new MAA.

Fig. 5
figure 5

MAA-containing clusters 1, 2 and 3, indicating dereplicated and possibly new compounds. Graphical representation according to Fig. 2

Cluster 2 (Fig. 5) featured the standard compound mycosporine-glycine-alanine (ID953); besides, the FBMN workflow indicated its potential for the identification of MAAs when no standards are available. Further nodes could be tentatively assigned to an entry from the in-house MAA library (Mycosporine-2-glycine, ID1124) as well as a novel MAA-related structure (ID3287) by characteristic fragmentation patterns.

Cluster 3 (Fig. 5) comprised the cis/tans-isomeric pair usujirene (ID1186) and palythine (ID1184). Their structural similarity was confirmed by a high cosine score. These two nodes congregating in a separate cluster is highly meaningful as they were the only ones exhibiting an unsaturated sidechain and were therefore not connected to any other MAAs. Furthermore, the UV coverage of all standard MAAs by 330 nm can be confirmed as both nodes possess a higher absorption maximum but are nevertheless correctly recognized by the filter. Interestingly, this cluster also featured other unknown compounds, which were only found in the highly enriched fractions. As all nodes in this cluster showed a rather high retention, the less polar character of the unsaturated sidechain could be corroborated.

The main MAA cluster 4 (Figs. 6 and 7) was split into parts A-D to facilitate discussion (for parts A and D refer to Online Resource Fig. S12). In this cluster a variety of other MAAs could be annotated by comparison with the database, e.g. shinorine (ID140), porphyra-334 (ID682), mycosporine-methylamine-threonine (ID1183), bostrychines E (ID1817) and F (ID1839), palythine-serine (ID581), and mycosporine-lysine (ID596). According to the automatic fragmentation suggestion indicated on the edges, subsection D contained several glycosylated structures closely related to the MAAs, which is also corroborated by recorded mass ranges of over m/z 470, yet showing a MAA typical fragmentation pattern. ID639 and ID1237 were found to be the bridge nodes between the subsection of glycosylated MAAs. They were featured in a variety of species, but not in the standard mixtures and therefore could not be assigned to any known MAA.

Fig. 6
figure 6

Cluster 4, subsection B. Structures assigned according to library matches and fragmentation patterns, unambiguously dereplicated MAAs are highlighted. Graphical representation according to Fig. 2

Fig. 7
figure 7

Cluster 4, subsection C. Structures assigned according to library matches and fragmentation patterns, unambiguously dereplicated MAAs are highlighted. Graphical representation according to Fig. 2

Overall, it was clearly visible that common MAAs like shinorine (ID140) or the putatively assigned palythine-serine (ID581) were present in many samples and families. Some nodes were only detected in the standard mixtures obtained from highly enriched and pre-purified fractions, otherwise these metabolites might have been undetected due to low concentrations. Nodes in this cluster also exhibiting a positive UV-response and closely related to annotated compounds may resemble new mycosporine-like amino acids that have so far not been described in literature. This applies for example to ID2233, which is specifically found in the extracts of Porphyra spp. (i.e. both Porphyra columbina and P. umbilicalis).

Notably, one cluster was only containing nodes from extracts originating from Bostrychia spp. (Fig. 8). It comprised known metabolites such as bostrychine A and B (ID3411 and ID1796) specific for this genus (Orfanoudaki et al. 2019b), as well as compounds that point to other MAAs (UV positive, similar fragmentation), but have not been isolated yet. One reason could be the sometimes-observed severe overlapping of peaks based on the UV-signal (see Online Resource Fig. S13), which makes their separation very challenging. Yet, by the networking approach using UHPLC-HRMS2 they were found. The Bostrychia cluster solely featured UV-positive nodes, which makes it even more interesting to consider these compounds as potential markers for this genus.

Fig. 8
figure 8

Cluster of features specific for Bostrychia genus. Known structures were assigned as described before, unknowns tentatively identified based on their relation to neighboring dereplicated nodes

Based on the m/z differences of correctly dereplicated neighboring compounds and fragmentation suggestions by the algorithm, the actual structures could be proposed. For example, starting from library hit ID1796 “bostrychine B”, the structure of the neighboring but unknown node ID1825 can be postulated: via the fragment ions m/z 84.0441 (literature 84.0442) and 130.0497 (literature 130.0497) (Zhang et al. 2019) the presence of a glutamine moiety can be deduced. The mass difference of both features (14.0156) pointed towards an additional methyl group in the sidechain of ID1796, which was also indicated as automatically added information on the edge. Consequently, the second amino acid substituent of ID1825 was putatively identified as serine instead of threonine (ID1796). The high similarity between these two nodes is also reflected in several shared fragment ions (e.g. m/z 185.0910, 228.3869, 374.1557). The difference of m/z 17 (418.1810—401.154) refers to a loss of NH3 (Cardozo et al. 2008) in the glutamine sidechain, and is not caused by a cleavage of water from serine/threonine as it was also observed in ID3411 (bostrychine B) and ID1807 (mycosporine glutamine).They both exhibit the aforementioned mass loss having glutamine as singular amino acid substituent. This led to the tentative identification of ID1825.

ID1795, revealing a close relationship to both aforementioned nodes, could be tentatively assigned to a new bostrychine A isomer. The corresponding MS2 spectra were highly similar, sharing nearly all major fragment ions. However, the considerable deviation in retention time (∆RT = 0.26 min) indicated that ID1795 is an undescribed isomer of ID3411. ID1802 was also annotated as possible bostrychine analogue, featuring a decarboxylated glutamine sidechain; a mass difference of 44 supported this suggestion. Even the associated sidechain fragment (m/z 86.0598) was detected in the mass spectrum, along with a signal indicating the loss of 17 m/z, which is typical for glutamine substituted MAAs (see above).

The adjacent node-pairs ID1807/ID3411 and ID1814/ID1802 showed the same indications for a nitrogen oxygen exchange in their structures. This means they possess either a cyclohexenimine or cyclohexenone scaffold, with otherwise same substitution, as for example observed for bostrychine A/mycosporine-glutamine. This is visualized by a direct edge from the oxygen-substituted rings to the corresponding nitrogen-substituted node only. Also, the mass difference of m/z 1, observed in most of the MS fragments of respective compounds, agrees to the difference between carbonyl and imine (for spectra of the discussed compounds see appendix Figs. S14 and S15).

Conclusion

The here presented UHPLC-HRMS2 based approach enabled a comprehensive survey of the composition of crude Rhodophyta extracts with special focus on their MAA content. Feature-based molecular networking showed to be a convenient method to visualize and summarize the extensive amount of data generated. Various informational layers reaching from similarity of features and their respective polarity to tentative fragmentation pathways and the features’ distribution in different species or orders could easily be added and facilitated the identification of both known and novel compounds. As the network allows a relative quantitative statement, also the chances for their isolation can be evaluated in a realistic way.

By utilizing ontology software like MolNetEnhancer, the entire network can be classified according to chemical taxonomy. For algae this offered a holistic view on the profile of structures. Besides, FMBN permitted the tentative identification of not annotated compounds without a corresponding library entry. As relation levels to neighboring nodes, possible fragmentation pathways and UV-absorption are visualized, a sound structural proposal can be made. This is much more meaningful than just evaluating MS2 spectra, lacking these additional information layers. The benefits of the proposed technique could convincingly be shown for the cluster of Bostrychia scorpioides samples, which featured a high number of yet unknown metabolites. In silico structure elucidation tools were applied to identify the molecular formula, structural class or even the entire chemical structure. Well characterized MAAs served as model compounds for these experiments. Naturally, the assignment of novel compounds will only be tentative unless the compound is subsequently isolated. However, the chances of correct annotation seem promising using FBMN.

This study hopefully has raised interest in molecular networking and some of the associated modern tools that are extremely helpful also for phycological research. They are convenient to visualize and evaluate huge data sets produced by modern analytical methods like UHPLC-HRMS2, which barely can be handled by conventional strategies. As exemplarily shown for red algae and MAAs for the first time, this paves way for further phytochemical as well as taxonomic or ecologic research. An additional advantage is the open source nature of respective platforms. By sharing network and spectral data, they grow steadily and facilitate the future dereplication of MAAs for researchers all over the world.