Introduction

Metabolomics is a scientific field dedicated to the comprehensive examination of small molecules and metabolites within organisms. It involves the identification and quantification of these molecules at specific time points within a biological system, using advanced analytical instruments such as mass spectrometry (MS) and nuclear magnetic resonance (NMR). Metabolomics is renowned for its potential to accelerate the discovery of new bioactive compounds [1], and its effectiveness is significantly heightened when it is integrated with statistical methods such as multivariate analysis. This integration not only simplifies data presentation but also substantially enhances the interpretation of the data obtained from these techniques [2].

Multivariate analysis, employed on data sets with multiple variables, effectively discerns patterns, correlations, and structures within the data [3]. Especially in the case of plant metabolite profiling studies using UPLC-QTOF/MS, this analysis helps in pinpointing potential chemical markers based on diverse criteria. These markers, quantifiable characteristics or indicators, are paramount for evaluating plant quality and distinguishing between plant sources from distinct regions [4]. Researchers increasingly use computational methods for the extraction and analysis of chemical information. Molecular networking (MN), a notable computational strategy, substantially improves the visualization and interpretation of MS data. This advancement aids in the identification of molecules and mining of chemical markers [5]. Integrating multivariate analysis with molecular networking presents a promising untargeted approach for the identification of chemical markers.

Epimedium koreanum Nakai (EKN), commonly known as horny goat weed, belongs to the family Berberidaceae; it is indigenous to South Korea and widely found in China and Japan [6, 7]. EKN is a traditional herb with a history of use in functional food, nutraceutical, and pharmaceutical applications. More than 130 secondary metabolites have been analyzed and classified from different Epimedium species [8], including prenyl-flavonoids, lignans, phenol glycosides, phenylethanoid glycosides, sesquiterpenes, acids, alkaloids, xanthones, and aldehydes. Notably, EKN is rich in flavonoids, especially 8-prenyl-flavonoid derivatives [9, 10]. These secondary metabolites of EKN are responsible for various bioactivities, including antimicrobial, antioxidant, anti-mutagenic, immunomodulatory, estrogenic, hypercholesterolemia-regulating, anti-rheumatic, and androgenic activities [11,12,13,14,15,16,17,18,19]. Given the broad usage of EKN, pinpointing its chemical markers is valuable for ensuring the quality and authenticity of its sources. This study analyzes four EKN extracts from distinct South Korean regions to authenticate their chemical constituents and distinguish the regional sources, employing an integrated, untargeted metabolomics approach that combines multivariate analysis and molecular networking.

Methods/experimental

Collection and preparation of samples

Aerial parts of Epimedium koreanum Nakai (EKN) were collected from wild fields located in four different regions of South Korea: [S1] Wando, Jeollanam-do; [S2] Cheorwon, Gangwon-do; [S3] Yongin, Gyeonggi-do; and [S4] Hwacheon, Gangwon-do (Fig. S1). The collected plants were obtained from the Natural Product Central Bank at the Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Korea. The voucher specimens (KPM028-045, PA000855, PA001124, and PA001125) were deposited at the Natural Product Central Bank of KRIBB. The samples were weighed in ten replications for each region. The dried samples (300 mg) were extracted with 10 mL of methanol at room temperature using a sonicator for 1 h, filtered, and evaporated using a rotary evaporator below 40 ℃. This process was repeated three times to obtain the total extract. Each of the four dried materials was extracted ten times separately and then analyzed. The forty resulting extracts (3 mg each) were dissolved in 1 mL of MeOH for UPLC-QTOF/MS analysis.

UPLC-QTOF/MS analysis

EKN extracts were analyzed using a Waters Acquity UPLC system combined with a XEVO-G2 XS QTOF mass detector (Waters, Milford, MA, USA) equipped with an Atlantis T3 C18 column (1.7 μm, 1.2 mm i.d. × 100 mm) operated at 35 ℃, with 0.1% formic acid/water as mobile phase A and 0.1% formic acid/acetonitrile as mobile phase B. The water was purified using Milli-Q Academic (Merck Millipore, Burlington, MA, USA). Acetonitrile and formic acid for UPLC-DAD (Diode Array Detector)-QTOF/MS analysis were purchased from Merck Millipore and Sigma-Aldrich (St. Louis, MO, USA). The sample analysis, which focused on phenolic compounds including flavonoids in the extract, was performed with the following gradient elution: 13% (B) for 0.00–1.00 min, 13–28% (B) for 1.00–7.00 min, 28–36% (B) for 7.00–10.00 min, 36–38% (B) for 10.00–12.00 min, 38–65% (B) for 12.00–16.00 min, 65–100% (B) for 16.00–16.01 min, 100% (B) for 16.01–18.50 min, 100–13% (B) for 18.50–18.51 min, and 13% (B) for 18.51–21.00 min. The flow rate was 0.4 mL/min and the injection volume was 1 μL. In this study, MS analysis was conducted exclusively in negative mode because a wider range of compounds was effectively ionized and detected in this mode, whereas the positive mode did not yield satisfactory detection for these substances (Fig. S2). Data-dependent analysis was performed in negative mode under the following conditions: source temperature, 110 ℃; desolvation temperature, 350 ℃; capillary voltage, 2.3 kV; cone voltage, 40 V; and collision energy ramp, LM 20–40 eV and HM 50–90 eV. Throughout the analysis, leucine enkephalin (m/z 554.2615) was used as the reference mass for mass correction. All collected raw data sets were converted to mzML format using MSConvert 3.0 and then processed with MZmine software version 2.53 [20] to extract molecular features through deconvolution, alignment, and integration, using manually set parameters based on the ion peaks, including m/z, retention time, and relative intensity. The aligned data were used for multivariate analysis and for molecular networking on the GNPS (Global Natural Products Social Molecular Networking) platform.
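For readers who wish to reproduce the solvent program, the gradient described above can be written as a small lookup. The following is a minimal Python sketch, not instrument software: it assumes linear ramps between the stated breakpoints, and the names GRADIENT and percent_b are illustrative only.

```python
# Breakpoints of the gradient program as (time in min, % mobile phase B),
# taken from the elution steps listed in the text.
GRADIENT = [
    (0.00, 13.0), (1.00, 13.0), (7.00, 28.0), (10.00, 36.0), (12.00, 38.0),
    (16.00, 65.0), (16.01, 100.0), (18.50, 100.0), (18.51, 13.0), (21.00, 13.0),
]


def percent_b(t_min):
    """Return %B at time t_min, assuming linear ramps between breakpoints."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-21 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # Linear interpolation within the current segment.
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

Under this assumption, percent_b(4.0) evaluates to 20.5, that is, halfway through the 13–28% ramp.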

Multivariate analysis

The processed data were exported as a CSV file, containing information on ion peaks including m/z, retention time, and relative intensity. The data were labeled for each group along with a series of repetitions, such as S1-1 to S1-10. Before PCA (Principal Component Analysis) and OPLS-DA (Orthogonal Partial Least Squares Discriminant Analysis) using the SIMCA-P 12.0 (Umetrics, Umeå, Sweden), the data file was mean-centered and Pareto-scaled. The visualization of the heatmap analysis and the VIP (Variable Importance in Projection) score plot were created using the web-based platform MetaboAnalyst 5.0 (https://www.metaboanalyst.ca/).
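The pre-treatment step can be illustrated with a short example. The sketch below shows mean-centering followed by Pareto scaling (division of each centered feature by the square root of its standard deviation) in plain Python. It is an illustration only and not the SIMCA-P implementation; the use of the sample standard deviation and the function name pareto_scale are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def pareto_scale(matrix):
    """Mean-center each feature column and divide it by sqrt(sample SD).

    matrix: list of rows (samples), each a list of feature intensities.
    Returns a new list of rows with the same shape.
    """
    scaled_columns = []
    for col in zip(*matrix):
        mu = mean(col)
        sd = stdev(col)  # assumption: sample SD (n - 1 denominator)
        denom = sqrt(sd) if sd > 0 else 1.0  # leave constant features at zero
        scaled_columns.append([(x - mu) / denom for x in col])
    # Transpose back to samples x features.
    return [list(row) for row in zip(*scaled_columns)]
```

Compared with unit-variance scaling, Pareto scaling reduces the dominance of high-intensity peaks while keeping more of the original data structure, which is why it is commonly applied to MS peak tables before PCA and OPLS-DA.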

Molecular networking workflow

The processed MS/MS data were submitted to GNPS web platform to determine molecular networks (MN). Access to the created MN and its specific settings is available through this link: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=1f0eecd27ea04bb7a5d46bcaeefe7979. Parameters for generating the MN included a precursor mass tolerance of m/z 0.1 Da, an MS/MS fragment ion tolerance of m/z 0.5 Da, a minimum cosine similarity score of 0.7, at least 6 matching fragment ions, and a minimum cluster size of 1. Following this, the spectra in the network were compared with the GNPS spectral libraries. Matches between the network’s spectra and the library’s spectra were considered valid if they achieved a cosine similarity score of over 0.7 with a minimum of 6 matched peaks. Visualization of the resultant MN was carried out using the Cytoscape 3.7.0 software. Tentative identification of the components relied on manual analysis of the MS/MS spectral data.
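To make the edge criteria concrete, the sketch below scores two MS/MS spectra with a plain cosine similarity after greedy peak matching and then applies the thresholds stated above (fragment tolerance 0.5 Da, cosine score of at least 0.7, at least 6 matched fragment ions). This is a simplified illustration in Python and not the GNPS implementation, which additionally accounts for precursor mass shifts; the function names cosine_score and is_edge are hypothetical.

```python
from math import sqrt


def cosine_score(spec_a, spec_b, tol=0.5):
    """Greedy cosine score between spectra given as [(mz, intensity), ...].

    Returns (score, number_of_matched_peaks).
    """
    norm_a = sqrt(sum(i * i for _, i in spec_a))
    norm_b = sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    # All candidate peak pairs within tolerance, largest intensity product first.
    pairs = sorted(
        (
            (ia * ib, x, y)
            for x, (ma, ia) in enumerate(spec_a)
            for y, (mb, ib) in enumerate(spec_b)
            if abs(ma - mb) <= tol
        ),
        reverse=True,
    )
    used_a, used_b = set(), set()
    dot, matched = 0.0, 0
    for product, x, y in pairs:
        if x in used_a or y in used_b:
            continue  # each peak may be matched only once
        used_a.add(x)
        used_b.add(y)
        dot += product
        matched += 1
    return dot / (norm_a * norm_b), matched


def is_edge(spec_a, spec_b, min_cos=0.7, min_peaks=6, tol=0.5):
    """Apply the network thresholds from the text to a pair of spectra."""
    score, matched = cosine_score(spec_a, spec_b, tol)
    return score >= min_cos and matched >= min_peaks
```

Two nodes are connected in this sketch only when both conditions hold, so a pair of spectra with a high cosine score but fewer than six shared fragments is not linked.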

Results

UPLC-QTOF/MS analysis

The chemical composition of EKN extracts was investigated using UPLC-QTOF/MS in the negative ESI mode to identify various components. Fig. 1 displays the base peak chromatogram of EKN. Tentative identifications were made by comparing the proposed molecular formulas from accurate molecular mass and MS/MS fragment ions with existing databases and literature references. The MS and MS/MS spectral data for fifty components (1-50) detected in negative ESI mode are outlined in Table 1.

Fig. 1

UPLC fingerprints of Epimedium koreanum: UV absorption at 280 nm and mass spectrometry in negative mode

Table 1 Tentative identification of compounds 1–50 using UPLC-QTOF/MS analysis

Three phenolic compounds (1, 2, and 8) along with forty-seven flavonoid compounds were tentatively identified. The phenolic compounds (1 and 8) were recognized through ion fragments at m/z 191, indicative of quinic acid. Compounds 12, 13, and 14 showed a fragmentation pattern of the quercetin skeleton at m/z 301, consistent with [quercetin–H]⁻. Compounds 3, 4, 18, 20, 21, and 28 exhibited a fragment ion at m/z 353, characteristic of 8-prenylkaempferol. Derivatives of 8-prenylkaempferol, featuring an additional hydroxyl group, were discerned from fragmentation ions at m/z 383 in compounds 22, 24, 25, and 45. Icaritin, distinguished by an additional methoxy group attached to the 8-prenylkaempferol moiety, was identified from ion fragments at m/z 367 in compounds 16, 29–44, and 47–50.
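The diagnostic-ion reasoning above can be summarized as a simple lookup. The following Python sketch is illustrative only: the nominal m/z values and class labels are those given in the text, whereas the matching tolerance of 0.5 Da and the names DIAGNOSTIC_IONS and classify are assumptions, and the tentative identifications in this study relied on manual inspection of the spectra.

```python
# Nominal diagnostic fragment ions (negative mode) and the structural
# feature each one indicates, as described in the text.
DIAGNOSTIC_IONS = {
    191: "quinic acid",
    301: "quercetin skeleton",
    353: "8-prenylkaempferol",
    367: "icaritin",
    383: "8-prenylkaempferol derivative with an additional hydroxyl group",
}


def classify(fragments_mz, tol=0.5):
    """Return the structural features whose diagnostic ion is present.

    fragments_mz: iterable of observed fragment m/z values.
    tol: matching window in Da (an assumed value, not from the study).
    """
    hits = []
    for nominal, label in DIAGNOSTIC_IONS.items():
        if any(abs(mz - nominal) <= tol for mz in fragments_mz):
            hits.append(label)
    return hits
```

For example, a hypothetical fragment list such as [151.0, 300.9] would be assigned to the quercetin skeleton, while a list without any of the five ions returns an empty result and would require manual interpretation.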

Multivariate statistical analysis

To assess the relative variability and identify potential chemical markers among EKN samples from various locations, a multivariate statistical analysis was conducted. Multivariate analyses, including PCA, OPLS-DA, heatmap, and VIP scores, were applied to visualize and pinpoint the chemical constituents correlated with the regional distinctions of EKN samples.

Principal component analysis (PCA)

PCA (Fig. 2) was performed to visualize the clustering patterns among the EKN samples based on regional distinctions and elucidate the metabolites associated with chemical variability. The PCA score plot revealed that PC1 accounted for 43.8% of the variance, while PC2 accounted for 14.3%. The samples were grouped into four distinct clusters (Wando (S1), Cheorwon (S2), Yongin (S3), and Hwacheon (S4)), each representing a different region, with each point within a cluster representing an individual sample. Notably, the S1 and S2 groups exhibited a close correlation, in contrast to the more distinct separations observed with S3 and S4 groups. Furthermore, the vectors representing data clusters for S2 and S3 indicated opposite directions, suggesting a negative correlation between these groups. The S3 group was particularly distinguishable, positioned away from the other three groups in a positive region of the PCA score plot, indicating a distinct profile. The model's goodness of fit (R2X=84.2%) and predictive capability (Q2=56.4%) underscore the model's effectiveness in discriminating between the four groups, each comprising 10 samples (Fig. 2a). Additionally, the PCA loading plot (Fig. 2b) highlighted the specific metabolites responsible for the differentiation among the groups. Metabolites including rhamnocitrin 3-O-glucoside (6), hyperoside (12), ikarisoside B (21), epimedin B (30), epimedin C isomers (32, 33), epimedin L (43), and epimedin K (44), contributed to distinguishing the samples according to their geographical origins.

Fig. 2

Geographical variation analysis of EKN. (a) PCA of EKN from [S1] Wando, [S2] Cheorwon, [S3] Yongin, and [S4] Hwacheon and (b) loading plot showing the data points from mass spectrometry

Heatmap and VIP score analysis

The heatmap plot and Variable Importance in Projection (VIP) score plot analysis (Fig. 3) showed the key metabolites based on relative variables, facilitating the identification of potential chemical markers. Heatmap analysis visualized twelve markers that demonstrated significant differences across four EKN samples. The variation in metabolite intensity was depicted through color depth, where deeper colors indicated more significant intensity variations. A VIP score > 1 indicates a variable's substantial importance within the dataset, as depicted in Fig. 3b. Here, eight metabolites are considered as the most crucial contributors to the overall model. Among these, some variables (15, 19, and 39) demonstrated low intensity, which might affect the accuracy of analysis if employed as chemical markers. Consequently, five compounds (4, 6, 12, 19, and 43) stand out as promising candidate chemical markers to differentiate EKN samples from four distinct locations.
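For reference, the VIP > 1 criterion can be illustrated with the commonly used definition of the VIP score for a PLS-type model, in which each variable's squared normalized weight is averaged over components in proportion to the Y variance each component explains. The Python sketch below is a generic illustration under that assumption and is not the MetaboAnalyst implementation; the function names and input layout are hypothetical.

```python
from math import sqrt


def vip_scores(weights, ssy):
    """Compute VIP scores from PLS weights.

    weights[a][j]: weight of variable j on component a.
    ssy[a]: sum of squares of Y explained by component a.
    """
    n_vars = len(weights[0])
    total = sum(ssy)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, ssy):
            norm_sq = sum(w * w for w in w_a)  # squared length of the weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(sqrt(n_vars * acc / total))
    return scores


def select_markers(names, scores, threshold=1.0):
    """Keep the variables whose VIP score exceeds the threshold."""
    return [name for name, s in zip(names, scores) if s > threshold]
```

Because the squared VIP scores average to one by construction, a threshold of 1 separates variables that contribute more than average to the model from those that contribute less.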

Fig. 3

Comparative metabolomic profiling of EKN from four regions. (a) Heat map analysis of the top twenty annotated peaks and (b) variable importance in projection (VIP) score plot

Orthogonal projections to latent structures discriminant analysis (OPLS-DA)

OPLS-DA (Fig. S3–S8) was performed to discover potential chemical markers for distinguishing between pairs of groups. This process led to the creation of six separate OPLS-DA models for each pairwise comparison.

In the comparison between S1 and S2, the loading S-plot (Fig. S3a) identified compounds 4, 12, 22, 32, and 40 as being significantly distant from the average data. Among these, only compounds 22, 32, and 40, each with a VIP score above one, were considered potential discriminants between the S1 and S2 data series, as shown in Fig. S3. A similar analysis comparing S1 and S3 (Fig. S4) highlighted compounds 6, 32, 33, and 43 for their deviation from the mean, all with VIP scores above one, underscoring their potential as markers differentiating S1 from S3. Further, the S-plot for the comparison between S1 and S4 and the subsequent VIP score analysis pinpointed compounds 6, 12, 22, 33, and 40 as distinct from the average (Fig. S5). The comparison between S2 and S3 identified compounds 6 and 30 as distinguishable markers (Fig. S6). Similarly, the analysis between S2 and S4 (Fig. S7) recognized compounds 6, 30, 43, and 44 as effective discriminators. Lastly, the analysis distinguishing S3 from S4 identified compounds 30, 43, and 44 as potential markers, supported by the loading S-plot and VIP score analysis (Fig. S8). This pairwise approach highlights the key chemical markers for differentiating between groups.

Comparative analysis of selected chemical markers

Following the comprehensive analysis, three metabolites (6, 12, and 43) were selected as chemical markers to distinguish among the four EKN samples, as detailed in Table 2. Fig. 4 presents a bar graph comparing the relative intensities of each compound across regions S1-S4. Notably, compound 6 showed higher intensity in the Cheorwon area (S2) than in other areas, with its lowest detection in Yongin (S3). In contrast, compounds 12 and 43 were more prevalent in Hwacheon (S4), while exhibiting the lowest intensities in Wando (S1). Thus, the relative intensity of these compounds, as measured by mass spectrometry (Fig. S9), serves as a critical metric for differentiating EKN samples across the four locales.

Table 2 Quantitative comparison of peak areas for selected chemical markers
Fig. 4

The bar graph of rhamnocitrin 3-O-glucoside (6), hyperoside (12), and epimedin L (43) with the relative peak area from four regions

Molecular networking analysis

Molecular networking (MN) analysis (Fig. 5) was conducted to elucidate chemical characteristics and trends in metabolites. This analysis grouped flavonoids and phenolic acids according to their chemical characteristics, as presented in Table 3. Notably, 8-prenyl flavonoid glycosides were identified across clusters A, D, F, and H, with clusters A, D, and H comprising multi-glycosides and cluster F featuring a mono-glycoside. The analysis also distinguished 8-prenyl flavonoids without glycosylation (Cluster B), flavonoid glycosides lacking a prenyl group (Cluster C), 8-prenylkaempferol derivatives having a fragment ion peak at m/z 383 (Cluster E), 8-prenyl flavonoid glucuronides (Cluster G), and a 6-prenyl flavonoid (Cluster I). Clusters A, C, D, and H were predominantly found in samples from Hwacheon (S4; yellow), indicating a higher concentration, whereas clusters E, F, and G were more prevalent in samples from Yongin (S3; blue), suggesting elevated levels.

Fig. 5

Key molecular networking map and the identified key components of the EKN extracts using MS/MS data in negative mode. Each node is labeled with its parent mass, and the size of each node represents the relative quantity of the component. Each node is displayed as a pie chart indicating the corresponding samples

Table 3 Compound information clustered via molecular networking analysis based on MS/MS data in negative mode

Discussion

This study conducted a comprehensive analysis using advanced analytical techniques, including UPLC-QTOF/MS, multivariate statistical analysis, and molecular networking, to identify and characterize chemical markers within EKN samples. This approach elucidated the chemical diversity and spatial variability among EKN samples from four distinct regions in South Korea, highlighting the influence of geographical location on the chemical profile of EKN. Through multivariate analyses, namely PCA, OPLS-DA, heatmap, and VIP score analysis, key metabolites were identified, revealing the variability and potential chemical markers among EKN samples from various locations. The discovery of three chemical markers (6, 12, and 43) not only facilitates the authentication of EKN but also enhances the understanding of its chemical variability influenced by geographical factors. Furthermore, the molecular networking analysis provided a detailed visualization of the chemical relationships and classifications of chemical constituents, showing the variation in cluster distribution among samples. This understanding underscores the geographic specificity of the EKN chemical profile.

In conclusion, this study underscores the significance of advanced analytical approaches in the comprehensive chemical profiling of natural products and suggests the potential of identified chemical markers in tracing geographical origins. Looking forward, it suggests avenues for future research to investigate the effects of environmental factors, such as soil and climate conditions, on EKN's chemical composition. Such research promises to deepen our comprehension of how environmental variations affect metabolite profiles, thereby enhancing the traceability and quality evaluation of herbal products.