Introduction

Flavonoids, grouped as isoflavones, flavonols, flavones, flavanones, anthocyanins, etc., are widely distributed in glycosidic forms in most edible plants (vegetables, fruits and seed crops) and have been reported to help prevent human diseases such as inflammation, cancer, diabetes, obesity and neurodegeneration1,2.

Soybeans (Glycine max L.) are an isoflavone-rich source and one of the most important crops, owing to their essential nutrients and the biological effects of dietary soy foods (e.g. soup, tofu, soy sauce, soymilk)3,4. Most previous studies have focused on the isoflavone profile of soybean seeds and its health benefits4. Soybean leaves (SLs), on the other hand, are considered potential by-products because of their abundant flavonols5,6, and young leaves are consumed in traditional fermented foods (Jangajji and Kimchi) in Korea7.

Flavonoids in SL extracts have been studied by mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy in relation to their potential effects on diabetes8,9, cholesterol lowering10,11, atherosclerosis12 and vascular disease13. SLs from black- and yellow-seeded cultivars collected in Korea were reported to contain mainly quercetin/isorhamnetin glycosides (QGs/IGs) and kaempferol glycosides (KGs), respectively, depending on seed coat color, and both extracts showed excellent effects in suppressing hepatic steatosis and promoting insulin secretion, which are associated with high-fat-diet (HFD)-induced obesity and diabetes14,15,16. In addition, in a KGs-rich fraction of SLs from the Japanese unripe-soybean cultivar ‘Jindai’, kaempferol 3-O-(2″-O-glucosyl-6″-O-rhamnosyl)galactoside and kaempferol 3-O-(2″-O-glucosyl-6″-O-rhamnosyl)glucoside were found to be the predominant tri-glycosides and to play an important role in reducing blood glucose in diabetic mice9,17. Another compound purified from the Jindai cultivar, kaempferol 3-O-(2″,6″-di-O-rhamnosyl)galactoside, was also shown to have potent antioxidant and hepatoprotective activities18.

A total of thirteen flavonol glycosides, comprising QGs (3 tri- and 2 di-), KGs (4 tri- and 2 di-) and IGs (2 di-), were confirmed by MS and NMR elucidation in SLs of eight Japanese cultivars; among them, a new tri-glycoside, quercetin 3-O-(4″,6″-di-O-rhamnosyl)galactoside, accounted for the highest proportion in the cultivar ‘Clark’19,20. In Italian cultivars (Emiliana, Kure and Elvir), flavonoid derivatives (12 QGs, 7 KGs and 5 isoflavones) whose glycosylation types and positions remain unclear differed in content (mg/100 g, dry weight) among cultivars and plant parts (seeds, leaves, stems, pods and roots); in young leaves in particular, flavonols (487.3–1586.8) were much more abundant than isoflavones (91.3–124.3)6. Moreover, in a Chinese cultivar, flavonols (KGs) were present only in the SLs, at about six times the content of seed isoflavones5.

Although previous SL studies have identified twenty-eight flavonols (14 KGs, 10 QGs and 4 IGs), nine flavones and sixteen isoflavones, the IGs group14,16,19 comprises only four di-glycosides (the 3-O-rutinoside, 3-O-robinobioside, 3-O-(2″-O-glucosyl)galactoside and 3-O-(2″-O-rhamnosyl)galactoside of isorhamnetin), detected at low levels in SLs of black-coated cultivars. Recently, two tri-IGs, isorhamnetin 3-O-rhamnosylrhamnosylglucoside and 3-O-rhamnosylrhamnosylgalactoside, were characterized in leaves of wild Taiwanese G. max subsp. formosana, but their glycosylation positions have not been determined21. To date, most SL flavonoids have been identified in only a few cultivars, mainly by NMR-based techniques, and detailed quantitative data are also limited. Therefore, comprehensive structural interpretation of SL flavonoids based on MS fragmentation is required, using representative soybean cultivars selected to adequately reflect genetic diversity.

In this study, an LC–MS library was constructed from reported MS and NMR analytical data to carry out comprehensive flavonoid profiling of young leaves from 21 core-collected soybean cultivars. By integrating this LC–MS library with UPLC-DAD-QToF/MS analysis, the aim was to rapidly identify and quantify numerous flavonoid derivatives in SLs, including novel tri-IGs. Ultimately, these detailed profiles should support the breeding of superior varieties expected to have excellent biological activities, and this study suggests that SLs can be considered a valuable edible resource owing to their abundant flavonoids.

Results and discussion

Identification of 83 flavonoid derivatives in soybean leaves

A total of eighty-three flavonoid derivatives, comprising flavonol (55), flavone (9) and isoflavone (19) derivatives with the basic structures shown in Fig. 1A,B, were tentatively identified in young leaves of soybean cultivars by UPLC-DAD-QToF/MS analysis, comparing retention times, UV spectra and MS fragmentation against the previously constructed NMR and LC–MS library (Table S1). These flavonoid derivatives (flavonols and flavones at 350 nm; isoflavones at 254 nm) were well separated in the UPLC-DAD chromatograms (Fig. S1), and their names and MS characteristics are listed by peak number in Table 1. Compared with previous studies in negative ion mode22,23, the positive ionization used here makes it easy to confirm the parent ion from its sodium (Na+, 23 Da), potassium (K+, 39 Da) and proton (H+, 1 Da) adducts, as well as from specific glycosidic losses from the flavonoid structure (e.g. a glucosyl unit, i.e. glucose − H2O, 162 Da).
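The adduct arithmetic in positive mode can be checked mechanically. The sketch below is an illustration, not the authors' workflow: it uses only the nominal adduct masses quoted above, back-calculates the neutral mass M from each observed ion, and accepts a set of ions only if they all imply the same M. The name `neutral_mass` is hypothetical.

```python
# Nominal masses (Da) of the charge-carrying adducts quoted in the text.
ADDUCT_SHIFT = {"[M+H]+": 1, "[M+Na]+": 23, "[M+K]+": 39}


def neutral_mass(observed):
    """Return the common neutral mass M implied by {adduct label: m/z}, or None.

    Each ion is converted back to M by subtracting its adduct mass; the
    assignment is self-consistent only if every ion yields the same M.
    """
    candidates = {mz - ADDUCT_SHIFT[label] for label, mz in observed.items()}
    return candidates.pop() if len(candidates) == 1 else None


# Ions reported for peaks 1 and 2 (quercetin tri-hexosides).
print(neutral_mass({"[M+H]+": 789, "[M+Na]+": 811, "[M+K]+": 827}))
```

For peaks 1 and 2 this returns 788, so m/z 789, 811 and 827 share one neutral mass. With high-resolution data, exact adduct masses and a ppm tolerance would replace the integer comparison.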

Figure 1

Chemical structures of the eighty-three flavonoid derivatives (A, 55 flavonols and 9 flavones; B, 19 isoflavones) identified in young leaves of 21 soybean cultivars. mal, malonyl; api, apiose; gal, galactose; glu, glucose; rham, rhamnose; gen, gentiobiose; rob, robinobiose; rut, rutinose; neo, neohesperidose; sop, sophorose.

Table 1 Characterization of eighty-three flavonoid derivatives from young leaves of 21 soybean cultivars using UPLC-DAD-QToF/MS.

Flavonol derivatives (55)

The fifty-five flavonol glycosides mainly comprised di-glycoside groups, [rham1-gal2, rham1-glu2 (neohesperidose, neo), rham1-gal6 (robinobiose, rob), rham1-glu6 (rutinose, rut): 308 Da] and [glu1-gal2, glu1-glu2 (sophorose, sop), glu1-gal6, glu1-glu6 (gentiobiose, gen): 324 Da], and tri-glycoside groups, [glu1(glu(1))-gal2(6), glu1(glu(1))-glu2(6): 486 Da], [rham1(glu(1))-gal2(6), rham1(glu(1))-glu2(6), glu1(rham(1))-gal2(6), glu1(rham(1))-glu2(6): 470 Da] and [rham1(rham(1))-gal2(6), rham1(rham(1))-gal4(6), rham1(rham(1))-glu2(6): 454 Da], attached to the 3-OH of kaempferol (K, m/z 287), quercetin (Q, m/z 303) and isorhamnetin (I, m/z 317) (Fig. 1A and Table 1). The masses in brackets are those of the sugar moieties, and the aglycone values are [M + H]+ ions.
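The group masses follow from simple residue arithmetic. The sketch below assumes nominal residue masses (hexose 162 Da, rhamnose 146 Da, i.e. free sugar minus water, with glucose and galactose indistinguishable by mass) and rebuilds the 308–486 Da moiety masses together with the expected [M + H]+ for each aglycone. All names are illustrative, not taken from any published library.

```python
# Nominal residue masses (Da) of sugars as bound in a glycoside (free sugar
# minus H2O). Glucose and galactose are both hexoses, so they share one mass.
RESIDUE = {"hex": 162, "rham": 146}

# [M+H]+ of the flavonol aglycones, as given in the text.
AGLYCONE_MH = {"K": 287, "Q": 303, "I": 317}

# Sugar-moiety compositions of the di- and tri-glycoside groups.
GROUPS = {
    "rham-hex": ("rham", "hex"),
    "hex-hex": ("hex", "hex"),
    "rham-rham-hex": ("rham", "rham", "hex"),
    "rham-hex-hex": ("rham", "hex", "hex"),
    "hex-hex-hex": ("hex", "hex", "hex"),
}


def moiety_mass(sugars):
    """Summed residue mass of a sugar chain."""
    return sum(RESIDUE[s] for s in sugars)


def glycoside_mh(aglycone, sugars):
    """Expected [M+H]+ of a 3-O-glycoside: aglycone [M+H]+ plus sugar residues."""
    return AGLYCONE_MH[aglycone] + moiety_mass(sugars)


for name, sugars in GROUPS.items():
    row = {ag: glycoside_mh(ag, sugars) for ag in AGLYCONE_MH}
    print(f"{name:14s} moiety {moiety_mass(sugars)} Da -> {row}")
```

The rham-hex row, for example, gives m/z 595, 611 and 625 for K, Q and I, the values quoted for the rhamnosyl-hexoside di-glycosides.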

The structures of the twenty-three flavonol tri-glycosides (peaks 1–7, 10, 12–17, 21–23, 27, 29, 33, 34, 41 and 42) incorporate the sugar patterns of the 26 di-glycosides (peaks 8, 11, 18, 20, 24–26, 28, 30–32, 35, 37–39, 43, 44, 47, 51–55, 58, 60 and 62). Six di-glycosides ([M + H]+, m/z 627, 611 and 641, based on Q, K and I, respectively) of glu1-gal2 (peaks 8, 26 and 31) and glu1-glu2 (sop, peaks 11, 28 and 32), which were predominant in some cultivars (SLs 7, 10–12, 15 and 21) (Supplementary Fig. S1 and Table 1), showed the fragment ions [M + H-glu]+ and [M + H-glu-gal]+ / [M + H-2glu]+. In particular, ‘Q 3-O-(2″-O-glu)gal’ (peak 8) and ‘Q 3-O-(2″-O-glu)glu’ (Q 3-O-sop, peak 11) were consistent with previous reports14,16, with the galactoside eluting before the glucoside (Rt = 11.16 vs 11.39 min), an order confirmed after NMR elucidation (Supplementary Fig. S1 and Table S1); both are closely related to the corresponding tri-glycosides (peaks 1, 2, 6 and 7). Peaks 26, 28 and 31 were also identified by interpretation of previous LC–MS and NMR results14,15,16,17,24, and peak 32 was tentatively identified on the same basis as ‘I 3-O-(2″-O-glu)glu’ (I 3-O-sop), reported here for the first time in SLs. Peaks 1 and 2 (m/z 789[M + H]+, 811[M + Na]+, 827[M + K]+, based on Q), corresponding to glu1(glu(1))-gal2(6) and glu1(glu(1))-glu2(6), respectively, were tentatively identified as ‘Q 3-O-(2″,6″-di-O-glu)gal’ and ‘Q 3-O-(2″,6″-di-O-glu)glu’ from the fragment ions m/z 627[M + H-glu]+, 465[M + H-2glu]+ and 303[M + H-2glu-gal]+ / [M + H-3glu]+. Peak 4 (m/z 773[M + H]+) was identified as ‘K 3-O-(2″,6″-di-O-glu)glu’, which contains the structure of ‘K 3-O-(2″-O-glu)glu’ (K 3-O-sop, peak 28), and its fragmentation was similar to that of peaks 1 and 2. These three tri-glycosides (peaks 1, 2 and 4) are reported here for the first time in SLs.
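The fragment series for peaks 1 and 2 can be reproduced by subtracting one sugar residue at a time. This is a minimal illustration assuming nominal residue masses; the order of losses is supplied by hand, not inferred, and `loss_ladder` is a hypothetical helper.

```python
# Nominal residue masses (Da) lost as neutral sugars from [M+H]+.
RESIDUE = {"glu": 162, "gal": 162, "rham": 146}


def loss_ladder(precursor_mz, losses):
    """Return [precursor, after 1st loss, after 2nd loss, ...] in order."""
    ions = [precursor_mz]
    for sugar in losses:
        ions.append(ions[-1] - RESIDUE[sugar])
    return ions


# Peak 1, Q 3-O-(2'',6''-di-O-glu)gal: two terminal glucoses, then the galactose.
print(loss_ladder(789, ["glu", "glu", "gal"]))  # [789, 627, 465, 303]
```

The last value is the quercetin aglycone ion (m/z 303), which is what anchors the assignment to Q rather than to another aglycone.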

Peaks 6 and 7 (m/z 773[M + H]+, based on Q), corresponding to glu1(rham(1))-gal2(6) and glu1(rham(1))-glu2(6), respectively, were tentatively identified as ‘Q 3-O-(2″-O-glu-6″-O-rham)gal’ and ‘Q 3-O-(2″-O-glu-6″-O-rham)glu’ from the fragment ions m/z 627[M + H-rham]+, 611[M + H-glu]+, 465[M + H-rham-glu]+ and 303[M + H-rham-glu-gal]+ / [M + H-rham-2glu]+; they were predominant in black-coated cultivars (SLs 8, 12, 17 and 19), including Cheongja 2 (SL3) (Tables 1 and 2). Likewise, ‘K 3-O-(2″-O-glu-6″-O-rham)gal’ (peak 16) and ‘K 3-O-(2″-O-glu-6″-O-rham)glu’ (peak 21), with m/z 757[M + H]+, were abundant in SLs 6, 9 and 14 and in Daewon kong (SL2, yellow), and their deglycosylation patterns were consistent with those of peaks 6 and 714,15,16,17,20,23,25. Notably, the two di-glycosides related to peaks 16 and 21, ‘K 3-O-(2″-O-glu)gal’ (peak 26) and ‘K 3-O-(2″-O-glu)glu’ (K 3-O-sop, peak 28), were detected in large amounts only in SLs 7 and 2114,15,17,24. In addition, the new tri-glycosides peaks 22 and 23 (m/z 787[M + H]+, based on I) were identified as ‘I 3-O-(2″-O-glu-6″-O-rham)gal’ (named soyanin I) and ‘I 3-O-(2″-O-glu-6″-O-rham)glu’ (named soyanin II), found mainly in SLs 13 and 19, respectively.
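Peaks 6 and 7 show both [M + H-rham]+ and [M + H-glu]+, which is consistent with either terminal sugar being cleaved first. A minimal sketch, assuming nominal masses and that terminal sugars are lost before the core sugar, enumerates the resulting ions; `expected_fragments` is a hypothetical helper, not a published tool.

```python
from itertools import combinations

# Nominal residue masses (Da).
RESIDUE = {"glu": 162, "gal": 162, "rham": 146}


def expected_fragments(precursor_mz, terminal, core):
    """Ions from losing any non-empty subset of the terminal sugars, plus the
    aglycone ion left after also losing the core sugar."""
    ions = {}
    for k in range(1, len(terminal) + 1):
        for lost in combinations(terminal, k):
            label = "[M+H-" + "-".join(lost) + "]+"
            ions[label] = precursor_mz - sum(RESIDUE[s] for s in lost)
    everything = list(terminal) + [core]
    label = "[M+H-" + "-".join(everything) + "]+"
    ions[label] = precursor_mz - sum(RESIDUE[s] for s in everything)
    return ions


# Peak 6, Q 3-O-(2''-O-glu-6''-O-rham)gal: terminal rham and glu on a galactose.
for label, mz in expected_fragments(773, ("rham", "glu"), "gal").items():
    print(label, mz)
```

The four printed ions (m/z 627, 611, 465 and 303) match the fragment list given for peaks 6 and 7.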

Twelve glycosides, glu1-gal6 (peaks 18, 30 and 44) / glu1-glu6 (gen, peaks 20, 39 and 51) and rham1(glu(1))-gal2(6) (peaks 3, 10 and 12) / rham1(glu(1))-glu2(6) (peaks 5, 17 and 15), were structurally related and, unlike in other cultivars, were identified together in SL4 (Supplementary Fig. S1). Most of these glycosides were confirmed as new compounds, except for K 3-O-(2″-O-rham-6″-O-glu)gal (peak 10) and K 3-O-(6″-O-glu)glu (K 3-O-gen, peak 39)5,14,23,25. In particular, ‘I 3-O-(2″-O-rham-6″-O-glu)gal’ (peak 12) and ‘I 3-O-(2″-O-rham-6″-O-glu)glu’ (peak 15), with m/z 787[M + H]+, were tentatively determined to be new tri-IGs by interpretation of their mass fragmentation.
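Because peaks 12, 15, 22 and 23 all give m/z 787, it helps to see what the precursor mass alone can tell. The sketch below, assuming nominal residue masses and at most four sugars, lists every hexose/rhamnose count that bridges the gap between precursor and aglycone; `sugar_compositions` is hypothetical.

```python
HEX, RHAM = 162, 146  # nominal residue masses (Da)
AGLYCONE_MH = {"K": 287, "Q": 303, "I": 317}


def sugar_compositions(precursor_mz, aglycone, max_sugars=4):
    """All (n_hexose, n_rhamnose) pairs whose residues account for the mass
    difference between the precursor [M+H]+ and the aglycone [M+H]+."""
    remainder = precursor_mz - AGLYCONE_MH[aglycone]
    hits = []
    for n_hex in range(max_sugars + 1):
        for n_rham in range(max_sugars + 1 - n_hex):
            if n_hex * HEX + n_rham * RHAM == remainder:
                hits.append((n_hex, n_rham))
    return hits


print(sugar_compositions(787, "I"))  # [(2, 1)]
```

All four peaks return the same composition (two hexoses and one rhamnose), so the precursor mass cannot separate these isomers; which sugar occupies the 2″ versus 6″ position, and whether the core is galactose or glucose, must come from fragmentation, elution order and reference data.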

Twelve di-glycosides, rham1-gal6 (rob, peaks 35, 54 and 60)5,14,19,21,23,25 / rham1-glu6 (rut, peaks 38, 58 and 62)5,14,19,21 and rham1-gal2 (peaks 24, 37 and 47)14,16 / rham1-glu2 (neo, peaks 25, 43 and 55), fragmented from their parent ions ([M + H]+, m/z 611, 595 and 625, based on Q, K and I, respectively) to [M + H-rham]+ and [M + H-rham-gal]+ / [M + H-rham-glu]+. The six rham1-gal6 and rham1-glu6 glycosides were evenly distributed in SLs 1 (Shinpaldalkong2ho), 4 and 16, whereas ‘K 3-O-(6″-O-rham)gal’ (K 3-O-rob, biorobin, peak 54) and ‘K 3-O-(6″-O-rham)glu’ (K 3-O-rut, nicotiflorin, peak 58) were detected in large amounts only in SL5 (Supplementary Fig. S1). ‘K 3-O-(2″-O-rham)gal’ (peak 37) and ‘I 3-O-(2″-O-rham)gal’ (peak 47) of rham1-gal2 were confirmed as major constituents, whereas the new rham1-glu2 (neo) glycosides (peaks 25, 43 and 55) were present at low levels in SL18.

Six tri-glycosides ([M + H]+, m/z 757, 741 and 771, based on Q, K and I, respectively), rham1(rham(1))-gal2(6) (peaks 13, 27 and 33) and rham1(rham(1))-gal4(6) (peaks 14, 29 and 34), structurally related to the rham1-gal6 (peaks 35, 54 and 60) and rham1-gal2 (peaks 24, 37 and 47) di-glycosides, fragmented to [M + H-rham]+, [M + H-2rham]+ and [M + H-2rham-gal]+. ‘K 3-O-(2″,6″-di-O-rham)gal’ (peak 27)5,14,15,16,17,23,25, a major compound found mainly in SL5, has been reported to have significant antioxidant and hepatoprotective activities against carbon tetrachloride-induced liver injury in mice20. Peak 42 (rham1(rham(1))-glu2(6)), together with peaks 13, 33 and 34, was newly identified in SLs 1, 3, 4, 8, 12, 13, 16, 17, 19 and 20; among them, ‘I 3-O-(2″,6″-di-O-rham)gal’ (peak 33) was found in large amounts and is closely related to peak 27, the major tri-KG in the Korean representative variety Shinpaldalkong2ho (SL1). Peak 33, ‘I 3-O-(4″,6″-di-O-rham)gal’ (peak 34) and ‘I 3-O-(2″,6″-di-O-rham)glu’ (peak 42) were named soyanins III, IV and V, respectively (Fig. 3 and Supplementary Fig. S2). Two tri-IGs (isorhamnetin 3-O-rhamnosylrhamnosylglucoside and 3-O-rhamnosylrhamnosylgalactoside) were recently characterized in part by LC–MS, UV spectra and hydrolysis in leaves of wild Taiwanese G. max subsp. formosana, but their glycosylation positions were not determined21.
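One pitfall in reading these m/z values: a hexose residue is one oxygen (16 Da) heavier than rhamnose, and quercetin is one oxygen heavier than kaempferol, so a Q rhamnoside and the corresponding K hexoside have the same elemental formula (K 3-O-sophoroside and rutin, for instance, are both C27H30O16). Even accurate mass cannot separate such pairs; the aglycone fragment (m/z 287 vs 303) can. The sketch below, assuming the nominal group masses listed earlier in this section, finds every such collision among the K, Q and I groups; the function name is hypothetical.

```python
from collections import defaultdict

# [M+H]+ of the flavonol aglycones and nominal sugar-moiety masses (Da) from the text.
AGLYCONE_MH = {"K": 287, "Q": 303, "I": 317}
SUGAR_GROUPS = {
    "rham-hex": 308,
    "hex-hex": 324,
    "rham-rham-hex": 454,
    "rham-hex-hex": 470,
    "hex-hex-hex": 486,
}


def isobaric_groups():
    """Map each [M+H]+ shared by more than one aglycone/sugar combination
    to the list of those combinations."""
    by_mz = defaultdict(list)
    for aglycone, ag_mh in AGLYCONE_MH.items():
        for group, group_mass in SUGAR_GROUPS.items():
            by_mz[ag_mh + group_mass].append(f"{aglycone} + {group}")
    return {mz: combos for mz, combos in sorted(by_mz.items()) if len(combos) > 1}


for mz, combos in isobaric_groups().items():
    print(mz, combos)
```

This reports three collisions, at m/z 611, 757 and 773, each pairing a K hexoside with a Q rhamnoside (for example peak 4 versus peaks 6 and 7 at m/z 773); none of the I combinations collides within this set.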

Flavone (9) and isoflavone (19) derivatives

Among the nine flavone derivatives, all identified as minor compounds, seven glycosides (peaks 48, 61, 65, 67, 68, 74 and 75) are glycosylated at the 7-OH of luteolin (L, m/z 287), apigenin (Ap, m/z 271) or chrysoeriol (C, m/z 301) (Fig. 1A and Table 1). Four glycosides (peaks 61/68, 74 and 75; [M + H]+, m/z 535, 519 and 549, based on L, Ap and C, respectively) are malonylated forms of L 7-O-glu (cynaroside, peak 48), Ap 7-O-glu (cosmosiin, peak 65) and C 7-O-glu (thermopsoside, peak 67), whose structures were confirmed by comparison with authentic standards and previous reports14,21,23,26. These new malonylated (mal) glycosides were tentatively identified as ‘L 7-O-(2″-O-mal)glu’ (peak 61), ‘L 7-O-(6″-O-mal)glu’ (peak 68), ‘Ap 7-O-(6″-O-mal)glu’ (peak 74) and ‘C 7-O-(6″-O-mal)glu’ (peak 75), with the key fragment [M + H-mal-glu]+. Peak 68 was consistent with the compound isolated from Korean lettuce samples27.

Of the nineteen isoflavone derivatives (5 aglycones and 14 glycosides), the glycosides carry glucose (162 Da; peaks 9, 19, 49, 56 and 72), malonylglucose (mal-glu, 248 Da; peaks 50, 59, 69–71, 77 and 78) or apiosylglucose (api-glu, 294 Da; peaks 36 and 45) at the 7-OH or 4'-OH of daidzein (D, m/z 255; peak 73), genistein (Gn, m/z 271; peak 79), glycitein (Gy, m/z 285), formononetin (F, m/z 269; peak 82), afromosin (Af, m/z 299; peak 83) or tectorigenin (T, m/z 301; peak 81) (Fig. 1B and Table 1). Among them, eleven isoflavones28,29,30, namely the aglycones (peaks 73 and 79), 7-O-glu (peaks 9, 19 and 49), 7-O-(6-O-mal)glu (peaks 59 and 70), 4'-O-(6-O-mal)glu (peaks 50 and 69), 7-O-(6-O-api)glu (peak 36) and 7-O-(2-O-api)glu (peak 45), have already been reported in seeds of the soybean cultivars used in the present study. In particular, peaks 50 and 69, ‘D 4'-O-(6″-O-mal)glu’ (6″-O-malonylisodaidzin; m/z 503[M + H]+, 255[M + H-mal-glu]+) and ‘Gn 4'-O-(6″-O-mal)glu’ (6″-O-malonylsophoricoside; m/z 519[M + H]+, 271[M + H-mal-glu]+)30, respectively, are newly reported in SLs, and their fragmentation resembled that of peaks 59 (6″-O-malonyldaidzin) and 70 (6″-O-malonylgenistin), which are well known in soybean seeds and leaves (Supplementary Fig. S1, Supplementary Table S1 and Table 1)16,23,25,31,32,33,34,35. Peaks 36 and 45, with the same ions m/z 565[M + H]+, 433[M + H-api]+ and 271[M + H-api-glu]+, were tentatively identified as ‘Gn 7-O-(6″-O-api)glu’ (6″-O-apiosylgenistin) and ‘Gn 7-O-(2″-O-api)glu’ (2″-O-apiosylgenistin), which likewise have not previously been reported in SLs.

Interestingly, eight methoxy-isoflavones, the 7-O-glu (peaks 56 and 72) and 7-O-(6-O-mal)glu (peaks 71, 77 and 78) forms of F, Af and T together with the aglycones themselves ([M + H]+, m/z 269, 299 and 301; peaks 82, 83 and 81, respectively), were produced in the SLs during growth, and their aglycones gave a characteristic fragment ion from methyl (CH3, 15 Da) loss in positive ion mode. Peaks 77 (6″-O-malonylononin), 82 (F) and 83 (Af) have been studied in SLs by NMR and MS14,36,37, whereas two glycosides similar to peaks 72 and 78 were characterized as afromosin O-glucoside and O-malonylglucoside without determination of their glycosylation and malonylation positions38. Nevertheless, peaks 72 and 78 could be assigned as ‘Af 7-O-glu’ (m/z 461[M + H]+, 299[M + H-glu]+, 284[M + H-glu-CH3]+) and ‘Af 7-O-(6″-O-mal)glu’ (m/z 547[M + H]+, 299[M + H-mal-glu]+, 284[M + H-mal-glu-CH3]+) by considering the isoflavone profiles (elution order, UV spectra and QToF-MS data) reported for roots of Medicago truncatula39. In addition, peaks 56, 71 and 81, newly found in SLs, were tentatively identified as ‘T 7-O-glu’ (m/z 463[M + H]+, 301[M + H-glu]+, 286[M + H-glu-CH3]+), ‘T 7-O-(6″-O-mal)glu’ (m/z 549[M + H]+, 301[M + H-mal-glu]+, 286[M + H-mal-glu-CH3]+) and ‘T’ (m/z 301[M + H]+, 286[M + H-CH3]+), respectively, based on reports on Stellaria species (Caryophyllaceae)40; these assignments need to be confirmed by further NMR studies.
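The flavone and isoflavone assignments above rest on a handful of characteristic mass differences. The lookup below is a minimal sketch, assuming nominal masses, that labels each step of a descending ion series; the table and function are hypothetical. Nominal differences are not unique in general (a caffeoyl residue, for instance, also adds 162 Da nominally), so accurate mass and UV data are still needed to confirm an assignment.

```python
# Nominal neutral losses (Da) discussed for the flavone and isoflavone glycosides.
NEUTRAL_LOSS = {
    15: "CH3",
    132: "api",
    146: "rham",
    162: "hex (glu)",
    248: "mal-glu",
    294: "api-glu",
}


def annotate_series(ions):
    """Label each step of a descending ion series by its mass difference."""
    return [
        NEUTRAL_LOSS.get(a - b, f"unassigned ({a - b} Da)")
        for a, b in zip(ions, ions[1:])
    ]


print(annotate_series([547, 299, 284]))  # ['mal-glu', 'CH3']
print(annotate_series([565, 433, 271]))  # ['api', 'hex (glu)']
```

The first series is the one proposed for ‘Af 7-O-(6″-O-mal)glu’ (peak 78) and the second the one shared by the apiosylgenistins (peaks 36 and 45).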

Quantification of 83 flavonoid derivatives in soybean leaves

The contents of the eighty-three flavonoid derivatives are summarized by aglycone and glycoside type in Table 2. The total content (mg/100 g, dry weight) ranged from 342.5 to 992.7 (average 684.9) in young leaves of the 21 soybean cultivars, comprising flavonols (275.1–854.0), flavones (3.6–17.3) and isoflavones (61.2–154.0) (Fig. 2A). These results (mainly flavonols, 83.6%) are consistent with previous reports that leaf flavonols (487.3–2280.0) are much more abundant than seed isoflavones (240.2–445.2) and leaf isoflavones (91.3–124.3)5,6,28. As shown in Fig. 2B,C, the abundant SL flavonols, present primarily as di- (50.4%) and tri-glycosides (44.0%), followed the aglycone order K (79.7–853.5, 57.5%) > Q (1.6–376.3, 23.9%) > I (70.0–243.2, 18.6%), with the predominant aglycone depending on cultivar characteristics. Among the eleven K-rich SLs (from yellow-coated seeds), SLs 2, 5, 6, 7, 9, 14 and 21 consisted of about 100% KGs14,15. In particular, SLs 7 (Kongnamulkong, Korean landrace for bean sprouts) and 21 (Himeyudaga, Japanese breeding line) had the largest proportions of di-glycosides (94.7 and 91.3%, respectively), whereas SLs 6 (Kongnamulkong, Korean landrace for bean sprouts) and 9 (Nongrim 51, Japanese breeding line) are expected to be superior cultivars owing to their higher total flavonols (TFs; 765.6 and 854.0) and tri-glycoside levels (79.8 and 80.1%), respectively. In addition, the Q-rich SL17 (CS 02028, Korean landrace; QGs 48.5%, TFs 776.3) had a much higher proportion of tri-glycosides (65.3%) than SLs 10 (QGs 49.9%, di-glycosides 93.1%) and 15 (QGs 57.7%, di-glycosides 94.6%). Interestingly, despite its low TFs (359.6), SL4 (PI 90763, Chinese landrace) had a flavonol profile composed mostly of new glycosides such as rham1(glu(1))-gal2(6) (peaks 3, 10 and 12) and rham1(glu(1))-glu2(6) (peaks 5, 17 and 15), which differed from those of the Korean cultivars, reflecting its different collection origin. SL18 (GNU-2007-14502, Korean landrace; TFs 378.8) also showed a distinctive profile with a high proportion of mono-glycosides (48.6%; mainly 3-O-glucosides and 3-O-galactosides) (Fig. 2C and Table 2).

Table 2 Contents of flavonoid derivatives according to the aglycones and glycosides in young leaves of 21 soybean cultivars (mg/100 g, dry weight).
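The percentages in the quantification paragraph are shares of summed contents. A minimal sketch of that arithmetic is given below, with the "dominant aglycone" taken simply as the largest share; this argmax rule is an assumption for illustration only, since the text discusses SL17 both as Q-rich and among the I-rich SLs, so the grouping evidently involves more than the single largest share. The function name is hypothetical.

```python
def aglycone_shares(contents):
    """Percent share of each aglycone class and the label with the largest share.

    `contents` maps an aglycone label to its summed glycoside content
    (mg/100 g, dry weight) for one cultivar or for the whole set.
    """
    total = sum(contents.values())
    shares = {label: 100.0 * value / total for label, value in contents.items()}
    return shares, max(shares, key=shares.get)


# The overall mean shares reported in the text (K 57.5%, Q 23.9%, I 18.6%).
shares, dominant = aglycone_shares({"K": 57.5, "Q": 23.9, "I": 18.6})
print(dominant, {k: round(v, 1) for k, v in shares.items()})
```

The same function applies to glycosylation classes (mono-, di- and tri-) by passing those sums instead.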

Figure 2

Comparison of total contents (mg/100 g, dry weight) by (A) flavonoid type (flavonol, flavone and isoflavone), (B) flavonol aglycone (quercetin, kaempferol and isorhamnetin) and (C) flavonol glycoside type (mono-, di- and tri-) in young leaves of 21 soybean cultivars.

Among the I-rich SLs 1, 4, 11, 13, 16, 17 and 19, SL11 (Geomjeongkong-5, Korean landrace; IGs 31.4%, di- 92.7%), with higher TFs (773.5), was exceptional in containing predominantly di-glycosides of glu1-gal2 (peaks 8, 26 and 31), glu1-glu2 (sop, peaks 11, 28 and 32) and rham1-gal2 (peaks 24, 37 and 47), including I 3-O-(2″-O-glu)gal (peak 31, 103.8), I 3-O-(2″-O-glu)glu (peak 32, 65.3) and I 3-O-(2″-O-rham)gal (peak 47, 54.6), which had been reported only at low levels in previous studies14,15,16,17,24. In addition, the new tri-IGs (178.7; soyanins I–V, peaks 22, 23, 33, 34 and 42), closely related to the di-IGs in SL11, were the major IGs in SL19 (Junyeorikong, Korean landrace; TFs 611.8, IGs 39.8%, tri- 74.8%), including I 3-O-(2″-O-glu-6″-O-rham)glu (70.5; soyanin II, peak 23), I 3-O-(2″,6″-di-O-rham)gal (19.1; soyanin III, peak 33) and I 3-O-(4″,6″-di-O-rham)gal (45.9; soyanin IV, peak 34). These IG results should help predict flavonol biosynthesis and support precise NMR-based structure determination and quantification of SL flavonols in further research.

As shown in Fig. 3 and Supplementary Fig. S3, flavonol biosynthetic pathways were predicted from the 52 glycosides (6 mono-, 24 di- and 22 tri-) of K, Q and I found in young leaves of the 21 core-collected soybean cultivars. Cyanidin 3-O-glucoside and peonidin 3-O-glucoside have been reported as the major anthocyanins of black soybean seeds41,42,43. The abundance of QGs and IGs only in black-coated cultivars in this study14,16 suggests that they are closely related to the corresponding cyanidin- and peonidin-based structures, since quercetin and cyanidin share a 3′,4′-dihydroxylated B-ring and isorhamnetin and peonidin a 3′-methoxylated one. SL flavonol profiles may therefore help enhance seed anthocyanin production through regulation of specific genes during growth23,44,45, as well as help select superior varieties expected to have higher biological activities. The present study also summarized the relationship between cultivars and individual flavonoid contents by aglycone and glycoside type, showing that SLs from yellow-coated seeds consist mostly of KGs, whereas SLs from black-coated seeds are rich sources of QGs and IGs (Fig. 2 and Table 2). Future work should apply metabolomics approaches to track how SL flavonols change during leaf growth and fermentation, and should investigate correlations between SL flavonoids and their biological activities as well as agronomic characteristics.

Figure 3

Proposed biosynthetic pathway of 17 isorhamnetin (I) glycosides (mono-, peaks 64 and 66; di-, peaks 31, 32, 44, 47, 51, 55, 60 and 62; tri-, peaks 12, 15, 22, 23, 33, 34 and 42) identified from young leaves of soybean cultivars (SLs 1, 4, 16 and 19). Compound names are presented according to peak numbers in Table 1. gal, galactose; glu, glucose; rham, rhamnose; gen, gentiobiose; rob, robinobiose; rut, rutinose; neo, neohesperidose; sop, sophorose.

Materials and methods

Plant materials

From 23,199 soybean germplasms provided by the Gene Bank of the National Agrobiodiversity Center (NAC, Korea), 1,000 core-collection accessions with superior agronomic and functional traits were chosen. Finally, twenty-one soybean cultivars (varieties, landraces and breeding lines) with specific introduction (IT) numbers, including three Korean representative varieties (Shinpaldalkong2ho, Daewon kong and Cheongja 2), were selected considering their genetic diversity and flavonoid profiles (Table 3). Seeds of these cultivars were sown on 5 June 2019, in rows at a spacing of 15 cm, in an experimental field at the center (latitude/longitude: 35°49′38.37″ N/127°09′07.78″ E) and cultivated under similar conditions during the national cropping season (June–November 2019); leaves (5–10 cm, randomly sampled) were harvested after 4 weeks. The SLs were lyophilized and finely ground with a sample mill for use as analytical samples. In addition, mature seed coat colors were classified as yellow, black or green-black when approximately 95% of pods had reached their ‘mature color’46 on a maturity index. The experimental research and field studies on plant materials in this study comply with relevant institutional, national and international guidelines and legislation.

Table 3 Characteristics of core collected soybean cultivars used in the present study.

Chemical reagents

Reference standards of apigenin, daidzein, daidzin, formononetin, genistein and genistin were obtained from Sigma-Aldrich Co. (St Louis, MO, USA); astragalin, calendoflavoside, cosmosiin, cynaroside, glycitin, hyperoside, isoquercitrin, isorhamnetin 3-O-glucoside, luteolin, narcissin, nicotiflorin and rutin, as well as 6-methoxyluteolin and 6-methoxyflavone as internal standards, were purchased from Extrasynthese (Genay, France); 6″-O-malonyldaidzin and 6″-O-malonylgenistin from Synthose Inc. (Ontario, Canada); kaempferol 3-O-gentiobioside, quercetin 3-O-gentiobioside and quercetin 3-O-sophoroside from PhytoLab GmbH & Co. (Vestenbergsgreuth, Germany); and cacticin and trifolin from MedChemExpress (Monmouth Junction, NJ, USA). LC–MS grade methanol, acetonitrile and water were supplied by Thermo Fisher Scientific Inc. (Waltham, MA, USA). Formic acid (Junsei Chemical, Tokyo, Japan) was used as an additive in the extraction solvent and mobile phases.

Extraction of flavonoid derivatives

The powdered samples (1.0 g) were extracted with mixed solvent (10 mL; methanol:water:formic acid, 50:45:5, v/v/v) for 30 min at 200 rpm on an orbital shaker and then centrifuged at 2016 × g and 4 ℃ for 15 min (LABOGENE 1580R, LABOGENE, Korea). Each supernatant was filtered through a PVDF syringe filter (0.2 µm, Thermo Fisher Scientific Inc., Waltham, MA, USA). The filtrate (0.5 mL) and the internal standard solution (IS, 0.5 mL) were each diluted with distilled water to a final volume of 7 mL. The IS solution (50 µg/mL) contained 6-methoxyluteolin (for flavonols and flavones) and 6-methoxyflavone (for isoflavones) for quantification of the identified flavonoid derivatives. Crude flavonoids were obtained by solid phase extraction (SPE) on a Hypersep C18 SPE cartridge (Thermo Fisher Scientific Inc., Waltham, MA, USA). Briefly, the cartridge was activated with methanol (3 mL) and conditioned with distilled water (5 mL). The diluted extract and IS solutions were then loaded sequentially onto the activated cartridge, which was washed with distilled water (5 mL) to remove impurities. Finally, the retained flavonoids were eluted with 5 mL of methanol containing 1% formic acid. The semi-purified eluate was concentrated under N2 gas and re-dissolved in 0.5 mL of extraction solvent prior to UPLC-DAD-QToF/MS analysis. All analyses were carried out in triplicate.
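The volumes in the extraction procedure fix how much leaf material and IS end up in the final 0.5 mL solution. The sketch below makes that arithmetic explicit under two assumptions not stated in the text: the supernatant volume equals the 10 mL of solvent (ignoring swelling and liquid retained in the pellet), and the SPE step recovers everything loaded. The function and parameter names are hypothetical.

```python
def carried_amounts(sample_g=1.0, solvent_ml=10.0, aliquot_ml=0.5,
                    is_conc_ug_per_ml=50.0, is_volume_ml=0.5, final_ml=0.5):
    """Return (leaf mass equivalent in g, IS mass in ug, IS conc. in ug/mL)
    for the reconstituted solution, assuming full SPE recovery."""
    # Fraction of the extract taken forward, expressed as grams of leaf.
    sample_equiv_g = sample_g * aliquot_ml / solvent_ml
    # The whole IS aliquot is loaded, so its mass is carried through unchanged.
    is_ug = is_conc_ug_per_ml * is_volume_ml
    return sample_equiv_g, is_ug, is_ug / final_ml


equiv, is_ug, is_final = carried_amounts()
print(f"{equiv:.3f} g leaf equivalent, {is_ug:.1f} ug IS, {is_final:.1f} ug/mL IS")
```

With the default values this gives 0.050 g of leaf equivalent and 25 µg of IS, i.e. 50 µg/mL in the reconstituted solution. The intermediate dilution to 7 mL drops out because the whole diluted solution is loaded onto the cartridge.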

UPLC-DAD-QToF/MS analysis

A UPLC-DAD system (ACQUITY UPLC™, Waters Co., Milford, MA, USA) coupled to a QToF/MS (Xevo G2-S QToF, Waters MS Technologies, Manchester, UK) and equipped with a CORTECS T3 C18 column (2.1 × 150 mm, 1.6 μm, Waters Co.) was used to identify and quantify the flavonoid derivatives in young leaves of soybean cultivars. Following our previous reports46,47, the chromatographic conditions were: flow rate, 0.3 mL/min; column oven temperature, 30 °C; injection volume, 1 μL. UV spectra were scanned over 210–400 nm (representative wavelengths: 254 nm for isoflavones and 350 nm for flavonols and flavones). The mobile phases were 0.5% formic acid in water (eluent A) and 0.5% formic acid in acetonitrile (eluent B), with the gradient: 0 min, 5% B; 20 min, 25% B; 25 min, 50% B; 30–32 min, 90% B; 35–40 min, 5% B. Mass spectra were acquired simultaneously over m/z 100–1,200 in positive ion mode using an electrospray ionization (+ ESI) probe with the following parameters: capillary voltage 3.5 kV, sampling cone voltage 40 V, source temperature 120 °C, desolvation temperature 500 °C and desolvation N2 gas flow 1020 L/h. For mass accuracy, a 0.5 mM sodium formate solution was used for external mass calibration, and leucine-enkephalin (2 ng/μL) was introduced through the LockSpray interface at 10 μL/min and monitored in real time as an internal reference (m/z 556.2766).
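The gradient can be read as a piecewise-linear program. The sketch below assumes linear ramps between the listed time points, including the 32–35 min return to 5% B; this is a common default but is not stated in the text. It also ignores system dwell volume, so it gives the programmed composition at the pump rather than at the column. `percent_b` is a hypothetical helper.

```python
# (time in min, %B) breakpoints transcribed from the gradient described above.
GRADIENT = [(0, 5), (20, 25), (25, 50), (30, 90), (32, 90), (35, 5), (40, 5)]


def percent_b(t_min):
    """Programmed %B at time t, assuming linear ramps between breakpoints."""
    if t_min <= GRADIENT[0][0]:
        return float(GRADIENT[0][1])
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last breakpoint the final composition is held.
    return float(GRADIENT[-1][1])


for t in (0, 10, 22.5, 31, 33.5, 38):
    print(f"{t:>5} min: {percent_b(t):.1f}% B")
```

For example, the programmed composition at 10 min is 15% B and at 22.5 min 37.5% B.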

Identification and quantification of flavonoid derivatives

The LC–MS library (from ‘RDA DB 1.0—Flavonoids’ completed in 2016)47 was constructed from literature data with structural evidence from NMR and MS spectroscopy to allow clearer and more efficient identification of flavonoid derivatives in SLs; it contains information on 53 flavonoids, including positive and negative ion fragmentation (Supplementary Table S1). The target flavonoids were tentatively identified from their positive ion fragmentation (reported and proposed), UV spectra (λmax, data not shown) and elution order in the constructed library48 (Table 1). Some of them were further confirmed by comparison with the 25 reference standards indicated in Table 1. However, because standards were not available for all identified derivatives, each peak (by UV detection) was quantified against the IS at a 1:1 ratio, without a relative response factor, and expressed as the mean ± standard deviation of triplicate results (Table 2). To ensure a stable IS, it was verified that the added IS did not overlap with sample peaks, and its recovery was repeatedly validated to correct for errors that may occur during SPE.
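The 1:1 internal-standard calculation can be written out as a formula. This is a hedged sketch, not the authors' code: content is taken as (A_x / A_IS) × m_IS / RRF per gram of leaf equivalent, scaled to 100 g, with RRF = 1 matching the text. For illustration only, the IS mass is taken as 0.025 mg (0.5 mL × 50 µg/mL) and the leaf equivalent as 0.05 g, which assumes the 0.5 mL aliquot came from about 10 mL of extract of 1.0 g; using the sample (n − 1) standard deviation for mean ± SD is also an assumption.

```python
from statistics import mean, stdev


def content_mg_per_100g(area_analyte, area_is, is_mass_mg, sample_equiv_g, rrf=1.0):
    """Internal-standard quantification scaled to mg per 100 g dry leaf.

    amount = (A_x / A_IS) * m_IS / RRF; rrf=1.0 is the 1:1 assumption.
    """
    amount_mg = (area_analyte / area_is) * is_mass_mg / rrf
    return amount_mg / sample_equiv_g * 100.0


def summarize(replicates):
    """Mean and sample standard deviation (n - 1) of replicate contents."""
    return mean(replicates), stdev(replicates)


IS_MASS_MG = 0.025      # assumed: 0.5 mL x 50 ug/mL
SAMPLE_EQUIV_G = 0.05   # assumed: 0.5 mL aliquot of ~10 mL extract from 1.0 g

print(content_mg_per_100g(1.0, 1.0, IS_MASS_MG, SAMPLE_EQUIV_G))
```

Under these assumptions, an analyte peak with the same UV area as the IS corresponds to 50 mg/100 g; passing a measured RRF instead of 1.0 would remove the 1:1 approximation.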

Conclusions

In this study, a total of 83 flavonoid derivatives were comprehensively identified and quantified in young leaves of 21 core-collected soybean cultivars by high-resolution UPLC-DAD-QToF/MS analysis combined with the LC–MS library constructed from previously reported data. The abundant flavonols, present mainly as di- and tri-glycosides, followed the aglycone order K > Q > I, and the predominant aglycone differed according to cultivar characteristics. SLs from yellow-coated seeds consisted mostly of KGs, whereas SLs from black-coated seeds were rich sources of QGs and IGs. Flavonol biosynthetic pathways were proposed for each aglycone (K, Q and I) from the identified derivatives, and these pathways are expected to help determine flavonol structures precisely and predict flavonol biosynthesis. The SL flavonoid profiles can thus contribute to breeding superior varieties with excellent biological activities and support further metabolomics studies of how SL flavonols change during leaf growth and fermentation.