Introduction

Polygala tenuifolia Willd., also known as ‘Yuanzhi’ in Chinese, is a perennial herb in the family Polygalaceae and is widely distributed in China (Jin et al. 2014). In traditional Chinese medicine, P. tenuifolia is generally used as an expectorant and stimulant and is frequently applied in the treatment of forgetfulness, insomnia, and neurasthenia (Zhang et al. 2016). Previous studies on P. tenuifolia also showed that it has protective effects on the central nervous system and confirmed that the aerial parts of P. tenuifolia exhibit significant biological activities in ameliorating learning and memory impairments (Deng et al. 2020; Nguyen et al. 2020; Vinh et al. 2020; Zhao et al. 2020).

The first record of P. tenuifolia as a treatment for amnesia in China appeared in the Compendium of Materia Medica (Cheng 2021). It is an important medicinal herb and is one of the 42 tertiary wild species under national priority protection in China. Shanxi province is the main production area of P. tenuifolia in China, accounting for 70% of the national production and 90% of the national germplasm resources. The cultivation history of P. tenuifolia is short, with little domestication, and breeding work is also lacking. With the accelerating pace of life, the number of people with insomnia and forgetfulness is gradually increasing, and the demand for P. tenuifolia has been growing. However, the lack of good P. tenuifolia cultivars has severely restricted the development of its industry.

Germplasm resources are the basis of plant breeding (Liu et al. 2020; Thudi et al. 2021). The investigation of agronomic and physiological traits of P. tenuifolia allows a comprehensive assessment of its various characteristics, quality and yield. At present, morphological markers are usually the simplest and most effective tools for the identification and evaluation of genetic resources (Chesnokov et al. 2020). Principal component analysis (PCA), correlation analysis and cluster analysis are the most widely used methods in morphological and genetic diversity research of genetic resources. They have been broadly used in soybean (Wang and Komatsu 2018), rice (Bhargavi et al. 2021), millet (Sharma et al. 2018), and tomato (Zhou et al. 2015), among other cultivated plants, whereas few investigations have focused on P. tenuifolia. Hence, we collected 157 germplasm resources of P. tenuifolia from different regions in China, analyzed their genetic diversity, estimated trait correlations, and performed principal component and cluster analyses based on 12 phenotypic traits. Our study provides the material and theoretical bases for the utilization of germplasm resources, parental selection and varietal improvement of P. tenuifolia in China.

Materials and methods

Experimental materials

A total of 157 germplasm resources of P. tenuifolia from Shanxi, Hebei, Inner Mongolia, Henan, and Shaanxi were collected in this study (Table 1). Samples were provided by the Institute of Cash Crops, Shanxi Agricultural University, and purified over three generations (Fig. 1).

Table 1 Statistics of the Polygala tenuifolia germplasm resources collected in this study

Fig. 1 The geographical distribution of the collected Polygala tenuifolia germplasm resources

Overview of experimental base

The experiment was conducted at the experimental base of the Cash Crop Research Institute, Shanxi Agricultural University in Fenyang, China. Fenyang is located on the Loess Plateau, with an average altitude of 1414 m. The climate in Fenyang is temperate monsoon, with an annual average temperature of 9.7 °C, an average ground temperature of 12.6 °C, and annual average precipitation of 467.2 mm. The soil at the experimental base is a brown soil.

Experimental design

Polygala tenuifolia seeds were planted on 25 June 2016. The seeding rate was 3.5 kg/667 m2, each plot measured 3 m × 2 m, and three replications were set for each germplasm resource. Seeds were drilled manually at a depth of 1.5 cm with a row spacing of 25 cm. Chicken manure (350 kg/667 m2) was applied before planting, and no further fertilizer was used during the reproductive period. Weeding was performed 4–5 times a year, and treatments were consistent across all plots. Seeds were harvested from 1 to 7 October 2019.

Measurement indicators

Plant traits, including plant height, leaf shape, leaf color, and flower color, were recorded during the flowering stage in 2019. After harvesting the seeds in late July, three plants were randomly selected from each plot to measure plant height. Root length, root thickness, and fresh weight were measured in October. When the moisture content of the roots had fallen to 20–24%, the bast was extracted manually, and the bast weight and wood core thickness were measured. After baking to a constant weight at 105 °C, the dry weights of the bast and wood core were measured. Qualitative traits were assigned numeric values (Fig. 2). For the leaf shape, thin and long was 1 (width ranging from 1.0 to 1.5 mm, length exceeding 15 mm), thin and short was 2 (width ranging from 1.0 to 1.5 mm, length ranging from 10 to 15 mm), thick and long was 3 (width exceeding 1.5 mm, length ranging from 10 to 20 mm), and thick and short was 4 (width exceeding 1.5 mm, length exceeding 20 mm). For the leaf color, green was 1, gray-green was 2, and tender green was 3. For the flower color, purple was 1, blue-purple was 2, and light purple was 3.
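The scoring scheme for the qualitative traits can be written down as a small lookup table. The following Python sketch is only an illustration of that coding step; the dictionary names, the record layout, and the `encode_traits` helper are hypothetical and do not come from the study, which does not describe how the coding was carried out in practice.

```python
# Numeric codes for the qualitative traits, copied from the scoring rules in the text.
LEAF_SHAPE = {
    "thin and long": 1,
    "thin and short": 2,
    "thick and long": 3,
    "thick and short": 4,
}
LEAF_COLOR = {"green": 1, "gray-green": 2, "tender green": 3}
FLOWER_COLOR = {"purple": 1, "blue-purple": 2, "light purple": 3}


def encode_traits(record):
    """Return the numeric codes for the three qualitative traits of one plant.

    `record` is a hypothetical dict with the keys 'leaf_shape',
    'leaf_color' and 'flower_color' holding the descriptive labels.
    """
    return {
        "leaf_shape": LEAF_SHAPE[record["leaf_shape"]],
        "leaf_color": LEAF_COLOR[record["leaf_color"]],
        "flower_color": FLOWER_COLOR[record["flower_color"]],
    }


# Hypothetical observation, used only to show the coding.
example = {
    "leaf_shape": "thin and short",
    "leaf_color": "gray-green",
    "flower_color": "purple",
}
coded = encode_traits(example)
```

Keeping the codes in one place makes it easy to check that every label used in the field records has a defined value; an unknown label raises a `KeyError` instead of being silently skipped.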

Fig. 2 Comparisons of different features of the collected Polygala tenuifolia germplasm resources

Data analysis

Statistical analyses of the raw data were conducted using Microsoft Excel 2019. Key metrics calculated included the overall mean (X), standard deviation (SD), and coefficient of variation (CV). Additionally, the Shannon–Wiener index (H′) was employed to assess the genetic diversity of the test materials. It was calculated using the formula H′ = −∑(Pi × ln Pi), where Pi is the frequency of the ith variant relative to the total number of variants for a given trait.
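As a rough illustration of these two statistics, the Python sketch below computes a CV and a Shannon–Wiener index for one trait. It is not the authors' Excel workflow. The use of the sample standard deviation (n − 1) is an assumption, the function names and example values are hypothetical, and a quantitative trait would first have to be grouped into classes, a step the text does not specify.

```python
import math
import statistics
from collections import Counter


def coefficient_of_variation(values):
    """CV in percent: standard deviation divided by the mean, times 100."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    return sd / mean * 100


def shannon_wiener(classes):
    """H' = -sum(Pi * ln(Pi)), with Pi the share of observations in class i."""
    counts = Counter(classes)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical data, used only to show the calls.
heights_cm = [24.0, 26.0, 28.0, 30.0, 32.0]
leaf_color_codes = [1, 1, 2, 2]

cv = coefficient_of_variation(heights_cm)
h_index = shannon_wiener(leaf_color_codes)
```

For a trait scored in k classes, H′ can be at most ln(k), which is reached when all classes are equally frequent. A three-class trait such as leaf color therefore cannot exceed ln 3 ≈ 1.10.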

For correlation and cluster analyses, the data were processed using Origin 2022. Principal component analysis was performed with IBM SPSS Statistics 21 to further interpret the data and investigate underlying relationships.

Results

Variation and phenotypic diversities of different P. tenuifolia germplasm resources

As shown in Table 2, the above-ground traits of P. tenuifolia varied widely, with a coefficient of variation (CV) range of 13.5–44.35%. The variation in branching was the greatest at 44.35% and the variation in plant height was the least at 13.5%. The index of genetic diversity ranged from 0.66 to 2.07, with the lowest value for leaf color (0.66) and the highest for plant height (2.07).

Table 2 Variation and genetic diversity analysis of the above-ground traits of Polygala tenuifolia

According to the CV in Table 3, the root traits of P. tenuifolia germplasm resources varied widely, with a range of variation from 15.98 to 57.29%, of which the branching variation was the largest at 57.29% and the fresh root thickness variation was the smallest at 15.98%. The index of genetic diversity ranged from 0.95 to 2.05, with the smallest difference of 0.95 in fresh root weight and the largest difference of 2.05 in wood core thickness (Table 3).

Table 3 Variation and genetic diversity of root traits of Polygala tenuifolia

Correlation analysis of phenotypic traits of different P. tenuifolia germplasm resources

The correlations were strong among root traits but weak among above-ground traits (Table 4). The number of branches was highly significantly and positively correlated with the fresh root weight, fresh root thickness, dry weight of the wood core, and dry weight of root bark, and was significantly and positively correlated with above-ground branches. Fresh root weight was highly significantly correlated with root length, fresh root thickness, dry weight of the wood core, and dry weight of root bark, and was significantly correlated with plant height. Root length was highly significantly and positively correlated with the dry weight of the wood core and the dry weight of root bark. Fresh root thickness was highly significantly and positively correlated with the dry weight of the wood core and the dry weight of root bark. The dry weight of the wood core was significantly and positively correlated with the dry weight of root bark. The dry weight of root bark was significantly correlated with plant height.

Table 4 Correlations among traits of different Polygala tenuifolia germplasm resources
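The pairwise trait correlations reported above can be illustrated with a short Python sketch. The text does not name the coefficient computed in Origin, so the Pearson coefficient used here is an assumption, as are the function names and the example values. The t statistic is the usual test statistic for a Pearson correlation with n − 2 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero; undefined when |r| equals 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical measurements of two traits on five plants.
fresh_root_weight = [1, 2, 3, 4, 5]
root_bark_dry_weight = [2, 1, 4, 3, 5]

r = pearson_r(fresh_root_weight, root_bark_dry_weight)
t = t_statistic(r, len(fresh_root_weight))
```

The t value is compared with the critical value of Student's t distribution to decide whether a correlation is labelled significant or highly significant. With 157 accessions, even fairly small coefficients can pass that threshold, so the size of r matters as much as its significance.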

PCA on phenotypic traits of different P. tenuifolia germplasm resources

Principal component analysis was performed on 13 phenotypic traits of 157 P. tenuifolia germplasm resources using SPSS 24.0 software. The results showed that the cumulative contribution of the first four principal components (PC1, PC2, PC3 and PC4) amounted to 62.63% and contained most of the information of five above-ground traits and eight root traits.

The eigenvalue of PC1 was 4.5414 with a contribution of 34.9341%. In its eigenvector, the traits with high absolute values were fresh root weight (0.4546), fresh root thickness (0.3945), and dry weight of root bark (0.4356), which are closely related to yield. Therefore, PC1 is the yield factor. The eigenvalue of PC2 was 1.4280 with a contribution of 10.9850%. In its eigenvector, the traits with high absolute values were the number of branch roots (0.3355), branches (0.5167), leaf shape (−0.3576), and leaf color (0.4800), so PC2 represents branch and leaf traits. The eigenvalue of PC3 was 1.1209 with a contribution of 8.6225%, and the trait with the highest absolute value was plant height (−0.5309), so PC3 is the plant height factor. The eigenvalue of PC4 was 1.0518 with a contribution of 8.0910%, and the trait with the highest absolute value was wood core thickness (0.7418), so PC4 is the wood core factor (Table 5).

Table 5 Principal components (PC) and their contributions by each trait
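The reported contribution rates are consistent with a PCA on standardized traits, in which each of the 13 traits contributes one unit of variance and a component's contribution is its eigenvalue divided by 13. The Python sketch below only reproduces that arithmetic from the eigenvalues given in the text. The standardization is an inference from the numbers rather than something the text states, and the variable names are illustrative.

```python
# Eigenvalues of the first four principal components, as reported in the text.
eigenvalues = {"PC1": 4.5414, "PC2": 1.4280, "PC3": 1.1209, "PC4": 1.0518}

# Assumption: traits were standardized, so total variance equals the trait count.
N_TRAITS = 13


def contribution_rates(eigs, n_traits):
    """Percentage of total variance explained by each component."""
    return {pc: value / n_traits * 100 for pc, value in eigs.items()}


rates = contribution_rates(eigenvalues, N_TRAITS)
cumulative = sum(rates.values())

for pc, rate in rates.items():
    print(f"{pc}: {rate:.2f}%")
print(f"cumulative: {cumulative:.2f}%")
```

The small differences in the last decimal places relative to the published rates come from the rounding of the eigenvalues to four digits.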

Clustering analysis of phenotypic traits of different P. tenuifolia germplasm resources

Clustering analysis was performed on the 12 phenotypic traits of 157 P. tenuifolia germplasm resources by a systematic clustering method. When the genetic distance was ten, the participating materials were divided into four clusters (Supplementary Figure 1). Meanwhile, the phenotypic traits of the germplasm resources in four clusters were statistically analyzed (Table 6).

Table 6 Mean values and characteristics of phenotypic traits of the four taxa

Taxon I included 54 germplasm resources, of which 28 were from Shanxi, 11 from Shaanxi, 4 from Henan, and 11 from Hebei. The main characteristics of taxon I are heavy (13.46 g) and thick (6.78 mm) fresh roots, more branch roots (3.14), large yields, short and slender leaves, and purple flowers. Taxon I belongs to the high-yielding specific material. Taxon II included 33 germplasm resources, of which 11 were from Shanxi, 8 from Shaanxi, 6 from Henan, and 8 from Hebei. The main characteristics of taxon II are long roots (45.56 cm), green leaves, purple and blue-purple flowers, and high yield potential. Taxon II belongs to the long-rooted specific material. Taxon III included 46 germplasm resources, of which 21 were from Shanxi, 6 from Shaanxi, 2 from Henan, and 17 from Hebei. The main characteristics of taxon III are more branches in the above-ground part, high seed yield potential, slender and short leaves, and purple and blue-purple flowers. Taxon III belongs to the multi-branching specific material. Taxon IV included 24 germplasm resources, of which 6 were from Shanxi, 6 from Shaanxi, 1 from Henan, and 9 from Hebei, together with 2 reviewed varieties. The main characteristics of taxon IV are high plant height (34.23 cm) and blue-purple flowers. Taxon IV belongs to the high plant height specific material.
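The grouping step, merging the most similar materials and cutting the tree at a fixed distance, can be sketched in plain Python. This is a toy illustration, not the procedure run in Origin. Euclidean distance and average linkage are assumptions, because the text only refers to a systematic clustering method, and the function names and data points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two trait vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_at_distance(data, cut):
    """Merge the closest clusters until the next merge would exceed `cut`."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut:
            break  # remaining clusters are further apart than the cut
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical two-trait data for four accessions.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
groups = cluster_at_distance(points, cut=5)
```

In practice the traits would normally be standardized before distances are computed, so that traits measured in large units do not dominate the grouping. The number of groups obtained depends directly on the chosen cut distance.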

Correlation analysis between active ingredients and agronomic traits in P. tenuifolia

Within each taxon of P. tenuifolia germplasm resources, 30% of the germplasm resources were randomly selected for active ingredient determination. Correlation analysis was performed on the active ingredients and phenotypic traits of these germplasm resources. The correlations between active ingredients and phenotypic traits of P. tenuifolia were weak (Table 7). The number of branch roots was significantly correlated with xanthones, and plant height was significantly negatively correlated with 3,6′-disinapoylsucrose. There were significant positive correlations between 3,6′-disinapoylsucrose and the contents of both saponin and xanthones. All root traits were highly significantly correlated with each other. The correlations between above-ground traits and root traits were generally weak. Leaf shape was significantly and positively correlated with fresh root weight, fresh root thickness, and wood core thickness. Branching was significantly and positively correlated with the number of branch roots.

Table 7 Correlations between active ingredients and agronomic traits in Polygala tenuifolia

Discussion

Germplasm resources are the cornerstone of crop improvement, and analysis of the genetic background of germplasm resources is the key to breakthroughs in plant breeding (Yang et al. 2020). In the present study, we evaluated the phenotypic characteristics of 157 P. tenuifolia germplasm resources collected from different regions across China, with the aim of enhancing and harnessing desirable traits in future breeding of P. tenuifolia. Our results revealed substantial variability in both above-ground and root traits, with every CV at or above 13.5%. This level of variation is considerable, aligning with previous studies that interpret a CV greater than 10% as indicative of significant inter-sample variability (Toebe et al. 2018). Genetic assessments of P. tenuifolia have also shown high genetic diversity in its populations (Peng et al. 2018). The germplasm resources of P. tenuifolia involved in our study exhibit relatively rich genetic variation, which is conducive to the screening of high-quality germplasm resources and provides abundant material for further breeding.

A comprehensive correlation analysis was conducted to investigate the relationships between active ingredients, root traits, and above-ground traits in P. tenuifolia germplasm. We observed robust correlations among root traits and among active ingredients, whereas correlations among above-ground traits were relatively weak. Notably, a significant positive correlation emerged between the number of branched roots and xanthones content, while plant height was inversely related to the levels of 3,6′-disinapoylsucrose. Additionally, 3,6′-disinapoylsucrose content displayed a positive correlation with both xanthones and saponin levels. Leaf shape exhibited a significant positive correlation with fresh root weight, fresh root thickness, and wood core thickness, whereas branching was significantly and positively correlated with the number of branched roots. Plant height, number of branched roots, and leaf shape were thus significantly correlated with 3,6′-disinapoylsucrose, xanthones, and overall yield, respectively.

Our results contrast with the findings of previous research (Zhang et al. 2020; Zhang et al. 2017), and the selection of P. tenuifolia germplasm for this study could account for these differences. Traditional selection criteria for P. tenuifolia, which favor attributes such as thicker tubers and fleshier, non-woody roots, were not validated by our analysis: the indicator compounds xanthones, 3,6′-disinapoylsucrose, and saponin did not exhibit a significant correlation with the traditionally valued root traits. This aligns with the outcomes reported by Wang et al. (2016), reinforcing the notion that morphological traits are crucial determinants of both yield and the medicinal quality of P. tenuifolia. Consequently, breeding strategies for P. tenuifolia should not rely solely on conventional wisdom but should incorporate a systematic evaluation of morphological traits in relation to key active ingredients. Our insights provide valuable guidance for the selection and cultivation of superior P. tenuifolia cultivars.

Principal component analysis can simplify the classification of morphological traits by grouping morphological traits into several integrated factors to reflect the original morphological trait variables without losing the original information (Jang 2021). In this study, the PCA results indicated that the first four principal components accounted for 62.63% of the total variation, containing most of the data from five above-ground traits and eight root traits. Four principal components were correlated with each other, indicating that the PCA was instrumental in guiding the selection of both the germplasm materials and the trait indicators for further analysis.

Cluster analysis calculates the degree of similarity between materials from the observed values of their variables and successively groups the most similar materials until all of them are clustered into a single tree. In our analysis of 157 germplasm resources based on 12 phenotypic traits, the germplasm was divided into four distinct taxa at a genetic distance of 10. Each taxon corresponded to a specific material trait: taxon I to high yield, taxon II to long roots, taxon III to multi-branching, and taxon IV to tall plant stature. Additionally, we observed that the roots of P. tenuifolia sourced from Shanxi (Linfen City and Yuncheng City) exhibited greater length and weight compared to those from other regions under study.

The identification of morphological traits and genetic diversity of crop germplasm resources is usually performed using PCA, correlation analysis, and cluster analysis, but most morphological traits are affected by both genetic and environmental factors. Therefore, identification and genetic research of germplasm resources based only on morphological traits has limitations. In the future, cytological, biochemical, and molecular markers should be combined to characterize the genetic diversity of P. tenuifolia more accurately and to provide a theoretical basis for the efficient utilization of P. tenuifolia germplasm resources.