Introduction

Roses, belonging to the genus Rosa of the Rosaceae family, are favored for their vibrant flower colors, captivating fragrances, and versatile applications as ornamental plants. Approximately 30,000 to 35,000 cultivated rose varieties have been bred (Qi et al. 2018), and they are used extensively in landscape architecture, floral arrangements, food, pharmaceutical manufacturing, perfumery, and various other industries. As a result, rose plants and their flowers have considerable cultural and economic value (Raymond et al. 2018). Rose plants are deciduous or evergreen, with erect or climbing stems that often bear prickles. Their predominantly odd-pinnate compound leaves are arranged alternately. The flowers are perfect, possessing both male and female reproductive organs (Basu et al. 2015). The fruits are aggregated achenes, commonly referred to as the ‘rose fruit’ (Zieliński et al. 2010). Based on the botanical taxonomy of wild roses, the genus Rosa has been subdivided into the subgenera Hulthemia, Rosa, Platyrhodon, and Hesperhodos. The most populous subgenus, Rosa, has been further subdivided into 10 sections and 6 subsections (Wissemann 2017). China is widely recognized as the primary source and breeding ground for most rose germplasm resources, which are distributed across various provinces and cities throughout the country. Notably, the southwest region, particularly the Sichuan and Yunnan provinces, serves as a significant hub of rose genetic diversity in China (Li 1994; Tang et al. 2008; Zhao and Zhang 2003; Jian et al. 2013). Various breeding methods are used to improve roses, including traditional cross-breeding, molecular breeding, and others, with cross-breeding being the most prevalent approach (Cheng 2000; Smulders et al. 2019). Rose breeding and the generation of rose varieties can be divided into three distinct periods. The initial period, from prehistoric times to 1875, was characterized by natural interspecific hybridization and the emergence of new varieties solely through natural pollination. The second period, from 1875 to 1967, included the development of a yellow-petaled, continuously flowering variety by Pernet in 1900. The third period, from 1967 to the present day, involves extensive cross-breeding of Chinese, European, and Middle Eastern rose varieties, which served as the genetic foundation for the creation of the ‘modern rose variety’ (De Vries and Dubois 1996; Bendahmane et al. 2013; Lidia and Irina 2009).

Research on genetic diversity provides insights into the stability of traits and the changes that occur during evolution, which lead to distinct groups of genotypes. Such research is important for cultivating and safeguarding germplasm resources (Yanchuk 2001; Glaszmann et al. 2010). Genetic diversity analysis has also unveiled the evolution of traits within the Rosa genus (Meng et al. 2011). Phenotypic traits are the most perceptible manifestation of genetic diversity (Gulsen et al. 2007). Statistical analysis of phenotypic traits is a prevalent approach for assessing plant genetic diversity, as these traits form the foundation for identification, classification, and scientific studies (Silberstein et al. 2003). Observing phenotypes allows us to investigate the molecular mechanisms underlying diverse phenotypic traits in distinct regions and assess the advantages of various hybrid combinations (Singh et al. 2013). However, phenotypic markers have certain limitations, as they are prone to variations, frequently irreversible, that are induced by environmental variables as a result of adaptation to adverse conditions. This drawback is both an opportunity and a barrier to scientific research: while it can hamper the precision of botanical classification, certain variations can be harnessed for breeding purposes (Sensoy et al. 2007). The instability of phenotypic markers can be overcome by employing molecular markers, which are widely used for discerning genetic relationships among species (Ben-Meir and Vainstein 1994). Furthermore, molecular markers can help distinguish between sections within the Rosa genus and reveal genetic similarities (Liorzou et al. 2016; Tan 2017; Yang 2020). Molecular markers have become increasingly prevalent in genetic map construction (Cao et al. 2000) in both human and plant species (Zietkiewicz et al. 1994).

Corolla size is the primary ornamental attribute of roses and strongly influences the quality and economic value of cut roses. It is determined by several factors, including flower diameter, flower height, petal length, petal width, and petal number. Therefore, it is crucial to study the quantitative traits affecting rose flower characters. These traits have been shown to be controlled by major genes, minor polygenes, or a combination of both, the latter known as mixed major gene plus polygene inheritance (Gai et al. 2003). The rose, a perennial woody garden plant, shows a wide range of phenotypes in the F1 generation. In outcrossing plants, the F1 generation can be considered equivalent to the F2 generation of inbred crops and thus serves as a suitable (pseudo-F2) generation for segregation analysis (Yang et al. 2020). This approach has facilitated the effective genetic examination of traits in polyploid outcrossing plants within the Rosaceae family (Calle et al. 2020; Shi et al. 2020; Lau et al. 2022).

During long-term evolution and artificial selection, a wide range of traits has emerged in the germplasm resources of the genus Rosa through the interplay between genetic makeup and external factors. To accelerate the growth of the rose industry, it is imperative to prioritize the generation of novel genetic resources and genetic enhancement. In contrast to previous studies, this study collected and evaluated a larger assemblage of Rosa germplasm resources, totaling 192 genotypes and representing most sections. It also analyzed phenotypic traits of the stem, leaf, and flower more comprehensively than previous studies. This study provides essential information on germplasm resources that can facilitate hybrid breeding, laying the foundation for genetic improvement and germplasm innovation in roses. Moreover, it supports the advancement of molecular-assisted breeding approaches, as the identification of novel marker-trait associations can enhance the effectiveness of rose breeding.

Materials and methods

Plant materials

In this study, a total of 192 Rosa genotypes were collected, comprising 38 wild rose genotypes (W), 83 old garden rose genotypes (C), and 71 modern rose genotypes (M). All plants were cultivated in open-air fields at the ornamental plant agricultural station of Huazhong Agricultural University, located in Wuhan City, Hubei province, China. The plants were more than two years old and received regular water and fertilizer management throughout the period of phenotypic assessment. The Latin names and origins of the collected germplasm resources are presented in Table S1. In addition, two hybrid populations were created in the spring of 2020 by crossing the tetraploid rose cultivar R. hybrida ‘Midnight Blue’ (M10) with R. hybrida ‘Sheherazad’ (M19) and with R. hybrida ‘Couture Rose Tilia’ (M29). Following winter sowing, two F1 populations were obtained in 2021, consisting of 105 F1 generation plants from the M10 × M19 cross and 89 F1 generation plants from the M10 × M29 cross. All materials were cultivated at the ornamental plant agricultural station of Huazhong Agricultural University under standardized maintenance and management practices.

Assessment of plant phenotypic traits

In this study, we evaluated 33 phenotypic traits in 192 genotypes of the genus Rosa. The abbreviations used for these traits are as follows: flowering frequency (FF), overall flower amount (OFA), inflorescence (Inflo), flower color (FC), flower scent (FS), petal number (PN), flower type (FT), calyx tube surface (CS), sepal morphology (SM), petal shape (PeS), state of the front edge of the petal (SFEP), petal velvet (PV), stamen number (SN), pistil status (PiS), anther color (AC), stigma color (SC), style length (SL), style morphology (StM), flower diameter (FD), flower height (FH), petal length (PL), petal width (PW), annual stem color (ASC), stem prickle density (SPD), stem prickle morphology (SPM), number of leaflets (LN), leaf texture (LT), leaf color (LC), leaf edge serration (LES), apical leaflet shape (ALS), leaf tip shape (LTS), stipule shape (SS) and leaf area (LA).

The phenotypic trait classification was based on the guidelines for evaluating distinctness, uniformity, and stability in rose plants (Rosa sp.) and on the botanical classification of the Flora of China, adapted to the specific circumstances of this study. The measurement parameters and assigned values for each trait are listed in Table S2. The fundamental principle is to assign a value of 0 when a trait is not present and then assign values on a progressive basis. FC, AC, SC, ASC, and LC were measured using the Royal Horticultural Society Color Chart (RHSCC). The quantitative traits FD, FH, PL, PW, LN, and LA were measured three times for each rose genotype, and the average values were used for the statistical analyses. The quantitative trait PN was transformed into a categorical variable. The phenotypic distribution of FC, PN, and FT is presented in Fig. 1. Figure S4 displays the front and side views of the flowers of certain germplasms.

Fig. 1

Classification of three major floral phenotypic traits in the studied Rosa sp. genotypes. (a) Petal color classification of the Rosa sp. genotypes: a) White; b) Green; c) Yellow; d) Orange; e) Pink; f) Rose red; g) Red; h) Blue-violet; i) Purple red; j) Variable color; k) Multicolor. (b) Petal number: a) Single; b) Semidouble; c) Double; d) Fully double. (c) Flower type of the Rosa sp. genotypes: a) Flat; b) Cup-shaped; c) Spherical; d) Protruding; e) Altar-shaped; f) Rosette; g) Quartered rosette; h) Pompon; i) Anemone; j) Button eye. Scale bar = 1 cm

Statistical analysis was conducted on PN, FD, FH, PL, PW, ASC, SPD, and LN for a total of 105 M10 × M19 F1 generation plants and 89 M10 × M29 F1 generation plants. The values and measurement standards for each trait are presented in Table S3. The stem and leaf traits were measured in January 2022, while the floral traits were observed and recorded during three distinct flowering periods: from March to May 2022, September to November 2022, and March to May 2023. Flower-trait data were analyzed only for F1 offspring that produced at least three phenotypically stable and similar flowers. Based on this criterion, the final numbers of individuals used for the statistical analysis of flower traits were 51 in the M10 × M19 F1 population and 60 in the M10 × M29 F1 population.

Data analysis

Genetic diversity analysis was performed following a previously described method (Rui et al. 2018). The maximum (Max), minimum (Min), mean (μ), and standard deviation (δ) of the measured values of each quantitative trait were calculated. The coefficient of variation (CV) was calculated according to formula (1).

$$CV=\delta \div \mu \times 100\%$$
(1)

All data calculations were performed using Excel 2021 software (Microsoft, China). Shannon-Weaver diversity index [formula (2)] was used to analyze the rose germplasm diversity (Liu et al. 2009; Tena Gashaw et al. 2016; Yirgu et al. 2022).

$$H'=-\sum_{i} P_{i}\ln P_{i}$$
(2)

In the formula, \(P_i\) is the frequency at which the ith level of a trait occurs.

Since the quantitative traits followed a normal distribution, the four cut points (M − 1.2818S), (M − 0.5246S), (M + 0.5246S), and (M + 1.2818S) were used to divide the values of each trait into 5 grades, so that the probabilities of grades 1 to 5 were approximately 10%, 20%, 40%, 20%, and 10%, respectively (Liu 1996). Here, M indicates the mean and S indicates the standard deviation.
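The calculations in formulas (1) and (2), together with the five-grade classification, can be sketched in a few lines of Python. This is only an illustrative sketch: the study performed these calculations in Excel, the function names are hypothetical, the flower diameter values are made up, and the use of the sample standard deviation is an assumption.

```python
import math
from collections import Counter
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV (%) = standard deviation / mean x 100, as in formula (1).
    Assumption: the sample standard deviation is used."""
    return stdev(values) / mean(values) * 100.0


def grade(value, m, s):
    """Assign grade 1-5 using the cut points M -/+ 1.2818S and M -/+ 0.5246S."""
    cuts = [m - 1.2818 * s, m - 0.5246 * s, m + 0.5246 * s, m + 1.2818 * s]
    # Each cut point that the value reaches raises the grade by one.
    return 1 + sum(value >= c for c in cuts)


def shannon_index(levels):
    """H' = -sum(Pi * ln Pi) over the observed levels, as in formula (2)."""
    n = len(levels)
    return -sum((c / n) * math.log(c / n) for c in Counter(levels).values())


# Made-up flower diameters (cm), for illustration only.
flower_diameter_cm = [3.1, 4.8, 5.5, 6.0, 6.2, 6.9, 7.4, 8.0, 9.3, 11.8]
m, s = mean(flower_diameter_cm), stdev(flower_diameter_cm)
grades = [grade(v, m, s) for v in flower_diameter_cm]
print("CV (%):", round(coefficient_of_variation(flower_diameter_cm), 2))
print("grades:", grades)
print("H':", round(shannon_index(grades), 3))
```

With five grades, H' cannot exceed ln 5 (about 1.61), which gives a useful upper bound when reading diversity indices for the graded quantitative traits.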

Cluster analysis

Cluster analysis of the phenotypic trait and molecular marker data was performed using the NTSYSpc 2.10 software. First, the statistical data of the 192 Rosa genotypes were converted into a format compatible with the software, and the experimental data were normalized. Then, the genetic distances between genotypes were calculated, and groups were assigned according to genetic distance. The Origin 2021 software was used to plot frequency histograms to analyze the variation of each numerical trait in the two hybrid populations.

The data points from the qualitative traits were marked as 1 and 0 according to their presence or absence in the assessed genotypes (Liu et al. 2016), and the grading criteria were the same as those used in calculating the genetic diversity coefficient. The binary 1/0 matrix was imported into NTSYSpc 2.10 software to calculate the genetic similarity coefficient (GS) matrix by routine DICE of SIMQUAL (Dice 1945). The GS matrix was converted to a genetic distance (GD) matrix in Excel 2021 software (GD=1-GS), and the resulting distance matrix was equivalent to Nei and Li's genetic distance (GDNL) matrix (Nei and Li 1979; Soleimani et al. 2002). MEGA11 was used to draw the clustering analysis diagram of the GDNL matrix by using an unweighted pair-group method with arithmetic mean (UPGMA).
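The conversion from a binary 1/0 matrix to Dice similarity and then to genetic distance (GD = 1 − GS) can be illustrated with a short sketch. It is not the NTSYSpc implementation: the function names and the 0/1 profiles are hypothetical, and the genotype code W01 is used purely as a placeholder label.

```python
def dice_similarity(a, b):
    """Dice coefficient 2*n11 / (n_a + n_b) for two equal-length 0/1 vectors."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    # Assumption: two all-zero profiles are given a similarity of 0.
    return 2.0 * n11 / total if total else 0.0


def distance_matrix(profiles):
    """Return GD = 1 - GS for every pair of genotypes as a nested dict."""
    names = list(profiles)
    return {
        p: {
            q: 0.0 if p == q else 1.0 - dice_similarity(profiles[p], profiles[q])
            for q in names
        }
        for p in names
    }


# Hypothetical presence/absence profiles (not data from the study).
profiles = {
    "W01": [1, 0, 1, 1, 0],
    "C05": [1, 1, 1, 0, 0],
    "M10": [0, 1, 0, 0, 1],
}
gd = distance_matrix(profiles)
for name, row in gd.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

The resulting distance matrix is the kind of input that a UPGMA routine, such as the one in MEGA11 used here, takes to build the dendrogram.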

DNA extraction

A modified CTAB method was used to extract genomic DNA from the 192 Rosa genotypes. The specific steps were as follows: (1) 35 new leaves were collected from the test material, placed into tubes with steel balls for grinding, and quickly placed into liquid nitrogen. The samples were fully ground in the grinding equipment before DNA isolation. (2) CTAB was mixed with mercaptoethanol at a ratio of 50:1, and the mixture was preheated in a water bath at 65℃ for 10 min. (3) 500 μL of the preheated solution was added to each tube containing the ground samples. After mixing evenly, the tubes were incubated in a water bath at 65℃ for 30 min, and the samples were then removed and cooled to room temperature. (4) An equal volume of chloroform:isoamyl alcohol (24:1) solution was added, and the samples were mixed thoroughly and left to stand for 10 min. (5) The samples were then centrifuged at 10000 r/min for 10 min. (6) Approximately 500 μL of supernatant was transferred into a new centrifuge tube. (7) Steps 4 to 6 were repeated two or three times. (8) Anhydrous ethanol was added to the centrifuge tubes. (9) The samples were then centrifuged at 10000 r/min for 10 min. (10) The supernatant was removed, leaving a white precipitate. Then, 1 mL of 75% ethanol was added to each sample tube, and the tubes were centrifuged for 10 min at 10000 r/min. (11) The ethanol in the sample tube was evaporated, leaving a white precipitate. (12) The precipitated DNA was dissolved in ddH2O preheated to 37℃.

After DNA extraction, 1% agarose gel electrophoresis and a NanoDrop One spectrophotometer (Thermo Fisher Scientific, USA) were used to assess DNA quality and concentration. Each sample was then diluted to a concentration of 20–50 ng/μL for PCR amplification.

SSR analysis

We selected 10 SSR molecular markers located on different rose chromosomes to assess DNA polymorphisms among the genotypes (Table 1) (Zhang et al. 2006; Hibrand Saint-Oyant et al. 2008). Genotyping of the 192 Rosa genotypes with all 10 pairs of SSR primers was carried out by a commercial biotechnology company, and the amplification products were examined using the QIAxcel Advanced automatic capillary electrophoresis apparatus.

Table 1 SSR primer information

SSR genotyping and clustering

The SSR molecular marker amplification results were recorded using a 1/0 matrix. The band presence or absence at the same migration location was represented by 1 and 0, respectively (Soleimani et al. 2002). The methods of generating the GS matrix and GDNL matrix and constructing the cluster map were consistent with those of phenotypic trait clustering.

Correlation and principal component analysis

Correlation analysis and principal component analysis (PCA) were performed using the IBM SPSS Statistics 27 software, and Origin 2021 was used to plot the correlation heatmap and the 3D PCA loading plot. The analysis was performed according to previously published methods (Jin et al. 2021; Yaghini et al. 2013). Specifically, trait coefficients were calculated from the eigenvalues of the principal components and the principal component loading values of 29 traits, and were used to construct the component score equations F1 to F6. Each trait coefficient was obtained by dividing the principal component loading value by the square root of the corresponding eigenvalue. Using the contribution rates of PC1 to PC6 as weights, a comprehensive score formula, denoted as F, was constructed: \({\text{F}}=0.177\times {\text{F}}1+0.126\times {\text{F}}2+0.096\times {\text{F}}3+0.068\times {\text{F}}4+0.065\times {\text{F}}5+0.061\times {\text{F}}6\). The comprehensive score of each of the 192 Rosa genotypes was obtained after conversion.
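The scoring procedure described above can be sketched as follows. The weights are the contribution rates quoted in the formula for F. Everything else is assumed for illustration: the function names, the loadings, the eigenvalue, and the standardized trait values are made up, and the study itself carried out these steps in SPSS.

```python
import math

# Contribution rates of PC1-PC6, taken from the comprehensive score formula.
WEIGHTS = [0.177, 0.126, 0.096, 0.068, 0.065, 0.061]


def trait_coefficients(loadings, eigenvalue):
    """Trait coefficient = loading / sqrt(eigenvalue of that component)."""
    root = math.sqrt(eigenvalue)
    return [loading / root for loading in loadings]


def component_score(z_values, loadings, eigenvalue):
    """Score of one genotype on one component from its standardized traits."""
    coefs = trait_coefficients(loadings, eigenvalue)
    return sum(c * z for c, z in zip(coefs, z_values))


def comprehensive_score(component_scores, weights=WEIGHTS):
    """F = sum of weight_k * F_k over the retained components."""
    if len(component_scores) != len(weights):
        raise ValueError("one score per retained component is required")
    return sum(w * f for w, f in zip(weights, component_scores))


# Made-up example: two standardized traits and one component.
f1 = component_score([1.0, -1.0], loadings=[0.8, 0.4], eigenvalue=4.0)
print("F1 for the toy genotype:", round(f1, 3))
# Made-up scores on the six components for one genotype.
print("F:", round(comprehensive_score([f1, 0.5, -0.3, 0.1, 0.0, 0.2]), 4))
```

Because the weights are contribution rates, components that explain more variance (PC1 in particular) dominate the ranking of genotypes.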

Segregation analysis

Segregation analysis of the quantitative traits was performed with SEA v2.0.1 based on a mixed major gene-polygene inheritance model (Wang et al. 2022). The model was evaluated using the single-generation segregation analysis method for plant quantitative traits. The maximum likelihood value (MLV) and Akaike's information criterion (AIC) of 11 genetic models were obtained by fitting each model to the phenotype frequency distribution of the F1 population (pseudo F2). The goodness-of-fit of these models was then tested. Equal distribution (\(U_1^2\), \(U_2^2\), and \(U_3^2\)), Smirnov (\(_nW^2\)), and Kolmogorov (\(D_n\)) tests were conducted to identify the optimal model. When selecting the optimal model, one or several models were first selected as candidates according to the minimum AIC principle. Subsequently, considering the results of the goodness-of-fit tests, the candidate model with the fewest test statistics reaching the significance level (P<0.05) was selected as the optimal genetic model for the trait (Wang et al. 2023).
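The two-stage selection rule described above (minimum AIC first, then the fewest significant goodness-of-fit statistics) can be sketched as follows. This is not the SEA software: the class and function names, the model labels, the AIC values, and the P values are all hypothetical, and the AIC window used to shortlist "one or several" candidates is an assumption, since the text does not state a threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelFit:
    name: str
    aic: float
    p_values: List[float]  # P values of the five goodness-of-fit statistics


def n_significant(fit, alpha=0.05):
    """Count goodness-of-fit statistics that reach significance (P < alpha)."""
    return sum(p < alpha for p in fit.p_values)


def select_model(fits, aic_window=2.0):
    """Shortlist models close to the minimum AIC, then pick the candidate
    with the fewest significant tests (ties broken by the lower AIC).
    Assumption: 'close' means within aic_window of the minimum AIC."""
    best_aic = min(f.aic for f in fits)
    candidates = [f for f in fits if f.aic - best_aic <= aic_window]
    return min(candidates, key=lambda f: (n_significant(f), f.aic))


# Hypothetical fits for one trait (not results from the study).
fits = [
    ModelFit("model_A", 400.0, [0.01, 0.20, 0.30, 0.40, 0.50]),
    ModelFit("model_B", 401.2, [0.12, 0.25, 0.33, 0.41, 0.60]),
    ModelFit("model_C", 420.0, [0.30, 0.45, 0.50, 0.70, 0.80]),
]
print(select_model(fits).name)
```

In this toy example model_C is excluded by its high AIC, and model_B is preferred over model_A because none of its goodness-of-fit statistics is significant.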

Results

Genetic diversity and variation analysis based on phenotypic traits

The frequency distributions of the six quantitative traits closely followed the normal distribution. Following the methodology proposed by Liu et al. (1996), the six quantitative traits were therefore classified into five classes using four reference points. In this way, the quantitative trait distributions were transformed into qualitative trait distributions, enabling the calculation of the genetic diversity coefficient. The coefficient of variation (CV) for the six quantitative traits ranged from 28.42% to 80.21%, whereas the H' index ranged between 1.05 and 1.46 (Fig. 2a, Table S4). These findings suggest a substantial amount of variation in the genus Rosa. Notably, four floral traits (FD, FH, PL, and PW) showed similar levels of genetic diversity, suggesting a potential genetic relationship among them. Additionally, the coefficient of variation for LA was the highest (80.21%), with values ranging from 6.28 to 131.89 cm², indicating substantial variation in leaf area among the rose genotypes assessed.

Fig. 2

Genetic diversity and variation analysis of the Rosa sp. genotypes. (a) Bar chart of the diversity index for quantitative traits. (b) Bar chart of the diversity index for qualitative traits. (c) The inflorescence types of wild roses, old garden roses, and modern roses. (d) The flower-color types of wild roses, old garden roses, and modern roses. (e) The petal-number types of wild roses, old garden roses, and modern roses. (f) The flower types of wild roses, old garden roses, and modern roses. FF, flowering frequency; OFA, overall flower amount; Inflo, inflorescence; FC, flower color; FS, flower scent; PN, petal number; FT, flower type; CS, calyx tube surface; SM, sepal morphology; PeS, petal shape; SFEP, state of the front edge of petal; PV, petal velvet; SN, stamen number; PiS, pistil status; AC, anther color; SC, stigma color; SL, style length; StM, style morphology; FD, flower diameter; FH, flower height; PL, petal length; PW, petal width; ASC, annual stem color; SPD, stem prickle density; SPM, stem prickle morphology; LN, number of leaflets; LT, leaf texture; LC, leaf color; LES, leaf edge serration; ALS, apical leaflet shape; LTS, leaf tip shape; SS, stipule shape; LA, leaf area

The H' values of the 27 qualitative traits in the Rosa germplasms ranged from 0.14 to 1.91 (Fig. 2b, Table S5), suggesting high genetic diversity. Among these traits, FC presented the highest H' value, indicating broad variation in flower color. This can be attributed to the global distribution of Rosa germplasm resources and to the influence of diverse geographical environments and artificial selection, which have contributed to significant differentiation in flower color. Conversely, StM exhibited the lowest H' value, suggesting relatively low variability in style morphology among the different rose varieties.

The flower is widely recognized as the most significant organ in ornamental plants (Wang et al. 2010), and its ornamental value is primarily determined by its floral traits (Nybom 2009; Datta 2018; Hibrand Saint-Oyant et al. 2018). In this study, we analyzed the variation of Inflo, FC, PN, and FT within the 192 Rosa sp. genotypes. Our findings revealed that Inflo could be categorized into five distinct types. Among the single-flower plants, modern roses constituted the majority, followed by old garden roses, with wild roses being the least represented (Fig. 2c, Table S6).

The classification of FC encompassed eleven distinct types (Fig. 1a), effectively including the majority of the variability in flower color. Modern roses presented the highest diversity, comprising ten color types. In contrast, wild roses presented a relatively limited range, primarily pink and white hues. On the other hand, old garden roses presented greater variability in FC, with a predominant presence of pink and purplish red shades and a lower representation of yellow (Fig. 2d and Table S7). Notably, the degree of flower color variation progressively increased from wild roses to old garden roses and ultimately to modern roses (Fig. 2c–f).

PN was classified into four distinct types, as depicted in Fig. 1b. Among these types, 116 genotypes were fully double, with old garden and modern roses accounting for nearly half of the studied population. A gradual increase in PN from wild to modern roses was evident, resulting in a greater abundance of double-flower genotypes (Fig. 2e, Table S8). Similarly, FT could be categorized into ten types, as illustrated in Fig. 1c. The old garden roses encompassed six distinct flower types and the modern roses eight, whereas only four flower types were found in wild roses (Fig. 2f and Table S9). These observed changes in PN and FT variability align with the evolutionary trajectory and the breeding history of roses.

Correlation analysis among traits

The degree and significance of the correlations between phenotypic traits were assessed using the Pearson correlation coefficient and the respective p-values. Among the 19 phenotypic traits analyzed, the Pearson correlation coefficients ranged from −0.35 to 0.89, as depicted in Fig. 3. Notably, PW exhibited a positive correlation with PL at 0.89, the highest correlation coefficient among the traits. Conversely, the strongest negative correlation (−0.35) was observed between LC and LT. This study primarily investigated the relationships among the floral traits PL, PW, FD, and FH. These traits, which directly determine flower size, exhibited highly significant positive correlations with one another (Fig. 3). This strong positive correlation may be attributed to the co-inheritance of closely linked genes, as suggested by Debener and Linde (2009) and Shupert et al. (2007). Notably, a negative correlation between PN and SN was also observed.
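For readers unfamiliar with the statistic, the Pearson correlation coefficient and the t statistic on which its p-value is based can be computed as in the sketch below. The function names and the petal measurements are made up for illustration, and the study itself used SPSS. The p-value step is omitted because it requires the t distribution with n − 2 degrees of freedom, which the Python standard library does not provide.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up petal length and petal width values (cm), for illustration only.
petal_length = [1.0, 2.0, 3.0, 4.0, 5.0]
petal_width = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(petal_length, petal_width)
print("r =", round(r, 3), "t =", round(t_statistic(r, len(petal_length)), 3))
```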

Fig. 3

Correlation heat maps between 19 phenotypic traits among the 192 Rosa sp. genotypes. Red represents positive correlations, and blue represents negative correlations. The darker color represents a greater absolute value of the correlation coefficient. * P<0.05, and ** P<0.01. FF, flowering frequency; OFA, overall flower amount; Inflo, inflorescence; FC, flower color; FS, flower scent; PN, petal number; FT, flower type; CS, calyx tube surface; SM, sepal morphology; PeS, petal shape; SFEP, state of the front edge of petal; PV, petal velvet; SN, stamen number; PiS, pistil status; AC, anther color; SC, stigma color; SL, style length; StM, style morphology; FD, flower diameter; FH, flower height; PL, petal length; PW, petal width; ASC, annual stem color; SPD, stem prickle density; SPM, stem prickle morphology; LN, number of leaflets; LT, leaf texture; LC, leaf color; LES, leaf edge serration; ALS, apical leaflet shape; LTS, leaf tip shape; SS, stipule shape; LA, leaf area

Principal component analysis of phenotypic traits

The genus Rosa exhibits a high degree of diversity across numerous plant traits, including flower color, flower types, plant types, leaf types, and more. In this study, we collected data on 33 traits from a substantial sample of 192 Rosa sp. genotypes. This dataset is considered extensive. To simplify the datasets obtained and facilitate subsequent genotype clustering or classification for pattern identification, a principal component analysis (PCA) was conducted. PCA serves as a valuable initial analytical and clustering method by reducing the dimensionality of the complete dataset (Ringer 2008; Yaghini et al. 2013). Consequently, a three-dimensional scatter plot was generated using the data from 19 phenotypic traits in the 192 Rosa sp. genotypes, allowing for the visualization of relationships between different germplasm types (Fig. 4).

Fig. 4

PCA plot of 192 Rosa sp. genotypes based on variability on 19 phenotypic traits. PCs were calculated with one mean value per variety. Percentages in brackets represent the variance explained by each principal component. The dots of different colors represent the different germplasm groups. C, old garden roses; M, modern roses; W, wild roses. The circles correspond to the 95% confidence ellipse for each germplasm group. The arrows represent the Eigenvectors corresponding to the 19 traits

The top six components had eigenvalues greater than 1, and the scree plot showed a substantial decline in slope after them (Fig. S1). The first three principal components accounted for 40.0% of the observed phenotypic variation. Principal component one (PC1) (17.7%) predominantly differentiated the genotypes by flower size attributes, including FD, FH, PL, PW, and LA. PC2 (12.6%) primarily differentiated the genotypes by stem and leaf attributes, such as LC, LT, LN, SPD, ASC, and FF. PC3 (9.6%) primarily differentiated the genotypes by the color and number of flower organs, such as SN, AC, and PN (Table S12).

The old garden and modern roses overlapped almost completely (Fig. 4). Modern roses have predominantly derived from old garden roses, resulting in highly similar phenotypes. Conversely, most wild roses did not overlap with either of the two other germplasm groups, and their phenotypes differed significantly. The loading scores (Fig. S2) revealed that the five traits FD, FH, PL, PW, and LA were predominantly responsible for the variation across PC1. Their loading scores were consistent with the distribution direction observed for old garden and modern roses. Similarly, the three traits SPD, ASC, and FF, predominantly responsible for the variation across PC2, presented values reflecting the distribution pattern of old garden roses and modern roses. Among the traits responsible for the variation across PC3, SN presented loading scores reflecting the distribution of wild roses, while PN aligned with the distribution observed for old garden roses and modern roses. This demonstrated that wild roses exhibited notable variation in flower size, number of flower organs, flowering frequency, number of leaflets, annual stem color, and stem prickle density, enabling their differentiation based on these traits.

Comprehensive evaluation based on phenotypic traits

The correlation between the F score and the desirable trait values was positive. The composite scores displayed significant variability, ranging from -1.175 to 1.464 (Table S13). Among these scores, the top five materials (R. chinensis ‘Zihongxiang’, R. hybrida ‘Burgundy Iceberg’, R. hybrida ‘Conrad F. Meyer’, R. rugosa ‘Gaohong’, R. floribunda ‘Sheherazad’) demonstrated exceptional flower morphological traits and desirable flower types, making them ideal candidates for breeding purposes.

Furthermore, a total of 15 traits were significantly correlated with the overall F score, with correlation coefficients ranging from -0.172 to 0.755 (Fig. S3). Specifically, a positive correlation was observed between thirteen attributes (PL, PW, FD, LA, FS, SPD, ASC, FH, LT, PV, SN, SC, and AC) and the overall F score. Notably, the correlation coefficient between PL and PW exceeded 0.7, indicating a tight interrelationship. Consequently, based on our findings, flower size, annual stem color, leaf area, and leaf texture significantly influenced the overall phenotypic characteristics of roses.

Cluster analysis using phenotypic trait markers

The breeding of the first hybrid tea rose, R. hybrida ‘La France’, in 1867 marked the initiation of a prosperous era in modern garden rose breeding. Through the combined assessment of phenotypic and molecular markers, we found that wild roses, old garden roses, and modern roses are closely related regarding both phenotypic and molecular aspects as they were integrally exploited and utilized during the modern rose breeding history.

A cluster analysis of all genotypes based on 33 phenotypic traits yielded genetic distances ranging from 0.15 to 0.91. The 192 genotypes were categorized into seven groups, denoted as Group I to Group VII, based on a genetic distance of 0.29 (Fig. 5). Group I comprises only 3 genotypes, characterized by predominantly simple flowers, a strong flower scent, full double flowers, and a limited number of stamens. Group II consists of 2 genotypes, primarily presenting a single flowering period per year, a low overall flower count, cup-shaped flowers, wavy or pleated petal tips, and fewer prickles on the stem. Group III encompasses 13 genotypes, predominantly presenting full double flowers with few stamens and long petals. Group IV consists of 32 genotypes, predominantly comprising wild roses and old garden roses, with only one modern rose genotype. These materials primarily present a single flowering period per year with heart-shaped petals. Group V comprises 15 genotypes, most characterized by a large overall flower number, a smooth surface of the calyx tube, yellow anthers, and a low flower height. Group VI encompasses 23 genotypes, including only old and modern garden roses. These genotypes have primarily single flowers with a strong flower scent, full double petals, and biserrate leaves. Group VII comprises 104 genotypes, which are predominantly old garden and modern roses, with only four wild rose genotypes. The rose genotypes in Groups I and II present distinct phenotypic characteristics, such as multiple flowering periods per year, full double flowers, cup-shaped flowers, and light-yellow stigma. These two groups together contain only 5 rose genotypes, suggesting that they are highly distinctive and may have limited potential for exploitation in breeding programs. Nevertheless, the C05 (R. rugosa ‘Gaohong’) cultivar in Group II demonstrates remarkable phenotypic traits and ranks fourth in the overall evaluation. Consequently, this cultivar holds significant potential for future utilization in breeding programs.
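To illustrate how cutting a dendrogram at a fixed genetic distance yields discrete groups, the following minimal Python sketch applies average-linkage agglomeration to a small invented distance matrix and stops merging once the closest pair of clusters is farther apart than the cutoff. The linkage method, the function name, and the toy distances are assumptions made for illustration only; the text above does not state which algorithm or software produced Fig. 5.

```python
from itertools import combinations


def average_linkage_groups(dist, cutoff):
    """Merge clusters by average linkage until the closest pair of
    clusters is farther apart than `cutoff`; return the groups."""
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (x, y), d = min(
            (((p, q), cluster_distance(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # every remaining pair lies beyond the cutoff
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Invented symmetric distance matrix for five hypothetical genotypes.
TOY_DIST = [
    [0.00, 0.10, 0.60, 0.65, 0.90],
    [0.10, 0.00, 0.62, 0.70, 0.88],
    [0.60, 0.62, 0.00, 0.20, 0.85],
    [0.65, 0.70, 0.20, 0.00, 0.80],
    [0.90, 0.88, 0.85, 0.80, 0.00],
]

groups_at_029 = average_linkage_groups(TOY_DIST, cutoff=0.29)
groups_at_070 = average_linkage_groups(TOY_DIST, cutoff=0.70)
```

In this toy case, the 0.29 cutoff leaves three groups and a looser cutoff of 0.70 leaves two, which shows how the number of groups depends on the chosen threshold.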

Fig. 5

Cluster dendrogram of the 192 Rosa sp. genotypes based on the phenotypic trait markers. The old garden roses are indicated by green circles, the modern roses are indicated by red triangles, and the wild roses are indicated by blue stars. The three parents of the F1 segregating populations are framed in red

Cluster analysis using SSR markers

In this study, ten pairs of SSR primers were employed to amplify genomic DNA extracted from the 192 Rosa sp. genotypes, and the amplified products were subsequently detected. The number of alleles at the ten SSR loci varied between 8 and 20. A total of 131 alleles were identified across the ten loci, resulting in an average of 13.1 alleles per locus. The RW5D11 locus exhibited the highest number of alleles (20), whereas the CTG21 locus displayed the lowest (8). The mean effective number of alleles across all loci was 6.14, ranging from 3.85 to 10.00. The mean polymorphism information content (PIC) was 0.82, ranging from 0.74 to 0.90. Based on the established criterion of PIC≥0.5 (Botstein et al. 1980), all ten loci can be considered highly polymorphic.
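The two per-locus statistics reported above can be sketched as follows. This example assumes the standard definitions, namely the effective number of alleles Ne = 1/Σp_i² and the PIC formula of Botstein et al. (1980), PIC = 1 − Σp_i² − ΣΣ 2p_i²p_j² (summed over allele pairs i < j). The function names and the allele frequencies are hypothetical, and the text above does not state which software was used for the calculation.

```python
def effective_alleles(freqs):
    """Effective number of alleles: Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980):
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with four equally frequent alleles.
toy_freqs = [0.25, 0.25, 0.25, 0.25]
toy_ne = effective_alleles(toy_freqs)   # 4.0
toy_pic = pic(toy_freqs)                # 0.703125
```

With equally frequent alleles, Ne equals the observed allele count. Loci with skewed frequencies, as is typical of real SSR data, give Ne values below the observed count, consistent with the mean Ne of 6.14 against a mean of 13.1 observed alleles.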

A cluster analysis was conducted using the 10 SSR markers. The genetic distance varied from 0.03 to 1.00, and the 192 germplasms were classified into seven primary groups based on a genetic distance of 0.34. These groups were denoted as Group I to Group VII (Fig. 6). Group I consisted of 20 genotypes, predominantly comprising wild roses and a few old garden rose varieties. Group II included 15 genotypes, consisting of 13 wild roses, 1 old garden rose, and 1 modern rose variety. Group III encompassed 14 genotypes, consisting of 7 old garden roses, 4 modern roses, and 3 wild rose genotypes. Group IV consisted of only 4 genotypes, namely 2 wild roses and 2 old garden rose varieties. Group V also consisted of only 4 genotypes, comprising 1 modern rose and 3 old garden roses. Group VI encompassed 57 genotypes, with most being modern rose varieties. Group VII comprised 78 genotypes, with most being old garden roses. Overall, the wild rose accessions mostly clustered together, intermingled with only a few old garden and modern rose varieties, while most old garden roses and most modern roses likewise grouped with their own type. In contrast to the phenotypic marker-based cluster analysis, the cluster analysis based on molecular markers revealed a more distinct single cluster encompassing the three rose germplasm groups.

Fig. 6

Cluster dendrogram of the 192 Rosa sp. genotypes based on the SSR molecular marker genotyping. The old garden roses are indicated by green circles, the modern roses are indicated by red triangles, and the wild roses are indicated by blue stars. The three parents of the F1 segregating populations are framed in red

Two F1 populations revealed abundant variation and extensive segregation

Based on the cluster analysis results, three tetraploid modern rose genotypes, specifically M10, M19, and M29, were selected as parents due to their substantial genetic dissimilarity in phenotypic traits and molecular characteristics. Consequently, two F1 hybrid populations were obtained. The three parents presented significant variation in flower, stem, and leaf phenotypes (Fig. 7a–c). Based on the findings on the germplasm panel, the traits responsible for the variation across the first three principal components in the PCA, which also demonstrated significant differences in the F1 hybrid populations, were selected for phenotypic analysis, namely PN, FD, FH, PL, PW, ASC, SPD, and LN. The objective of creating a hybrid population is to generate novel varieties with exceptional traits while establishing a framework to facilitate the breeding process through the analysis of the inheritance of the traits of interest in the population.

Fig. 7

Diagram of crosses between the parental genotypes, frequency distribution histograms, and correlation analysis in the two F1 hybrid populations. (a) The floral traits of the three parental lines M10, M19, and M29. (b) The leaf traits of the three parental lines M10, M19, and M29. (c) The stem traits of the three parental lines M10, M19, and M29. (d) Frequency distribution of variation types and diversity index of qualitative traits in the M10 × M19 F1 hybrid population. (e) Frequency distribution of variation types and diversity index of qualitative traits in the M10 × M29 F1 hybrid population. (f) Histogram of flower diameter frequency distribution in the M10 × M19 F1 hybrid population. (g) Histogram of flower height frequency distribution in the M10 × M19 F1 hybrid population. (h) Histogram of petal length frequency distribution in the M10 × M19 F1 hybrid population. (i) Histogram of petal width frequency distribution in the M10 × M19 F1 hybrid population. (j) Histogram of flower diameter frequency distribution in the M10 × M29 F1 hybrid population. (k) Histogram of flower height frequency distribution in the M10 × M29 F1 hybrid population. (l) Histogram of petal length frequency distribution in the M10 × M29 F1 hybrid population. (m) Histogram of petal width frequency distribution in the M10 × M29 F1 hybrid population. The gray columns in (f)–(m) represent the frequency distribution, and the curve corresponds to the normal distribution curve. The yellow and red triangles on the horizontal axis mark the trait values of the maternal and paternal lines, respectively. (n, o) Trait correlation heat maps in the M10 × M19 and M10 × M29 F1 hybrid populations, respectively. The lower triangle indicates the Pearson correlation value; the upper triangle indicates the significance, * P<0.05, ** P<0.01. The scale bar=1 cm. M10, R. hybrida ‘Midnight Blue’; M19, R. hybrida ‘Sheherazad’; M29, R. hybrida ‘Couture Rose Tilia’. PN, petal number; ASC, annual stem color; SPD, stem prickle density; LN, number of leaflets

By examining the frequency distribution and genetic diversity index (H') of four qualitative traits in the two F1 hybrid populations, we identified 14 variant types across these traits (Table S14). H' ranged from 0.53 to 1.16 and from 0.28 to 1.29 in the two hybrid populations, respectively, with maxima exceeding 1 in both, indicating substantial variation in qualitative traits (Fig. 7d–e). In both F1 hybrid populations, SPD presented the greatest variation and the highest genetic diversity index, suggesting that the genetics underlying SPD are intricate and encompass a wide array of variants. Moreover, the variation in the FD, FH, PL, PW, and PN traits in the two F1 hybrid populations was assessed (Table S15). The coefficient of variation ranged from 15.82% to 65.49% in the M10 × M19 hybrid population and from 15.02% to 40.67% in the M10 × M29 hybrid population. These results indicate a medium or higher level of variation in the quantitative traits within the two F1 hybrid populations.
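The two variation measures used in this paragraph can be illustrated with a short Python sketch. It assumes that H' is the Shannon index computed with natural logarithms over the frequencies of the variant types, and that the coefficient of variation is the sample standard deviation divided by the mean. The function names and the toy observations are hypothetical and do not come from Tables S14 or S15.

```python
import math
from collections import Counter
from statistics import mean, stdev


def shannon_index(observations):
    """Shannon diversity index H' = -sum(p_i * ln(p_i)) over variant types."""
    counts = Counter(observations)
    n = len(observations)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def coefficient_of_variation(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical prickle-density classes scored on eight F1 individuals.
toy_spd = ["none", "sparse", "sparse", "medium",
           "medium", "dense", "dense", "none"]
toy_h = shannon_index(toy_spd)  # four equally frequent classes: ln(4)

# Hypothetical flower diameters (cm) for three F1 individuals.
toy_cv = coefficient_of_variation([8.0, 10.0, 12.0])  # 20.0
```

Under these assumptions, H' can exceed 1 only when a trait has at least three variant types, because the maximum for two equally frequent types is ln(2) ≈ 0.69.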

The frequency histogram (Fig. 7f–m) and the kurtosis and skewness values (Table S15) of the four quantitative traits, namely FD, FH, PL, and PW, indicated a continuous multi-peak skewed distribution. The inheritance pattern of these quantitative traits aligned with a mixed major-effect genes-polygene model. Consequently, we applied a mixed inheritance model incorporating major-effect genes and polygenes for the analysis. The AIC values of 11 genetic models were computed for each trait, and a candidate model was chosen for each trait in the two F1 populations based on the minimum AIC values observed (Table S16). The goodness-of-fit test results of the selected candidate models did not achieve statistical significance (P≥0.05) (Table 2), suggesting that the frequency distribution of the traits within the populations closely matched the theoretical distribution based on the candidate genetic models. Consequently, the candidate genetic models were deemed optimal. Based on the chosen optimal genetic models, our findings revealed that PN in the M10 × M19 hybrid population might potentially be controlled by one major-effect locus, while the remaining traits in the two hybrid populations might be controlled by two major-effect loci.
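The model-selection step described above can be sketched as follows, assuming the usual definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and ln L is the maximized log-likelihood. The model names, log-likelihoods, and parameter counts are invented placeholders, not values from Table S16, and the sketch does not fit the mixed major gene plus polygene models themselves.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def select_model(fits):
    """Return the name of the model with the lowest AIC and all AIC scores.

    `fits` maps a model name to (maximized log-likelihood, parameter count).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one trait in one F1 population.
toy_fits = {
    "polygenes_only": (-125.0, 2),
    "one_major_gene": (-120.0, 4),
    "two_major_genes": (-115.0, 6),
}
best_model, aic_scores = select_model(toy_fits)
```

In this toy example, the two-major-gene model has the lowest AIC (242 against 248 and 254) despite its extra parameters. A candidate chosen this way would still need to pass a goodness-of-fit test, as described in the text.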

Table 2 Genetic models and their goodness-of-fit test results for 4 quantitative traits in the two F1 hybrid populations

Correlation analysis among traits

Correlation analysis was performed on eight traits within the two F1 hybrid populations (Fig. 7n–o). In the M10 × M19 hybrid population, of the 28 trait pairs assessed, ten presented a highly significant positive correlation, two a highly significant negative correlation, four a significant positive correlation, and one a significant negative correlation. In the M10 × M29 hybrid population, of the 28 trait pairs assessed, eight presented a highly significant positive correlation, one a highly significant negative correlation, one a significant positive correlation, and two a significant negative correlation. Notably, the traits PN, FH, FD, PL, and PW were highly significantly and positively correlated in both F1 hybrid populations, mirroring the correlations observed in the collection of 192 Rosa sp. genotypes. The strong positive correlation among them might be caused by a shared genetic determinant of these traits.
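The pairwise analysis above can be sketched in Python as follows. Eight traits give 8 × 7 / 2 = 28 unordered pairs, and each pair receives a Pearson correlation coefficient. The function names and the toy measurements are hypothetical, and the sketch computes only the coefficients, not the P values shown in Fig. 7.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(traits):
    """Map every unordered pair of trait names to its Pearson r."""
    return {
        (a, b): pearson_r(traits[a], traits[b])
        for a, b in combinations(traits, 2)
    }


# The eight trait abbreviations analysed in the F1 populations.
TRAIT_NAMES = ["PN", "FD", "FH", "PL", "PW", "ASC", "SPD", "LN"]
n_pairs = len(list(combinations(TRAIT_NAMES, 2)))  # 28

# Invented measurements on four hypothetical F1 individuals.
toy_traits = {
    "FD": [6.0, 7.5, 9.0, 10.5],
    "PL": [3.0, 3.5, 4.0, 4.5],
    "SPD": [4.0, 3.0, 2.0, 1.0],
}
toy_r = pairwise_correlations(toy_traits)
```

In the toy data, FD and PL rise together (r close to 1), while SPD falls as FD rises (r close to −1).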

Discussion

Determination of rose core germplasms

Statistical analysis of phenotypic trait data allows a more comprehensive evaluation of germplasm resources, thereby facilitating the identification of genotypes with desirable traits that can be used in breeding or cultivated directly. In this study, the materials presented comprehensive scores ranging from -1.175 to 1.464. The five genotypes with the highest scores were R. chinensis ‘Zihongxiang’ (C31), R. hybrida ‘Burgundy Iceberg’ (M32), R. hybrida ‘Conrad F. Meyer’ (C55), R. rugosa ‘Gaohong’ (C05), and R. floribunda ‘Sheherazad’ (M19). These genotypes can be considered as the core germplasm collection for future rose breeding practices (Nadeem et al. 2014). In the cluster analysis based on phenotypic traits, four of the five core genotypes, all except M19, were classified under Group VI. This group primarily consists of old and modern garden roses, suggesting that the materials within Group VI present exceptional phenotypic characteristics. Conversely, in the cluster analysis based on molecular marker genotyping, the two modern rose genotypes, M32 and M19, were classified in Group VI, while the remaining three old garden rose genotypes were classified under Group VII. Furthermore, these three old garden rose genotypes were genetically distinct from the wild roses. This finding suggests that a significant number of breeding lines with high-quality, desirable traits have been generated through long-term rose breeding practices.

The comprehensive phenotypic and molecular marker data from the 192 Rosa sp. genotypes, their genetic group classification, and the populations resulting from crosses between high-quality parental lines offer a foundation for future investigations into the molecular mechanisms underlying ornamental traits in roses. In recent years, research on roses has progressively elucidated the molecular mechanisms governing the development and inheritance of their exceptional traits and the regulatory interactions between traits and the environment. For example, Yan et al. (2023) elucidated an intricate light-mediated regulatory network governing the biosynthesis of anthocyanin in rose petals, using the cultivar R. hybrida ‘Burgundy Iceberg’ as the experimental material. The findings of the present study indicated that R. hybrida ‘Burgundy Iceberg’ exhibited the highest score for desirable traits among the modern roses examined, in terms of both phenotypic and molecular marker-based associations. In 2021, a significant milestone was achieved with the completion of the first high-quality chromosome-level genome assembly of R. rugosa. Comparative analysis with R. chinensis revealed the specific expansion and retention of stress-related genes in R. rugosa, potentially contributing to its adaptation capacity in stressful environments (Chen et al. 2021). Additionally, based on our phenotypic observations, R. rugosa presented superior disease resistance compared to R. chinensis. Therefore, future research efforts can be directed toward investigating the disparity in disease resistance between R. rugosa and R. chinensis to enhance the disease resistance of cultivated varieties. The duration of ornamental plants' attractiveness and ornamental value is determined by the timing and duration of flowering, which indirectly impacts their economic value. 
Flowering time can be categorized into three types: once-and-only flowering (OF), occasional or re-flowering (OR), and repeated or continuous flowering (CF). Recent studies have suggested that the regulation of rose flowering time may involve homologues of TFL1, RCSPL1-RCTAf15B, and KSN allelic heterozygosity (Iwata et al. 2012; Kurokura et al. 2013; Yu et al. 2023; Bai et al. 2021). Most old garden roses present repeated or continuous flowering during the growing season, whereas most wild roses present once-and-only flowering in a season. Most of the 38 wild roses examined in this study presented once-and-only flowering. However, certain species, such as R. cymosa, R. bracteata, R. gigantea, R. gallica, and R. rugosa ‘Purple Branch’, presented repeated or continuous flowering during the growing season. These particular species can serve as valuable germplasm resources for future investigations into flowering time and duration.

Similarities and differences between two cluster analysis approaches

Cluster analysis entails the division of individuals into groups, thereby revealing their degree of relatedness and genetic similarity (Diday and Simon 1976). The methods employed in cluster analysis share a common logic, wherein a specific algorithm aggregates similar individuals into a single class while distinguishing those with significant dissimilarities. The progression of cluster analysis, from examining phenotypic traits to exploring genetic diversity through molecular markers, has generally moved towards simplicity and precision. These methods present complementary features that improve the identification and appropriate classification of genetic diversity within plant populations.

Our study examined the relationships among Rosa genotypes using both phenotypic traits and molecular markers. Based on a combined analysis of phenotypic traits and molecular markers, the 192 rose genotypes were classified into seven distinct groups. The outcomes of the two clustering analyses demonstrated the grouping of wild roses into a single class, signifying their preservation of unique phenotypic traits and notable differentiation from old and modern garden roses. The findings of this study suggest that the molecular markers examined have the potential to accurately assess kinship and similarity differences at the molecular level. The clustering analysis results of 65 Rosa spp. genotypes assessed by six STMS markers aligned with the horticultural classification results, which were consistent with the findings of our study (Scariot et al. 2006). However, some uncertainties persist despite the general classification of old garden rose, modern rose, and wild rose genotypes into three distinct groups. These observations highlight the challenges associated with the horticultural classification of cultivars (Liorzou et al. 2016). The wild roses analyzed in this study exhibit several distinctive phenotypic traits, including once-per-season flowering, single-petal flowers, a high leaflet number, and trichome-like features on the calyx tube surface. These traits serve to differentiate them from both old garden and modern garden rose genotypes. The classification of old and modern garden roses dates back to 1867, and the clustering analysis conducted in this study confirms the longstanding close relationship between them, considering their breeding history. Furthermore, they share similarities in both phenotypic and molecular marker profiles. At the same time, old garden rose genotypes hold great importance in the evolutionary trajectory and breeding strategies employed to improve modern garden roses.

Notably, discernible disparities were observed in the clustering outcomes of the two methodologies. Specifically, the genetic distances derived from phenotypic trait clustering ranged from 0.15 to 0.91, a notably narrower span than the range of 0.03 to 1.00 obtained by molecular marker clustering. This wider range is consistent with the greater resolving power and reliability of molecular markers, which remain unaffected by external environmental factors (Yang et al. 2020). Multiple simple sequence repeat (SSR) markers have been developed and extensively employed in the examination of genetic diversity in ornamental plants and cash crops (Meng et al. 2009; Panwar et al. 2015; Amar et al. 2011), thereby facilitating the genetic classification of both wild and cultivated roses (Scariot et al. 2006; Meng et al. 2009; Panwar et al. 2015).

In the current study, notable differences in phenotypic characteristics were observed among the wild, old garden, and modern rose genotypes. Additionally, many of these rose genotypes were grouped in a single cluster at the molecular level, indicating that numerous repetitive elements within the genus Rosa have been conserved over long-term evolution and selective breeding, despite the considerable variation at a genome-wide level. The clustering analysis revealed that many wild roses did not contribute to the generation of contemporary garden roses. Consequently, it is recommended that further studies be conducted into the exceptional traits of both old garden roses and wild roses to integrate them into modern rose breeding practices (Zlesak 2006).

The heredity of rose traits is complex

In this study, we discovered a significant overlap and correlation between the traits of old garden roses and modern roses. During the early stages of hybrid tea rose breeding, only ten rose species, namely R. chinensis, R. foetida, R. gallica, R. gigantea, R. moschata, R. multiflora, R. phoenicea, R. rugosa, R. rubra and R. wichurana (Crespel and Mouchotte 2003), were utilized as parental plants. However, intercrossing among different genotypes progressively intensified as breeding techniques advanced. This phenomenon was further intensified by environmental shifts and deliberate selection (De Vries and Dubois 1996), gradually increasing phenotypic trait variability in the rose germplasm resources (Fig. 2c–f).

Flower color, flower type, and floral scent of ornamental plants have received increased attention from researchers (Bendahmane et al. 2013). In the natural environment, bright flower colors are crucial in attracting pollinating insects (Davies et al. 2012). In contemporary times, flower color has also become a significant factor in consumer selection (Behe et al. 1999; Wijayani et al. 2017). The pigments present in rose petals primarily consist of anthocyanins, flavonols, and carotenoids (Wan et al. 2019). The presence and quantity of these three pigments determine the wide range of colors observed in roses, thereby contributing to the intricate mechanisms controlling rose color formation. Among the 192 Rosa sp. genotypes assessed in this study, the petal color variation and its genetic diversity index were the highest. This suggests that the genetic mechanisms underlying flower color are intricate, and multiple genes likely influence the differentiation in color between various species.

The flower, the primary ornamental component of the Rosa genus, has consistently garnered attention in studies on the genetic and regulatory mechanisms governing petal number. This study reveals a significant negative correlation between petal number (PN) and sepal number (SN) across a substantial collection of 192 Rosa sp. genotypes. The negative correlation between the floral organ number in different whorls may be attributed to the differential expression of B and C genes in the ABC model, resulting in the transformation of petals in whorl 2 into stamens in whorl 3 (Coen and Meyerowitz 1991; Weigel and Meyerowitz 1994; Kitahara et al. 2004; Francois et al. 2018). Previous studies have considered the number of petals as a quantitative trait regulated by multiple quantitative trait loci (QTLs) (Hibrand Saint-Oyant et al. 2008; Debener et al. 2001; Roman et al. 2015). In recent years, it has been suggested that two closely linked loci on LG3 (~27.80–33.83 Mbp) control petal number in double flowers (Hibrand Saint-Oyant et al. 2018; Schulz et al. 2021; Rawandoozi et al. 2023).

In cultivated rose species, negative selection against stem prickles is commonly practiced (Chaanin 2003). In this study, a pronounced negative correlation was identified between SPD and ASC. Previous studies have suggested that prickles on stems may be governed by the inheritance of multiple major-effect dominant genes (Debener and Linde 2009; Linde et al. 2006). However, the association between stem prickles and stem color remains ambiguous, and their apparent negative correlation merits further investigation.

Conclusions

In summary, an integrated analysis was undertaken to evaluate the genetic diversity of 192 Rosa sp. genotypes through phenotypic and molecular markers. This analysis has established a basis for understanding the genetic and phenotypic trait variation within the genus Rosa, as well as for the identification and utilization of core germplasm resources, the selection of appropriate parental lines for hybrid breeding, and the development of novel varieties with desirable traits. In future research, cluster analysis results can be utilized to identify appropriate parental lines among roses from different classification groups, facilitating the development of hybrid populations. Overall, this study not only advances the exploration of the genetics and inheritance of valuable ornamental traits but also establishes a foundation for the investigation and exploitation of genetic diversity in other ornamental plant germplasm resources.