Introduction

The genus Rosa L. is one of the largest genera in the Rosaceae family, with 149 recognized species widespread in Asia, the Middle East, Europe, and North America (www.theplantlist.org, accessed on 1 January 2024). Among ca. 60 wild rose species of Eurasian distribution (Wisseman and Ritz 2007), Rosa canina L. (dogrose) is used as an ornamental but also in folk medicine, as food, or in cosmetics owing to its aroma and/or the nutraceutical value of its rose hips (Ercisli 2007; Ayati et al. 2018). The latter contain a wide range of high-value bioactive compounds, including vitamin C, flavonoids, tannins, anthocyanins, catechins, carotenoids, fatty acids, tocotrienols, and minerals (Rein et al. 2004; Ayati et al. 2018; Pehlivan et al. 2018; Maloupa et al. 2021). Due to its high nutraceutical importance coupled with only a few reports of allergic reactions or side effects to date, R. canina is regarded as an important fruit in the food industry and is widely employed to treat inflammations, cold and cough, skin disorders and chronic pain, or to prevent diabetes, hypertension, arthritis, aging, cardiovascular diseases etc. (Ahmad et al. 2016; Kerasioti et al. 2019).

The genus Rosa exhibits an intricate taxonomy and a complex evolutionary history due to its varied reproductive strategies and unique pattern of inheritance (Wisseman and Ritz 2007), and DNA fingerprinting has been extensively used to analyze the variability in the genus (e.g., Gahlaut et al. 2021). Rosa canina is generally considered a pentaploid species (2n = 5x = 35, with a base chromosome number of x = 7), with most morphological characteristics as well as flower volatiles being inherited maternally (Wisseman and Ritz 2007). In this context, most microsatellite loci show maternally inherited alleles, whereas most of the random amplified polymorphic DNA (RAPD) bands are only passed through seeds to the next generation (Wisseman and Ritz 2007). A draft genome of the wild rose R. multiflora Thunb. has recently been sequenced, serving as a valuable genetic resource for the breeding of cultivated roses (Nakamura et al. 2018). Our current understanding of segmental allopolyploid patterns in Rosaceae has been significantly broadened by the sequencing of a doubled haploid rose line from R. chinensis Jacq. combining short and long reads, as well as by the generation of a high-density single nucleotide polymorphism (SNP) genetic map from a tetraploid R. × polliniana Spreng. (syn. R. × hybrida Schleich.) population (Raymond et al. 2018). Furthermore, transcriptomic research on diverse germplasm, tissues, and types of roses has to date enabled the development of a gene expression database of candidate genes of interest for various members of the genus Rosa, such as those related to pathogen infections (e.g., Dubois et al. 2012; Liu et al. 2018; Neu et al. 2019).

Greece hosts a large number of wild-growing Rosa species, with native R. canina germplasm being among the most valuable as it is associated with high ornamental, pharmaceutical, cosmetic, and nutritional value (Maloupa et al. 2021). The identification of high-resolution DNA-based markers is of paramount importance to unlock the potential of neglected and underutilized germplasm across the Mediterranean basin (Maloupa et al. 2021). In this context, the present investigation aimed to provide the first study on the genetic diversity and relationships of 12 R. canina genotypes by combining molecular characterization strategies based on Inter Simple Sequence Repeats (ISSR) (Reddy et al. 2002), Start Codon-Targeted (SCoT) (Collard and Mackill 2009) and Exon-Based Amplified Polymorphism (EBAP) (Xiong et al. 2022) markers. In particular, the objectives of this study were to: (i) evaluate the genetic diversity and population structure of the R. canina genotypes using ISSR as an arbitrary technique, SCoT as a gene-targeting technique and EBAP as an exon-targeting technique; and (ii) compare the level of information provided by ISSR, SCoT and EBAP markers to assess genetic similarities among the investigated germplasm. This study provides baseline data on the population structure and genetic information of R. canina wild-growing populations from Greece, which could be useful and informative for future breeding programs.

Materials and methods

Plant materials studied

In total, nine R. canina populations were collected from the wider area of the Balkan Botanic Garden of Kroussia (BBGK) in Northern Greece (41°05′27″ N, 23°06′36″ E), among which seven were wild-growing at 600–650 m above sea level, and two were maintained ex situ in the BBGK after being originally sourced from the wild (Fig. 1; Table 1). The expeditions were conducted during fall 2021, when the rosehips were ripe, and aimed at the selection of genotypes with specific traits such as vigorous growth and strong fruiting potential. For DNA analysis, leaf samples from nine individuals were collected. After taxonomic identification, each genotype was given a unique IPEN (International Plant Exchange Network) accession number by the Institute of Plant Breeding and Genetic Resources (IPBGR) of the Hellenic Agricultural Organization – Demeter (ELGO-DIMITRA). Additionally, another three wild-sourced R. canina population samples with vigorous growth and strong fruiting potential were included from previous research projects (Maloupa et al. 2021). All collections were performed using the institution’s authorized special permit (Permit 26895/1527 of 21/4/2021); this permit is issued yearly by the Greek Ministry of Environment and Energy after detailed reporting by the applicant.

Fig. 1

Partial view of the ex-situ-maintained Rosa canina germplasm originating from wild-growing Greek native populations employed in this study

Table 1 Selected Rosa canina Greek native genotypes sampled from various sites of Mt Kroussia and other regions in northern Greece assigned with different IPEN (International Plant Exchange Network) accession numbers

Genetic characterization using ISSR, SCoT and EBAP molecular markers

DNA was isolated from young leaves using the NucleoSpin® Plant II Kit (Macherey–Nagel, Nordrhein-Westfalen, Germany) according to the manufacturer’s instructions. DNA concentration and quality were estimated spectrophotometrically at 260 and 280 nm using an Eppendorf BioPhotometer (Eppendorf, Hamburg, Germany). The integrity of the DNA was determined using gel electrophoresis on a 0.8% (w/v) agarose gel.

After preliminary screening of 50 ISSR (University of British Columbia, UBC), SCoT (Collard and Mackill 2009) and EBAP (Xiong et al. 2022) primers, 23 unambiguously scorable and reproducible markers were selected based on their percentage of polymorphism (7 ISSRs, 9 SCoTs and 7 EBAPs). Oligonucleotide primers complementary to simple sequence repeats (UBC807, UBC810, UBC811, UBC834, UBC835, UBC840 and UBC841), start codon-targeted polymorphisms (SCoT1, SCoT13, SCoT14, SCoT15, SCoT30, SCoT33, SCoT34, SCoT61 and SCoT66) and exon-targeted polymorphisms (EBAP2, EBAP3, EBAP4, EBAP6, EBAP8, EBAP13 and EBAP21) were used for PCR amplification. The reaction mixture (total volume 25 μL) contained the following reagents: 0.5 μL dNTPs (10 mM), 2.5 μL 10× buffer, 1 μL primer (10 mM), 0.5 μL template DNA, 0.1 μL Taq polymerase (5 U/μL) and 20.3 μL sterile ddH2O. PCR amplifications were carried out according to Kadoglidou et al. (2023) using an Eppendorf Mastercycler EP Thermal Cycler Range (Eppendorf, Hamburg, Germany): an initial denaturation at 95 °C for 5 min was followed by 35 cycles of denaturation at 95 °C for 30 s, primer annealing at 48–56 °C for 90 s (52 °C for all primers except UBC811, which was annealed at 48 °C) and chain extension at 72 °C for 90 s. After the 35 cycles were completed, the temperature was held at 72 °C for 5 min as a final extension.

The amplification products of the ISSR, SCoT and EBAP markers were separated by electrophoresis on a 1.5% (w/v) agarose gel and stained with ethidium bromide. The size marker was a 2-log DNA ladder (New England Biolabs, Ipswich, MA, USA). The gels were exposed to UV light in a UVItec Transilluminator (UVItec Limited, Cambridge, UK), and UVIDoc software UVIDocMw version 99.04 (UVItec Limited, Cambridge, UK) was used for analysis.

ISSR, SCoT and EBAP alleles were scored based on whether specific fragments were present (1) or absent (0). Binary (phenotypic) data matrices for the genetic (ISSR + SCoT + EBAP) information were also generated. All matrices were subsequently analyzed in the same fashion. Nei’s coefficient (Lynch and Milligan 1994) was determined to assess the genetic variance within the groups of genotypes, while Nei’s formula was used to determine their genetic distance (Nei 1978). Principal Component Analysis (PCA) was used to visualize how closely related the genotypes were in each matrix. The number of alleles (Na), effective number of alleles (Ne), Shannon’s Information Index (I), Nei’s haploid gene diversity (h), Nei’s unbiased haploid gene diversity (uh), and the percentage of polymorphism (P) were calculated using GenAlEx 6.5 (Peakall and Smouse 2012) and Microsoft® Excel 2010/XLSTAT©-Pro software (Version 2013.4.07; Addinsoft Inc., Brooklyn, NY, USA).
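As an illustration of how the listed statistics relate to a 0/1 band matrix, the following minimal Python sketch computes them per locus and averages over loci. It assumes the usual binary haploid treatment (p = band frequency, q = 1 − p) and counts a monomorphic band as one allele; the function name `binary_diversity` and the toy matrix are hypothetical, and the sketch is not a reproduction of the software's internals.

```python
import math

def binary_diversity(matrix):
    """Mean diversity statistics over loci for a 0/1 band matrix.

    matrix: list of rows (one per genotype), each a list of 0/1 scores
    (one per locus). Assumes binary haploid formulas with p = band
    frequency and q = 1 - p (an assumption made for this sketch).
    """
    n = len(matrix)
    n_loci = len(matrix[0])
    sums = {"Na": 0.0, "Ne": 0.0, "I": 0.0, "h": 0.0, "uh": 0.0}
    polymorphic = 0
    for j in range(n_loci):
        p = sum(row[j] for row in matrix) / n
        q = 1.0 - p
        is_poly = 0.0 < p < 1.0
        polymorphic += is_poly
        h = 1.0 - (p * p + q * q)           # haploid gene diversity
        sums["Na"] += 2 if is_poly else 1   # band present and absent, or one state only
        sums["Ne"] += 1.0 / (p * p + q * q)
        sums["I"] += -sum(x * math.log(x) for x in (p, q) if x > 0)
        sums["h"] += h
        sums["uh"] += h * n / (n - 1)       # small-sample correction
    result = {key: value / n_loci for key, value in sums.items()}
    result["P"] = 100.0 * polymorphic / n_loci
    return result

# Hypothetical toy data: 4 genotypes scored at 3 loci.
toy = [[1, 1, 0],
       [1, 0, 0],
       [1, 1, 1],
       [1, 0, 1]]
print(binary_diversity(toy))
```

Under these assumptions, Na equals one plus the proportion of polymorphic loci, which is the relationship visible between the Na and P values reported per marker system.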

The Dice similarity coefficient (Dice 1945) as implemented in the ade4 1.7-15 R package (Dray and Dufour 2007) was used along with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) clustering algorithm to perform cluster analysis in R 4.0.2 (R Core Team 2020). The ape 5.4-1 package was used to compute bootstrap support (1000 bootstraps), and the phytools 0.7-47 package was used to display the resulting dendrograms (Revell 2012). STRUCTURE 2.3.4 (Pritchard et al. 2000) was run under the ‘admixture’ and ‘independent allele frequencies’ models. Twenty replicate runs were performed for each value of K from 1 to 6, each with a burn-in of 500,000 iterations followed by 1,000,000 MCMC repetitions. K was inferred using Evanno’s method (Evanno et al. 2005) as implemented in the pophelper 2.3.0 R package (Francis 2017). The software Structure_threader (Pina-Martins et al. 2017) was used to parallelize the distinct runs of K.
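The clustering step can be sketched in a few lines. The following Python example is a minimal, standard-library-only illustration of Dice dissimilarity (taken here as 1 − Dice similarity, which may be scaled differently from the R implementation used in the study) followed by UPGMA, i.e. average-linkage agglomeration. The genotype labels and band profiles are hypothetical, and bootstrap support is not covered.

```python
from itertools import combinations

def dice_distance(a, b):
    """1 - Dice similarity between two 0/1 band profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 1.0 - (2.0 * shared / total) if total else 0.0

def upgma(profiles):
    """Return the UPGMA merge order as nested tuples of genotype names."""
    leaf_dist = {frozenset((a, b)): dice_distance(profiles[a], profiles[b])
                 for a, b in combinations(profiles, 2)}
    # Each cluster is (subtree, tuple of member names).
    clusters = [(name, (name,)) for name in profiles]

    def average_distance(c1, c2):
        # UPGMA: unweighted mean of all leaf-to-leaf distances.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(leaf_dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Hypothetical band profiles for four genotypes.
toy_profiles = {
    "A": [1, 1, 1, 0, 0, 0],
    "B": [1, 1, 0, 0, 0, 0],
    "C": [0, 0, 0, 1, 1, 1],
    "D": [0, 0, 0, 1, 1, 0],
}
print(upgma(toy_profiles))
```

The exhaustive search over cluster pairs is deliberately simple; for a collection of 12 genotypes its cost is negligible, although a dedicated library would normally be preferred for larger datasets.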

Results

The use of ISSR, SCoT and EBAP markers in the genetic analysis of the R. canina germplasm collection provided insights into the genetic diversity of the Greek native populations (wild-growing and ex-situ-maintained ones sourced from the wild). Regarding the ISSR markers, 99 loci were analyzed in 12 individuals (one from each population), yielding a substantial dataset for evaluating their genetic diversity. The average number of distinct alleles (Na) in the R. canina germplasm collection was 1.818, whereas the average number of effective alleles (Ne) was 1.524 (Table 2). The Shannon's Information Index (I) was 0.448, suggesting a moderate degree of genetic variation. The diversity (h) and unbiased diversity (uh) were 0.302 and 0.330, respectively, indicating considerable genetic diversity throughout the germplasm collection. The collection exhibited a high proportion of polymorphic loci (81.82%), further outlining the extant genetic diversity. The binary data exhibited a wide range of band patterns, with a total of 99 recorded bands, each with a frequency of 5% or higher. Notably, no bands confined to 25% or less, or to 50% or less, of the populations were detected, indicating the absence of locally common bands.

Table 2 Genetic diversity statistics of Rosa canina Greek native germplasm collection based on ISSR, SCoT and EBAP markers

Similarly, the analysis using SCoT markers covered 114 loci across the 12 population samples constituting the germplasm collection. The average number of distinct alleles (Na) was 1.746, whereas the average number of effective alleles (Ne) was 1.422 (Table 2). The Shannon's Information Index (I) was 0.381, suggesting a moderate amount of genetic variation. The diversity (h) and unbiased diversity (uh) were 0.252 and 0.275, respectively. The collection exhibited a high proportion of polymorphic loci (74.56%), emphasizing the presence of genetic diversity within the germplasm collection. The data exhibited considerable variation, with a total of 114 recorded bands, all having a frequency of 5% or above. Again, no bands confined to 25% or less, or to 50% or less, of the populations were identified, indicating the absence of locally common bands.

As for the EBAP markers, a total of 95 loci across the 12 population samples were detected. The average number of distinct alleles (Na) was 1.821, whereas the average number of effective alleles (Ne) was 1.537 (Table 2). The Shannon's Information Index (I) was 0.457, suggesting a moderate to high amount of genetic variation. The diversity (h) and unbiased diversity (uh) were 0.309 and 0.342, respectively. The collection revealed a high proportion of polymorphic loci (82.11%), highlighting the presence of genetic diversity within the germplasm collection, as with the other marker systems. In total, 95 bands were recorded, each with a frequency of 5% or above. As with the other markers, no bands confined to 25% or less, or to 50% or less, of the populations were observed, indicating the absence of locally common bands.

Additionally, UPGMA dendrograms based on Dice distance were constructed for each marker system (Fig. 2; genotypes are referred to by the last five digits of their IPEN codes). The ISSR data revealed two major clusters: the first comprised the genotypes 21261 to 21266, with the remaining accessions being grouped together. The SCoT markers resulted in three major clusters: the first included the genotypes 21261, 21262, 21264, and 32229, the second comprised the genotypes 21263, 21265, 21266, and 21267, and the third encompassed the genotypes 19674, 19635, 14191, and 19193. With EBAP markers, three primary clades were again identified: the first comprised the genotypes 21261, 21262, 21263 and 21264, the second consisted of the genotypes 19635, 19193, 19674 and 14191, and the third included the genotypes 21265, 21266, 21267 and 32229. These findings suggested marker-specific variations in the genetic structure analyses of the studied R. canina germplasm collection. Nevertheless, the broadly consistent clustering patterns across markers implied reliability in the observed genetic grouping, and the distinct composition of genotypes within each cluster indicated potential genotypic associations and underlying genetic relationships.
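One simple way to express how similar two of these groupings are is the Rand index, i.e. the share of genotype pairs that both partitions treat the same way (together in both, or apart in both). The sketch below is an illustration added here, not part of the original analysis; it uses the SCoT and EBAP cluster memberships listed above (last five digits of the IPEN codes, written without separators), and the function names are hypothetical. The ISSR grouping is left out because its second cluster is not enumerated in the text.

```python
from itertools import combinations

# Cluster memberships as listed in the text for SCoT and EBAP.
scot_clusters = [
    {"21261", "21262", "21264", "32229"},
    {"21263", "21265", "21266", "21267"},
    {"19674", "19635", "14191", "19193"},
]
ebap_clusters = [
    {"21261", "21262", "21263", "21264"},
    {"19635", "19193", "19674", "14191"},
    {"21265", "21266", "21267", "32229"},
]

def cluster_labels(partition):
    """Map each genotype to the index of the cluster that contains it."""
    return {g: i for i, cluster in enumerate(partition) for g in cluster}

def rand_index(partition_a, partition_b):
    """Fraction of genotype pairs on which the two partitions agree."""
    la, lb = cluster_labels(partition_a), cluster_labels(partition_b)
    pairs = list(combinations(sorted(la), 2))
    agree = sum((la[x] == la[y]) == (lb[x] == lb[y]) for x, y in pairs)
    return agree / len(pairs)

print(round(rand_index(scot_clusters, ebap_clusters), 3))
```

For these two partitions, 54 of the 66 genotype pairs are treated consistently (Rand index of about 0.82): the group 19674, 19635, 14191 and 19193 is recovered identically, while 21263 and 32229 change clusters between the two marker systems.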

Fig. 2

Cluster analysis and Bayesian inference of genetic differentiation between ex-situ-maintained genotypes of wild-growing Rosa canina (shown with the last five digits of their IPEN accession codes). The results are presented from left to right for inter simple sequence repeats (ISSR), start codon-targeted (SCoT) and exon-based amplified polymorphism (EBAP) analysis, respectively. a, b, c Two-dimensional principal component analysis (PCA) plot of the first two components for the 12 R. canina genotypes; d, e, f Unweighted pair group method with arithmetic mean (UPGMA) dendrograms using Dice distance; g, h, i Genetic assignment based on STRUCTURE using the values of Evanno’s ΔK statistic for the most probable genetic structure model

The genetic structure was further elucidated by PCA using a covariance matrix and data normalization (Fig. 2). The first two axes for ISSR, SCoT and EBAP markers accounted for a combined variance of 45.1%, 37.3% and 70.4%, respectively. For all three marker systems, the samples were spread across the plot without any clear-cut segregation. For ISSR markers, most samples were concentrated in the upper two quadrants, indicating a potential clustering tendency in the upper range of the second axis. For SCoT and EBAP, in contrast, most samples grouped in the right-hand quadrants, indicating a tendency towards higher values along the first axis. This observation implied a potential correlation between the molecular markers utilized and the distribution pattern of the studied plant material.
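The "combined variance" of the first two axes is the sum of the two largest eigenvalues of the covariance matrix divided by its trace. The following Python sketch illustrates this calculation with the standard library only; it substitutes power iteration with deflation for a library eigen-solver, assumes that at least one locus is polymorphic, and uses hypothetical function names and toy data rather than the study's matrices or software.

```python
import math

def covariance_matrix(data):
    """Sample covariance of the columns (loci) of a genotype x locus matrix."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]

def leading_eigenpair(cov, iterations=1000):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    m = len(cov)
    vec = [1.0 / (k + 1) for k in range(m)]  # arbitrary deterministic start
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # no variance left to explain
            return 0.0, vec
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * cov[i][j] * vec[j] for i in range(m) for j in range(m))
    return value, vec

def explained_variance(data, n_components=2):
    """Share of total variance captured by each of the first components."""
    cov = covariance_matrix(data)
    size = len(cov)
    total = sum(cov[i][i] for i in range(size))  # trace = total variance
    shares = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        shares.append(value / total)
        # Deflation: subtract the component just found before the next pass.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(size)]
               for i in range(size)]
    return shares

# Hypothetical toy data: 4 genotypes x 4 loci.
toy = [[1, 1, 0, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 1],
       [0, 0, 1, 1]]
print(explained_variance(toy))
```

In the toy matrix all variable loci carry the same signal, so the first axis captures essentially all of the variance; real band matrices spread the variance over more axes, as the percentages reported above show.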

To gain deeper insight into the genetic composition of the ex-situ-maintained R. canina germplasm gene pool (a snapshot of the species’ Greek native gene pool), STRUCTURE analysis was employed (Fig. 2). The determination of the optimal number of genetic clusters (K) was based on the ΔK statistic (Evanno et al. 2005), and K = 3 was identified as the best-fitting value for all three marker systems (ISSR, SCoT, and EBAP). This suggested a consistent and robust genetic structuring pattern within the studied germplasm collection, indicating the presence of three distinct genetic subpopulations. The selection of K = 3 for all markers highlighted the consensus in determining the genetic composition of the studied germplasm and suggested that the R. canina collection can be effectively classified into three genetically uniform groups. This systematic method not only improved our understanding of genetic relationships throughout the germplasm collection but also offered useful insights for future breeding programs and conservation initiatives that focus on specific genetic clusters.
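For reference, the ΔK statistic compares the second-order change in the log-likelihood between successive K values with the spread of the replicate runs at that K. The Python sketch below follows the variant commonly used in downstream tools, in which the second difference is taken on the replicate means, ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)). The function name and the log-likelihood values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Compute Evanno's delta K from replicate log-likelihoods.

    lnp_by_k: {K: [ln P(D) of each replicate run]} for consecutive K values.
    Delta K is undefined for the smallest and largest K tested.
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_difference / stdev(lnp_by_k[k])
    return delta

# Hypothetical log-likelihoods: three replicate runs for each K from 1 to 5.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -904.0, -896.0],
    3: [-800.0, -801.0, -799.0],
    4: [-790.0, -795.0, -785.0],
    5: [-785.0, -790.0, -780.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because the statistic needs both neighbours of each K, runs covering K = 1 to K = 6 yield ΔK values only for K = 2 to K = 5, and K = 1 can never be selected by this criterion.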

Discussion

The genetic diversity analysis of the R. canina germplasm collection using ISSR, SCoT, and EBAP markers demonstrated substantial genetic variation within the species. This was consistent with findings from other research using similar markers in R. canina and other species. For instance, Jamali et al. (2019) reported that ISSR markers can effectively separate R. canina genotypes based on geographic regions, which is crucial for genetic preservation and breeding purposes. Similarly, the high genetic diversity observed in our study aligned with the findings of Joshi et al. (2021), who emphasized the potential of ISSR markers for the improvement of roses for the ornamental and perfume industries. SCoT markers have been shown to be informative in evaluating genetic diversity and relationships in various species. Previous research has underlined that SCoT markers were reliable in assessing genetic relationships among dill genotypes (Kadoglidou et al. 2023). This was also true for the present study, where SCoT markers contributed significantly to understanding the genetic makeup of the R. canina germplasm collection. Xanthopoulou et al. (2015) also found that SCoT markers were more informative and useful than ISSR markers for identifying and analyzing summer squash genetic diversity.

Furthermore, the consistent clustering patterns across the three markers (ISSR, SCoT, and EBAP) indicated robust and reliable genetic grouping within the R. canina germplasm collection. These findings have significant implications for sustainable exploitation strategies and future breeding programs. The identification of distinct genetic clusters within the studied germplasm collection represents a valuable resource for breeding programs aimed at enhancing specific traits in R. canina. Specifically, knowledge of the genetic makeup of different groups of R. canina accessions can facilitate informed decisions about which plants to select in future breeding efforts employing Greek materials; can help identify Greek native biotypes with enhanced levels of bioactive compounds and, concomitantly, potential health benefits; and can help detect local biotypes with desirable traits such as high yield potential, enhanced fruit quality, and innate resistance to pests and diseases. Such assets are ultimately intended to enable future growers to optimize production and meet market demands more effectively.