Cannabis sativa, an ancient cultivated plant, has played a pivotal role in human history, contributing to industries such as food, textiles, construction, pharmaceuticals, and recreation (Burgel et al. 2020; Hammami et al. 2021). In recent years, industrial hemp has attracted considerable attention for its potential benefits, especially therapeutic ones. Cannabidiol (CBD) (Chen and Pan 2021), a non-psychoactive compound, has been studied extensively for its therapeutic potential, including anti-inflammatory, analgesic, and anxiolytic effects (Yang et al. 2020; Burgel et al. 2020). The pharmacological significance of CBD has spurred investigations aimed at maximizing its yield, specifically in high-CBD hemp cultivars grown under diverse environmental conditions (Yang et al. 2020; Abdollahi et al. 2021). To optimize cannabinoid production, breeders must understand genetic variability, environmental cues, and gender-specific traits (Moliterni et al. 2004; van Bakel et al. 2011). Notably, Toth et al. (2020) emphasized the pivotal role of female plants in CBD production, a point also underscored by Crispim Massuela et al. (2022). Burgel et al. (2020) underscored the importance of aligning growth phases with CBD content in industrial hemp, highlighting the ongoing need for advances in cultivation and harvesting and, specifically, the unexplored dynamics of cannabinoid accumulation in industrial hemp genotypes during different growth stages. This involves discerning the impact of genotype and harvest time on biomass yield, separated into leaves and inflorescence, and on cannabinoid yield. Assessing CBD content throughout the growth stages of industrial hemp is imperative for understanding its variability and its potential applications across industries.
This knowledge is instrumental in developing new hemp strains with heightened CBD content and improved therapeutic benefits (Crispim Massuela et al. 2022; Mostafaei Dehnavi et al. 2022). Abdollahi et al. (2021) further emphasized the intricate relationship between cannabinoid content and genetic traits, growth stages, and environmental conditions. Yang et al. (2020) also recognized a gap in current replicated research on this topic and documented a marked increase in tetrahydrocannabinol (THC), CBD, and cannabigerol (CBG) concentrations during flower development in hemp. Hemp, a diploid species with distinct sex chromosomes, possesses a complex genetic framework that is crucial for its agricultural and industrial traits (Moliterni et al. 2004; van Bakel et al. 2011).

Uncovering the genetic diversity and structure of industrial hemp is paramount for breeding programs that aim to develop cultivars with enhanced agronomic and industrial traits, which underscores the importance of understanding the genetic makeup of this plant (Sawler et al. 2015). Baloch et al. (2017) asserted that the analysis of plant genetic diversity is pivotal in plant genetics, breeding, conservation, and evolution. Therefore, understanding the genetic and cannabinoid variation of hemp is vital not only for enhancing its agricultural and industrial characteristics but also for exploring its therapeutic potential, particularly concerning CBD. Analyzing genetic diversity and structure within industrial hemp populations facilitates the identification and conservation of unique genetic resources that are crucial for future cultivar development (Shams et al. 2020). In recent years, deoxyribonucleic acid (DNA) markers have emerged as powerful tools for plant genetic analysis (Nadeem et al. 2018). Unlike traditional breeding methods, molecular markers enable swift and precise identification of plants with desired features, such as high CBD content, and early detection of plant gender. Plant breeding has progressed significantly since the introduction of DNA molecular markers (Alsaleh et al. 2022), which enable genetic characterization at early developmental stages. Inter Simple Sequence Repeat (ISSR) markers have demonstrated their value in genetic diversity analysis across various investigations, including hemp research by Kayıs et al. (2010), Lata et al. (2010), Zhang et al. (2014), Khatak et al. (2016), and Shams et al. (2020). These markers have also played a pivotal role in investigating genetic diversity in other crops.

Notably, earlier research in other crops underscores the continued significance of ISSR markers in genetic diversity analyses. The appeal of ISSR markers lies in their high polymorphism, reproducibility, and cost-effectiveness, which make them valuable tools for unraveling genetic variation in diverse plant species. The early separation of male and female plants in cannabis cultivation is agriculturally important, given the gender-specific influence on key traits (Chandra et al. 2020). Numerous researchers have sought molecular markers for early-stage sex determination. Törjek et al. (2002) and Sakamoto et al. (2005) described the conversion of male-specific markers into Sequence Characterized Amplified Region (SCAR) markers. Furthermore, Rode et al. (2005) identified alleles of SSR markers that revealed distinctions between different X chromosomes. In a related study, Borin et al. (2021) stated that understanding gender-specific traits in hemp is paramount, especially for agriculturally significant attributes.

A reliable molecular marker for early-stage sexual differentiation in hemp is crucial and requires further validation through additional features linked to gender traits. Sesiz et al. (2024) highlighted the utility of genome-wide association studies (GWAS) for discovering genetic markers associated with important traits, providing a more precise and cost-effective alternative to traditional breeding methods. Molecular markers used in GWAS can pinpoint the location of genes controlling vital traits such as gender, offering valuable information for developing more effective breeding techniques, including Marker-Assisted Selection (MAS). Integrating reliable markers for early sexual differentiation facilitates efficient sex screening at the seedling stage, enabling the removal of undesired plants and conserving resources such as labor, water, fertilizers, and space. Beyond streamlining cultivation practices, this approach allows the intentional cultivation of the desired female or male plants (Rode et al. 2005; Borin et al. 2021).

Despite the global economic significance of industrial hemp, there is a noticeable gap in investigations of CBD content and in systematic research on the genetic structure and diversity of industrial hemp cultivars and landraces, particularly those cultivated in Türkiye. Therefore, this research aims to measure CBD content by following the development of cannabinoids in the inflorescences and leaves of industrial hemp under open-field conditions at two growth stages: flowering and harvest. Additionally, the study seeks to assess the variability and genetic structure of industrial hemp populations and to detect gender-specific markers in hemp plants. This information can support the development of new hemp cultivars with enhanced agronomic and industrial properties. Moreover, the study may contribute to preserving genetic resources in industrial hemp, fostering the breeding of novel cultivars with improved traits specific to the Turkish context. The implications of this research extend beyond national borders, offering insights into optimizing the cultivation and harvesting of industrial hemp for CBD production on a global scale.

Materials and methods

Plant material

In this research, a total of 43 industrial hemp (Cannabis sativa L.) genotypes were examined. These genotypes were obtained from the Hemp Research Institute, Yozgat Bozok University, and comprised 19 female (♀) plants, 19 male (♂) plants, and five monoecious (⚤) plants. The plants were collected from twelve local Turkish landraces, one Turkish-released cultivar, six foreign cultivars from different international locations, and five landraces of untraceable origin. Cultivation began in the spring of 2021 at the dedicated research and implementation area of the Hemp Research Institute in Yozgat, Türkiye (39°49′7.19″ N, 34°48′9.59″ E). During the 2021 experimental period, total precipitation was 509.6 mm and the mean temperature was 10.1 °C; no crop had been planted at the site the previous year. The cultivation included 19 dioecious and 5 monoecious genotypes, whose identities are given in Supplementary Table S1. The experimental design followed Federer (1956). Rows were spaced 20 cm apart, and the area was kept free of weeds and disease through the application of herbicides and fungicides. Standard agronomic and plant protection practices were maintained throughout the cultivation period. Harvesting took place in August and September and was timed individually for each plant according to its maturity. The collected samples were categorized as follows: (FF) inflorescence harvested during the flowering stage; (FH) inflorescence harvested during the harvesting stage; (LF) leaves harvested during the flowering stage; and (LH) leaves harvested during the harvesting stage.

Isolation of genomic DNA and ISSR analysis

Fresh leaves were collected from each individual, including female, male, and monoecious plants, to isolate total genomic DNA. The cetyltrimethylammonium bromide (CTAB) protocol of Doyle and Doyle (1987) was employed at the Laboratory of Science and Technology Application and Research Center (Bilim ve Teknoloji Uygulama ve Araştırma Merkez; BİLTEM), Yozgat Bozok University, Yozgat, Türkiye. The resulting DNA was evaluated for quantity and quality using 0.8% agarose gel electrophoresis. Following isolation, the DNA was diluted to a concentration of 25 ng µl−1 and used for ISSR analysis. For the ISSR analysis, 50 ISSR primers were first screened on four individuals to determine levels of polymorphism and assess primer functionality. The 20 primers that yielded the highest polymorphism were then used to screen all individuals (Table S2). The PCR amplification protocol was based on Alsaleh et al. (2020) and involved amplification of ISSRs from approximately 50 ng of genomic DNA, 0.4 mM deoxyribonucleotide triphosphate (dNTP), 0.8 pmol ISSR primers, 0.2 U Taq DNA polymerase, and 1X PCR buffer (10 mM Tris HCl pH 8.3, 50 mM KCl, 3 mM MgCl2) in a 10 μl PCR reaction. The Mastercycler was programmed to perform an initial denaturation step at 94 °C for 2 min, followed by 40 cycles of 1 min at 94 °C and annealing at the specific temperature for each primer (as listed in Table S2). Extension was carried out at 72 °C for 2 min, followed by a further extension at 72 °C for 10 min and a final cooling step to 4 °C. The ISSR-PCR products were then run on 2.5% agarose gels submerged in 1X Tris, Borate, and Ethylenediaminetetraacetic acid (TBE) buffer in an electrophoresis chamber, alongside a standard size marker (100 bp DNA Ladder) to estimate molecular size.
The gels were stained with ethidium bromide solution (at a final concentration of 0.5 µg/ml) for 20 min, and the PCR fragments were visualized on an ultraviolet (UV) transilluminator.

High-performance liquid chromatography (HPLC) quantification of CBD contents

CBD assessment was conducted on different parts (leaves and whole inflorescence) of the tested hemp individuals, as outlined in Table S3. Samples were collected at two distinct growth stages: at flowering and at harvest. Leaves and whole inflorescences were collected manually from each individual at each sampling stage. To preserve them until analysis, the samples were stored in paper bags in a dark, cool, low-humidity environment. The collected plant samples were then kept in a dark oven at 40 °C until dry. They were ground, weighed, and extracted in methanol at a ratio of 1:100. The extracts were re-diluted at 1:100 in methanol, passed through filters, and transferred into HPLC vials for analysis. CBD detection was carried out using an HPLC instrument (Shimadzu-LC-20AT, Japan) with a photo-diode array (PDA) detector (Shimadzu-SPD-M20, Japan) at a wavelength of 210 nm. The HPLC system was equipped with a specialized column (Cannabis Analyzer for Potency, 2.7 µm, 150 mm × 4.6 mm, Shimadzu, USA) for cannabinoid detection. The column temperature was maintained at 35 °C, and the mobile phase system was A (0.1% formic acid in acetonitrile): B (0.1% formic acid in water) in isocratic mode with a solvent flow rate of 0.3 ml/min. The sample injection volume was 10 µL. The HPLC procedures were conducted at BİLTEM laboratories at Yozgat Bozok University, Türkiye. Cannabinoid content was determined in two steps: calibration solutions were first analyzed on the HPLC to build a calibration curve, and the extracted plant samples were then analyzed and quantified from their absorbance values against that curve.
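The back-calculation from a calibration curve to CBD as a percentage of dry weight can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a linear calibration fitted by ordinary least squares, interprets the 1:100 extraction ratio as 1 g of dry material per 100 mL of methanol, and uses invented standard concentrations and peak areas. The function names `fit_line` and `cbd_percent` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept of a calibration line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cbd_percent(peak_area, slope, intercept,
                dilution_factor=100.0, extract_ml_per_g=100.0):
    """Convert a sample peak area to CBD content in % of dry weight.

    Assumptions (not taken from the paper): concentrations are in ug/mL,
    the extract is 1 g dry material per 100 mL, and it is re-diluted 1:100.
    """
    conc_vial = (peak_area - intercept) / slope      # ug/mL in the HPLC vial
    conc_extract = conc_vial * dilution_factor       # ug/mL in the primary extract
    ug_per_g = conc_extract * extract_ml_per_g       # ug CBD per g dry material
    return ug_per_g / 1e6 * 100.0                    # ug/g -> g/g -> percent


# Hypothetical calibration standards (ug/mL) and their peak areas.
standards = [1.0, 2.0, 5.0, 10.0]
areas = [1000.0, 2000.0, 5000.0, 10000.0]
slope, intercept = fit_line(standards, areas)
sample_pct = cbd_percent(5000.0, slope, intercept)
print(f"CBD content: {sample_pct:.2f}% of dry weight")
```

With these invented numbers, a vial concentration of 5 µg/mL corresponds to 5% CBD by dry weight, which shows how strongly the two dilution steps scale the final value.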

Statistical analysis of phenotypic and molecular data

The 43 individuals were assigned to five groups (A, B, C, D, and E) according to their gender, origin, and the results of the population structure analysis. Group A comprised 13 females from Türkiye, and Group B comprised 13 males from Türkiye. Group C included six females: five of unknown lineage and one from the foreign cultivar "Finola". To avoid confusion, these six females are referred to as Unknown females. Similarly, Group D comprised six males: five of untraceable origin and one from the foreign "Finola" cultivar. These six males are referred to as Unknown males. Lastly, Group E comprised five monoecious cultivars, all of foreign origin. Initial analyses began with descriptive statistics to explore group differences. Subsequently, analysis of variance (ANOVA) was applied to the CBD content data using Microsoft Excel. Significant differences were identified using Tukey's honestly significant difference (HSD) test with the rstatix package (Kassambara 2023). Violin plots illustrating distribution and variability across groups were generated using Jeffreys's Amazing Statistics Program (JASP) software version 0.11.1 (Love et al. 2019). Additionally, factor analysis was carried out, and variance was interpreted through Principal Component Analysis (PCA) using XLSTAT 2016.02.28451. The variables used in PCA and their corresponding eigenvalues were documented.
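For readers unfamiliar with the F-statistic reported by a one-way ANOVA, the computation can be sketched as below. This is a generic textbook implementation with made-up CBD values, not a reproduction of the Excel analysis; the function name `one_way_anova_f` and the toy groups are illustrative assumptions.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical CBD percentages for three sample categories.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F value is then compared with the critical value for the same degrees of freedom; an F above that value indicates that at least one group mean differs.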

Lastly, hierarchical clustering was performed using JASP software version 0.11.1, producing a hierarchical tree that delineates clusters and relationships among data points on the factorial map. Analysis of Molecular Variance (AMOVA) was done using Genetic Analysis in Excel (GenAlEx) version 6.501 software, developed by Peakall and Smouse (2001). A total of 574 ISSR loci were recorded from the data generated by the 20 selected primers. Each locus was scored twice, following the method of Buntjer (1999), and this verification was carried out in the Cross Checker 2.91 software. The ISSR loci were scored as binary data, with '1' signifying the presence of a band and '0' its absence in each sample, and the scores were organized into a binary matrix. Population structure was determined using the model-based clustering method implemented in STRUCTURE version 2.3.4 software (Pritchard et al. 2000). The analysis was run with the assumed number of clusters or populations set to K = 10, a burn-in of 5000 iterations, and 50,000 repetitions over ten iterations. The results were then analyzed with Structure Harvester version 0.6.94 software (Earl and Von Holdt 2012) to estimate the optimal number of clusters (K) within the population, following the method of Evanno et al. (2005). Principal Coordinate Analysis (PCoA) was conducted using GenAlEx version 6.501 software. Genetic distances were calculated using Diversity Analysis and Representation for Windows (DARWin) version 6.0.21 software, following the methodology described by Perrier et al. (2003).
The proportion of phenotypic variation explained by gender for each marker was estimated using the coefficient of determination (R2) in Trait Analysis by aSSociation, Evolution and Linkage (TASSEL) version 5.2.86 software (Bradbury et al. 2007). A Bonferroni-adjusted threshold for multiple testing was applied to determine significant associations (Kaler and Purcell 2019). To further examine genetic relationships and similarities among the studied individuals, TASSEL was also used to build a tree diagram with the Unweighted Pair-Group Method with Arithmetic Averaging (UPGMA).
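A Bonferroni threshold divides the chosen significance level by the number of tests. The sketch below assumes a significance level of 0.05, which the text does not state, and takes the 574 scored ISSR loci as the number of tests; the marker names and p-values are invented for illustration.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def significant_markers(p_values, threshold):
    """Keep only markers whose p-value falls below the threshold."""
    return {marker: p for marker, p in p_values.items() if p < threshold}


# Assumed alpha of 0.05 across the 574 ISSR loci scored in this study.
threshold = bonferroni_threshold(0.05, 574)
print(f"threshold p = {threshold:.2e}, -log10(p) = {-math.log10(threshold):.2f}")

# Hypothetical marker-trait association p-values.
toy_p_values = {"marker_a": 1.0e-6, "marker_b": 2.0e-4, "marker_c": 0.03}
hits = significant_markers(toy_p_values, threshold)
print(sorted(hits))
```

Only associations with p-values below the corrected threshold are retained, which keeps the family-wise error rate at or below the chosen level at the cost of some statistical power.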


Cannabidiol variations assessment

HPLC analysis revealed varying levels of cannabinoid constituents. Table S3 gives an overview of CBD levels for the different combinations of hemp plants. Female individuals such as Van (♀-FF: 8.501%), Maltepe1 (♀-FF: 5.309%), and Mısır Çarşısı (♀-FF: 4.965%) exhibited notably high CBD content in the inflorescence during the flowering stage. These individuals represent valuable genetic reservoirs for high-CBD hemp landraces. Individuals such as Kavacik (♀-FF: 1.490%) from Turkish landraces, and Finola (♀-FF: 0.554%) and USO31 (♀-FF: 0.822%) from foreign individuals, displayed relatively lower CBD content. CBD content in male inflorescence at flowering (♂-FF) ranged from 0.01 to 1.693% and was comparatively stable across individuals. Monoecious individuals showed varied CBD content, and individuals such as Futura75 (⚤-FF: 1.898%) and Santhica27 (⚤-FF: 1.707%) present promising options for cannabinoid extraction. By contrast, female inflorescence at flowering (♀-FF) showed notable variation in CBD concentration, ranging from 0.554% to 8.501%, and higher CBD content than other plant parts or growth stages.

Developmental stages also played a crucial role: female inflorescence at the flowering stage (♀-FF) exhibited a mean CBD content of 3.258%, which decreased to 1.746% at the harvesting stage (♀-FH) (Table 1). Similarly, within monoecious plants, significant intra-plant variability was observed: inflorescence from the flowering stage (⚤-FF) had an average CBD content of 1.409%, while inflorescence from the harvesting stage (⚤-FH) averaged 0.873%. Despite their comparatively lower CBD content, monoecious individuals remained valuable for CBD production, which emphasizes the importance of precise harvest timing to optimize CBD yields. Additionally, ANOVA showed significant variation in CBD levels across plant parts and developmental stages (Table S4a). The calculated F-statistic of 18.11 exceeded the critical F-value (1.96), indicating highly significant differences in CBD content among these groups. Female inflorescence at the flowering stage (♀-FF) exhibited the highest average CBD content (3.258%), pointing to a peak in cannabinoid synthesis during this phase. Monoecious inflorescence at the flowering stage (⚤-FF), with an average CBD content of 1.409%, was comparable to specific sets of female inflorescence, signifying its potential as a valuable source of cannabinoids. In contrast, leaves from all stages across female, male, and monoecious plants displayed significantly lower CBD contents, ranging from 0.16 to 0.57% (Table S4b). Tukey's honestly significant difference (HSD) test identified a group with noteworthy differences in CBD content across plant types and growth stages, as detailed in Table S5. Gender-Based Differences in CBD Content: Significant variations in CBD content among females were observed across different plant parts.
The Kruskal–Wallis test results, detailed in Table S6, reinforce our earlier findings, confirming substantial disparities in CBD content among traits (H (9) = 80.609, p < 0.001). This non-parametric analysis aligns with the outcomes of the ANOVA, underscoring the robustness of the identified differences.
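The Kruskal–Wallis H statistic is computed from the ranks of the pooled observations. The sketch below is a generic implementation with the standard tie correction, run on made-up values; it is meant only to show how a statistic of the form H(df) arises, with df equal to the number of groups minus one. The function name `kruskal_wallis_h` is illustrative.

```python
def kruskal_wallis_h(groups):
    """Return (H, df) for the Kruskal-Wallis test, with tie correction."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((value, gi) for gi, g in enumerate(groups) for value in g)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # average rank for a run of ties
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)  # equals 1 when there are no ties
    return h / correction, len(groups) - 1


# Hypothetical CBD percentages for three sample categories.
toy_groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h_stat, df = kruskal_wallis_h(toy_groups)
print(f"H({df}) = {h_stat:.3f}")
```

Because the test works on ranks, it does not require normally distributed CBD values, which is why it serves as a check on the ANOVA result.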

Table 1 Descriptive Statistics of Cannabidiol Contents of Leaves and Inflorescences at Flowering and Harvesting Stages in an Industrial Hemp Panel

Furthermore, the chi-squared test, with a test statistic of 162.995 and 26 degrees of freedom, revealed highly significant differences (p < 0.001), as detailed in Table S6. Finally, violin plots were used to visualize the CBD content distributions across plant types and growth stages, showing the spread of the data (Fig. 1).

Fig. 1

The distribution of Cannabidiol (CBD) content across different plant parts, types, and growth stages of female (♀), male (♂), and monoecious (⚤) plants, with samples categorized as follows: FF (inflorescence from the flowering stage), FH (inflorescence from harvesting stage), LF (leaves from the flowering stage), and LH (leaves from harvesting stage)

Principal Component Analysis (PCA) confirmed the outcomes observed in other statistical analyses, revealing distinct eigenvalues. The primary components (F1 and F2) were pivotal in elucidating the most significant variance in CBD contents (Fig. S1). Specifically, F1 contributed 54.8% to the total variance, while F2 accounted for an additional 31.5%, collectively explaining 86.389% of the variability (Fig. 2). The Factor analysis elucidated patterns within the dataset. RC1 exhibited substantial loadings for all traits, indicating a shared underlying factor influencing these traits. Specific characteristics such as female-FF, female-FH, female-LH, and female-LF demonstrated high loadings on RC1, suggesting a common genetic or environmental basis. RC2 indicated distinct associations, primarily involving monoecious groups, pointing towards unique factors governing monoecious hemp genotypes (Fig. S2).

Fig. 2

Biplot diagram of cannabinoid profiles across hemp genotypes: influence on CBD synthesis in individuals and distinct patterns

Monoecious samples, all of foreign origin, exhibited distinct CBD patterns (Fig. 2). Comparative analysis of sexual types revealed notable differences in CBD content. Dioecious plants such as Medical and Van displayed diverse CBD profiles, reflecting genetic heterogeneity within this category. By contrast, monoecious individuals from different international origins exhibited specific cannabinoid patterns.

Defining genetic population structure

STRUCTURE software was employed, coupled with the Structure Harvester program, to investigate intraspecific differentiation and population structure to gain detailed insights. The model-based clustering approach identified the optimal number of populations (K) as 3, a determination supported by a significant Delta K value of 7 (Fig. S3). The results exhibited a high correspondence based on sex nature, emphasizing the reliability of the clustering. Considering a Delta K value of 7.08386 at K = 3, the individuals were distinctly clustered mainly based on sex, forming three clusters (Fig. 3a). The first cluster comprised Unknown females, monoecious plants, and two Turkish females (Kavcik and Van). The second cluster exclusively included males, whether Unknown, foreign or domestic. The third cluster was composed solely of Turkish females, highlighting clear gender-based genetic distinctions. The mean F-statistics (Fst) values (0.3168, 0.1479, and 0.1115) suggest varying degrees of genetic differentiation among the three clusters. Further exploration using K5 and K8 revealed additional layers of genetic structuring (Fig. 3a). At K5, individuals clustered not only based on their sex nature but also relatively on their geographical origins, forming six groups. The first group consisted of nine Turkish females. The second group included Unknown females, monoecious samples, and two Turkish females (Kavcik and Van), indicating potential genetic similarity despite geographical disparities. The third group comprised nine Turkish males, underscoring their unique genetic structure. The fourth group consisted of four males, three Turkish, and the male unk-5. The fifth group comprised two Turkish females (Maltepe2 and Narlisaray), forming a distinctive cluster. The last group included six males, with the majority being Unknown, alongside the presence of the Turkish landraces Narlisaray (Fig. 3b). 
In the K8 analysis, the individuals were partitioned into seven clusters, describing relative similarities based on gender and geographical origin. The first cluster included six Turkish females, and the second cluster comprised Unknown females, monoecious plants, and Turkish females (Kavcik and Van). The third cluster included four Unknown males, while the fourth cluster consisted of four males, three Turkish, alongside the Unknown-1. The fifth cluster included exclusively male individuals, predominantly Turkish, with the presence of the Unknown-5. The sixth cluster comprised three Turkish females, indicating their unique genetic composition. The final group included Turkish females of Narlisaray and Maltepe2, underscoring their distinct genetic profile within the broader population (Fig. 3c).
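The Delta K statistic of Evanno et al. (2005) relates the second-order change in log-likelihood between successive K values to the run-to-run variability at each K. The sketch below uses the per-K mean log-likelihood in the second difference, which is one common way of evaluating the statistic and is an assumption here; the log-likelihood values are invented and are not STRUCTURE output from this study.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """Compute Delta K for each interior K.

    lnp_runs maps K to a list of log-likelihoods L(K) from replicate runs.
    Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using mean L per K.
    """
    mean_l = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # Delta K is undefined at the smallest and largest K
        spread = stdev(lnp_runs[k])
        if spread == 0:
            continue  # avoid division by zero when replicate runs are identical
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        result[k] = second_diff / spread
    return result


# Hypothetical log-likelihoods from two replicate runs per K.
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
print(best_k, {k: round(v, 2) for k, v in dk.items()})
```

The K with the largest Delta K marks the sharpest break in the likelihood curve, which is how a value such as the one reported at K = 3 is identified as the most likely number of clusters.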

Fig. 3

The survey of genetic relationship among 43 cannabis individuals using STRUCTURE at K = 3, 5, and 8 (a, b, and c), respectively. Distinct groups are represented by different colors, and each bar symbolizes a sample. The predominance of a color over others in a bar signifies that the sample belongs to the group represented by that color

Exploring genetic diversity in hemp populations and principal coordinate analysis (PCoA)

Genetic distance analysis revealed intraspecific variation among Cannabis sativa individuals. Within Turkish females, genetic distances averaged 0.223. For example, Maltepe4 and Eminönü3 displayed a genetic distance of 0.151, the lowest differentiation observed between Turkish female individuals. In contrast, Maltepe2 and Kavacik exhibited a higher genetic distance of 0.313, suggesting a substantial genetic distinction between these Turkish female individuals. Among Turkish male individuals, a range of genetic distances was observed, with an average (0.226) slightly higher than that of Turkish females. Unknown female individuals also exhibited varying genetic distances, reflecting their genetic diversity. For instance, a relatively low genetic distance of 0.133 was observed between Finola and Unk-1, indicating a close genetic relationship between these females. Conversely, a higher genetic distance of 0.215 was noted between Unk-2 and Unk-3, suggesting a more distant genetic relationship between these individuals. Similar patterns of genetic differentiation were observed among Unknown male individuals. The genetic distance matrix highlighted significant diversity, with values ranging from 0.127 to 0.215.

Unknown females also displayed a lower average genetic distance (0.171) than their male counterparts (0.183). Unknown males also exhibited a close genetic relationship. The monoecious individuals displayed intriguing patterns of genetic differentiation. The highest genetic distance was observed between Ferimon and US03 (0.249), while the lowest was observed between Ferimon and Santhica27 (0.142).

Interestingly, monoecious individuals exhibited an average genetic distance of 0.194. This intermediate value suggests a moderate genetic diversity within this subset.

The pairwise population matrix of Nei genetic distances (Table S7) provides an additional, comprehensive overview of the genetic differentiation among the hemp populations. The genetic distance between Turkish females and Turkish males was relatively low (0.031), indicating a moderate level of genetic divergence within the Turkish Cannabis population. Comparing Turkish females with Unknown females or Unknown males revealed higher genetic distances (0.044 and 0.037, respectively), signifying greater genetic divergence between these groups. The lowest genetic distance, between Turkish and Unknown males (0.027), suggests a closer genetic relationship, whereas the highest distance was noted between Unknown females and Turkish males (0.057). Notably, the genetic distances between the monoecious population and the other groups were moderate, ranging from 0.039 to 0.055.

Furthermore, the genetic distances between monoecious individuals and the other populations were intermediate, emphasizing a relatively homogeneous genetic composition. In parallel, the Nei genetic identity analysis (Table S8) highlighted substantial genetic similarity among specific population pairs. Turkish and Unknown males exhibited the highest genetic identity (0.973). Turkish females and Turkish males also showed a high genetic identity of 0.969, indicating a shared genetic background between these gender groups. Comparing Turkish females with Unknown females revealed a genetic identity of 0.957. These values suggest substantial genetic overlap between Turkish and Unknown populations, pointing to potential historical gene flow and shared genetic heritage. The genetic identity values between the monoecious population and the other groups (ranging from 0.947 to 0.962) are noteworthy. The relatively high genetic identity values within the monoecious populations underscore the genetic coherence among monoecious individuals, indicating a shared genetic background despite geographical distinctions.
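Nei's standard genetic distance D and genetic identity I are linked by D = −ln(I), so the two sets of values reported above can be cross-checked against each other. The short sketch below does this for three population pairs quoted in the text; the function name `nei_distance` is illustrative.

```python
import math


def nei_distance(identity):
    """Nei's standard genetic distance from Nei's genetic identity."""
    return -math.log(identity)


# (identity, reported distance) pairs quoted in the text.
reported = {
    "Turkish males vs Unknown males": (0.973, 0.027),
    "Turkish females vs Turkish males": (0.969, 0.031),
    "Turkish females vs Unknown females": (0.957, 0.044),
}
for pair, (identity, distance) in reported.items():
    derived = nei_distance(identity)
    print(f"{pair}: I = {identity}, D reported = {distance}, D derived = {derived:.3f}")
```

To three decimal places the derived distances agree with the reported ones, so the identity and distance matrices are consistent with each other.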

The PCoA (Fig. 4) delineated four distinct clusters within the population, with notable nuances in their composition. The Turkish female and male populations, as well as the Unknown female and male populations, exhibited clear and separate genetic profiles. Of particular interest was an admixture cluster encompassing Unknown females, monoecious populations, and one Turkish female individual.

Fig. 4

Relationship between individuals visualized by Principal Coordinate Analysis (PCoA) using Inter Simple Sequence Repeat (ISSR) data

The PCoA of the gender populations (females, males, and monoecious) revealed substantial distinctions. Specifically, males differed notably from the female and monoecious populations and also maintained a significant separation from each other (Fig. S4). In contrast, the monoecious population was closely associated with the female population, indicating a higher degree of genetic similarity between monoecious individuals and females. The PCoA based on the genders and origins of the studied groups (Fig. 5) revealed clearly divergent genetic positions. The groups were arranged consistently with the other analyses, with the closest genetic distance observed between the monoecious and Unknown female groups. Conversely, the greatest genetic disparities were between Turkish males and either Unknown females or monoecious plants.

Fig. 5

Principal Coordinates Analysis based on the genders and origin of studied groups

Analysis of molecular variance (AMOVA)

AMOVA revealed that 8% of the total genetic variance (SS = 585.191, df = 4) was attributable to differences among the populations, whereas the predominant portion (92%) resided within populations (SS = 3320.251, df = 38) (Fig. 4 and Table S9).
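To make the partitioning above concrete, the following sketch derives variance components from the reported sums of squares and degrees of freedom using the standard one-way AMOVA formulas. The individual group sizes are not given in this section, so the coefficient n0 is treated as an input: the example uses the equal-group-size value (43 individuals / 5 populations), which is a simplifying assumption. With unequal group sizes n0 is smaller, which raises the among-population share somewhat. Function names are hypothetical.

```python
def n0_from_sizes(sizes):
    """Average-sample-size coefficient for unequal groups:
    n0 = (N - sum(n_i^2) / N) / (k - 1)."""
    total = sum(sizes)
    return (total - sum(s * s for s in sizes) / total) / (len(sizes) - 1)

def amova_components(ss_among, df_among, ss_within, df_within, n0):
    """Partition variance into among- and within-population components."""
    ms_among = ss_among / df_among
    ms_within = ss_within / df_within
    var_within = ms_within
    var_among = max((ms_among - ms_within) / n0, 0.0)
    total = var_among + var_within
    return {
        "ms_among": ms_among,
        "ms_within": ms_within,
        "pct_among": 100.0 * var_among / total,
        "pct_within": 100.0 * var_within / total,
        "phi_pt": var_among / total,
    }

# Reported SS and df; n0 = 43 / 5 assumes equal group sizes (assumption).
result = amova_components(585.191, 4, 3320.251, 38, n0=43 / 5)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

If the actual group sizes are known, `n0_from_sizes` can supply the coefficient instead of the equal-size approximation.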

Genetic Diversity Within-Population: Turkish female and male populations exhibited higher genetic diversity (Na = 1.507 ± 0.036, Ne = 1.226 ± 0.011) than their Unknown counterparts, indicating a richer allelic variation among these Turkish groups (Table 2). Shannon's Information Index (I = 0.258 ± 0.009) also reflected this heightened diversity, emphasizing the complexity of genetic information within these populations. Expected Heterozygosity (He) and Unbiased Expected Heterozygosity (uHe) were also comparatively higher in Turkish populations.
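The per-locus indices named above can be sketched as follows. This assumes the estimators commonly applied to dominant binary (diploid) markers under Hardy-Weinberg equilibrium, where the null-allele frequency is q = sqrt(1 - band frequency); whether the study used exactly these formulas is an assumption, and the function name is hypothetical.

```python
import math

def locus_diversity(band_freq, n):
    """Diversity indices for one dominant binary locus.

    band_freq: proportion of individuals showing the band.
    n: number of individuals scored in the population.
    """
    q = math.sqrt(1.0 - band_freq)   # estimated null-allele frequency
    p = 1.0 - q                      # estimated band-allele frequency
    ne = 1.0 / (p * p + q * q)       # effective number of alleles
    shannon = -sum(x * math.log(x) for x in (p, q) if x > 0)
    he = 2.0 * p * q                 # expected heterozygosity
    uhe = (2.0 * n / (2.0 * n - 1.0)) * he  # small-sample correction
    return {"p": p, "q": q, "Ne": ne, "I": shannon, "He": he, "uHe": uhe}

# Example: a band present in 75% of 10 individuals.
print(locus_diversity(0.75, 10))
```

Population-level values such as those in Table 2 are then means (with standard errors) of these per-locus quantities across all loci.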

Table 2 Band Frequencies, Estimated Allele Frequencies, and Estimated Heterozygosity by Population for Binary (Diploid) Data Mean and standard error (SE) over Loci of the Populations

Genetic Differences Between-Populations: Comparing populations, Turkish females and males displayed significantly higher genetic diversity (percentage of polymorphic loci = 74.91% and 70.73%, respectively) compared to Unknown females, Unknown males, and monoecious individuals.

Overall Genetic Diversity: The grand mean across loci and populations revealed an average genetic diversity (Na = 1.097 ± 0.018, Ne = 1.199 ± 0.005) and polymorphism (percentage of polymorphic loci = 53.52% ± 8.00%). Table S10 presents a detailed summary of band patterns among the five Cannabis populations, shedding light on the genetic signatures of each group. The Turkish female and male populations exhibited 435 and 416 different bands, respectively. Notably, 55 private bands in Turkish females and 43 in Turkish males represent distinctive genetic markers exclusive to each group, underlining gender-specific genetic variability. By comparison, the Unknown female, Unknown male, and monoecious populations displayed 244, 273, and 243 bands, respectively. Among these, 8 private bands in Unknown females, 6 in Unknown males, and 18 in monoecious individuals signify rare, population-specific genetic elements within these groups. Intriguingly, no locally common bands (≤ 25% and ≤ 50%) were detected in any population.
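The band-pattern statistics summarized above (number of bands, private bands, percentage of polymorphic loci) can be derived from a binary presence/absence matrix roughly as in this sketch. The data layout and function name are assumptions made for illustration, and the toy scores are invented, not taken from Table S10.

```python
def band_summary(populations):
    """Summarize band patterns per population.

    populations: dict mapping population name to a list of individuals,
    each individual being a list of 0/1 band scores (same loci order).
    """
    n_loci = len(next(iter(populations.values()))[0])
    # A band counts as present in a population if any individual shows it.
    present = {
        name: [any(ind[locus] for ind in inds) for locus in range(n_loci)]
        for name, inds in populations.items()
    }
    summary = {}
    for name, inds in populations.items():
        private = sum(
            1 for locus in range(n_loci)
            if present[name][locus]
            and not any(present[other][locus]
                        for other in populations if other != name)
        )
        # Polymorphic within the population: band neither absent nor fixed.
        polymorphic = sum(
            1 for locus in range(n_loci)
            if 0 < sum(ind[locus] for ind in inds) < len(inds)
        )
        summary[name] = {
            "bands": sum(present[name]),
            "private_bands": private,
            "pct_polymorphic": 100.0 * polymorphic / n_loci,
        }
    return summary

# Invented toy scores for two populations and four loci.
toy = {
    "A": [[1, 0, 1, 0], [1, 1, 0, 0]],
    "B": [[1, 0, 0, 1], [1, 0, 0, 1]],
}
print(band_summary(toy))
```

A private band in this sense is one observed in a single population only, which is why it is informative about population-specific variation.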

Genetic relationships among individuals revealed by UPGMA

As depicted in Fig. 6, the dendrogram revealed a complex structure characterized by four primary branches. Two of these were separate branches with unique genetic profiles, one for the Narlisaray females and one for Maltepe2. The main cluster (A), encompassing a diverse array of female and male plants, mainly Turkish, bifurcated into two sub-branches. One sub-branch (A-1) displayed striking homogeneity among nine Turkish females. The other, more extensive sub-branch (A-2) encompassed a diverse array of female, male, and monoecious plants and revealed intriguing patterns of genetic relatedness. For instance, Unknown females formed a distinct sub-group located close to the Turkish Kavacik individual, whereas the monoecious samples clustered in another sub-group that included the female landraces of Van. A further part of A-2 consisted primarily of male individuals, comprising the Unknown males and two Turkish males, Narlisaray and Maltepe1. A clear pattern also emerged among males within the second main branch of the dendrogram (B), in which Turkish males constituted the predominant cluster. This branch divided into two sub-branches: one (B-1) comprised exclusively Turkish males, whereas the other (B-2) included a mixture of Turkish males and one male of Unknown origin, identified as "Unk-5."

Fig. 6

Unweighted Pair-Group Method with Arithmetic Averaging (UPGMA) tree showing relationships between 43 industrial hemp individuals revealed by Inter Simple Sequence Repeat (ISSR) markers. Individual names are followed by the letters F, M, and Mo, which correspond to females, males, and monoecious plants, respectively.
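The clustering principle behind a UPGMA tree can be sketched in a few lines: the two closest clusters are merged repeatedly, and distances to the merged cluster are size-weighted averages. The function below is a minimal illustration on an invented toy matrix; it returns only the tree topology as nested tuples, and the distance coefficient used to build the published dendrogram is not assumed here.

```python
def upgma(labels, dist):
    """Build a UPGMA tree as nested tuples.

    labels: list of sample names.
    dist: symmetric matrix (list of lists) of pairwise distances.
    """
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (labels[i], 1) for i in range(len(labels))}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is
        # the average weighted by the number of leaves in a and b.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Invented toy distances: A and B are close, C is distant from both.
toy_labels = ["A", "B", "C"]
toy_dist = [[0, 2, 6],
            [2, 0, 6],
            [6, 6, 0]]
print(upgma(toy_labels, toy_dist))
```

Branch heights (half the merge distance) are omitted to keep the sketch short; tree-drawing software adds them when rendering the dendrogram.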

Identification of a sex-associated 785-bp fragment

The Mixed Linear Model (MLM), incorporating population structure (Q) and the kinship matrix (K), revealed one marker-trait association (MTA) at a significance level of p < 0.05, associated specifically with the male gender (Table 3). A distinct 785-bp fragment was consistently observed in all male plants tested, whereas it was absent in female and monoecious plants. This 785-bp fragment, generated by the ISSR880 primer, exhibited a strong association with the male sex phenotype in hemp. Its presence exclusively in male individuals and its absence in females and monoecious plants suggest its potential as a sex-linked marker in hemp (Fig. S5).
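A simple way to check how well such a band discriminates males in a new set of plants is to tabulate band presence against known sex, as sketched below. This is not the MLM analysis itself, only a hypothetical validation helper; the record format, function name, and toy records are invented for illustration.

```python
def marker_diagnostics(records):
    """Sensitivity and specificity of a band as a male-sex marker.

    records: iterable of (sex, band_present) pairs, with sex coded as
    "M", "F" or "Mo" and band_present as 0/1. Assumes the input
    contains at least one male and one non-male plant.
    """
    records = list(records)
    tp = sum(1 for sex, band in records if sex == "M" and band)
    fn = sum(1 for sex, band in records if sex == "M" and not band)
    fp = sum(1 for sex, band in records if sex != "M" and band)
    tn = sum(1 for sex, band in records if sex != "M" and not band)
    return {
        "sensitivity": tp / (tp + fn),   # males showing the band
        "specificity": tn / (tn + fp),   # non-males lacking the band
    }

# Invented toy records mimicking a perfectly male-specific band.
toy = [("M", 1), ("M", 1), ("F", 0), ("F", 0), ("Mo", 0)]
print(marker_diagnostics(toy))
```

A band that is present in every male and absent from every female and monoecious plant, as described above, would score 1.0 on both measures.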

Table 3 The marker that shows a significant association with male-gender using Mixed Linear Model (MLM), incorporating Population structure (Q) and Kinship matrix (K) models in industrial hemp


CBD variability in industrial hemp individuals

While industrial hemp is primarily cultivated for fiber and seed production, the presence of cannabinoids adds significant value to these crops. To optimize harvest outcomes, it is essential to understand how chemical composition changes across the stages of hemp growth. As a foundational step toward establishing a cannabis breeding program, our study aimed to characterize the CBD content of 43 hemp individuals, mainly from Türkiye. Our findings highlight the potential of Türkiye as a new source region of diverse cannabis landraces with varying CBD content. Examining various plant parts and growth stages, we uncovered substantial variation in CBD levels. Female inflorescences at the flowering stage exhibited significantly higher CBD content than other plant parts, highlighting the importance of precise developmental timing in maximizing cannabinoid production. This finding agrees with previous research, including Burgel et al. (2020), Yang et al. (2020), and Abdollahi et al. (2021), all of which demonstrated that the flowering stage represents the peak of CBD synthesis. This consensus strengthens our observations and emphasizes the pivotal role of the flowering stage in cannabinoid production across diverse hemp genotypes.

Furthermore, our analysis indicated a general decrease in CBD content across all studied individuals at the harvesting stage, highlighting the need for precise harvest timing to optimize CBD yields. Several individuals, namely Mısır Çarsısı, Maltepe1, and Van, showed notably elevated CBD content in female inflorescences (4.965%, 5.309%, and 8.501%, respectively). These landraces are valuable genetic resources that hold promise for developing high-CBD hemp strains, and their substantial CBD content underscores their potential for medicinal and industrial uses.

In the Yozgat region, our investigation uncovered significant variation in CBD content among the hemp cultivars Finola, USO 31, Santhica 27, Futura 75, Ferimon, and Fedora 17, with respective values of 0.554%, 0.822%, 2.040%, 3.086%, 1.794%, and 2.280%. Notably, these values were markedly higher than the CBD levels reported by Burgel et al. (2020) for the same cultivars, which were 0.152%, 0.111%, 0.114%, 0.147%, 0.233%, and 0.331%, respectively.
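The size of the difference between the two sets of values can be expressed as a fold change per cultivar. The short sketch below does this with the percentages quoted above; the variable names are arbitrary.

```python
# CBD content (%) for the same cultivars: this study vs Burgel et al. (2020),
# as quoted in the text.
this_study = {
    "Finola": 0.554, "USO 31": 0.822, "Santhica 27": 2.040,
    "Futura 75": 3.086, "Ferimon": 1.794, "Fedora 17": 2.280,
}
burgel_2020 = {
    "Finola": 0.152, "USO 31": 0.111, "Santhica 27": 0.114,
    "Futura 75": 0.147, "Ferimon": 0.233, "Fedora 17": 0.331,
}

# Fold change = value in this study / value in the earlier report.
fold_change = {
    cultivar: this_study[cultivar] / burgel_2020[cultivar]
    for cultivar in this_study
}

for cultivar, fold in sorted(fold_change.items(), key=lambda kv: kv[1]):
    print(f"{cultivar}: {fold:.1f}-fold higher")
```

Such ratios compare reported percentages only; they do not account for differences in sampling stage, plant part, or analytical method between the studies.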

Furthermore, Glivar et al. (2020) also documented lower CBD concentrations over two seasons for USO 31, Santhica 27, Ferimon, and Fedora 17, with values (except for Finola) of 0.757% and 0.479%. Finola, in contrast, exhibited relatively similar CBD concentrations in our study and in that of Glivar et al. (2020). Notably, Turkish individuals consistently displayed significantly higher CBD levels in comparison. These disparities in CBD content among the same hemp cultivars in different investigations highlight the influence of environmental factors, cultivation practices, and possibly genetic variation on cannabinoid production. The observed disparity in CBD content between Turkish, Unknown, and foreign cultivars underscores the intricate nature of cannabinoid synthesis in industrial hemp.

Further genomic and molecular investigations are essential to identify the specific genetic factors and underlying mechanisms that contribute to the elevated CBD levels observed in Turkish individuals. Environmental factors, including climate, soil composition, and sunlight exposure, profoundly influence plant metabolism, including cannabinoid biosynthesis. Variation in these parameters between Turkish cultivation regions and foreign hemp-growing areas could underpin the observed differences in CBD content. Microclimatic conditions, in particular, might modulate the expression of genes related to cannabinoid production. Exploring the specific environmental conditions of each region is therefore essential for understanding the ecological regulation of CBD synthesis. The disparity in sample size between Turkish, Unknown, and foreign individuals also raises the question of whether sample size influenced the observed differences in CBD content. However, the smaller number of Unknown and foreign cultivars is unlikely to explain their lower CBD content, because published reports for the same cultivars also showed lower CBD contents, in relative agreement with our results. This consistency suggests that factors beyond sample size are at play and calls for further comprehensive investigation of cannabinoid synthesis in different hemp genotypes.

Genetic population structure in Cannabis sativa

Recent years have witnessed a surge in research on genetic diversity within Cannabis species, including industrial hemp, using molecular tools such as ISSR markers. These investigations have contributed significantly to our understanding of the genetic structure of Cannabis populations. ISSR primers were used to assess the genetic variation of industrial hemp for several reasons: the versatility and polymorphism of ISSR markers, their cost-effectiveness and ease of use, and their rapid results and applicability in population genetics. Moreover, several investigations have successfully applied ISSR markers to Cannabis genetics, exploring genetic diversity, hermaphroditism, population structure, and the characterization of seized Cannabis, which highlights their effectiveness in addressing diverse research questions. In the present study, Structure analyses unraveled intricate patterns of intraspecific differentiation. The model-based clustering approach identified three distinct populations (K = 3) among the studied Cannabis sativa individuals, supported by a substantial Delta K value of 7, signifying robust clustering. These clusters correlated with the sex of the plants, revealing evident genetic distinctions among males, females, and monoecious individuals. Mean Fst values (0.3168, 0.1479, and 0.1115) indicated varying degrees of genetic differentiation among the three clusters, supporting the existence of distinct genetic subpopulations. Additional peaks in the Delta K plot suggested finer-scale genetic structure, and further exploration at K = 5 and K = 8 uncovered additional layers of genetic structuring (Fig. S3).
At K = 5, individuals not only clustered by sex but also grouped to some extent by geographical origin, highlighting the influence of both genetic and geographical factors on population diversity. At K = 8, individuals were partitioned into seven clusters, showing similarities based on both gender and geographical origin. The genetic lineages observed within Turkish females, Unknown females, and the various male groups emphasize the interplay of genetic factors and geographic origin in shaping genetic diversity among Cannabis sativa individuals and point to shared genetic elements between these individuals. This aligns with Liu et al.'s (2023) exploration of the genomic basis of geographical differentiation and breeding selection for plant architecture traits.
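The Delta K statistic referred to above is usually computed with the Evanno method from the log-likelihoods of replicate runs at each K. The sketch below follows the common practice of using replicate means for the second-order difference and dividing by the standard deviation at K. The log-likelihood values are invented purely to show the mechanics and are unrelated to the study's runs; the function name is hypothetical.

```python
import statistics

def delta_k(lnp):
    """Evanno-style Delta K from replicate log-likelihoods.

    lnp: dict mapping K to a list of LnP(D) values from replicate runs.
    Returns Delta K for every K that has both neighbours K-1 and K+1.
    """
    mean = {k: statistics.mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in mean and k + 1 in mean:
            sd = statistics.stdev(lnp[k])
            if sd > 0:
                second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
                result[k] = second_diff / sd
    return result

# Invented replicate log-likelihoods (two runs per K) for illustration.
toy_runs = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-50.0, -52.0],
    4: [-48.0, -50.0],
}
scores = delta_k(toy_runs)
best_k = max(scores, key=scores.get)
print(scores, "best K:", best_k)
```

The K with the highest Delta K is taken as the most likely number of clusters, while secondary peaks, such as those mentioned at K = 5 and K = 8, can hint at finer substructure.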

Genetic analysis of individuals and hemp population relationships

In the Turkish subset, both males and females exhibited diverse genetic configurations, with females showing a slightly lower average genetic distance. The genetic distance between Turkish females and Turkish males was relatively low, indicating only moderate genetic divergence within the Turkish Cannabis population. This close genetic proximity points to shared genetic traits shaped by historical and geographical factors and may reflect gender-specific gene flow. Unknown females and males displayed varying genetic distances, highlighting overall diversity. Despite geographical disparities, Unknown males exhibited close genetic relationships. In monoecious individuals, genetic distances indicated moderate diversity. The genetic identity among monoecious populations and their divergence from other groups is noteworthy. The relatively high genetic identity values within the monoecious populations underscore the genetic coherence among monoecious individuals, indicating a shared genetic background despite geographical distinctions.

Gender-based variation was apparent, with lower genetic distances in females and an intermediate value in monoecious groups, suggesting nuanced genetic differentiation. Males of Turkish origin and those of Unknown origin shared a high genetic identity value of 0.973. Similarly, comparing Turkish females with Unknown females and with foreign monoecious plants gave genetic identity values of 0.957 and 0.962, respectively. These values indicate substantial genetic overlap between Turkish, Unknown, and foreign populations, pointing to potential historical gene flow and shared genetic heritage. Genetic identity analysis can guide breeders in selecting parent plants with optimal genetic distances to enhance desired traits. The observed genetic distances also highlight the potential for harnessing diverse genetic traits from these individuals, contributing to the development of hybrid cultivars with desired characteristics.

Notably, the PCoA arranged the individuals and populations logically, with the closest genetic distance observed between the monoecious and Unknown female groups. Conversely, the largest genetic disparities were those between Turkish males and either Unknown females or monoecious individuals.

The clear and separate genetic profiles revealed by PCoA for Turkish and Unknown females and males and for monoecious plants underscored significant gender distinctions. Despite the proximity between Turkish and Unknown males, their distinct clustering suggested similar genetic sources combined with localized adaptation or historical isolation, highlighting the impact of regional factors on genetic differentiation. An intriguing finding was an admixture cluster comprising Unknown females, monoecious populations, and one Turkish female individual, indicating complex genetic intermingling and potential shared genetic heritage or historical connections. This admixture suggests interactions and gene flow among diverse groups, providing valuable insights into their evolutionary history. The PCoA focusing on the different gender populations highlighted significant distinctions between males, females, and monoecious individuals, with males maintaining a considerable separation from females. The close association between the monoecious and female populations indicated higher genetic similarity. Furthermore, the PCoA based on both gender and geographic origin revealed clearly divergent genetic positions, emphasizing shared genetic traits between the monoecious and Unknown female groups and large genetic disparities between the Unknown female and Turkish male groups. These findings underscore the influence of both gender and geographic origin on the genetic structure of the Cannabis sativa populations studied.

Analysis of molecular variance

AMOVA elucidated the genetic dynamics of the Cannabis sativa populations, highlighting the impact of gender and geographical origin on overall genetic diversity. Eight percent of the total genetic variance was ascribed to differences among populations, whereas the remainder resided within populations, emphasizing the role of localized environmental factors and individual genetic variation within the designated groups. This finding agrees with Soorni et al. (2017) and Soler et al. (2017), who also reported substantial within-population variation compared with between-population estimates. The high variation within Iranian Cannabis populations (93.09% to 95.74%) observed by Soorni et al. underscores the importance of understanding the complexity of genetic diversity within specific regions. Additionally, Soler et al.'s findings, indicating significant variation among individuals within cultivars, emphasize the role of within-individual variation, especially heterozygosity, in shaping the genetic landscape of Cannabis populations. The study by Shams et al. (2020) further supports our findings; their reported high genetic differentiation (Gst = 0.36) and genetic diversity (Hs = 0.141) highlight the complexity of genetic interactions within Cannabis populations. Our study revealed substantial differences between Turkish, Unknown, and foreign monoecious populations. Turkish female and male groups exhibited higher genetic diversity, as indicated by the number of alleles, Shannon's Information Index, Expected Heterozygosity, and Unbiased Expected Heterozygosity, pointing to a broader genetic pool within these groups. Private bands in Turkish females and males underscored gender-specific genetic variability unique to Turkish Cannabis individuals and emphasized the substantial genetic diversity inherent in Turkish Cannabis.
In contrast, Unknown populations showed fewer private bands. The absence of locally common bands (≤ 25% and ≤ 50%) across all populations is intriguing, suggesting a scarcity of widely shared genetic traits and highlighting the distinctive genetic composition of each population. These findings underscore the discernible genetic structure differentiating Turkish populations from foreign or Unknown populations, a structure potentially influenced by regional adaptation, historical factors, or selective pressures.

Phylogenetic relationships

The resulting dendrogram revealed a complex structure characterized by distinct clusters, shedding light on the genetic profile of these Cannabis sativa individuals. Turkish female individuals displayed unique genetic profiles, with separate branches for Narlisaray and Maltepe2. This diversity emphasizes the significant genetic variation present even among individuals of the same geographical origin. Such distinct genetic signatures within Turkish female individuals underscore the importance of targeted breeding efforts and conservation strategies for preserving these unique genetic resources. The largest main branch (A) encompassed a diverse array of female, male, and monoecious plants, revealing intricate patterns of genetic relatedness. One sub-branch (A-1) highlighted the homogeneity among nine genetically similar Turkish females, indicating shared genetic ancestry and a potential common breeding history. In contrast, another sub-branch exhibited a mixed composition, comprising Finola, Unknown females, monoecious plants, and Turkish females. Despite geographical disparities, this diverse group showed substantial genetic affinity, suggesting shared evolutionary trajectories and historical genetic exchange among these populations. The proximity of Finola, Unknown females, and monoecious plants within these sub-branches suggested genetic relatedness transcending geographical distance. Turkish males formed a predominant cluster within the second main branch (B), indicating a strong genetic affinity among these individuals. This clustering suggests shared genetic elements, reflecting a common genetic heritage among Turkish male hemp plants. Further analysis revealed two distinct sub-branches within this main branch. One sub-branch (B-1) exclusively comprised Turkish males, highlighting a specific genetic lineage among these domestic individuals.
In contrast, the other sub-branch (B-2) included a mix of Turkish and Unknown males, specifically the individual "Unk-5." The coexistence of Turkish, Finola, and Unknown males in the same sub-branch (A-2) also indicates potential genetic kinship between these plants, suggesting genetic relationships that transcend geographical boundaries.

Significance of the identified sex-associated marker in hemp breeding

The detected MTA indicated a statistically significant association between the molecular marker ISSR880bp785 and the male gender trait. The marker explained 50% of the phenotypic variation, indicating substantial explanatory power for the observed variability in gender. These findings offer valuable insights into the potential role of the ISSR880bp785 marker in relation to the gender trait in the studied plants. The discovery of the sex-associated 785-bp fragment generated by ISSR880 is an important advance for the industrial hemp industry, with direct implications for cultivation practices. Accurate gender identification is essential for optimizing hemp breeding efforts, with male plants playing a pivotal role in controlled cross-pollination to develop new hemp cultivars. This gender-specific marker expedites the identification and selection of male plants, streamlining the breeding process and facilitating the creation of hemp cultivars tailored to specific purposes. While ISSR880 shows promise, acknowledging potential challenges is essential.

Further validation research must confirm its specificity across hemp cultivars, assess stability under diverse environmental conditions, and ensure applicability to different genetic backgrounds. Collaborative efforts among geneticists, breeders, and biotechnologists are crucial to addressing these challenges and fully leveraging ISSR880's potential in industrial hemp breeding. The development of user-friendly assays, including sequence-characterized amplified region (SCAR) markers, should be a focus to facilitate widespread adoption in hemp breeding programs guided by the gender-specific marker ISSR880. Marker-assisted selection becomes a powerful tool, providing hemp breeders with a precise method for selecting male plants, ensuring a more efficient and targeted breeding process in industrial hemp cultivation.


In this investigation, we examined cannabinoid variability and genetic diversity within Cannabis sativa, focusing for the first time on Turkish hemp populations. The noteworthy differences in CBD content across plant parts and growth stages underscore the need for carefully timed harvesting. The identification of high-CBD Turkish individuals opens promising prospects for breeding programs aimed at optimizing cannabinoid yields, and this regional specificity in CBD production has significant implications for scientific research and industrial applications. Our genetic analyses revealed pronounced distinctions between male, female, and monoecious plants, alongside subtle yet discernible genetic variation separating Turkish, Unknown, and foreign monoecious populations. The richness of Turkish hemp landraces may provide novel allelic diversity. Of particular significance is the identification of a sex-associated 785-bp fragment generated by the ISSR880 primer, which may serve as a gender-specific marker exclusive to male individuals in industrial hemp and could prove highly valuable to the hemp industry. Together, these findings on CBD content and genetic structure can guide future research and conservation initiatives and support the sustainable cultivation and rational utilization of industrial hemp.