Cannabis sativa L. (hemp or cannabis) belongs to the small family Cannabaceae, which comprises ten genera, with Humulus (hop, 3 species) as sister genus (Yang et al. 2013). The genus Cannabis is monotypic (Small and Cronquist 1976) and, as one of the oldest known crop plants, has been distributed globally. C. sativa is a multi-purpose crop: its fibre is used for paper, textiles and construction materials (Karus and Vogt 2004), its seeds for food and feed (Callaway 2004), and its female inflorescences as medicine (Ben Amar 2006) and as a psychotropic drug (Szendrei 1998).

Central Asia is regarded as the origin of C. sativa (de Candolle 1885; McPartland et al. 2018). According to McKim (2003), the first archaeological discoveries in China are from the Neolithic period, around 4000 BC, while Long et al. (2017) date the first utilization to 8000 BC. In contrast to the prevalent hypothesis of a Central Asian origin, molecular evidence suggests that the species probably comes from a low-latitude region of India (Zhang et al. 2018a). The most precise localisation of the origin was proposed by McPartland et al. (2018), who identified the northeastern Tibetan Plateau near Qinghai Lake as the origin by pollen analysis. From there, C. sativa spread to Europe 6 million years ago, to eastern China 1.2 million years ago and to India 32.6 thousand years ago. C. sativa favors a mild climate with sufficient water and sunlight, and early humans spread it into a range of favorable temperate and sub-tropical niches, where it became naturalized throughout Eurasia, in parts of Africa, and more recently in the New World (Clarke and Merlin 2016).

Molecular plant phylogeographic studies have mostly relied on the chloroplast (cp) genome because of the low mutation rate of this single, non-recombining unit of inheritance (Schaal et al. 1998). In C. sativa, cpDNA has proven to be a valuable tool for such analyses, with sufficient variability at the interpopulation level (Gilmore et al. 2007; Zhang et al. 2018a).

Genebank collections are composed of genetic materials that originate from in situ conditions or from breeding and research programs. Many of these materials can no longer be found in situ for a variety of reasons (Fowler and Hodgkin 2004). Genebanks were created at the beginning of the twentieth century as repositories to preserve genetic diversity and to give breeders easy access to genetic materials (Fowler and Hodgkin 2004). The difficulty in maintaining such resources for medicinal plants lies in the very large number of taxa (Lohwasser and Weise 2020).

This study develops molecular markers from the cpDNA of C. sativa to analyze the variability and relationships of cannabis accessions from the Gatersleben genebank and of published plastomes.

Material and methods

Development of chloroplast SNP markers

The chloroplast genomes of three C. sativa genotypes (GenBank ID/cultivar: KR363961/‘Yoruba Nigeria’ (Oh et al. 2016), KP274871/‘Carmagnola’, KR779995/‘Dagestani’ (Matielo et al. 2020)) were assembled, SNPs were localised and primers for SNP candidates were developed with Primer3 (Untergasser et al. 2012) as implemented in Geneious Prime 2020.2.2 (Biomatters Ltd.). The SNP candidates were tested by high-resolution melting analysis on a test set of 33 individuals from different accessions. Of the 29 SNP candidates, 8 markers were selected on the basis of easily distinguishable curve types. For each marker, two individuals representing the two curve types were selected and subsequently added as references to each run.

Sample material

Fifty-three accessions of the IPK collection were grown in the greenhouse and leaves from 10 plants per accession were sampled, with the exception of CAN33 and CAN60, where only 9 and 8 plants could be sampled, respectively. The study therefore comprised 527 Cannabis individuals. The leaf samples were dried at 38 °C in a drying oven and stored until analysis.

DNA extraction, PCR and HRM

DNA was extracted with a modified CTAB method (Schmiderer et al. 2013). Concentration and quality of the DNA were determined by 1.5% agarose gel electrophoresis and on a NanoDrop 2000 (Fisher Scientific). HRM with pre-amplification was performed on a Rotor-Gene 6000 (Qiagen). For a 10 µl PCR reaction, 1 µl of genomic DNA (1:100 dilution of the original DNA extract) was added to a master mix containing 1× HOT FIREPol® EvaGreen® HRM Mix (no ROX) (Solis BioDyne) and 100 nM each of forward and reverse primer (ordered from Life Technologies). The PCR cycle profile included a denaturation step at 95 °C for 14 min, followed by 45 cycles (95 °C for 10 s, 59 °C for 20 s and 72 °C for 20 s) and a final denaturation step at 95 °C for 30 s. For high-resolution melting curve analysis (HRM), the temperature was increased from 69 °C to 81 °C at 0.1 °C/s. All reactions were run in duplicate, with non-target controls in each run.
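The quantities implied by this protocol can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original workflow: the constants are copied from the text, the function names are hypothetical, and instrument ramp times between steps are ignored.

```python
# Thermal profile as stated in the text; ramping between steps is ignored.
INITIAL_DENATURATION_S = 14 * 60      # 95 °C for 14 min
CYCLES = 45
CYCLE_HOLDS_S = (10, 20, 20)          # 95 °C, 59 °C, 72 °C
FINAL_DENATURATION_S = 30             # 95 °C for 30 s
MELT_START_C, MELT_END_C = 69.0, 81.0
MELT_RATE_C_PER_S = 0.1


def pcr_hold_time_s() -> int:
    """Sum of all programmed hold times of the amplification, in seconds."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_HOLDS_S) + FINAL_DENATURATION_S


def melt_duration_s() -> float:
    """Duration of the melting ramp, in seconds."""
    return (MELT_END_C - MELT_START_C) / MELT_RATE_C_PER_S


def primer_amount_pmol(conc_nm: float, volume_ul: float) -> float:
    """Amount of one primer per reaction (nmol/L * µl = fmol; /1000 gives pmol)."""
    return conc_nm * volume_ul / 1000.0


def final_template_dilution(stock_dilution: float, template_ul: float, reaction_ul: float) -> float:
    """Dilution factor of the original extract in the final reaction volume."""
    return stock_dilution * reaction_ul / template_ul


if __name__ == "__main__":
    print(pcr_hold_time_s() / 60, "min of programmed holds")
    print(round(melt_duration_s()), "s melting ramp")
    print(primer_amount_pmol(100, 10), "pmol of each primer per reaction")
    print("1 :", final_template_dilution(100, 1, 10), "final template dilution")
```

Under these assumptions the programmed holds add up to 52 min, the melting ramp takes about 2 min, each reaction contains 1 pmol of each primer, and the original extract is diluted 1:1000 in the final reaction.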

Statistical analysis

The statistical analyses were done with R 3.6.2 (R Core Team 2019) under RStudio 1.2.5033 (RStudio Team 2019) using the packages poppr (Kamvar et al. 2014, 2015) and ggtree (Yu et al. 2017). Distances were calculated according to Prevosti et al. (1975). In addition, the Simpson index (Simpson 1949), Nei’s expected heterozygosity (HEXP) (Nei 1978) and the genetic differentiation GST (Hedrick 2005) were calculated. As a measure of linkage disequilibrium, \(\bar{r}_d\) (an adapted form of the index of association \(I_A\) (Brown et al. 1980)) was calculated (Agapow and Burt 2001).
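For orientation, the indices named above are commonly defined as follows. This is a sketch that assumes the conventions of the poppr package apply. The symbols \(p_i\) (frequency of the \(i\)-th allele or haplotype among \(k\)), \(n\) (sample size), \(V_O\), \(V_E\) and \(\mathrm{var}_j\) are notation introduced here for illustration and are not taken from the original text.

\[
\lambda = 1 - \sum_{i=1}^{k} p_i^{2}, \qquad
H_{EXP} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{k} p_i^{2}\right)
\]

\[
I_A = \frac{V_O}{V_E} - 1, \qquad
\bar{r}_d = \frac{V_O - V_E}{2\sum_{j}\sum_{k>j}\sqrt{\mathrm{var}_j\,\mathrm{var}_k}}
\]

Here \(V_O\) is the observed variance of the number of loci at which two individuals differ, \(\mathrm{var}_j\) is the variance contributed by locus \(j\), and \(V_E = \sum_j \mathrm{var}_j\) is the variance expected without linkage. Unlike \(I_A\), \(\bar{r}_d\) does not grow with the number of loci, which makes values comparable between marker sets.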

The identified haplotypes were compared in Geneious Prime 2020.2.2 (Biomatters Ltd.) with complete chloroplast genomes of hop species (Humulus lupulus (KT266264, MG573060), H. scandens (MH118122) and H. yunnanensis (MK423880)) and cannabis (C. sativa (KY084475), C. sativa cv. ‘Carmagnola’ (NC026562), C. sativa cv. ‘Dagestani’ (KR779995), C. sativa cv. ‘Yoruba Nigeria’ (NC027223), C. sativa cv. ‘Yunma 7’ (MW013540), C. sativa (MH118118) and C. sativa cv. ‘Cheungsam’ (KR184827)).


Results

Marker development

Fifty-three accessions of C. sativa from the Gatersleben genebank were analysed with 8 chloroplast markers using high-resolution melting analysis (HRM). To identify candidate markers, three published chloroplast genomes of C. sativa were aligned and 38 polymorphisms (16 indels and 16 SNPs, of which 7 were transitions) were identified (Supplementary Table 1). Candidates were preselected on their theoretical suitability for HRM (see Supplementary Fig. 1 for an example). These candidates were evaluated with a small sample set and then narrowed to a set of one indel (marker P18) and seven SNPs (Tables 1, 2). As in many plant species, cannabis chloroplast DNA contains two inverted repeats (26,011 bp each), which separate a large single-copy region (84,059 bp) from a small single-copy region (17,829 bp) (Zhang et al. 2018b). All markers but one were located in the large single-copy region; only marker P10 was in the small single-copy region. Five markers were intergenic, one (P21) was in an intron of rps16, and two were in coding regions (P18 in matK-trnK-UUU and P12 in rps11) (Table 1).

Table 1 cp markers used for the analysis of the 527 individuals of Cannabis sativa
Table 2 Characteristics of the eight selected markers

Description of the markers

The expected heterozygosity of the markers over all populations was on average 0.37. Most markers fell in a narrow range between 0.45 and 0.5 (Table 2), while two markers were very low at 0.063 (P26) and 0.077 (P10). The average GST over all markers was 0.87, with P26 (0.76) and P10 (0.96) as the two extremes; all other markers ranged between 0.87 and 0.91.


In total, 6 haplotypes could be identified, denominated haplotypes ‘A’ to ‘F’ (Figs. 1, 2). Of the individuals, 84% were classified as either haplotype ‘A’ (34%) or ‘F’ (50%). The remaining haplotypes successively fill the gap between the two most extreme haplotypes ‘A’ and ‘F’ (‘B’: 4%, ‘C’: 1%, ‘D’: 3% and ‘E’: 8%). Thirty-seven accessions (70%) were homogeneous with only one haplotype (Fig. 1). Of these 37 homogeneous populations, 21 were pure ‘F’, 10 pure ‘A’, 3 pure ‘E’, 2 pure ‘B’ and 1 pure ‘D’. Haplotype ‘C’ was only present in mixed accessions. Among the heterogeneous accessions, 12 had two haplotypes and four had three haplotypes. Most of the heterogeneous populations (9 of the 12 populations with two haplotypes) contained the haplotypes ‘A’ and ‘F’.

Fig. 1 Number of multilocus genotypes observed in each population. N = 527, number of haplotypes = 6 (A–F)

Fig. 2 Minimum spanning network. Genetic distance and relationship between individuals; A–F, haplotypes identified in this study

No geographical pattern of haplotypes could be observed from the accessions’ passport data (data not shown). However, the best-known European fibre-type cultivars consisted of haplotype ‘F’ or of ‘F’ mixed with another haplotype (in most cases the ‘A’-type). For ‘Fibrimon’ and ‘Kompolti’, three accessions per cultivar from different providers were in our sample set. All ‘Fibrimon’ accessions were pure ‘F’ haplotype, while two accessions of cv. ‘Kompolti’ were ‘F’-type and one was a mixed ‘A/F’-type. Other fibre cultivars in the genebank could also be grouped as either pure ‘F’-type or mixed ‘A/F’-type. Pure ‘F’-types were ‘Fibrimon 21’, ‘Juso 14’, ‘Fasamo’ and ‘Schurig’. Mixed ‘A/F’-types were ‘Fibrimon 56’, ‘Eletta Campana’, ‘Superfibra’, ‘Lovrin 110’, ‘Futura’ and ‘Havelländische’. The Italian ‘Carmagnola in Selezione’ was, as the single exception among the fibre accessions, a pure ‘A’-type.

Basic population statistics were calculated separately for the heterogeneous populations and as an overall mean for all populations (Table 3). In the heterogeneous populations, the Shannon–Wiener index of haplotype diversity ranged from 0.33 to 0.944 and the Simpson index from 0.18 to 0.56. A number of mixed populations had only one individual of a different haplotype (evenness = 0.57). In only one population (52) were the numbers of individuals of the different haplotypes in balance (evenness = 1). The expected heterozygosity ranged from 0.06 to 0.41. The level of linkage disequilibrium was highly significant in all heterogeneous populations and ranged from 0.58 to 1. Overall (i.e. including all individuals), the Shannon–Wiener index was 1.19, Simpson’s index 0.63, the evenness 0.73, the expected heterozygosity 0.37 and the index of association 0.53. In the analysis of molecular variance (AMOVA), the populations were well differentiated, with 79% of the variation located among populations (Table 4).
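The boundary values of these indices follow directly from haplotype counts. The sketch below is a minimal Python illustration, not the code used in the study. It assumes the definitions commonly used by poppr: Shannon–Wiener index H with natural logarithm, Simpson index 1 − Σp², and evenness E5 = (1/Σp² − 1)/(e^H − 1). The 9:1 and 5:5 splits are inferred from the 10 plants sampled per accession, the overall counts are the rounded percentages given in the text, and the function name is hypothetical.

```python
import math


def diversity(counts):
    """Return (Shannon-Wiener H, Simpson index, evenness E5) for haplotype counts."""
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    shannon = -sum(p * math.log(p) for p in freqs)
    sum_sq = sum(p * p for p in freqs)
    simpson = 1.0 - sum_sq
    if len(freqs) < 2:
        evenness = float("nan")  # undefined for a single haplotype
    else:
        evenness = (1.0 / sum_sq - 1.0) / (math.exp(shannon) - 1.0)
    return shannon, simpson, evenness


# Assumed splits: nine plants of one haplotype and one of another; five and five.
one_deviant = diversity([9, 1])
balanced = diversity([5, 5])

# Rounded overall percentages from the text, haplotypes A to F (used as weights).
overall = diversity([34, 4, 1, 3, 8, 50])

if __name__ == "__main__":
    for label, values in [("9:1", one_deviant), ("5:5", balanced), ("overall", overall)]:
        print(label, [round(v, 2) for v in values])
```

With these assumptions, the 9:1 split gives the lower bounds reported above (Shannon–Wiener index about 0.33, Simpson index 0.18, evenness about 0.57), the balanced split gives an evenness of 1, and the rounded overall percentages come close to the reported overall values. Small deviations are expected because the percentages are rounded.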

Table 3 Basic population statistics from heterogeneous populations (no. of haplotypes > 1)
Table 4 Analysis of molecular variance of C. sativa accessions (AMOVA) (Φ sample total = 0.788)

All three Humulus species with published plastomes (H. lupulus, H. scandens and H. yunnanensis) were characterized as haplotype ‘B’, as were two C. sativa plastomes: one from the Herbarium of the Kunming Institute of Botany, Yunnan province, China (Zhang et al. 2018b) and one from the cannabis fibre variety ‘Yunma 7’ (Deng et al. 2021). Therefore, haplotype ‘B’ can be regarded as the ancestral haplotype from which the other cannabis haplotypes evolved (Fig. 2). None of the cannabis polymorphisms was polymorphic in the hop chloroplasts; all mutations therefore occurred in Cannabis after the separation of the two genera from their common ancestor. Two genebank accessions in our study belonged to haplotype ‘B’, one from France (no further background information available) and one from Spain designated as ‘Kongo Hanf’ (= ‘Congo hemp’). The two published cannabis plastomes of haplotype ‘B’ were cv. ‘Yunma 7’ and an accession without any further description, submitted by the Kunming Institute of Botany, Yunnan, China (GenBank accession no. MH118118). Further published cannabis plastomes were haplotypes ‘A’ (cv. ‘Carmagnola’), ‘C’ (cv. ‘Cheungsam’) and ‘F’ (cv. ‘Yoruba Nigeria’ and cv. ‘Dagestani’).
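How an outgroup-defined ancestral haplotype orders the others can be illustrated with a small sketch in Python. It is not the poppr implementation used for Fig. 2. The marker patterns below are hypothetical (0 = state shared with the outgroup, 1 = derived state) and do not represent the haplotypes ‘A’ to ‘F’. For haploid data the Prevosti distance is taken as the fraction of differing loci, and a simple minimum spanning tree (Prim's algorithm) stands in for the minimum spanning network.

```python
# Hypothetical haplotypes over 8 markers; "anc" carries the outgroup state at every locus.
HAPLOTYPES = {
    "anc": (0, 0, 0, 0, 0, 0, 0, 0),
    "h1":  (1, 0, 0, 0, 0, 0, 0, 0),
    "h2":  (1, 1, 1, 0, 0, 0, 0, 0),
    "h3":  (1, 1, 1, 1, 1, 1, 0, 0),
}


def prevosti_haploid(a, b):
    """Fraction of loci at which two haploid multilocus genotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same loci")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def minimum_spanning_tree(haplotypes, root):
    """Prim's algorithm; returns (parent, child, distance) edges in order of attachment."""
    in_tree = [root]
    remaining = [name for name in haplotypes if name != root]
    edges = []
    while remaining:
        # Pick the shortest edge that connects the tree to a haplotype outside it.
        dist, parent, child = min(
            (prevosti_haploid(haplotypes[p], haplotypes[c]), p, c)
            for p in in_tree
            for c in remaining
        )
        edges.append((parent, child, dist))
        in_tree.append(child)
        remaining.remove(child)
    return edges


if __name__ == "__main__":
    for parent, child, dist in minimum_spanning_tree(HAPLOTYPES, "anc"):
        print(f"{parent} -> {child}: {dist}")
```

Because chloroplast lineages do not recombine, each edge in such a tree corresponds to one or more mutational steps away from the root, so rooting at the ancestral haplotype gives the relative order in which the derived states arose.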


Discussion

Marker development

Chloroplast markers have several advantages, such as maternal inheritance. As a result, they are usually more useful for exploring genetic structure and gene flow between populations than within them; fields of application are evolutionary studies, plant migration (biogeography) and the profiling of genotypes and gene pools. The highly significant linkage disequilibrium, which in plants is usually taken as an indicator of clonality, is to be expected for non-recombining maternal lineages such as chloroplasts or mitochondria.

In C. sativa, too, nuclear marker variability (e.g. expected heterozygosities of 0.68 (Gilmore and Peakall 2002) or 0.75 (Soler et al. 2017)) was higher than chloroplast variability (expected heterozygosity of 0.37 in our study). However, one study reported expected heterozygosities of nuclear markers below our value (0.22 to 0.32 (Lynch et al. 2016)).

Nuclear markers revealed higher intrapopulation variability in cannabis: microsatellite markers attributed only 32% of the variation to differences between cultivars, while 37% and 31% were intra-cultivar and intra-individual, respectively (Soler et al. 2017). Chloroplast markers shift the focus to between-cultivar variability, which accounted for 69% (Zhang et al. 2018a) to 79% (this study) of the variation. The major disadvantage of cp (or mt) markers is their limited absolute number: 38 cp markers in total based on three cannabis chloroplast genomes, compared with (fractional) 24,710 ncSNPs (Soorni et al. 2017) or 14,031 ncSNPs (Sawler et al. 2015). However, depending on the question, a few powerful cp markers may deliver sufficient information for an intraspecific classification, e.g. for identifying different gene pools (Gilmore and Peakall 2002).

Overlapping accessions in different cpDNA studies

Gilmore et al. (2007) developed 5 cpDNA and 2 mtDNA markers with good discrimination power between accessions and, with this set, identified 6 haplotypes that clustered the samples into three haplotype groups. A comparison of fibre cultivars used in both studies showed that the grouping of Gilmore et al. (2007) was not the same as ours. Zhang et al. (2018a) sequenced highly variable cp regions with 5 primer pairs and identified 23 haplotypes that fell into 3 clearly separated haplogroups. Overlapping accessions showed that our haplotypes ‘C’ and ‘F’ could be found in haplogroup ‘H’ of Zhang et al. (2018a), while our haplotype ‘A’ fell into haplogroup ‘M’. This could indicate that haplotype ‘A’ was already present in Central and Western China, while haplotype ‘F’ is found in the North-East and North-West of China. Studies of Mongolian cannabis accessions could possibly bridge the two distribution areas of haplogroup ‘H’/haplotype ‘F’, which would provide evidence that this haplogroup/haplotype originated in today’s Northern China or north of China, followed by a westward migration to Europe. This mutational differentiation of the chloroplast in the North was probably accompanied by ecological adaptation to northern conditions (e.g. day length, temperature) that made the ‘F’-type so successful in migrating west. This example of finding common ground between studies shows that harmonizing cannabis cpSNPs would create an effective tool for future work.

Is the origin of cannabis in the Chinese province Yunnan?

Haplotype ‘B’ is shared by Cannabis and Humulus and must have been present in the common ancestor of the two genera. Because chloroplasts are non-recombining maternal lineages, the identification of the original haplotype allowed the order in which the eight cp mutations used here arose to be determined. Cannabis and Humulus diverged between 18.23 mya (8.83–36.56 mya) (Zhang et al. 2018a) and 27.8 mya (McPartland 2018). Pollen analysis pinned the center of Cannabis origin to the northeastern Tibetan Plateau near Qinghai Lake, from where it spread west to Europe (6 mya), east to eastern China (1.2 mya) and to India (only 32.6 thousand years ago) (McPartland et al. 2018). This spread is supported by the estimated crown age of Cannabis of 2.24 mya (0.81–5.81 mya) (Zhang et al. 2018a). The two ‘B’-type accessions submitted to the Gatersleben genebank from France (no further information provided) and Spain (‘Congo hemp’) were not helpful in tracing back the origin of cannabis. More informative were the two published plastomes with haplotype ‘B’ from independent sources: a plastome from a plant of the Herbarium of the Kunming Institute of Botany (Yunnan province, China) (Zhang et al. 2018b) and the plastome of cv. ‘Yunma 7’, a main cultivar in fibre production (Deng et al. 2021), bred in Yunnan province (Amaducci et al. 2015). This province has a long tradition of using cannabis (Clarke and Gu 1998) and is one of the main production areas of hemp fibre in China (Deng et al. 2021). Provided that both samples were originally collected from natural stands in Yunnan, cannabis could have had its origin in this province.

Cp haplotypes in the European breeding history for fibre use

In Europe, domestication of Cannabis occurred in the Copper/Bronze Age, indicative of a domestication event independent of the Chinese domestication (McPartland et al. 2017). Most European fibre cultivars were derived from European landraces and consisted, at least partly, of haplotype ‘F’, corresponding to the ‘fibre-type’ haplotype ‘1122121’ of Gilmore et al. (2007). The monoecious German cultivar ‘Fibrimon’ (three accessions in the genebank, all ‘F’-type) was bred from old German origins, probably landraces, namely ‘Schurig’ (‘F’-type) and ‘Havelländer’ (‘A/F’-type), both originally of Central Russian origin (Hoffmann 1961). Since monoecy is a desired trait for fibre use but rare in C. sativa, most of the French cultivars (e.g. ‘Fibrimon 21’ (‘F’-type) and ‘Fibrimon 56’ (‘A/F’-type)) go back to ‘Fibrimon’ (de Meijer 1995). The Hungarian variety ‘Kompolti’ (three genebank accessions, two ‘F’-type, one ‘A/F’-type) was obtained from ‘Fleischmann hemp’, which had its origin in Italy (de Meijer 1995). The Romanian cultivar ‘Lovrin 110’ (‘A/F’-type) was derived from Bulgarian landraces and the Russian ‘Juso 14’ (‘F’-type) from ‘JUS-6’, a cross between a Southern origin, a Northern Russian dwarf origin, and the German ‘Odnodomnaya Bernburga’ (de Meijer 1995). The Italian ‘Eletta Campana’ (‘A/F’-type) originated from a selection from the Northern Italian landrace ‘Carmagnola’ and high-fibre strains from Germany (de Meijer 1995). In breeding, selection itself is not restricted to geographically distinct entities, but includes all materials that are useful and accessible. Chinese fibre strains, for example, were the basis for some fibre cultivars in the United States at the beginning of the twentieth century. The cultivar ‘Chington’ (China-Washington), extensively used by hempseed growers in Kentucky, was developed from seeds obtained from Hankow, China (Dewey 1927). The Hungarian three-way hybrid ‘Kompolti Hybrid TC’ also had a Chinese component (de Meijer 1995). Many cultivars used today may therefore be based on early global exchange, which would explain the occurrence of the ‘A’-type in fibre cultivars.

Accessions in genebanks are a mixture of donations from many different sources, genebank exchanges and the genebank’s own collection trips. Collection trips can usually be regarded as the only reliable sources when it comes to a defined geographic origin, since donations in most cases consist of selected materials, collected from mostly undefined sources, planted in a field and often pollinated without isolation. This is demonstrated by the variety ‘Kompolti’, present in the genebank from three different donors. ‘Kompolti’ is usually a pure ‘F’-type. One accession, however, was a mixture of the haplotypes ‘F’ and ‘A’. Since chloroplasts are inherited only maternally, accidental cross-pollination can be ruled out. Such a mixture can only originate from the mixing of seeds. For cannabis, even collection trips were often not reliable sources because of the exchange of genetic materials over hundreds of years and long distances and the subsequent subspontaneous naturalization, either unintentionally or through cultivation of illegitimate strains hidden in a natural environment (Szendrei 1998). Therefore, it is difficult to distinguish natural from naturalized populations in cannabis.