Genome-wide development of insertion-deletion (InDel) markers for Cannabis and its uses in genetic structure analysis of Chinese germplasm and sex-linked marker identification

Pan, Gen; Li, Zheng; Huang, Siqi; Tao, Jie; Shi, Yaliang; Chen, Anguo; Li, Jianjun; Tang, Huijuan; Chang, Li; Deng, Yong; Li, Defang; Zhao, Lining

doi:10.1186/s12864-021-07883-w

Genome-wide development of insertion-deletion (InDel) markers for Cannabis and its uses in genetic structure analysis of Chinese germplasm and sex-linked marker identification

Research
Open access
Published: 05 August 2021

Volume 22, article number 595, (2021)
Cite this article

Download PDF

You have full access to this open access article

BMC Genomics Aims and scope Submit manuscript

Genome-wide development of insertion-deletion (InDel) markers for Cannabis and its uses in genetic structure analysis of Chinese germplasm and sex-linked marker identification

Download PDF

Gen Pan^1,2^na1,
Zheng Li¹^na1,
Siqi Huang^1,2,
Jie Tao¹,
Yaliang Shi¹,
Anguo Chen^1,2,
Jianjun Li^1,2,
Huijuan Tang^1,2,
Li Chang^1,2,
Yong Deng^1,2,
Defang Li^1,2 &
…
Lining Zhao^1,2

2297 Accesses
8 Citations
4 Altmetric
Explore all metrics

A Correction to this article was published on 13 September 2021

This article has been updated

Abstract

Background

Cannabis sativa L., a dioecious plant derived from China, demonstrates important medicinal properties and economic value worldwide. Cannabis properties have been usually harnessed depending on the sex of the plant. To analyse the genetic structure of Chinese Cannabis and identify sex-linked makers, genome-wide insertion-deletion (InDel) markers were designed and used.

Results

In this study, a genome-wide analysis of insertion-deletion (InDel) polymorphisms was performed based on the recent genome sequences. In total, 47,558 InDels were detected between the two varieties, and the length of InDels ranged from 4 bp to 87 bp. The most common InDels were tetranucleotides, followed by pentanucleotides. Chromosome 5 exhibited the highest number of InDels among the Cannabis chromosomes, while chromosome 10 exhibited the lowest number. Additionally, 31,802 non-redundant InDel markers were designed, and 84 primers evenly distributed in the Cannabis genome were chosen for polymorphism analysis. A total of 38 primers exhibited polymorphisms among three accessions, and of the polymorphism primers, 14 biallelic primers were further used to analyse the genetic structure. A total of 39 fragments were detected, and the PIC value ranged from 0.1209 to 0.6351. According to the InDel markers and the flowering time, the 115 Chinese germplasms were divided into two subgroups, mainly composed of cultivars obtained from the northernmost and southernmost regions, respectively. Additional two markers, “Cs-I1–10” and “Cs-I1–15”, were found to amplify two bands (398 bp and 251 bp; 293 bp and 141 bp) in the male plants, while 389-bp or 293-bp bands were amplified in female plants. Using the two markers, the feminized and dioecious varieties could also be distinguished.

Conclusion

Based on the findings obtained herein, we believe that this study will facilitate the genetic improvement and germplasm conservation of Cannabis in China, and the sex-linked InDel markers will provide accurate sex identification strategies for Cannabis breeding and production.

View this article's peer review reports

Genome-wide analysis of microsatellite and sex-linked marker identification in Gleditsia sinensis

Article Open access 17 July 2020

Whole-genome resequencing of wild and cultivated cannabis reveals the genetic structure and adaptive selection of important traits

Article Open access 27 July 2022

Population structure and genetic diversity of Moroccan cannabis (Cannabis sativa L.) germplasm through simple sequence repeat (SSR) analysis

Article 10 October 2023

Introduction

Cannabis sativa L., a member of the family Cannabinaceae, is a diploid (2n = 20) monocotyledon and one of the oldest cultivated plants. Although it originated in Central Asia, its cultivation was soon commenced worldwide for applications in folk medicine, textile fibre, oil, and recreational use [1]. Cannabis is a botanical genus of flowering plants divided into two distinct species, namely Hemp and marijuana, based on its tetrahydrocannabinol (THC) content [2]. Although Cannabis cultivation is being restricted in many countries due to its widespread usage as a recreational drug, there has been a resurgence of interest for its agronomic potential and especially its medical value; its outer and inner stem tissues can be used to prepare bioplastics and concrete-like material in construction sectors owing to the rich source of both cellulosic and woody fibres, and its metabolites exert potent bioactivities on human health especially for the treatment of pediatric seizure disorders.

Cannabis is a dioecious species, which includes both male and female flowers separated on different plants. The sex of the plants commonly affects economically relevant traits like fibre quality and cannabinoid (CBD) content. In general, male plants have a better fibre quality, while CBD content in female plants is higher than that in male plants. Therefore, an ideal ratio of male-to-female individuals must be maintained with different production purposes to improve economic efficiency. However, it is difficult to identify the sex of plant via the mere examination of morphological traits before flowering, and DNA molecular marker technology has been considered as an accurate and reliable method for the sex identification of dioecious plants, as it is unaffected by plant growth stages [3].

Conventional breeding is considered the primary method for developing new varieties in Cannabis breeding programs. However, this process is extremely challenging and often spans several years [4]. Previous studies have indicated that advancements in molecular technologies offer several molecular breeding strategies, such as the use of molecular markers to overcome the limitations of conventional breeding [4, 5]. A shift from isozyme and random amplified polymorphic DNA (RAPD) to amplified fragment length polymorphism (AFLP), simple sequence repeat (SSR), and single nucleotide polymorphism (SNP) has occurred, and these markers have been used for genetic analysis and sex identification in Cannabis [6,7,8,9,10,11,12,13,14,15]. Although different types of Cannabis molecular markers have been identified and utilized, research on Cannabis is lagging compared to other crops like rice, wheat, and maize. As a result, the density of molecular markers in Cannabis is relatively low, which is insufficient for genetic study in Cannabis, including genetic map construction, gene/QTL mapping, and genetic analysis.

Insertion–deletions (InDels) are recognised as major sources of genetic structural variations found widely distributed across the plant genomes. InDels like SSRs are also a type of length polymorphisms originating from a single mutation event, which is generally bi-allelic and single-locus in nature. Meanwhile, InDels exhibit many desirable inherent genetic characteristics of both SNP and SSR markers, such as co-dominance, abundance, and random distribution across the genome [16]. Generally, unlike SNP, InDel markers have been considered breeder-friendly markers, with limited infrastructure requirements, and its products can be detected in regular genetics and breeding laboratories using polyacrylamide gel electrophoresis (PAGE) or simple gel-based size separation procedures. Furthermore, InDels markers are commonly amplified without stutter bands, which renders them more valuable. In a few previous studies, InDels were also found to be more polymorphic than microsatellite markers [17, 18]. As a valuable complement for both SNPs and SSRs markers and owing to their significance in crop genomic studies, InDel markers have been widely identified in rice [19], barley [20], oil rapes [18, 21], maize [22], and other plants [23,24,25,26], and to our knowledge, no research on genome-wide development of InDels in Cannabis has been reported so far. This knowledge gap limits the comprehensive molecular analysis of Cannabis.

China has been considered one of the putative centres of origin for Cannabis, and a region where Cannabis has been cultivated for more than 2000 years for obtaining fibre, oil, and for other purposes [27]. However, the fibre yield, fibre quality, and CBD content are vital factors limiting the development of the Cannabis industry in China, rendering significance to the genetic improvement of the Cannabis crop cultivated in China. Previous studies have shown that the genetic structure analysis of the germplasm can facilitate genetic improvement in other crops [28, 29]. Until now, the genetic diversity and population structure of Cannabis were analysed using SSR and ISSR markers [9, 10, 30, 31]. However, in Cannabis, most SSR and ISSR markers usually display multiple loci [9, 10, 30, 31], thereby posing challenges in the application of molecular analysis such as the comparison of genes/QTLs detected using different genetic populations in Cannabis. Alternatively, the single-locus nature of InDels may help overcome this drawback of multi-locus SSRs and ISSRs. Though the draft genome sequences data were published in 2011 [32], data quality has not met the criteria for genome-wide development of InDel markers and the location of such valuable markers in the Cannabis chromosome has not been elucidated. Recently, a high-quality chromosome-scale reference genome of a drug-type strain “Purple Kush” and the hemp variety “Finola” were obtained, which enabled the genome-wide capture of InDels in the Cannabis genome [33]. With the objectives to increase the density of molecular markers of Cannabis genome and to establish a significance for SSR markers in Cannabis genomic studies, the present study focused on the genome-wide development of InDels and the application of these markers in genetic structure analysis of Chinese germplasm and identification of sex-linked marker in Cannabis. Our study results will help establish a valuable tool for the molecular analysis of Cannabis in the future, and the information on the genetic structure of the Cannabis germplasm and sex-linked marker will aid the genetic improvement and molecular breeding of Cannabis.

Result

Distribution of InDel markers

Data on whole genomes for “Purple Kush” and “Finola” were downloaded from ftp://ftpmips.helmholtz-muenchen.de/plants/barley/public_data/. On a genome-wide basis, 47,558 InDels were identified between PK and FN in the genomic DNA sequence database (Table S1). InDel sites varied from 4 bp to 87 bp, and the number of the InDel sites decreased markedly with an increase in the InDel length. Four InDel sites were found to be the most common InDel sites (11286), accounting for 23.7% of the total InDels (Fig. 1). Meanwhile, the distribution of the InDels on each chromosome of the FN genome was different. As shown in Fig. 2, the number of InDels on each chromosome ranged from 2177 to 5081. Chromosome 5 exhibited the highest number of InDels among the Cannabis chromosomes, while chromosome 10 exhibited the lowest. Additionally, the densities of InDels on each chromosome were different, and chromosome 9 exhibited the highest density of InDels (67.5 InDels/Mb) while chromosome 2 exhibited the lowest (44.5 InDels/Mb) (Fig. 2, Fig. 3).

Development of InDel markers for whole Cannabis genome and polymorphism analysis

In total, 47,558 InDel markers between FN and PK were successfully developed, with a density of 47.1/Mb in the FN genome. Of these InDel markers, 31,802 InDel markers were non-redundant based on the specificity, and its density in the FN genome was found to be 31.51/Mb (Table S1). The lengths of all primers ranged between 18 bp and 24 bp, and the product sizes ranged from 80 bp to 400 bp. Eighty-four primer pairs distributed along the chromosomes with intervals of about 10 Mb were selected to evaluate the quality of InDel markers across three Cannabis varieties (Fig. S1). The results showed that 80 primers were amplified successfully, and 38 primers exhibited polymorphisms among three varieties (“Yunma 6”, “Neimengudali”, “Qingdama 1”). Of all the polymorphism primers, 14 primers which exhibited two alleles among the above-mentioned three varieties were used for further study.

Genetic diversity analysis and population structure

The 14 InDel primers were used to analyse the genetic relationships of 115 accessions, and a total of 39 polymorphic bands were amplified. The PIC ranged from 0.1209 to 0.6351, with an average of 0.4109, and the gene diversity varied from 0.1243 to 0.6865, with an average of 0.4664. The average MAF was 0.6484 and ranged from 0.4478 to 0.9348 (Table 2). Thereafter, cluster analysis was conducted based on the unweighted pair-group method with arithmetic means (UPGMA) using the NTSYS-pc2.11 software. As showed in Fig. 4, at a genetic distance of 0.74, the 115 accessions were divided into two groups. Group I included 84 accessions, mainly consisting of the varieties cultivated in the northern regions of China (up to 90%). Group II included 31 accessions, and most of them were from the southern regions of China (90.3%).

In PCoA, the two main axes explained approximately 59% of the total variation, at 44 and 15%, respectively. The 115 Cannabis varieties could also be classified into two groups using the genetic similarity matrix, which was similar to cluster analysis results (Fig. 5).

Based on the 39 alleles amplified using 14 InDels, the population structure of the 115 individuals was further estimated under the Hardy–Weinberg Equilibrium using the STRUCTURE V2.3.3 software. Delta K values were plotted against K values, and the best number of clusters was obtained via the Structure Harvester platform (http://taylor0.biology.ucla.edu/structureHarvester/). As shown in Fig. 6, Delta K reached a maximum value at K = 2, which indicated that the 115 cultivars could be partitioned into two populations (Fig. 6).

As showed in Table 1, the flowering time of 115 Cannabis genotype varied from 23 days to 125 days. Thereafter, cluster analysis was conducted using IBM SPSS Statistic 19.0 with the longest distance method and the Euclidean distance square. As shown in Fig. 7, at an inter-class distance of 25, the 115 genotypes were divided into two groups; group 1 included 34 cultivars, which mainly originated from the southern regions of (30), and group 2 contained 81 cultivars, most of which were from the northern regions of (74), such as Northwest China (15) and Northeast China (37).

Table 1 Information about the origins and flowering time for 115 Cannabis cultivars used in this study

Full size table

Screening of sex-linked InDel markers and PCR-based verification of known-sex plants

Based on the latest report which indicated that chromosome pair 1 was the sex chromosome pair in Cannabis [34], fifteen pairs of primers evenly distributed on chromosome 1 were designed and used to amplify twelve samples (six females and six males) from the F₂ population crossed by “Yunma 6” and “H4” (Table S2). As shown in Fig. 8a and Fig. 8d, two primers pairs (Cs-I1–10, Cs-I1–15) amplified two bands in male plants (251 bp and 398 bp; 293 bp and 141 bp), while one band (398 bp; 293 bp) was amplified in female plants.

To further verify the versatility and accuracy of the two primers pairs, samples from 24 known-sex plants from the dioecious variety, “H4”, and 10 known-sex plants from the feminized variety, “ZY1” were used for amplification via PCR, respectively. The results showed that 12 female plants showed amplification of the 398-bp fragment, while 12 male plants showed amplification of the two bands (398 bp and 251 bp in size) (Fig. 8b). Consistent with the amplification fragment in female plants of “H4”, all plants from “ZY1” showed amplification of the 398-bp fragment using the Cs-I1–10 primer pair (Fig. 8c). Similar to the Cs-I1–10 primer pair, 12 female plants and all plants from “ZY1” showed amplification of the 293-bp fragment, while 12 male plants showed amplification of the two bands (293 bp and 141 bp in size) using Cs-I1–15 primer pair (Fig. 8e-f).

Discussion

Although different types of molecular markers, such as RAPD, ISSR, SSR, SNP and ARFP, have been used in the molecular biology studies conducted on Cannabis, such as genetic diversity analysis, sex identification, and QTL mapping [9, 10, 12, 15, 30, 31], these molecular makers remain fewer in number compared with those available for other crops, which poses challenges for genetic map construction and QTL mapping. In addition, a genome-wide survey of InDels has not yet been carried out for Cannabis. In this study, 31,802 InDel markers were identified in the Cannabis genome, and the average density across the FN genome was 0.031 InDels/kb (Table S1), which was much less compared to that found in other species such as rice, oilseed rape, maize and cotton [18, 22, 35, 36].

Molecular analyses like map-based gene cloning, GWAS, and MAS, rely on the availability of several genetic markers with detailed information of their position on the genome. The PCR-based InDel markers are extensively applied during initial mapping to identify unknown genes in rice, maize, wheat, and other crops [22, 37,38,39,40]. However, due to a lack of availability of chromosome-scale genome assembly, information about their physical position on the chromosome is not available [9,10,11,12,13,14,15, 30], which hinders the comprehensive molecular analysis of Cannabis. In this study, 26,982 InDel markers were developed with a density of 26.7 InDels/Mb. Notably, the exact physical positions of all identified InDels on the Cannabis genome were also determined, rendering it convenient to identify InDel markers in target genome regions, which, in turn, would help accelerate map-based cloning and marker-assisted trait selection research in Cannabis.

To analyse the population structure of the 115 Cannabis germplasms from the varieties cultivated in China, 84 InDels distributed along the Cannabis chromosomes with intervals of approximately 10 Mb were selected for the polymorphism analysis, and 38 InDels were found to exhibit polymorphism among three accessions. The polymorphism rate was 45.2%, similar to the extent in chickpea (46.6%) [41], lower than that found in jute (58%) [26], and higher than that in maize (18.68%) [22], which indicated that the polymorphism rate might relate to the plant species. Additionally, of the 36 InDels, 14 InDels amplifying only two fragments were selected for the genotyping of the 115 accessions. The PIC values ranged from 0.1209 to 0.6351, with an average of 0.4109, indicating that most of the InDels have a moderate range of genetic diversity, lower than that of SSR markers in Cannabis [10]. The possible reason was that most InDels used in this study are single-locus (Fig. S2), while, in general, SSRs are multi-locus.

The genetic structure of different genotypes can guide breeding programs for developing varieties with a broad genetic background. The genetic diversity of the Cannabis germplasm has been analysed using two types of markers: SSR and ISSR [9, 30, 31]. In the present study, 39 fragments were amplified using the 14 InDels, and when Delta K was at a maximum value of 2, the 115 accessions were partitioned into two subgroups. In group 1, the sharing proportion of the cultivars of group 2 ranged from 0.011 to 0.453, while in group 2, its sharing proportion of group 1 varied from 0.011 to 0.336 (Table S3). Most cultivars from the northern regions of China belonged to Group I, while most cultivars from the southern regions belonged to Group II (Fig. 6). Similar to the results of population structure analysis, the 115 accessions were clearly clustered into two major groups using UPGMA clustering (Fig. 4). As Cannabis is an annual and photoperiod-sensitive crop, and the day length may determine the floral transition and flowering times, we suggest that the climate, influenced by the latitude and day length, is an essential factor affecting the Cannabis germplasm diversity. In this study, the 115 accessions from China were distinctly classified into two groups (Figs. 4, 5, 6 and 7), and the two groups were consistent with the temperate climate and subtropical climate zones in China, respectively, which was in agreement with the analysis of Gao et al. (2014) and Zhang et al. (2018) [9, 42]. Additionally, both group I and group II included the cultivars from central regions of China like the HeNan provinces, implying that the breeders in these areas might frequently exchange Cannabis germplasm resources with the breeders from the northern or southern regions.

Cannabis is a short-day crop, which is sensitive to photoperiod. Flowering time is an important agronomic trait that affects cannabidiol (CBD) and fibre yield content. Consistent with the population structure analysis, PCoA analysis and UPGMA clustering results (Figs. 4, 5 and 6), 115 Cannabis genotypes were also clustered into two groups according to their flowering time. The cultivars of the group 1 mainly originated from Southern China. In contrast, group 2 was mainly composed of varieties from the northern regions of China (Fig. 7). In general, when the northern Cannabis cultivars are introduced to the southern regions, the plants will encounter early flowering. In this study, though the cultivars ‘22’ and ‘214’ originated from northern China, the plants did not encounter early flowering when cultivated in the southern regions of China (HuNan province), which might support the notion of a superior germplasm for developing wide adaptable Cannabis varieties according to day length.

Owing to the different economic values between female and male plants, a suitable ratio of females to males individuals is vital for enhancing economic efficiency. To overcome the difficulties of the accurate identification of sex through morphological methods before flowering, eight pairs of markers mainly consisted of RAPD markers were reported for sex identification in Cannabis [11,12,13,14,15]. However, these RAPD markers had a common shortcoming of poor repeatability and dominance. Additionally, the accuracy of 8 pair markers for sex identification was only validated by using natural populations, thus limiting its application in the Cannabis breeding program [11,12,13,14,15]. In this study, the two primer pairs, Cs-I1–10 and Cs-I1–15, were screened for sex identification, and except for the natural populations, an F1-segregated population and a feminized variety were used to verify its accuracy (Fig. 8). Thus, its applications are broader than those previously reported for sex identification in Cannabis breeding program. Interestingly, similar to the sex-linked SSR markers CS308 [14], the same fragments in size were presented in both female as well as male plants using Cs-I1–10 and Cs-I1–15, indicating these markers were not specific to the Y chromosome, which was different from the markers MADC1 to MADC3 on Y chromosome [11,12,13].

Conclusion

In this study, we first developed 31,802 non-redundant InDel markers with a density of 31.5/Mb in the FN genome. Of these markers, 14 InDel markers could be used to divide the 115 Chinese Cannabis cultivars into two groups by genetic diversity analysis, population structure, and PCoA analysis. Additionally, two InDel markers, Cs-I1–10 and Cs-I1–15, related to female and male plant in Cannabis have been screened out. These genome-wide InDels and data on the genetic relationships of the Chinese Cannabis germplasm would serve useful in the further molecular analysis in Cannabis, and two sex-linked markers may provide accurate sex identification strategies at the early stage of Cannabis in production and breeding program.

Materials and methods

Plant materials and DNA extraction

A total of 115 Cannabis accessions were collected from different regions in China and preserved in our institute. Detailed information on these cultivars is summarised in Table 1. Flowering time is the time from sowing to flowering. When more than 50% of the plants of each cultivar bloom, the flowering time was scored and listed in Table 1. Additionally, six female and six male individuals, selected from an F₂ population derived from a cross between a female “H4” plant and a male “Yunma 6” (Y6) plant, were used for the screening of sex-linked marker. Furthermore, 24 samples (12 females and 12 males individuals) from the “H4” variety and ten samples from the feminized Cannabis variety, “ZY1”, were used for further validation of the sex-linked marker.

DNA extraction

The young leaves of each sample at the flowering stage were collected for DNA extraction. A Plant Genomic DNA Kit (Tiangen Biotech, Beijing, China) was used for DNA extraction. DNA quality and quantity were checked using an Eppendorf BioSpectrometer (Eppendorf, Hamburg, Germany), and the DNA was further diluted to a 10 ng/L working solution.

Detection and design of the insertions/deletions (InDels)

The genomic DNA sequences of PK and FN were obtained from ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/417/725/GCA_003417725.2_ASM341772v2. The DNA sequences of PK represented the reference genome, which was compared with that of FN using MUMer (http://mummer.sourceforge.net/manual/) software to capture the InDel loci (≥ 4 bp). Then, based on the InDel loci data, the primers were designed using the Primer 3.0 software (http://pgrc.ipk-gatersleben.de/misa/primer3.html). One pair of primers with the highest scoring was selected in the design results for the experiments. Furthermore, all InDel markers were checked for specificity using the TBtool software by blasting with the reference genome to avoid nonspecific amplification [43]. Only unique InDels were retained and listed in Table S1.

InDel genotyping

The 84 primer pairs evenly distributed in the FN genome were selected for polymorphism analysis. Polymerase chain reactions (PCRs) were performed using 10 μL aliquots of the reaction mixture, including 7 μL of the PCR mix solution (Qingke, Nanjing, China), 1 μL of the forward primer (10 nmol/L), 1 μL of the reverse primer (10 nmol/L), and 1 μL of the DNA template. PCR was conducted as follows: an initial step at 95 °C for 5 min, followed by 32 cycles of 30 s at 94 °C, 30 s at 55 °C, 40 s at 72 °C, and a final extension of 10 min at 72 °C. Primers used for genotyping were listed in Table 2 and Table S2.

Table 2 The primers used in this study

Full size table

Genetic diversity assay and population structure

Similar band types of 115 Cannabis cultivars on the electropherograms amplified using the same InDel markers were considered the same allele. Each polymorphic band detected by the same given primer represented an allelic mutation. To generate molecular data matrices, clear bands for each fragment were scored in every accession for each primer pair and recorded as 1 (presence of a fragment), 0 (absence of a fragment), and 9 (complete absence of band). PowerMarker version 3.25 was used to calculate the polymorphism information content (PIC), the number of alleles (NA), major allele frequency (MAF), and gene diversity for each InDel. A clustering map was conducted based on genetic distances and the unweighted pair group method with arithmetic mean (UPGMA) using the SM functionality of the NTSYS-pc2.10e software. Principal Coordinate Analysis (PCoA) was also performed using the NTSYS-pc2.10e software to resolve clustering patterns among genotypes. STRUCTURE v2.3.4 was used to estimate the population structure of the 115 Cannabis genotypes, and the number of the sub-population (K) was set from 1 to 10 based on admixture models and correlated with band frequencies three times. IBM SPSS Statistic 19.0 was used for cluster analysis of 115 Cannabis cultivars with the longest distance method and the Euclidean distance square based on the flowering time of each cultivar.

Availability of data and materials

Male-specific sequences of Cannabis cultivar ‘H4’ amplified by InDel marker Cs-I1–10 and Cs-I1–15 have been uploaded to the NCBI SRA database. SRA accession: PRJNA734672. The data will be accessible with the following link: “https://www.ncbi.nlm.nih.gov/sra/PRJNA734672”.

Change history

13 September 2021
A Correction to this paper has been published: https://doi.org/10.1186/s12864-021-07960-0

Abbreviations

InDel:: Insertion/Deletion
CBD:: Cannabinoid
THC:: tetrahydrocannabinol
RFLP:: Restriction fragment length polymorphism
RAPD:: Random amplified polymorphic DNA
AFLP:: Amplification fragment length polymorphism
SSR:: Simple sequence repeats
ISSR:: Inter-simple sequence repeat
PCR:: Polymerase chain reactions
PIC:: Polymorphism information content
NA:: Number of allele
MAF:: Major allele frequency
UPGMA:: Unweighted pair group method with arithmetic mean

References

Andre C, Hausman J, Guerriero G. Cannabis sativa: the plant of the thousand and one molecules. Front Plant Sci. 2016;7:19. https://doi.org/10.3389/fpls.2016.00019.
Barcaccia G, Palumbo F, Scariolo F, Vannozzi A, Borin M, Bona S. Potentials and challenges of genomics for breeding cannabis cultivars. Front Plant Sci. 2020; https://doi.org/10.3389/fpls.2020.573299.
Li J, Ye C. Genome-wide analysis of microsatellite and sex-linked marker identification in Gleditsia sinensis. BMC Plant Biol. 2020;20(1):338.
Kage U, Kumar A, Dhokane D, Karre S, Kushalappa A. Functional molecular markers for crop improvement. Crit Rev Biotechnol. 2016;36(5):917–30. https://doi.org/10.3109/07388551.2015.1062743.
Article CAS PubMed Google Scholar
Salentijn E, Zhang Q, Amaducci S, Yang M, Trindade L. New developments in fiber hemp (Cannabis sativa L.) breeding. Ind Crop Prod. 2015;68:32–41. https://doi.org/10.1016/j.indcrop.2014.08.011.
Article Google Scholar
Gigliano G. Identification of Cannabis sativa L. (Cannabaceae) using restriction profiles of the internal transcribed spacer II (ITS2). Sci Justice. 1998;38(4):225–30. https://doi.org/10.1016/S1355-0306(98)72116-1.
Article CAS PubMed Google Scholar
Forapani S, Carboni A, Paoletti C, Moliterni V, Ranalli P, Mandolino G. Comparison of hemp varieties using random amplified polymorphic DNA markers. Crop Sci. 2001;41(6):1682–9. https://doi.org/10.2135/cropsci2001.1682.
Article CAS Google Scholar
Datwyler S, Weiblen G. Genetic variation in hemp and marijuana (Cannabis sativa L.) according to amplified fragment length polymorphisms. J Forensic Sci. 2006;51(2):371–5. https://doi.org/10.1111/j.1556-4029.2006.00061.x.
Article CAS PubMed Google Scholar
Gao C, Xin P, Cheng C, Tang Q, Chen P, Wang C, et al. Diversity analysis in cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers. PLoS One. 2014;9(10):e110638. https://doi.org/10.1371/journal.pone.0110638.
Article CAS PubMed PubMed Central Google Scholar
Soler S, Gramazio P, Figàs M, Vilanova S, Rosa E, Llosa E, et al. Genetic structure of Cannabis sativa var. indica cultivars based on genomic SSR (gSSR) markers: implications for breeding and germplasm management. Ind Crop Prod. 2017;104:171–8. https://doi.org/10.1016/j.indcrop.2017.04.043.
Article CAS Google Scholar
Sakamoto K, Shimomura K, Komeda Y, Kamada H, Satoh S. A male-associated DNA sequence in a dioecious plant, Cannabis sativa L. Plant Cell Physiol. 1995;36(8):1549–54.
CAS PubMed Google Scholar
Mandolino G, Carboni A, Forapani S, Faeti V, Ranalli P. Identification of DNA markers linked to the male sex in dioecious hemp (Cannabis sativa L.). Theor Appl Genet. 1999;98(1):86–92. https://doi.org/10.1007/s001220051043.
Article CAS Google Scholar
Sakamoto K, Abe T, Matsuyama T, Yoshida S, Ohmido N, Fukui K, et al. RAPD markers encoding retrotransposable elements are linked to the male sex in Cannabis sativa L. Genome. 2005;48(5):931–6. https://doi.org/10.1139/g05-056.
Article CAS PubMed Google Scholar
Rode J, In-Chol K, Saal B, Flachowsky H, Kriese U, Weber W. Sex-linked SSR markers in hemp. Plant Breed. 2005;124(2):167–70. https://doi.org/10.1111/j.1439-0523.2005.01079.x.
Article CAS Google Scholar
Toth J, Stack G, Cala A, Carlson C, Wilk R, Crawford J, et al. Development and validation of genetic markers for sex and cannabinoid chemotype in Cannabis sativa L. GCB Bioenergy. 2020;12(3):213–22. https://doi.org/10.1111/gcbb.12667.
Article CAS Google Scholar
Lv Y, Liu Y, Zhao H. mInDel: a high-throughput and efficient pipeline for genome-wide InDel marker development. BMC Genomics. 2016;17(1):290. https://doi.org/10.1186/s12864-016-2614-5.
Article CAS PubMed PubMed Central Google Scholar
Wu K, Yang M, Liu H, Tao Y, Mei J, Zhao Y. Genetic analysis and molecular characterization of Chinese sesame (Sesamum indicum L.) cultivars using insertion-deletion (InDel) and simple sequence repeat (SSR) markers. BMC Genet. 2014;15(1):35. https://doi.org/10.1186/1471-2156-15-35.
Article CAS PubMed PubMed Central Google Scholar
Liu B, Wang Y, Zhai W, Deng J, Wang H, Cui Y, et al. Development of InDel markers for Brassica rapa based on whole-genome re-sequencing. Theor Appl Genet. 2013;126(1):231–9. https://doi.org/10.1007/s00122-012-1976-6.
Article CAS PubMed Google Scholar
Wu D, Wu H, Wang C, Tseng H, Hwu K. Genome-wide InDel marker system for application in rice breeding and mapping studies. Euphytica. 2013;192(1):131–43.
Article Google Scholar
Zhou G, Zhang Q, Tan C, Zhang X, Li C. Development of genome-wide InDel markers and their integration with SSR, DArT and SNP markers in single barley map. BMC Genomics. 2015;16:804.
Mahmood S, Li Z, Yue X, Wang B, Chen J, Liu K. Development of InDels markers in oilseed rape (Brassica napus L.) using re-sequencing data. Mol Breeding. 2016;36(6).
Liu J, Qu J, Yang C, Tang D, Li J, Lan H, et al. Development of genome-wide insertion and deletion markers for maize, based on next-generation sequencing data. BMC Genomics. 2015;16(1):601. https://doi.org/10.1186/s12864-015-1797-5.
Li W, Cheng J, Wu Z, Qin C, Tan S, Tang X, et al. An InDel-based linkage map of hot pepper (Capsicum annuum). Mol Breed. 2015;35(1):32. https://doi.org/10.1007/s11032-015-0219-3.
Article CAS PubMed PubMed Central Google Scholar
Moghaddam S, Song Q, Mamidi S, Schmutz J, Lee R, Cregan P, et al. Developing market class specific InDel markers from next generation sequence data in Phaseolus vulgaris L. Front Plant Sci. 2014;5:185. https://doi.org/10.3389/fpls.2014.00185.
Lv H, Yang L, Kang J, Wang Q, Wang X, Fang Z, et al. Development of InDel markers linked to Fusarium wilt resistance in cabbage. Mol Breeding. 2013;32(4):961–7. https://doi.org/10.1007/s11032-013-9925-x.
Article CAS Google Scholar
Yang Z, Dai Z, Xie D, Chen J, Tang Q, Cheng C, et al. Development of an InDel polymorphism database for jute via comparative transcriptome analysis. Genome. 2018;61(5):323–7. https://doi.org/10.1139/gen-2017-0191.
Article CAS PubMed Google Scholar
Crocq M. History of cannabis and the endocannabinoid system. Dialogues Clin Neuro. 2020;22(3):223–8.
Article Google Scholar
Singh N, Choudhury D, Tiwari G, Singh A, Kumar S, Srinivasan K, et al. Genetic diversity trend in Indian rice varieties: an analysis using SSR markers. BMC Genet. 2016;17(1):127. https://doi.org/10.1186/s12863-016-0437-7.
Article PubMed PubMed Central Google Scholar
Bhatta M, Morgounov A, Belamkar V, Poland J, Baenziger P. Unlocking the novel genetic diversity and population structure of synthetic hexaploid wheat. BMC Genomics. 2018;19(1):591. https://doi.org/10.1186/s12864-018-4969-2.
Article PubMed PubMed Central Google Scholar
Zhang L, Chang Y, Zhang X, Guan F, Yuan H, Yu Y, et al. Analysis of the genetic diversity of Chinese native Cannabis sativa cultivars by using ISSR and chromosome markers. Genet Mol Res. 2014;13(4):10490–500. https://doi.org/10.4238/2014.December.12.10.
Article CAS PubMed Google Scholar
Zhang J, Yan J, Huang S, Pan G, Chang L, Li J, et al. Genetic diversity and population structure of cannabis based on the genome-wide development of simple sequence repeat markers. Front Genet. 2020;11:958. https://doi.org/10.3389/fgene.2020.00958.
Van B, Stout J, Cote A, Tallon C, Sharpe A, Hughes T, et al. The draft genome and transcriptome of Cannabis sativa. Genome Biol. 2011;12(10):R102.
Article Google Scholar
Laverty KU, Stout JM, Sullivan MJ, Shah H, Gill N, Holbrook L, et al. A physical and genetic map of Cannabis sativa identifies extensive rearrangements at the THC/CBD acid synthase loci. Genome Res. 2019;29(1):146–56. https://doi.org/10.1101/gr.242594.118.
Article CAS PubMed PubMed Central Google Scholar
Prentout D, Razumova O, Rhoné B, Badouin H, Henri H, Feng C, et al. An efficient RNA-seq-based segregation analysis identifies the sex chromosomes of Cannabis sativa. Genome Res. 2020;30(2):164–72. https://doi.org/10.1101/gr.251207.119.
Article CAS PubMed PubMed Central Google Scholar
Lü Y, Cui X, Li R, Huang P, Zong J, Yao D, et al. Development of genome-wide insertion/deletion markers in rice based on graphic pipeline platform. J Integr Plant Biol. 2015;57(11):980–91. https://doi.org/10.1111/jipb.12354.
Article CAS PubMed Google Scholar
Mi W, Wang N, Shen C, Huang C, Wen T, Lin Z. Development and evaluation of InDel markers in cotton based on whole-genome re-sequencing data. Acta Agron Sin. 2019;45(2):196.
Article Google Scholar
Sun Z, Liu Y, Xiao S, Hu J, Pan G, He J, et al. Identification of quantitative trait loci for resistance to rice black-streaked dwarf virus disease and small brown planthopper in rice. Mol Breeding. 2017;37(6):1–9.
Article CAS Google Scholar
Gao L, Yang G, Li Y, Fan N, Li H, Zhang M, et al. Fine mapping and candidate gene analysis of a QTL associated with leaf rolling index on chromosome 4 of maize (Zea mays L.). Theor Appl Genet. 2019;132(11):3047–62. https://doi.org/10.1007/s00122-019-03405-1.
Article CAS PubMed Google Scholar
Liu G, Zhao T, You X, Jiang J, Li J, Xu X. Molecular mapping of the Cf-10 gene by combining SNP/InDel-index and linkage analysis in tomato (Solanum lycopersicum). BMC Plant Biol. 2019;19(1):15. https://doi.org/10.1186/s12870-018-1616-7.
Article PubMed PubMed Central Google Scholar
Qiu L, Wang H, Li Y, Wang W, Liu Y, Mu J, et al. Fine mapping of the wheat leaf rust resistance gene LrLC10 (Lr13) and validation of its co-segregation markers. Front Plant Sci. 2020;11:470. https://doi.org/10.3389/fpls.2020.00470.
Article PubMed PubMed Central Google Scholar
Jain A, Roorkiwal M, Kale S, Garg V, Yadala R, Varshney R. InDel markers: an extended marker resource for molecular breeding in chickpea. PLoS One. 2019;14(3):e0213999. https://doi.org/10.1371/journal.pone.0213999.
Article CAS PubMed PubMed Central Google Scholar
Zhang Q, Chen X, Guo H, Trindade L, Salentijn E, Guo R, et al. Latitudinal adaptation and genetic insights into the origins of Cannabis sativa L. Front Plant Sci. 2018;9:1876. https://doi.org/10.3389/fpls.2018.01876.
Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, He Y, et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol Plant. 2020;13(8):1194–202. https://doi.org/10.1016/j.molp.2020.06.009.
Article CAS PubMed Google Scholar

Download references

Acknowledgements

We thank all the staff in our lab for providing useful suggestions and technical assistance. We are very grateful to the editor and reviewers for providing constructive comments for the improvement of our manuscript.

Research involving plants

We confirm we have included a statement specifying the local, national or international guidelines and legislation and the required or appropriate permissions and/or licences for the study. Experimental research on plants and field studies in this work comply with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on the Trade in Endangered Species of Chinese Cannabis.

Funding

This research was supported by China Agriculture Technology Research System (CARS-16-E-02), Central Public-interest Scientific Institution Basal Research Fund (1610242020003; 1610242019001). Funds were used for performing the experiment, and analysis data and in writing the manuscript.

Author information

Gen Pan and Zheng Li contributed equally to this work.

Authors and Affiliations

Institute of Bast Fiber Crops, Chinese Academy of Agricultural Sciences, Changsha, 410205, China
Gen Pan, Zheng Li, Siqi Huang, Jie Tao, Yaliang Shi, Anguo Chen, Jianjun Li, Huijuan Tang, Li Chang, Yong Deng, Defang Li & Lining Zhao
Key Laboratory of the Biology and Process of Bast Fiber Crops, Ministry of Agriculture, Changsha, China
Gen Pan, Siqi Huang, Anguo Chen, Jianjun Li, Huijuan Tang, Li Chang, Yong Deng, Defang Li & Lining Zhao

Authors

Gen Pan
View author publications
You can also search for this author in PubMed Google Scholar
Zheng Li
View author publications
You can also search for this author in PubMed Google Scholar
Siqi Huang
View author publications
You can also search for this author in PubMed Google Scholar
Jie Tao
View author publications
You can also search for this author in PubMed Google Scholar
Yaliang Shi
View author publications
You can also search for this author in PubMed Google Scholar
Anguo Chen
View author publications
You can also search for this author in PubMed Google Scholar
Jianjun Li
View author publications
You can also search for this author in PubMed Google Scholar
Huijuan Tang
View author publications
You can also search for this author in PubMed Google Scholar
Li Chang
View author publications
You can also search for this author in PubMed Google Scholar
Yong Deng
View author publications
You can also search for this author in PubMed Google Scholar
Defang Li
View author publications
You can also search for this author in PubMed Google Scholar
Lining Zhao
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

GP and ZL performed the experiments; GP wrote the paper; YLS, JT, AGC, JJL and HJT participated in the sample collection and DNA extraction; LC and YD analysed the data; SQH revised the manuscript; LNZ and DFL designed the experiment, and LNZ also provided the materials. All authors read and approved the version to be published. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Defang Li or Lining Zhao.

Ethics declarations

Ethics approval and consent to participate

The 115 Chinese Cannabis materials were provided by National medium-term germplasm bank of bast fiber (Institute of Bast Fiber Crops, Chinese Academy of Agricultural Sciences). The study was conducted in accordance with 1961 International Convention against drugs, and also was approved by the 10th document issued by the State Drug Administration and the Ministry of agriculture in China in 2002.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: Fig. 2 was a duplicate of Fig. 3 in the original publication.

Supplementary Information

Additional file 1.0 Table S1.

All InDel markers developed in this study.

Additional file 2. Table S2.

The primers used for screenning of sex-linked InDel markers. Table S3. The genetic admixture of 115 cultivars. Fig. S1 The physical location of 84 InDel primers on Cannabis chromosome used in this study. Fig. S2. Amplification products from 96 Cannabis cultivars using the InDel markers CS-I1–2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Pan, G., Li, Z., Huang, S. et al. Genome-wide development of insertion-deletion (InDel) markers for Cannabis and its uses in genetic structure analysis of Chinese germplasm and sex-linked marker identification. BMC Genomics 22, 595 (2021). https://doi.org/10.1186/s12864-021-07883-w

Download citation

Received: 22 February 2021
Accepted: 28 June 2021
Published: 05 August 2021
DOI: https://doi.org/10.1186/s12864-021-07883-w

Genome-wide development of insertion-deletion (InDel) markers for Cannabis and its uses in genetic structure analysis of Chinese germplasm and sex-linked marker identification

Abstract

Background

Results

Conclusion

Similar content being viewed by others

Genome-wide analysis of microsatellite and sex-linked marker identification in Gleditsia sinensis

Whole-genome resequencing of wild and cultivated cannabis reveals the genetic structure and adaptive selection of important traits

Population structure and genetic diversity of Moroccan cannabis (Cannabis sativa L.) germplasm through simple sequence repeat (SSR) analysis

Introduction

Result

Distribution of InDel markers

Development of InDel markers for whole Cannabis genome and polymorphism analysis

Genetic diversity analysis and population structure

Screening of sex-linked InDel markers and PCR-based verification of known-sex plants

Discussion

Conclusion

Materials and methods

Plant materials and DNA extraction

DNA extraction

Detection and design of the insertions/deletions (InDels)

InDel genotyping

Genetic diversity assay and population structure

Availability of data and materials

Change history

13 September 2021

Abbreviations

References

Acknowledgements

Research involving plants

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Additional file 1.0 Table S1.

Additional file 2. Table S2.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation