Development and use of genic molecular markers (GMMs) for construction of a transcript map of chickpea (Cicer arietinum L.)
- Cite this article as:
- Gujaria, N., Kumar, A., Dauthal, P. et al. Theor Appl Genet (2011) 122: 1577. doi:10.1007/s00122-011-1556-1
A transcript map has been constructed by the development and integration of genic molecular markers (GMMs) including single nucleotide polymorphism (SNP), genic microsatellite or simple sequence repeat (SSR) and intron spanning region (ISR)-based markers, on an inter-specific mapping population of chickpea, the third food legume crop of the world and the first food legume crop of India. For SNP discovery through allele re-sequencing, primer pairs were designed for 688 genes/expressed sequence tags (ESTs) of chickpea and 657 genes/ESTs of closely related species of chickpea. High-quality sequence data obtained for 220 candidate genic regions on 2–20 genotypes representing 9 Cicer species provided 1,893 SNPs with an average frequency of 1/35.83 bp and 0.34 PIC (polymorphism information content) value. On an average 2.9 haplotypes were present in 220 candidate genic regions with an average haplotype diversity of 0.6326. SNP2CAPS analysis of 220 sequence alignments, as mentioned above, provided a total of 192 CAPS candidates. Experimental analysis of these 192 CAPS candidates together with 87 CAPS candidates identified earlier through in silico mining of ESTs provided scorable amplification in 173 (62.01%) cases of which predicted assays were validated in 143 (82.66%) cases (CGMM). Alignments of chickpea unigenes with Medicago truncatula genome were used to develop 121 intron spanning region (CISR) markers of which 87 yielded scorable products. In addition, optimization of 77 EST-derived SSR (ICCeM) markers provided 51 scorable markers. Screening of easily assayable 281 markers including 143 CGMMs, 87 CISRs and 51 ICCeMs on 5 parental genotypes of three mapping populations identified 104 polymorphic markers including 90 markers on the inter-specific mapping population. Sixty-two of these GMMs together with 218 earlier published markers (including 64 GMM loci) and 20 other unpublished markers could be integrated into this genetic map. 
The genetic map developed here therefore has a total of 300 loci, including 126 GMM loci, and spans 766.56 cM with an average inter-marker distance of 2.55 cM. In summary, this is the first report on the large-scale development of genic markers, including easily assayable markers, and on a transcript map of chickpea. These resources should be useful not only for genome analysis, genetics and breeding applications in chickpea, but also for comparative legume genomics.
Chickpea (Cicer arietinum L.), the only cultivated species within the genus Cicer, is a self-pollinated diploid (2n = 2x = 16) crop with a relatively small genome size of 740 Mbp (Arumuganathan and Earle 1991). It ranks third among food legumes in terms of production after common bean (Phaseolus vulgaris) and pea (Pisum sativum). Total annual world production of chickpea is 9.8 million tonnes, and major producers India and Pakistan contribute 65 and 10%, respectively, to the world harvest (FAOSTAT database, 2008, http://faostat.fao.org/site/567/DesktopDefault.aspx?PageID=567#ancor). Chickpea seed contains 20–30% protein, approximately 40% carbohydrates and only 3–6% oil (Gil et al. 1996) and, moreover, it is a good source of different minerals such as calcium, magnesium, potassium, phosphorus, iron, zinc and manganese (Ibrikci et al. 2003).
The majority of the chickpea-growing area falls within the semi-arid tropic (SAT) regions of the world, where the crop is exposed to several biotic and abiotic stresses. As a result, crop productivity is seriously challenged in SAT regions. Although chickpea breeders have been working tirelessly to develop improved varieties with enhanced resistance or tolerance to biotic and abiotic stresses, modern biotechnological tools offer opportunities to accelerate such activities. Molecular markers, genetic maps and markers associated with traits are prerequisites for undertaking molecular breeding leading to crop improvement (see Varshney et al. 2005a). Many kinds of molecular markers, including RFLP (restriction fragment length polymorphism), RAPD (random amplification of polymorphic DNA), AFLP (amplified fragment length polymorphism), and microsatellite or simple sequence repeat (SSR) markers, have been developed in different crops and applied in a variety of genetic and plant breeding studies. However, the choice of marker system depends largely on the intended application, the costs of development and genotyping, and the ease of use. Among the different classes of molecular markers, SSRs have proven useful for a variety of applications in plant genetics and breeding because of their reproducibility, multi-allelic nature, co-dominant inheritance, relative abundance and genome-wide coverage (Gupta and Varshney 2000). In chickpea, several hundred SSR markers have been developed (Winter et al. 1999; Choudhary et al. 2006, 2009; Sethy et al. 2006; Nayak et al. 2010). However, most of these markers were designed from genomic sequences and do not necessarily represent the functionally important coding regions.
Owing to the emphasis on functional genomics and the advent of next generation sequencing (NGS) technologies, it has become possible to develop markers from genes or coding regions (Varshney 2010). As these markers are derived from genes, and a putative function is known or can be deduced for the corresponding genes in most cases, they are commonly referred to as ‘genic molecular markers (GMMs)’ (Varshney 2010) or ‘functional markers’ (Andersen and Lübberstedt 2003). GMMs have several intrinsic advantages over genomic DNA markers: they serve as a useful source for identifying the ‘perfect marker’ for marker-assisted selection (MAS), for estimating the functional genetic diversity present in germplasm collections, for comparative mapping among related species and for identifying chromosome duplication events. Genetic maps based on GMMs are commonly called ‘transcript maps’ or ‘functional maps’.
A number of methods have been used in the past for developing GMMs in several crop species (Gupta and Rustgi 2004; Varshney 2010). These include: (a) identification of SNPs (single nucleotide polymorphisms) by allele re-sequencing of candidate genes across different genotypes (e.g. Kota et al. 2008); (b) development of SSR markers from genes or expressed sequence tags (ESTs), called EST-SSR markers (see Varshney et al. 2005b); (c) identification of SNPs through in silico mining of ESTs from different genotypes and development of markers based on such SNPs, often referred to as EST-SNP markers (e.g. Kota et al. 2008); and (d) design of primers in exonic regions to amplify the intronic region and detect either length or sequence polymorphism in introns (Feltus et al. 2006), referred to here as intron spanning region (ISR) markers. While EST-SSR markers can be assayed by traditional agarose, polyacrylamide or capillary electrophoresis, a number of SNP genotyping platforms are available, e.g. pyrosequencing (Alderborn et al. 2000; Ching and Rafalski 2002; Varshney et al. 2008), mass spectrometry (Rodi et al. 2002), Affymetrix chips (Borevitz et al. 2003) and GoldenGate assays (Fan et al. 2003; Rostoks et al. 2006). However, one of the most cost-effective SNP genotyping approaches is conversion of SNPs into cleaved amplified polymorphic sequence (CAPS) assays, which can be run on agarose gels (Varshney et al. 2007). ISR markers are generally assayed on denaturing MDE (mutation detection enhancement) gels (Thudi et al. 2010). Using the above-mentioned methodologies and several others, a large number of GMMs as well as transcript maps have been developed in several crop species such as rice (Wu et al. 2002), wheat (Qi et al. 2004), barley (Stein et al. 2007; Kota et al. 2008; Sato et al. 2009) and soybean (Choi et al. 2007).
In chickpea, although a few reports are available on identification of SNPs based on re-sequencing of some genes (Rajesh and Muehlbauer 2008; Nayak et al. 2010), no comprehensive study has so far been undertaken on large-scale identification of SNPs or development of a detailed transcript map. Therefore, this study was undertaken with the following objectives: (a) SNP discovery and sequence diversity analysis through the allele re-sequencing approach, (b) development of CAPS markers for assaying SNPs, (c) development of ISR markers, (d) optimization of SSR markers derived from ESTs, and (e) construction of a transcript map of chickpea.
Materials and methods
Plant material and DNA extraction
List of chickpea genotypes used for allele re-sequencing for identification of SNPs
Total genomic DNA was extracted from leaves of 2-week-old seedlings using the high-throughput mini DNA extraction protocol described in Cuc et al. (2008). The quality and quantity of the extracted DNA were checked on a 0.8% agarose gel. The DNA was normalized to 5 ng/μl for further use.
Designing of primer pairs
For SNP discovery based on allele re-sequencing approach, primer pairs were designed using PRIMER3 for: (a) 688 chickpea ESTs/genes, (b) 657 heterologous genes or transcription factors (TFs) (identified in Medicago truncatula, Medicago sativa, Lotus japonicus, Lupinus spp., Arachis hypogaea, P. sativum, Crotolaria tenuifolia, P. vulgaris, Phaseolus coccineus, Glycine max, Glycine soja, Robinia pseudoacacia and Trifolium pratense), (c) 77 chickpea EST-SSRs (Varshney et al. 2009a), (d) 87 EST contigs containing SNPs that could be optimized for CAPS assay (Varshney et al. 2009a) and (e) 121 intronic regions identified based on comparison of chickpea ESTs with Medicago genome.
Polymerase chain reaction (PCR)
PCR for all above-mentioned markers was carried out in 20 μl reaction in a GeneAmp® PCR System 9700 thermal cycler (Applied Biosystems, Foster City, CA, USA) containing 5 ng template DNA, 0.2 mM dNTPs, 2 mM MgCl2, 2 pmol of forward and reverse primer, 1 U of Taq DNA polymerase (Sib enzyme), and 1× PCR buffer. The amplification cycles were: initial denaturation of 5 min at 94°C followed by 10 cycles of denaturation for 15 s at 94°C, touchdown from 61 to 51°C with 1°C decrease in each cycle for 20 s followed by extension at 72°C for 30 s. The next 40 cycles included denaturation at 94°C for 15 s, annealing at 54°C for 30 s and extension at 72°C for 30 s followed by final extension of 30 min at 72°C and left at 4°C until further use. For testing the amplification, PCR products were separated on 1.2% agarose gel electrophoresis.
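The cycling programme above can be written out as a per-cycle annealing schedule. The following is a minimal sketch, not part of the original protocol: the function name and defaults are hypothetical, and it assumes the first touchdown cycle anneals at 61°C and each of the 10 touchdown cycles is 1°C lower than the previous one, so the tenth anneals at 52°C before the 40 fixed cycles at 54°C begin.

```python
def touchdown_profile(start_c=61, step_c=1, touchdown_cycles=10,
                      main_cycles=40, main_anneal_c=54):
    """Return the annealing temperature (in deg C) used in each PCR cycle.

    Assumption (not stated explicitly in the text): the first touchdown
    cycle anneals at `start_c` and every later touchdown cycle is
    `step_c` degrees lower than the one before it.
    """
    touchdown = [start_c - i * step_c for i in range(touchdown_cycles)]
    fixed = [main_anneal_c] * main_cycles
    return touchdown + fixed


profile = touchdown_profile()
print(len(profile))   # total number of amplification cycles
print(profile[:10])   # touchdown phase
print(profile[10])    # first cycle of the fixed-temperature phase
```

Writing the schedule as a list makes it easy to check the total cycle count (10 + 40) against the thermal cycler programme.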
Allele re-sequencing and SNP detection
For allele re-sequencing-based SNP discovery, PCR products obtained using the 688 chickpea and 657 heterologous genes/transcription factors were sequenced in both directions using Sanger sequencing methodology. Initially, PCR amplicons producing single bands were purified by treating with 1 U each of Exonuclease I (Exo) and shrimp alkaline phosphatase (SAP). The Exo/SAP-treated PCR products were incubated at 37°C for 45 min, followed by denaturation at 80°C for 15 min to deactivate any remaining Exo activity. All the Exo/SAP-treated products were sequenced from both ends using the respective forward and reverse primers at Macrogen Inc., Seoul, South Korea (http://www.macrogen.com/).
Sequencing data were inspected manually for possible sequencing errors. The forward and reverse sequences for a given gene/EST were merged to prepare the contig for each genotype using DNA Baser v 2.9 software (http://dnabaser.com) under the following parameters: minimum match percentage, 80%; minimum overlap, 25 bases; quality value, 25. Subsequently, contigs for all genotypes were aligned using Clustal W (Thompson et al. 1994; http://www.ebi.ac.uk/Tools/clustalw2/index.html) and the aligned files were saved with the .aln extension. The .aln files of each gene/marker were subsequently opened in BioEdit version 18.104.22.168. SNP identification among different genotypes was undertaken manually after considering the base quality.
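In the study, SNPs were called manually from the alignments with base quality taken into account. For illustration only, the core idea (a SNP is an alignment column in which the genotypes carry more than one base) can be sketched as below. The function name, the genotype labels and the toy sequences are hypothetical, and the sketch ignores base quality, gaps and ambiguous bases.

```python
def find_snps(aligned):
    """Return (1-based position, sorted alleles) for every variable column.

    `aligned` maps a genotype label to its aligned sequence; all sequences
    must have the same length. Gap characters and ambiguous bases are
    ignored, so a column counts as a SNP only if at least two of
    A, C, G, T are present.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    for pos in range(length):
        alleles = {s[pos].upper() for s in seqs} & set("ACGT")
        if len(alleles) > 1:
            snps.append((pos + 1, sorted(alleles)))
    return snps


# Toy alignment (made-up data, not taken from the study).
toy = {"g1": "ACGTAC-T", "g2": "ACGTACGT", "g3": "ATGTACGA"}
snps = find_snps(toy)
print(snps)
```

Column 7 of the toy alignment contains a gap in one genotype and the same base in the others, so it is not reported as a SNP.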
The multiple sequence alignment files for each gene/marker after manual confirmation of identified SNPs in BioEdit programme were saved as FASTA file. Subsequently, FASTA files for different genes were put together in a single file. This file was used in SNP DIVersity ESTimator module (divest.pm) developed at ICRISAT (Jayashree et al. 2009) for calculating the polymorphism information content (PIC) value of individual SNPs as well as gene/marker, nucleotide diversity (π), number and PIC value of haplotypes for each gene/marker.
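The formulas used inside the DIVEST module are not given here, so the following is only a sketch under stated assumptions: PIC is computed as 1 minus the sum of squared allele frequencies (a form whose maximum for a bi-allelic SNP is 0.50), and nucleotide diversity (π) as the average number of pairwise differences per site. Function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """PIC of one SNP as 1 - sum(p_i^2) over allele frequencies p_i.

    Assumption: this simple form is used because it peaks at 0.50 for a
    bi-allelic SNP; the module's actual formula may differ.
    """
    n = len(alleles)
    return 1.0 - sum((count / n) ** 2 for count in Counter(alleles).values())


def nucleotide_diversity(seqs):
    """Average pairwise differences per site across aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy data (made up for illustration).
balanced_snp = list("AATT")   # two alleles at equal frequency
skewed_snp = list("AAAT")     # minor allele frequency of 0.25
toy_alignment = ["AAAA", "AAAT", "AATT"]

print(pic(balanced_snp), pic(skewed_snp))
print(nucleotide_diversity(toy_alignment))
```

The balanced SNP reaches the bi-allelic maximum, while the skewed one scores lower, which is why PIC depends on how many genotypes carry each allele.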
Three genotyping assays were employed for generating marker genotyping data: (a) CAPS assays, (b) capillary electrophoresis and (c) polyacrylamide gel electrophoresis using MDE gel.
All the SNPs identified through allele re-sequencing in this study and through the in silico EST mining approach described in Varshney et al. (2009a) were subjected to the ‘SNP2CAPS’ program (http://hpc.icrisat.cgiar.org/Pise/5.a/data_manipulation/snp2caps.html; Thiel et al. 2004), which used the REBASE restriction enzyme database (810.gcc version) containing recognition sequence information for 725 commercially available restriction enzymes. The programme predicted possible conversion of several SNP-containing markers/genes into CAPS assays.
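The principle behind a CAPS candidate is that a SNP creates or destroys a restriction site, so the alleles give different digestion patterns. The sketch below illustrates that principle only; it is not the SNP2CAPS algorithm. The function names and toy amplicons are hypothetical, the two recognition sequences (EcoRI, GAATTC; HindIII, AAGCTT) are palindromic so only one strand is scanned, and a real analysis would draw on the full enzyme database.

```python
def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def caps_candidates(allele_seqs, enzymes):
    """Return enzymes whose site count differs between the alleles.

    `allele_seqs` maps an allele label to its amplicon sequence;
    `enzymes` maps an enzyme name to its recognition sequence.
    """
    hits = {}
    for name, site in enzymes.items():
        counts = {label: count_sites(seq, site)
                  for label, seq in allele_seqs.items()}
        if len(set(counts.values())) > 1:
            hits[name] = counts
    return hits


# Toy amplicons differing by one base inside an EcoRI site (made-up data).
toy_alleles = {"allele_A": "TTGAATTCAA", "allele_B": "TTGAGTTCAA"}
toy_enzymes = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}
candidates = caps_candidates(toy_alleles, toy_enzymes)
print(candidates)
```

Only EcoRI is reported, because the SNP removes its site in one allele while HindIII cuts neither amplicon.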
All the predicted CAPS candidates were amplified using the same PCR conditions as mentioned above. Amplicons were then digested with the corresponding restriction enzyme, separated by agarose gel electrophoresis (3% agarose, 1× TBE buffer, 1 h, 120 V) and visualized by ethidium bromide staining (Varshney et al. 2007).
Polyacrylamide gel electrophoresis using MDE gel
PCR products generated using ISR markers were denatured and separated on single-strand conformation polymorphism (SSCP) gels containing MDE solution as described in Thudi et al. (2010).
PCR products generated using EST-SSR markers were analyzed on capillary electrophoresis using ABI PRISM® 3730 DNA analyzer, and allele calling was carried out as given in Varshney et al. (2009b).
Genetic mapping and map construction
Genotyping data for polymorphic markers were generated on 131 recombinant inbred lines (RILs) of the inter-specific mapping population (ICC 4958 × PI 489777). In addition, marker genotyping data for 218 marker loci were compiled from Winter et al. (1999, 2000) and Nayak et al. (2010), together with 20 DArT (Diversity Array Technology) marker loci (unpublished). Map distances were calculated by applying the ‘Kosambi’ mapping function (Kosambi 1944) in the MAPMAKER/EXP 3.0 programme (Lander et al. 1987). For integrating new marker loci into different linkage groups (LGs), a minimum logarithm of odds (LOD) score of 3.0 was used. To obtain a high-quality genetic map, the JoinMap® 4 programme (Van Ooijen 2006; http://www.kyazma.nl) was then used with the same set of marker loci determined in MAPMAKER/EXP 3.0. Maps were drawn with MapChart, version 2.1 (Voorrips 2002). The relative mapping positions of the markers left unmapped by JoinMap® 4 are shown on the right side of the linkage groups.
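The Kosambi mapping function converts a recombination fraction r between two loci into a map distance, d = 25 × ln((1 + 2r)/(1 − 2r)) cM. A minimal sketch of the function follows; the function name is hypothetical and the code is independent of MAPMAKER/EXP and JoinMap, which apply the function internally.

```python
import math


def kosambi_cm(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to centimorgans
    using the Kosambi mapping function."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


for r in (0.01, 0.10, 0.30):
    print(r, round(kosambi_cm(r), 2))
```

For small r the distance in cM is close to 100 × r, while larger recombination fractions map to progressively longer distances because the function allows for double crossovers with interference.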
Development of GMMs
For identification of SNPs in chickpea, a total of 1,345 primer pairs were designed for genes or transcription factors identified in chickpea or other legume species, e.g. M. truncatula, M. sativa, L. japonicus, Lupinus spp., A. hypogaea, P. sativum, C. tenuifolia, P. vulgaris, P. coccineus, G. max, G. soja, R. pseudoacacia and T. pratense. Subsequently, these primer pairs were used to amplify 2–20 genotypes given in Table 1. All successful amplicons generated were used for sequencing in both directions. Sequence data obtained on all the genotypes for a given gene were aligned and compared to identify the SNPs.
Chickpea ESTs/genes-derived primer pairs
A total of 688 primer pairs were designed as follows: (a) 409 for SNPs identified based on Illumina sequencing of four genotypes namely ICC 4958, ICC 1882, ICC 506EB and ICCC 37 (Ca2C, Ca2S, CaESTs and Ca series; this study), (b) 217 ESTs (AGLC series; Buhariwalla et al. 2005, this study), and (c) 62 candidate genes (CaHa series; Singh et al. 2008, Rajesh and Muehlbauer 2008). Primer sequences for all above-mentioned genes/ESTs are given in ESM Table 1. After screening these primer pairs on two chickpea genotypes namely ICC 4958 and ICC 1882, only 587 primer pairs showed scorable amplification. Subsequently, all 587 primer pairs were used on 2–19 genotypes. Sequencing of amplicons in both directions provided high-quality sequences for 305 markers/genic regions on 2–19 genotypes.
Heterologous genes-/transcription factors-derived primer pairs
Development, amplification and sequencing status of gene sequences based primers derived from chickpea and heterologous species
Primer Id series/source of genes
Number of genotypes surveyed
Illumina/454 chickpea sequences (409)
Chickpea ESTs (279)
Rajesh and Muehlbauer (2008)
Singh et al. (2008)
Total chickpea (688)
Total heterologous (657)
In summary, a total of 988 markers showed scorable amplification and high-quality sequences were generated for 439 markers.
Identification of SNPs and estimates on sequence diversity in chickpea germplasm
Sequence data generated for 439 candidate genic regions across 2–20 genotypes were aligned, and the alignments were edited manually by trimming low-quality sequences at both ends. Analysis of the sequence alignments across the genotypes showed SNPs in 220 candidate genic regions (ESM Fig. 1). Of these, 122 originated from chickpea, while 98 came from heterologous species (35 from Medicago spp., 19 from soybean, 13 each from L. japonicus and Lupinus spp., 6 from T. pratense and 12 from other legumes). DIVEST analysis of these sequence alignments showed a total of 1,893 SNPs in 67,828 bp of sequence data generated for the 220 candidate genic regions, a SNP frequency of 1/35.83 bp. Among the 1,893 SNPs identified, 1,089 were counted as transitions and 888 as transversions. In addition, a total of 63 InDels were observed.
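The SNP frequency is simply the surveyed sequence length divided by the number of SNPs, and a substitution is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. The short sketch below reproduces the frequency from the totals reported in the text; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(base_a, base_b):
    """Classify a substitution between two different bases."""
    pair = {base_a.upper(), base_b.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def bp_per_snp(total_bp, n_snps):
    """Average number of base pairs per SNP."""
    return total_bp / n_snps


overall = bp_per_snp(67_828, 1_893)   # totals reported in the text
print(round(overall, 2))
print(substitution_type("A", "G"), substitution_type("A", "C"))
```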
As the genes surveyed for sequence diversity were originally derived from gene/transcription factor sequences of chickpea and related legume (heterologous) species and examined on cultivated and wild species, sequence diversity was estimated as per origin of the gene/TF sequence as well as in terms of the cultivated versus wild species.
Sequence diversity based on genes from chickpea versus heterologous species
Sequence diversity analysis based on source of genes and species types
Columns: number of genotypes surveyed (average); total length of sequence surveyed (average in bp); number of genes showing SNPs; number of SNPs identified (average); PIC value of individual SNP (average); nucleotide diversity range (average); number of haplotypes (average); haplotype diversity range (average); PIC value of haplotypes (average)
Row groups: source of genes (heterologous species (657)); within species type (wild species (0–11)); 220 genes across 2–20 genotypes
To identify an informative set of SNPs for chickpea genetics and breeding, PIC values were calculated; these ranged from 0.21 to 0.50 (average 0.34) in chickpea genes and from 0.11 to 0.46 (average 0.34) in heterologous species genes. As PIC values of bi-allelic SNPs cannot exceed 0.50, sequence data for these candidate genic regions were also analyzed in terms of haplotypes. The number of haplotypes varied from one to nine in chickpea genes, with an average haplotype PIC value of 0.41 (range 0.22–0.97), while heterologous species genes showed one to ten haplotypes with an average PIC of 0.61 (range 0.20–1.0). Haplotype diversity was higher in heterologous species genes (0.7704) than in chickpea genes (0.5220).
It is interesting to note that at least eight genic regions coming from heterologous species (LG80, LG99, LG101, LG103, LG104, LG111, LUP120, TC77515) and one candidate genic region (AGLC212) from chickpea showed >50 SNPs across the genotypes examined. However, higher PIC values (>0.50) were observed with three chickpea-derived candidate genic regions (Ca2C21276, Ca2S126415_1648_0587, Ca2C3599).
Sequence diversity among cultivated versus wild species genotypes
While analyzing the sequence data as per cultivated species and wild species, a higher level of sequence diversity was observed in wild species as compared to cultivated species. On surveying 220 candidate genic regions in cultivated and wild species, the number of SNPs present in wild species (1,445 SNPs) was more than 10 times higher as compared to cultivated species (132 SNPs). Similarly, nucleotide diversity was higher (23.5 × 10−3) in case of wild species as compared to cultivated species (5.1 × 10−3). In terms of PIC value of SNPs, cultivated species depicted average PIC value of 0.41 (ranging from 0.20 to 0.50) and wild species showed average PIC value of 0.36 (ranging from 0.19 to 0.50).
As the number of SNPs is much higher in wild species, the number of haplotypes ranged from 1 to 10 (average 2.0) in wild species, in contrast to 1 to 3 (average 1.3) in cultivated species. Haplotype diversity was also higher in wild species (0.9759) than in cultivated species (0.6671). Similarly, a higher haplotype PIC value was observed in wild species (average 0.76) than in cultivated species (0.46).
In total, 1,893 SNPs with an average SNP frequency of 1 SNP per 35.83 bp were observed on surveying 67,828 bp sequence data across all the 220 candidate genic regions. PIC values of SNP ranged from 0.11 to 0.50 with an average of 0.34 across all the 220 genic regions. Number of haplotypes across 220 genic regions ranged from 1 to 10 with PIC value ranging from 0.20 to 1.00 (avg. 0.50). On an average 2.9 haplotypes (ranged from 1 to 10) were present in 220 genic regions with an average haplotype diversity of 0.6326 (ranged from 0.2230 to 1.1670).
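Haplotype number and haplotype diversity for one genic region can be illustrated as follows. This is a sketch under an assumption: the estimator used in the study is not specified, so Nei's unbiased form, H = n/(n − 1) × (1 − Σ p_i²), is used here, where p_i is the frequency of haplotype i among n sequences. The function name and toy sequences are hypothetical.

```python
from collections import Counter


def haplotype_stats(seqs):
    """Return (number of distinct haplotypes, haplotype diversity).

    Assumption: diversity follows Nei's unbiased estimator,
    n / (n - 1) * (1 - sum of squared haplotype frequencies).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return len(counts), n / (n - 1) * (1.0 - sum_sq)


# Toy SNP haplotypes for four genotypes (made-up data).
toy = ["ACG", "ACG", "ATG", "ATA"]
n_hap, h_div = haplotype_stats(toy)
print(n_hap, round(h_div, 4))
```

Because a haplotype combines several SNPs, it can have more than two states per region, which is why haplotype-based PIC values can exceed the 0.50 limit of a single bi-allelic SNP.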
Development of CAPS markers for assaying SNPs
SNPs identified from allele re-sequencing were converted into potential CAPS markers by relating the SNP position to the presence/absence of a restriction site in amplicon using ‘SNP2CAPS’ programme. After analyzing 220 sequence alignments, as mentioned above, with SNP2CAPS programme, a total of 192 (87%) genic regions showed the presence of a putative restriction site in the sequence alignments. In addition, another potential set of 87 CAPS-based SNP markers was available from Varshney et al. (2009a).
Development of ISR markers
Details on development, amplification and polymorphism assessment of CISR markers
Columns: Medicago truncatula (Mt) chromosome number; number of corresponding chickpea unigenes having intronic region; number of primer pairs designed for amplification of intronic regions; number of primer pairs yielding scorable amplification; number of primer pairs showing polymorphism across 5 chickpea genotypes
Rows: Chromosome (0) + strand to Chromosome (8) + strand, one row per chromosome
Validation of EST-derived SSR (EST-SSR) markers
Based on a comprehensive EST dataset, 77 EST-SSR markers were reported in an earlier study (Varshney et al. 2009a; ESM Table 6). Screening of these EST-SSR markers on 5 parental genotypes (ICC 4958, ICC 1882, ICC 283, ICC 8261 and PI 489777) of three mapping populations in this study showed amplification in case of 51 EST-SSR (66.23%) markers. These markers have the designation ICRISAT Chickpea eST Microsatellite (ICCeM) markers.
As mentioned above, successful assays were developed for 143 CGMM, 87 CISR and 51 ICCeM markers. All these markers were screened on a set of five chickpea genotypes that represent parents of one inter-specific (ICC 4958 × PI 489777) and two intra-specific mapping populations (ICC 283 × ICC 8261; ICC 4958 × ICC 1882).
In case of CGMMs, 53 (37.06%) CGMMs showed polymorphism for the inter-specific mapping population, while 6 (4.2%) CGMMs and 2 (1.4%) CGMMs were polymorphic in two intra-specific mapping populations namely ICC 4958 × ICC 1882 and ICC 283 × ICC 8261, respectively (ESM Table 4). Out of 87 CISR markers tested, 21 (24.14%) markers showed polymorphism across 5 genotypes (ESM Table 5). While 18 (20.69%) CISR markers showed polymorphism in the parental lines of the inter-specific mapping population, 3 (3.45%) CISR markers showed polymorphisms for each of two intra-specific mapping populations. In case of ICCeMs, 19 (37.25%) markers showed polymorphism for the inter-specific mapping population, while 9 (17.65%) markers showed polymorphism for each of two intra-specific mapping populations (ESM Table 6).
Polymorphism status of easily assayable GMMs in three mapping populations
Columns: primer pairs designed; marker assays optimized; number of polymorphic markers in ICC 4958 × PI 489777 (inter-specific population), ICC 4958 × ICC 1882 (intra-specific population) and ICC 283 × ICC 8261 (intra-specific population)
Construction and features of genetic map
As a large number of GMMs showed polymorphism for the inter-specific mapping population (ICC 4958 × PI 489777) and this population has been used as a reference mapping population in several studies (e.g. Winter et al. 1999, 2000; Nayak et al. 2010), this population was selected for developing a transcript map based on GMMs. As a result, genotyping data were collected for all 90 (53 CAPS-SNP, 19 EST-SSR and 18 ISR) polymorphic markers on the mapping population.
For development of a genetic map, marker genotyping data for 238 marker loci were compiled from other sources (Winter et al. 1999, 2000; Nayak et al. 2010; ICRISAT, unpublished). Genotyping data of these markers together with 90 newly developed markers in the present study were used with MAPMAKER/EXP version 3.0. As a result, only 79 (out of 90) new marker loci were integrated into this genetic map. The JoinMap® 4 analysis on these marker genotyping data, however, could integrate only 62 GMM (32 CGMM, 15 CISR and 15 ICCeM) loci into this genetic map of chickpea. The putative mapping positions of the remaining 15 CGMM and 2 CISR loci have been shown to the right of linkage groups. The linkage groups have been designated according to Nayak et al. (2010).
Molecular markers are important genetic tools for understanding genome dynamics and facilitating molecular breeding. In chickpea, however, progress in the development of molecular markers and genetic maps has been slow. Nevertheless, in the past few years significant progress has been made in the development of molecular markers (Hüttel et al. 1999; Sethy et al. 2006; Lichtenzveig et al. 2005; Buhariwalla et al. 2005; Choudhary et al. 2006, 2009; Hyten et al. 2010) and genetic maps (Winter et al. 2000; Pfaff and Kahl 2003; Choi et al. 2007; Radhika et al. 2007; Millan et al. 2010; Nayak et al. 2010). However, the majority of these markers were derived from genomic DNA libraries and, therefore, the markers developed or the marker loci integrated into the genetic map do not necessarily represent genes. While gene-based molecular markers, commonly called GMMs, have been developed in large numbers in several crop species including some legumes like soybean (Choi et al. 2007), no concerted efforts had been undertaken in chickpea. The present study, therefore, reports the development of a comprehensive set of GMMs and the integration of some of these GMM loci into the genetic map. In this context, three main approaches were used: (a) allele re-sequencing to identify SNPs and conversion of some of these SNPs into CAPS markers (CGMMs), (b) alignment of chickpea unigenes to the Medicago genome to develop intron-based markers (CISRs), and (c) optimization of EST-derived SSR markers (ICCeMs).
SNPs and sequence diversity in chickpea germplasm
Although 988 candidate genes-based primer pairs showed scorable amplification, high-quality sequences were obtained for only 439 candidate genic regions. The analysis of sequence alignment for these genic regions showed 1,893 SNPs in 220 genes and 63 InDel across a set of 2–20 genotypes. SNP and InDel markers have been proven very useful for estimating the linkage disequilibrium and association mapping for crop improvement (Rafalski 2002).
Scanning of 67,828 bp of sequence data (220 candidate genic regions) led to the identification of 1,893 SNPs, an average frequency of 1 SNP per 35.83 bp in the chickpea germplasm surveyed. An earlier report in chickpea gave a SNP frequency of 1/61 bp in coding regions and 1/71 bp in genomic regions (Rajesh and Muehlbauer 2008). The higher SNP frequency in the present study can be attributed to the use of a larger number of genotypes, especially from wild species. SNP frequency in chickpea also appears to be higher than in other crop species such as wheat (1/1,000 bp, Bryan et al. 1999), rice (1/89 bp, Nasu et al. 2002), barley (1/300 bp, Kota et al. 2008), common bean (1/76 bp, Gaitán-Solís et al. 2008) and soybean (1/277 bp, Zhu et al. 2003). However, this is an outcome of including 11 genotypes of wild species representing the secondary as well as the tertiary genepool in the present study. When only germplasm of the cultivated species is considered, the SNP frequency is 1/513 bp. In this scenario, it is evident that the cultivated genepool of chickpea is narrow compared with the other legume and cereal species mentioned above.
Although SNPs are generally bi-allelic, 76 SNPs in 25 candidate genic regions showed three alleles, which enhances the value of the SNPs identified in the present study. For instance, the marker CGMM101 (primer pair TC77707) has substitutions at positions 128 and 136 bp (G → T → A and C → A → G, respectively), showing a haplotype diversity of 0.70 across all accessions but 0.95 across wild chickpea accessions (ESM Fig. 1).
The chickpea lines coming from wild species had >10× higher SNPs as compared to lines coming from the cultivated species. Similarly, nucleotide diversity (π) was about 5× higher in wild species (23.5 × 10−3) as compared to the cultivated species (5.1 × 10−3). While comparing this π with the earlier report of 14 × 10−3 by Rajesh and Muehlbauer (2008), it is clear that π in cultivated species is lower and is higher in wild species in the present study. It is interesting to note that the LG101 marker derived from L. japonicus showed the highest π as 74.0 × 10−3 (cultivated species, 4.8 × 10−3; wild species, 88.4 × 10−3), whereas genic region for AGLC57, derived from chickpea EST, showed minimum π as 0.6 × 10−3.
However, unlike the number and frequency of SNPs and π, the PIC value of SNPs is lower in the wild species (average 0.36) than in the cultivated species (average 0.41). The PIC value is a direct function of the allelic frequency of a given SNP in the germplasm collection. Because 11 lines were examined for the cultivated species, compared with only a few genotypes per wild species (4 for C. reticulatum and 1 each for 7 other wild species), a higher PIC value in the cultivated species is not unexpected.
Comprehensive set of GMMs for chickpea genetics and breeding
In total, primer pairs were designed for 1,630 genes; however, 1,184 primer pairs showed scorable amplification in the two genotypes examined. Thus, this study reports the first comprehensive set of 1,184 GMMs comprising 988 candidate genes including 220 genic regions with SNPs or 143 CAPS (CGMMs), 87 intron-based markers (CISRs) and 51 EST-SSRs (ICCeMs) for chickpea. The primer sequence information provided in ESM Tables 1, 3, 5 and 6 will be of great value for chickpea genetics and breeding community.
For the development of CAPS markers, a total of 279 CAPS candidates, including 192 identified in this study through allele re-sequencing and 87 identified in an earlier study by in silico mining of ESTs (Varshney et al. 2009a), were assayed on five chickpea genotypes. Scorable amplification was observed in 173 (62.01%) cases, and the predicted CAPS assays were validated in 143 of these (82.66%). The success rate for conversion of SNPs into CAPS markers was higher for SNPs identified through allele re-sequencing (59.90%) than for those identified through EST mining (32.18%). The lower rate for EST-derived candidates can be attributed to possible sequencing errors rather than the presence of true SNPs. In addition, CAPS candidates identified by in silico mining of ESTs did not show polymorphism in these five genotypes. This is possible because the genotypes used for CAPS validation in the present study differ from those whose ESTs were used for mining (Raju et al. 2010).
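The percentages above can be checked with a few lines of arithmetic. In the sketch below, the per-source counts of validated assays are not given in the text; they are back-calculated from the reported conversion rates and should be read as inferred values, and all variable names are hypothetical.

```python
# Counts reported in the text.
candidates = {"allele re-sequencing": 192, "in silico EST mining": 87}
total_candidates = sum(candidates.values())  # 279
scorable = 173
validated = 143

scorable_pct = round(100 * scorable / total_candidates, 2)
validated_pct = round(100 * validated / scorable, 2)

# Reported conversion rates (percent) per discovery approach.
conversion_pct = {"allele re-sequencing": 59.90, "in silico EST mining": 32.18}

# Inferred (not reported) number of validated assays per approach.
inferred_validated = {
    source: round(conversion_pct[source] / 100 * n)
    for source, n in candidates.items()
}

print(scorable_pct, validated_pct, inferred_validated)
```

Under this reading, the inferred per-source counts add up to the 143 validated CAPS assays, which suggests that the two conversion rates are expressed relative to the number of candidates from each approach.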
A first generation ‘transcript map’ of chickpea
With the objective of integrating the GMMs into a genetic map, 281 easily assayable GMMs were screened on 5 parental genotypes of 3 mapping populations. As the largest number of markers (90) showed polymorphism in the inter-specific mapping population, genotyping data for these markers were used to attempt integration of the corresponding marker loci into the genetic map. Although analysis of these marker data using MAPMAKER/EXP version 3.0 integrated 79 GMM loci, the map distance observed was high. Therefore, to develop a high-quality map, the JoinMap® 4 programme was used, but only 62 GMM loci could be integrated into this genetic map. The relative mapping positions of the remaining 17 markers are therefore shown on the right side of the linkage groups.
Because 64 of the 218 framework markers used in the present study were gene-derived (Nayak et al. 2010), the present map has a total of 126 GMM loci. This is probably the chickpea genetic map with the highest number of integrated GMM loci to date. The present transcript map has 15.75 ± 4.6 GMM loci per linkage group, with an average marker density of 1 marker per 6.08 cM. Among other legume crop species, transcript maps have been developed recently in cowpea (Muchero et al. 2009) and soybean (Choi et al. 2007). It is evident that integration of more GMM loci is required to enhance the density of GMM loci on the transcript map. This first generation transcript map will be useful for trait mapping, comparative mapping with other legume species and linking the genetic map with the physical map of chickpea, as the GMM loci integrated in this map are sequence-based and represent genes. It is also anticipated that the integrated GMM loci will serve as anchor markers for other chickpea maps so that those maps can be aligned with the reference genetic map.
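The locus counts quoted in this section fit together as shown in the short check below. Variable names are hypothetical, and the number of linkage groups is not stated here; it is inferred from the total and the reported mean per linkage group.

```python
# Counts reported in the text.
new_gmm_loci = 62            # integrated with JoinMap 4 in this study
framework_gmm_loci = 64      # gene-derived loci among the 218 framework markers
mapmaker_gmm_loci = 79       # GMM loci placed by MAPMAKER/EXP 3.0
mean_gmm_per_group = 15.75   # reported mean GMM loci per linkage group

total_gmm_loci = new_gmm_loci + framework_gmm_loci
positioned_beside_map = mapmaker_gmm_loci - new_gmm_loci

# Inferred (not stated in this section): number of linkage groups.
implied_linkage_groups = total_gmm_loci / mean_gmm_per_group

print(total_gmm_loci, positioned_beside_map, implied_linkage_groups)
```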
In summary, the present study provides primer sequences for 1,184 amplifiable markers, 281 easily assayable GMMs and a transcript map of chickpea comprising 126 GMM loci. In addition, the study reports detailed features of SNPs in both cultivated and wild species. On the one hand, the GMMs and this transcript map should enhance our understanding of the genome dynamics of chickpea and of comparative legume genomics; on the other hand, these resources may prove very useful for chickpea genetics and breeding applications.
The authors are thankful to the Department of Biotechnology (DBT), Government of India and the CGIAR Generation Challenge Programme (GCP) for financial support. Thanks are also due to P. Srinivasa Reddy and Spurthi Nayak for their help with some experiments and with preparing the MS, and to Martin Crespi (France) and Richard Oliver (Australia) for providing access to some heterologous genes/primers used in the study.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.