Background

Rice (Oryza sativa L.) is a staple food crop in India and many other parts of the world. In India, it occupies the largest area under cultivation and accounts for the largest share of grain production [3]. India is one of the centers of rice diversity, and extensive diversity has been reported at both the inter- and intra-specific levels [36]. Yield, quality traits and tolerance to biotic and abiotic stresses are the major objectives of varietal development [25].

A large number of rice varieties are released and notified in India every year. These varieties offer higher yields and tolerance to biotic and abiotic stresses, and they meet the requirements of changing farming systems and user demands. Rice varieties with distinct genetic backgrounds hold good promise for future crop improvement. Varietal improvement of this kind contributed to a large extent to the major increases in agricultural productivity in the twentieth century [10]. However, it is generally thought that continuous selection among crosses of genetically related cultivars has narrowed the genetic base of the crops on which modern agriculture depends, thus contributing to the genetic erosion of crop gene pools [33].

A robust and reliable fingerprinting method is required for identification and purity testing of these varieties [41], as well as for studying the genetic relationships among cultivars [18, 42]. Genetic characterization of crop plants has gained momentum with the advent of PCR-based molecular markers. SSRs are currently a marker of choice for molecular characterization because they are co-dominant, distributed throughout the genome, highly reproducible, variable, reliable, easily scorable, abundant and multi-allelic [37]. SSR markers have been used by many researchers [9, 17, 44] to characterize rice varieties. Owing to their multi-allelic and highly polymorphic nature, even a small number of SSR markers can capture a broad spectrum of genetic diversity [24].

Recent reports suggest that the genetic diversity of crop varieties released over the years fluctuates across successive time periods [8, 48]. In wheat, studies have reported an increase [16], a decrease [12] and no change [15, 35] in gene diversity over time. Similar trends have been reported in rice [23, 47]. Over the last few centuries, rice has suffered a loss of diversity [6], especially after the green revolution, when native varieties were replaced by high-yielding varieties [14]. Despite the large number of varieties developed every year, molecular studies on small sets of rice varieties have revealed a narrow genetic base [26, 45].

The present study was undertaken to assess the trend in genetic diversity of Indian rice varieties released and notified from 1940 to 2013, and to understand the genetic relationships among these varieties using both hierarchical and model-based approaches with hyper-variable simple sequence repeat (HvSSR) markers.

Results and Discussion

To our knowledge, this study is the first major effort to analyze the trend of genetic diversity in a large set of Indian rice varieties released over the years. A total of 729 varieties released between 1940 and 2013 were analyzed using HvSSR markers. These varieties possess various agronomically and economically important traits, such as tolerance to biotic and abiotic stresses (drought, cold, salinity and lodging), aroma, grain yield and early maturity (Additional file 1: Table S1).

HvSSR marker based analysis

Thirty-six HvSSR markers were used to characterize the 729 rice varieties. Gene diversity, heterozygosity, major allele frequency and polymorphic information content (PIC) were calculated for each of the 36 markers. A total of 112 alleles were amplified, with an average of 3.11 alleles per locus (Table 1). Similar values have been reported elsewhere: 3.02 alleles per locus with SSR markers in 25 Indian rice hybrids [2], and 3 alleles per locus in a set of 192 Indian rice germplasm accessions [25]. The number of alleles amplified per HvSSR primer varied from 2 to 5, with the maximum (5) amplified by HvSSR09-11, HvSSR11-21, HvSSR11-58 and HvSSR12-13. A similar range (2–5 alleles) was reported for SSR markers in 141 Basmati rice accessions from the North Western Himalaya [37]. PIC values ranged from 0.04 (HvSSR06-16) to 0.58 (HvSSR03-37), with a mean of 0.29. Shah et al. [38] and Pachauri et al. [29] reported mean PIC values of 0.37 and 0.38, respectively, in different sets of rice varieties, which are comparatively close to our result. In contrast, Pal et al. [30] reported a mean PIC value of 0.40 in a set of Basmati and non-Basmati varieties, and Salgotra et al. [37] reported 0.40 in a Basmati collection from the North Western Himalaya; both are slightly higher than our value. Gene diversity ranged from 0.04 (HvSSR06-16) to 0.66 (HvSSR03-37), with an average of 0.33. This is considerably lower than the values of 0.52 [25] and 0.54 [6] reported for rice germplasm lines and varieties, respectively. Heterozygosity varied from 0.00 (HvSSR05-30) to 0.92 (HvSSR03-02), with an average of just 0.15. Low heterozygosity has also been reported in other studies on rice [5, 25] and may be attributed to its self-pollinating nature.
The major allele frequency was also calculated for all 36 HvSSR markers; it ranged from 0.37 (HvSSR03-37) to 0.97 (HvSSR06-16), with an average of 0.76 (Table 1). This average is higher than those reported in previous studies on Indian rice varieties [46] and Korean landraces [19].
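The per-locus statistics summarized in Table 1 can be illustrated with a minimal Python sketch. It assumes each variety has already been scored as an allele pair at a locus. The function name `locus_stats` and this input format are hypothetical. The sketch uses the textbook, uncorrected formulas: gene diversity 1 − Σp_i², and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j² (Botstein et al. 1980). PowerMarker's own estimators may apply sample-size corrections, so this sketch is not expected to reproduce Table 1 exactly.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Per-locus summary statistics for one SSR marker.

    genotypes: list of (allele1, allele2) tuples, one per variety
    (hypothetical input format; missing data assumed already removed).
    """
    n_varieties = len(genotypes)
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())                      # 2 allele copies per variety
    freqs = [c / total for c in counts.values()]

    major_allele_freq = max(freqs)
    # Gene diversity (expected heterozygosity), no sample-size correction
    gene_diversity = 1 - sum(p * p for p in freqs)
    # Observed heterozygosity: fraction of varieties carrying two different alleles
    heterozygosity = sum(a != b for a, b in genotypes) / n_varieties
    # PIC (Botstein et al. 1980): subtract 2 * p_i^2 * p_j^2 for every allele pair i < j
    pic = gene_diversity - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {
        "n_alleles": len(counts),
        "major_allele_freq": major_allele_freq,
        "gene_diversity": gene_diversity,
        "heterozygosity": heterozygosity,
        "pic": pic,
    }


# Toy example: four varieties scored at one hypothetical locus
stats = locus_stats([("A", "A"), ("A", "A"), ("B", "B"), ("A", "B")])
print(stats)  # gene_diversity 0.46875, pic ~0.3589, heterozygosity 0.25
```

PIC is always at most the gene diversity, because it additionally discounts matings that cannot reveal which parental allele was transmitted. This explains why the mean PIC (0.29) in the text sits just below the mean gene diversity (0.33).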

Table 1 List of HvSSR primers used for genotyping of 729 rice varieties along with their chromosomal position, product size, No. of alleles amplified, gene diversity, heterozygosity and PIC value

Hierarchical cluster analysis

The amplicon data generated by HvSSR markers across the 729 varieties were used to construct a neighbour-joining (NJ) tree. The unrooted tree (Fig. 1) grouped the 729 rice varieties into two major clusters, with 400 varieties in cluster1 and 329 in cluster2. This grouping is consistent with the studies of Upadhyaya et al. [46], Nachimuthu et al. [25] and Das et al. [9], who also reported two clusters in Indian rice germplasm. We also analyzed the clustering pattern of rice varieties based on their traits. Passport data for some of the varieties contained information on a few key traits, which allowed trait-based grouping within the NJ clusters to be examined. Of 40 aromatic rice varieties, 25 were grouped in cluster2 and 15 in cluster1. Similarly, of 15 salt-tolerant varieties, nine were in cluster2 and six in cluster1. This suggests that breeders may have used diverse parents to develop aromatic and salinity-tolerant varieties, as reflected by their presence in both clusters. We also analyzed the clustering pattern of hybrids. Interestingly, of the 30 hybrid varieties in the set, 22 (73.3 %) were grouped in cluster2 and just eight in cluster1; six of these eight hybrids in cluster1 shared IR-series parents (Additional file 2: Table S2).
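The neighbour-joining step can be sketched as follows. This is a generic implementation of the Saitou and Nei algorithm on a symmetric distance matrix. It is not the DARwin implementation used in this study. It assumes pairwise genetic distances between varieties have already been computed, and the function name `neighbor_joining` is hypothetical. At each step, the pair minimizing the Q-criterion is joined, and the remaining three nodes are finally joined at a central node to give an unrooted tree.

```python
import numpy as np


def neighbor_joining(dist, labels):
    """Neighbour-joining (Saitou & Nei 1987) on a symmetric distance matrix.

    Returns an unrooted tree as a Newick string with branch lengths.
    Ties in the Q-criterion are broken by taking the first minimum.
    """
    d = np.asarray(dist, dtype=float)
    nodes = [str(x) for x in labels]
    if len(nodes) < 3:
        raise ValueError("need at least three taxa")
    while len(nodes) > 3:
        n = len(nodes)
        r = d.sum(axis=1)
        q = (n - 2) * d - r[:, None] - r[None, :]    # Q-criterion for every pair
        np.fill_diagonal(q, np.inf)
        i, j = np.unravel_index(np.argmin(q), q.shape)
        # branch lengths from the new internal node to i and j
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        joined = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # distances from the new node to every remaining node
        du = 0.5 * (d[i, keep] + d[j, keep] - d[i, j])
        d = np.vstack([
            np.hstack([d[np.ix_(keep, keep)], du[:, None]]),
            np.hstack([du, [0.0]]),
        ])
        nodes = [nodes[k] for k in keep] + [joined]
    # join the last three nodes at a central (unrooted) node
    la = 0.5 * (d[0, 1] + d[0, 2] - d[1, 2])
    lb = d[0, 1] - la
    lc = d[0, 2] - la
    return f"({nodes[0]}:{la:.4g},{nodes[1]}:{lb:.4g},{nodes[2]}:{lc:.4g});"


# Toy additive distances for four hypothetical varieties
D = [[0, 3, 9, 10],
     [3, 0, 10, 11],
     [9, 10, 0, 7],
     [10, 11, 7, 0]]
print(neighbor_joining(D, ["A", "B", "C", "D"]))  # (C:3,D:4,(A:1,B:2):5);
```

On additive distances like these, NJ recovers the true tree exactly. With real marker data, distances are rarely additive, so short or even negative branch lengths can appear.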

Fig. 1

NJ tree of the 729 rice varieties

Model based population structure

STRUCTURE analysis divided the 729 varieties into three populations (Figs. 2 and 3 and Additional file 3: Figure S1). Population1 (pop1), population2 (pop2) and population3 (pop3) contained 72, 329 and 328 varieties, respectively. Based on membership fractions, varieties in each population were further categorized as pure or admixed. Varieties with a membership probability ≥0.80 were considered pure, and those below 0.80 were considered admixed. Pop1 contained 44 pure (61 %) and 28 admixed (39 %) individuals, pop2 contained 282 pure (86 %) and 47 admixed (14 %) individuals, and pop3 contained 260 pure (79 %) and 68 admixed (21 %) individuals. The mean Fst values of pop1, pop2 and pop3 were 0.0118, 0.3240 and 0.2667, respectively, and the mean alpha value was 0.0829 (Table 2). The allele-frequency divergence among populations was 0.0686 between pop1 and pop2, 0.0533 between pop1 and pop3, and 0.0548 between pop2 and pop3 (Table 3). Earlier studies of population structure have reported two to eight sub-populations in different rice collections [1, 4, 13, 20, 49–51], and Roy et al. [36] and Upadhyaya et al. [46] reported a similar number of populations in different sets of Indian rice varieties. The relatively small alpha value (α = 0.0829) in the present study indicates that only a few individuals were admixed. An alpha value approaching zero indicates that most individuals originate from distinct populations [19], whereas a value greater than 1 indicates that most accessions are admixed [28]. The distribution of rice varieties across populations was also examined in relation to their traits. All 30 hybrid varieties and most of the aromatic rice varieties were grouped in pop2 (Fig. 2 and Additional file 3: Figure S1). In addition, 190 of the 329 varieties in pop2 were released after 1979, indicating that most of the recently released varieties belong to pop2.
Both the hierarchical and model-based analyses showed that a large number of varieties in cluster2 (347) and pop2 (329) correspond to each other.
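The pure/admixed classification described above can be sketched in a few lines. The sketch assumes the STRUCTURE membership coefficients (the Q-matrix, one row per variety) have already been read into Python lists. Parsing STRUCTURE output files is not shown, and the function names and example values are hypothetical. Each variety is assigned to its highest-membership population and is called pure when that coefficient is ≥0.80.

```python
from collections import defaultdict


def classify_membership(q_matrix, threshold=0.80):
    """Assign each variety to the population with its highest membership
    coefficient and label it 'pure' (q >= threshold) or 'admixed'."""
    calls = []
    for q in q_matrix:
        best = max(range(len(q)), key=lambda k: q[k])   # first maximum wins ties
        status = "pure" if q[best] >= threshold else "admixed"
        calls.append((best + 1, status))                 # populations numbered from 1
    return calls


def summarise(calls):
    """Count pure/admixed varieties per population and the percentage pure."""
    counts = defaultdict(lambda: {"pure": 0, "admixed": 0})
    for pop, status in calls:
        counts[pop][status] += 1
    summary = {}
    for pop in sorted(counts):
        c = counts[pop]
        total = c["pure"] + c["admixed"]
        summary[pop] = {**c, "total": total,
                        "pct_pure": round(100 * c["pure"] / total, 1)}
    return summary


# Hypothetical Q-matrix rows (one per variety, K = 3)
q = [[0.90, 0.05, 0.05],
     [0.50, 0.30, 0.20],
     [0.10, 0.85, 0.05],
     [0.20, 0.20, 0.60]]
print(summarise(classify_membership(q)))
```

Applied to the counts reported for pop1 (44 pure out of 72), the same percentage calculation gives 61.1 % pure and 38.9 % admixed.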

Fig. 2

Population structure of 729 rice varieties

Fig. 3

Estimation of the number of populations using ΔK derived from LnP(D), for K = 1 to 20

Table 2 Mean value of alpha, Fst1, Fst2 and Fst3 inferred from model based approach
Table 3 Allele-frequency divergence among populations computed using estimates of P (Model based approach)

AMOVA and PCoA of clusters obtained using hierarchical approach

AMOVA of the 729 varieties was performed using the two clusters obtained from hierarchical cluster analysis. Of the total molecular variance, 3 % was found between the two clusters, 61 % among individuals and 36 % within individuals (Fig. 4 and Table 4). In the PCoA, varieties were labeled by hierarchical cluster (two colours), and the two groups were intermixed across the coordinates (Fig. 5). The first three axes explained 15.9 % of the cumulative variation (Table 5).
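The "percentage of variation explained by the first three axes" follows from the eigenvalues of a principal coordinate analysis. Below is a minimal sketch of classical (Gower) PCoA with NumPy. It assumes a precomputed pairwise distance matrix, and the function name `pcoa` is hypothetical. Percentages are taken relative to the sum of positive eigenvalues. GenAlEx may standardize the matrix differently, so the sketch shows the principle rather than reproducing Table 5.

```python
import numpy as np


def pcoa(dist, n_axes=3):
    """Classical principal coordinate analysis (Gower 1966) of a distance matrix.

    Returns coordinates on the first n_axes axes, the percentage of variation
    explained by each, and the cumulative percentage.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = centring @ (-0.5 * d ** 2) @ centring        # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)             # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10                           # drop zero/negative eigenvalues
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    coords = eigvecs[:, :n_axes] * np.sqrt(eigvals[:n_axes])
    pct = 100 * eigvals / eigvals.sum()
    return coords, pct[:n_axes], np.cumsum(pct)[:n_axes]


# Toy example: three varieties whose distances fit exactly on one line
coords, pct, cum = pcoa([[0, 1, 3], [1, 0, 2], [3, 2, 0]])
print(pct, cum)  # one axis explains 100 %
```

A low cumulative value, such as 15.9 % for three axes, means that variation among the varieties is spread over many dimensions. As a result, groups that overlap in a two- or three-axis plot may still be separable in the full distance matrix.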

Fig. 4

Analysis of molecular variance (AMOVA) of 729 rice varieties based on populations obtained by hierarchical approach

Table 4 Summary of analysis of molecular variance (AMOVA) for hierarchical clustering approach
Fig. 5

Principal coordinate analysis (PCoA) of 729 rice varieties based on populations obtained by hierarchical approach

Table 5 Percentage of variation explained by the first 3 axes using Principal coordinate analysis for hierarchical clustering approach

AMOVA and PCoA of populations obtained using model based approach

AMOVA was also performed on the three populations obtained using the model-based approach. Of the total variance, 11 % was found among the three populations, 55 % among individuals and 34 % within individuals (Fig. 6 and Table 6). Choudhury et al. [7] reported a similar pattern of variation in Indian rice germplasm using populations derived from the model-based approach. PCoA revealed considerable genetic diversity among Indian rice varieties, and the first three axes explained 15.9 % of the cumulative variation (Table 7). In the PCoA plot, varieties were labeled with three colours representing the three populations inferred from population structure analysis (Fig. 7). Pop1 and pop2 formed distinct groups, whereas individuals of pop3 were distributed across pop1 and pop2.

Fig. 6

Analysis of molecular variance (AMOVA) of 729 rice varieties based on population obtained by model based approach

Table 6 Summary of analysis of molecular variance (AMOVA) for model based approach
Table 7 Percentage of variation explained by the first 3 axes using Principal Coordinate analysis for model based approach
Fig. 7

Principal coordinate analysis (PCoA) of 729 rice varieties based on population obtained by model based approach

The AMOVA based on hierarchical clusters showed less variation among groups (3 %) than the AMOVA based on model-based populations (11 %). This lower among-group variation may reflect the smaller number of groups in the hierarchical analysis (two clusters) compared with the model-based approach (three populations).
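The among-group percentages compared above come from partitioning the sum of squared distances between individuals. The AMOVAs in Tables 4 and 6 were run in GenAlEx and split variance into three levels: among populations, among individuals and within individuals. The sketch below is a deliberately simplified two-level version (among vs. within groups) following Excoffier et al. (1992), without the permutation test. It assumes a matrix of squared pairwise distances, and the function name is hypothetical.

```python
import numpy as np


def amova_two_level(dist_sq, groups):
    """Partition molecular variance among vs. within groups (Excoffier et al. 1992).

    dist_sq: N x N matrix of squared pairwise distances between individuals.
    groups:  length-N sequence of group labels (e.g. cluster or population).
    """
    d2 = np.asarray(dist_sq, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_total, n_groups = len(groups), len(labels)

    ssd_total = d2.sum() / (2 * n_total)
    ssd_within, sizes = 0.0, []
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sizes.append(len(idx))
        ssd_within += d2[np.ix_(idx, idx)].sum() / (2 * len(idx))
    ssd_among = ssd_total - ssd_within

    ms_among = ssd_among / (n_groups - 1)
    ms_within = ssd_within / (n_total - n_groups)
    sizes = np.array(sizes)
    n0 = (n_total - (sizes ** 2).sum() / n_total) / (n_groups - 1)
    var_among = max((ms_among - ms_within) / n0, 0.0)   # negative estimates truncated to 0
    var_within = ms_within
    total_var = var_among + var_within
    return {"ssd_among": ssd_among, "ssd_within": ssd_within,
            "pct_among": 100 * var_among / total_var,
            "pct_within": 100 * var_within / total_var}


# Toy data: two tight groups on a line (positions 0, 1 and 10, 11)
pos = np.array([0.0, 1.0, 10.0, 11.0])
d2 = (pos[:, None] - pos[None, :]) ** 2
print(amova_two_level(d2, ["A", "A", "B", "B"]))  # ~99 % of variance among groups
```

When groups overlap heavily, as the PCoA plots suggest for the two hierarchical clusters, the among-group mean square approaches the within-group mean square. The among-group share then tends towards zero.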

Decadal genetic diversity trend analysis in Indian rice varieties

To understand the trend in genetic diversity of Indian rice varieties released and notified from 1940 to 2013, the varieties were grouped into intervals of about 10 years, which kept the number of varieties per interval roughly comparable. Except for 1940–1965, which spanned about 26 years, all intervals covered about 10 years. Gene diversity increased steadily across varieties released during 1966–1975 (0.314), 1976–1985 (0.315), 1986–1995 (0.335) and 1996–2005 (0.346), but declined in those released during 2006–2013 (0.290). In contrast, major allele frequency decreased from the 1940–1965 interval (0.774) to the 1996–2005 interval (0.751) and then increased in 2006–2013 (Table 8). The number of alleles (Na) increased markedly (17.3 %) from 1940–1965 to 1966–1975. After 1975, Na increased by only 3.9 % from 1966–1975 to 1976–1985 and by 1 % from 1976–1985 to 1986–1995. It remained constant from 1986–1995 to 1996–2005, and then increased by 3 % from 1996–2005 to 2006–2013. Choudhary et al. [6] reported a similar trend in Na, whereas Wei et al. [47] reported a decreasing trend in Na after the 1980s in a different set of rice collections. For gene diversity, Mantegazza et al. [23] reported a steady increase in Italian rice germplasm, whereas no change in the level of diversity was observed in rice varieties of Nepal [43].
PIC also increased in varieties released from 1940–1965 (0.267) to 1996–2005 (0.302), but decreased in 2006–2013 (0.255). PIC increased by 2.27 % from 1940–1965 to 1966–1975 and showed no major change from 1966–1975 to 1976–1985. It then increased by 7.13 % from 1976–1985 to 1986–1995 and by 3.45 % from 1986–1995 to 1996–2005. However, PIC decreased sharply (15 %) from 1996–2005 to 2006–2013. This suggests a marked loss of genetic diversity in the most recent interval, which may be attributed to a shift towards trait-specific breeding during this period. According to Choudhary et al. [6], such a trend could reflect breeders' selection priorities in need-based breeding.
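The per cent changes quoted in this section compare each interval with the one before it. The sketch below assumes that change is computed relative to the earlier interval as (new − old)/old × 100. With that assumption, the reported PIC values (0.302 to 0.255) give about −15.6 %, consistent with the roughly 15 % decrease stated above. The helper names are hypothetical.

```python
def percent_change(old, new):
    """Relative change from an earlier interval to a later one, in per cent."""
    return 100 * (new - old) / old


def successive_changes(series):
    """Per cent change between each pair of consecutive intervals."""
    keys = list(series)
    return {f"{a} to {b}": round(percent_change(series[a], series[b]), 1)
            for a, b in zip(keys, keys[1:])}


# Mean gene diversity per interval, as reported in the text above
gene_diversity = {"1966-1975": 0.314, "1976-1985": 0.315, "1986-1995": 0.335,
                  "1996-2005": 0.346, "2006-2013": 0.290}
print(successive_changes(gene_diversity))
print(round(percent_change(0.302, 0.255), 1))  # PIC, 1996-2005 to 2006-2013: -15.6
```

On the same basis, gene diversity falls by about 16.2 % between 1996–2005 and 2006–2013, a drop of similar size to the decline in PIC.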

Table 8 Pattern of gene diversity, heterozygosity, major allele frequency and PIC over decadal periods

Pedigree-based analysis of hierarchical cluster and model based population

Varieties sharing common parentage (Additional file 1: Table S1) were generally grouped in the same cluster (Fig. 1) or population (Fig. 2). For example, CNM-25 and CNM-31, GR11 and GR4, ADT 31 and ADT 33, and Kumbham and Makaram were grouped in cluster2 and pop2. Similarly, Archana and Pusa 4-1-11, Kalinga I and Kalinga II, and Vijaya and Jayanthi were grouped in pop2 and cluster1. Chaitanya and Krishnaveni, Deepti and Nandi, Aruna and Remya, PTB-39 and PTB-41, Sasyasree and Vikas, and Moniram, Bahadur, Piolee and Kushal were grouped in pop3 and cluster1, whereas Dharitri and Savitri were grouped in pop3 and cluster2 (Figs. 1 and 2 and Table 9). There were a few exceptions where varieties with common parentage were not grouped in the same cluster or population. For example, Kanchi and Vaigai were grouped in the same population (pop2) but in different clusters. Similar pedigree-based clustering was reported by Choudhary et al. [6] and Upadhyaya et al. [46]. Upadhyaya et al. [46] showed that varieties with at least one common parent were grouped in one cluster, and Choudhary et al. [6] showed that varieties released in different decades were also grouped together owing to common parents in their pedigrees.

Table 9 Rice varieties sharing common parentage

Co-linearity between hierarchical cluster and model based population analysis

The co-linearity between the groupings obtained from hierarchical clustering and from model-based population structure was examined using Venn diagrams. Liu et al. [22] also showed that Venn diagrams are a robust method for studying overlapping accessions in a Chinese wild rice collection. Of the 729 varieties, 244 (56.5 %) were common to pop2 and cluster2, and 252 (55.3 %) were common to pop3 and cluster1 (Fig. 8a and b). These results indicate that the groupings obtained with the hierarchical and model-based approaches overlap by more than 55 %.
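The overlap reported from the Venn diagrams can be expressed as simple set operations. The sketch assumes, as Venny does when labeling regions, that each percentage is relative to the total number of distinct varieties in the two-set diagram (the union). The function name and variety IDs are hypothetical.

```python
def overlap(group_a, group_b):
    """Shared members of two groupings and their share of the union, in per cent."""
    a, b = set(group_a), set(group_b)
    shared = a & b
    return sorted(shared), 100 * len(shared) / len(a | b)


# Hypothetical variety IDs in one STRUCTURE population and one NJ cluster
pop2 = ["V001", "V002", "V003", "V004"]
cluster2 = ["V002", "V003", "V004", "V005"]
shared, pct = overlap(pop2, cluster2)
print(len(shared), pct)  # 3 shared varieties, 60.0 % of the union
```

Under this convention, the denominator is the union of the two groups, not all 729 varieties. An overlap of 56.5 % therefore describes how well one population and one cluster agree with each other.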

Fig. 8

a Venn diagram showing co-linearity between population2 and cluster2. b Venn diagram showing co-linearity between population3 and cluster1

Conclusion

The present study, based on 36 HvSSR markers distributed over all 12 rice chromosomes, suggests that after the green revolution breeders used diverse parents to improve yield, quality and plant architecture. After 2006, however, breeders' priorities appear to have shifted from plant architecture towards biotic and abiotic stress tolerance and trait-specific improvement. This may explain why the number of alleles did not decrease over the period, whereas gene diversity and PIC declined sharply after 2006. To broaden the genetic base, there is an urgent need to incorporate more diverse donor parents into rice breeding programs for varietal improvement.

Methods

Plant materials

Seed samples of 729 rice varieties were obtained from the Indian National Genebank, ICAR-National Bureau of Plant Genetic Resources (NBPGR), New Delhi. Details of each variety, with passport data (national identity, i.e. Indigenous Collection (IC) number, state, local name, pedigree and traits), are given in Additional file 1: Table S1 [39].

DNA extraction from rice seed

For each variety, 10–12 seeds were dehusked and ground to a fine powder using a tissue lyser (Tissue Lyser II, Retsch, Germany) with a tissue lyser adapter set (QIAGEN). DNA was then isolated following the QIAGEN DNeasy Plant Mini Kit protocol (Hilden, Germany).

Genotyping of rice varieties using SSR markers

Initial screening was done with 120 highly variable SSR (HvSSR) marker loci with repeat lengths of 51–70 bp, located across all 12 rice chromosomes [40]. The 36 most polymorphic markers (three per chromosome), covering both the long and short arms, were finally selected for genotyping the 729 rice varieties. The annealing temperature (Ta) of each primer pair was standardized by gradient PCR using selected rice samples.

Working stocks (10 ng/μl) of genomic DNA were prepared for all 729 varieties. Each PCR reaction (total volume 10 μl) contained 2 μl genomic DNA (10 ng/μl), 0.8 μl of 25 mM MgCl2, 1 μl of 10X buffer, 0.2 μl of each primer (10 nmol), 0.2 μl of 10 mM dNTPs, 0.2 μl of Taq DNA polymerase (Fermentas, Life Sciences, USA) and 5.6 μl distilled water. PCR conditions were as follows: initial denaturation at 94 °C for 4 min; 36 cycles of 94 °C for 30 s, Ta for 45 s and 72 °C for 1 min; and a final extension at 72 °C for 10 min. Amplified products were resolved on 4 % MetaPhor agarose gels at a constant 120 V for 4 h, and gel images were recorded with a gel documentation system (Alpha Imager®, USA).
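The final concentrations implied by the reaction recipe follow from C1V1 = C2V2. Below is a small worked sketch for a 10 μl reaction. The primers are left out because the recipe gives an amount (10 nmol) rather than a stock concentration. The dNTP result is expressed in the same units as the stated 10 mM stock. The function name is hypothetical.

```python
def final_concentration(stock_conc, volume_added_ul, reaction_volume_ul):
    """C1 * V1 = C2 * V2, so C2 = C1 * V1 / V2 (same units as the stock)."""
    return stock_conc * volume_added_ul / reaction_volume_ul


REACTION_UL = 10.0
# component: (stock concentration, unit, volume added in µl), from the recipe above
recipe = {
    "MgCl2 (added)": (25.0, "mM", 0.8),
    "dNTPs": (10.0, "mM", 0.2),
    "PCR buffer": (10.0, "X", 1.0),
}
for name, (stock, unit, vol) in recipe.items():
    print(f"{name}: {final_concentration(stock, vol, REACTION_UL):.2f} {unit}")

template_ng = 10.0 * 2.0   # 10 ng/µl working stock x 2 µl
print(f"template DNA: {template_ng:.0f} ng per reaction")
```

This gives 2 mM added MgCl2, 0.2 mM dNTPs, 1X buffer and 20 ng of template DNA per reaction.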

Statistical analyses

PowerMarker 3.5 [21] was used to calculate major allele frequency, gene diversity, heterozygosity and polymorphic information content (PIC) for each HvSSR locus. Genetic distances between varieties were also calculated with PowerMarker and used to construct a neighbour-joining (NJ) tree. The unweighted neighbour-joining tree was constructed using DARwin 5.0.158 [32]. GenAlEx 6.5 [31] was used for PCoA and AMOVA. Population structure was studied with the model-based program STRUCTURE 2.3.3 [34], with three replicate runs for each K. Each run used a burn-in period of 100,000 steps followed by 100,000 Markov chain Monte Carlo (MCMC) replicates [34]. Membership of each genotype was estimated for K = 1 to 20 genetic clusters under the admixture model with correlated allele frequencies. ΔK was derived from the LnP(D) values obtained for each K [11], and the final number of populations was determined using the Structure Harvester program (http://taylor0.biology.ucla.edu/structureHarvester/). Venn diagram analysis was performed with Venny 2.1 [27] to identify varieties common to clusters and populations.
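Structure Harvester selects K with the ΔK statistic of Evanno et al. (2005). This statistic is ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean LnP(D) across replicate runs and sd is its standard deviation. Below is a minimal sketch of that calculation, assuming the replicate LnP(D) values have already been extracted from the STRUCTURE output. The function name and values are hypothetical. The K with the largest ΔK is usually taken as the number of populations.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """Evanno et al. (2005) delta K from replicate LnP(D) values.

    lnp_by_k: dict mapping consecutive K values to lists of LnP(D), one per run.
    Returns (delta_k dict, K with the largest delta K).
    """
    ks = sorted(lnp_by_k)
    if ks != list(range(ks[0], ks[-1] + 1)):
        raise ValueError("K values must be consecutive")
    mean = {k: statistics.mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for k in ks[1:-1]:                      # undefined for the smallest and largest K
        sd = statistics.stdev(lnp_by_k[k])  # needs >= 2 replicate runs
        if sd > 0:
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd
    return delta, max(delta, key=delta.get)


# Hypothetical LnP(D) values from three replicate runs per K
runs = {1: [-1000, -1000, -1000], 2: [-800, -802, -798], 3: [-700, -704, -696],
        4: [-690, -692, -688], 5: [-685, -687, -683]}
delta, best_k = evanno_delta_k(runs)
print(delta, best_k)  # {2: 50.0, 3: 22.5, 4: 2.5} 2
```

ΔK cannot be computed for the smallest and largest K tested. With K = 1 to 20, as in this study, values are therefore available for K = 2 to 19.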

Ten-year interval analysis

All 729 rice varieties were divided into six intervals on the basis of year of release and notification: 1940–1965 (18 varieties), 1966–1975 (69), 1976–1985 (127), 1986–1995 (168), 1996–2005 (147) and 2006–2013 (136). Except for 1940–1965, all intervals contained broadly comparable numbers of genotypes. Gene diversity, heterozygosity, major allele frequency and PIC of the varieties in each interval were calculated with PowerMarker. Their mean values were plotted with time interval on the X axis and the value of each statistic on the Y axis (Fig. 9).
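Assigning varieties to these release intervals is a simple binning step. The sketch below maps each release year to its interval by binary search over the interval start years. The function name and example years are hypothetical. Per-interval statistics would then be computed separately on each resulting group.

```python
import bisect
from collections import Counter

# Release-year intervals used in this study (inclusive start and end years)
INTERVALS = [(1940, 1965), (1966, 1975), (1976, 1985),
             (1986, 1995), (1996, 2005), (2006, 2013)]
STARTS = [start for start, _ in INTERVALS]


def interval_for(year):
    """Return the interval label for a release year, or None if out of range."""
    i = bisect.bisect_right(STARTS, year) - 1
    if i < 0 or year > INTERVALS[i][1]:
        return None
    start, end = INTERVALS[i]
    return f"{start}-{end}"


# Hypothetical release years for a handful of varieties
years = [1948, 1968, 1971, 1990, 2003, 2010, 2012]
print(Counter(interval_for(y) for y in years))
```

Varieties without a recorded release year, or with a year outside 1940–2013, map to None and are kept out of every interval.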

Fig. 9

Graphs showing gene diversity, PIC, number of genotypes and number of alleles of the 729 rice varieties