Introduction

Basmati is a type of long-grained fragrant rice, and only certain named varieties grown in specified regions of India and Pakistan are permitted to carry the name Basmati (Singh et al. 2000). Authorities in India and Pakistan have recently approved specific modern varieties derived from crosses with Basmati landraces to be sold labeled as Basmati if they conform to certain physical and aromatic characteristics (Mahajan et al. 2018; Roy et al. 2018). Basmati rice imported into the EU and UK is subject to regulations (under which nine approved Basmati varieties have tariff-free status) and to the industry Code of Practice (CoP), which ensure that it is accurately described and presented for sale so as not to mislead the consumer. In 2017, the number of approved Basmati varieties was revised and increased from 15 to 41, and the CoP was updated to reflect this (Rice Association 2017). However, concerns were raised within the rice industry that the increased number of Basmati rice varieties coming onto the market may not all be identifiable with currently used DNA markers. There were also concerns that some of the new varieties could have a more distant relationship to landrace varieties than the previously approved Basmati varieties.

Two previous studies were funded by the Food Standards Agency (UK) to identify DNA markers for Basmati authentication. Initially, Bligh (2000) identified 12 microsatellite markers (also known as simple sequence repeats, SSR) that together could be used to distinguish Basmati varieties from non-Basmati varieties. Then, between 2003 and 2008, over 200 publicly available rice microsatellite markers and 50 InDel markers were tested at Bangor University on the 15 Basmati varieties listed in the 2005 CoP, and informative markers were identified and used in surveillance in the UK (Steele et al. 2008; Woolfe and Steele 2019). Subsequently, Eurofins Genomics (Ebersberg, Germany) selected 13 SSR markers from these two previous studies and added two further SSR markers to produce a set of 15 microsatellites. These 15 SSRs have been widely used by industry regulators in the EU for over a decade to discriminate between the Basmati varieties and to quantitatively estimate the ratios of Basmati to non-Basmati grains present in mixtures.

An additional PCR marker known as fgr (Bradbury et al. 2005) that is based on InDel and SNP variation is commonly used to detect non-fragrant rice. Together, these 16 markers were used to test the recently approved Basmati varieties that were added to the CoP in 2017 (Nader et al. 2019). The results revealed that five of the newly approved Basmati varieties (Yamini CSR-30, Pusa Basmati 3, Vallabh Basmati 22, Vallabh Basmati 23, and Pusa 1637) had the same allele profiles as other permitted varieties in the 2017 CoP (Rice Association 2017).

KASP is a different type of PCR-based marker system that identifies DNA polymorphism at specific genomic loci; these can be either an InDel (insertion/deletion of one or more nucleotides) or a single nucleotide polymorphism (SNP). It is not possible to develop KASP markers that directly target SSR sequences. KASP requires end-point PCR with detection of two fluorescent dyes, which indicate the genotype according to the amount of each of two alternative primers that becomes incorporated into amplicons. The reaction results in co-dominant allele calls for the target variant SNP or InDel. Successful application of KASP technology relies on primer designs that account for variations between non-target sequences of different rice varieties to ensure detection of the target polymorphic alleles (Steele et al. 2018). Suitable variations occur throughout the genome, so for this study, it was considered strategic to use target SNP or InDel variants located within 10 kb of the microsatellites currently used for Basmati testing and rice authentication or that have previously shown high levels of polymorphism in rice.
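The co-dominant call described above can be illustrated with a minimal sketch. It assumes the two dye signals have already been normalized to a common scale, and it assigns a genotype from the angle of the point in the two-dye plane. The function name, the call labels, and both thresholds are hypothetical choices for illustration, not the settings used by the genotyping service.

```python
import math

def call_kasp_genotype(dye_x, dye_y, min_signal=0.2, hom_angle=30.0):
    """Return 'X:X', 'Y:Y', 'X:Y' or '?' from two normalized end-point dye signals.

    dye_x / dye_y: fluorescence reporting the allele-X and allele-Y primers.
    min_signal and hom_angle are illustrative thresholds (assumptions).
    """
    magnitude = math.hypot(dye_x, dye_y)
    if magnitude < min_signal:
        return "?"  # too little amplification to make a call
    # 0 degrees = only the allele-X dye, 90 degrees = only the allele-Y dye
    angle = math.degrees(math.atan2(dye_y, dye_x))
    if angle <= hom_angle:
        return "X:X"
    if angle >= 90.0 - hom_angle:
        return "Y:Y"
    return "X:Y"  # both primers incorporated: heterozygous
```

In practice calls are usually made by clustering all samples on a plate together rather than by fixed thresholds, so this sketch only conveys the principle.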

In non-aromatic rice, the BADH2 gene produces the protein BADH2 (also known as BAD2). In fragrant rice, BADH2 is not produced, and its absence enables the biosynthesis of 2-acetyl-1-pyrroline, which accounts for the characteristic aroma of cooked Basmati rice (Chen et al. 2008). Basmati varieties, along with some other fragrant rice varieties, contain an 8 bp deletion and two SNPs (badh2.1) in exon 7 of the BADH2 gene, and this is the most common allele of BADH2 responsible for aroma in rice (Kovach et al. 2009). The PCR marker fgr (Bradbury et al. 2005) targets the badh2.1 variant, and it is widely used in Basmati authentication tests, so SNP and InDel variants in the BADH2 gene were predicted to be potential informative KASP markers.

Glutinous rice does not contain granule-bound starch synthase (GBSS), the protein product encoded by the Waxy gene, because only immature 3.3 kb Wx pre-mRNA containing intron I is produced. Basmati is an intermediate between glutinous and non-glutinous rice having both splice variants of the Waxy gene (mature 2.3 kb Wx mRNA and immature 3.3 kb Wx pre-mRNA) resulting in a phenotype with intermediate gelatinization temperature (70–74 °C) and medium gel strength, two important traits that characterize Basmati rice. Variants in the Waxy gene were, therefore, predicted to be potential informative KASP markers.

The objective of this study was to test 364 selected KASP marker designs for detection of polymorphism and utility in discrimination among the varieties listed as Basmati in the 2017 CoP and between approved Basmati and non-approved rice varieties.

Methods

Sourcing of Reference Materials

The UK Rice Association requested samples of all approved Basmati varieties from the All India Rice Exporters’ Association (AIREA) and Rice Exporters Association of Pakistan (REAP). Thirty-three samples representing different varieties of polished grains were received from them in 2017. India was the original source of 25 samples, and Pakistan was the original source of 8 samples (Table 1). The 2017 CoP lists Punjab Basmati 2 twice, but only one sample with that name was received (from India). The Indian Punjab Basmati is also known as Bauni Basmati (Roy et al. 2018), while the unrelated Pakistani Punjab Basmati is genetically close to Chenab Basmati. Only one approved sample labeled as Punjab Basmati was received, and it was from Pakistan; therefore, Punjab Basmati (Bauni Basmati) was missing from the analysis. Other approved varieties (Table 1) were from Bangor University’s collection of rice reference materials that had been obtained previously by the Food Standards Agency between 2003 and 2011 from verified sources in India and Pakistan, IRRI, and from a collection at Nottingham University collected before 2000 (Bligh et al. 1999). The final set of 41 approved Basmati varieties available for testing included two approved samples labeled Shaheen and Shaheen Basmati.

Table 1 Names and origins of rice samples used in this study

Reference materials of 19 non-approved varieties were selected from Bangor’s collection (Table 1) because they have previously been detected as adulterants in commercial Basmati, could potentially be used as adulterants, or could, in certain cases and subject to appropriate agreements, be approved as Basmatis in the future. All reference materials were stored at 4 °C.

DNA Extraction

The study was designed to maximize the number of KASP genotyping assay designs tested and not the number of replicate samples. Nineteen non-replicated samples were extracted using the CTAB method described by Saghai-Maroof et al. (1984) from 20 g of multiple grains of a single sample ground in a coffee grinder (cleaned and decontaminated between every sample, with the first ground material from each separate sample discarded). DNA was also extracted from 60 rice samples from single grains (up to five replicates per sample) following the sampling and extraction protocol described in the standard operating procedure for the identification of selected Basmati rice varieties (FSA 2005) using Phytopure reagents (GE Healthcare) supplied by Sigma-Aldrich.

DNA samples were tested for concentration and quality (A260:A280 and A260:A230) with a NanoDrop™ 1000 spectrophotometer (Thermo Fisher). In addition, the size and integrity of the DNA were checked by running 5 µl of each extracted sample (not normalized), mixed with 2 µl of loading buffer, on a 0.8% agarose gel stained with Safeview (Bioline). A control of IR64 rice leaf DNA extracted with a Qiagen DNeasy kit was run alongside the samples on the gel. For each sample, the replicate extract (from either method) with the highest DNA concentration measured with the NanoDrop™ 1000 was selected for use in the first round of KASP screening. Only samples extracted with the Phytopure method were selected for the second screen.

Design and Selection of KASP Markers

To design KASP markers, sequence data from 120 publicly available re-sequenced rice genomes were mined to identify flanking sequences of variants due to SNPs (single nucleotide polymorphisms) or InDels (insertion/deletions). Each genome was aligned against the Shuhui498 indica reference genome (Du et al. 2017), using Bowtie2 for sequence read alignment (Langmead and Salzberg 2012) and SAMtools for genotype likelihood calculation and variant calling (Li et al. 2009). The appropriate International Union of Pure and Applied Chemistry (IUPAC) nucleotide ambiguity codes were used to incorporate all detected variation into KASP primer design to improve versatility of the markers compared to the earlier designs that were based on only nine re-sequenced genomes (Steele et al. 2018). For each KASP design, a .bed file containing the sequences spanning 60 bp on each side of the target variant was used to generate primers with the Kraken™ software by LGC, Biosearch Technologies (Hoddesdon, UK).
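The way non-target variation is folded into a design sequence can be sketched as follows. This is a minimal illustration, assuming a SNP target, an uppercase reference string, and a dictionary of bases observed across the re-sequenced genomes. The function name and the bracketed `[ref/alt]` output layout are assumptions for illustration and are not claimed to be the input format of the primer design software.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def design_template(reference, target_pos, alleles_by_pos, flank=60):
    """Build 'LEFT[ref/alt]RIGHT' for a SNP at 0-based target_pos (hypothetical helper).

    alleles_by_pos maps 0-based positions to the set of bases seen across genomes.
    Non-target variants inside the flanks are written as IUPAC ambiguity codes.
    """
    def base_at(i):
        observed = alleles_by_pos.get(i, {reference[i]}) | {reference[i]}
        return IUPAC[frozenset(observed)]

    start = max(0, target_pos - flank)
    end = min(len(reference), target_pos + flank + 1)
    left = "".join(base_at(i) for i in range(start, target_pos))
    right = "".join(base_at(i) for i in range(target_pos + 1, end))
    ref = reference[target_pos]
    alts = sorted(alleles_by_pos[target_pos] - {ref})
    return f"{left}[{ref}/{'/'.join(alts)}]{right}"

# Toy example with a 3 bp flank instead of the 60 bp used in the study
template = design_template("ACGTACGTAC", 4, {4: {"A", "G"}, 2: {"A", "G"}, 7: {"C", "T"}}, flank=3)
```

Here the flanking variants at positions 2 and 7 appear as R and Y, so a primer designed over them would tolerate either base.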

From over 1 million new designs, 364 KASP markers were chosen for this study, of which 139 were validated here for the first time. They were selected to target variations considered to be relevant to Basmati varieties using the following five criteria:

  (i) Within 10 kb of the genome alignment positions of primers for the microsatellite markers widely used in Basmati authentication testing, including RM1, RM223, RM202, RM44, RM201, RM229, RM241, RM171, RM55, RM263, RM212, RM252, RM282, RM339, RM55, RM5432, and RM72. For each microsatellite, up to five KASP located within 5 kb were selected.

  (ii) In proximity to the fragrance-determining gene BADH2 on chromosome 8 (all designs targeting variations within 5 kb of BADH2 were selected). One KASP targeted the 8 bp InDel within badh2.1. Twelve KASP targeted other functional polymorphisms associated with other BADH2 variants: a 7 bp deletion in exon 2 (Shi et al. 2008); a 7 bp insertion in exon 8 (Amarawathi et al. 2008); a 3 bp insertion in exon 13 (Myint et al. 2012); and a 3 bp deletion in the 5’ UTR (Shi et al. 2014). Due to insufficient sequence information in the original publication, it was not possible to design KASP in the positions relating to the specific badh2 sequence variations described by Shao et al. (2013). A further six KASP designs were located in close proximity to microsatellites that show co-segregation with the major aroma locus.

  (iii) Located within the Waxy (amylose) gene locus on chromosome 6 (at 1.64 Mbp), which encodes GBSS, the starch-granule-bound protein that synthesizes amylose. Five KASP were designed for sequences within Waxy, including two for target functional variants.

  (iv) Located in or near genes considered of interest by breeders in India and Pakistan. A total of 225 KASP markers were selected from successfully validated new designs used for rice genotyping in Innovate UK project 103711.

  (v) Selected to give an even distribution of background markers across all 12 chromosomes.
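The positional part of criterion (i) can be sketched as a simple window filter. This is an illustrative sketch only: the function name is hypothetical, the coordinates in the usage example are invented, and picking the closest candidates first is an assumption, since the text does not state how candidates within the window were prioritized.

```python
from collections import defaultdict

def variants_near_ssrs(ssr_positions, variants, window=10_000, max_per_ssr=5):
    """Pick up to max_per_ssr variants closest to each microsatellite (hypothetical helper).

    ssr_positions: {ssr_name: (chromosome, position)}
    variants: iterable of (variant_id, chromosome, position)
    """
    by_chrom = defaultdict(list)
    for vid, chrom, pos in variants:
        by_chrom[chrom].append((pos, vid))

    selected = {}
    for ssr, (chrom, ssr_pos) in ssr_positions.items():
        # keep only variants on the same chromosome and inside the window
        nearby = [(abs(pos - ssr_pos), vid) for pos, vid in by_chrom[chrom]
                  if abs(pos - ssr_pos) <= window]
        selected[ssr] = [vid for _, vid in sorted(nearby)[:max_per_ssr]]
    return selected

# Invented coordinates, for illustration only
picked = variants_near_ssrs(
    {"RM1": ("chr1", 100_000)},
    [("v1", "chr1", 100_500), ("v2", "chr1", 120_000),
     ("v3", "chr2", 100_100), ("v4", "chr1", 95_000)],
)
```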

Genotyping

KASP genotyping was carried out by LGC Biosearch Technologies, Hoddesdon, UK, via their all-inclusive service for plant samples (https://www.biosearchtech.com/services/genotyping-services/all-inclusive-services) using their high-throughput workflow and genotyping instrumentation.

An initial main screen used 364 KASP markers to test 60 DNA samples (41 extracted with Phytopure from a single grain and 19 extracted using the CTAB method from multiple grain samples). Some CTAB-extracted samples failed to amplify with any markers or were only successful with a limited number of KASP markers. For this reason, a second screen with only 23 KASP was run on 21 varieties using Phytopure-extracted single grain samples, most in replicate (two to five). A maximum of five replicates were used for all varieties that had shown within variety microsatellite variation (Nader et al. 2019).

A data file containing called genotypes was provided for both screens by LGC Biosearch Technologies, and these could be analyzed using their SNPViewer™ software (https://www.biosearchtech.com/support/tools/genotyping-software/snpviewer). Where no allele could be called, a “?” was given in the spreadsheet; this could either be due to a null allele (due to a variant that is incompatible with the primer binding) or due to a failed PCR reaction which was most likely to be caused by poor quality DNA. Genotyping results from both rounds were combined into a single dataset in Microsoft Excel containing a maximum of 327 marker genotypes for each variety. In the few instances where different replicates of the same sample differed, the heterozygous genotype was called and used for subsequent analysis.
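The rule for combining replicate calls can be sketched as below. The colon-separated call format (for example `C:T`) and the function name are assumptions for illustration; only the "?" symbol for uncalled genotypes comes from the data files described above.

```python
def consensus_genotype(replicate_calls):
    """Merge replicate calls such as 'C:C', 'T:T', 'C:T' or '?' into a single call."""
    alleles = set()
    for call in replicate_calls:
        if call == "?":
            continue  # null allele or failed PCR: contributes no information
        alleles.update(call.split(":"))
    if not alleles:
        return "?"  # no replicate produced a call
    ordered = sorted(alleles)
    if len(ordered) == 1:
        return f"{ordered[0]}:{ordered[0]}"
    return ":".join(ordered)  # replicates disagree: record as heterozygous
```

For example, replicates called `C:C` and `T:T` are recorded as `C:T`, which is why some later comparisons rest on heterozygous versus homozygous differences.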

Data Analysis

From 327 successful KASP markers, all monomorphic loci and loci with only one homozygous genotype present among the entire dataset were removed, leaving 255 polymorphic KASP to be analyzed with the aim of identifying the smallest and most informative subsets of KASP markers.
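One way to express this filter is sketched below, assuming that a marker is retained only when at least two different homozygous genotypes are observed across the varieties. That reading of the two exclusion rules, the call format, and the function name are assumptions for illustration.

```python
def is_informative(calls):
    """True if at least two distinct homozygous genotypes occur among the calls."""
    homozygous = {c for c in calls
                  if c != "?" and len(set(c.split(":"))) == 1}
    return len(homozygous) >= 2

# Hypothetical marker names and calls
markers = {
    "m_mono": ["A:A", "A:A", "A:A"],       # monomorphic: removed
    "m_one_hom": ["A:A", "A:G", "?"],      # only one homozygous genotype: removed
    "m_poly": ["A:A", "G:G", "A:G"],       # two homozygous genotypes: kept
}
kept = [name for name, calls in markers.items() if is_informative(calls)]
```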

The markers were sorted according to the frequency of the rarest (minor) alleles. Those with the lowest minor allele frequencies (MAF) were then considered first, in a test procedure that added each marker genotype in turn, stepwise for each variety, until additional markers did not further distinguish the variety from other varieties in the test set. Any markers that were uninformative for all varieties were discarded. This procedure identified the greatest number of unique varieties that could be separated with as few markers as possible.
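A simplified sketch of this kind of stepwise selection is given below. It is not the authors' exact procedure: it orders markers by ascending MAF and keeps a marker only if it splits the varieties into more distinct profiles than before. Function names and the toy genotypes are hypothetical, and an uncalled "?" is naively treated as its own genotype here.

```python
from collections import Counter

def minor_allele_frequency(calls):
    """Frequency of the rarest allele among called genotypes (0.0 if monomorphic)."""
    counts = Counter(a for c in calls if c != "?" for a in c.split(":"))
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / sum(counts.values())

def stepwise_select(genotypes):
    """genotypes: {marker: {variety: call}}. Returns (kept_markers, profiles)."""
    varieties = sorted(next(iter(genotypes.values())))
    order = sorted(genotypes,
                   key=lambda m: minor_allele_frequency(genotypes[m].values()))
    kept, n_groups = [], 1
    profile = {v: () for v in varieties}
    for marker in order:
        trial = {v: profile[v] + (genotypes[marker][v],) for v in varieties}
        if len(set(trial.values())) > n_groups:  # marker adds discrimination
            profile, n_groups = trial, len(set(trial.values()))
            kept.append(marker)
    return kept, profile

# Toy data: four hypothetical varieties, four hypothetical markers
toy = {
    "m1": {"V1": "A:A", "V2": "A:A", "V3": "A:A", "V4": "G:G"},
    "m2": {"V1": "C:C", "V2": "C:C", "V3": "T:T", "V4": "T:T"},
    "m3": {"V1": "A:A", "V2": "A:A", "V3": "A:A", "V4": "A:A"},
    "m4": {"V1": "G:G", "V2": "T:T", "V3": "T:T", "V4": "T:T"},
}
kept_markers, profiles = stepwise_select(toy)
```

In the toy data the monomorphic marker m3 is discarded, and the remaining three markers together give every variety a unique profile.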

Eight varieties (Basmati 217, Ranbir, Taraori, Chenab Basmati, Pusa Suganda, Sugandha, S. Shabnam, Niab/221-9) that could not be uniquely identified using the remaining markers were removed. Then, for each uniquely identifiable variety, the above analysis was run with only the markers that were informative for that variety. This produced a small (< 9) subset of markers for each target variety that could be used to distinguish that variety from all the others.

Nine families were assigned as follows: (A) Basmati traditional types; (B) Super Basmati and Basmati 2000 types; (C) Pusa Basmati types; (D) other Basmati types; (V) likely to be Basmati but not approved references; (W) non-Basmati (fragrant) closely related to approved varieties; (X) other non-Basmati (fragrant); (Y) American long-grain fragrant rice, non-approved; and (Z) non-fragrant potential adulterants. To identify a set of markers that could distinguish between these families, a similar stepwise procedure to that described above was used. However, the order in which markers were considered was determined by the number of families containing a variety with the minor allele genotype for each marker, with those markers with a minor allele in the most families being tested first. Secondary sorting of marker order was based on descending MAF. The same stepwise procedure was carried out for each variety, but testing for each variety stopped when all possible varieties, based on the genotypes of the markers included in the test set, belonged to the same family. A dendrogram based on the smallest subset of markers that could distinguish between families was produced using a neighbor joining clustering method in Tassel (Bradbury et al. 2007).
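Clustering of this kind starts from a pairwise distance between marker profiles. A minimal sketch of one such measure, a simple matching distance on the 0 to 1 scale used in Fig. 1b, is shown below. The function name and the decision to skip markers with an uncalled "?" genotype are assumptions, and this is not claimed to be the exact calculation performed by Tassel.

```python
def simple_matching_distance(profile_a, profile_b):
    """Fraction of jointly called markers at which two profiles differ.

    0.0 means identical at every compared marker, 1.0 means different at all of them.
    """
    compared = [(a, b) for a, b in zip(profile_a, profile_b)
                if a != "?" and b != "?"]
    if not compared:
        return float("nan")  # no marker was called in both profiles
    return sum(a != b for a, b in compared) / len(compared)

# Hypothetical four-marker profiles for two varieties
d = simple_matching_distance(["A:A", "C:C", "G:G", "?"],
                             ["A:A", "T:T", "G:G", "C:C"])
```

Here three markers are compared and one differs, so the distance is one third.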

Results

DNA Concentration and Quality

CTAB- and Phytopure-extracted DNA samples measured by the NanoDrop™ 1000 both showed concentrations ranging from 4 to 400 ng/µl. The agarose gel analysis showed that samples with measured concentrations of < 20 ng/µl were either faint or not visible, and all samples had lower molecular weight than the Qiagen DNeasy-extracted control sample used on the gel. For this reason, only samples that were both clearly visible on the gel and had measured concentrations of > 50 ng/µl were considered suitable for genotyping with KASP technology, and these were selected for use in the first KASP screen. However, a large proportion of the CTAB-extracted samples failed with all or some of the KASP used in the first screen; hence, a second screen was carried out with Phytopure extracts of the same samples, but with a reduced number of KASP.

KASP Genotyping

From 364 KASP designs submitted, 327 (90%) were converted successfully to KASP markers and used for genotyping the rice samples (Supplementary Data Sheet 1). In total, 255 (78%) of the successful KASP showed polymorphism in the set of varieties tested. Thirteen KASP were monomorphic and 59 revealed only one homozygous allele and were therefore not considered informative in this population. There was full agreement between the first and second round of genotyping for the varieties that were genotyped in both rounds.

Within Variety Variation

Replicate single grain samples for three of the varieties did not show consistent alleles, whereas the other 18 varieties that were tested in replicate were consistent. The three variable samples were as follows:

  • Basmati 564—one out of four replicates had the allele corresponding to the 8 bp insertion at BADH2_PM while a different replicate was homozygous for C (not T) at RM166_SNP_7.

  • Chenab Basmati—heterozygous at RM209_SNP_nn_1 in two of the five replicates.

  • Basmati 198—the two replicate samples carried different alleles for c9bg_ff_SNP_4.

The two different samples Shaheen and Shaheen Basmati differed at five markers.

Identification of Individual Varieties and Families

Forty-four of the 60 varieties could be distinguished with 69 KASP markers (Supplementary Data Sheet 1, Variety ID column). Eight varieties were removed from the dataset for subsequent analysis, including Taraori and Chenab Basmati, because this study could not distinguish between them exclusively with homozygous alleles (Table 2). The remaining set of 52 varieties, containing 37 approved Basmatis, could all be distinguished in pairwise comparisons for variety confirmation, with between three and eight separate KASP markers from a pool of 98 KASP (Supplementary Data Sheet 2). Discrimination at the level of nine families was possible with only 24 markers, but a small proportion of the distinctions were based on differences between heterozygous and homozygous calls (Supplementary Data Sheet 3 and Fig. 1).

Table 2 Varieties showing the same KASP genotypes as other varieties in this study, and therefore not included in the analysis of 52 for variety confirmation
Fig. 1

a UPGMA clustering based on 24 markers that discriminate between families. Boxes denote the 41 approved Basmati varieties, and circles indicate samples with verified origin in Pakistan. b Simple matching coefficient values (0 = identical, dark blue; 1 = entirely different, dark red) for all pairwise comparisons between rice varieties, from the 24 selected KASP that distinguish between the nine families (A–D approved Basmati accessions, V–Z non-approved accessions)

Discussion

Industry regulators need reliable tools to distinguish among the approved Basmati varieties listed in the 2017 CoP and between approved and non-approved rice varieties. This study used KASP markers to test 41 of the approved Basmati varieties listed in the 2017 CoP and demonstrated that they distinguish between more pairs of the approved Basmati varieties than has previously been possible with microsatellites (Nader et al. 2019). Some of these pairs cannot be separated by the current microsatellites. One such pair is Basmati 370 and Type 3, for which five polymorphic KASP markers were identified (Table 3), all from chromosome 2. KASP can also distinguish Vallabh Basmati 22 from Vallabh Basmati 23, Pusa Basmati 1 from Pusa Basmati 1637, Pusa Basmati 6 (Pusa 1401) from Pusa Basmati 1728, and Basmati 386 from Kernel. However, only one homozygous/heterozygous KASP polymorphism was detected between Basmati 370 and Ranbir (Table 3), whereas two microsatellites can distinguish this pair of varieties (Nader et al. 2019). Previous (unpublished) results with DNA extracted from leaf material showed that Taraori was distinguished from Yamini and Kernel with KASP OsR498G0713985600_SNP_ff_1 and that Basmati 217 was distinguished with KASP RV211_SNP_nn_2, but these results were not repeated in this study, which used polished grain extracts.

Table 3 KASP markers with potential to quantify components in specific mixtures of varieties

Within variety KASP variation was only detected in three of the replicated approved samples: Basmati 564, Chenab Basmati, and Basmati 198. This finding is in agreement with the microsatellite dataset of Nader et al. (2019). The fact that several markers differentiated between the two different samples named Shaheen and Shaheen Basmati, from different origins, is noteworthy.

The variety Punjab Basmati (Bauni Basmati) from India, which is listed in the 2017 CoP, was not available for this study. We suggest that the next iteration of the CoP should include both Punjab Basmati (India) and Punjab Basmati (Pakistan) because, despite sharing a name, they could have different heritage. Punjab Basmati 2 should appear only once in the CoP.

Comparison of Basmati with Non-approved Varieties

KASP markers could distinguish 17 of the non-approved varieties (out of 19 tested) from the 41 approved Basmati samples. The precise origins of the two samples that could not be distinguished are unclear; Basmati_1 was sourced from a historic collection of fragrant rice and has a very close match to Ranbir, Basmati 370, and Basmati 217, while the Commercial Sample had a KASP genotype that is very close to Improved Pusa Basmati 1 (Pusa 1460).

Previously, it has been difficult to distinguish the approved variety Super Basmati from the non-approved variety S. Shabnam. Although a single polymorphic KASP marker was identified that distinguished Super Basmati from S. Shabnam, the difference was based on a null allele (Table 3).

For authenticity testing to be cost-effective, the smallest possible number of informative markers is needed. A set of 24 KASP markers can separate the majority of the approved varieties (Fig. 1), distinguishing between nine pre-defined families, of which four contain the approved Basmati varieties. Nader et al. (2019) showed that 15 microsatellites and fgr could distinguish between a different four families of approved Basmati varieties, including two families of traditional and two of evolved varieties. Alternatively, pairwise comparisons might be more cost-efficient; for this, markers that distinguish specific pairs can be selected from the set of 98 markers identified for variety confirmation (Supplementary Data Sheet 2). The industry needs to consider which non-approved varieties (including ones not available for this study) might be used as adulterants for specific approved varieties, and then the appropriate KASP markers can be selected for development into quantitative tests.

Suitability of KASP Designs

These KASP markers, designed from comparison of 120 rice genomes, gave a 90% success rate for PCR amplification in 60 test samples which had not been used for primer design.

KASP markers selected within both the fgr and Waxy (amylose) genes were found to be useful in variety discrimination. The KASP marker targeting the 8 bp deletion in allele badh2.1 (BADH2_PM) gave the same pattern of variation as was detected in the same samples using the fgr PCR marker (Nader et al. 2019), so it can be considered an equivalent KASP marker. The KASP marker Frg8_Ins_Exon13 was designed to amplify a 3 bp insertion that is known to occur in Paw San, a fragrant Pearl rice variety from Myanmar (Myint et al. 2012), but only genotypes homozygous for the corresponding deletion (-,-) were detected. The absence of the alternative fragrance allele (3 bp insertion) in any of the CoP samples indicates that this alternative allele does not account for fragrance in any of them. There was no attempt made in this study to design a KASP assay for the second most common fragrance allele (badh2.7) which also has not been observed in Basmati varieties (Withana et al. 2020).

Deeper investigation into the scope and utility of KASP is needed. We recommend that large-scale studies should be designed to test a larger relevant set of non-approved varieties as well as to compare more replicates of each approved sample (and more representatives of each variety from different sources). This would give a better estimate of within variety variation and could fill the gaps in the current dataset.

Tens of thousands more marker designs could be screened using higher throughput sequencing-based genotyping technology for the same price as ~300 KASP, for example, by SeqSNP developed by LGC Biosearch Technologies (biosearchassets.blob.core.windows.net/assetsv6/seqsnp-targeted-genotyping-by-sequencing-alternative-routine-breeding-programs.pdf). However, high-throughput sequencing requires leaf samples for extraction of higher molecular weight DNA than can currently be obtained from polished rice. Sequencing-based genotyping with much larger numbers of markers should reveal further discrimination between varieties, but KASP technology will be more appropriate for routine testing where fewer than 100 markers would be the ideal number utilized by official control labs or industry.

The most relevant set of KASP markers must ultimately be agreed by the industry, and then blind testing of samples across multiple labs should be instigated in order to further affirm the robustness and fitness for purpose of the KASP markers. All future work should avoid CTAB extractions for DNA preparation from polished rice grains, and we recommend the use of the Phytopure DNA extraction method.

This study paves the way for the development of procedures for quantitative testing of mixtures of specific sets of varieties that are of particular interest to the industry. The variety confirmation markers could be developed into an assay in which smaller subsets of markers are tested in a stepwise approach to confirm the variety identity of a sample.

Primers for any of the KASP markers that were successfully amplified in this study (Supplementary Data Sheet 1) can be ordered from LGC Biosearch Technologies. The target loci used for KASP designs could also be adapted for amplification with technologies such as TaqMan or quantitative PCR.

Conclusion

This work has established a framework which uses the emerging molecular biological system (KASP markers) to add further discrimination for Basmati rice above and beyond the traditional microsatellite approach. KASP markers are shown here to be flexible and effective analytical tools with demonstrable value for food authenticity testing by industry and regulators.