Accelerating public sector rice breeding with high-density KASP markers derived from whole genome sequencing of indica rice

Few public sector rice breeders have the capacity to use NGS-derived markers in their breeding programmes despite rapidly expanding repositories of rice genome sequence data. They rely instead on more than 18,000 mapped microsatellites (SSRs) for marker-assisted selection (MAS) using gel analysis. Lack of knowledge about target SNP and InDel variant loci has hampered the uptake by many breeders of Kompetitive allele-specific PCR (KASP), a proprietary technology of LGC Genomics that can distinguish alleles at variant loci. KASP is a cost-effective single-step genotyping technology, cheaper than SSRs and more flexible than genotyping by sequencing (GBS) or array-based genotyping when used in selection programmes. Before this study, there were 2,015 rice KASP marker loci in the public domain, mainly identified by array-based screening, leaving large proportions of the rice genome with no KASP coverage. Here we have addressed the urgent need for a wide choice of appropriate rice KASP assays and demonstrated that NGS can detect many more KASP sites, giving full genome coverage. Through re-sequencing of nine indica rice breeding lines or released varieties, this study identified 2.5 million variant sites. Stringent filtering of variants generated 1.3 million potential KASP assay designs, including 92,500 potential functional markers. This strategy delivers a 650-fold increase in potential selectable KASP markers, at a density of 3.1 per kb in the indica crosses analysed and an average of 377,178 polymorphic KASP design sites per cross. This knowledge is available to breeders and has been used to improve the efficiency of public sector breeding in Nepal, enabling identification of polymorphic KASP at any region or quantitative trait locus (QTL) in relevant crosses. Validation of 39 new KASP was carried out by genotyping progeny from a range of crosses to show that they detected segregating alleles.
The new KASP have replaced SSRs to aid trait selection during marker-assisted backcrossing in these crosses, where target traits include rice blast and bacterial leaf blight (BLB) resistance loci. Furthermore, we provide software that plant breeders can use to generate KASP designs from their own datasets.

Electronic supplementary material: The online version of this article (10.1007/s11032-018-0777-2) contains supplementary material, which is available to authorized users.


Cost is a major factor that determines whether or not marker-assisted selection (MAS) is a viable breeding method for national programmes and smaller breeders. Despite advantages such as improved reliability, MAS will rarely be used if it is more expensive than phenotyping. Reducing the cost of markers increases the number of cases where MAS is more cost-effective than phenotyping. KASP is a cost-effective and flexible proprietary technology of LGC Genomics; however, public sector rice breeders have been slow to adopt it because KASP assays have not been published in linkage maps to the same extent as SSRs. Where costs permit, SSRs are still the marker

for a specific gene, through to several thousands of markers for applications such as genomic selection (GS). The effectiveness of KASP has been demonstrated in plant-breeding applications, including quality control analysis of germplasm (Semagn et al., 2012; Ertiro et al., 2015), screening for candidate alleles and genotyping (Mideros et al., 2013; Pham et al., 2015), bulk segregant analysis and genetic mapping (Ramirez-Gonzalez et al., 2014; Mackay et al., 2014), and MAS (Cabral et al., 2014; Leal-Bertioli et al., 2015).

Marker-assisted breeding has been introduced in Nepal's national programmes, mainly based on SSRs but recently incorporating existing KASP for background selection. However, few of the existing rice KASP were suitable for selection at the breeders' targets: BLB and blast resistance genes and aroma QTLs. Therefore, the objective of the work reported here was to identify SNPs and InDels suitable for this purpose, in order to facilitate the uptake of KASP for more efficient rice breeding. At current rates, the KASP genotyping service is estimated to be 60% cheaper than running SSRs in-house at NARC's laboratories in Kathmandu, Nepal: genotyping 475 samples with 10 assays costs $0.20 per data point with KASP (full genotyping service, including shipping costs), $0.39 with in-house KASP and $0.53 with in-house SSRs.
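The cost comparison above reduces to simple arithmetic. The sketch below multiplies the quoted per-data-point prices by the 4,750 data points in the example run (475 samples × 10 assays) and computes the per-data-point saving of the KASP service relative to in-house SSRs. The prices are those quoted in the text; the function and variable names are illustrative only.

```python
# Per-data-point genotyping costs (US$) quoted in the text for 475 samples x 10 assays.
COST_PER_POINT = {
    "KASP service": 0.20,
    "in-house KASP": 0.39,
    "in-house SSR": 0.53,
}
SAMPLES = 475
ASSAYS = 10


def total_cost(cost_per_point: float, samples: int = SAMPLES, assays: int = ASSAYS) -> float:
    """Total run cost: one data point per sample per assay."""
    return cost_per_point * samples * assays


def saving_vs(baseline: float, alternative: float) -> float:
    """Fractional per-data-point saving of `alternative` relative to `baseline`."""
    return 1.0 - alternative / baseline


if __name__ == "__main__":
    for method, cost in COST_PER_POINT.items():
        print(f"{method}: ${total_cost(cost):,.2f}")
    saving = saving_vs(COST_PER_POINT["in-house SSR"], COST_PER_POINT["KASP service"])
    print(f"KASP service vs in-house SSR: {saving:.0%} cheaper")
```

For this run the totals come to about $950 (KASP service), $1,853 (in-house KASP) and $2,518 (in-house SSRs), a per-data-point saving of about 62% for the KASP service, in line with the roughly 60% estimate.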

This study used whole-genome NGS specifically to identify large numbers of SNP and InDel variants and used bioinformatics filtering of NGS reads to discover potential KASP assays throughout the rice genome.

We re-sequenced nine indica rice lines and aligned the sequences to the indica reference genome to maximise the identification of applicable loci. The study provides new evidence on the effectiveness of using NGS sequence data from a limited number of lines, and compares the new potential KASP with those available before this work in terms of density and genomic distribution throughout the physical map in a range of crosses.

(Supplementary File S1) to retrieve the flanking sequences 50 bp either side of each variation site and identify variants suitable for KASP markers following a stepwise identification process (Figure S1). The criteria for selection were that the flanking sequences a) did not contain any InDels; b) contained a maximum of four ambiguous bases; c) had a base coverage of at least five at every position; and d) had no more than four consecutive repeats of any 1-5 nucleotide sequence. Variants that passed this filtering were defined as potential KASP markers.
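The four selection criteria can be expressed as a single predicate. The following is a minimal sketch, not the code in Supplementary File S1. It assumes each candidate is described by a hypothetical `VariantSite` record holding its two flanks, the per-base read depth over the flanks and a flag for any InDel called within the flanks; it treats every base other than A, C, G or T as ambiguous, and checks the repeat criterion on each flank separately so that no artificial repeat is created across the variant site.

```python
import re
from dataclasses import dataclass
from typing import List

UNAMBIGUOUS = set("ACGT")


@dataclass
class VariantSite:
    """Hypothetical record for one candidate variant and its flanks."""
    left_flank: str                # up to 50 bp upstream of the variant
    right_flank: str               # up to 50 bp downstream of the variant
    flank_coverage: List[int]      # read depth at each flanking position
    flank_has_indel: bool = False  # True if an InDel was called within the flanks


def has_long_repeat(seq: str, max_repeats: int = 4, max_unit: int = 5) -> bool:
    """True if any 1..max_unit nt motif occurs more than max_repeats times in a row."""
    for unit in range(1, max_unit + 1):
        # A motif followed by at least max_repeats further copies
        # means more than max_repeats consecutive copies in total.
        pattern = r"(.{%d})\1{%d,}" % (unit, max_repeats)
        if re.search(pattern, seq):
            return True
    return False


def passes_kasp_filters(site: VariantSite, max_ambiguous: int = 4, min_depth: int = 5) -> bool:
    """Apply criteria a-d to a candidate variant's flanking sequences."""
    left, right = site.left_flank.upper(), site.right_flank.upper()
    # a) no InDels within the flanks
    if site.flank_has_indel:
        return False
    # b) at most four ambiguous (non-ACGT) bases across both flanks
    if sum(base not in UNAMBIGUOUS for base in left + right) > max_ambiguous:
        return False
    # c) read depth of at least five at every flanking position
    if not site.flank_coverage or min(site.flank_coverage) < min_depth:
        return False
    # d) no 1-5 nt motif repeated more than four times in a row, per flank
    return not (has_long_repeat(left) or has_long_repeat(right))


if __name__ == "__main__":
    flank = ("ACGTTGCA" * 7)[:50]
    print(passes_kasp_filters(VariantSite(flank, flank, [12] * 100)))              # True
    print(passes_kasp_filters(VariantSite("AAAAA" + flank[5:], flank, [12] * 100)))  # False
```

Writing the thresholds as keyword arguments makes it easy to relax them, for example to search for markers in gaps at a target region.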

The SNP positions of the potential KASP markers were used in the diversity analysis of potential KASP assays described below.

In-silico analysis of diversity using the new and existing KASP markers for the nine rice lines.

The sequence variants of each of the 1,329,325 potential KASP that passed the filtering (Figure S1) were used to make 45 comparisons: the 36 possible pairwise comparisons between the nine lines and the nine comparisons with the indica reference cultivar. For the 2,015 existing KASP markers based on rice SNPs that had previously been developed (Pariasca-Tanaka et al., 2015), the KASP primer sequences were aligned against the indica reference using BLAST (Altschul et al., 1990).
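The 45 comparisons are every unordered pair among ten genotypes (the nine lines plus the indica reference): 10 × 9 / 2 = 45. A minimal sketch of counting polymorphic potential KASP per pair is shown below. It assumes a hypothetical table of allele calls, one per marker for each genotype, with `None` for a missing call; missing calls are skipped rather than counted as differences. The line names in the example are placeholders.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Calls = List[Optional[str]]  # one allele call per marker; None = missing


def pairwise_polymorphic_counts(genotypes: Dict[str, Calls]) -> Dict[Tuple[str, str], int]:
    """Count markers with different allele calls for every unordered pair of genotypes."""
    counts: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(genotypes), 2):
        counts[(a, b)] = sum(
            1
            for x, y in zip(genotypes[a], genotypes[b])
            if x is not None and y is not None and x != y  # skip missing calls
        )
    return counts


if __name__ == "__main__":
    # Nine placeholder lines plus the reference: 36 + 9 = 45 comparisons.
    names = [f"line{i}" for i in range(1, 10)] + ["indica_ref"]
    demo = {name: ["A", "C", "G"] for name in names}
    demo["line1"] = ["A", "T", None]
    counts = pairwise_polymorphic_counts(demo)
    print(len(counts))                      # 45
    print(counts[("indica_ref", "line1")])  # 1
```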

For each of the nine lines, those variations that were suitable for KASP markers were categorised according to the nature of the polymorphism against the indica reference (Table 1).

This new approach of pairwise comparisons, for each of the nine resequenced lines against each other and against the indica reference genome, identified many more potential new KASP than previously existed for rice (Table 2). The highest diversity in the pairwise comparisons was >511,000 in the new set (IR65482 with Sunaulo

Of the 1,890 existing KASP markers that could be aligned against the indica reference, 1,159 (61%) were polymorphic between at least one of the sequenced lines and the indica reference genome. However, they were not evenly distributed throughout the genome or across all lines (Figure 2). In pairwise comparisons between the lines, there were between 345 and 520 informative polymorphic markers for each cross combination (Table 2 and Figure S5). There were some areas of the genome that had polymorphisms in all of the crosses (e.g., between

there is only one polymorphic marker and it is only in crosses with Loktantra). Consecutive informative existing KASP markers were rarely close together: in only 1.1% of cases were they closer than 1 kb. The median distance between markers across all pairwise combinations of lines, 353 kb, was over 2,700 times longer than that found for the new markers (Figure S6 and Tables S3 and S4).

(Tables S5 and S6).
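The spacing statistics quoted above (the median distance between consecutive markers and the share of gaps shorter than 1 kb) can be computed per cross as follows. This is an illustrative sketch that assumes the positions of the informative markers for one cross are grouped by chromosome in a plain dictionary; it measures gaps only between adjacent markers on the same chromosome. The toy positions are invented for the example.

```python
from statistics import median
from typing import Dict, List, Tuple


def consecutive_distances(positions: Dict[str, List[int]]) -> List[int]:
    """Distances (bp) between adjacent markers on the same chromosome."""
    gaps: List[int] = []
    for chrom_positions in positions.values():
        ordered = sorted(chrom_positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return gaps


def spacing_summary(positions: Dict[str, List[int]], close_bp: int = 1_000) -> Tuple[float, float]:
    """Median inter-marker distance and fraction of gaps shorter than close_bp."""
    gaps = consecutive_distances(positions)
    if not gaps:
        raise ValueError("need at least two markers on one chromosome")
    return median(gaps), sum(g < close_bp for g in gaps) / len(gaps)


if __name__ == "__main__":
    # Toy positions (bp) for the informative markers of one hypothetical cross.
    toy = {"chr1": [50, 150], "chr8": [100, 600, 2_600, 10_600]}
    med, frac_close = spacing_summary(toy)
    print(med, frac_close)  # 1250.0 0.5
```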

Parental lines were genotyped with the KASP markers as controls. The results confirmed the presence of the predicted alleles in the parents but also revealed within-line genetic variation for some of the parents at alleles for 70 existing KASP and 39 newly validated KASP (Table 3 and Table S7), of which 30 were discovered from filtering and 9 were identified by manual design.
SNPs provide the highest genome-wide density of genetic variants and occur in both coding and non-coding genomic regions. Due to their bi-allelic nature, not all SNPs and InDels will be polymorphic for all cross

insufficient to meet all rice-breeding challenges because, apart from being less numerous than available SSRs,

Here, NGS was used to re-sequence nine indica breeding lines, chosen with no deliberate effort to select for high diversity. It identified an average of 1.05 million SNP or InDel variants between any one of the individual rice lines and the indica reference genome, out of a total of 2.5 million variants across the whole set of lines (available at www.ncbi.nlm.nih.gov/bioproject/395505). By mining these data with bioinformatics filtering, we discovered hundreds of thousands of potential new KASP markers giving high-resolution coverage over the entire genome (Figure 1, Table 1). This has vastly reduced the number of regions with no selectable markers (compare Figures 1 and 2), offers breeders access to over 1.3 million informative KASP with a minimum of more than 245,000 for any paired combination of the nine rice lines (Table 2), and has produced over 650 times more KASP marker sequences than were previously available in rice. Approximately 92,500 (7%) were located in exons and altered the encoded amino acid sequence, so they could be used as functional markers (Table 1).

common recent ancestors (Table S1) so this data set should provide a high density of polymorphic KASP assays

would be identified if the filtering criteria were relaxed slightly to allow the detection of KASP markers in gaps at target genomic regions. Relaxing the criteria is a practical option because they were very stringent: they gave a 52% conversion rate from identified variations to new markers but excluded 37% of the 1,159 existing KASP.
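The fold increase and the share of functional markers follow directly from counts reported in this paper. The snippet below simply reproduces them from those counts; the 92,500 figure is itself approximate.

```python
NEW_POTENTIAL_KASP = 1_329_325  # potential KASP passing the stringent filters
EXISTING_KASP = 2_015           # public rice KASP before this study
FUNCTIONAL_KASP = 92_500        # exonic, amino-acid-changing (approximate)

fold_increase = NEW_POTENTIAL_KASP / EXISTING_KASP
functional_fraction = FUNCTIONAL_KASP / NEW_POTENTIAL_KASP

print(f"Fold increase over existing KASP: {fold_increase:.1f}")        # 659.7
print(f"Share usable as functional markers: {functional_fraction:.1%}")  # 7.0%
```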

Early rice genome sequencing of indica and japonica revealed about 1 SNP per kb (Feltus et al., 2004), and the material that is subsequently selected to be re-sequenced determines the density of NGS-based markers

We have demonstrated how high-throughput sequencing data can be used to identify so many new KASP markers that they will be useful for many traits across many parental combinations. A set of 39 fully validated marker designs is given here (Table 3). These design sequences can be submitted directly to LGC Genomics for purchase of KASP primers through their KASP by Design (KBD) or KASP on Demand (KOD) services, or for their full genotyping service. This allows breeders with no bioinformatics expertise to use these markers in their breeding programmes. The software provided (Supplementary File S1) enables breeders to generate KASP marker designs easily from their own, or publicly available, NGS datasets for any species. In addition, the sequencing reads for the nine resequenced lines are a valuable resource containing suitable variants for numerous breeding targets.
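To illustrate what a design sequence looks like, the sketch below extracts up to 50 bp either side of a variant from a reference sequence and writes the alleles in square brackets, e.g. `...[A/G]...` for a SNP or `...[TG/-]...` for a deletion. The bracketed notation is an assumption about the submission format and should be checked against LGC Genomics' current requirements before ordering. The function name and arguments are hypothetical; this is not the Supplementary File S1 software.

```python
def kasp_design_sequence(ref_seq: str, pos: int, ref: str, alt: str, flank: int = 50) -> str:
    """Return LEFT[REF/ALT]RIGHT for a variant at 1-based position `pos`.

    `alt` may be "-" to denote a deletion of the reference allele.
    """
    start = pos - 1  # convert 1-based position to a 0-based index
    end = start + len(ref)
    if start < 0 or ref_seq[start:end].upper() != ref.upper():
        raise ValueError(f"reference allele {ref!r} not found at position {pos}")
    left = ref_seq[max(0, start - flank):start]  # upstream flank, clipped at sequence start
    right = ref_seq[end:end + flank]             # downstream flank, clipped at sequence end
    return f"{left}[{ref}/{alt}]{right}"


if __name__ == "__main__":
    seq = "AAACCCGGGTTT"
    print(kasp_design_sequence(seq, 5, "C", "T", flank=3))   # AAC[C/T]CGG
    print(kasp_design_sequence(seq, 7, "GG", "-", flank=2))  # CC[GG/-]GT
```

Checking the reference allele against the sequence before writing the design catches coordinate mismatches, for example positions taken from a different reference assembly.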

The work has led to suitable KASP assays for NARC and Anamolbiou (Nepal), and many more assays are being rolled out to rice breeders in India (SKUAST) and Pakistan (NIBGE) with the support of LGC

markers close to existing SSR markers or in a region of interest, without the need for any bioinformatics analysis.

In the meantime, the paper authors can be contacted for details of KASP marker designs based on the nine

C  T  T  T  T  T  C  C  C  C
bu0000026  8:11193818  8:18168439  Intergenic SNP  Yes  Background  C  T  T  T  T  T  C  C  C  C
bu0000027  8:21701896  8:20380804  Intron deletion  Yes  Fragrance QTL  TG  TG  TG  TG  TG  TG  -TG  TG  TG
bu0000028  8:21701975  8:20380883  Intron SNP  Yes  Fragrance QTL  T  T  T  T  T  T  C  C  C  C
bu0000029  8:21704520  8:20383435  Intron SNP  Yes  Fragrance QTL  C  C  C  C  C  C  T  T  T  T
bu0000030  8:28422597  8:26729241  Intergenic SNP  Yes  Xa resistance  T  G  G  T  T  T  G  G  G  G

File S1. KASP marker design sequence generation software.