Introduction

Rice (Oryza sativa) blast disease is caused by the fungus Magnaporthe oryzae (Bourett and Howard 1990), a teleomorph of the complex Ascomycete genus that is composed of interfertile anamorphs (Barr 1977; Dean et al. 2005). It is a major constraint on rice yield and quality in the rice-planting regions of the world. The disease is difficult to combat because the causal fungus mutates frequently and adapts readily to its host. The long-term use of traditional chemical pesticides to control rice blast disease can lead to environmental pollution and is not economical. Thus, breeding disease-resistant cultivars is a more environmentally friendly and effective strategy to manage blast disease (Savary et al. 2012). Identifying resistance genes/quantitative trait loci (QTLs) and understanding disease resistance mechanisms are particularly important for the breeding of disease-resistant plants. In rice, 216 blast resistance-related genes/QTLs have been identified (http://www.ricedata.cn/ontology, TO:0000074). Resistance genes can be classified into different categories on the basis of differences in conserved protein motifs, such as nucleotide-binding sites, leucine-rich repeats, Toll–interleukin receptors, coiled coils, transmembrane receptors, and protein kinases (Liu et al. 2007a). More than 100 resistance genes have been identified, and at least 24 of these resistance genes/QTLs have been cloned (http://www.ricedata.cn/gene/gene_pi.htm).

Although most of the above resistance genes/QTLs were identified using linkage mapping (Wang et al. 1999; Lin et al. 2007; Takahashi et al. 2010; Zeng et al. 2011; Liu et al. 2013; Chen et al. 2015; Li et al. 2015), the detection of genes/QTLs by this approach is limited by the biparental materials used. With the development of high-density genetic mapping, genome-wide association studies (GWASs) based on linkage disequilibrium have become effective gene/QTL mapping strategies for natural populations (Zhu et al. 2008). GWASs overcome the shortcomings of linkage mapping that are associated with using a population derived from biparental materials (Yu et al. 2006). A series of GWASs have been successfully performed to dissect complex traits in multiple crop species, including maize (Riedelsheimer et al. 2012), soybean (Wen et al. 2018), wheat (Sukumaran et al. 2015), cotton (Du et al. 2018), and sorghum (Chopra et al. 2017). In rice, a high-density single nucleotide polymorphism (SNP) map and comprehensive HapMap were constructed several years ago (Huang et al. 2010). Subsequently, GWASs were successfully applied to dissect 14 complex rice agronomic traits, and 37 associations were identified (Huang et al. 2010). One of these associations, located on chromosome 7, was analyzed in depth, resulting in the identification of the candidate gene OsSPL13, which positively regulates cell size and enhances rice grain length and yield (Si et al. 2016). In addition, based on the same high-density SNP map and population (Huang et al. 2010), a GWAS was successfully applied to detect QTLs related to rice blast resistance, and 30 associated loci were identified (Wang et al. 2014).

Previously, we developed a custom-designed array (Lu et al. 2015) that consisted of 5291 SNPs evenly chosen from the Rice Haplotype Map Project Database (http://www.ncgr.ac.cn/ricehap2/) (Huang et al. 2010). We have successfully used the array to perform GWASs for different agronomic traits and identified multiple important associations that will be beneficial in the molecular breeding of rice in the future (Lu et al. 2015, 2016, 2018; Feng et al. 2016; Zhang et al. 2017). Here, we examined rice blast resistance using a GWAS based on the custom-designed SNP array in 355 diverse indica accessions. The objectives of the present study were as follows: (1) to analyze the genetic architecture of rice blast resistance, (2) to detect a set of lead associations for candidate gene identification, and (3) to preliminarily verify novel functional candidate genes that might participate in regulating blast resistance.

Material and methods

Plant material and phenotypic evaluation

The association mapping panel was composed of 355 indica accessions that were selected from our previous study and are described in detail by Lu et al. (2015). The phenotypic evaluation was performed at the China National Rice Research Institute in Hangzhou (N 30° 32′, E 120° 12′), China, in 2016. Seeds were planted in plastic trays (length × width × height, 43 × 30 × 7.5 cm) to test rice blast resistance using 16 M. oryzae strains collected from South China. Each accession was inoculated at the third to fourth leaf stage using the spray method and then grown in a greenhouse under high temperature (~ 35 °C) and high humidity (~ 80%). The disease level was evaluated 7 days after inoculation. The diseased leaf area (DLA) was used as the evaluation criterion (Chuwa et al. 2015): 0 = no lesions; 1 = small, brown specks of pinhead size; 3 = small, roundish to slightly elongated, necrotic, gray spots about 1–2 mm in diameter; 5 = typical blast lesions infecting < 10% of the leaf area; 7 = typical blast lesions infecting 26–50% of the leaf area; and 9 = typical blast lesions infecting > 51% of the leaf area and many dead leaves. Three replications were performed for each experiment.

Genotyping, population structure and association mapping

The genotypes of all of the accessions were obtained using a custom-designed array containing 5291 SNP markers reported in our previous study (Lu et al. 2015). The SNPs having minor allele frequencies < 5% were removed, resulting in 4032 SNPs being used in the association analyses.
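
The minor allele frequency filter described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the genotype encoding (two-character calls such as "AG", with None for missing data), the function names, and the toy SNPs are all assumptions made for the example.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of one biallelic SNP.

    `genotypes` is a list of two-character calls such as "AA" or "AG".
    Missing calls are passed as None and ignored (assumed encoding).
    """
    counts = Counter()
    for call in genotypes:
        if call is not None:
            counts.update(call)  # count both alleles of the diploid call
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # entirely missing or monomorphic
    return min(counts.values()) / total


def filter_by_maf(snp_table, threshold=0.05):
    """Drop SNPs with MAF below `threshold`; keep the rest."""
    return {
        snp: calls
        for snp, calls in snp_table.items()
        if minor_allele_frequency(calls) >= threshold
    }


# Hypothetical toy data, not taken from the array.
toy = {
    "snp_a": ["AA"] * 18 + ["GG"] * 2,             # MAF = 4/40 = 0.10
    "snp_b": ["AA"] * 39 + ["AG"],                 # MAF = 1/80 = 0.0125
    "snp_c": ["CC"] * 10 + ["TT"] * 10 + [None],   # MAF = 0.50
}
kept = filter_by_maf(toy)
print(sorted(kept))
```

With a 5% cutoff, only the rare variant `snp_b` is removed in this toy set.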

The population structure analysis was performed using STRUCTURE v2.2 (Pritchard et al. 2000a) with the following parameters: 1 to 15 populations with five runs, 10,000 burn-in period, and 100,000 Markov Chain Monte Carlo replications. The principal component analysis was performed using PowerMarker v3.25 (Liu and Muse 2005) and NTSYSpc v2.1 (Rohlf 2000). The pairwise relatedness coefficients were estimated using SPAGeDi v 1.4c (Hardy and Vekemans 2002).

The association mapping was performed using TASSEL v4.0 (Bradbury et al. 2007), and the EMMA (Kang et al. 2008) and P3D (Zhang et al. 2010) algorithms were used to reduce the computing time. A mixed linear model, with the population structure (Q) and relative kinship matrix (K) as covariates, was used to control the rate of false positive associations. A compromise threshold of P = 1E−03 was used to identify the significant associations.
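
For orientation, the Q + K mixed linear model is commonly written in the form introduced by Yu et al. (2006). The notation below is an assumption adopted for illustration; the exact parameterization used internally by TASSEL may differ.

```latex
% y: phenotype vector (here, the disease rating for one strain)
% X\beta: fixed effects other than marker and structure
% S\alpha: fixed effect of the SNP being tested
% Qv: fixed effects of population structure (Q matrix from STRUCTURE)
% Zu: random polygenic effects, with covariance given by the kinship matrix K
% e: residual
\begin{aligned}
  y &= X\beta + S\alpha + Qv + Zu + e, \\
  \operatorname{Var}(u) &= 2K\sigma_g^{2}, \qquad
  \operatorname{Var}(e) = I\sigma_e^{2}.
\end{aligned}
```

Each SNP is tested through the null hypothesis that its effect \alpha equals zero, and a SNP is declared significant when the resulting p value falls below the threshold stated above.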

Candidate gene identification and expression profiling

The potential candidate genes near the associations were identified within a 200-kb genome region (± 100 kb of the lead SNPs). The annotations of these candidate genes were obtained from the Rice Haplotype Map Project Database (http://www.ncgr.ac.cn/ricehap2/). The relative expression levels of the target genes were assessed by quantitative real-time PCR (qRT-PCR). Total RNA was extracted from young leaves using a MiniBEST Plant RNA Extraction kit (TaKaRa Bio Inc., Japan), following the manufacturer’s instructions. Next, first-strand complementary DNA was synthesized using PrimeScript RT Master Mix (TaKaRa Bio Inc). Then, the reaction mixture was run on a 7500 Real-Time PCR system (Applied Biosystems, Carlsbad, CA, USA) (Lu et al. 2016). Rice Ubq-2 was used as the internal control, and all of the reactions were repeated using three independent biological replicates.
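
The ± 100 kb candidate gene window can be illustrated with a short sketch. The gene identifiers and coordinates are hypothetical, and treating any gene that overlaps the window as a candidate is an assumption, because the overlap rule is not specified in the text.

```python
def genes_in_window(lead_pos, genes, flank=100_000):
    """Return IDs of genes overlapping [lead_pos - flank, lead_pos + flank].

    `genes` is a list of (gene_id, start, end) tuples on the same
    chromosome as the lead SNP. Partial overlap counts (assumed rule).
    """
    lo, hi = lead_pos - flank, lead_pos + flank
    return [gid for gid, start, end in genes if start <= hi and end >= lo]


# Hypothetical genes on one chromosome; positions in base pairs.
genes_chr = [
    ("gene_A", 10_200_000, 10_203_500),  # ends before the window opens
    ("gene_B", 10_440_000, 10_446_000),  # fully inside the window
    ("gene_C", 10_449_000, 10_452_000),  # straddles the right edge
    ("gene_D", 10_600_000, 10_604_000),  # beyond the window
]
hits = genes_in_window(10_350_000, genes_chr)
print(hits)
```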

RNA sequencing

The blast-resistant rice variety Shuidaobawang (CH491) was inoculated with the M. oryzae S182 strain. The uninoculated variety acted as the control. After lesions appeared, total RNA was extracted from young leaves using a MiniBEST Plant RNA Extraction kit (TaKaRa Bio Inc) according to the manufacturer’s instructions. Three biological replications for each treatment were used for transcriptome sequencing. The enrichment of mRNA, fragmentation, addition of adapters, size selection, and PCR amplification were performed by CapitalBio Technology Ltd. (Beijing, China). The library was sequenced using the Illumina HiSeq™ 2500 platform.

Identification of differentially expressed genes

The raw sequencing data were collected, and low-quality reads, adaptor sequences, and empty reads were removed. Next, the clean reads were mapped to the reference genome (Oryza sativa L. ssp. japonica cv. Nipponbare, MSU 6.0) using HISAT to check the sequencing quality (Kim et al. 2015). The gene expression level was quantified as the total number of reads for each gene that uniquely aligned to the reference genome. To identify genes related to blast resistance, we selected genes whose expression levels were significantly altered by treatment (inoculation) compared with the control. A false discovery rate ≤ 0.05 and an absolute value of log2 ratio ≥ 3 were used as the thresholds to judge the significance of differences in gene expression.
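
The two thresholds above can be applied as in the following sketch. Several parts are assumptions, because the text does not state how p values or the false discovery rate were obtained: the Benjamini-Hochberg adjustment, the pseudocount of 1 added before taking the ratio, and the toy read counts and p values.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_deg(records, fdr_cut=0.05, lfc_cut=3.0, pseudo=1.0):
    """Classify genes as 'up' or 'down' after inoculation.

    `records` holds (gene, control_count, treated_count, p_value).
    The pseudocount avoids division by zero (an assumption).
    """
    fdr = benjamini_hochberg([r[3] for r in records])
    result = {}
    for (gene, ctrl, trt, _), q in zip(records, fdr):
        lfc = math.log2((trt + pseudo) / (ctrl + pseudo))
        if q <= fdr_cut and abs(lfc) >= lfc_cut:
            result[gene] = "up" if lfc > 0 else "down"
    return result


# Hypothetical counts and p values for four genes.
records = [
    ("g1", 10, 200, 0.0001),   # strong induction
    ("g2", 300, 20, 0.0004),   # strong repression
    ("g3", 50, 90, 0.0002),    # significant but small change
    ("g4", 40, 42, 0.8),       # unchanged
]
deg = call_deg(records)
print(deg)
```

Gene `g3` passes the false discovery rate cutoff but not the fold-change cutoff, which shows why both criteria are applied together.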

Results

Material distribution and phenotypic variation

A set of 355 genetically diverse indica accessions collected from the 17 main provinces in South China, which represented the major rice-planting region, was used to perform the GWAS to detect novel loci associated with resistance to rice blast disease (Figure S1a; Table S1). A total of 16 strains of rice blast collected from eight provinces in South China were used to investigate blast resistance at the seedling stage (Figure S1a; Table 1; Table S2). The resistance level of each accession was evaluated by DLA after inoculation. Extensive and rich phenotypic variations were observed in the resistance to the 16 blast strains (Figure S1b). The disease rating as assessed by the DLA ranged from 2.99 for ‘S254’ to 6.68 for ‘S171’, with an average of 5.23 (Table 1).

Table 1 Phenotypic variation for 16 blast strains

Correlation analyses between disease rating and geographic origin (longitude and latitude) indicated that the DLA caused by every strain except ‘S102’ was significantly positively correlated with latitude, whereas the DLAs caused by ‘S172’ (r = −0.12, p = 0.023), ‘S193’ (r = −0.15, p = 0.005), and ‘S366’ (r = −0.14, p = 0.011) were significantly negatively correlated with longitude (Table S3; Figure S2). Moreover, the disease ratings for most pairs of strains were significantly (p < 0.05) or highly significantly (p < 0.01) positively correlated with each other, especially between ‘S122’ and both ‘S149’ (r = 0.62, p < 0.01) and ‘S366’ (r = 0.61, p < 0.01), while only the correlations between ‘S172’ and both ‘S182’ (r = 0.05) and ‘S254’ (r = 0.07) were not significant (p > 0.05) (Figure S3).

Population divergence and relative kinship analysis

Our previous study showed that the indica population can be divided into four subpopulations with moderate levels of differentiation (Lu et al. 2015). Here, we reused 355 indica accessions from the previous study as the GWAS population and reanalyzed the genetic composition of each accession using STRUCTURE v2.2 software (Pritchard et al. 2000a). Evanno’s ΔK peaked at K = 5 (Fig. 1a), suggesting that the 355 indica accessions comprised five genetic groups (Fig. 1b), which was supported by the principal component analysis plot (Fig. 1c). The percentage of phenotypic variation explained by population structure ranged from 6.01 to 26.36%, with an average of 15.44% (Table 1). Thus, population structure might cause false positive associations and should not be ignored in further GWAS analyses (Yu et al. 2006). Most of the pairwise kinship coefficients were less than 0.1, indicating that there was no or weak relatedness among the accessions (Figure S4).
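
The ΔK statistic used to choose the number of subpopulations can be computed from the log likelihoods of replicate STRUCTURE runs. The sketch below uses the common simplification of taking second differences of the run means and dividing by the standard deviation at each K. The function name and the toy likelihood values are assumptions for illustration only.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Compute Evanno's delta K from replicate runs.

    `lnp` maps K to a list of Ln P(D) values, one per run.
    Returns {K: delta_K} for every K with both neighbours K-1 and K+1.
    """
    avg = {k: mean(v) for k, v in lnp.items()}
    sd = {k: stdev(v) for k, v in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in avg and k + 1 in avg and sd[k] > 0:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            delta[k] = second_diff / sd[k]
    return delta


# Hypothetical Ln P(D) values for K = 1..4, three runs each.
toy_lnp = {
    1: [-5000, -5002, -4998],
    2: [-4600, -4604, -4596],
    3: [-4300, -4302, -4298],
    4: [-4280, -4290, -4270],
}
delta_k = evanno_delta_k(toy_lnp)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

The statistic is undefined for the smallest and largest K tested, which is one reason a range wider than the expected number of groups is usually scanned.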

Fig. 1 Population structure of the association mapping panel. a ΔK values plotted against the number of subpopulations. b Component of different subpopulations (K = 2–5). c Principal component analysis

GWAS for resistance to 16 blast-causing M. oryzae strains

Our previous work with the indica panel suggested that false positive associations were well controlled only by the Q + K model (Lu et al. 2015). Thus, a GWAS for resistance to rice blast was performed using TASSEL v4.0 under the Q + K association model (Bradbury et al. 2007). To select the major associations and mine known loci, only the lead association with the lowest p value was retained within each 200 (± 100)-kb region. After clumping, 127 significant associations, including eight known loci, were identified (Table 2; Table S4). Together, the significant associations explained an average of 29.77% of the phenotypic variation, ranging from 15.59% for resistance to ‘S122’ to 49.50% for resistance to ‘S182’ (Table 2).
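
One way to realize the clumping step is a greedy pass over the significant SNPs in order of increasing p value. The sketch below is an assumption about the procedure, which the text describes only in outline. The SNP names, positions, and p values are hypothetical, and treating p values equal to the threshold as significant is likewise an assumption.

```python
def clump(snps, p_cut=1e-3, flank=100_000):
    """Return lead SNP IDs after distance-based clumping.

    `snps` holds (snp_id, chromosome, position, p_value). A SNP becomes
    a lead only if no stronger lead lies within `flank` bp on the same
    chromosome.
    """
    significant = sorted(
        (s for s in snps if s[3] <= p_cut), key=lambda s: s[3]
    )
    leads = []
    for snp_id, chrom, pos, p in significant:
        if all(c != chrom or abs(pos - q) > flank for _, c, q, _ in leads):
            leads.append((snp_id, chrom, pos, p))
    return [s[0] for s in leads]


# Hypothetical association results.
toy_snps = [
    ("s1", 12, 1_000_000, 1e-6),
    ("s2", 12, 1_050_000, 5e-4),   # within 100 kb of s1, absorbed
    ("s3", 12, 1_300_000, 2e-4),   # separate region on chromosome 12
    ("s4", 6, 1_020_000, 8e-4),    # different chromosome
    ("s5", 6, 2_000_000, 0.02),    # not significant
]
lead_ids = clump(toy_snps)
print(lead_ids)
```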

Table 2 Summary of significant associations for 16 strains

Most of the lead SNPs were very close to known genes. For resistance to ‘S242’, Pi9 was only ~ 31 kb away from the significant association seq–rs2897 (p = 7.10E−04) on chromosome 6. For resistance to ‘S172’, Pita was only ~ 64 kb away from the lead SNP seq–rs5659 (p = 4.96E−08) on chromosome 12, and Pi1 was only ~ 89 kb away from the lead SNP seq–rs5411 (p = 5.92E−04) on chromosome 11 (Figure S5; Table S4). In addition, some associations had also been identified in our previous GWAS report using a different indica association population (Wang et al. 2014). For example, the two significant SNPs seq–rs5460 (p = 8.54E−05) and seq–rs5489 (p = 9.00E−04) on chromosome 11 were associated with resistance to ‘S168’, and the lead SNP seq–rs3576 (p = 1.55E−06) on chromosome 7 was associated with resistance to ‘S366’ (Figure S5; Table S4). A set of significant associations was newly identified. For resistance to ‘S122’, a new lead SNP (seq–rs2233, p = 7.97E−04) and a cluster of new significant associations were detected on chromosomes 4 and 12, respectively. For resistance to ‘S182’, a new lead SNP (seq–rs5745, p = 1.09E−11), with the lowest p value, was obtained on chromosome 12. Moreover, for resistance to ‘S359’, a series of significant associations led by seq–rs2863 (p = 4.19E−05) was detected on chromosome 6 (Figure S5; Table S4).

Assessment of GWAS significant associations

The greatest number of associations, accounting for 34.6% of all the associations, was identified on chromosome 12, followed by chromosomes 9 (13.4%), 6 (10.2%), and 11 (7.9%) (Fig. 2a). Furthermore, a series of hotspots was observed among the associations on these four chromosomes (Fig. 2b). In rice, chromosomal hotspots for rice blast resistance have frequently been identified, especially on chromosomes 6, 11, and 12 (http://www.ricedata.cn/gene).

Fig. 2 Hotspot distribution and pleiotropism of the associations. a Percentage of significant associations on different chromosomes. b Distribution of hotspots on chromosomes 6, 9, 11, and 12. c Pleiotropy of different associations

Further analyses of these associations revealed that 25 associations, including four known genes, were pleiotropic, being associated with more than two strains (Fig. 2c). For example, the lead SNP (seq–rs4199, Pos: ~ 10.35 Mb) located on chromosome 9 was associated with five different strains, which was supported by the correlation coefficients among the DLAs caused by the five strains (Figure S3). Moreover, the lead SNP is 117.25 kb and 41.94 kb away from two known blast resistance genes, Pi5 and Pi56(t), respectively (Lee et al. 2009; Liu et al. 2013). Thus, the two genes appear to be important candidate genes for broad-spectrum resistance to blast, as previously reported (Liu et al. 2013). In addition, two other broad-spectrum resistance genes, Pi9 (Qu et al. 2006) and Pita (Bryan et al. 2000; Tacconi et al. 2010), were also obtained (Fig. 2c). Apart from the three known loci, 22 new pleiotropic associations were identified, including seq–rs4185 on chromosome 9 and seq–rs5745 on chromosome 12, each of which was associated with four different strains (Fig. 2c). These pleiotropic associations would be very helpful in the molecular breeding of blast resistance in rice.

The efficacy of pyramiding elite alleles for different numbers of associations and random markers was examined to determine their effects in different strains, without considering the interaction among the associations/markers. The rice blast resistance increased (disease rating decreased) with the pyramiding of greater numbers of elite alleles (Figure S6a). However, the resistance level to rice blast remained almost unchanged, even as more elite alleles of random markers were pyramided (Figure S6b). Thus, enhancing the frequencies of elite alleles of significant associations could increase rice blast resistance.
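
The pyramiding comparison can be sketched by grouping accessions according to how many elite alleles they carry and averaging the disease rating within each group. The definition of an elite allele as the allele associated with a lower rating, the marker names, and the toy accessions are assumptions for illustration. As in the analysis above, interactions among markers are ignored.

```python
from collections import defaultdict
from statistics import mean


def rating_by_elite_count(accessions, elite):
    """Return {number of elite alleles: mean disease rating}.

    `accessions` holds (genotype, rating) pairs, where genotype maps a
    marker name to the allele carried. `elite` maps each marker to its
    elite (resistance-increasing) allele.
    """
    groups = defaultdict(list)
    for genotype, rating in accessions:
        n = sum(genotype.get(m) == allele for m, allele in elite.items())
        groups[n].append(rating)
    return {n: mean(r) for n, r in sorted(groups.items())}


# Hypothetical elite alleles at three markers and five accessions.
elite = {"m1": "A", "m2": "G", "m3": "T"}
accessions = [
    ({"m1": "C", "m2": "A", "m3": "C"}, 7),
    ({"m1": "A", "m2": "A", "m3": "C"}, 6),
    ({"m1": "A", "m2": "G", "m3": "C"}, 5),
    ({"m1": "C", "m2": "G", "m3": "T"}, 4),
    ({"m1": "A", "m2": "G", "m3": "T"}, 2),
]
trend = rating_by_elite_count(accessions, elite)
print(trend)
```

Running the same function with randomly chosen markers in place of the associated ones gives the control comparison: a flat trend there indicates that the decline is specific to the associated loci.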

Identification of candidate genes for the associations

For the 127 associations, 3877 candidate genes were identified within a 200-kb genomic region around each association using the Rice Haplotype Map Project Database (http://www.ncgr.ac.cn/ricehap2/). After duplicates were removed, 2341 nonredundant candidate genes were obtained (Table S5), including 45 disease resistance-related genes (Table S6). Of these 45 disease resistance candidate genes, 13 had been previously identified (Lopez–Gerena 2006; Shi et al. 2011; Jung et al. 2014; Kang et al. 2016), five had been cloned in previous studies, namely Pi9 (LOC_Os06g17900), Pid2 (LOC_Os06g29810), Pi5 (LOC_Os09g15840), Pi56(t) (LOC_Os09g16000), and Pita (LOC_Os12g18360), and one (LOC_Os12g23930) overlapped with our previous GWAS results using a different population (Wang et al. 2014) (Figure S7a; Table S6). The 45 resistance-related genes were mainly distributed on chromosome 11, followed by chromosomes 6 and 8 (Figure S7b). This was consistent with the clustering distribution of the associations (Fig. 2a, b). Furthermore, the four lead SNPs seq–rs2897, seq–rs5460, gwseq–rs12, and seq–rs722 each had a relatively large number of resistance genes nearby (Figure S7c), indicating that they would be helpful in breeding a rice variety with broad-spectrum and high-level resistance to rice blast. The relative expression level of the candidate gene LOC_Os12g23930 (Os12g0427000), which was also identified in our previous study, decreased highly significantly after inoculation (Figure S7d; Table S7). Its relative expression pattern was consistent with our previous report (Wang et al. 2014), suggesting that the candidate gene might be involved in negatively regulating rice blast resistance.

Identification of blast resistance genes by combining GWAS and RNA sequencing

To identify blast resistance candidate genes, we inoculated a medium-resistance rice variety (CH491) with the M. oryzae S182 strain. ‘CH491’ that was not inoculated acted as the control. After the lesions appeared, RNA from the leaves was sequenced (unpublished). A total of 232,180,474 reads were generated using the Illumina sequencing platform. Of these reads, a set of 91,869,902 reads were mapped (Table S8). In total, 683 significantly differentially expressed genes, including 282 up- and 401 downregulated genes, were identified (Table S9; Fig. 3a). Among these differentially expressed candidate genes, eight genes were functionally annotated as encoding resistance proteins (Table S10). In addition, one of the eight genes, LOC_Os01g66020, was presumed to be associated with rice leaf blight resistance in a previous report (Jung et al. 2014).

Fig. 3 Candidate gene identification by integrating GWAS and RNA sequencing analyses. a Detection of differentially expressed genes by RNA sequencing. Blue and red crosses represent up- and downregulated genes, respectively. Green crosses represent unchanged genes. b Association peaks of the three overlapped genes with two lead SNPs. c Three overlapped genes identified by RNA sequencing and GWAS for the S182 strain. d Relative expression levels of the three target genes. Error bars represent standard errors. Single and double asterisks indicate significance at the 0.05 and 0.01 levels, respectively (t test)

For resistance to ‘S182’, 382 blast resistance candidate genes were detected by the GWAS analyses (Fig. 3b; Table S9). Three of these genes, LOC_Os02g02850, LOC_Os02g41590, and LOC_Os02g41630, were confirmed by the RNA sequencing analyses (Fig. 3c; Table S10). The first is located near the association SNP seq–rs722 (p = 1.88E−04), and the latter two are close to the association SNP seq–rs1136 (p = 4.82E−04) on chromosome 2 (Fig. 3c; Table S5). A previous study suggested that LOC_Os02g41630 is a member of the phenylalanine ammonia-lyase gene family (OsPAL1) and is highly homologous with the member OsPAL5. Furthermore, another member, OsPAL4, is a broad-spectrum disease resistance-related gene, and the resistance levels to three diseases, bacterial wilt, sheath blight, and rice blast, were significantly enhanced in heterozygous mutant materials (Tonnessen et al. 2015). The expression levels of the three candidate genes differed significantly (p < 0.05) or highly significantly (p < 0.01) before and after inoculation (Fig. 3d; Table S7).

Discussion

Here, a GWAS was used to identify loci associated with rice blast resistance. Although GWAS is a useful strategy to dissect complicated traits and mine candidate genes (Huang et al. 2010; Wang et al. 2014; Lu et al. 2015, 2016, 2018; Si et al. 2016; Feng et al. 2016; Chopra et al. 2017; Zhang et al. 2017), population structure can affect the associations, which may result in false positive associations (Atwell et al. 2010). To reduce false positive associations, structured association and genomic control have commonly been used in previous studies (Pritchard et al. 2000b; Yu et al. 2006). In our work, the accessions were selected from a previously reported indica-only panel (Lu et al. 2015), which should reduce false positive associations. In addition, the population structure was reanalyzed using the program STRUCTURE v2.2 (Pritchard et al. 2000a), and the familial relationships were re-estimated using SPAGeDi (Hardy and Vekemans 2002). The panel contains no or weak relatedness among the accessions (Fig. 1; Figure S4). Previously, four GWAS models were compared in two environments to determine the best association model, and the Q + K model was selected (Lu et al. 2015). Therefore, we used the Q + K model for the association mapping.

Here, we identified 127 associations for rice blast resistance, including eight known loci, such as Pita (Bryan et al. 2000; Tacconi et al. 2010), Pi9 (Qu et al. 2006), Pi5 (Lee et al. 2009), Pi56(t) (Liu et al. 2013), and Pid2 (Chen et al. 2006), in the indica-only panel. More importantly, a set of new significant associations with resistance to the 16 strains, especially S102, S122, S182, and S359 on chromosomes 6, 12, 12, and 6, respectively, was identified on 12 chromosomes (Figure S5; Table S4). In addition, we identified two associations with resistance to ‘S168’ on chromosome 11 and one lead SNP associated with resistance to ‘S366’ on chromosome 7. These had also been identified in our previous GWAS using a different association population (Wang et al. 2014). These new significant associations will aid in molecular breeding and functional candidate gene identification in the future.

Chromosomal hotspots have previously been identified for rice blast disease. For example, nine, 17, and 15 loci containing 15, 24, and 17 major blast resistance genes were obtained on chromosomes 6, 11, and 12, respectively (http://www.ricedata.cn/gene). In our study, the associations also showed the presence of chromosomal hotspots on chromosomes 6, 9, 11, and 12, which was broadly consistent with previous reports (Fig. 2b). Moreover, several lead SNPs were identified as being associated with multiple strains (Fig. 2c), which might represent a multigenic effect, suggesting that a broad-spectrum resistance gene is near the lead SNP. These candidate genes may share a common mechanism against different strains because of the pleiotropic or hitchhiking effect (Wang et al. 2014; Lu et al. 2015). In addition, these lead SNPs with pleiotropic and multigenic effects would be helpful for pyramid breeding in the future.

To perform the GWAS, we used a custom-designed SNP array containing 5291 SNPs for the indica panel. Although some GWASs involve tens of thousands to millions of SNP markers (Huang et al. 2010; Kang et al. 2016), given the linkage disequilibrium distance in the indica panel (~ 123 kb) (Huang et al. 2010), the genome coverage should be sufficient to identify most of the associations related to rice blast resistance. However, a higher density of SNPs would be helpful to identify causative genes and polymorphisms. For instance, the GWAS mapping resolution was improved for the target zone, from 220 to 20 kb, when using a 700-K SNP dataset rather than a 44-K SNP dataset (Kang et al. 2016).
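
The coverage argument can be checked with a back-of-the-envelope calculation. The genome size of roughly 373 Mb is an assumption (an approximate figure for the Nipponbare reference, not stated in the text), as is perfectly even marker spacing. The SNP count and the linkage disequilibrium distance are taken from the text.

```python
GENOME_BP = 373_000_000   # assumed approximate rice genome size
N_SNPS = 4032             # SNPs retained after MAF filtering (from the text)
LD_DECAY_BP = 123_000     # LD distance in indica cited in the text

# Average distance between neighbouring markers under even spacing.
mean_spacing = GENOME_BP / N_SNPS

# Expected number of markers falling within one LD distance.
markers_per_ld_block = LD_DECAY_BP / mean_spacing

print(f"mean spacing: {mean_spacing / 1000:.1f} kb")
print(f"markers per LD distance: {markers_per_ld_block:.2f}")
```

Under these assumptions the average spacing is below the linkage disequilibrium distance, so most causal variants should have at least one marker in linkage disequilibrium with them. Real marker spacing is uneven, so sparsely covered regions may still be missed.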

Furthermore, we combined RNA sequencing and a GWAS analysis to identify blast resistance candidate genes against ‘S182’, and three overlapping genes were obtained (Fig. 3). One of the three genes, LOC_Os02g41630, is a member of the phenylalanine ammonia-lyase gene family (OsPAL1), and another member, OsPAL4, is a broad-spectrum disease resistance-related gene for bacterial wilt, sheath blight, and rice blast (Tonnessen et al. 2015). Gene annotation combined with qRT-PCR was used to investigate the potential biological functions of the candidate gene. The gene’s expression level decreased after inoculation, which indicated that it may be involved in the negative regulation of rice blast resistance. Despite the relative expression verification, the biological functions of all of the candidate genes need to be validated using biotechnology experiments.

In summary, we identified a set of marker–trait associations and obtained multiple candidate genes for rice blast resistance using a GWAS strategy. In addition, we combined a GWAS and RNA sequencing to mine for blast resistance candidate genes and obtained three overlapping candidate genes, which were further supported by the qRT-PCR analyses. Our report further confirms that GWAS is a powerful approach for discovering quantitative blast resistance candidate genes compared with traditional linkage mapping, especially when used in combination with other analyses. This work provides a basis for further studies of the potential functions of these candidate genes. Our further work will also focus on validating the functional variants using molecular biology experiments. These broad-spectrum associations will be used to develop breeding markers for marker-assisted selection.