Background

Most important agronomic traits of rice (Oryza sativa L.), such as heading date, yield, plant height and grain size, are controlled by quantitative trait loci (QTLs). Although multiple QTLs have been reported over the past few decades, few of them have been isolated or fine-mapped (Hu et al. 2015; Huang et al. 2009a; Ishimaru et al. 2013; Jiao et al. 2010; Konishi et al. 2006; Song et al. 2007; Wang et al. 2015; Xue et al. 2008; Yano et al. 2000). As the phenotypic manifestations of these quantitative traits show continuous variation and are always influenced by environmental factors and genetic backgrounds, only suitable genetic populations can provide a strong foundation for genetic and genomic research. To date, many types of genetic populations have been developed for QTL research and utilization. Primary genetic mapping populations, such as the F2, F2:3 and BC1 populations, were often used for analyzing the genetic basis of QTL during early studies (Ahn et al. 1993; Li et al. 1995; Redona and Mackill 1996). However, these types of genetic populations can only detect a few major QTLs with large genetic effects due to complex and unstable genetic backgrounds (Yamamoto et al. 2000). To further explore the genetic basis of complex traits, some permanent mapping populations, such as recombinant inbred lines (RILs) and doubled haploid (DH) have been developed. QTLs with minor effects can also be detected by using these permanent populations, but they are still inadequate for fine mapping and cloning of QTLs (Yano 2001). As a permanent population, chromosome segment substitution lines (CSSLs) can improve the precise detection of QTL regulation of complex traits and make it much easier for QTL fine mapping and cloning (Mei et al. 2006; Takai et al. 2007). Each line of CSSLs has the same genetic background as its recurrent parent, except for a few substituted segments from a donor parent. CSSLs are ideal for mapping genetic population for the elimination of most genetic background noise and dissection of QTLs into the single Mendelian factor (Xu et al. 2010). Although development of CSSLs is laborious and time-consuming, more and more CSSLs have been developed for their significant advantages (Bessho-Uehara et al. 2017; Furuta et al. 2014; Hao et al. 2006; Kubo et al. 2002; Qiao et al. 2016; Shim et al. 2010; Yoshimura 1997).

Marker-assisted selection (MAS) is a powerful technique of selecting target segments during CSSLs construction. With the publication of rice whole genome sequences, many SSR (simple sequence repeat) markers, Indels (insertions and deletions) markers and SNPs (single nucleotide polymorphisms) markers have been exploited for MAS. The density of molecular markers that are distributed in the whole genome determine the accuracy of CSSLs. Low-density molecular makers may easily miss small introgression segments and double-crossovers (Xu et al. 2010). In fact, less than 200 molecular markers were used to develop CSSLs because of laborious work (Bian et al. 2010; Shim et al. 2010; Xu et al. 2007; Zhu et al. 2009). Next-generation sequencing technology provides a high-throughput genotyping method that can construct high-resolution physical maps and calculate the length of substituted segments in the construction of CSSLs. Recently, several RILs and CSSLs have been exactly genotyped by whole-genome resequencing, and high-quality linkage maps were constructed for QTL dissection (Gao et al. 2013; Huang et al. 2009b; Xu et al. 2010; Zhang et al. 2011; Zhu et al. 2015b). Among the CSSLs that were reported previously, few were developed by using the male sterile line and were subjected to high-throughput genotyping by whole-genome resequencing to obtain accurate physical maps. It will be necessary to develop novel CSSLs with high-resolution physical maps.

Previously, our lab dissected yield-associated loci using an RIL derived from a cross between PA64s and 9311 (Gao et al. 2013). Since then, to facilitate QTL mapping and cloning, 3 sets of CSSLs were developed and were derived from three representative rice varieties: the first genome-sequenced japonica cultivar Nipponbare, an elite restore indica cultivar 9311, and a male sterile line PA64s with a mixed genetic background of indica and japonica. One set of CSSLs was developed from a cross between PA64s as the donor and Nipponbare as the recipient. The variety 9311 was used as recipient parent for the other two sets of CSSLs with Nipponbare and PA64s as donor parents, respectively. To obtain precise physical maps for QTL detection, the three sets of CSSLs were genotyped by high-throughput whole-genome resequencing. QTLs analysis for 11 major agronomic traits of the 3 sets of CSSLs was performed using the bin maps converted from the precise physical maps.

Results

Development of the CSSLs

A total of 311 polymorphic molecular markers were used to assist selection during CSSLs development (Additional file 1). Among them, 136 markers denoted as A, AB or AC were used for MAS in the population derived from PA64s × 9311 crosses, and 162 markers denoted as B, AB or BC were used for CSSL analysis on the population derived from a cross between 9311 and Nipponbare. Then, 138 markers denoted as C, AC or BC were used for the population derived from PA64s × Nipponbare crosses.

The development procedure for the 3 sets of CSSLs is shown in Fig. 1. Hybrid F1 plants, namely, LYP9, were generated from a cross between maternal PA64s and paternal 9311. Then, F1 plants were backcrossed twice with 9311 to produce BC2F1 generation. The first MAS was performed for BC2F1 generation with 136 polymorphic markers distributed on all 12 chromosomes, and 65 plants were selected from 93 individuals, in which a few of heterozygous substituted segments came from PA64s, while the major genomic regions were homozygous for 9311 alleles. Subsequently, the selected 65 plants from BC2F1 generation were backcrossed twice again with 9311 to generate BC4F1. Twelve individuals from each BC3F1 and BC4F1 line were genotyped using the markers on the target-substituted regions. In total, 54 plants from BC4F1 that had one to three heterozygous substituted segments were self-crossed to produce BC4F2. Then, 12 individuals from each BC4F2 were genotyped, and the homozygous substituted individuals on target regions were self-crossed to obtain 54 CSSLs. The remaining BC4F1 plants that had a few substituted segments that were continuously backcrossed with 9311 to generate BC5F1 and BC6F1. Then, 20 BC5F2 plants and 7 BC6F2 plants that were selected using MAS were self-crossed to obtain 27 CSSLs. Finally, a total of 81 CSSLs with a few substituted segments were developed, which were named A set CSSLs for easy description (Fig. 1a). Similarly, the other two sets of CSSLs, including 55 lines and 61 lines, were obtained with the same technique using 162 and 138 polymorphic markers, respectively (Fig. 1b and c). Meanwhile, 55 CSSLs with major genomic regions from 9311 had a few substituted segments from Nipponbare, which were named B set CSSLs. Then, 64 CSSLs were named C set CSSLs and contained a few substituted segments from PA64s, in which the major genomic regions were homozygous for Nipponbare.

Fig. 1
figure 1

Flowchart of the procession of developing 3 sets of CSSLs. a Precession of developing A set CSSLs. b Precession of developing B set CSSLs. c Precession of developing C set CSSLs. The female parent is written in the front and the male parent in the back in hybrid combinations. The red arrow indicates that the genotypes of plants were identified by MAS. Numbers under the red arrow represent the selected individuals of lines by MAS. The green circle represents a self-cross

Characteristics of the CSSLs

To obtain accurate physical maps for the 3 sets of CSSLs, all of the CSSLs were subjected to high-throughput genotyping by whole-genome resequencing. Physical maps for the 3 sets of CSSLs were constructed using the physical locations of filtered SNPs (Fig. 2). The A set CSSLs containing 81 lines were found to carry 310 substituted segments, and there were at least 230 substituted segments in B set CSSLs and 367 substituted segments in C set CSSLs (Table 1). The average number of substituted segments was approximately 26, 19 and 31 per chromosome and ranged from 13 on chromosome 6 to 44 on chromosome 3, 10 on chromosome 6 to 30 on chromosome 3 and 5, and 24 on chromosome 5 to 38 on chromosome 4 for A, B and C sets of CSSLs, respectively. Statistical data for segment lengths showed that more than 61% segments were shorter than 2.00 Mb, while approximately 11% segments were longer than 7.00 Mb (Fig. 3). The total length of substituted segments was approximately 1152.61 Mb in A set CSSLs and was approximately 868.10 and 1177.96 Mb in B and C sets CSSLs, respectively (Table 1). The average length of substituted segments varied little, with an average of 3.72, 3.77 and 3.21 Mb in A, B and C sets of CSSLs, respectively. The total coverage length of substituted segments was 363.14 Mb in A set CSSLs, while that length was 325.70 and 331.40 Mb in B and C sets of CSSLs, respectively. The coverage rate of substituted segments was 97.3% in the A set CSSLs, which varied from 88.1% on chromosome 8 to 100.0% on chromosomes 2, 4, 5, 9 and 10. The coverage rate of substituted segments was nearly the same in B and C sets of CSSLs; it was 87.3% and 88.8%, respectively, and ranged from 67.3% to 100.0% on each chromosome.

Fig. 2
figure 2

Graphic genotypes of 3 sets of CSSLs. a Genotypes of A set CSSLs. b Genotypes of B set CSSLs. c Genotypes of C set CSSLs. Different colors represent different genotypes: red, 9311; blue, PA64s; green, Nipponbare. Each row represents a CSSL. Chr is short for chromosome

Table 1 Distribution of substituted segments on 12 chromosomes for 3 sets of CSSLs
Fig. 3
figure 3

Distribution of the lengths of the substituted segments in the 3 sets of CSSLs. A, B and C represent A set CSSLs, B set CSSLs and C set CSSLs, respectively

Phenotypic Variation

The phenotypic variations of 11 traits were observed between the 3 sets of CSSLs and their parents for QTL analysis. The t-test revealed that the differences between PA64s and Nipponbare were extremely significant concerning the panicle number (PN), primary panicle branch number (PPB), secondary panicle branch number (SPB), spikelet number per panicle (SN), grain length (GL), grain width (GW) and 1000-grain weight (TGW) in Shenzhen, as well as all traits but plant height (PH) and panicle length (PL) in Hangzhou. Both 9311 and Nipponbare showed significant differences in all traits but the flag leaf length (FLL) in Hangzhou and Shenzhen (Additional file 2: Table S1). The correlation of trait variation is shown in Additional file 2: Tables S2 to S7. The results show that extremely significant positive correlations were detected between the grain number and the secondary panicle branch number with the highest correlation coefficient, 0.8883 on average. In both Hangzhou and Shenzhen, 1000-grain weight showed an extremely significant positive correlation with grain length and grain width in the 3 sets of CSSLs, whereas it was negatively correlated with the spikelet number per panicle. Extremely significant positive correlations were also detected between plant height and panicle length in the 3 sets of CSSLs, except the B set CSSLs in Shenzhen. The phenotypic values of the 11 traits in the 3 sets of CSSLs were all found to be continuous and exhibited normal or skewed distribution patterns across both environments, indicating quantitative inheritance of the 11 characters and satisfying the demands for QTL analysis (Additional file 3: Figure S1).

QTL Analysis for 11 Traits Based on CSSLs

To conduct the QTL analysis, resequencing-based physical maps of the 3 sets of CSSLs were converted into bin maps. A total of 71 QTLs for 11 traits in 3 sets of CSSLs were detected in both Hangzhou and Shenzhen and were distributed on 11 chromosomes, except for chromosome 10 (Fig. 4 and Table 2). Among the 34 QTLs that were detected in Shenzhen and the 37 QTLs in Hangzhou, 35 QTLs were detected in A set CSSLs under both environments, as well as 13 and 23 QTLs in the B and C sets of CSSLs, respectively. Eight QTLs were commonly detected in both Shenzhen and Hangzhou for the same phenotype, and 4 QTLs had overlapping interval locations in different populations. There were also 8 QTL clusters that were detected, including FLL and PPB, SPB and SN, GL and TGW, and GW and TGW, which coincided with significant correlations (Table 2, Additional file 2: Tables S2 to S7).

Fig. 4
figure 4

The positions of QTLs located on each chromosome. Markers for CSSLs development are shown in left. HZ and SZ represent Hangzhou and Shenzhen, respectively

Table 2 QTLs for 11 traits detected in 3 sets of CSSLs in Shenzhen (SZ) and Hangzhou (HZ)

Compared with the previously identified QTLs by physical location, 22 of the 71 QTLs were reported in the previous studies, and the remaining 49 QTLs were unique. Among the 22 reported QTLs, the interval regions of the four QTLs overlapped with the locations of four known genes (Additional file 4: Figure S2). For the QTL with the largest effect on plant height in C set CSSLs, qPH1–2 can explain 26.3% of the phenotypical variation and was located in the region from 38.0 to 38.5 Mb on chromosome 1, which contained sd1, the rice green revolution gene that is responsible for rice semi-dwarfness (Sasaki et al. 2002). In the B set CSSLs, the Nipponbare alleles increased the flag leaf width at qFLW4 under two environments, which overlapped with the location of Nal1/ LSCHL4 that controlled leaf width and yield (Fujita et al. 2013; Qi et al. 2008; Zhang et al. 2014). For the secondary panicle branch number, qSPB1–2 was mapped to the region that ranged from 35.1 to 36.2 Mb on chromosome 1, while LAX1, a gene controlling panicle branch development, was previously cloned (Komatsu et al. 2003). Finally, the major QTL qGW5 with Nipponbare alleles that increased the grain width was detected in B and C set of CSSLs under two environments, while GW5/GSE5 was located in the region by previous studies (Duan et al. 2017; Liu et al. 2017).

Validation and Fine-Mapping of qGL1

qGL1 had a stable effect in B set CSSLs under both environment and was also detected in A set CSSLs in Hangzhou (Fig. 4 and Table 2). As CSSL-A8, CSSL-A29 and CSSL-B54 had overlapped regions on qGL1 location, the interval of qGL1 could be narrowed to the region of 2.1 Mb which ranged from 28.503 Mb to 30.611 Mb on chromosome 1 (Fig. 5a). To identify genes associated with grain length and further reduce the interval of qGL1, we performed GWAS by analyzing a natural rice population, consisting of 317 rice varieties that were collected from the 3000 Rice Genomes Project (Wang et al. 2018). Using the efficient Mixed-Model, 8 SNPs at qGL1 location on chromosome 1 were identified to be significantly associated with grain length. Then, we used pairwise linkage disequilibrium (LD) correlations to estimate a candidate region from 30.492 Mb to 30.652 Mb for qGL1 (Fig. 5b). Hence, both our GWAS and CSSLs results confirmed that the interval of qGL1 could be narrowed to a 119 kb region ranged from 30.492 Mb to 30.611 Mb.

Fig. 5
figure 5

Fine mapping of qGL1. a the interval of qGL1 detected in A set CSSLs and B set CSSLs. Different colors represent different genotypes: red, 9311; blue, PA64s; green, Nipponbare. Dashed lines indicate the overlap regions in different CSSLs. b Local Manhattan plot (top) and LD heatmap (bottom) surrounding the peak on chromosome 1. Dashed lines indicate the candidate region for the peak. c variation of grain length between 9311 and 3 CSSLs. HZ and SZ represent Hangzhou and Shenzhen, respectively. Error bars are s.d. * and ** indicate the least significant difference at 0.05 and 0.01 probability level compared with the recurrent parent in SZ or HZ, respectively. d Differential gene expression and annotation. arpkm is short for Reads Per Kilobase per Million mapped reads; binformation refers to its Arabidopsis homologs at The Arabidopsis Information Resource (TAIR)

There were 18 annotated genes within this 119 kb interval based on Rice Genome Annotation Project Website (http://rice.plantbiology. msu.edu/) (Additional file 2: Table S8). Among the 18 genes, 3 genes expressed different significantly between 9311 and PA64s in young panicle tissues based on RNA-seq data. The expression of LOC_Os01g53140 and LOC_Os01g53220 in 9311 were higher than that in PA64s, while the expression of LOC_Os01g53250 was significantly higher in PA64s (Fig. 5d). Moreover, sequence variations of those genes between 9311 and Nipponbare were identified and expressions at RNA level were also analyzed in young panicles between 9311 and CSSL-B54 (Additional file 6: Figure S4). Both 9311 and Nipponbare had amino acid variations of these three candidate genes, which were caused by SNPs on exons. Nipponbare and CSSL-B54 showed lower expression level on LOC_Os01g53140 and LOC_Os01g53220 compared with 9311, which was coincided with the RNA-seq data. Similarly, the expression of LOC_Os01g53250 was higher in Nipponbare and CSSL-B54 than that in 9311. LOC_Os01g53220 has previously been designated as OsHsfC1b, which is a heat shock transcription factor and plays a role in ABA-mediated salt stress tolerance in rice (Mittal et al. 2009; Schmidt et al. 2012). LOC_Os01g53250 product is a NADPH reductase whose homologous gene in Arabidopsis is ATR3. ATR3 is NADPH-dependent diflavin oxidoreductase 1, which is involved in embryo development ending in seed dormancy and oxidation-reduction process (Varadarajan et al. 2010). The product of LOC_Os01g53140 is an expressed protein. Its homologous protein in Arabidopsis is a neuronal acetylcholine receptor subunit alpha-5 which is subunit of mitochondrial respiratory chain complex I and is involved in photorespiration (Jennifer et al. 2010). Collectively, on the basis of our results, we suggest that qGL1 is located in a 119 kb region on chromosome 1 and LOC_Os01g53140 and LOC_Os01g53250 are the two most likely candidate genes for qGL1.

Discussion

CSSLs are the ideal population for detecting the precise QTLs that control complex traits by eliminating genetic background noise (Nadeau et al. 2000; Qiao et al. 2016; Tian et al. 2006; Ye et al. 2010; Zhou et al. 2017). On the other hand, because CSSLs contain a few substituted chromosome segments from donor parent, they can be used as NILs themselves or used to develop NILs with additional cross with the recurrent parent, which are fast and powerful tools for QTL validation and isolation (Ebitani et al. 2005; Mulsanti et al. 2018; Zhu et al. 2017). Under a certain condition, CSSLs are limited for QTL detection. Because serval CSSLs carry some long-substituted chromosome segments that can range between 10 and 20 Mb, they can hardly detect the QTLs that are close to each other with opposite effects. Here, we developed 3 sets of CSSLs using repeated elite parents, which may be much easier for QTL detection and mutual authentication.

Because there are different genetic backgrounds among lines or individuals, the epistatic effects between QTLs for many quantitative traits estimated by using conventional experimental materials are confounded with other genetic factors (Zhu et al. 2015a). Due to the single segment substitution, CSSLs may be ideal materials for verifying and researching the interaction between two QTLs in the additive effect and the epistatic effect by a cross between two related CSSLs. In rice, a number of QTLs and their epistasis on traits, such as plant height, heading date, and yield components, have been well analyzed by using CSSLs or secondary F2 populations that were derived from CSSLs (Chen et al. 2014; Kubo et al. 2008; Wang et al. 2017; Yang et al. 2018; Zhu et al. 2015a). On the other hand, CSSLs can also be used for QTL pyramiding in plant breeding. Gn1a can increase the grain number and improve rice yield simultaneously through inhibiting OsCKX2 gene expression in the inflorescence meristem, while the green revolution gene sd1 reduces bioactive gibberellin (GA) abundance, which results in semi-dwarfism to resistant to lodging. NIL-sd1 + Gn1a generated by crossing NIL-Gn1a and NIL-sd1 obtains an increased grain number but decreased plant height compared with Koshihikari, which provides a new strategy for improving crop yield through QTL pyramiding (Ashikari et al. 2005). The rational design rice which pyramided the good appearance genes GS3 and qSW5 from 9311 and the superior quality genes Wx and ALK from Nipponbare showed significantly improved grain appearance and quality compared with Teqing (Zeng et al. 2017). GW5/GSE5 is a major QTL that regulates grain width and weight predominantly by influencing cell proliferation in spikelet hulls (Duan et al. 2017; Liu et al. 2017). NAL1/LSCHL4, which regulates the development of the vascular bundle conferred leaf width, enhances the grain productivity of indica cultivars by increasing the number of secondary panicle branches and the spikelet number per panicle with declines in the seed-setting rate and in the 1000-grain weight (Fujita et al. 2013; Qi et al. 2008; Zhang et al. 2014). In this study, the confidence intervals of qFLW4 and qGW5 that were detected in the B set CSSLs overlapped with the locations of Nal1 and GW5, respectively (Table 2). The grain width performance of CSSL-B35 and CSSL-B36, which carried Nipponbare alleles segments at qGW5, showed significant differences compared with 9311 in across two environments, as well as the flag leaf width of CSSL-B10 at qFLW4 (Additional file 4: Figure S2 D and E). The increasing effects of Nipponbare alleles at both sites suggested a possibility of improving the 9311 yield by pyramiding the good appearance QTL qFLW4 and qGW5 from a cross between CSSL-B10 and CSSL-B35 or CSSL-B36.

PCR-based molecular markers are useful tools for selection during CSSLs development. However, they can hardly detect the small substituted segments or determine the length of substituted segments because the number of polymorphic markers used for selection is limited. Resequencing-based high-throughput genotyping technology can overcome these shortcomings by improving the resolution of the physical map accurately and reducing laborious and time-consuming work compared with the conventional molecular marker-based genotyping method (Watanabe et al. 2018; Xu et al. 2016; Zhang et al. 2011; Zhu et al. 2015b). As illustrated in Additional file 5: Figure S3, double-crossovers in CSSL-A35, CSSL-A44 and other CSSLs were detected based on throughput resequencing technology, while these double-crossovers can hardly detect by PCR-based molecular markers for the limited number of molecular markers. In this study, we developed 3 sets of CSSLs by molecular markers for selection and constructed high-density physical maps of the 3 sets of CSSLs by the whole-genome resequencing approach for high-throughput genotyping. A total of 197 lines were selected by 311 molecular markers for the development of the 3 sets of CSSLs. The coverage rates of substituted segments were 97.3%, 87.3% and 88.8% in the A, B and C sets of CSSLs respectively (Table 1). The number and length of substituted segments in 3 sets of CSSLs were also counted precisely using high-throughput genotyping data (Fig. 3), which provided a platform for genetic research and molecular rice breeding.

Due to significant phenotypic variations and the strong potential for heterosis between indica and japonica subspecies, many populations included some CSSLs that were derived from crosses between indica variety and japonica variety have been constructed for QTL mapping and gene cloning (Song et al. 2007; Wei et al. 2010; Zhou et al. 2016). To facilitate QTL detection and isolation, we developed 3 sets of CSSLs from three types of rice varieties: a typical japonica cultivar Nipponbare, an elite indica cultivar restore line 9311, and a middle type cultivar PA64s with a mixed genetic background of indica and japonica. To evaluate the potential advantages of the 3 sets of CSSLs for QTL detection, the phenotypic variations of 11 agronomic traits were observed between the 3 sets of CSSLs and their parents in Hangzhou and Shenzhen. Among the 71 QTLs detected in the 3 sets of CSSLs for 11 agronomic traits, 22 QTLs, including 4 known genes, were located in the vicinity of QTLs that affect these traits detected in the other populations, which suggested less epistatic effects of these QTLs in varied genetic backgrounds (Fig. 4 and Table 2). Because of the pleiotropic effects or the tight linkage of genes, multiple QTLs affecting related traits were mapped to the same region to form QTL clusters, such as qSPB12, qTGW12 and qSN12 on chromosome 12 in A set CSSLs and qFLL2, qPPB2–1 and qSN2 on chromosome 2 in B set CSSLs, which also coincided with significant correlations, thus indicating that the relationship among these quantitative traits was intricate.

Although CSSLs are powerful tools for QTL detection, the intervals of QTLs are often too large, which are determined by the length of substituted segments. GWAS based on large-scale resequencing provide a powerful platform for finding genetic variants that can be directly used for crop improvement (Yano et al. 2016). In this study, the interval of qGL1 was located to the region of 2.1 Mb by two CSSLs. In parallel, we also found 8 associated SNPs at qGL1 location through GWAS and estimated a candidate region from 30.492 Mb to 30.652 Mb for qGL1 by LD correlations. As the confidence interval resulted from GWAS is overlapped with the CSSLs location, the interval of qGL1 was narrowed to the region of 119 kb which ranged from 30.492 Mb to 30.611 Mb on chromosome 1. The expression profiles of the diverse varieties of the collection in a particular tissue can more accurately aid in the determination of candidate genes, and annotation of candidate gene in the local region can also provide evidences for gene function (Xuehui et al. 2012). In this study, 3 genes expressed different significantly in the candidate region in young panicle tissue between 9311 and CSSL-B54. Among the 3 differentially expressed genes, LOC_Os01g53140 and LOC_Os01g53250 are the two most likely candidate genes for qGL1, as they may be involved in embryo development and energy metabolism, respectively, according to homologous alignment and gene function annotation.

Conclusions

In this study, three sets of CSSLs were developed containing 81, 55 and 64 lines, respectively, which provided a platform for basic genetic research and offered ideal materials for molecular rice breeding. High-throughput whole-genome resequencing strategy made the physical maps for the 3 sets of CSSLs more accurate for precise information on location and length of substituted segments, which facilitated novel QTLs discovery. Combined with GWAS on grain length and expression profiles on young panicle tissues, qGL1 detected in CSSLs was fine mapped within a 119 kb region on chromosome 1 offering a new gene resource for improving grain shape.

Methods

Plant Materials

Nipponbare was the first map-based sequenced japonica cultivar. Restorer line 9311and photothermo-sensitive male sterile line PA64s are the parents of pioneer super hybrid rice LYP9, which has been widely cultivated for commercial production in China. MAS was used for the development of three sets of CSSLs during crossing and backcrossing. Three parents and the populations derived from the three parents were grown annually at the experimental farm of China National Rice Research Institute in Hangzhou (119°54′ E, 30°04′ N) and Lingshui (110°00′ E, 18°31′ N).

DNA Extraction and Molecular Analysis

CTAB method was used for the extraction of genomic DNA from fresh young leaves of each individual (Rogers and Bendich 1988). The DNA amplification was performed by PCR (Polymerase Chain Reaction) with the protocol as follows: predenaturation at 94 °C for 4 min; 38 cycles of denaturation at 94 °C for 30 s, annealing at 55–60 °C for 30 s and extension at 72 °C for 30 s, with a final extension at 72 °C for 5 min. The reactions were carried out in 96-well PCR plates in 10 μl volumes containing 50–100 ng template DNA, 0.2 μmol/L of each primer and 5 μl of 2× Taq Master Mix (Vazyme Inc.). Electrophoresis of the amplification products were carried out on 4% agarose gels and photographed using GelDoc XR Gel Documentation System (Bio-Rad Inc.).

Most polymorphic molecular markers used for MAS were SSR markers that were selected from the Gramene Markers Database (McCouch et al. 2002; Project IRGS 2005). The other markers used were InDels markers developed from a BLASTN alignment between the genome sequence of three parents. Primers were designed on Primer3 web (http://primer3.ut.ee/).

High-Throughput Genotyping by Whole-Genome Resequencing

At least 1 μg of genomic DNA from each sample of CSSLs was used for resequencing with HiSeq PE150 strategy on Illumina HiSeq Xten. All DNA samples had backups for verification. Approximately 10× coverage sequence reads were generated for each CSSL. The O. sativa cv. Nipponbare IRGSP 1.0 reference genome was used for alignment (Kawahara et al. 2013). SNP calling was carried out with SAMtools (Li et al. 2009). A hidden Markov model was used to impute genotypes of recombinational chromosome fragments based on the observed genotypes of SNPs, following the previous method (Xie et al. 2010). The recombinational fragments less than 300 kb were identified as missing data in our study, and a crossover was defined between two adjacent blocks with different genotypes. An interval between two adjacent crossovers in the whole CSSL population was regarded as a bin (Zhou et al. 2015).

Phenotypic Assessment for CSSLs

The 3 sets of CSSLs and their parents were grown for phenotypic assessment at the experimental farm in Hangzhou (119°54′ E, 30°04′ N) and Shenzhen (114°03′ E, 22°32′ N) in 2017. Twenty-five-day-old seedlings of each line were transplanted into an eight-row plot with six plants per row and spacing of 20 × 20 cm. The field management followed normal agricultural practice. The measurements of the plant height, panicle number, flag leaf length and flag leaf width were performed directly in the field at 25 days after heading. For all those traits, five plants of each line were sampled from the middle of each plot, and the main culm of each plant was chosen for trait measurement. At maturity, the main panicles of the individuals were harvested from 5 plants in the middle of each plot to measure the panicle and grain-related traits in the laboratory. For the five main panicles of each line, panicle length was measured with rulers; the primary panicle branch number, secondary panicle branch number and spikelet number per panicle were counted manually. Fully filled grains per panicle were used to calculate the grain length, grain width and 1000-grain weight using the WinSEEDLE Analysis System (Regent Instruments Inc.). Basic statistical analysis and correlation analysis were conducted with SAS 8.0 (SAS Institute). Five replications were performed for each trait.

QTL Analysis for 11 Major Agronomic Traits Based on CSSLs

Bin maps were converted from the physical maps of 3 sets of CSSLs for QTL analysis. Most donor segments between or among the different lines of each CSSL have a little overlap. To perform QTL analysis easily and precisely, the overlapping chromosome segments of the CSSLs were utilized to delineate smaller segment sizes that were described as bins (Huang et al. 2009b). The QTL analysis was performed from these bins that served as genetic markers using QTL IciMapping V4.0 (www.isbreeding.net/software/). The likelihood ratio test based on stepwise regression for the additive QTL (RSTEP-LRT-ADD) method was employed for power analysis. The mapping parameters of probability in stepwise regression was set at 0.001, and the multicollinearity control by condition number was 1000 as the default. The LOD threshold for each dataset was set based on a permutation test (1000 permutations, P = 0.05). When the QTL LOD score was larger than 2.5, the QTL was designated having a major effect.

Genome-Wide Association Study and Linkage Disequilibrium

A total of 317 varieties are selected from 3010 diverse accessions of Asian cultivated rice and the SNPs information was acquired in public database (Wang et al. 2018). GWAS for the grain length were carried out using Efficient Mixed-Model Association eXpedited (EMMAX) software package (Hyun Min et al. 2010). The LD between SNPs in the 317 varieties was evaluated using squared Pearson’s correlation coefficient (r2) as calculated with the -r2 command in the software PLINK. The LD heatmap surrounding peaks in the GWAS was constructed using LDheatmap R package (Shin et al. 2006).

RNA-Seq Analysis

We extracted total RNA from the 2–5 cm long young panicle tissues of rice plants with an RNA extraction kit (Trizol reagent, Invitrogen). The libraries were sequenced on a HiSeq. PE150 paired-end reads were generated. The sequencing reads were aligned to the Nipponbare (Oryza sativa L. spp. japonica) reference genome (http://rice.plantbiology.msu.edu, TIGR Version 7). A gene was considered expressed if the reads per kilobase of transcript model per million mapped reads was > 0.4. Differential gene expression between sample groups was analyzed using Cufflinks software for each gene. Genes were considered differentially expressed if the |log2(fold change)| ≥1 and FDR (false discovery rate) < 0.05.

Quantitative Real-Time RT-PCR Analysis

Total RNA was isolated from the 2 cm long young panicle tissues in Shenzhen with an RNA extraction kit (Trizol reagent, Invitrogen). cDNA was synthesized using a Rever Tra Ace® qPCR-RT kit (TOYOBA). Real time PCR amplification mixtures (10 μl) contained 50 ng template cDNA, 2 × SYBR Green PCR Master Mix (invitrogen), and 200 nM forward and reverse primers. Reactions were run on a StepOnePlus™ Real-Time PCR System (Thermo Fisher Scientific). The relative expression level of each transcript was obtained by comparing to the expression of the OsActin1 gene. Primers for candidate genes and OsActin1 are listed in Additional file 2: Table S9.