Background

As one of the world's major economic crops, cotton plays an important role in society. Fiber is the main product of cotton, providing raw materials for the textile industry [1]. In addition to lint (‘fiber’), cottonseed comprises kernel, hull and fuzz. Cottonseed kernels are regarded as the best source of vegetable protein after soybean, and cotton ranks as the fifth most important oil crop after soybean, palm, canola and sunflower [2, 3]. Fiber yield and fiber quality, as well as cottonseed quality traits including seed index, oil percentage and protein percentage, are quantitative traits. Previous studies have reported correlations among yield, fiber quality and cottonseed quality traits. Pahlavani et al. [4] reported that oil content was largely affected by seed size. Kothari et al. [5] reported positive relationships of seed oil with fiber strength, uniformity index and fiber length. Positive correlations were found between seed protein and several agronomic traits, whereas negative correlations were found between oil and lint yield along with other agronomic traits. Moreover, cottonseed oil content was also negatively related to seed protein content [6].

The much higher value of cotton fiber made it the primary objective of cotton breeding in the past, which resulted in less attention to cottonseed quality, including oil and protein contents [7]. A recent survey suggested that approximately 5000 QTL had been identified in cotton [8], including QTL related to cottonseed quality [7, 9,10,11,12,13,14]. In addition, QTL associated with seed index and oil content have been identified through GWAS, enabled by the development of sequencing technology and the release of cotton reference genomes [15,16,17,18]. However, few stable QTL have been identified for further study.

In past years, simple sequence repeat (SSR) markers were used to construct many genetic maps in crop research. However, the low polymorphism rate of SSR markers in cotton made it difficult to construct a saturated genetic map, which limited the application of genetic maps in DNA marker-assisted selection (MAS). Owing to their abundance across the whole genome, single nucleotide polymorphism (SNP) markers have become popular in genetic map construction and MAS in recent years [19, 20]. With the rapid development and application of next-generation sequencing (NGS) technologies, many complexity reduction approaches have been developed to identify SNPs, such as restriction site-associated DNA sequencing (RAD-Seq) [21], specific locus amplified fragment sequencing (SLAF-seq) [22], and genotyping-by-sequencing (GBS) [23]. Compared with other sequencing technologies, SLAF-seq has several merits: 1) no reference genome sequence or polymorphism information is required; 2) repetitive sequences can be avoided; and 3) a balance between marker density and population size can be maintained by varying the fragment size [22]. In addition, the release of genome sequences of G. raimondii, G. arboreum, G. hirsutum and G. barbadense has facilitated the application of NGS technology in cotton research [15,16,17,18]. Recently, SLAF-seq has been applied to genetic map construction, QTL identification and variation analysis in cotton [22, 24]. For example, Ali et al. [22] constructed a high-density genetic map containing 6254 SNP markers, which covered 3141.72 cM, and identified 95 QTL for fiber quality traits. Shen et al. [24] identified 132,880 SNPs and 6296 InDels between the reference genome (TM-1) and five tetraploid cotton species, including G. hirsutum cv. Emian22, G. barbadense acc. 3–79, G. tomentosum, G. mustelinum and G. darwinii. Zhang et al. [25] constructed a genetic map including 5521 high-quality SNP markers by SLAF-seq and detected 18 QTL associated with boll weight.

In this study, a recombinant inbred line (RIL) population of 180 lines was developed from a cross between two upland cotton cultivars/lines, Yumian 1 and M11. SLAF-seq was then applied to genotype the RILs. The present study aimed to construct a high-density genetic map and to identify QTL for seed index, cottonseed oil content and protein content in upland cotton. The results will facilitate future molecular breeding programs to better exploit the full economic potential of cotton.

Results

Phenotypic performance

Descriptive statistics for all traits across three environments are shown in Additional file 5: Figure S1 and Additional file 1: Table S1. Both the skewness and kurtosis values of the six traits, namely hundred seed weight (HSW, g), hundred kernel weight (HKW, g), ten kernel length (TKL, mm), ten kernel width (TKW, mm), oil content (KOC) and protein content (KPC), were < 1.0 in all three environments, indicating that none of the traits deviated significantly from a normal distribution. Subsequently, correlation analysis was conducted separately for each of the three environments (Additional file 2: Table S2). All traits showed significant correlations with other traits except kernel length, which had normal correlations with KOC and KPC. In addition, KPC showed significant negative correlations with the other traits. The variation among genotypes and environments was highly significant for all tested traits, which indicated the influence of each of these factors on cottonseed growth (Additional file 3: Table S3).
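As a minimal sketch of the normality screen described above, the following Python snippet computes skewness and excess kurtosis and checks whether both stay below 1.0 in absolute value. The trait values are invented for illustration and are not data from this study. The moment-based estimators and the use of absolute values are assumptions; SPSS reports bias-corrected statistics that differ slightly for small samples.

```python
def _central_moment(values, order):
    """Central moment of the given order (divides by n, not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal curve)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0

def roughly_normal(values, limit=1.0):
    """True when both |skewness| and |excess kurtosis| fall below the limit."""
    return abs(skewness(values)) < limit and abs(excess_kurtosis(values)) < limit

# Hypothetical hundred seed weights (g) for a handful of lines
hsw = [9.8, 10.1, 10.4, 10.6, 10.9, 11.0, 11.2, 11.5, 11.9, 12.3]
print(f"skewness={skewness(hsw):.3f}, excess kurtosis={excess_kurtosis(hsw):.3f}")
print("within the < 1.0 screen:", roughly_normal(hsw))
```

In practice the same check would be run per trait and per environment, which is how the statistics are summarized in the text.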

SLAF-seq data analysis and SNP marker development

Restriction fragments ranging from 314 bp to 344 bp were selected for further analysis. These fragments were distributed approximately evenly over the genome (Additional file 6: Figure S2). After sequencing, a total of 452.32 M paired-end reads were generated for the two parents and the RILs; 93.81% of the bases were of high quality (Q30, indicating a 0.1% chance of an error), and the average GC content was 38.47%. In total, 60,718,390 reads (26,086,993 for Yumian 1, 29,906,736 for M11 and 4,724,661 for RILs) were obtained. Among these clean reads, the percentages of reads anchored on the reference genome for Yumian 1, M11 and 86 RILs were 99.48, 99.54 and 99.52%, respectively. The percentages of reads properly mapped on the reference genome for Yumian 1, M11 and the RILs were 91.37, 89.8 and 93.62%, respectively (Additional file 4: Table S4). The SLAF number for Yumian 1 was 709,329, with an average sequencing depth of 33.14-fold. For M11, the SLAF number was 718,771, with an average sequencing depth of 38.15-fold. For the RILs, 396,418 SLAFs were obtained, with an average depth of 12.08-fold (Additional file 4: Table S4). Among these SLAFs, 316,514 SNPs were identified, and 36,161 (11.42%) of these SNPs were polymorphic in the RIL population. Because a RIL population segregates for the two homozygous parental genotypes, only aa × bb polymorphisms were used for further analysis; this type comprised 21,632 SNPs. After multiple filtering steps, 7033 SNPs with an average sequencing depth of 19.09-fold were used to construct the genetic map.
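The quality and polymorphism figures quoted above follow from simple arithmetic, sketched here in Python. The Phred relation P = 10^(-Q/10) is the standard definition of a quality score; the function and variable names are illustrative only and do not come from the sequencing pipeline.

```python
def phred_to_error_prob(q):
    """Base-calling error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole

# Q30 corresponds to a 1-in-1000 (0.1%) chance of a wrong base call
q30_error_pct = 100.0 * phred_to_error_prob(30)

# Read counts and SNP counts as reported in the text
reads = {"Yumian 1": 26_086_993, "M11": 29_906_736, "RILs": 4_724_661}
total_reads = sum(reads.values())
polymorphic_pct = percent(36_161, 316_514)

print(f"Q30 error rate: {q30_error_pct:.1f}%")
print(f"Total reads: {total_reads:,}")
print(f"Polymorphic SNPs: {polymorphic_pct:.2f}%")
```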

Genetic map construction

By genetic linkage analysis, a total of 7033 loci were mapped on 26 chromosomes, covering 3353.15 cM with an average distance of 0.48 cM between consecutive markers. Among the 7033 loci, the At subgenome contained 4295 loci spanning 1701.91 cM, with an average of 0.40 cM between adjacent markers, whereas the Dt subgenome included 2738 loci spanning 1651.24 cM, with an average of 0.60 cM between adjacent markers. Chromosome A13 contained the most loci (703), followed by A01 (644) and A02 (613), whereas D06 contained the fewest (109); the average was 270 loci per chromosome. The longest chromosome was D05 (229.75 cM) and the shortest was D04 (83.06 cM), with an average chromosome length of 128.97 cM (Fig. 1; Table 1; Additional file 7: Supplement 1).
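The summary statistics above can be reproduced from the subgenome totals alone, as the following Python sketch shows. It assumes that the average marker interval is computed as map length divided by the number of loci, which is the convention that matches the reported values; the dictionary layout and names are illustrative.

```python
# Loci counts and map lengths for the two subgenomes, as reported in the text
subgenomes = {
    "At": {"loci": 4295, "length_cM": 1701.91},
    "Dt": {"loci": 2738, "length_cM": 1651.24},
}
N_CHROMOSOMES = 26

def mean_interval(length_cM, loci):
    """Average spacing in cM, taken here as map length divided by marker count."""
    return length_cM / loci

total_loci = sum(s["loci"] for s in subgenomes.values())
total_length = sum(s["length_cM"] for s in subgenomes.values())

for name, s in subgenomes.items():
    spacing = mean_interval(s["length_cM"], s["loci"])
    print(f"{name}: {s['loci']} loci, {spacing:.2f} cM between markers")

print(f"Whole map: {total_loci} loci over {total_length:.2f} cM, "
      f"{mean_interval(total_length, total_loci):.2f} cM between markers")
print(f"Mean chromosome length: {total_length / N_CHROMOSOMES:.2f} cM")
```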

Table 1 Characteristics of the genetic map

In addition, 377 (5.36%) of the 7033 mapped SNPs showed segregation distortion (P < 0.05). Of these, 126 (33.42%) were located on the At subgenome and 251 (66.58%) on the Dt subgenome (Table 1). No distorted markers were found on chromosomes A07, A08, D03, D04, D07, D08, D11 and D12. Chromosome D06 had the largest number of distorted loci (82) (Fig. 1).

Fig. 1 Genetic maps and QTL for cottonseed quality in the Yumian 1 × M11 RIL population

QTL mapping of seed size, oil and protein content

Based on the high-density genetic map and genotype and trait data, a total of 58 QTL, including 12 for HSW, eight for HKW, six for TKL, six for TKW, 13 for KOC and 13 for KPC, were identified (Table 2; Fig. 1). These QTL explained 10.5–56.9% of the total phenotypic variance with LOD values ranging from 2.0 to 14.5. Among these QTL, 31 were on the At subgenome and 29 on the Dt (Table 2; Fig. 1). Twenty-two QTL had positive additive effects derived from Yumian 1, the others deriving from M11 (Table 2).

Table 2 QTL for cottonseed traits identified across three environments

For HSW, 12 QTL were detected on eight chromosomes with LOD scores ranging from 2.03 to 9.08 and PVE ranging from 10.5 to 40% (Table 2; Fig. 1). The favorable alleles of eight QTL (qHSW-A09.1, qHSW-A11.1, qHSW-D02.1, qHSW-D02.2, qHSW-D03.1, qHSW-D03.2, qHSW-D05.2, qHSW-D09.2 and qHSW-D13.1) came from M11, and four (qHSW-A01.1, qHSW-A01.2, qHSW-D05.1 and qHSW-D09.1) came from Yumian 1. Two QTL (qHSW-A11.1 and qHSW-D03.1) were detected in all three environments and five (qHSW-A01.1, qHSW-A01.2, qHSW-D09.1, qHSW-D09.2 and qHSW-D13.1) in two environments.

Eight QTL for HKW were detected on six chromosomes, with LOD scores ranging from 2.05 to 13.47 (Table 2; Fig. 1). Among these QTL, five favorable QTL alleles increasing hundred kernel weight came from M11, whereas three originated from Yumian 1. Four QTL (qHKW-A01.1, qHKW-A01.2, qHKW-A11.1 and qHKW-D03.1) were detected across three environments, with PVE values of 18.3, 15.6, 15.4 and 53.5%, respectively.

Six QTL for ten kernel length were identified on chromosomes A05, A11, A12, D02, D08 and D13. The PVE of these QTL ranged from 11.2 to 20.8%. Among these QTL, two favorable alleles were contributed by Yumian 1 and the rest came from M11. However, only two QTL (qTKL-A05.1 and qTKL-D02.1) were identified in three environments.

Six QTL for TKW were detected on six chromosomes (Table 2; Fig. 1), with two on D03. The PVE values for these QTL ranged from 10.7 to 56.9%. Favorable alleles for four QTL (qTKW-A11.1, qTKW-D01.1, qTKW-D03.1 and qTKW-D09.1) derived from M11, while two (qTKW-A01.1 and qTKW-A08.1) were from Yumian 1. Three QTL (qTKW-A01.1, qTKW-A11.1 and qTKW-D03.1) were detected in all three environments.

Thirteen QTL were detected for KOC on ten chromosomes, with PVE ranging from 10.5 to 48.2% and LOD scores ranging from 2.0 to 11.6 (Table 2; Fig. 1). Among them, favorable alleles for nine QTL (qKOC-A03.1, qKOC-A09.1, qKOC-A11.1, qKOC-A13.1, qKOC-A13.2, qKOC-D02.1, qKOC-D03.1, qKOC-D05.1 and qKOC-D09.1) were contributed by M11, and the other four (qKOC-A01.1, qKOC-A01.2, qKOC-A01.3 and qKOC-A08.1) came from Yumian 1. Two QTL (qKOC-A01.2 and qKOC-D09.1) were identified in two environments and one QTL (qKOC-D03.1) in all three environments.

Thirteen QTL for KPC were mapped on eight chromosomes, explaining 10.7–49.1% of the phenotypic variance (Table 2; Fig. 1). Chromosomes A01, A08 and D02 contained four, two and two QTL in different regions, respectively. Among these QTL, seven favorable alleles increasing the trait value came from Yumian 1, whereas the rest were from M11. One QTL (qKPC-D03.1) was detected in all three environments.

QTL hotspots/clusters

In this study, we found seven QTL clusters distributed on six chromosomes, including three on the At subgenome and four on the Dt subgenome (Table 3). Every QTL cluster contained at least three QTL for different traits. A01-cluster-1 had the highest number of QTL (seven QTL for HKW, HSW, TKW, KOC and KPC). D03-cluster-1, A11-cluster-1 and A01-cluster-1 contained five, three and two stable QTL, respectively. These QTL clusters could be priorities for further application (Table 3).

Table 3 QTL clusters for cottonseed traits identified in the Yumian 1 × M11 RIL population

Discussion

Correlation between seed size, oil and protein content

After measuring seed size, oil and protein content, we analyzed the correlation among these traits. Beyond the significant correlation between seed weight (HSW, same as seed index, and HKW) and oil and protein content (KOC and KPC), as described by Pahlavani et al. [4], we found that kernel shape (TKL and TKW) was significantly correlated with seed weight (HSW and HKW) and with oil and protein content (KOC and KPC). TKW was more closely correlated with KOC and KPC than TKL was (Additional file 2: Table S2). Approximately 80% of the dry weight of the cottonseed kernel consists of storage lipid and protein, and cotyledon tissue accounts for 60% of the cottonseed kernel [26]. Owing to the physical shape of cotyledons, their influence on kernel width may be larger than on kernel length. In addition, oil and storage protein accumulate rapidly during the embryo maturation stage, at 25–45 days post anthesis (DPA), as the size and weight of the cotyledons increase [27, 28]. This growth trajectory may explain why TKW was more closely correlated with KOC and KPC than TKL was. Further study is needed to understand the correlation between kernel shape and other traits.

The direction of favorable QTL alleles

The favorable alleles for a trait do not necessarily come from the more favorable parent. For instance, Liu et al. [14] identified 14 QTL for seed index, with five favorable alleles coming from Yumian 1 and the remainder from CCRI35. Zhang et al. [22] detected 16 stable QTL for boll weight, including 8 whose favorable alleles came from the maternal parent and 8 from the paternal parent. Among 60 QTL detected in the present study, 22 favorable alleles came from Yumian 1 with the rest from M11 (Table 2). This result, combined with previous reports, indicates that both the superior and the inferior parent can contribute QTL alleles that increase the trait value, contributing to transgressive segregation in progeny populations.

Stable and common QTL

Stable and major QTL for yield and quality are important for molecular breeding. It is well known that quantitative traits are controlled by multiple genes and affected by the environment [14]. In the present study, variance analysis also suggested a significant influence of environment on the development of cottonseed. Hence, this study considered QTL identified in all test environments as stable QTL. Thirteen stable QTL were detected, most of which were within QTL clusters/hotspots (Table 2, Table 3). These stable QTL deserve priority for further research, including fine mapping, candidate gene identification and molecular mechanism analysis of cottonseed development. Moreover, these stable QTL have the potential to improve cottonseed quality through MAS.

To date, QTL or SNPs associated with seed index have been identified by traditional QTL mapping methods or GWAS [8, 17, 29]. We compared the stable QTL detected in this study with QTL identified in previous studies through the physical position of the nearest marker(s). Two stable QTL had been previously reported, while 11 (qHKW-A01.1, qHKW-A01.2, qHKW-A11.1, qHKW-D03.1, qTKL-A05.1, qTKL-D02.1, qTKW-A01.1, qTKW-A11.1, qTKW-D03.1, qKOC-D03.1 and qKPC-D03.1) were newly found. Identifying candidate genes controlling these new QTL for kernel length and kernel width will accelerate research into the mechanism of cottonseed growth. These common QTL and novel stable QTL would be priorities for MAS to improve cottonseed quality by transferring favorable alleles into cotton cultivars.

Methods

Population construction

A RIL population of 180 lines was developed from a cross between Yumian 1, a high fiber quality cultivar bred through a multiple-line intermating program [30], and M11, a high-oil line provided by Dr. Du from Cotton Research Institute. The parents were crossed at Southwest University, Chongqing, China, in the summer of 2010. The F1 seeds were planted in Hainan, China, in the winter of the same year. In the summer of 2011, 180 F2 plants were randomly selected. Thereafter, single-seed descent was carried out from F2:3 to F2:8. The RIL population was formed in the summer of 2015. All RILs, along with the two parents, were planted in Chongqing, China, in the summer of 2016; in Hainan, China, in the winter of 2016; and in Anyang, China, in the summer of 2017.

Phenotypic data analysis

All naturally opened bolls were hand-harvested. After ginning and drying, one hundred seeds were selected randomly and weighed to determine seed index (HSW, g). After hulling, the cottonseed kernels were first used to measure hundred kernel weight (HKW, g), ten kernel length (TKL, mm) and ten kernel width (TKW, mm). The kernels were then ground into powder to determine oil (KOC) and protein (KPC) content with a Fourier transform near-infrared spectrometer (NIRFlex® N-500). The frequency distributions of and correlation coefficients among these traits were analyzed with SPSS version 20.0 (SPSS, Chicago, IL, USA), and the phenotypic trends and the relationships among these traits were illustrated in box plots drawn with Plotly 2.0 (https://plot.ly).

DNA preparation, SLAF-library construction, and high throughput sequencing

Total genomic DNA was extracted from fresh young leaves of the two parents and 86 RILs according to a modified CTAB method described by Zhang et al. [31]. The SLAF-seq strategy for library construction followed Shen et al. [24] with some modifications. The cotton reference genome used in this study was released by Zhang et al. [16]. A pilot experiment was carried out to determine the enzymes for library construction and the size of the restriction fragments for SLAFs. Clean DNA was digested into fragments with the enzyme combination RsaI+HaeIII (NEB, Ipswich, MA, USA). After a series of treatments of these restriction fragments, high-throughput sequencing was performed using an Illumina HiSeq 2500 (Illumina, Inc., San Diego, CA, USA) at the Biomarker Technologies Corporation in Beijing. Subsequently, the sequencing results were evaluated.

Sequencing data grouping and genotyping

SLAF identification and genotyping were based on procedures described by Sun et al. [32] and Shen et al. [24]. Initially, low-quality reads (quality score < 20e) were filtered out and the remaining reads were assigned to the progenies according to the duplex barcode sequences. Then, 5 bp terminal sites were trimmed to yield high-quality reads. The G. hirsutum reference genome was retrieved from Phytozome (https://phytozome.jgi.doe.gov/Ghirsutum_er). Clean reads were mapped to the reference genome using Burrows-Wheeler Aligner (BWA) software [33]. Sequences were defined as one SLAF marker if they mapped to the same position with over 95% identity [16]. Subsequently, GATK software and Samtools/bcftools were used to detect SNPs between the parents [34,35,36]. SNPs of low quality were filtered out based on the following criteria: a) minimum read depth less than 10; b) average base quality less than 30; c) SNPs anchored on different positions in individual RILs; and d) SNPs with more than 40% missing data in the RILs [24].
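The four filtering criteria above can be expressed as a single predicate. The Python sketch below is illustrative only: the `SnpRecord` class and all of its field names are hypothetical and do not correspond to GATK or Samtools output fields, and the example records are invented.

```python
from dataclasses import dataclass

@dataclass
class SnpRecord:
    # Hypothetical summary of one candidate SNP; field names are assumptions.
    snp_id: str
    min_read_depth: int        # lowest read depth supporting the call
    mean_base_quality: float   # average Phred base quality at the site
    consistent_position: bool  # False if RILs anchor the SNP at different positions
    missing_fraction: float    # fraction of RILs without a genotype call

def passes_filters(snp, min_depth=10, min_quality=30.0, max_missing=0.40):
    """Return True if the SNP survives criteria a) to d) listed in the text."""
    if snp.min_read_depth < min_depth:        # a) read depth below 10
        return False
    if snp.mean_base_quality < min_quality:   # b) base quality below 30
        return False
    if not snp.consistent_position:           # c) inconsistent anchoring among RILs
        return False
    if snp.missing_fraction > max_missing:    # d) more than 40% missing data
        return False
    return True

# Invented records, one failing each criterion and one passing all of them
candidates = [
    SnpRecord("snp_ok", 15, 35.2, True, 0.10),
    SnpRecord("snp_low_depth", 6, 36.0, True, 0.05),
    SnpRecord("snp_low_quality", 20, 24.5, True, 0.05),
    SnpRecord("snp_moved", 18, 33.0, False, 0.05),
    SnpRecord("snp_sparse", 22, 34.1, True, 0.55),
]
kept = [s.snp_id for s in candidates if passes_filters(s)]
print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to see how the number of retained markers responds to stricter or looser settings.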

Map construction and segregation distortion analysis

HighMap was used to order the SLAF markers, correct genotyping errors within the chromosomes and calculate the genetic distance between adjacent markers. In addition, SMOOTH was applied to correct errors based on the parental contribution of genotypes, and a k-nearest neighbor algorithm was used to impute missing genotypes as described by Zhang et al. [22]. Chi-squared tests were employed to test loci for deviation from the expected 1:1 segregation ratio (P < 0.05).
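A chi-squared test against a 1:1 ratio has one degree of freedom, so its p-value can be computed in closed form with the complementary error function. The Python sketch below illustrates the test for a single locus; the genotype counts are hypothetical, and this is not the implementation used by HighMap.

```python
import math

def segregation_chi2(n_aa, n_bb):
    """Chi-squared statistic (1 df) for deviation from a 1:1 aa:bb ratio."""
    expected = (n_aa + n_bb) / 2
    return ((n_aa - expected) ** 2 + (n_bb - expected) ** 2) / expected

def chi2_pvalue_1df(chi2):
    """Upper-tail p-value of a chi-squared variable with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def is_distorted(n_aa, n_bb, alpha=0.05):
    """True if the locus deviates from 1:1 at the given significance level."""
    return chi2_pvalue_1df(segregation_chi2(n_aa, n_bb)) < alpha

# Hypothetical genotype counts (aa, bb) at three loci scored in 180 lines
for n_aa, n_bb in [(90, 90), (100, 80), (110, 70)]:
    chi2 = segregation_chi2(n_aa, n_bb)
    p = chi2_pvalue_1df(chi2)
    print(f"{n_aa}:{n_bb}  chi2={chi2:.2f}  p={p:.4f}  distorted={is_distorted(n_aa, n_bb)}")
```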

QTL analysis

The QTL influencing cottonseed size, oil and protein content were identified with MapQTL 6.0 [37], using multiple QTL mapping. A logarithm of odds (LOD) threshold of ≥ 2.0 was used to declare suggestive QTL, as suggested by Lander and Kruglyak [38]. Positive additive effects indicated favorable alleles derived from Yumian 1, while negative additive effects indicated favorable alleles from M11. The QTL nomenclature was designated as: q + trait abbreviation + chromosome number + QTL number. QTL identified in three environments were considered stable.
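The naming rule and the two decision thresholds above can be sketched in a few lines of Python. The separator layout (a hyphen before the chromosome and a dot before the QTL number) is inferred from names used in this paper such as qHSW-A01.1; the function names and the regular expression are illustrative assumptions.

```python
import re

LOD_THRESHOLD = 2.0       # suggestive QTL threshold stated in the text
STABLE_ENVIRONMENTS = 3   # QTL detected in all three environments are stable

_QTL_PATTERN = re.compile(r"^q([A-Z]+)-([AD]\d{2})\.(\d+)$")

def qtl_name(trait, chromosome, index):
    """Compose a name such as qHSW-A01.1 from trait, chromosome and QTL number."""
    return f"q{trait}-{chromosome}.{index}"

def parse_qtl_name(name):
    """Split a QTL name into (trait, chromosome, index); raise ValueError if malformed."""
    match = _QTL_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid QTL name: {name!r}")
    trait, chromosome, index = match.groups()
    return trait, chromosome, int(index)

def is_suggestive(lod):
    """True if the LOD score reaches the suggestive threshold."""
    return lod >= LOD_THRESHOLD

def is_stable(n_environments_detected):
    """True if the QTL was detected in every test environment."""
    return n_environments_detected >= STABLE_ENVIRONMENTS

print(qtl_name("HSW", "A01", 1))
print(parse_qtl_name("qKOC-D03.1"))
```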