Background

Cotton represents the main source of natural textile fibers in the world and is the most prevalent raw material used in the textile industry (Wang et al. 2018). High yield and fine fiber quality are prerequisites for meeting the ever-increasing demand. Upland cotton (Gossypium hirsutum) accounts for more than 90% of global cotton production because of its high yield potential and broad adaptability, but it has moderate fiber quality, whereas G. barbadense produces exceptionally fine-quality fibers, but with lower fiber yield (Cai et al. 2014; Hu et al. 2019).

Most agronomic and economic traits, such as yield and fiber quality, are quantitative traits that are controlled by multiple loci/genes. Moreover, environmental influence is substantial in the control and expression of these traits. Significant negative correlations between fiber quality traits and yield traits have been reported (Wang et al. 2015; Liu et al. 2018; Zhang et al. 2020). Dissecting the genetic basis of yield and fiber quality is essential for simultaneous improvement of yield and fiber quality.

Molecular genetic methods, especially molecular markers, have been applied widely in cotton in the last decade. Recently, the development of molecular markers was accelerated by the release of assembled genome sequences of G. hirsutum (Li et al. 2015; Zhang et al. 2015; Wang et al. 2018; Yang et al. 2019) and G. barbadense (Liu et al. 2015; Yuan et al. 2015). Numerous genetic linkage maps, including intraspecific maps of G. hirsutum and interspecific maps between G. hirsutum and G. barbadense, have been constructed using restriction fragment length polymorphisms (RFLPs), simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). Thousands of quantitative trait loci (QTLs) for yield and fiber quality in cotton are documented in CottonQTLdb (Release 2.3, Said et al. 2013; Said et al. 2015). However, to date, few studies have simultaneously dissected the genetic basis underlying complex traits and their genetic correlations in multiple upland cotton populations by QTL mapping.

In the present study, we used four F2 populations derived from the hybridization between two G. hirsutum normal lines (4133B and SGK9708) and two introgression lines (Suyuan04–3 and J02–247). Four corresponding genetic linkage maps were constructed using SSR markers. QTL mapping was implemented with the integration of the genotypic and phenotypic data of eight agronomic and economic traits, including yield and fiber quality. Our findings will not only contribute to dissecting the genetic basis underlying yield and fiber quality and their genetic correlations but also provide insights into the simultaneous improvement of yield and fiber quality in upland cotton breeding.

Materials and methods

Plant materials and field experiments

Two G. hirsutum normal lines (4133B and SGK9708) with high yield potential but moderate fiber quality, and two introgression lines (Suyuan04–3 and J02–247) with superior fiber quality, were used as the parents to generate four F2 populations in this study. SGK9708 was derived from CCRI41, which is a widely planted cultivar with wide adaptability; 4133B was derived from the hybridization of SGK9708 and the offspring of Gan4104 and CZA (70)33 and has high combining ability; Suyuan04–3 was derived from the distant hybridization of [83–811 × (86–1 × G. armourianum)]; and J02–247 was derived from the cross of Suyin45 × Sukang310, and has large cotton bolls as well as superior fiber length and strength. The cotton materials were provided by the National Mid-term Gene Bank for Cotton of China.

In 2014, the seeds of the four parents were sown in Anyang, Henan Province, and four cross combinations, 4133B × Suyuan04–3 (hereafter referred to as 4Su), 4133B × J02–247 (4 J), SGK9708 × J02–247 (SgJ), and SGK9708 × 4133B (Sg4), were constructed. Winter breeding of cotton was carried out in Hainan Province. In 2015, the four F2 populations, 4Su, 4 J, SgJ and Sg4, which consisted of 271, 248, 276, and 304 individuals, respectively, were planted in Anyang. Rows were 7 m long and spaced 0.8 m apart, with 20∼22 plants in each row. Field management followed local farming practices.

Trait measurements and statistical analysis

In mid-September, all of the plants in the four F2 populations were investigated for plant height (PH). During the harvesting season, all of the seed cotton was collected and boll weight (BW) and lint percentage (LP) were calculated after the seed cotton had been weighed and ginned. Fiber quality traits, namely fiber length (FL), fiber strength (FS), fiber length uniformity (FU), micronaire (MIC), and fiber elongation (FE), were tested using an HVI 1000 (Uster Technologies, Switzerland) in the Cotton Quality Supervision, Inspection and Testing Center, Ministry of Agriculture, Anyang, China.

The descriptive statistics, namely the maximum, minimum, and mean values, standard deviation, and coefficient of variation (CV), for the eight traits across the four populations were calculated using Microsoft Excel 2013. A correlation matrix was calculated and visualized using the corrplot package in R (Wei and Simko 2016).
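As an illustration of the two statistics named above, the following minimal sketch computes a CV and a Pearson correlation coefficient. It is not the authors' workflow (they used Excel and corrplot); the function names and the trait values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the source does not say which form was used.

```python
import math

def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean.

    Assumption: sample (n - 1) standard deviation; the source does not specify.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 100.0 * math.sqrt(variance) / mean

def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements for four plants (not data from this study).
boll_weight = [4.0, 5.0, 6.0, 5.0]          # g
lint_percentage = [40.0, 38.0, 36.0, 38.0]  # %

cv_bw = coefficient_of_variation(boll_weight)
r_bw_lp = pearson_r(boll_weight, lint_percentage)
print(f"CV(BW) = {cv_bw:.2f}%, r(BW, LP) = {r_bw_lp:.2f}")
```

In this toy data set the two traits move in exactly opposite directions, so the correlation is −1; real trait pairs would of course give intermediate values.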

SSR marker analysis

Young leaves were collected from each plant and stored at −80 °C. Genomic DNA of individuals from the F2 populations and their parents was extracted from young leaf tissues using a modified cetyltrimethylammonium bromide (CTAB) method (Paterson et al. 1993).

Polymorphism detection for the four pairs of parents was performed using 5 713 SSR primers. The primers that amplified stable polymorphic products were selected for genotyping the F2 populations. The SSR primer sequences were downloaded from CottonGen (https://www.cottongen.org; Yu et al. 2014). We used a local Basic Local Alignment Search Tool (BLAST) program (Altschul et al. 1990) to map the SSRs to a physical map. The SSR sequences were queried against the G. hirsutum genome sequences (Wang et al. 2018) and the top BLAST hit was selected for further analysis. The separating and silver staining of polymerase chain reaction (PCR) amplified products were performed as detailed by Feng et al. (2015).

Genetic linkage map construction

The genetic linkage map was constructed using JoinMap 4.0 with the regression mapping method and logarithm of odds (LOD) threshold of 5.0. The Kosambi function was used to convert the recombination frequencies to map distances.
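The Kosambi mapping function mentioned above converts a recombination fraction r into a map distance d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans. The sketch below implements this standard formula and its inverse; the function names are illustrative and are not part of JoinMap.

```python
import math

def kosambi_cm(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to centimorgans (Kosambi)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def kosambi_recombination(d_cm):
    """Inverse Kosambi function: map distance in cM back to a recombination fraction."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)

for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the map distance is close to 100r cM, and the two diverge as r approaches 0.5 because the function accounts for undetected double crossovers under partial interference.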

QTL mapping and analysis

WinQTL Cartographer 2.5 was used to identify QTLs with the composite interval mapping (CIM) method. The parameters were set as 1.0 cM for the mapping step, 5 for control markers, and 1 000 for permutation tests. QTLs were considered significant if the corresponding LOD score was > 2.5. The additive effect, dominant effect, and R2 (percent of phenotypic variance explained by a QTL) were estimated. QTLs detected for the eight traits were named as q-trait-linkage group number (McCouch et al. 1988). A graphic representation of the linkage groups and QTLs was created using MapChart 2.2 (Voorrips 2002).

The action mode of a QTL was represented as the dominance degree (|D/A|), i.e., the absolute value of the dominant effect (D) divided by the additive effect (A) (Stuber et al. 1987). The action mode was classified as additive if the dominance degree was < 0.2, partial dominance for 0.2∼0.8, dominance for 0.81∼1.2, and overdominance for > 1.2.
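The classification rule above can be sketched as a small function. The name `action_mode` is hypothetical, and one detail is an assumption: the stated ranges leave a gap between 0.8 and 0.81, which this sketch closes by treating values up to and including 0.8 as partial dominance and values above 0.8 (up to 1.2) as dominance.

```python
def action_mode(additive, dominance):
    """Classify QTL gene action from the dominance degree |D/A|.

    Assumption: the boundary between partial dominance and dominance is
    placed at 0.8, closing the 0.8-0.81 gap in the published ranges.
    """
    if additive == 0:
        raise ValueError("dominance degree is undefined when the additive effect is 0")
    degree = abs(dominance / additive)
    if degree < 0.2:
        return "additive"
    if degree <= 0.8:
        return "partial dominance"
    if degree <= 1.2:
        return "dominance"
    return "overdominance"

# Hypothetical effect estimates (not values from Table 3).
for a, d in [(1.0, 0.1), (1.0, -0.5), (-0.4, 0.4), (0.05, 0.9)]:
    print(f"A = {a:+.2f}, D = {d:+.2f} -> {action_mode(a, d)}")
```

Because the ratio uses the additive effect as the denominator, QTLs with very small additive effects can produce very large dominance degrees.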

The QTLs identified in this study were compared with those in CottonQTLdb (Said et al. 2015) to determine whether the identified QTLs were novel or had been detected previously. QTLs identified in the present study that shared the same or overlapping confidence intervals with QTLs in the CottonQTLdb based on common marker position were considered as QTLs that had been identified in previous studies.
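The overlap criterion described above can be illustrated with a short sketch. This is not the authors' procedure, which relied on common marker positions in CottonQTLdb; the record layout, the field names, the requirement that both QTLs concern the same trait, and the example coordinates are all assumptions made for illustration.

```python
def intervals_overlap(start_a, end_a, start_b, end_b):
    """Return True if the closed intervals [start_a, end_a] and [start_b, end_b] share a point."""
    return max(start_a, start_b) <= min(end_a, end_b)

def previously_reported(qtl, reference_qtls):
    """Return True if a QTL overlaps any reference QTL for the same trait and chromosome.

    Assumption: positions are on a shared coordinate system, which in practice
    requires anchoring both maps through common markers.
    """
    return any(
        ref["trait"] == qtl["trait"]
        and ref["chromosome"] == qtl["chromosome"]
        and intervals_overlap(qtl["start"], qtl["end"], ref["start"], ref["end"])
        for ref in reference_qtls
    )

# Hypothetical records; coordinates are illustrative only.
reference = [{"trait": "FL", "chromosome": "A13", "start": 40.0, "end": 55.0}]
candidate = {"trait": "FL", "chromosome": "A13", "start": 52.0, "end": 60.0}
novel = {"trait": "FL", "chromosome": "A13", "start": 70.0, "end": 80.0}

print(previously_reported(candidate, reference))
print(previously_reported(novel, reference))
```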

Results

Phenotypic variation of the four F2 populations

The phenotypes of eight agronomic and economic traits across the four F2 populations were evaluated. Extensive phenotypic variation and transgressive segregation were observed (Table 1 and Fig. 1). Transgressive segregation means that the phenotypic values of some individuals were better than those of the superior parent or worse than those of the inferior parent (Reyes 2019). The CV values revealed differences in variability among the eight traits (Table 1). The CV value for LP was low (5.96–7.98%), whereas the CV values of PH and BW were both high and similar (PH: 16.9–21.95%; BW: 15.66–19.7%). Among FL, FS, FU, FE, and MIC, the CV value was lowest for FU (1.59–2.61%) and highest for MIC (13.87–22%). Frequency distribution analysis showed a normal distribution for seven of the traits, with MIC as the exception (Fig. 1), suggesting that these traits were quantitative traits controlled by multiple genes and suitable for QTL mapping.
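The definition of transgressive segregation given above translates directly into a range check against the two parents. The sketch below uses hypothetical fiber length values and a hypothetical function name; it only illustrates the definition and does not reproduce any data from Table 1.

```python
def transgressive_individuals(progeny_values, parent_a, parent_b):
    """Return (index, value) pairs for progeny falling outside the parental range."""
    low, high = min(parent_a, parent_b), max(parent_a, parent_b)
    return [(i, v) for i, v in enumerate(progeny_values) if v < low or v > high]

# Hypothetical fiber length values in mm (not measurements from this study).
parent_high_yield = 28.5
parent_fine_fiber = 31.0
f2_fiber_length = [27.9, 29.4, 30.2, 31.8, 28.5, 32.1]

outliers = transgressive_individuals(f2_fiber_length, parent_high_yield, parent_fine_fiber)
print(outliers)
```

Individuals exactly equal to a parental value are not counted as transgressive in this sketch.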

Table 1 Phenotypic variation of 8 traits in four F2 populations

Fig. 1 Histogram of the frequency distribution for 8 traits across 4 populations. Plant height (PH), boll weight (BW), lint percentage (LP), fiber length (FL), fiber strength (FS), fiber length uniformity (FU), micronaire (MIC), fiber elongation (FE)

Correlation analysis

Correlation analysis between 32 sets of phenotypic data from the eight traits across the four F2 populations revealed significant correlations for different traits within and between populations (Fig. 2). BW and LP were significantly negatively correlated (−0.87 < r < −0.62) in three populations (4Su, 4 J, Sg4), whereas BW had significant positive correlations with FS, FU, FE, and MIC (0.13 < r < 0.67) in the 4 J and SgJ populations (Fig. 2). Negative correlations between BW and FL, FS, FU, and FE (−0.89 < r < −0.82) and positive correlations between BW and MIC (r = 0.82) were observed in the 4Su population (Fig. 2). Conversely, LP was positively correlated with FL, FS, FU, and FE (0.95 < r < 0.99) and negatively correlated with MIC (r = −0.90) (Fig. 2). For the fiber quality traits, positive correlations were found between FL and FS, FU, FE; FS and FU, FE; FU and FE in all four populations (0.19 < r < 0.998) (Fig. 2).

Fig. 2 Correlation analysis among 8 traits. The magnitude of the correlation is indicated by different colors of numbers at the left diagonal and circles at the right diagonal. The circles marked by a cross indicate that no significant correlation was observed between traits (P > 0.05)

PH and BW in the 4 J population were positively correlated with PH, LP, FL, FS, FU, and FE in the 4Su population (0.25 < r < 0.88) and negatively correlated with BW and MIC in the 4Su population (−0.776 < r < −0.772). Conversely, both negative correlations (−0.83 < r < −0.15) and positive correlations (0.68 < r < 0.77) were observed between LP and FL in the 4 J population and the corresponding traits in the 4Su population (Fig. 2). PH, LP, and FE in the Sg4 population were positively correlated with LP, FL, FS, FU, and FE in the 4Su population (0.11 < r < 0.17) and negatively correlated with BW (−0.14 < r < −0.13); LP, FE, and MIC in the Sg4 population were positively correlated with PH and BW (0.14 < r < 0.27) and negatively correlated with LP and FL (−0.24 < r < −0.15) in the 4 J population (Fig. 2).

Overall, within populations, most of the correlations between the two yield traits (BW and LP) were negative, whereas most of the correlations among the fiber quality traits, as well as between BW and the fiber quality traits, were positive (Fig. 2). Significant correlations were found between multiple traits among the 4Su, 4 J, and Sg4 populations (Fig. 2), suggesting the influence of the common parent 4133B on the traits.

Genetic map construction

A total of 5 713 SSR primers were used to detect polymorphisms in the four pairs of parents. Of these, 739 polymorphic SSR primers with clearly amplified bands were retained, including 203 polymorphic primers between 4133B and Suyuan04–3 (Additional file 5: Table S1a), 208 between 4133B and J02–247 (Additional file 5: Table S1b), 158 between SGK9708 and J02–247 (Additional file 5: Table S1c), and 170 between SGK9708 and 4133B (Additional file 5: Table S1d). The polymorphism rates of the primers for the four comparisons were 3.55, 3.64, 2.77, and 2.98%, respectively.
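The polymorphism rates above follow from dividing each count of polymorphic primers by the 5 713 primers screened. The short check below reproduces that arithmetic from the counts stated in the text; the dictionary keys use the population codes for brevity, and the variable names are illustrative.

```python
TOTAL_PRIMERS = 5713

# Polymorphic primer counts per parental pair, as stated in the text.
polymorphic_counts = {"4Su": 203, "4 J": 208, "SgJ": 158, "Sg4": 170}

# Rate in percent, rounded to two decimals as reported.
polymorphism_rates = {
    population: round(100.0 * count / TOTAL_PRIMERS, 2)
    for population, count in polymorphic_counts.items()
}

total_polymorphic = sum(polymorphic_counts.values())
print(polymorphism_rates)
print(total_polymorphic)
```

The counts sum to the 739 retained primers, which means a primer that is polymorphic in more than one parental pair is counted once per pair.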

JoinMap 4.0 was employed to construct the genetic linkage maps. For the 4Su population, 71 markers were assigned to 10 linkage groups (LGs) with a total map length of 585.97 cM (Table 2, Additional file 1: Fig. S1, Additional file 6: Table S2a). The average length of the LGs was 58.6 cM, and the average distance between markers was 8.25 cM. The longest LG, LG9, contained the most markers (27), and half of the LGs contained only three markers.

Table 2 Summary of 4 genetic linkage maps

For the 4 J population, 61 markers were assigned to 10 linkage groups with a total map length of 752.45 cM (Table 2, Additional file 2: Fig. S2, Additional file 6: Table S2b). The average length of the LGs was 75.2 cM, and the average distance between markers was 12.34 cM. LG7 contained the most markers (21) and LG3 contained the fewest (3).

For the SgJ population, 83 markers, approximately half of the 158 polymorphic markers, were assigned to 15 linkage groups with a total map length of 855.04 cM (Table 2, Additional file 3: Fig. S3, Additional file 6: Table S2c). The average length of the linkage groups was 57 cM. The longest average distance between markers was 21.46 cM on LG13 and the shortest was 1.06 cM on LG14.

For the Sg4 population, 52 markers, approximately one-third of the 170 polymorphic markers, were assigned to nine linkage groups with a total map length of 1 163.66 cM (Table 2, Additional file 4: Fig. S4, Additional file 6: Table S2d). The average length of the linkage groups was 129.3 cM, and the average distance between markers was 22.38 cM.
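The map summary figures quoted for the four populations can be recomputed from the marker counts, LG counts, and total map lengths given in the text. In the sketch below, the average marker distance is taken as total length divided by the number of markers; this is an inference, chosen because it appears to reproduce the reported values, and the source does not state the formula.

```python
# (markers, linkage groups, total length in cM) as stated in the text.
maps = {
    "4Su": (71, 10, 585.97),
    "4 J": (61, 10, 752.45),
    "SgJ": (83, 15, 855.04),
    "Sg4": (52, 9, 1163.66),
}

summary = {}
for population, (n_markers, n_groups, total_cm) in maps.items():
    summary[population] = {
        # Mean length of a linkage group.
        "avg_lg_length": total_cm / n_groups,
        # Assumption: total length / marker count, not / number of intervals.
        "avg_marker_distance": total_cm / n_markers,
    }

for population, stats in summary.items():
    print(
        f"{population}: {stats['avg_lg_length']:.1f} cM per LG, "
        f"{stats['avg_marker_distance']:.2f} cM per marker"
    )
```

Dividing by the number of marker intervals (markers minus linkage groups) instead would give somewhat larger spacings, so the choice of denominator matters when comparing map densities across studies.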

QTL mapping

WinQTL Cartographer 2.5 with the CIM algorithm was used to identify QTLs for the eight traits in the four F2 populations. In total, 50 QTLs with R2 of 0.1–59.24% were identified, of which 27 were for fiber quality traits and 16 were for yield traits. A total of 23, 4, 8, and 15 QTLs were detected in the 4Su, 4 J, SgJ, and Sg4 populations, respectively (Table 3, Fig. 3). LG9 in the 4Su population harbored the highest number of QTLs (13), followed by LG6 (6) and LG1 (5) in the Sg4 population.

Table 3 Summary of QTLs identified in four populations for 8 traits

Fig. 3 QTLs identified for 8 traits across 4 populations. a 4Su population; b 4 J population; c SgJ population; d Sg4 population

Seven QTLs for PH were identified, but six of them, detected in the 4Su population, had only minor effects (0.11% < R2 < 4.02%; Table 3, Fig. 3). The additive effects of QTLs qPH2–1 and qPH2–2, which had the higher R2 values (2.66% and 4.02%), were positive, indicating that the favorable alleles were from Suyuan04–3. The action modes of qPH2–1 and qPH2–2 were over-dominance according to the dominance degree values.

Eight QTLs for BW were identified with R2 of 1.17%∼9.31% in the 4 J (1), SgJ (1), and Sg4 (6) populations (Table 3, Fig. 3). Notably, the LGs that harbored the BW QTLs in the 4 J (qBW4) and SgJ (qBW2) populations were both anchored to chromosome A05, and a common SSR marker, NAU1255, was detected close to the QTL intervals, implying that NAU1255 is closely linked to BW. Furthermore, the directions of the additive and dominance effects of these QTLs were the same.

Eight QTLs for LP were identified with R2 of 1.68%∼18.11% in the 4Su (2), 4 J (2), and SgJ (4) populations (Table 3, Fig. 3). The additive effects of two major QTLs with R2 > 10%, qLP7 in the 4 J population and qLP2, were negative, indicating that the favorable alleles were from J02–247. The action modes of these two QTLs were over-dominance and dominance, respectively.

Eleven QTLs for FL were identified in the 4Su (6), 4 J (1), and SgJ (4) populations (Table 3, Fig. 3). Multiple QTLs were located in the same LG of a population; for example, qFL9–1, qFL9–2, and qFL9–3, with R2 of 0.35%∼7.70%, were in LG9 of the 4Su population. Interestingly, both LG7 in the 4Su population and LG6 in the SgJ population were anchored to chromosome A13 (Table 2). The common SSR markers, BNL2449 and NAU1211, were detected near the QTLs qFL7 (4Su population) and qFL6, hinting that BNL2449 and NAU1211 may be closely linked to FL. In addition, the additive effect of QTL qFL2–2 was positive, suggesting that the favorable alleles came from the male parents, Suyuan04–3 and J02–247, which are endowed with superior fiber quality.

Five QTLs for FS were identified: four with R2 of 2.95%∼7.15% in the 4Su population and one major QTL with R2 of 15.10% in the Sg4 population (Table 3, Fig. 3). The additive effects of the four QTLs in the 4Su population were positive, whereas the additive effect of the major QTL in the Sg4 population was negative, implying that the parent 4133B may not have conferred the favorable allele.

Only two QTLs for FU were identified with R2 of 0.10%∼1.21% in the same LG of the 4Su population (Table 3, Fig. 3).

Four QTLs for FE were identified with R2 of 0.16%∼5.62% in the 4Su, SgJ and Sg4 populations (Table 3, Fig. 3). The additive effect of one QTL, qFE8, was negative and the action mode was additive, whereas the additive effects of the other three QTLs were positive and the action modes were over-dominance.

Five QTLs for MIC were detected in three LGs in the 4Su and Sg4 populations (Table 3, Fig. 3). A major QTL, qMIC2, with R2 of up to 59.24%, was in LG2 of the Sg4 population; the other four QTLs were minor, with R2 of 0.15%∼6.29%. The dominance degree values of all QTLs except qMIC9–2 ranged from 9.41 to 92.03, suggesting that the action modes were over-dominance.

A hotspot region was detected in LG9 of the 4Su population (Fig. 3a). Three QTLs (qFL9–1, qFS9–1, qFE9) were identified at the single position of 96.31 cM, and expanding this region to span 95.31 cM to 105.81 cM revealed the presence of eight QTLs for six traits: PH (105.81 cM), LP (95.31 cM), FL (96.31 cM, 102.81 cM), FS (96.31 cM, 101.81 cM), FE (96.31 cM) and MIC (100.81 cM). Therefore, this hotspot region may be an important genomic region that affects agronomic and economic traits in cotton. Two other QTLs, qFU9–1 and qMIC9–1, were identified in the same LG9 at 41.71 cM.
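The hotspot count can be verified with a simple window filter over the QTL positions listed above for LG9 of the 4Su population. The sketch uses only the trait labels and positions given in the text; the helper name `qtls_in_window` is hypothetical, and LG9 harbors further QTLs whose positions are not listed here.

```python
# (trait, peak position in cM) for QTLs in LG9 of the 4Su population,
# restricted to the positions stated in the text.
lg9_qtls = [
    ("PH", 105.81), ("LP", 95.31),
    ("FL", 96.31), ("FL", 102.81),
    ("FS", 96.31), ("FS", 101.81),
    ("FE", 96.31), ("MIC", 100.81),
    ("FU", 41.71), ("MIC", 41.71),
]

def qtls_in_window(qtls, start_cm, end_cm):
    """Return the QTLs whose peak position lies within [start_cm, end_cm]."""
    return [(trait, pos) for trait, pos in qtls if start_cm <= pos <= end_cm]

hotspot = qtls_in_window(lg9_qtls, 95.31, 105.81)
hotspot_traits = sorted({trait for trait, _ in hotspot})
print(len(hotspot), hotspot_traits)
```

The same filter applied around 41.71 cM picks up the second, smaller group containing the FU and MIC QTLs.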

QTLs comparison and analysis

We compared all of the identified QTLs with the QTLs in CottonQTLdb. The results showed that one-fifth of our QTLs (10/50) overlapped with previously reported QTLs, illustrating the reliability of our QTL mapping and indicating that the other 40 QTLs were novel. The 10 common QTLs were reported to be associated with FL (4), FS (2), PH (1), BW (1), LP (1), and FE (1). QTLs for FL were the most numerous in both the present study (11) and CottonQTLdb (494), which may have increased the probability of a hit.

QTLs for different traits that shared the same or overlapping confidence intervals were considered to be in QTL clusters. In the present study, a total of nine QTL clusters were identified in the 4Su (5), 4 J (1), and Sg4 (3) populations. The QTL cluster harboring the most QTLs was the hotspot region described above, with eight QTLs for six traits. Another QTL cluster in the same LG (LG9 in the 4Su population) contained QTLs for FU and MIC (Fig. 3a).

BW and LP represent yield traits, whereas FL, FS, FU, FE, and MIC represent fiber quality traits. On this basis, paired trait QTLs were analyzed. There were 19 paired trait QTLs within six paired traits (BW and FL, or FE; LP and FL, FS, FU, or FE) that had significant medium or high positive correlations (|r| > 0.3) in the F2 populations. Six of the 19 paired trait QTLs had the same direction of additive effects (Additional file 7: Table S3).

Discussion

To dissect the genetic basis underlying yield and fiber quality as well as their genetic correlations, two upland cotton normal lines (4133B and SGK9708) and two introgression lines (Suyuan04–3 and J02–247) were selected as parents, and four F2 populations were constructed. Among these populations, the female parents of 4Su, 4 J and SgJ were lines with high yield potential, and the male parents were lines with superior fiber quality. Thus, extensive phenotypic variation was observed in the cross combinations whose parents were distantly related. All of the targeted traits exhibited normal distribution patterns across the four F2 populations (Table 1, Fig. 1), suggesting that these traits were quantitative traits controlled by multiple genes.

Furthermore, all of the traits exhibited transgressive segregation and many individuals with transgressive phenotypes were found (Table 1, Fig. 1). For example, all of the median values of FL and FS in the 4Su, 4 J, and SgJ populations were higher than or close to 30; fiber with both quality values of at least 30 (FL ≥ 30 mm and FS ≥ 30 cN·tex⁻¹) is generally considered fine quality. In plant breeding, transgressive segregation provides an adaptive advantage for traits (Reyes 2019). To a certain extent, high yield and fine-quality fibers are the outcome of the adaptation of cotton. Therefore, it is not surprising that many instances of transgressive segregation were observed for these traits in the F2 populations. Furthermore, some of these transgressive lines can be used to breed for high-quality fiber. However, these characteristics imply that the favorable alleles of the fiber quality traits were generally from the introgression-line parents.

Quantitative traits are known to be influenced by the environment. Therefore, to identify stable QTLs, mapping populations are usually planted in multiple environments (Tang et al. 2015; Diouf et al. 2018; Zhang et al. 2020). However, we detected many stable QTLs, including qBW4 and qBW2, using the four F2 populations. Although qBW4 was detected in the 4 J population and qBW2 was detected in the SgJ population, they had a common marker and the corresponding LGs were on the same chromosome (A05). Therefore, these two QTLs could be considered as one. This study has demonstrated an alternative method of detecting stable QTLs through multiple populations.

The detection of QTL clusters is consistent with the results of previous studies, which found that QTLs for fiber quality were clustered on the same chromosome, D09, where the majority of markers in LG9 were mapped and which harbors important loci for fiber quality traits (He et al. 2007; Qiao et al. 2019). These results indicate that QTLs in clusters may be closely linked or have pleiotropic effects (Vikram et al. 2015; Zhao et al. 2016; You et al. 2019; Yuan et al. 2018), which would explain the significant phenotypic correlations or linkage drag found between related traits (Zhang et al. 2020). Paired trait QTLs that have the same direction of additive effect and significant medium or high positive correlations should be improved simultaneously (Zhang et al. 2020). In the present study, we identified six paired trait QTLs with significant positive correlation and additive mode of gene action (Additional file 7: Table S3) that may be valuable for further simultaneous improvement of yield and fiber quality traits.

We found that qLP9 for LP and qFL9–1 and qFL9–2 for FL were in the same QTL cluster in LG9 of the 4Su population. Furthermore, a high positive correlation between LP and FL and the same direction of the additive effect were observed. We plan to use the detected QTL clusters in a future breeding program by fine-mapping them using large-scale segregating populations and gene-editing technology to break the negative correlation and further improve yield and fiber quality.

Conclusions

In this study, four F2 populations were derived from hybridization between two G. hirsutum normal lines and two introgression lines. Four corresponding genetic linkage maps were constructed and QTL mapping was implemented following the integration of phenotypic data of eight agronomic and economic traits. Fifty QTLs across the four populations were detected; 27 were for fiber quality traits and 16 were for yield traits. QTLs in the same cluster, such as qLP9 for LP and qFL9–1 for FL, were prioritized for further research. Our results will help to dissect the genetic basis underlying yield and fiber quality, and lay a promising foundation for simultaneously improving yield and fiber quality in upland cotton breeding programs.