Background

Cotton (Gossypium spp.) is an important fiber crop species worldwide. Among all the cultivated species, upland cotton (Gossypium hirsutum L.) is widely planted and accounts for more than 95% of the global cotton production. Cotton fiber is the main raw material for the textile industry, and its characteristics affect the quality of cotton textiles. Due to the growing human population and societal reasons, the demand for textiles is increasing. Moreover, owing to the serious environmental pollution associated with materials that rely on fossil resources such as petroleum, synthetic fibers do not meet the criteria for sustainable development. Therefore, the production of increased amounts of and better cotton fibers is becoming crucial.

Cotton fibers initiate from the ovule epidermis on the day of anthesis. After fiber initiation, fiber cells expand rapidly as a result of increased intracellular swelling pressure and cell wall relaxation. The fiber development process can be divided into four overlapping stages on the basis of morphological characteristics during a 50-day period: initiation (0–3 days post anthesis (DPA)), elongation (3–20 DPA), secondary cell wall thickening (16–40 DPA) and maturation (40–50 DPA) [1]. Mature fibers can be classified into two types: lint fibers and fuzz fibers. Lint fibers initiate at 0–2 DPA, and fuzz fibers initiate 2–5 days later [2].

As a heterotetraploid crop species, upland cotton has a large and complex genome [3]. Identification of the genes involved in fiber initiation and elucidation of the fiber development mechanism are hard. Cotton fibers, which constitute a type of seed hair, reportedly share similarities with leaf trichomes [4,5,6]. Based on the epidermal hair regulatory complex GL1-TTG1-GL2 in Arabidopsis thaliana [7,8,9,10,11], the homologous genes involved in cotton ovule epidermis were cloned and shown to restore the trait in Arabidopsis hairless mutants [12,13,14]. With the development of transgenic technology, an increasing number of genes involved in fiber development have been functionally verified. When the expression of the R2R3 MYB transcription factor GhMYB109 was inhibited, fiber initiation was delayed, indicating that this gene is required in the initial development of fibers [15]. Overexpression of GhMYB25 resulted in increases in the numbers of cotton fibers and leaf epidermal hairs, while gene silencing led to opposite results [16]. When the GhMYB25-like gene was silenced, the plants produced naked seeds but normal epidermal hairs on other parts, indicating this gene has important functions during the fiber cell differentiation stage [17]. Then, GhMYB25-like was found to be the mutated gene responsible for the naked seed mutant phenotype through a map-based cloning strategy [18]. In addition, silencing of GhHD-1 reduced the number of epidermal hairs and delayed fiber initiation, and the initial fiber number increased when GhHD-1 was overexpressed [19].

In addition to the fiber initiation associated genes mentioned above, some factors that regulate fiber elongation have been reported, such as the turgor pressure [20], sucrose [21], calcium (Ca2+), reactive oxygen species (ROS) and actin [22]. By activating the function of the GhCaM7 gene, Ca2+ plays key roles in promoting cotton fiber elongation, while ROS has an opposite effect in this pathway [23]. WLIM1a and GhCFE1A, whose gene products are involved in actin bundles and act as dynamic linkers between the endoplasmic reticulum and actin cytoskeleton, play important roles in fiber elongation [24, 25]. Shan et al. revealed a new molecular mechanism by which GhHOX3 promotes cotton fiber elongation [26]. Ethylene can also promote cell elongation by regulating sucrose synthase, tubulin and expansion-related proteins [27,28,29].

Genes are decisive factors in controlling phenotypes. Identifying genes related to fiber development has become crucial for genetic improvements of fiber. RNA sequencing (RNA-seq) techniques have been applied, and a number of fiber-related genes have been screened and identified [30, 31]. The complete whole-genome sequence of upland cotton [32, 33] has provided a reference for functional genomics research. As an effective tool to reveal gene expression differences, transcriptome sequencing represents a classic high-throughput method for identifying differentially expressed genes (DEGs) between different species or between different developmental stages [34, 35]. In the present study, we observed seed phenotypes at the fiber initiation stage, measured the length of developing fibers between short- and long-fiber cotton lines, and confirmed the difference in fiber quality. Comparative transcriptome analysis was then applied to these two cotton groups to clarify two major issues: (1) the differences in transcriptional changes associated with different developmental stages and genotypes and (2) the genetic basis for the differences in fiber quality between short- and long- fiber cotton lines. The results of this study might provide new insights into the molecular mechanisms of fiber development in upland cotton, which would help researchers explore new ways for creating additional and better fibers.

Methods

Plant materials and sample collection

Four Gossypium hirsutum lines (69–6025-12, Liao 1779, 601 long-staple cotton (LSC), J02–508) were selected in this study. The seeds of all four lines were retrieved from the national cotton germplasm mid-term bank located at the Institute of Cotton Research, Anyang, China. The fiber length (FL) of these four lines was stable across years, but there were large differences between the lines (Table 1). The FL of 69–6025-12 and Liao 1779 was always less than 26 mm; therefore, these lines composed what was defined as the short-fiber group (SFG). The FL of 601 LSC and J02–508 was always greater than 32 mm; therefore, these lines composed what was defined as the long-fiber group (LFG).

Table 1 Analysis of fiber traits of four cotton lines

Flowers were tagged on the day of anthesis, and sample collection was performed at − 3, 0, 3, 5, and 10 DPA. The − 3 DPA samples were collected based on bud length. The ovules and fibers were dissected immediately after the bolls were harvested, frozen in liquid nitrogen, and stored at − 80 °C. Three biological replicates were performed for every sample.

Phenotypic observation of developing fibers

At the lint fiber initiation stage, 0 and 1 DPA ovules were observed with scanning electron microscopy. Three ovules were randomly selected from each material, and the number of bulged cells within a unit area of 0.02 mm2 (0.1 mm × 0.2 mm) in the middle of the surface was counted. At the fuzz fiber initiation stage, 3 DPA ovules were collected for paraffin sections. The detailed processes of scanning electron microscopy and paraffin sectioning were performed as reported by Liu et al. [36, 37]. At the fiber elongation stage, 10 DPA ovules were boiled and combined with running water, after which the length of attached fibers was measured [23]. Mature fibers were harvested by hand, and five fiber quality traits were evaluated [38].

RNA extraction, library construction and transcriptome sequencing

The plant tissues were finely ground under liquid nitrogen. An RNAprep Pure polysaccharide polyphenol plant total RNA extraction kit (DP441, TIANGEN, Beijing) was used, and the operation steps were in strict accordance with the instructions. One microliter of the extracted RNA was subjected to 1% agarose gel electrophoresis to check the RNA integrity. In addition, the RNA concentration and contamination were measured by a Nanodrop 2000 spectrophotometer (Thermo, USA).

According to the two cotton groups and four representative stages/tissues, we mixed equal quantities of RNA within groups and at the same stage. The sample mixing scheme is shown in Table 2. This operation aimed to highlight the differences between two groups and between the different fiber developmental stages. In total, 16 cDNA libraries comprising 2 cotton groups at 4 developmental stages/tissues (Table 2) and 2 biological replicates were sequenced on an Illumina HiSeq 2500 sequencing platform, and 125 bp/150 bp paired-end reads were generated.

Table 2 Sample mixing scheme for transcriptome sequencing

Bioinformatics analysis

We performed all bioinformatics analyses in accordance with standard procedures. The raw data in FASTQ format were first processed to obtain clean reads, and then the Q20, Q30 and GC content of the clean reads were calculated. Second, Gossypium hirsutum TM-1 genome [33] and gene model annotation files (CottonGen database, http://www.cottongen.org) were used for compiling a reference genome for clean read alignments. Third, HTSeq software [39] was used to count the number of reads mapped to each gene, and the fragments per kilobase of transcript per million mapped reads (FPKM) was calculated for estimating gene expression levels. The genes with an adjusted p value < 0.05 and a |log2 (fold change)| > 1 were then assigned as differentially expressed. In addition, personalized Gene Ontology (GO) enrichment analysis, pathway enrichment analysis, gene expression pattern analysis and principal component analysis (PCA) were performed using OmicShare tools, a free online platform for data analysis (www.omicshare.com/tools). The detailed calculation methods and parameter settings involved in the above analysis are described in Additional file 1.

Validation of RNA-seq by qRT-PCR

To validate the RNA-seq data, 12 DEGs from the pathway enrichment analysis were selected for qRT-PCR analysis. The specific primers of these genes directly referenced the data from qPrimerDB (https://biodb.swu.edu.cn/qprimerdb) [40]. Reverse transcription and qRT-PCR were performed as previously described [41]. To properly evaluate PCR amplification efficiency, we performed 3 parallel replicates using five templates of 2-fold serial gradient dilutions first [42]. Then three technical replicates and three biological replicates were performed on all reactions. The ΔΔCt algorithm was used for calculating relative gene expression. The endogenous reference gene used was GhActin7 (AT5G09810). All primer pairs used in the analysis and their amplification efficiencies are shown in Additional file 2.

Results

Differences in fiber phenotypes between long- and short-fiber cotton lines

The fiber phenotypes of the SFG (69–6025-12, Liao 1779) and LFG (601 LSC, J02–508) were different at the initiation, elongation and maturation stages. The ovule epidermal cells began to bulge at 0 DPA (Fig. 1a-d). One day later, the number of bulged cells and the cell diameter peaked (Fig. 1e-h). The average cell number of the four materials, from high to low, was in the order of 69–6025-12, Liao 1779, J02–508 and 601 LSC; the average numbers were 124.3, 106, 80.3 and 79.3, respectively, at 0 DPA (Fig. 1i), and the average numbers were 139, 110.3, 80.6 and 79, respectively, at 1 DPA (Fig. 1j). Multiple hypothesis test results showed that the differences between the SFG and LFG were significant both at 0 DPA and 1 DPA; i.e., the initial fiber numbers in the SFG were greater than those in the LFG.

Fig. 1
figure 1

Scanning electron microscopy images of 0 DPA and 1 DPA ovules illustrating fiber initiation on the ovule surface. This image shows four upland cotton lines, i.e., 69–6025-12 (a, e), Liao 1779 (b, f), 601 LSC (c, g) and J02–508 (d, f). Scale bars: 100 μm (a-h). i Statistical results of the initial fiber number per unit area at 0 DPA. j Statistical results of the initial fiber number per unit area at 1 DPA. The height of the column is the average of three counts, and the error bars indicate ± S.D. The columns in the graph are arranged from large to small. The letters above the columns indicate the significance level of the initial fiber numbers of the four cotton materials. The same letters indicate no significant difference, and different letters indicate the difference was significant

The fiber cells elongated slightly at 3 DPA. The fiber cells of 69–6025-12 and Liao 1779 were shorter through the longitudinal section (Fig. 2a, b), while those of 601 LSC and J02–508 were longer (Fig. 2C, D). The fiber densities differed when the ovule chalazal end was further enlarged. Fibers of 69–6025-12 and Liao 1779 were arranged closely (Fig. 2A, B); there was almost no gap between fiber filaments, and the density was very high. In 601 LSC and J02–508, scattered filaments can be seen with relatively clear voids, and the fiber density was relatively low (Fig. 2C, D). This difference was consistent with the bulged cell numbers observed at 0 and 1 DPA (Fig. 1).

Fig. 2
figure 2

Fiber phenotype observations of four upland cotton lines. Median longitudinal sections of 3 DPA ovules illustrating the extent of fiber coverage on the surface of a seed: (A, a) 69–6025-12; (B, b) Liao 1779; (C, c) 601 LSC; (D, d) J02–508. Scale bars: 1 mm (A-D); 400 μm (a-d). E Difference in FL at 10 DPA between the two cotton groups. Three biological replicates were measured per material. The error bars represent the means ± S.D.s, p = 2.38E-05. F Difference in mature FL between the two groups of cotton lines. Three fiber samples were measured. The error bars represent the means ± S.D.s, p = 1.02E-08. (***, p < 0.001; two-tailed Student’s t-test)

At 10 DPA, the FL of 69–6025-12 was 12.0 mm, and that of Liao 1779 was 13.0 mm; in contrast, the FL of 601 LSC and J02–508 was 18.5 mm and 17.5 mm, respectively (Fig. 2E). Two sets of data were averaged and then subjected to a two-tailed t-test. The results showed that there was a significant difference in FL between the two groups (Fig. 2E).

After fiber maturation, the FL of the four materials was measured; the values were 25.77 mm, 25.63 mm, 33.00 mm, and 32.50 mm (Fig. 2F, Table 1). The difference between the SFG and the LFG was extremely significant according to the results of the t-test. In addition to FL, five other fiber traits (lint percent, uniformity, fiber strength, micronaire and elongation rate) were also analyzed. In general, the differences in all six traits between the two groups were extremely significant (Table 1). Among these traits, a positive correlation was found between lint percent and micronaire value, indicating that a high micronaire value (coarse fiber) will lead to an increase in lint percent, which is consistent with the published results [43]. However, lint percent was negatively related to FL, uniformity, strength and elongation (Table 1).

Phenotypic observations at each developmental stage suggest that significant differences in the final fiber quality of the four materials were formed by the gradual accumulation of differences during fiber initiation and elongation. Therefore, these four materials might constitute a good model for revealing the molecular mechanism governing differences in fiber initiation and elongation.

Transcriptome sequencing and quality assessment

A total of 16 cDNA libraries were constructed and sequenced in this study. The ratio of clean reads to the total number of raw reads was 94.94 to 96.44% (Additional file 3), indicating that the proportion of low-quality data in the raw reads was very low. Moreover, the proportions of sequences that could be mapped to the reference genome exceeded 85.54% in all samples, indicating no contamination and that the reference genome was appropriate. The percentage of multiple mapped reads was less than 10%, which met the criteria. In addition, Q30 ≥ 91.83% and GC contents between 43.01~45.03% were identified in all samples (Additional file 3). The data above showed that the transcriptome sequencing was of high quality.

Based on the number of genes expressed at different levels, the threshold by which genes with an FPKM > 1 (indicating that those genes were expressed) reached more than 50% in all samples (Additional file 4). Similarities greater than 97% between two biological replicates (Additional file 5) were then analyzed by the Pearson correlation coefficient method, indicating that our experiment had good repeatability. Moreover, the similarity between the SFG and the LFG at the same developmental stage reached 94% (Additional file 5) and was greater than the threshold of 92%, which indicated that our experimental operations were qualified. The gene expression became increasingly dissimilar to that of the lint fiber initiation stage (SFG-0 and LFG-0) in the subsequent two stages in both the SFG (SFG-5 and SFG-F10) and the LFG (LFG-5 and LFG-10) (Additional file 5). These results indicated that there is a dramatic change in gene expression patterns as the developmental stage changes. However, compared with the stage 0 (SFG-0 and LFG-0) and stage 5 samples (SFG-5 and LFG-5), the 10 DPA ovules (SFG-O10 and LFG-O10) were more strongly correlated than were the 10 DPA fibers (SFG-F10 and LFG-F10). This phenomenon may be related to the fibers and ovules in the previous two stages not separating. Further confirming the relationship between the two groups at different developmental stages, the PCA showed that the same developmental stage of the four materials clustered together, indicating the difference between developmental stages was greater than that between materials (Additional file 6).

Analysis of DEGs

According to the DEG identification criteria, 773, 2191, 803 and 898 DEGs were identified at four developmental stages between the LFG and SFG (Fig. 3 and Additional file 7). The greatest number of DEGs occurred at the fuzz fiber initiation stage (5), which may be related to this stage including both fuzz fiber initiation and lint fiber elongation. Additional DEGs were identified between different developmental stages (tissues) within the material (Fig. 3). These findings showed that samples at different stages indeed dramatically changed, which was consistent with our PCA results (Additional file 6).

Fig. 3
figure 3

Multiple comparisons between the SFG and LFG at various stages of fiber development. The numbers around the arrows denote the numbers of genes differentially expressed for the specified comparison. Red, upregulation; green, downregulation

Cluster analysis of all the DEGs was performed using FPKM values (Additional file 8). The expression patterns were similar between the SFG and LFG but different between stages. However, the patterns at stage 5 and stage O10 were relatively similar and consistent with the Pearson correlation coefficient analysis results (Additional file 5). Moreover, gene expression was continuously upregulated or downregulated concurrently with the developmental process, which may help to reveal the mechanism governing phenotypic differences during fiber initiation and elongation.

Gene expression pattern analysis

The expression patterns of the DEGs among three developmental stages (0, 5, F10) (Additional file 7) in the SFG and LFG were analyzed (Fig. 4a). In the SFG, most DEGs were enriched in profile 0 (4974 genes, accounting for 26.00% of 19128 DEGs), followed by profile 6 (3576 genes, 18.70%) and profile 7 (2914 genes, 15.23%). In the LFG, most DEGs were enriched in profile 3 (4665 genes, 22.43% of 20795 DEGs), followed by profile 0 (4491 genes, 21.60%) and profile 7 (3317 genes, 15.95%).

Fig. 4
figure 4

Patterns of gene expression and GO enrichment of DEGs across three developmental stages in the SFG and LFG. a Patterns of gene expression across three developmental stages in the SFG and LFG inferred by Short Time-series Expression Miner (STEM) analysis. Each square represents a trend of gene expression. The text on the square indicates the profile ID number and the number of genes within that profile. The black line represents the expression tendency of all the genes. A colored square indicates that the pattern was significantly enriched (p < 0.05). The profiles were ordered based on the number of genes enriched therein. b GO enrichment analysis of three significant patterns in the SFG. c GO enrichment analysis of three significant patterns in the LFG. The significance of the most represented GO terms in each profile is indicated by the p value. The red areas represent significant enrichment (p < 0.05), whereas the dark gray areas represent nonsignificance (p > 0.05)

The DEGs in profile 0 and profile 7 were significantly enriched in both the SFG and LFG (Fig. 4a). The expression of DEGs in profile 0 was downregulated continuously, while that in profile 7 was upregulated continuously. At the fiber initiation stage, related genes are highly expressed, but their expression levels are downregulated as development progresses [16, 17, 19]. This finding is consistent with the trend exhibited by profile 0, indicating that DEGs enriched in profile 0 are involved in the regulation of fiber initiation. However, the genes related to fiber elongation were highly expressed at 10 DPA, but their expression was lower during the early developmental stages [23, 24, 26]. These findings are consistent with the pattern of profile 7, indicating that DEGs enriched in profile 7 might be involved in fiber elongation.

Four hundred eighty-three more DEGs were enriched in profile 0 in the SFG than in the LFG (4974 vs. 4491) (Fig. 4a); i.e., more genes were involved in fiber initiation in the SFG than in the LFG. This finding was consistent with the previous result by which the SFG (69–6025-12, Liao 1779) had more bulged epidermal cells at 0 DPA and 1 DPA than did the LFG (Fig. 1). Compared with profile 0, profile 7 had 402 fewer DEGs enriched in the SFG than in the LFG (2914 vs. 3317) (Fig. 4a); i.e., more genes were involved in fiber elongation in the LFG than in the SFG. This finding was also consistent with the result by which the FL of the LFG was greater than that of the SFG at 10 DPA (Fig. 2E). Most gene expression patterns during fiber development were similar between the SFG and LFG, but the main difference lied in the number of genes and the function of key genes therein.

Enrichment analysis

GO enrichment analysis was performed on the genes within each profile (Additional file 9). In the SFG, genes within profile 0 were enriched mainly in the cellular macromolecular metabolic process (GO:0044260, p = 1.05E-02), microtubule-based movement (GO:0007018, p = 1.52E-09), the microtubule-associated complex (GO:0005875, p = 9.45E-10), microtubule motor activity (GO:0003777, p = 1.37E-09), sequence-specific DNA binding (GO:0003700, p = 2.58E-15), and signal transducer activity (GO:0004871, p = 6.85E-03). The genes within profile 7 were enriched mainly in the cell wall macromolecule metabolic process (GO:0044036, p = 8.28E-03), intracellular signal transduction (GO:0035556, p = 5.85E-05), polymeric cytoskeletal fibers (GO:0099513, p = 7.30E-07) and calmodulin binding (GO:0005516, p = 1.66E-03) (Fig. 4b). In the LFG, the significantly enriched terms in profiles 0 and 7 are roughly consistent with the corresponding profiles in the SFG, with the exception of some specifically enriched terms, such as the oxidoreductase complex (GO:1990204, p = 2.55E-02) in profile 0 and ATP hydrolysis-coupled proton transport (GO:0015991, p = 3.23E-05), regulation of GTPase activity (GO:0043087, p = 5.01E-04), and carbohydrate derivative binding (GO:0097367, p = 5.15E-03) in profile 7 (Fig. 4c).

Previous results have shown that genes in profile 0 are associated with fiber initiation. Therefore, DEGs between SFG-0 and LFG-0 (Fig. 3, 354 + 419 DEGs) together with genes within SFG-profile 0 and LFG-profile 0 were analyzed (Fig. 5a). A total of 196 genes (90 + 32 + 74) were downregulated in the SFG or LFG and differentially expressed during the fiber initiation stage (Fig. 5a). Similarly, we analyzed the DEGs between SFG-F10 and LFG-F10 (Fig. 3, 445 + 358 DEGs) together with the genes within SFG-profile 7 and LFG-profile 7 (Fig. 5b). There were 152 genes (65 + 27 + 60) upregulated in the SFG or LFG and differentially expressed during the fiber elongation stage (Fig. 5b). To confirm which metabolic pathways the DEGs were involved in, Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was then performed on 196 and 152 DEGs (Additional file 10 and Fig. 5c, d). Twenty-two DEGs participated in different metabolic pathways during the fiber initiation (Fig. 5c), while 31 DEGs were active during the fiber elongation (Fig. 5d). Functional annotations of these 53 DEGs were obtained by homologous comparison to entries within The Arabidopsis Information Resource database (TAIR, http://www.arabidopsis.org/). Six different KEGG classifications were obtained, each containing at least one DEG (Table 3).

Fig. 5
figure 5

Summary of DEGs identified in this study. a Venn diagram showing the number of DEGs shared between profile 0 in the SFG and LFG and LFG-0 vs SFG-0. b Venn diagram showing the number of DEGs shared between profile 7 in the SFG and LFG and LFG-F10 vs SFG-F10. c KEGG pathway annotation of 196 DEGs (90 + 32 + 74) from A. d KEGG pathway annotation of 152 DEGs (65 + 27 + 60) from B

Table 3 DEGs with homologs in Arabidopsis according to KEGG pathway analysis

Twelve genes, which might be related to fiber development, were selected for expression analysis on the basis of their functional annotation. After they were clustered by rows, the expression level of these 12 genes could be divided into three types (Fig. 6a). The first type showed a regular pattern in which the expression levels of the DEGs were downregulated, and the expression levels were greater within LFG-0 than within SFG-0 (Fig. 6a, top). This type included three genes, TUB6 (Gh_A07G1077), TPS11 (Gh_D08G0936), and ERF1 (Gh_D11G0426). The second type showed that the expression levels of DEGs were upregulated, and the expression levels within LFG-F10 were greater than those within SFG-F10 (Fig. 6a, middle). This type included 6 genes: MES1 (Gh_D11G1910), TUB1 (Gh_D01G0939), TUA2 (Gh_D02G2420), RS5 (Gh_A05G3108) and two pectin lyase (PL) family genes (Gh_D11G0237 and Gh_D12G2437). As illustrated in our expression pattern analysis, profile 0 and profile 7 were related to fiber initiation and elongation, respectively. Therefore, the highly expressed genes in the LFG might have more promoting effects. The third type was similar to the second type, but the expression levels within SFG-F10 were greater than those within LFG-F10 (Fig. 6a, bottom). PER64 (Gh_A02G1663), Gh_A11G0965 (glycosyl hydrolase) and Gh_A05G3328 (PL) were included in this type, indicating that these three genes may negatively regulate fiber elongation.

Fig. 6
figure 6

Expression analysis and verification of DEGs. a A heat map of 12 DEGs from profile 0 and profile 7. b Correlations between qPCR and RNA-seq results for the 12 selected genes. Each point represents a fold change in expression level at F10 compared with 0. The fold change values were log2 transformed

RNA-seq validation by qPCR

To validate the RNA-seq data, the 12 genes shown in Fig. 6a were selected for qRT-PCR analysis. A very significant correlation between the transcriptome sequencing and qPCR data was presented (Fig. 6b). Although there were certain differences in expression levels, the expression trends were essentially the same, indicating that the data obtained from the transcriptome sequencing in this study were credible.

Discussion

Differential energy distribution between the SFG and LFG lines

Any life activity requires energy, and the process of fiber development is no exception [22, 44]. During fiber initiation and rapid elongation, the number of transcribed genes and their expression level may also be affected by the energy distribution in fiber cells. Yoo et al. proposed a model in which carbon resources might be reallocated in cotton fibers by domestication [45]. The carbon resource reallocation hypothesis was well supported by a proteomic analysis of developing fibers [41].

In our electron and light micrographs, differences in fiber initiation between the SFG and LFG lines were demonstrated (Fig. 1). Subsequent gene expression pattern analysis indicated that the DEGs involved in fiber initiation in the SFG were more than in the LFG (Fig. 4, profile 0). Although LFG lines had a lesser fiber number than SFG lines, their FL was longer (Fig. 2), which was positively correlated with the DEGs involved in fiber elongation in the LFG (Fig. 4, profile 7). According to the hypothesis mentioned above, the SFG lines would “invest” more in fiber number, while the LFG lines would “invest” more energy in fiber length. Therefore, controlling the energy distribution in gene transcription may be critical for fiber development regulation.

Advances in the study of fiber development mechanisms by expression profile sequencing

A transcriptome can reflect the gene expression and regulatory rules in specific tissues at the whole-genome level. With the advancement of technology, the expression profile sequencing technology has progressed from cDNA array and microarray expression analyses to transcriptome sequencing, e.g., expressed sequence tags and RNA-seq. At the beginning of 2000, using a cDNA array, Li et al. identified 14 highly expressed cDNAs in fibers from wild-type and mutant cotton plants [46]. Shi et al. constructed a cDNA library for microarray expression analysis; these authors reported that the ethylene synthesis pathway was significantly upregulated during fiber elongation and that ethylene did promote this process [27]. Later, different researchers successively studied the changes in fiber expression profiles between different cotton species [47], during various stages [48], and under domestication [31]. These findings have deepened our understanding of genes related to fiber development.

The DNA microarray method is based on the hybridization of specific probes and is subject to large background interference, which limits the number of identifiable expressed genes. As the cost of sequencing has decreased, expression profiling has entered the era of transcriptome sequencing. Compared to DNA microarrays, transcriptome sequencing has higher throughput, lower background interference, and a wider range of gene expression detection. Using DNA microarray technology, Rapp et al. compared the expression levels of 40,430 genes involved in the development of two cotton species and found that domestication altered the expression of nearly a quarter of those genes [31]. Later, the team used transcriptome sequencing to compare and analyze the transcription levels of multiple upland cotton cultivars and wild species and found that almost one-third of the genes are expressed in the fibers and that the expression of approximately 5000 genes has changed as a result of domestication [45].

With the advancement of sequencing technology and the updating of sequencing platforms, the cost of RNA-seq technology has been decreasing, and the quality of sequencing is becoming increasingly reliable. In particular, the completion of the TM-1 genome sequencing [32, 33] has provided an improved reference genome for upland cotton research, making RNA-seq the most common, conventional and effective means to study upland cotton. The present study used the most recent RNA-seq technology and obtained high-quality results with excellent clean read rates, mapping rates, GC contents, Q30 values and so forth (Additional file 3). Twenty-two DEGs involved in fiber initiation and 31 DEGs involved in fiber elongation were detected here. Some of these have been reported to be related to fiber development, whereas some are new discoveries and are discussed separately below.

Ethylene metabolism and fiber development

Ethylene is a key regulator of cotton fiber cell growth [22, 27, 28]. The expression patterns of ethylene metabolism-related genes clearly differ between cotton species during fiber development [28, 29]. Ethylene plays a two-way regulatory role in fiber development; i.e., too much or insufficient amounts will inhibit fiber elongation, which is one of the reasons for the differences in FL [32]. In the present study, an ethylene response element binding factor (ERF1) was identified, and its expression level significantly differed between the two groups (Fig. 6a). The KEGG enrichment results revealed that this gene plays a role in the ethylene regulatory network of plant hormone signal transduction. The upstream factor of ERF1 is an ethylene-insensitive protein (EIN3). Members of the EIN3 family of nuclear-localized proteins are essential the plant response to gaseous ethylene. The activation of various ethylene response genes depends on the expression of ERF1 [49]. ERF1 acts on downstream DNA directly, thus affecting plant growth and development. As shown in Fig. 6a, expression of the ERF1 gene (Gh_D11G0426) was gradually downregulated in both the SFG and LFG but was highly expressed in the LFG. This finding indicated that, compared with that in the SFG, the response to ethylene in the LFG was faster and more efficient, which may be the main reason for the longer fibers in 601 LSC and J02–508.

Tubulin and fiber development

Microtubules are an integral part of the cytoskeleton and are widespread throughout the cytoplasm. They are found in eukaryotic cells and are formed by the polymerization of globular α-tubulin and β-tubulin dimers. Microtubules, an essential structural component of fiber cells, participate in the maintenance of cell structure and form the cytoskeleton together with microfilaments [50]. When the assembly of microtubules was blocked with colchicine during fiber elongation, a large number of microtubule fragments and disordered microfibrils were visible; compared with that of the control cells, the cell wall surface of these treated cells became wrinkled, and fiber elongation was significantly inhibited [51]. In an in vitro ovule culture experiment, fiber production significantly decreased in the presence of sulfamethoxazole (a microtubule-disrupting agent) and significantly increased when treated with paclitaxel (a microtubule-stabilizing agent). The abovementioned inhibition or promotion occurred during the early developmental stage (1–3 DPA); after that stage, fiber morphology can be altered by chemical agents, but fiber numbers cannot, indicating that tubulin may play an important role in fiber initiation [52]. The tubulin content in 8–28 DPA fibers was measured using monoclonal antibodies to α- and β-tubulin, and the results showed that the content increased approximately threefold at 10–20 DPA. After 20 DPA, the proportion of tubulin content relative to the total fibrin plateaued or slightly decreased, indicating that a rapid increase in tubulin was associated with rapid fiber elongation [53]. Comparative proteomic profiling between a short-fiber mutant (Li1) and its wild type revealed that some cytoskeleton-related proteins in the Li1 mutant were significantly reduced and that the actin cytoskeleton structure was severely distorted; as a result, vesicle trafficking was therefore blocked, suggesting that the short fibers of the Li1 mutant were associated with tubulin defects [54].

In this study, three tubulin-encoding genes (TUB6, TUB1, TUA2) were identified (Fig. 6a). TUB6 was highly expressed at 0 DPA, while TUB1 and TUA2 were highly expressed at 10 DPA, indicating that these genes play roles in fiber initiation and elongation, respectively. KEGG analysis revealed that TUA2 and TUB1 encode α-tubulin and β-tubulin, and their dimers can aggregate to form microtubules. Both proteins were expressed at higher levels in the LFG than in the SFG, which provided adequate microtubule protein for fiber elongation. This finding also explains why the fiber quality was better in the LFG (601 LSC and J02–508) than in the SFG lines.

Peroxidase (POD) and fiber development

Peroxisomes, which are vesicles surrounded by a single membrane, are ubiquitous in various types of eukaryotic cells. Enzyme activity constantly changes during plant growth and development. In plant tissues, POD can convert some carbohydrates into lignin, increasing the degree of lignification; therefore, enzyme activity is generally greater in aging tissues than in young tissues. The marker enzyme of peroxisomes is catalase, whose main role is to hydrolyze hydrogen peroxide (H2O2). Numerous studies have shown that ROS play a complex regulatory role in fiber development. In in vitro ovule culture experiments, H2O2 was shown to be involved in a feedback regulatory mechanism of ethylene biosynthesis, which regulates cotton fiber development [55]. Overexpression of the GhCaM7 gene promoted fiber elongation, and the concentration of ROS in the GhCaM7-overexpressing line was greater than that in the wild type, indicating that ROS are key regulators involved in fiber elongation [23]. Liu et al. detected a dramatic increase in ROS on the − 3 and − 2 DPA ovule surfaces in upland cotton fibreless mutants, indicating that homeostasis of ROS may play a key role in the regulatory mechanisms of cotton fiber development [56]. Virus-induced silencing of the GhPK6 gene in cotton plants resulted in increased FL and a decreased accumulation of ROS, indicating that, by regulating ROS, the cotton cytoplasmic pyruvate kinase gene GhPK6 may play an important role in cotton fiber elongation [57].

PER64 (Gh_A02G1663) is a member of the POD family of enzymes. POD can increase the degree of lignification in fiber cells. In the present study, the PER64 gene was highly expressed in the SFG (Fig. 6a), and its expression level gradually increased with development, revealing a significant negative regulation of fiber elongation. KEGG enrichment analysis revealed PER64 is involved in the phenylpropanoid biosynthesis process and catalyzes the synthesis of lignin from carbohydrates. Studies have shown that the phenylpropanoid biosynthesis pathway is relatively more active in wild cotton than in domesticated cotton and is not conducive to fiber elongation, so it has been subjected to great amounts of negative selection pressure during domestication [45, 58]. Therefore, active POD in the SFG may be a reason leading to the inferiority in FL of 69–6025-12 and Liao 1779.

Other fiber development-related genes

Three PL family genes (Gh_D11G0237, Gh_D12G2437, Gh_A05G3328) were differentially expressed during fiber elongation (Fig. 6a). Fiber primary cell walls contain high amounts of pectin, which can promote fiber elongation by regulating the production of cell wall polymers [29]. With changing pectin content, the homeostasis of cell wall polymers is altered by the differential expression of these three genes. These facts can somewhat explain the differences in FL between the SFG and LFG lines. Some carbohydrate metabolism-related genes were also identified (Fig. 6a). For example, the protein encoded by TPS11 (Gh_D08G0936) catalyzes the synthesis of trehalose from glucose, and that encoded by RS5 (Gh_A05G3108) catalyzes the production of raffinose from galactosides and sucrose. Sugars not only provide energy for cell development but also serve as substrates for the biosynthesis of cell wall polymers such as cellulose and pectin; therefore, sugars are closely related to fiber development [21, 59].

In total, 53 DEGs were identified through the KEGG enrichment analysis (Table 3). The discussion of the genes with known functions above indicates that our RNA-seq results are reliable. The remaining genes in Table 3 would be good candidates for revealing the mechanisms of fiber initiation and elongation.

Conclusions

In this study, differences in the fiber initiation and elongation stages between long- and short-fiber cotton lines were determined based on phenotypic observations. The developmental dynamics of the cotton fiber transcriptomes of these lines was the comparatively analyzed with RNA-seq. Gene expression pattern analysis of the DEGs between developmental stages revealed that profile 0 and profile 7 were enriched both in the SFG and LFG. Twenty-two genes were identified from profile 0 (related to fiber initiation), in which ERF1 was involved in ethylene metabolism and was expressed at a higher level in the LFG than in the SFG, which affected fiber initiation. Thirty-one genes were identified from profile 7 (related to fiber elongation), in which TUA2 and TUB1 were involved in microtubule synthesis and were expressed at a higher level in the LFG than in the SFG, while PER64, which encoded a POD, was expressed at a higher level in the SFG. From a functional standpoint, microtubules were the exact structural component necessary for fiber elongation, and by ultimately accelerating cell lignification, POD was not conducive to fiber elongation. Therefore, the differential expression of these genes may be the main reason leading to phenotypic differences in the fibers.

The results of this study have increased the understanding of relevant metabolic pathways involved in the process of fiber initiation and elongation. It is the first step in screening candidate genes for genetic improvement of cotton fiber. This work supplements current knowledge in this area and can guide future research.