Background

Auxin, widely distributed in higher plants, influences nearly all aspects of plant growth and development by regulating cell division, expansion, differentiation, and patterning [1]. Transcript studies have revealed that early auxin-responsive genes are induced by auxin within minutes [2]. Most early auxin-responsive genes are classified into three families: Aux/IAA, Gretchen Hagen3 (GH3), and small auxin-up RNA (SAUR) [3]. Among these early auxin-responsive genes, SAURs are considered to be the most abundant. Apart from transcriptional regulation by auxin, many SAURs are also regulated post-transcriptionally by a conserved downstream destabilizing element (DST) in the 3′-untranslated region that confers high mRNA instability [4].

Since the first SAUR gene was identified in elongating soybean hypocotyl sections [5], members of this gene family have been identified by genome-wide analyses in diverse plant species, such as Arabidopsis [3], rice [6], sorghum [7], tomato [8], potato [8], maize [9], citrus [10], and ramie [11]. In total, more than 674 SAUR genes have been identified from different species, but only a small portion of them have been functionally characterized. In Arabidopsis, the overexpression of AtSAUR19 subfamily proteins (AtSAUR19 to AtSAUR24) with an N-terminal tag resulted in root waving, increased hypocotyl elongation and leaf size, defective apical hook maintenance, and altered tropic responses [12]. A further analysis showed that AtSAUR19 stimulated plasma-membrane (PM) H+-ATPase activity to promote cell expansion by inhibiting the PP2C-D phosphatases [13, 14]. Proteins of another subfamily, AtSAUR63 (AtSAUR61 to AtSAUR68 and AtSAUR75), also led to long hypocotyls, petals and stamen filaments in transgenic Arabidopsis [15]. AtSAUR36 has been reported to promote leaf senescence [16]. In rice, the up-regulation of the OsSAUR39 gene negatively regulated auxin biosynthesis and transport [17]. Three other SAUR proteins, i.e., AtSAUR76, AtSAUR77 and AtSAUR78, affected ethylene receptor signaling and promoted plant growth and development upon auxin responses [18]. Recently, many light-induced and/or light-repressed SAUR genes were reported to mediate differential growth of cotyledons and hypocotyls [19]. These results show that SAUR proteins regulate diverse aspects of plant growth and development.

Cotton is one of the most important economic crops in the world because it produces a natural textile fiber, a highly elongated cell derived from the ovule epidermis. Previous reports have shown that auxin plays an important role in fiber development [20, 21]. Specifically, the number of lint fibers was significantly increased when the iaaM gene was overexpressed during fiber initiation in ovules [22]. However, no SAUR genes have been reported in cotton thus far. The genome sequences of the two tetraploid cotton species Gossypium hirsutum (AD1) [23, 24] and G. barbadense (AD2) [25, 26] and of the closest extant relatives of their diploid progenitors, G. raimondii (D5) [27, 28] and G. arboreum (A2) [29], provide an important genomic resource for genome-wide analyses of gene families and other related genetic studies [30,31,32]. In this study, a genome-wide identification of SAUR genes in the four species with currently sequenced genomes was performed to characterize the SAUR gene family with respect to its structural, genomic and gene expression features. The results obtained in this study will provide new data for further studies on auxin signaling in cotton growth and development.

Methods

Gene retrieval and characterization analysis

The genome sequences of G. raimondii (JGI_v2.1), G. arboreum (BGI_v2.0), G. hirsutum acc. TM-1 (NAU-NBI_v1.1) and G. barbadense acc. 3-79 (HAU-NBI_v1.0) were downloaded from the CottonGen website [33]. To identify potential SAUR proteins in the four cotton species, all the SAUR amino acid sequences from Arabidopsis, rice, sorghum, tomato, potato, maize, citrus and ramie were used as queries in local BLAST searches (with an E-value cutoff of 1e-5) against the genome database of each of the four cotton species. Next, candidate sequences were inspected with HMMER 3.0 using the HMM profile of the auxin-inducible signature domain (PF02519) and the Pfam database [34] to confirm the presence of the conserved SAUR domain. Then, the ProtParam tool [35] was used to analyze the physicochemical parameters (i.e., length, molecular weight, and isoelectric point) of the SAUR proteins. Subcellular localization prediction was conducted using the CELLO v2.5 server [36].
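The E-value filtering step can be illustrated with a minimal Python sketch. It assumes that the BLAST results were saved in the standard 12-column tabular format; the function name, the sequence identifiers in the example rows, and the choice to treat the cutoff as inclusive are assumptions made for illustration and are not taken from the original analysis.

```python
def filter_blast_hits(lines, max_evalue=1e-5):
    """Return the set of subject IDs with at least one hit at or below max_evalue.

    Each line is expected in the 12-column tabular BLAST layout:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    candidates = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            candidates.add(subject_id)
    return candidates


# Hypothetical example rows (identifiers and scores are made up).
example = [
    "query_SAUR\tcotton_gene_A\t62.5\t88\t33\t0\t5\t92\t10\t97\t3e-30\t110",
    "query_SAUR\tcotton_gene_B\t31.0\t58\t40\t0\t1\t58\t1\t58\t0.2\t28.1",
]
print(sorted(filter_blast_hits(example)))
```

Candidates retained by such a filter would still need the domain check with HMMER and Pfam described above.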

Phylogenetic analysis

Multiple alignments of all the available and predicted full-length SAUR protein sequences were performed using ClustalX2, with manual adjustment where appropriate for the alignment of the SAUR domain. A phylogenetic tree was constructed using the Neighbor-Joining (NJ) method in MEGA 6.0 [37] with the pairwise deletion option and a Poisson correction model. For a statistical reliability analysis, bootstrap tests were performed with 1000 replicates.

Chromosomal locations and gene duplication analysis

The physical chromosome locations of all SAUR genes were obtained from the genome sequence databases. Mapchart 2.2 [38] software was used to generate the chromosomal location image. The predicted SAUR proteins were first aligned by ClustalW2 at EMBL-EBI (http://www.ebi.ac.uk/Tools/msa/clustalw2/) prior to a gene duplication analysis. Gene duplication events were defined according to the following conditions: the alignment region covered more than 80% of the longer gene and the identity of the aligned regions was over 80% [30].
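The duplication criteria above can be written as a small predicate. The following Python sketch assumes that the aligned length, the number of identical positions, and the two sequence lengths have already been taken from a pairwise alignment; the function and parameter names are hypothetical.

```python
def is_duplicated_pair(aligned_len, identical_positions, len_a, len_b,
                       min_coverage=0.80, min_identity=0.80):
    """Apply the two duplication criteria to one pairwise alignment.

    coverage = aligned length / length of the longer sequence
    identity = identical positions / aligned length
    Both must exceed their thresholds (strictly greater than 80% by default).
    """
    if aligned_len <= 0:
        return False  # nothing aligned, so the pair cannot qualify
    longer = max(len_a, len_b)
    coverage = aligned_len / longer
    identity = identical_positions / aligned_len
    return coverage > min_coverage and identity > min_identity


# Hypothetical numbers: 90 aligned positions, 81 identical, lengths 100 and 95.
print(is_duplicated_pair(90, 81, 100, 95))
```

With these made-up numbers, coverage and identity are both 90%, so the pair satisfies both criteria.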

Gene structure and conserved motif analysis

The gene exon/intron structure was analyzed using the Gene Structure Display Server (GSDS) [39] by comparing the cDNAs with their corresponding genomic DNA sequences. The Multiple Em for Motif Elicitation (MEME; version 4.11.2) program was used to analyze the protein sequences of the four cotton species G. raimondii, G. arboreum, G. hirsutum, and G. barbadense, designated GrSAURs, GaSAURs, GhSAURs, and GbSAURs, respectively. The following parameter settings were used: site distribution, zero or one occurrence per sequence; number of motifs, 5; and motif width, between 6 and 37 residues.

Search for upstream sequence elements

For the promoter analysis, 2000 bp of genomic DNA sequence upstream of the start codon (ATG) of each GhSAUR gene was extracted from the genome sequence. The PLACE database [40] was used to search for cis-acting regulatory elements in the putative promoter regions.

Gene expression analysis

The expression patterns of genes coding for GhSAURs in Upland cotton were analyzed in three developmental stages of two backcross inbred lines (BILs), NMGA-062 and NMGA-105 (0 days post anthesis (DPA) ovules, 3 DPA ovules, and 10 DPA fibers), and in two developmental stages of cultivar Xuzhou 142 and its fuzzless-lintless mutant Xuzhou 142 fl (−3 and 0 DPA ovules). The two BILs were derived from interspecific backcrossing for two generations between Upland SG747 as the recurrent parent and G. barbadense Giza75, followed by selfing, and they differed significantly in fiber length [41, 42]. All the genotypes were grown in the Experimental Farm, Institute of Cotton Research (ICR), Chinese Academy of Agricultural Sciences (CAAS), Anyang, Henan province, China. Flowers were tagged at 0 DPA, and samples were harvested at −3, 0, 3 and 10 DPA to dissect ovules at −3, 0, and 3 DPA and developing fibers at 10 DPA. Three biological replications with 15 flowers per replication were used for each sampling stage. Ovules or fibers were immediately frozen in liquid nitrogen after dissection in the field and stored at −80 °C until use. RNA from each tissue sample was extracted using an RNAprep Pure Plant kit per the manufacturer’s instructions (Tiangen, China). RNA quality and quantity were checked using an Agilent 2100 Bioanalyzer, and cDNA libraries were then constructed. Fragments of 200-700 nt in size were paired-end sequenced using an Illumina HiSeq 2000 platform at BGI (Shenzhen, China) following the manufacturer’s instructions. After removing adapter sequences and low-quality reads, including those with more than 5% unknown nucleotides or with more than 20% of nucleotides with a sequencing quality ≤10, the clean transcriptome sequencing data were submitted to the NCBI Sequence Read Archive (SRA) with the accession numbers SRP038911 and SRP039385 for NMGA-062 and NMGA-105, and accession number SRP056184 for Xuzhou 142 and Xuzhou 142 fl.
Fragments per kilobase of transcript per million fragments (FPKM) values were calculated to normalize the expression level of each expressed GhSAUR gene based on gene length and the number of mapped reads. The formula is \( \mathrm{FPKM} = \frac{10^{6} C}{N L / 10^{3}} \), where C is the number of fragments that are uniquely aligned to a specific gene, N is the total number of fragments that are uniquely aligned to all genes, and L is the number of bases in the specific gene. The expression profiles were clustered using the Cluster 3.0 software [43].
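As a worked illustration of the FPKM formula, the short Python sketch below evaluates it for one gene. The function name and the example numbers are hypothetical and serve only to show the arithmetic.

```python
def fpkm(fragments_on_gene, total_mapped_fragments, gene_length_bp):
    """FPKM = 10^6 * C / (N * L / 10^3).

    C: fragments uniquely aligned to the gene
    N: total fragments uniquely aligned to all genes
    L: gene length in bases
    """
    if total_mapped_fragments <= 0 or gene_length_bp <= 0:
        raise ValueError("N and L must be positive")
    return (1e6 * fragments_on_gene) / (total_mapped_fragments * gene_length_bp / 1e3)


# Hypothetical gene: 50 fragments, 20 million mapped fragments, 500 bp long.
print(fpkm(50, 20_000_000, 500))  # 5.0
```

In this example, 50 fragments on a 500-bp gene in a library of 20 million mapped fragments give an FPKM of 5.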

Quantitative RT-PCR analysis

To further characterize the expression of selected GhSAUR genes, ovaries (i.e., bolls) were collected at −3, 0, 3, 5, 7, 10, 15, 20, and 25 DPA from NMGA-062 only and dissected to obtain ovules at −3, 0 and 3 DPA and fibers at the other stages. The tissues were immediately frozen in liquid nitrogen and stored at −80 °C until used for total RNA extraction. Root, stem, and leaf samples were also harvested. For each tissue sample, three biological replications were used.

To study the responses of selected GhSAURs to IAA application, seed from cultivar CCRI 10 (G. hirsutum) was planted in potting soil at 25 °C in a culture room with a 16-h light/8-h dark cycle. Young plants at the four-true leaf stage were treated with 100 mM IAA, and control plants were sprayed with an equal volume of ddH2O. Leaves were then harvested at 0, 5, 10 and 30 min, and 1 h after the treatment for RNA extraction. Three biological replications (10 plants per replication) were used for each time point.

Total RNA was isolated from the tissue samples with a Tiangen RNAprep Pure Plant kit (Tiangen, China) according to the manufacturer’s instructions. First-strand cDNA was synthesized from total RNA using the PrimeScript® RT Reagent kit (Takara, Japan). The cDNA templates were then diluted 8-fold and used for quantitative RT-PCR (qRT-PCR). Gene-specific primers (Additional file 1: Table S1) were designed using the Primer 5.0 software. The histone-3 gene (AF024716) was used as the internal control, as it has commonly been used as a reference gene in numerous studies in plants, including cotton, to verify gene expression levels in various tissue samples [44, 45]. The qRT-PCR experiment with three replicates was performed in 96-well plates on a Mastercycler® ep realplex (Eppendorf, Germany) in a volume of 20 μL containing 10 μL of 2 × UltraSYBR Mixture (With ROX) (CWBIO, China), 6.2 μL of RNase-free water, 3 μL of cDNA template, and 400 nM of forward and reverse primers. The thermal cycling conditions were as follows: an initial denaturation step of 10 min at 95 °C, followed by 40 cycles of 15 s at 95 °C for denaturation, 25 s at 60 °C for annealing and 30 s at 72 °C for extension. A melting curve analysis was then performed. The relative expression levels of genes were calculated using the 2^−ΔΔCT method and normalized to the histone-3 gene. The results were statistically analyzed using a t test.
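The relative quantification step can be illustrated with a minimal Python sketch of the 2^−ΔΔCT calculation. The function name and the CT values are hypothetical; in practice the inputs would be the mean CT values of the target gene and of the histone-3 reference in the test sample and in the calibrator sample.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^-(ddCT) method.

    dCT (sample)  = CT target - CT reference, in the sample of interest
    dCT (control) = CT target - CT reference, in the calibrator sample
    ddCT          = dCT (sample) - dCT (control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical CT values: target 24 vs. reference 18 in the sample,
# target 26 vs. reference 18 in the calibrator.
print(relative_expression(24, 18, 26, 18))  # 4.0
```

Here ΔΔCT is −2, so the target gene is estimated to be expressed 4-fold higher in the sample than in the calibrator.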

Co-localization of SAUR genes with quantitative trait loci (QTL) for fiber length (FL) and single nucleotide polymorphism (SNP) identification of FL-QTL co-localized SAUR genes

To co-localize GhSAURs with QTL for FL, we downloaded molecular markers for FL QTL hotspots reported in interspecific G. hirsutum × G. barbadense populations [46] and markers for 4 FL QTLs reported in a BIL population containing the two BILs (NMGA-062 and NMGA-105) differing in fiber length [41]. Based on the locations of the anchoring markers in the G. hirsutum TM-1 genome [24], the chromosomal positions of the corresponding FL QTL hotspots and of the 4 FL QTL regions identified in the BIL population were determined. SAUR genes whose chromosomal locations fell within the FL QTL regions (25 cM) were considered to be co-localized with the FL QTL.
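Once the QTL regions have been placed on the genome, co-localization reduces to an interval check. The Python sketch below shows one way to do this, assuming that both genes and QTL regions are given as physical intervals on named chromosomes; the function name, the gene and chromosome names, and all coordinates are hypothetical and are not values from this study.

```python
def colocalized_genes(genes, qtl_regions):
    """Return names of genes that overlap any QTL region on the same chromosome.

    genes:       iterable of (name, chromosome, start, end)
    qtl_regions: iterable of (chromosome, start, end)
    """
    hits = []
    for name, chrom, start, end in genes:
        for q_chrom, q_start, q_end in qtl_regions:
            # Two intervals overlap when each starts before the other ends.
            if chrom == q_chrom and start <= q_end and end >= q_start:
                hits.append(name)
                break  # one overlapping region is enough
    return hits


# Hypothetical coordinates for illustration only.
genes = [
    ("gene_1", "chrX", 1_200_000, 1_200_600),
    ("gene_2", "chrX", 9_500_000, 9_500_450),
    ("gene_3", "chrY", 1_250_000, 1_250_500),
]
qtl_regions = [("chrX", 1_000_000, 2_000_000)]
print(colocalized_genes(genes, qtl_regions))  # ['gene_1']
```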

To identify SNPs, FL-QTL co-localized SAUR genes in the allotetraploid G. hirsutum genome [24] were compared with homologous/allelic genes from two G. barbadense genomes [25, 26], the G. raimondii genome [27], and the G. arboreum genome [29] using a local Blastn program. Then, genes from the A2 genome and the At subgenome of AD1 and AD2 were aligned with ClustalX2 using default parameters, as were the genes from the D5 genome and the Dt subgenome of AD1 and AD2. SNPs between allotetraploids and diploids were identified manually for each QTL co-localized SAUR gene.
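SNPs were identified manually in this study; purely as an illustration of the comparison involved, the Python sketch below lists the mismatched positions between two sequences that have already been aligned to the same length. The function name and the example sequences are hypothetical, and alignment gaps are skipped because they represent indels rather than SNPs.

```python
def find_snps(aligned_a, aligned_b, gap="-"):
    """List (position, base_a, base_b) for mismatches between two aligned sequences.

    Positions are 1-based alignment columns. Columns where either sequence
    has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    snps = []
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    for pos, (a, b) in enumerate(pairs, start=1):
        if a != b and gap not in (a, b):
            snps.append((pos, a, b))
    return snps


# Hypothetical aligned fragments.
print(find_snps("ATGGCTAAG-TC", "ATGACTAAGCTT"))
# [(4, 'G', 'A'), (12, 'C', 'T')]
```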

Results

Identification of the SAUR gene family in two diploid and two allotetraploid cotton species

With all the SAUR amino acid sequences from Arabidopsis (67), rice (56), sorghum (71), tomato (99), potato (134), maize (79), citrus (70) and ramie (71) as queries, a BLAST search against the CDS data identified a total of 157, 107, 227, and 192 predicted SAUR sequences in G. raimondii, G. arboreum, G. hirsutum, and G. barbadense, respectively. The putative SAUR members were then analyzed for conserved domains using HMMER and Pfam, leading to the identification of 145, 97, 214, and 176 SAUR genes in G. raimondii, G. arboreum, G. hirsutum, and G. barbadense, respectively (Table 1; Additional files 2, 3 and 4: Table S2, S3, S4). These genes were named consecutively from GrSAUR1 to GrSAUR145, GaSAUR1 to GaSAUR97, GhSAUR1 to GhSAUR214, and GbSAUR1 to GbSAUR176 for the four species, respectively, according to the order of their chromosomal locations. About 97.5% of the 632 identified SAUR genes encode proteins ranging from 64 to 185 amino acids (AA); the exceptions were 16 genes encoding proteins of less than 64 or more than 185 AA (GrSAUR140 encoding a protein with 53 AA, and GbSAUR31, GrSAUR109, GbSAUR27, GbSAUR158, GrSAUR40, GbSAUR91, GhSAUR126, GrSAUR134, GbSAUR161, GaSAUR43, GbSAUR13, GrSAUR138, GbSAUR125, GhSAUR114 and GbSAUR34 encoding proteins with 190, 198, 199, 204, 206, 226, 228, 232, 232, 232, 239, 240, 242, 252, 295 and 376 AA, respectively). The predicted SAUR genes encode proteins with predicted molecular weights (Mw) ranging from 6.06 to 43.11 kDa and theoretical isoelectric points (pI) varying between 4.69 and 11.42. The subcellular localization prediction showed that 397 of the 632 SAUR proteins were predicted to be located in the nucleus, while the others were predicted to be localized to the plasma membrane, cytoplasm, mitochondria, chloroplast, or extracellular space.

Table 1 The SAUR gene family in Gossypium raimondii

Phylogenetic analysis of the SAUR gene family

To study the phylogenetic relationships of the SAUR family, we generated a phylogenetic tree from 647 SAUR protein sequences from Arabidopsis, rice, maize, tomato, potato, sorghum, citrus, and ramie and the 632 cotton SAURs. As shown in Fig. 1, the SAUR proteins can be placed into 10 distinct groups, designated groups I to X, based on their sequence similarities with orthologs in other plants. Groups I and VII contained 365 and 237 SAUR members, respectively, and constituted the two largest groups in the phylogeny, while groups III and V contained only 24 and 30 members, respectively. Each group contained SAURs from at least 9 plant species. Interestingly, group VI contained SAURs only from dicot species, i.e., G. raimondii, G. arboreum, G. hirsutum, G. barbadense, Arabidopsis, tomato, potato, citrus, and ramie, as seen in Additional file 5: Table S5.

Fig. 1

Phylogenetic tree of SAUR proteins from Gossypium raimondii, G. arboreum, G. hirsutum, G. barbadense, Arabidopsis, rice, maize, tomato, potato, sorghum, citrus, and ramie. The phylogenetic tree was generated using MEGA 6.0 with the Neighbour-Joining (NJ) method with 1000 bootstrap replicates. Differently colored lines mark groups I - X of the SAURs

Chromosomal distribution and duplication events of SAUR genes in cotton

Using the genome sequences of the four cotton species as references, the 632 identified SAUR genes were mapped onto chromosomes or scaffolds. Of these, 543 SAURs were assigned to chromosomes, while the remaining 89 SAURs were located on unmapped scaffolds (Fig. 2). Of the 133 GrSAURs mapped to the 13 chromosomes of G. raimondii, chromosome D5_chr5 (with 31 genes) and chromosome D5_chr1 (with 21 genes) had the largest numbers of SAUR genes, while D5_chr3 had only 1 SAUR gene. The 97 predicted SAUR genes in G. arboreum were also distributed unevenly across its 13 chromosomes, with chromosome A2_chr5 (similar to the homoeologous chromosome D5_chr5 in G. raimondii) harboring the most genes (31) and chromosome A2_chr4 harboring the fewest (2). A total of 185 GhSAUR genes were mapped to 25 chromosomes of the G. hirsutum genome, with 1 to 34 genes per chromosome; no SAUR genes were identified on chromosome AD1_A02 in the At subgenome. Similarly, SAUR genes were clustered mainly on chromosomes AD1_A03, AD1_D02, AD1_D05, and AD1_D13. A total of 128 GbSAUR genes were distributed unevenly over the G. barbadense chromosomes, except for chromosome AD2_A02. The number of genes per chromosome ranged from 1 (chromosome AD2_A11) to 18 (chromosome AD2_D02).

Fig. 2

Distribution of SAUR genes on Gossypium raimondii (a), G. arboreum (b), G. hirsutum (c), and G. barbadense (d) chromosomes. The scale represents megabases (Mb). The chromosome numbers of G. raimondii (D5_chr1 - D5_chr13), G. arboreum (A2_chr1 - A2_chr13), G. hirsutum (AD1_A01 - AD1_A13, AD1_D01 - AD1_D13), and G. barbadense (AD2_A01 - AD2_A13, AD2_D01 - AD2_D13) are indicated above each vertical bar

To elucidate the expansion mechanism of the SAUR gene family, we analyzed gene duplication events, including tandem duplication and segmental duplication, in the four cotton species. After multiple and pairwise alignments of GrSAURs, GaSAURs, GhSAURs and GbSAURs, we chose the paralogous genes using the criteria described in a previous study [30]. As a result, 98, 54, 187 and 144 putative paralogous SAUR genes with high gene identity and similarity were found, accounting for 67.6%, 55.7%, 87.4% and 81.8% of the entire SAUR gene family in G. raimondii, G. arboreum, G. hirsutum, and G. barbadense, respectively. We observed that both tandem duplication and segmental duplication events contributed to the expansion of the SAUR gene family in cotton. The details of the duplicated gene pairs are listed in Additional file 6: Table S6. In G. raimondii, 14 pairs of tandem duplication and 21 pairs of segmental duplication events were detected. The clusters of tandem duplication were mainly on chromosome D5_chr5. In G. arboreum, 3 clusters of SAUR genes were produced by tandem duplications, and the clusters were distributed on the same chromosome (i.e., A2_chr5). In the same species, 17 pairs of SAUR genes were produced by segmental duplications, and the two genes in each pair were from different chromosomes. In G. hirsutum, 9 clusters of genes with tandem duplications were detected, mainly on chromosomes AD1_A03 and AD1_D02, and 75 pairs of genes with segmental duplication events were detected, with the two genes of each pair distributed on different chromosomes. The vast majority of GbSAUR genes in G. barbadense were found to have arisen from tandem duplication or segmental duplication events. Twenty-three pairs of tandem duplications were clustered mainly on chromosomes AD2_A05 and AD2_D02, and 23 gene pairs with segmental duplications were all distributed on different chromosomes. Interestingly, most of the tandem duplication events in GhSAURs occurred on chromosome AD1_D02.
It is apparent that homoeologous chromosome 5 in G. raimondii (D5) and G. arboreum (A2), and chromosome D02 in G. hirsutum (AD1) and G. barbadense (AD2), all harbored more SAUR genes derived from tandem duplications. Chromosome D02 from AD1 and AD2 may also be homoeologous to A2_chr5. The tandem duplication gene pairs are shown in Fig. 3, except for genes that were not localized to a specific chromosome.

Fig. 3

Circos diagram of paralogous gene pairs identified in GrSAURs, GaSAURs, GhSAURs, and GbSAURs. The chromosomes of Gossypium raimondii, G. arboreum, G. hirsutum, and G. barbadense are filled with red, green, blue, and purple colors, respectively. A line between two genes indicates a paralogous pair

Gene structure and conserved motifs

To gain insight into the diversification of the SAUR genes in cotton, the exon/intron organization and conserved motifs were further analyzed. Based on the evolutionary relationships (Fig. 4a, Additional files 7, 8 and 9: Figure S1A, S2A, S3A), the detailed structural features of the SAUR genes are shown in Fig. 4b and Additional files 7, 8 and 9: Figure S1B, S2B, S3B. In general, about 90.7% of the SAUR genes lacked introns; 19 genes in GrSAURs, 4 genes in GaSAURs, 11 genes in GhSAURs, and 19 genes in GbSAURs each had 1 intron. Only 4 genes, namely GrSAUR73, GaSAUR76, GhSAUR114, and GbSAUR34, each had 2 introns, and another 2 genes (GrSAUR109 and GbSAUR13) each had 3 introns. Conserved motifs in the 632 SAUR proteins were identified using the MEME online tool (Fig. 4c, Additional files 7, 8 and 9: Figure S1C, S2C, S3C). Motifs 1, 2, and 3 constitute the conserved SAUR-specific domain of approximately 60 residues in the central region of the sequences and were identified in most of the predicted SAUR proteins. Motifs 4 and 5 were present in 39.7% and 25.8% of the SAUR members, respectively, suggesting that these features might contribute to some specific functions in the SAUR family.

Fig. 4

Phylogenetic relationships, gene structure and motif compositions of the Gossypium raimondii SAUR genes. a The phylogenetic tree was constructed using MEGA 6.0 with the Neighbour-Joining (NJ) method with 1000 bootstrap replicates. b Exon/intron structures of SAUR genes from G. raimondii. The introns, CDS and UTRs are represented by black lines, green and blue boxes respectively. The scale bar represents 0.5 kb. c Protein motif. Each motif is represented in the colored box

Similar to the histidine-rich (H-rich) regions found in Arabidopsis, sorghum, tomato, and potato [3, 7, 8], H-rich regions were identified in the sequences of 39 predicted cotton SAUR genes: 7 GrSAURs, 6 GaSAURs, 14 GhSAURs, and 12 GbSAURs. The multiple alignments of these 39 SAUR proteins are shown in Additional file 10: Figure S4; the H-rich regions were located in the N-terminal sequence, the C-terminal sequence, or both.

Promoter regions of GhSAUR genes

Cis-acting regulatory DNA elements within the promoter regions (2.0 kb upstream of the start codon) of 165 randomly chosen GhSAUR genes were scanned using the PLACE database. The results revealed that the promoters of the SAUR gene family contain numerous DNA elements predicted to be cis-elements related to auxin signal transduction. At least one of the seven major auxin-responsive cis-elements (S000024, S000026, S000234, S000270, S000273, S000360, and S000370) was found in the promoter regions of all but 12 of the predicted GhSAUR genes. Another two regulatory sequences, i.e., the Ca2+-responsive cis-element (S000501) and the calmodulin-binding/CGCG box (S000507), were found in 47 predicted GhSAUR genes (Additional file 11: Table S7).

Responses of SAUR genes in leaves to an exogenous IAA application

The expression of SAUR genes is regulated at multiple levels in other reported species [3]. We analyzed the expression of 16 GhSAUR genes in leaves under exogenous IAA treatment (Fig. 5). Eleven of these genes were up-regulated from 5 min to 1 h after the IAA treatment, while 3 genes (GhSAUR56, GhSAUR61, and GhSAUR163) were down-regulated by the exogenous IAA application. Another 2 genes (GhSAUR63 and GhSAUR181) showed relatively stable expression regardless of IAA treatment. Therefore, the response to IAA treatment in leaves varies among SAUR genes.

Fig. 5

Expression patterns of GhSAUR genes in leaves under an IAA treatment. The x-axis represents the time in minutes (0, 5, 10, 30, and 60) after IAA treatment, and the y-axis indicates the relative expression levels. Error bars show the standard deviation of three biological replicates

Expression characterization of GhSAUR genes in developing ovules and fibers

The RNA-seq transcriptome data from the two backcross inbred lines (BILs) NMGA-062 and NMGA-105 at different developmental stages (0 DPA and 3 DPA ovules, and 10 DPA fibers) and from Xuzhou142 and its Xuzhou142 fl mutant at −3 and 0 DPA ovules were used to analyze the expression patterns of candidate GhSAUR genes in Upland cotton. NMGA-062 had a greater fiber length than NMGA-105. Among the 214 GhSAURs, 72 genes had an FPKM ≥1 in at least one of the three developmental stages of the two BILs and were used to analyze the relative expression of each gene. Based on a cluster analysis, these SAUR genes showed four major patterns (Fig. 6a). The first group was composed of 20 genes showing an overall higher expression level in 0 DPA and 3 DPA ovules and 10 DPA fibers when compared with other genes. In the second group, 25 genes showed a progressive decrease from 0 DPA to 10 DPA, while 14 genes in the third group increased from 0 DPA to 10 DPA. The other 13 genes, in group four, showed an overall lower expression level. Using an absolute log2-fold change ≥1 together with statistical significance as the criterion for differentially expressed genes (DEGs), we found that 6 GhSAUR genes were DEGs when the two BILs were compared at the same developmental stage, while 30 GhSAUR genes were DEGs among the three developmental stages (Fig. 6c). As shown in Fig. 6, common DEGs were detected among different comparisons.
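The fold-change part of the DEG criterion can be sketched in Python as follows. The significance test itself is not reproduced here and is passed in as a precomputed flag; the function name, the example FPKM values, and the small pseudocount used to avoid division by zero are all assumptions of this sketch rather than details of the original analysis.

```python
import math


def is_deg(fpkm_a, fpkm_b, significant, min_abs_log2fc=1.0, pseudocount=0.01):
    """Flag a gene as differentially expressed between conditions A and B.

    A gene qualifies when |log2(A / B)| >= min_abs_log2fc and the comparison
    was already judged statistically significant by a separate test.
    The pseudocount (an assumption of this sketch) keeps the ratio defined
    when one FPKM value is zero.
    """
    log2fc = math.log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))
    return significant and abs(log2fc) >= min_abs_log2fc


# Hypothetical FPKM values.
print(is_deg(8.0, 2.0, significant=True))   # True: roughly a 4-fold difference
print(is_deg(3.0, 2.0, significant=True))   # False: less than 2-fold
print(is_deg(8.0, 2.0, significant=False))  # False: not significant
```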

Fig. 6

Expression profiles of GhSAUR genes based on RNA-seq data of two backcross inbred lines (BILs) and Xuzhou142 (WT) and Xuzhou142 fiberless and fuzzless (fl) mutant. a Transcript levels of 72 GhSAURs in three stages of two BILs. b Transcript levels of 40 GhSAURs in −3 and 0 DPA ovules of Xuzhou142 (WT) and its fl mutant. c The number of differentially expressed SAUR genes between different stages of BILs NMGA-062 with longer fibers (L) and NMGA-105 with shorter fibers (S). For example, L0 represents 0 DPA ovules of NMGA-062. d The number of differentially expressed SAUR genes between different stages of Xuzhou142 (WT) and Xuzhou142 fl mutant (fl). For example, WT_-3 represents −3 DPA ovules of Xuzhou142

In another RNA-seq transcriptome profiling experiment, only 40 SAUR genes were found to be expressed in −3 or 0 DPA ovules of Xuzhou142 and its fl mutant (Fig. 6b). Genes in group one showed a relatively higher level of expression than those in group two. Of the 10 genes showing significant differential expression, 2 and 4 DEGs were detected between Xuzhou 142 (WT) and its fl mutant in −3 and 0 DPA ovules, respectively, and 6 and 5 DEGs were detected between −3 and 0 DPA ovules in Xuzhou 142 (WT) and its fl mutant, respectively. Interestingly, 3 of the 5-6 DEGs between −3 and 0 DPA ovules were common to the two genotypes (Fig. 6d).

To further study the expression profiles based on RNA-seq, quantitative RT-PCR (qRT-PCR) was conducted for 12 SAUR genes in 5 organs and 8 fiber developmental stages of NMGA-062 (Fig. 7). The results showed that some SAUR genes exhibited diverse expression profiles, while others showed similar expression patterns. Specifically, four genes, GhSAUR62, GhSAUR158, GhSAUR126, and GhSAUR90, were highly expressed exclusively in stems or flowers. The expression of GhSAUR110, GhSAUR33, GhSAUR26, GhSAUR65, GhSAUR72, and GhSAUR181 increased from −3 DPA, peaked in fibers at 10 or 15 DPA, and then decreased in 20 DPA fibers. The expression of another two genes, GhSAUR63 and GhSAUR56, increased from −3 to 25 DPA, with high expression in 25 DPA fibers.

Fig. 7

Expression patterns of GhSAUR genes in different tissues and developmental stages of NMGA-062 based on quantitative RT-PCR (qRT-PCR). The x-axis represents different developmental stages (−3, 0, and 3 DPA ovules; 5, 10, 15, 20, and 25 DPA fibers; Ovule, 10 DPA ovule; Root; Stem; Leaf; Flower), and the y-axis indicates the relative expression levels as determined by qRT-PCR. The error bars shown are the standard deviation of three biological replicates

Co-localization of SAURs with QTL for FL

To better understand the potential function of SAUR genes related to fiber length (FL), we co-localized the SAUR genes with reported FL quantitative trait loci (QTL). As a result, 20 genes were mapped within a 25-cM region of the anchored FL QTL or FL QTL hotspots (Fig. 8). In the At subgenome, 1 gene (GhSAUR3) on chromosome AD1-A01, 1 gene (GhSAUR36) on AD1-A04, 1 gene (GhSAUR53) on AD1-A07, and 2 genes (GhSAUR76 and GhSAUR77) on AD1-A12 were located within the FL QTL hotspots. In the Dt subgenome, 6 genes (i.e., GhSAUR128, GhSAUR129, GhSAUR130, GhSAUR131, GhSAUR132, and GhSAUR133) on chromosome AD1-D05, 5 genes (i.e., GhSAUR148, GhSAUR149, GhSAUR150, GhSAUR151, and GhSAUR152) on AD1-D08, and 4 genes (i.e., GhSAUR171, GhSAUR172, GhSAUR173, and GhSAUR174) on AD1-D12 were located within the FL QTL hotspots. Of these co-localized SAUR genes, only 3 were differentially expressed between the two BILs differing in fiber length. For example, GhSAUR149 was up-regulated in 10 DPA fibers of the BIL with longer fibers as compared with the BIL with shorter fibers, and its expression in the long fiber BIL was also higher in 10 DPA fibers than in 0 DPA ovules. However, GhSAUR148 in the long fiber BIL was down-regulated in 3 DPA ovules relative to 0 DPA ovules, and GhSAUR173 in the long fiber BIL was also down-regulated in 3 DPA ovules and 10 DPA fibers relative to 0 DPA ovules.

Fig. 8

A co-localization analysis of SAUR genes with fiber length quantitative trait loci (QTL). Only genes co-localized with the FL QTL are shown in the figure

Identification of single nucleotide polymorphisms (SNP) in SAUR genes co-localized with FL

Sequence variations in the predicted SAUR genes among the sequenced G. raimondii (D5), G. arboreum (A2, Shixiya1), G. hirsutum (AD1, TM-1) and G. barbadense (AD2, 3-79 and Xinhai 21) were further analyzed (Additional file 12: Figure S5). Twelve genes had 1-7 SNPs between AD1 and AD2, while the other 8 genes were identical between AD1 and AD2. Among the SNP-containing SAUR genes, GhSAUR3 and GhSAUR77 from AD1 on the At subgenome had SNP sequences identical to the homologous genes in A2. Among the 9 genes with homologous genes in D5, the SNP sites of GhSAUR174 in AD2 were identical to the homologous D5 gene, while the SNP sites of GhSAUR132, GhSAUR133, GhSAUR149 and GhSAUR171 in AD1 were identical to the homologous genes in D5. Interestingly, for the homologous GhSAUR128, GhSAUR148, GhSAUR150 and GhSAUR152 genes, either AD1 or AD2 shared the D5 allele, depending on the SNP site. Seven SNPs in GhSAUR53 from the At subgenome were detected between AD1 and AD2, but its sequence was different from both A2 and D5.

To understand whether the co-localized SAURs are genetically associated with fiber elongation, a sequence comparison between NMGA-062 and NMGA-105 was performed, but no SNPs were identified. This result suggests that the co-localized SAURs are not genetically associated with fiber length, implying that the difference in fiber length between the two BILs is unlikely to be related to natural sequence variation in the SAUR genes.

Discussion

SAUR gene family in cotton

Previous reports have suggested that the SAUR family regulates a series of cellular, physiological, and developmental processes in response to hormonal and environmental signals in higher plants [1]. However, the molecular network that links specific hormonal and environmental signals is still unknown. With the availability of genome sequences, genome-wide identification and annotation of SAUR genes have been performed in Arabidopsis (72), rice (58), sorghum (71), tomato (99), potato (134), maize (79), citrus (70), mulberry (62), hemp (56), and ramie (71) [3, 6,7,8,9,10,11]. In this study, 145, 97, 214, and 176 SAUR genes in four sequenced cotton species, G. raimondii, G. arboreum, G. hirsutum, and G. barbadense, respectively, were identified in silico and characterized. Compared with most other reported species, cotton has more SAUR family members, suggesting that the SAUR family in cotton experienced an extensive expansion during its evolutionary history. It was reported that tandem and segmental duplication events contributed to the expansion of the SAUR family in Solanaceae species and maize [8, 9]. In the current study, 55.7 - 87.4% of the SAUR gene family members in cotton were likely derived from tandem duplication and segmental duplication events. The duplication events occurred about 115-146 and 13-20 million years ago in G. arboreum and G. raimondii, respectively, and were followed by the formation of G. hirsutum and G. barbadense through hybridization of the two progenitors and a polyploidization event 1.5 million years ago [23, 26]. Therefore, the SAUR gene number in the tetraploids G. hirsutum and G. barbadense is likely to depend on the numbers in G. arboreum and G. raimondii. Segmental duplications can further contribute to the expansion of the SAUR family.
The chromosomal distribution of the SAUR gene family also showed that its members were not randomly distributed across the genome, but occurred in tandem arrays of closely related paralogous genes, as reported in other species [3, 6]. Thus, tandem and segmental duplication events likely contributed to the expansion of the SAUR family in cotton, as reported for other gene families [30, 32].

Genomic structure of SAUR family genes

The majority of SAUR genes lack introns. Only one SAUR gene in Arabidopsis has an intron, while none of the OsSAURs in rice harbors an intron. Among other species, 6 out of 79 SAUR genes in maize, 3 out of 99 in tomato, 9 out of 134 in potato, and 10 out of 70 in citrus contain introns based on the sequenced genomes [3, 6, 8,9,10]. A similar pattern was found in cotton in this study, where about 9.3% of the SAUR genes carried introns. Because alternative splicing is usually rare in intronless genes, the functions of most SAUR family genes are likely to be stable.

The motif analysis of the sequenced cotton species revealed the conserved 60-amino-acid domain with three motifs specific to SAUR proteins, similar to the SAURs in tomato, potato, tobacco, rice, sorghum and Arabidopsis [8]. Although SAUR proteins have variable N- and C-terminal extensions, their relatively short sequences result in a high level of similarity among SAUR genes.

Expression profiles and putative functions of GhSAURs

As an important class of auxin-responsive genes, SAURs participate in the auxin signaling pathway. The cis-element analysis of the putative promoter regions of the GhSAURs showed that most genes possess at least one type of auxin-responsive cis-element. We analyzed the expression levels of 16 randomly chosen GhSAUR genes under IAA treatment. Of these, 11 genes were up-regulated and 3 were down-regulated after the treatment. This is consistent with previous reports that SAUR genes are up-regulated or repressed to some extent following auxin treatment [1, 11].

Auxin plays an important role in fiber development, as shown by a previous report that lint yield and fiber fineness were improved by overexpression of the IAA biosynthetic gene iaaM, driven by the promoter of the petunia MADS box gene Floral Binding Protein 7 [22]. The application of auxin increased fiber units and promoted fiber initiation in in vitro cultured cotton ovules [20]. In this study, the expression profiles of SAUR genes in two BILs differing in fiber length were analyzed at the fiber initiation and elongation stages. Many SAUR genes in groups II and III were differentially expressed from 0 DPA to 10 DPA (Fig. 6), suggesting that these genes may be regulated during fiber initiation and elongation. We also showed that several genes were differentially expressed at the fiber initiation stage between Xuzhou142 and its fl mutant.

We also investigated the expression patterns of 12 SAUR genes in various tissues. As shown in Fig. 7, apart from four genes that had relatively high expression levels in stems and flowers, the other tested genes were highly expressed at different fiber development stages (−3 to 25 DPA). The alignment of these genes with Arabidopsis and rice SAUR genes showed that GhSAUR33 had a high similarity with the AtSAUR61-AtSAUR68 subfamily and OsSAUR54. In Arabidopsis, transgenic plants expressing SAUR63:GFP or SAUR63:GUS fusions had long hypocotyls, petals and stamen filaments, while overexpression of artificial microRNAs targeting the SAUR63 subfamily reduced hypocotyl and stamen filament elongation. These results indicated that these AtSAUR genes regulate cell expansion and thereby alter hypocotyl growth [15]. OsSAUR54 was preferentially expressed in rice stigma, and may promote pollen tube growth [47]. Cotton fibers are single-celled trichomes and grow in a mode similar to that of pollen tubes, in that fiber cell elongation depends on cell expansion [48]. Our qRT-PCR results and the functions of the orthologous genes of GhSAUR33 suggest that it may have a similar function in cotton fiber elongation.

In cotton, a transcript profiling analysis of SAUR genes associated the auxin signaling pathway with dedifferentiation and redifferentiation during somatic embryogenesis [49]. In another study, SAUR genes were found to be down-regulated in a dwarf cotton genotype compared with the wild type, indicating their involvement in the regulation of plant height [50]. In Arabidopsis, SAUR genes were reported to regulate plant growth and development through cell expansion [12,13,14], shade avoidance responses [51], tropic growth [52], root growth [18], auxin transport [15], and leaf growth and senescence [16]. As members of a plant-specific gene family, SAURs in cotton are also likely to have diverse functions. However, apart from the two reports above, no other studies of SAUR genes have been reported in cotton. Therefore, our study provides important information that will facilitate the understanding of the specific functions of SAURs in cotton growth and development.

Conclusions

This study provides the first comprehensive analysis of the SAUR gene family in the sequenced genomes of four cotton species. The phylogenetic analysis classified the SAUR genes into 10 groups. The chromosomal location and gene duplication analyses revealed that duplication events have contributed to the expansion of the SAUR gene family in cotton. Most of the studied GhSAUR genes were differentially expressed in leaves in response to auxin application. A further expression analysis using RNA-seq transcriptome data and qRT-PCR showed various expression patterns of SAUR genes in ovules and fibers at early developmental stages. Although 20 SAURs are co-localized with fiber length quantitative trait loci (QTL), no sequence variations were identified between two interspecific backcross inbred lines (BILs) with different fiber lengths, suggesting that these SAUR genes are unlikely to be genetically involved in the difference in fiber elongation.