Background

Cotton is one of the major fibre crops worldwide and has extensive phenotypic diversity among its 50 representative species (Wendel and Cronn 2003). Among these species, upland cotton (Gossypium hirsutum L.) is the leading fibre crop and is grown in more than 80 countries/regions. Currently, G. hirsutum accounts for 95% of annual world cotton production (Chen et al. 2007). Because of its economic importance and its advantages, such as high yield and environmental suitability, G. hirsutum has attracted considerable interest from plant breeders and agricultural scientists.

However, domesticated upland cotton genotypes have a narrow genetic base (Abdurakhmonov et al. 2008; Tyagi et al. 2014). Broadening this base through hybrid breeding programs requires genetic divergence among the available germplasm. Genetic variation among cotton genotypes for morphological and fibre quality traits has therefore been studied with a view to improving them. Cultivated cotton was domesticated in Mesoamerica and has a low level of genetic polymorphism (Rungis et al. 2005; Ai et al. 2017). The narrow genetic base of upland cotton has become a serious concern because limited genetic diversity translates into limited allelic availability for continued genetic gain. It is therefore crucial to explore novel germplasm resources for natural genetic diversity and to develop innovative genomics tools that efficiently mobilize this useful variation into breeding germplasm, which should help to overcome existing and potential cotton production problems associated with the narrow genetic base of cultivated germplasm (Ulloa et al. 2013; Hinze et al. 2017).

Variation in germplasm collections has been used to identify desirable genotypes for yield improvement. To identify such genotypes, various traits are employed, including fibre quality, yield components, resistance and seed quality (Abdurakhmonov et al. 2008; Zhao et al. 2014; Badigannavar and Myers 2015; Huang et al. 2017; Sun et al. 2017). Distinct qualitative traits, called morphological markers, are often reliable for germplasm characterization and trait-associated selection because their expression is stable across environments. Thus, characterization of germplasm plays a vital role in crop improvement.

Evaluation of germplasm and quantification of genetic diversity are indispensable for the pragmatic use of plant genetic resources and for determining evolutionary relationships (Amezrou et al. 2017). Variation in germplasm collections has been studied using plant morphological attributes as characterization tools (Zhang et al. 2012; Salazar et al. 2016). For cotton geneticists and breeders, precise evaluation of the genetic diversity of excellent G. hirsutum germplasm provides a guide for choosing parents and for predicting the degree of inheritance, variation and level of heterosis, all of which are essential for achieving breeding goals (Hinze et al. 2015).

To better understand and effectively utilize G. hirsutum germplasm in China, it is necessary to evaluate global collections of the species for phenotypic traits and to study their genetic variation in different environments. The objectives of the present study were: (i) to assess the phenotypic and genotypic variation of five fibre quality traits in 719 global collections of G. hirsutum; (ii) to compare the variation in phenotypic traits among accessions from different regions; (iii) to detect and analyze correlations among the investigated traits; and (iv) to group the accessions using principal component analysis (PCA) and a phylogenetic tree based on single nucleotide polymorphism (SNP) markers. Our findings could provide useful information for selecting elite parental materials for upland cotton breeding programs.

Methods

Plant materials and field trials

A collection of 719 upland cotton accessions was evaluated in this study. Of these, 588 were collected from China and 131 from other countries/regions (Additional file 1: Table S1). The accessions were grown in eight environments in a randomized complete block design: at Baoding (115°47′E, 38°87′N), Hejian (116°13′E, 38°42′N), Xinji (115°12′E, 37°54′N) and Qingxian (116°91′E, 38°65′N) in Hebei Province and at Yacheng (109°20′E, 18°38′N) in Hainan Province in 2014 (denoted 14BD, 14HJ, 14XJ, 14QX and 14HN, respectively), and at Xinji and Qingxian in Hebei Province and Yacheng in Hainan Province in 2015 (denoted 15XJ, 15QX and 15HN, respectively). Two replicates were grown for each accession at the five locations in 2014, and three replicates at the three locations in 2015. One row of each accession was planted for each replicate, with 20–22 plants per row, 30–35 cm between plants within rows and 80 cm between rows.

Measurement of fibre quality traits and data collection

When mature, 20 naturally open bolls from the central part of the plants from each accession were hand-harvested at each location and ginned by machine. Fibre samples were sent to the Supervision and Testing Center of Cotton Quality, Ministry of Agriculture of China in Anyang, Henan Province for fibre property determination. Fibre quality traits, including the upper-half mean fibre length (FL, mm), fibre strength (FS, cN·tex⁻¹), fibre micronaire (FM), fibre uniformity (FU, %) and fibre elongation (FE, %), were measured using a high volume instrument (HVI).

Statistical analysis

The data for the phenotypic characters were analyzed by determining the mean, minimum, maximum, standard deviation (SD) and frequency distribution. The relationships among traits were calculated using Pearson’s correlation analysis for the accession means using SPSS 22.0 software.
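The summary statistics and Pearson's correlation described above can be illustrated with a minimal sketch. It is not the SPSS workflow used in the study: the trait values are hypothetical, the function names `describe` and `pearson_r` are introduced only for illustration, and the SD is assumed to be the sample SD (n - 1 denominator). Significance testing (P values) is not covered.

```python
import math
from statistics import mean, stdev


def describe(values):
    """Mean, minimum, maximum and sample SD (n - 1 denominator) of one trait."""
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "sd": stdev(values),
    }


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical accession means (not values from this study).
fl = [28.0, 29.0, 30.0, 31.0]  # fibre length, mm
fs = [27.0, 29.0, 31.0, 33.0]  # fibre strength, cN/tex
fm = [5.0, 4.9, 4.4, 4.3]      # micronaire

print(describe(fl))
print(round(pearson_r(fl, fs), 3), round(pearson_r(fl, fm), 3))
```

In this toy data FL and FS rise together while FM falls, so the first coefficient is positive and the second negative.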

Clustering based on phenotypes and genotypes

The accessions were classified by principal component analysis (PCA) of the mean values of the five fibre quality traits across environments. PCA was performed using the OmicShare tools (www.omicshare.com/tools). A phylogenetic tree of the accessions was constructed from SNP data (Sun et al. 2017) by calculating Nei's genetic distance using PowerMarker version 3.25 (Liu and Muse 2005) and was visualized with iTOL (http://itol.embl.de/).
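As an illustration of a genetic distance of the kind used for the tree, the sketch below computes Nei's (1972) standard genetic distance, D = -ln(Jxy / sqrt(Jx·Jy)), between two individuals from biallelic SNP genotypes. Which of Nei's distances was selected in PowerMarker is not stated in the text, so the 1972 form is an assumption; the genotype encoding (two-letter strings), the function names and the example genotypes are also hypothetical, and missing data are not handled.

```python
import math


def allele_freqs(genotype):
    """Convert a diploid genotype string such as 'AG' into allele frequencies."""
    freqs = {}
    for allele in genotype:
        freqs[allele] = freqs.get(allele, 0.0) + 1.0 / len(genotype)
    return freqs


def nei_distance(geno_x, geno_y):
    """Nei's (1972) standard genetic distance between two genotype lists."""
    jx = jy = jxy = 0.0
    for gx, gy in zip(geno_x, geno_y):
        fx, fy = allele_freqs(gx), allele_freqs(gy)
        jx += sum(p * p for p in fx.values())    # homozygosity within x
        jy += sum(p * p for p in fy.values())    # homozygosity within y
        jxy += sum(p * fy.get(a, 0.0) for a, p in fx.items())  # shared alleles
    if jxy == 0.0:
        return math.inf  # no alleles shared at any locus
    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


# Hypothetical genotypes at three SNP loci (not data from this study).
acc_1 = ["AA", "GG", "CC"]
acc_2 = ["AA", "GG", "TT"]

print(round(nei_distance(acc_1, acc_2), 4))
```

A matrix of such pairwise distances is the usual input for a distance-based tree; the tree-building step itself is not shown here.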

Results

Characterization of five fibre quality traits

We found a wide range of variation in the five quality traits among the evaluated accessions (Fig. 1). For example, FL varied from 23.68 mm to 33.74 mm; FS varied from 25.02 cN·tex⁻¹ to 33.94 cN·tex⁻¹; FM varied from 3.65 to 5.97; FU varied from 81.69% to 86.65%; FE varied from 6.21% to 7.13%.

Fig. 1 Frequency distribution of 719 Gossypium hirsutum L. accessions for five fibre quality traits. a Fibre length; b Fibre strength; c Fibre micronaire; d Fibre uniformity; e Fibre elongation

In this study, the accessions were assigned to four germplasm types: the Yellow River (406), the Yangtze River (123), North/Northwest China (59) and Abroad (131). The means and ranges of the five quality traits for accessions from the different regions in the two years are presented in Table 1. Compared with the other three germplasm types, the accessions from abroad tended to have the highest FL in both years (29.51 mm and 28.66 mm), as well as the highest FS (30.05 cN·tex⁻¹), FM (4.99) and FU (85.11%) in 2014 (Table 1). Additionally, the North/Northwest China accessions tended to have higher FL, higher FS, lower FM and lower FE in both years than the other Chinese ecotypes, that is, those from the Yellow River and the Yangtze River.

Table 1 Descriptive statistics of the five measured traits in 719 upland cotton accessions from different regions

Correlation between phenotypic traits

Correlation coefficients (r) were highly significant (P < 0.01) for 540 of the 780 pairwise trait combinations, with r ranging from −0.398 to 1 (Fig. 2). The correlation coefficients and P values among the traits under investigation are presented in Additional file 1: Table S2. The aim was to identify traits that are closely associated and therefore meaningful for breeding. The correlations for each trait were largely consistent across locations and years. Overall, FL had high positive correlation coefficients with FS, FU and FE, and FS also had high positive correlations with FU and FE. FM was negatively correlated with FL, FS and FU. For example, FL_14BD was positively and significantly correlated with FS_14BD (r = 0.784**), and FL_14HJ with FE_14HJ (r = 0.734**). There were significant negative correlations between FM_14HN and FL_14HN (r = −0.398**) and between FM_14BD and FS_14BD (r = −0.280**). These results indicate that higher FL and FS were associated with lower FM.
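The count of 780 combinations follows from treating each of the five traits in each of the eight environments as a separate variable, which gives 40 variables and 40 × 39 / 2 = 780 unordered pairs. The short sketch below reproduces this arithmetic; the label format follows names such as FL_14BD used in the text, and the variable names in the code are illustrative.

```python
from itertools import combinations

traits = ["FL", "FS", "FM", "FU", "FE"]
environments = ["14BD", "14HJ", "14XJ", "14QX", "14HN", "15XJ", "15QX", "15HN"]

# One variable per trait-environment combination, e.g. "FL_14BD".
variables = [f"{trait}_{env}" for trait in traits for env in environments]

# Every unordered pair of variables corresponds to one correlation coefficient.
pairs = list(combinations(variables, 2))

print(len(variables), len(pairs))   # 40 780
print(f"{540 / len(pairs):.1%}")    # share of highly significant pairs: 69.2%
```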

Fig. 2 Correlation coefficients among the 40 phenotypic traits using 719 upland cotton genotypes evaluated in 2014 and 2015

Mining germplasm resources with excellent quality traits

By evaluating the consistency of the data between the two years, we found that 31 of the 719 G. hirsutum accessions had mean FL values higher than 30.00 mm and mean FS values higher than 30.00 cN·tex⁻¹ (double-thirty quality, Table 2). Eight of these accessions were from abroad, 15 from the Yellow River, four from the Yangtze River and four from North/Northwest China. These accessions reached double-thirty quality in at least six environments and had FM values between 3.5 and 4.9. The accession W82–1 had FL above 30.00 mm and FS above 30.00 cN·tex⁻¹ in all eight environments. The accession MSCO-12 had the highest mean FL (33.74 mm), and the accession J02–508 had the highest mean FS (33.94 cN·tex⁻¹). In addition, MSCO-12, W82–1, Xinluzao17, Zhong078 and Jinmian12 were the top five accessions for mean FL; J02–508, Nongda13, SuBR6202Bt, W82–1 and Zhong078 were the top five for mean FS; and three accessions, Zhong078, Shiwu107 and Luwu16, reached A level for FM (3.7–4.2) in addition to double-thirty quality.
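The screening criteria described above can be expressed as a simple filter. The sketch below is one possible reading, not the authors' procedure: it assumes the FM range is applied to the mean FM across environments, and the accession names, trait values and the function name `screen_double_thirty` are hypothetical.

```python
from statistics import mean


def screen_double_thirty(accessions, min_envs=6, fm_range=(3.5, 4.9)):
    """Return names of accessions meeting the double-thirty criteria."""
    selected = []
    for name, rec in accessions.items():
        # Environments in which both FL and FS exceed 30.
        n_envs = sum(
            1 for fl, fs in zip(rec["FL"], rec["FS"]) if fl > 30.0 and fs > 30.0
        )
        means_ok = mean(rec["FL"]) > 30.0 and mean(rec["FS"]) > 30.0
        fm_ok = fm_range[0] <= mean(rec["FM"]) <= fm_range[1]
        if means_ok and n_envs >= min_envs and fm_ok:
            selected.append(name)
    return selected


# Hypothetical records with one value per environment (eight environments).
accessions = {
    "ACC_A": {
        "FL": [31.0, 30.5, 30.2, 31.4, 30.8, 29.8, 30.6, 31.1],
        "FS": [31.2, 30.4, 30.9, 32.0, 30.1, 30.5, 29.5, 31.0],
        "FM": [4.2] * 8,
    },
    "ACC_B": {"FL": [28.5] * 8, "FS": [29.0] * 8, "FM": [4.5] * 8},
    "ACC_C": {"FL": [31.0] * 8, "FS": [31.0] * 8, "FM": [5.3] * 8},
}

print(screen_double_thirty(accessions))
```

Here ACC_A passes in six of eight environments, ACC_B fails on FL and FS, and ACC_C fails on FM.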

Table 2 The 31 elite upland cotton germplasms screened based on phenotypic traits

PCA and genetic diversity

The 719 G. hirsutum accessions from the different regions were grouped by principal component analysis (PCA) of the five fibre quality traits (Fig. 3). The first two principal components explained 94.83% of the total variation: PC1 was the most important and explained 83.82% of the total variance, and PC2 accounted for 11.01%. The Yellow River accessions were distributed in the lower part of the plot, most of the North/Northwest China accessions in the middle part, and most of the Yangtze River and abroad accessions in the upper part. However, most accessions lay in the middle of the plot, and there was no clear boundary among the four regions. Notably, elite quality germplasms (480 for MSCO-12, 303 for W82–1, 673 for Zhong078, 725 for Nongda13 and 647 for J02–508) were clearly distinguished from the other accessions in the plot. Additionally, the 31 accessions with excellent fibre quality are shown in the phylogenetic tree based on SNPs (Fig. 4). The tree provides information on the genetic relationships among these elite accessions. The results show that, except for some individual accessions, these germplasms are separated by relatively long genetic distances. Although the elite lines are scattered throughout the tree, elite Chinese accessions group closely with accessions from abroad, emphasizing that, for selecting superior parents in cotton quality breeding, genetic relationships within the Chinese cultivars are more important than the regions where they are currently grown.
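The share of variance attributed to each principal component is the corresponding eigenvalue of the trait covariance matrix divided by the sum of all eigenvalues. The following sketch illustrates this with power iteration and deflation in plain Python. It does not reproduce the OmicShare analysis: whether the study used the covariance or the correlation matrix is not stated, so the unscaled covariance matrix is an assumption, and the data and function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of rows (observations) by columns (traits)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_eigenpair(mat, iters=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    p = len(mat)
    vec = [1.0 + i for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iters):
        prod = [sum(mat[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in prod))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in prod]
    value = sum(
        vec[i] * sum(mat[i][j] * vec[j] for j in range(p)) for i in range(p)
    )
    return value, vec


def explained_variance_ratios(rows, n_components=2):
    """Fraction of total variance explained by the first principal components."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))  # trace = sum of eigenvalues
    ratios = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        ratios.append(value / total)
        # Deflation: remove the component that was just extracted.
        cov = [
            [cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
            for i in range(p)
        ]
    return ratios


# Hypothetical accession means for FL, FS and FM (not data from this study).
data = [
    (27.1, 26.5, 5.1), (28.0, 27.9, 4.9), (29.2, 28.8, 4.7),
    (30.1, 30.3, 4.4), (31.0, 31.2, 4.2), (32.2, 32.0, 4.0),
]

ratios = explained_variance_ratios(data)
print([round(r, 4) for r in ratios])
```

Because the toy traits are strongly correlated, PC1 takes most of the variance, which is the same pattern as the dominant PC1 reported above.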

Fig. 3 PCA for 719 upland cotton accessions based on the mean values of five fibre quality traits

Fig. 4 Phylogenetic relationship of 719 upland cotton genotypes using SNP markers

Discussion

Morphological and molecular markers have been used extensively to describe variability in different crops (Li et al. 2008; Huang et al. 2016; Lei et al. 2017). Most of these investigations showed high variation in the measured traits among accessions from various geographical regions. Improving fibre quality is one of the most important challenges in G. hirsutum breeding. However, the phenotypic variation of fibre quality traits is continuous and influenced by various factors (Abdurakhmonov et al. 2008). In this study, we found relatively high variation in the five fibre quality traits among the accessions. These results, especially the identification of excellent germplasm, will provide important support for improving fibre quality in upland cotton.

Our results showed that the five fibre traits have high levels of variation in different regions. However, the average FL and FS of the Chinese accessions were lower than those of the accessions from abroad. This is closely related to the history of Chinese cotton breeding: for several decades, Chinese cotton breeders focused mainly on improving yield, and this goal was indeed achieved (Dai et al. 2017). More recently, the goal of cotton breeding in China has shifted towards fibre quality improvement, especially breeding double-thirty cultivars to benefit cotton financial markets (Fang et al. 2017; Ma et al. 2018). All five traits showed highly significant correlations between the two years, indicating substantial reproducibility of the results. The correlations among the fibre quality traits suggest that FL and FS could be prioritized over other traits in breeding because of their positive relationship. These results are consistent with previous reports (Nie et al. 2016; Sun et al. 2017).

By comparing the means of the fibre quality traits over 2 years, we screened elite germplasms with longer fibre (FL > 30 mm), higher FS (> 30 cN·tex⁻¹) and better FM (3.7–4.2), while some genetic materials with shorter fibre (FL < 26 mm) and lower FS (< 26 cN·tex⁻¹) were also observed. We identified a wide range of variation in these traits and screened elite germplasms that may be used as excellent parents for upland cotton breeding in China. At the same time, we preserved rich genetic resources that broaden the available phenotypic variation for future genetic research on the various traits.

In this study, PCA showed no obvious distinction among accessions from different growing regions (Fig. 3). Chinese accessions could not be clearly distinguished from accessions from abroad, nor could Chinese accessions from different geographic regions be distinguished from one another. This was most likely due to the breeding history of the cotton germplasm. The present-day Mexico-Guatemala region is considered the site of original domestication and the primary center of G. hirsutum diversity (Brubaker et al. 1999). Early Chinese accessions came from abroad, and Chinese breeders have since created distinct cotton cultivars from these imported accessions using various breeding methods. Our results illuminate the complex intermingling of imported genetic stocks in modern Chinese accessions, indicating that distinct, region-specific cultivars have not emerged. For optimizing fibre quality, the results indicate that elite, high-performing lines bred in China or abroad will also perform well in diverse growing areas within China. However, breeding focused on these lines will further restrict the already limited diversity in cultivated upland cotton varieties.

Conclusions

This study provides a detailed description of a population that represents a wide range of upland cotton germplasm diversity. Evaluation of the fibre quality traits of the G. hirsutum accessions showed a wide range of variation over 2 years. In general, accessions from abroad tended to have higher FL and FS than Chinese accessions. Among the geographic regions within China, North/Northwest accessions tended to have the highest FL and FS and the best FM. By evaluating the five fibre quality traits over 2 years, we selected 31 elite germplasms that reached double-thirty quality. PCA based on phenotypes revealed no clear boundary along the first two principal components among germplasm materials of different geographic origins. This study will provide breeders with useful information on possible parents for cotton breeding programs.