Background

Cotton (Gossypium hirsutum L.) is grown mainly for fiber and oil, is crucial to national economies, and is often called “White gold” (Komala et al. 2018). It is recognized globally as an important fiber crop and is cultivated in tropical and sub-tropical regions of the world. Cotton is an important crop for an agricultural country like Pakistan, which is the 4th largest producer of cotton. Moreover, cotton seeds are used to produce edible oil and feed in the form of seed cake (Shakeel et al. 2015). Cotton contributes 0.8% to GDP and 4.5% to the agriculture sector, but the yield per acre is low in Pakistan because of pest attack and climate change. Cotton production in Pakistan has fallen to 9.861 million bales, and the area under cotton cultivation is 2 373 thousand hectares. Production has declined by up to 17.5%, causing a 12.74% decrease in cotton ginning (Anonymous 2019–20).

Hence, there is a dire need to develop disease- and pest-resistant cotton varieties with high yield and superior fiber quality. Climate change is also adversely affecting the productivity of cotton. Many high-yielding cultivars are available, but they do not possess the fiber quality required by the textile industry. A decline in yield and quality has been recorded over the last decade because of the loss of genetic diversity in existing genotypes (McCarty Jr. et al. 2013).

Four species are mainly cultivated across the world: two diploids, G. herbaceum and G. arboreum, and two tetraploids, G. hirsutum and G. barbadense. The genetic base of the cultivated cotton species should be broadened through breeding programs (Ali et al. 2017; El-Kady et al. 2015; Shabbir et al. 2016). Genetic diversity is an important tool in breeding programs and in the conservation of germplasm, and it should be exploited to produce cultivars with desired characters. Information about the nature and extent of variation, genetic advance, and heritability is required for the selection of superior genotypes and for studying the relationships among yield components and fiber characters.

The correlation coefficient assesses simple associations among plant attributes but, on its own, does not tell us which plants to select. Correlation coefficients should be calculated to devise the best possible combinations of attributes for obtaining the expected results (EL-Mohsen and Amein 2016). Path coefficient analysis divides the correlation coefficient into direct and indirect effects and is useful for examining the reason for an observed correlation (Manonmani et al. 2019). It provides a better understanding of the association among characters. Principal component analysis (PCA) and cluster analysis have been used to uncover similarities and differences among genotypes and to place them into clusters (Naik et al. 2016). PCA is the most appropriate statistical method for partitioning total variation, which facilitates the selection of elite parental lines. It also reveals the importance of the main contributors to total variability. Moreover, genetic variability among existing genotypes can be assessed by PCA, which helps in quantifying phenotypic variability (Amna et al. 2013). The knowledge of variability provided by PCA is helpful for selecting genetically and agronomically important genotypes (Isong et al. 2017). The main objective of the present study was to assess the genetic variation and diversity in yield and fiber traits among cotton genotypes and to analyze the associations among these traits.

Materials and methods

The present study was conducted to assess the genetic variability among 15 cotton genotypes obtained from the Cotton Research Group, Department of Plant Breeding and Genetics, University of Agriculture Faisalabad, Pakistan (Table 1). These genotypes were planted in three replications under a randomized complete block design (RCBD) in the research area of the Department of Plant Breeding and Genetics, University of Agriculture Faisalabad, on 11th June, 2019.

Table 1 List of genotypes

Five plants of each genotype were selected from each replication, and data were recorded for plant height, the total number of nodes per plant, height to node ratio, the number of monopodial branches per plant, the number of sympodial branches per plant, the number of bolls per plant, boll weight, the total number of seeds per boll, seed index, lint index, ginning out turn (GOT), upper half mean length (UHML), fiber strength, fiber fineness, uniformity index, short fiber index, reflectance, and seed cotton yield.

Plant height

The height of five plants of each genotype from each replication was measured in cm from the zero node to the apical bud. The average height for each genotype was calculated for statistical analysis.

The number of nodes per plant

As the cotton plant grows, nodes are developed on the main stem. The total number of nodes was recorded for each plant. Average nodes of each genotype were calculated for statistical analysis.

Height to node ratio

The height to node ratio was calculated by dividing plant height by the total number of nodes for each plant. The average height to node ratio of each genotype was calculated from five replicates.

The number of monopodial branches per plant

Monopodial branches are the indirect fruiting branches present on the lower part of the plant. Five plants were selected from each replication and their monopodial branches were counted.

The number of sympodial branches per plant

Sympodial branches are the fruiting branches present all along the stem of the plant. Sympodial branches were counted on five selected plants per replication and then averaged.

The number of bolls per plant

Bolls were picked and counted from five plants of each genotype per replication, and the average number of bolls per plant was calculated.

Boll weight (g)

The seed cotton yield of a plant was divided by the total number of bolls to calculate the boll weight.

The number of seeds per boll

The total number of seeds per boll was calculated for five plants of each replication by using the total seed weight, seed index, and the total number of bolls. Then the average number of seeds per boll was calculated for each genotype.
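The text does not spell out the arithmetic behind this estimate. One plausible reading, sketched here as an assumption, is that total seed weight is converted to a seed count through the seed index (weight of 100 seeds) and then divided by the number of bolls. The function name and the example values are hypothetical.

```python
def seeds_per_boll(total_seed_weight_g, seed_index_g, n_bolls):
    """Estimate seeds per boll (assumed reading of the method, not the authors' code)."""
    if seed_index_g <= 0 or n_bolls <= 0:
        raise ValueError("seed index and boll count must be positive")
    weight_per_seed_g = seed_index_g / 100.0              # seed index = weight of 100 seeds
    total_seeds = total_seed_weight_g / weight_per_seed_g # estimated seed count per plant
    return total_seeds / n_bolls


# Hypothetical plant: 48 g of seed from 20 bolls, seed index of 8 g
example = seeds_per_boll(48.0, 8.0, 20)
print(round(example, 1))
```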

Seed cotton yield per plant (g)

Seed cotton was collected from the bolls of five plants of each genotype per replication. The seed cotton was weighed on an electronic balance, and the average seed cotton yield was then computed for each genotype.

Seed index (g)

Seed index was calculated by weighing 100 healthy seeds and the average was calculated.

Ginning out turn (GOT, %)

An electrical ginner was used to gin the seed cotton. The lint was then weighed on an electronic balance. The ginning out turn was calculated by using the following formula.

$$ \mathrm{GOT}\%=\frac{\mathrm{Sample}\ \mathrm{lint}\ \mathrm{weight}}{\mathrm{Seed}\ \mathrm{cotton}\ \mathrm{weight}}\times 100 $$

Lint index (g)

Lint index is the weight of fiber produced by 100 seeds and was calculated by using the following formula.

$$ \mathrm{Lint}\ \mathrm{index}=\frac{\mathrm{Weight}\ \mathrm{of}\ 100\ \mathrm{seeds}\times \mathrm{GOT}\%}{100-\mathrm{GOT}\%} $$
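The two ginning calculations can be sketched as follows. The lint index function assumes the form that follows from the definition given here (fiber weight per 100 seeds), that is, seed index × GOT / (100 − GOT) with GOT in percent. Function names and sample values are hypothetical.

```python
def ginning_out_turn(lint_weight_g, seed_cotton_weight_g):
    """GOT (%) = lint weight / seed cotton weight x 100."""
    if seed_cotton_weight_g <= 0:
        raise ValueError("seed cotton weight must be positive")
    return lint_weight_g / seed_cotton_weight_g * 100.0


def lint_index(seed_index_g, got_percent):
    """Lint weight (g) carried by 100 seeds, assuming seed cotton = lint + seed."""
    if not 0 <= got_percent < 100:
        raise ValueError("GOT must lie in [0, 100)")
    return seed_index_g * got_percent / (100.0 - got_percent)


# Hypothetical sample: 38 g of lint from 100 g of seed cotton, seed index of 8 g
got = ginning_out_turn(38.0, 100.0)
li = lint_index(8.0, got)
print(round(got, 2), round(li, 2))
```

As a consistency check, a lint index of about 4.9 g together with a seed index of 8 g gives 4.9 / (4.9 + 8) ≈ 38%, which recovers the GOT used as input.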

Fiber traits

Fiber length was measured in mm by using a high volume instrument (HVI), and the average fiber length of all samples was calculated for statistical analysis. Fiber fineness was also measured by HVI; after test completion, the sample was removed from the chamber, and the average fiber fineness of each sample was calculated for statistical analysis. The uniformity index, expressed as a percentage, was calculated by dividing the mean length by the upper half mean length (UHML). Fiber strength was determined by placing the fiber between two clamps. Reflectance is the degree of whiteness of the light reflected by cotton fibers; it is used together with the yellowness of the fiber to determine the color grade of cotton. Short fiber index is the percentage of fibers in a sample, by weight, that are less than half an inch in length; it was also measured by HVI.

Statistical analysis

Analysis of variance was performed by using Statistics 8.1 following Steel et al. (1997) to estimate the differences among genotypes. The coefficient of variation was also calculated for all characters to compare variability. The square roots of the genotypic, phenotypic and environmental variances were divided by the means of the respective traits to compute the coefficients of variation. Broad-sense heritability was calculated by dividing genotypic variance by phenotypic variance. The genetic advance was calculated at 20% selection intensity. The genotypic and phenotypic correlations were computed by using the agricolae package implemented in R software (R Development Core Team 2015). The correlation coefficients were partitioned into direct and indirect effects by path coefficient analysis following Dewey and Lu (1959). Genetic divergence was assessed by principal component analysis.
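The variability parameters described above can be illustrated with a minimal sketch. It assumes the usual RCBD expectations of mean squares (genotypic variance = (genotype mean square − error mean square) / replications, phenotypic variance = genotypic + error variance) and a normally distributed trait for the selection intensity. These are textbook conventions rather than details stated by the authors, and the mean squares in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(proportion_selected):
    """Standardized selection differential for truncation selection on a normal trait."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - proportion_selected)  # truncation point
    return nd.pdf(z) / proportion_selected


def genetic_parameters(ms_genotype, ms_error, n_reps, trait_mean,
                       proportion_selected=0.20):
    """Variance components, coefficients of variation, heritability, genetic advance."""
    var_e = ms_error
    var_g = max((ms_genotype - ms_error) / n_reps, 0.0)  # negative estimates set to 0
    var_p = var_g + var_e
    h2 = var_g / var_p                                   # broad-sense heritability
    ga = selection_intensity(proportion_selected) * h2 * sqrt(var_p)
    return {
        "GCV": sqrt(var_g) / trait_mean * 100.0,
        "PCV": sqrt(var_p) / trait_mean * 100.0,
        "ECV": sqrt(var_e) / trait_mean * 100.0,
        "h2": h2,
        "GA": ga,
        "GA_percent_mean": ga / trait_mean * 100.0,
    }


# Hypothetical trait: genotype MS = 130, error MS = 10, 3 replications, mean = 100
params = genetic_parameters(130.0, 10.0, 3, 100.0)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

With 20% of plants selected, the computed intensity is close to the tabulated value of 1.40.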

Results

Analysis of variance revealed highly significant (P ≤ 0.01) differences among genotypes for plant height, the number of sympodial branches, the number of bolls per plant, seed cotton yield, and fiber strength. Significant differences were also observed for the number of monopodial branches, boll weight, GOT, lint index, seed index, the number of seeds per boll, upper half mean length (UHML), fiber strength, short fiber index, uniformity index, and reflectance, whereas the results were not significant for the total number of nodes and height to node ratio (Tables 2 and 3).

Table 2 Mean squares for yield and fiber quality-related traits in G. hirsutum genotypes
Table 3 Mean squares for yield and fiber quality-related traits in G. hirsutum genotypes

Phenotypic variances were slightly higher than genotypic variances. The genotypic coefficient of variation was highest for seed cotton yield, followed by the number of monopodial branches, the number of sympodial branches, plant height, the number of bolls per plant, and boll weight. A similar trend was observed for the phenotypic coefficient of variation. The environmental coefficients of variation were low compared with the genotypic coefficients of variation, indicating that the environment had little influence on these characters. Plant height, the number of monopodial branches, the number of bolls, lint index, seed index, and seed cotton yield displayed high heritability with maximum genetic advance as percent of mean (Table 4).

Table 4 Estimation of parameters of genetic variability and heritability

Correlation and path coefficient analysis

Genotypic and phenotypic correlation coefficients were estimated for all possible combinations of characters. At the genotypic level, seed cotton yield was positively and significantly associated with plant height, the number of monopodial branches, the number of sympodial branches, GOT (%), the number of bolls, seeds per boll, seed index, uniformity index, and reflectance, while a significant negative relationship was observed with short fiber index (Table 5). At the phenotypic level, seed cotton yield was positively and significantly associated with the number of sympodial branches, plant height, GOT (%), and the number of bolls, and was significantly and negatively associated with short fiber index (Table 6).

Table 5 Genotypic correlation coefficient for yield and fiber traits in upland cotton
Table 6 Phenotypic correlation coefficient for yield and fiber traits in upland cotton

At the genotypic level, plant height was significantly and positively correlated with the number of sympodial branches, uniformity index, and seed cotton yield, whereas a significant negative correlation was observed with GOT, the number of bolls, UHML, boll weight, seeds per boll, and short fiber index. The number of monopodial branches displayed a positive and significant correlation with seeds per boll and seed cotton yield and was positively but non-significantly associated with GOT. The number of sympodial branches had a highly significant and positive association with plant height, fiber fineness, seed cotton yield, and seeds per boll, while boll number was significantly and positively correlated with boll weight, UHML, GOT, and seed cotton yield. The number of seeds per boll was significantly and positively correlated with GOT, uniformity index, seed cotton yield, and the number of monopodial branches. Seed index was significantly and positively related to boll weight, uniformity index, and seed cotton yield. GOT had a significant and positive association with the number of bolls, seed cotton yield, and seeds per boll. Uniformity index showed a significant and positive relationship with seed cotton yield, reflectance, and fiber fineness, while reflectance had a significant and positive association with fiber strength and seed cotton yield.

At the phenotypic level, plant height was positively and significantly correlated with the number of sympodial branches and seed cotton yield, whereas a significant negative correlation was observed with GOT. The number of sympodial branches had a significant positive association with uniformity index and seed cotton yield and a positive but non-significant association with fiber strength and fiber fineness. GOT was positively and significantly associated with the number of bolls and seed cotton yield. Boll number displayed a significant positive association with GOT, seed cotton yield, and fiber strength. The short fiber index displayed a negative and significant relationship with seed cotton yield at both the phenotypic and genotypic levels.
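The distinction between genotypic and phenotypic correlations can be made concrete with a short sketch. It uses the common variance and covariance components approach, in which the genotypic component equals (genotype mean square or mean cross-product − error term) / replications. Whether the authors' software computed the coefficients in exactly this way is an assumption, and all input values are hypothetical.

```python
from math import sqrt


def genetic_component(ms_genotype, ms_error, n_reps):
    """Genotypic variance (or covariance) from mean squares (or mean cross-products)."""
    return (ms_genotype - ms_error) / n_reps


def correlations(mcp_g, mcp_e, ms_g_x, ms_e_x, ms_g_y, ms_e_y, n_reps):
    """Return (genotypic r, phenotypic r) for traits x and y."""
    cov_g = genetic_component(mcp_g, mcp_e, n_reps)
    var_gx = genetic_component(ms_g_x, ms_e_x, n_reps)
    var_gy = genetic_component(ms_g_y, ms_e_y, n_reps)
    if var_gx <= 0 or var_gy <= 0:
        raise ValueError("genotypic variances must be positive")
    r_g = cov_g / sqrt(var_gx * var_gy)
    # Phenotypic components add the error (environmental) terms
    cov_p = cov_g + mcp_e
    r_p = cov_p / sqrt((var_gx + ms_e_x) * (var_gy + ms_e_y))
    return r_g, r_p


# Hypothetical mean squares and cross-products for two traits, 3 replications
r_g, r_p = correlations(mcp_g=70.0, mcp_e=10.0,
                        ms_g_x=130.0, ms_e_x=10.0,
                        ms_g_y=85.0, ms_e_y=10.0, n_reps=3)
print(round(r_g, 3), round(r_p, 3))
```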

Path coefficient analysis determined the direct and indirect effects of all the attributes on the dependent variable, seed cotton yield. It revealed that the number of monopodial branches, plant height, seeds per boll, short fiber index, GOT, and boll weight had positive direct effects on yield. The remaining traits exerted negative direct effects on yield (Table 7).

Table 7 Direct and indirect effects for various yield and quality-related traits in upland cotton
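In path coefficient analysis, the direct effects p solve the linear system R p = r, where R holds the correlations among the causal traits and r their correlations with yield. The indirect effect of trait i through trait j is then r_ij × p_j. The sketch below illustrates this with a small pure-Python solver. The two-trait correlation values are hypothetical and are not taken from Tables 5 to 7.

```python
from math import sqrt


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def path_coefficients(r_xx, r_xy):
    """Return direct effects, indirect effects matrix, and residual effect."""
    direct = solve_linear_system(r_xx, r_xy)
    n = len(direct)
    indirect = [[r_xx[i][j] * direct[j] if i != j else 0.0 for j in range(n)]
                for i in range(n)]
    explained = sum(p * r for p, r in zip(direct, r_xy))
    residual = sqrt(max(1.0 - explained, 0.0))
    return direct, indirect, residual


# Hypothetical example: two causal traits correlated 0.5 with each other,
# and correlated 0.7 and 0.5 with yield
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_xy = [0.7, 0.5]
direct, indirect, residual = path_coefficients(r_xx, r_xy)
print(direct, indirect, round(residual, 3))
```

For each trait, the direct effect plus the sum of its indirect effects reproduces its correlation with yield, which is a convenient check on any path table.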

Principal component analysis

Mean data for all the studied traits were subjected to PCA in XLSTAT to study the genetic divergence among the 15 cotton genotypes. The analysis provided information about factor loadings and the share of each attribute in total variability. The total variance was divided into 14 components, and a two-dimensional representation was drawn by using the first two PCs. The first six of the fourteen PCs displayed eigenvalues > 1 and had the maximum share in total variability: PC-I, II, III, IV, V, and VI accounted for 22.63, 16.43, 14.37, 11.12, 9.61, and 8.60% of total variability, respectively (Table 8). PC-I was mainly related to plant height, uniformity index, the number of sympodial branches, and seed cotton yield, while PC-II showed high and positive values for the number of bolls, seed index, UHML, fiber fineness, and fiber strength. PC-III had high and positive values for the number of monopodial branches, reflectance, seed cotton yield, boll weight, and lint index, while negative values were observed for seeds per boll, UHML, the number of sympodial branches, and short fiber index. PC-IV displayed high and positive values for boll weight, lint index, seed index, and fiber strength. The number of bolls, seed index, the number of monopodial branches, and the number of seeds per boll displayed high and positive eigenvectors for PC-V, while for PC-VI high and positive values were observed for UHML, uniformity index, boll weight, reflectance, the number of monopodial branches, and seed index (Table 9).

Table 8 Eigenvalues and principal components for yield and fiber attributes in cotton genotypes
Table 9 Contribution of variables to total variability
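The cumulative contribution of the retained components follows directly from the six shares reported in the text. The short sketch below accumulates them; only the variable names are introduced here.

```python
# Shares of total variability (%) reported for PC-I to PC-VI
shares = [22.63, 16.43, 14.37, 11.12, 9.61, 8.60]

cumulative = []
running_total = 0.0
for share in shares:
    running_total += share
    cumulative.append(round(running_total, 2))

for index, (share, total) in enumerate(zip(shares, cumulative), start=1):
    print(f"PC{index}: {share:5.2f}%  cumulative {total:5.2f}%")
```

The six components with eigenvalues above 1 therefore account together for about 82.8% of the total variability, and the first two, used for the two-dimensional representation, for about 39.1%.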

Scree plot

The scree plot showed the percentage of variance associated with each principal component as a graph of eigenvalues against principal components. PC1 displayed the highest variability, 22.63%, with an eigenvalue of 3.622. Minimum variability was observed in PC13 and PC14, with eigenvalues of 0.038 and 0.002, respectively. Because PC1 had the maximum variability, the genotypes in PC1 should be considered for selection (Fig. 1).

Fig. 1 Scree plot of principal component analysis
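In a PCA on standardized traits, each eigenvalue divided by the total variance (the number of variables) gives that component's share of variability. The sketch below applies this relationship to the reported PC1 values. The inference that about 16 variables entered the analysis is an assumption derived from the reported numbers and is not stated in the text.

```python
def variance_share(eigenvalue, total_variance):
    """Percentage of total variance carried by one component."""
    return eigenvalue / total_variance * 100.0


def implied_total_variance(eigenvalue, share_percent):
    """Total variance implied by an eigenvalue and its reported share."""
    return eigenvalue * 100.0 / share_percent


# Reported for PC1: eigenvalue 3.622 and 22.63% of variability
total = implied_total_variance(3.622, 22.63)
print(round(total, 2))

# Assumption: 16 standardized variables, so total variance = 16
print(round(variance_share(3.622, 16), 2))   # share of PC1
print(round(variance_share(0.002, 16), 3))   # share of PC14
```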

Biplot

In the biplot, the variables were represented as vectors. The relative distance of each variable from the origin along PC-1 and PC-2 depicted its share in the total variation of the germplasm. Boll weight, the number of seeds per boll, the number of monopodial branches, and reflectance showed minimum differences because they lay close to the origin, whereas GOT, plant height, the number of sympodial branches, lint index, UHML, fiber fineness, short fiber index, seed index, fiber strength, boll number, seed cotton yield, and fiber attributes displayed maximum differences because they lay at a greater distance from the origin (Fig. 2).

Fig. 2 Principal component biplot for the contribution of traits
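The reading of the biplot described above amounts to ranking variables by the length of their vectors in the PC-1 and PC-2 plane. A minimal sketch follows; the trait names and loadings are hypothetical placeholders and not values from Table 9.

```python
from math import hypot

# Hypothetical loadings on (PC-1, PC-2); not taken from the study
loadings = {
    "trait_near_origin": (0.05, -0.08),
    "trait_mid_distance": (0.30, 0.22),
    "trait_far_from_origin": (0.62, 0.41),
}

# Vector length = Euclidean distance of the variable from the origin
lengths = {name: hypot(pc1, pc2) for name, (pc1, pc2) in loadings.items()}

# Longer vectors indicate a larger contribution to the variation shown in the plot
ranked = sorted(lengths, key=lengths.get, reverse=True)
for name in ranked:
    print(f"{name}: {lengths[name]:.3f}")
```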

Discussion

The exploitation of genetic diversity is important to select diverse genotypes for breeding programs. Heritability and genetic variability help us to point out the extent of improvement that can be made through selection on the basis of phenotype (Magadum et al. 2012). Genetic variability provides the basis to exploit the genes for the improvement of the desired attributes (Baloch et al. 2014). The divergence of the available germplasm should be utilized to produce genetically diverse and better-quality genotypes (Ullah et al. 2017). Selection is done based on the association present between yield characters and fiber properties. Variation from genetically diverse plant resources can be utilized to develop favorable and pest-resistant varieties. Therefore, the variation present among different morphological and fiber characters should be studied for crop betterment (Memon et al. 2017).

Analysis of variance displayed significant variation for all studied traits except total nodes and height to node ratio. Similar results for the mean square of genotypes were also reported by Baloch et al. (2015), Dhamayanathi et al. (2010) and Djaboutou et al. (2017).

Estimates of genetic variability revealed the highest genotypic coefficient of variation for seed cotton yield, followed by the number of monopodial branches, the number of sympodial branches, plant height, the number of bolls, and boll weight. A similar trend was observed for the phenotypic coefficient of variation (Shakeel et al. 2015a). Plant height, the number of monopodial branches, the number of bolls, lint index, seed index, and seed cotton yield displayed high heritability with maximum genetic advance as percent of mean. Similar results were reported by many researchers, including Dhivya et al. (2014), Khan et al. (2017), Shar et al. (2017), and Hayat and Bardak (2020). Fiber attributes displayed high phenotypic variance in comparison with genotypic variance, as reported by Shakeel et al. (2015). Heritability was maximum for micronaire and fiber strength with low genetic advance, as revealed by Nawaz et al. (2019).

Correlation analysis revealed that, at the genotypic level, seed cotton yield was significantly and positively associated with plant height, the number of monopodial branches, the number of sympodial branches, GOT, the number of bolls, seeds per boll, seed index, uniformity index, and reflectance, while a significant negative relationship was observed with short fiber index. At the phenotypic level, seed cotton yield was positively and significantly associated with the number of sympodial branches, plant height, GOT, and the number of bolls, and was significantly and negatively associated with short fiber index. Similar results have been reported by Reddy et al. (2019), while Kumbhar et al. (2020) reported a significant positive association of plant height with sympodial branches. Erande et al. (2014) and Nandhini et al. (2019) reported a non-significant and positive correlation of the number of monopodial branches with GOT. Salahuddin et al. (2010) found that, at the phenotypic level, yield was positively associated with sympodial branches and bolls. Shakeel et al. (2015) reported that plant height and seeds per boll were significantly and positively correlated with the number of sympodial branches. Baloch et al. (2015) reported a significant and positive relationship of bolls with seed cotton yield at both genotypic and phenotypic levels. A similar association of seeds per boll with other yield and fiber traits was observed by Ali et al. (2020), Rai and Sangwan (2020), and Bhatti et al. (2020). Seed index was significantly and positively associated with yield and uniformity ratio, as shown by Ahmed et al. (2019) and Rai and Sangwan (2020). Erande et al. (2014) and Monisha et al. (2018) reported a significant positive association of GOT with yield, the number of bolls, and seeds per boll. The results of path analysis were in accordance with Nandhini et al. (2019) and Kumbhar et al. (2020), who suggested direct positive effects of boll weight on yield. Manonmani et al. (2019) and Ali et al. (2020) reported a direct positive effect of seeds per boll on yield. Ahsan et al. (2015) observed that GOT had positive and direct effects on yield.

PCA divided the total variance into 14 components. A two-dimensional representation was demonstrated by using the first two PCs. The first six of the fourteen PCs displayed eigenvalues > 1 and had the maximum share in total variability. Isong et al. (2017) reported similar results. According to the scree plot, PC1 displayed the highest variability, 22.63%, with an eigenvalue of 3.622. Minimum variability was observed for PC13 and PC14, with eigenvalues of 0.038 and 0.002, respectively. Because PC1 had the maximum variability, the genotypes in PC1 should be considered for selection. The results were in accordance with the findings of Shakeel et al. (2015) and Riaz et al. (2019). The analysis revealed considerable variation in the germplasm that can be utilized in future breeding programs to obtain the desired genotypes.

Conclusion

Heritability estimates are of great importance to plant breeders. Traits with higher heritability (> 80%) make the selection procedure easier. Plant height, the number of monopodial branches, the number of bolls, lint index, seed index, and seed cotton yield displayed high heritability with the maximum genetic advance. The correlation study showed a significant association of seed cotton yield with the number of sympodial branches, plant height, GOT, and the number of bolls at both genotypic and phenotypic levels. Path coefficient analysis revealed that plant height, the number of monopodial branches, GOT, boll weight, seeds per boll, and short fiber index exerted direct positive effects on seed cotton yield, while the rest of the characters had direct negative effects on yield. PCA provided information about the divergence present among genotypes. AA-802, IUB-13, FH-159, FH-458, and CIM-595 were genetically diverse and could be utilized for the selection of better-performing genotypes for further improvement.