It is fitting that, during the recently ended International Year of Natural Fibers (http://www.naturalfibres2009.org/), progress in sequencing genomes of the cotton genus (Gossypium) accelerated rapidly, opening many novel opportunities to advance knowledge of organic evolution. Of singular importance is dissecting the evolution of the ‘lint fiber’ that sustains the textile industry, whose aggregate economic influence is estimated at ∼$120 billion/yr on US gross domestic product and ∼$500 billion/yr worldwide. “There are only a few cells in the plant kingdom that are as exaggerated in their size or composition as cotton fibers”, and some of these single-celled seed epidermal trichomes “... reach lengths of over 6 cm, or one-third the height of an Arabidopsis plant” (Kim and Triplett 2001).

Cotton is unusual among major crops in having been domesticated independently four times at two different ploidy levels. Spinnable fibers evolved in the Old World A genome lineage in the past 5–7 million years (Senchina et al. 2003; Udall et al. 2006). Domestication of the A genome cottons G. herbaceum and/or G. arboreum may have started before 6000 BC in Pakistan (Moulherat et al. 2002). In parallel, by 3500–2300 BC (Stephens and Moseley 1974), New World aboriginals were utilizing two tetraploid species that arose from natural hybridization between an A genome species and a New World D genome species. A and D genome taxa diverged ∼5–10 million years ago (Senchina et al. 2003; Udall et al. 2006) and were reunited by polyploidization ∼1–2 million years ago, following trans-oceanic dispersal of an A genome propagule to the New World (Wendel 1989). The ancestral allopolyploid spawned two species that were independently domesticated (G. hirsutum, or ‘Upland’ cotton; and G. barbadense, including forms referred to as ‘Sea Island’, Egyptian, and Pima cotton), and three species known only in the wild, native to the Galapagos (G. darwinii), Hawaii (G. tomentosum), and Brazil (G. mustelinum).

Revealing the genetic underpinnings of cotton productivity will require understanding both the prehistoric evolution of spinnable fibers and the results of independent domestication processes in the Old and New Worlds. In particular, the New World D genome (similar to that of extant G. raimondii) has played a surprising role in cotton improvement. Although no D genome species produces spinnable fiber, more than half of the genetic differences in fiber traits between the two domesticated tetraploid species map to D-genome chromosomes (Jiang et al. 1998; Rong et al. 2007). Moreover, gene expression in tetraploid cotton fiber shows a similar bias in favor of D-genome alleles (Hovav et al. 2008). These data support the hypothesis that the superior fiber yield and quality of tetraploids may be an emergent property of combining two genomes (Jiang et al. 1998). Indeed, cotton has gone ‘full circle’: the evolution of spinnable fibers may have unwittingly provided the Old World A genome with a dispersal mechanism by which to transiently colonize the New World and permit the tetraploid to form. In turn, in the post-Columbian era, more productive and finer-quality New World tetraploids have largely supplanted cultivated diploids in the Old World.

Cotton enjoys many opportunities to participate in a bio-based products revolution that may reduce dependence on petrochemicals (Council 2000). Cotton fiber with increased uniformity, durability, and strength might replace synthetic fibers that require ∼230 million barrels of petroleum per year to produce in the USA alone. Cotton seed oil, and byproducts of fiber processing, are raw materials for biofuel production (Holt et al. 2003).

Discovery and utilization of new Gossypium diversity may be especially important for sustainable cotton production because of the crop's narrow gene pool (Chee et al. 2004; Lubbers et al. 2004). The natural ‘genetic bottleneck’ imposed by polyploid formation has been exacerbated by repeatedly crossing relatively few closely related genotypes to one another to breed new cultivars (May et al. 1995) and by using only a few cultivars to deploy transgenes (Helms 2000). For example, a looming worldwide water crisis (UNESCO 2002) makes it important to identify adaptations that permitted wild cottons to endure periodic drought and temperature extremes (Kohel et al. 1974), and to restore valuable alleles that may have been “left behind” during domestication (Gur and Zamir 2004), in order to create cultivars that produce more with less water.

DNA sequencing promises to reveal the spectrum of diversity in the Gossypium genus. A high degree of conservation of gene order and sequence suggests that the vast majority of data from diploids will extrapolate to tetraploids (Rong et al. 2004). Accordingly, obtaining a reference sequence of the smallest Gossypium genome (D, ∼900 Mb) is a logical stepping-stone toward characterizing the larger A diploid (∼1700 Mb) and AD tetraploid (∼2500 Mb) genomes (Paterson 2007; Chen et al. 2007). Rapid, low-cost re-sequencing might then be sufficient to reveal diversity in the remaining six genomes (B, C, E, F, G, K) that permitted Gossypium species to adapt to a wide range of ecosystems in warmer, arid regions of the world. The US Department of Energy Joint Genome Institute has completed a 0.4x genome-equivalent ‘pilot study’ of G. raimondii that strongly supports the feasibility of assembling a whole-genome shotgun (WGS) sequence (A.H.P. and X. Wang, unpubl. data), and has begun further sequencing (www.jgi.doe.gov/sequencing/cspseqplans2009.html). Early explorations of the A and AD genomes are also in progress.
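To give a sense of scale for the genome sizes and the 0.4x pilot depth quoted above, the short sketch below computes how much raw sequence a given genome-equivalent coverage represents, and the fraction of bases expected to be sampled at least once. The second quantity uses the idealized Poisson (Lander-Waterman) approximation, 1 - exp(-c), which is an assumption introduced here for illustration: it ignores repeats and sampling bias and is not drawn from the pilot study itself. The function and variable names are hypothetical.

```python
import math

# Approximate genome sizes in megabases (Mb), as quoted in the text.
GENOME_SIZE_MB = {"D": 900, "A": 1700, "AD": 2500}


def sequenced_megabases(genome_mb: float, coverage: float) -> float:
    """Raw sequence (Mb) implied by a genome-equivalent coverage."""
    return genome_mb * coverage


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases sampled at least once.

    Idealized Poisson (Lander-Waterman) approximation: 1 - exp(-c).
    Assumes uniformly random reads; real genomes with repeats deviate.
    """
    return 1.0 - math.exp(-coverage)


# The 0.4x pilot of the D genome (G. raimondii).
pilot_mb = sequenced_megabases(GENOME_SIZE_MB["D"], 0.4)
pilot_fraction = expected_fraction_covered(0.4)

# Size of the A and AD genomes relative to the D genome.
relative_to_d = {
    name: size / GENOME_SIZE_MB["D"] for name, size in GENOME_SIZE_MB.items()
}

if __name__ == "__main__":
    print(f"0.4x of the D genome: about {pilot_mb:.0f} Mb of raw sequence")
    print(f"Expected fraction of bases sampled: {pilot_fraction:.2f}")
    for name, ratio in relative_to_d.items():
        print(f"{name} genome is {ratio:.1f}x the size of D")
```

Under these assumptions, the 0.4x pilot corresponds to about 360 Mb of raw sequence touching roughly one-third of the D genome, and the tetraploid would need nearly three times as much sequence as the D genome to reach the same depth.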

Because cotton is a leading crop in the implementation of transgenes in agriculture, a reference genome sequence may expedite ongoing development and stewardship of genetically modified (GM) cotton. It will become easy to determine whether each transgene insertion site is in euchromatin or heterochromatin, and to identify any genes inadvertently disrupted. Identification of genomic characteristics associated with favorable expression of transgenic traits might reduce the need for costly empirical testing of numerous transgenic insertions to commercialize one. Unifying principles of useful transgene insertions might be found by comparison to the only transgenic plant sequenced to date, papaya, in which five of six insertions were in nuclear-encoded DNA fragments of chloroplast origin, with four matching topoisomerase I recognition sites (Ming et al. 2008). Using the sequence to identify DNA markers closely linked to transgenes may reduce the undesirable chromatin (and traits) transmitted to elite genotypes from the otherwise-obsolete cottons that are most efficiently transformed.

The greatest challenge facing the cotton community is not genome sequencing per se but the conversion of sequence to knowledge. Completion of the Arabidopsis thaliana sequence was quickly followed by inception of the NSF 2010 project, which has greatly increased knowledge about the functions of Arabidopsis genes at a cost approaching $200 million. While the functions of perhaps half of the cotton genes might be deduced by analogy to those of Arabidopsis (Rong et al. 2005), de novo functional analysis of the remaining cotton genes faces the disadvantages of ∼20 times as much DNA, the necessity of completing its longer life cycle to see effects on the primary organ of commerce (seedborne lint fiber), and a larger body that cannot complete its life cycle in a test tube.

Realizing the potential economic benefits of sequencing the cotton genomes will require investments of at least the same order of magnitude as those made in Arabidopsis. Had Arabidopsis not gone first, the cost of cotton functional genomics would be much higher. Much of the required investment will need to come from the private sector, but few single enterprises have the critical mass of knowledge, skills, and resources needed to accomplish such innovation alone. Cotton is an attractive target for public-private partnership to develop enabling tools that will nurture the rapid accumulation of fundamental information needed to empower development and commercialization of products and applications across the value chain.