Intergenomic gene transfer in diploid and allopolyploid Gossypium
Intergenomic gene transfer (IGT) between nuclear and organellar genomes is a common phenomenon during plant evolution. Gossypium is a useful model to evaluate the genomic consequences of IGT for both diploid and polyploid species. Here, we explore IGT among nuclear, mitochondrial, and plastid genomes of four cotton species, including two allopolyploids and their model diploid progenitors (genome donors, G. arboreum: A2 and G. raimondii: D5).
Extensive IGT events exist for both diploid and allotetraploid cotton (Gossypium) species, with the nuclear genome being the predominant recipient of transferred DNA, followed by the mitochondrial genome. In total length, the nuclear genome has integrated 100 times more foreign sequence than the mitochondrial genome. In the nucleus, the integrated length of chloroplast DNA (cpDNA) ranged from 1.87 times (in diploids) to nearly four times (in allopolyploids) that of mitochondrial DNA (mtDNA). In the mitochondrion, the length of nuclear DNA (nuDNA) was typically three times that of cpDNA. Gossypium mitochondrial genomes integrated three nuclear retrotransposons and eight chloroplast tRNA genes, and incorporated chloroplast DNA prior to divergence between the diploids and allopolyploid formation. For mitochondrial chloroplast-tRNA genes, there were 2-6 bp conserved microhomologies flanking their insertion sites across distantly related genera, which increased to 10 bp microhomologies for the four cotton species studied. For organellar DNA sequences, there are source hotspots, e.g., the atp6-trnW intergenic region in the mitochondrion and the inverted repeat region in the chloroplast. Organellar DNAs in the nucleus were rarely expressed, and then only at low levels. Surprisingly, there was asymmetry in the survivorship of ancestral insertions following allopolyploidy, with most numts (nuclear mitochondrial insertions) decaying or being lost whereas most nupts (nuclear plastidial insertions) were retained.
This study characterized and compared intracellular transfer among nuclear and organellar genomes within two cultivated allopolyploids and their ancestral diploid cotton species. A striking asymmetry in the fate of IGTs in allopolyploid cotton was discovered, with numts being preferentially lost relative to nupts. Our results connect intergenomic gene transfer with allotetraploidy and provide new insight into intracellular genome evolution.
Keywords: Intergenomic gene transfer, Allopolyploidization, Gossypium, Mitochondrial genome, Chloroplast genome, Numt, Nupt
- B. napus (AACC)
- B. oleracea (CC)
- B. rapa (AA)
- G. arboreum (A2)
- G. barbadense (AD2)
- G. herbaceum (A1)
- G. hirsutum (AD1)
- G. raimondii (D5)
- IGT: Intergenomic gene transfer
- LTR: Long terminal repeat retrotransposon
- norgDNAs: Nuclear organellar DNAs
- numts: Nuclear mitochondrial insertions
- nupts: Nuclear plastidial insertions
Prokaryotic α-proteobacteria and cyanobacteria are known to be the forerunners of modern eukaryotic mitochondria and chloroplasts [2, 3], as described by the endosymbiont theory. The transformation from endosymbionts to organelles was accompanied by massive DNA transfer among intracellular genomes, or intergenomic gene transfers (IGT). Although the pace of IGT has slowed considerably since eukaryote formation, it remains a common process that is characteristic of nuclear and organellar genome evolution in plants. Among the three types of genomes in a plant cell, there are six possible directions of gene transfer. The most prominent directions of IGT are from either organellar genome into the nuclear genome [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], followed by transfers from the nuclear and plastid genomes into the mitochondrial genome [2, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]. Interorganellar transfer to the highly compact plastid genome appears to be quite rare [39, 40, 41, 42, 43].
Recent research has revealed that plant mitochondrial genomes frequently integrate DNA from the other two cellular compartments. As the number of sequenced plant mitochondrial genomes has increased [27, 29, 39, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], the extent of integration from both the chloroplast [28, 30, 32, 62, 63] and nuclear genomes [33, 60, 62] has become more apparent. In general, plastid-derived sequences make up between 0.56% (Marchantia polymorpha) and 10.85% (Phoenix dactylifera) of plant mitochondrial genomes [9, 10, 34]. Nuclear sequence integration tends to be more abundant and more difficult to identify, as the integrated sequences commonly include retrotransposon and other repetitive fragments [28, 31, 33, 55].
Nuclear integrants derived from the mitochondrial and plastid genomes are termed numts and nupts, respectively, and are collectively referred to as norgDNAs, or nuclear organellar DNAs. Environmental stresses have been shown to increase entry of organellar DNA into the nucleus in plants, with insertions commonly occurring in open chromatin regions. While norgDNAs are commonly thought to be inactive, there is some evidence of norgDNA transcription in plant species, including in rice and cotton.
While IGT has relevance to broad questions of genome evolution, it takes on additional importance due to its possible relationship to plant fertility [65, 66, 67, 68]. Repeated IGT events can create regions of homology within the mitochondrial genome, which can provide hotspots for intra-organellar recombination. Mitogenomic recombination is a common phenomenon that may generate novel chimeric sequences [69, 70, 71, 72]. These novel chimeric sequences may be co-transcribed with adjacent functional genes [73, 74, 75], subsequently affecting or interfering with mitochondrial electron transfer chain pathways [76, 77]. Furthermore, the phenomenon known as cytoplasmic male sterility is influenced by both the nuclear and mitochondrial compartments, as well as by their interactions. Thus, improving our understanding of IGT may help inform breeding strategies that utilize plant fertility differences, e.g., the use of male sterility to develop hybrids.
The genus Gossypium contains approximately 50 species [78, 79, 80], including four that have been domesticated and presently are cultivated for their seed trichomes, or cotton fiber. Two of these domesticated species belong to the clade of seven extant allopolyploid species (AD genome), which formed about 1–2 million years ago (Mya) when an A-genome diploid (resembling modern G. arboreum, A2 or G. herbaceum, A1) hybridized with a D-genome species (resembling modern G. raimondii, D5) and subsequently doubled in chromosome number [79, 81]. High-quality nuclear genome assemblies have recently become available for the diploids G. raimondii [64, 82] and G. arboreum, and for the allopolyploids G. hirsutum [84, 85, 86] and G. barbadense [86, 87, 88]. There also exist multiple organellar genome sequences for both diploid and allopolyploid cotton species [60, 61, 63, 65, 89, 90, 91, 92, 93]. These genomic data provide the foundation for discovery and description of IGT events in Gossypium.
Here we analyze intracellular transfer among nuclear and organellar genomes within four cotton species, including the two cultivated allopolyploids (AD genome) and models of their ancestral diploid (A, D) genome donors, to explore the prevalence of IGT in cotton (Gossypium). We characterize and compare the frequencies of the six possible classes of IGT events, as well as the sources and sizes of the inter-organellar sequences. We report a striking asymmetry in the fate of IGTs in allopolyploid cotton, with numts being preferentially lost relative to nupts. Finally, we explore the expression of norgDNAs.
General profiles of intergenomic gene transfer in Gossypium
[Table: The total length of IGT events among three genomes in four cotton species. Columns: Cp → Mt (kb), Nu → Mt/Cp → Mt, Mt → Nu (kb), Cp → Nu (kb), Cp → Nu/Mt → Nu]
Mitochondrial integrations are variable for nuclear repeats and conserved for chloroplast tRNA genes
Additionally, we evaluated the contribution of both nuclear and chloroplast derived sequences to overall mitochondrial genome size for 26 land plants. Although both nuclear and chloroplast sequences have contributed to mitogenome expansion, the total length of nuclear-like sequences in mitogenomes is more strongly correlated with mitogenome size variation (R² = 0.77) than is the length of chloroplast sequences (R² = 0.36) (Additional file 2A, B). This is partly due to variation in the total amount of repetitive sequence inserted into the mitogenome; however, this correlation is weak (R² = 0.13) (Additional file 2C), as is the correlation between total repeat length and total nuclear/chloroplast length (R² = 0.23 and 0.0048; Additional file 2D, E).
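For readers who wish to reproduce this kind of comparison, the coefficient of determination for a simple linear relationship can be computed as the squared Pearson correlation. The following is a minimal sketch only; the function name `r_squared` and the example values are hypothetical and are not data from this study.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a simple linear fit."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = fsum(x) / len(x)
    mean_y = fsum(y) / len(y)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values in kb, NOT taken from the study.
mitogenome_size_kb = [300.0, 450.0, 600.0, 900.0]
nuclear_like_kb = [10.0, 30.0, 45.0, 95.0]
print(round(r_squared(nuclear_like_kb, mitogenome_size_kb), 3))
```

Applying the same function to chloroplast-derived lengths and to repeat lengths would allow the three relationships to be compared on the same scale.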
Nuclear insertion of mitochondrial DNAs (numts) and chloroplast DNAs (nupts)
Within Gossypium, the number of numts in diploids is roughly two to three times that found in tetraploids (33 in Gossypium raimondii and 23 in G. arboreum, versus 11 in Gossypium hirsutum and 13 in G. barbadense; Fig. 4). Many of the diploid numts (23) are shared between the two diploids, G. raimondii (D5) and G. arboreum (A2), which indicates that these numts were incorporated into the nuclear genome prior to the divergence of the A and D clades at the base of Gossypium. Subsequently, the lineage leading to G. raimondii acquired a further 10 numts (nad5, atp9, ccmB, ccmC, rps3, rps7, nad6, cox2, sdh3 and rpl2). In contrast, the two tetraploids (G. hirsutum and G. barbadense) suffered massive, differential numt decay after allotetraploidization. Only three of the shared diploid numts (nad9, ccmFC, and rps10) and one D5-specific numt (nad6) are present in both G. hirsutum and G. barbadense. Gossypium hirsutum contains an additional six shared diploid numt genes (nad4, sdh4, cob, cox1, atp1 and rps14) and one D5-specific numt gene (cox2). On the other hand, G. barbadense contains five different diploid-common numts (nad4L, cox3, atp4, atp6 and mttB), two different D5-specific numts (sdh3 and rpl2), and two AD2-specific numts (nad3 and rps12).
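The counts in this paragraph can be tabulated with simple set operations. The sketch below uses only the gene names listed in the text above; the variable names are illustrative, and the 23 shared diploid numts are not all enumerated in the text, so they are not reconstructed here.

```python
# Numt gene sets as listed in the paragraph above.
d5_specific = {"nad5", "atp9", "ccmB", "ccmC", "rps3",
               "rps7", "nad6", "cox2", "sdh3", "rpl2"}
hirsutum = {"nad9", "ccmFC", "rps10", "nad6",             # present in both tetraploids
            "nad4", "sdh4", "cob", "cox1", "atp1", "rps14",  # shared diploid numts
            "cox2"}                                        # D5-specific
barbadense = {"nad9", "ccmFC", "rps10", "nad6",           # present in both tetraploids
              "nad4L", "cox3", "atp4", "atp6", "mttB",    # diploid-common numts
              "sdh3", "rpl2",                              # D5-specific
              "nad3", "rps12"}                             # AD2-specific

in_both_tetraploids = hirsutum & barbadense
only_hirsutum = hirsutum - barbadense
only_barbadense = barbadense - hirsutum

print(len(hirsutum), len(barbadense))        # totals per tetraploid
print(sorted(in_both_tetraploids))           # numts retained in both
print(sorted(hirsutum & d5_specific))        # D5-specific numts kept in G. hirsutum
print(sorted(barbadense & d5_specific))      # D5-specific numts kept in G. barbadense
```

The set differences make the differential nature of the decay explicit: apart from the four numts common to both tetraploids, the two species retain entirely different genes.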
Interestingly, most numts remain intact (full length) after insertion (Fig. 4, red cells), with few degraded to truncated pseudogenes (Fig. 4, green cells). In a few cases (i.e., cox2, rpl2, rpl16, rps10 and rps14), there is evidence of pseudogenes in both diploids and polyploids. The remaining 10 pseudogenes (nad9, sdh4, cob, cox1, cox3, atp8, ccmFC, rpl5, rps4, and matR) were intact in diploids but truncated in polyploids. Additionally, four numt pseudogenes (nad5, atp9, ccmB, and rps7) occurred only in G. raimondii (Fig. 4, green cells).
In contrast to the greater abundance of numts in G. raimondii, nupts experienced a general decay in G. raimondii relative to the remaining cotton species (Fig. 5). Nearly all of the 78 nupts are present in G. arboreum, G. hirsutum, and G. barbadense (except for rps16 nupt loss in G. arboreum) as full or partial nupts; 69 nupts are common to the four cotton species (Fig. 5). In general, G. arboreum (A2) retains the most nupts (77 out of 78), whereas there was considerable nupt degradation (39) or removal (9) from the G. raimondii genome. As in G. arboreum, nupts in G. barbadense are typically retained intact (75/78), whereas in the other polyploid species, G. hirsutum, there has been a modest amount of degradation (12/78 are degraded; Fig. 5). One nupt (rps16) that is absent in G. arboreum is a pseudogene in G. raimondii and G. hirsutum, and is intact only in G. barbadense. This suggests that this rps16 nupt was transferred to the nucleus sometime during the evolution of G. raimondii after divergence from the G. arboreum lineage, and that it was subsequently retained in polyploid Gossypium until it experienced decay in the lineage leading to G. hirsutum. In G. barbadense, the three nupts (out of 78) that have experienced degradation (i.e., atpF, rpoB, and ycf1) all experienced IGT in the common ancestor of the diploid species (G. raimondii and G. arboreum), and experienced differential degradation in G. raimondii and both allopolyploids (except the intact rpoB in G. hirsutum). Degradation of nupts was more prominent in G. hirsutum, where 15% of nupts were degraded (12/78; psaB, psbC, petA, atpF, atpI, rpoC2, rps16, matK, accD, accmA, ycf1 and ycf2). Most of these (nine of 12) are also pseudogenes in G. raimondii, while the chloroplast genes psbC, rpoC2 and matK were absent from the G. raimondii nucleus. Overall, G. raimondii experienced the most degradation of nupts: half (39 out of 78) were decayed and nine were not present (i.e., psaA, ycf3, psbC, ndhB, ndhD, ndhF, rpoC2, rps12 and matK).
Congruent with most angiosperm species, the majority of norgDNAs in cotton are small to medium in size (100 bp – 5 kb) (Additional files 3 and 4), and their distribution patterns vary among species (Additional file 3). Two previously noted complete mitochondrial genome transfers were found on Chr01 of G. raimondii and ChrA03 of G. hirsutum [64, 93], similar to large-scale norgDNAs found in other plants, e.g., the numt on Chr2 of A. thaliana [61, 101] and the nupt inserted into one scaffold of S. bicolor (Additional file 3). Reverse matches (in which the matched interval runs from low to high coordinates in one genome but from high to low in the other) are no fewer than positive matches (in which the interval runs in the same direction in both the donor and receptor genomes). The two kinds of matches coexist on the same chromosome in most studied species. In G. raimondii, the nupts of six chromosomes (chr01, chr03, chr05, chr06, chr08 and chr09) are all reverse matches (blue dots or fragments), and those of the other seven chromosomes are positive matches (red dots or fragments). In addition, norgDNAs in some species frequently transfer into certain chromosomes, but such nuclear hotspots of integration vary from species to species, e.g., numts on Chr2 of A. thaliana, Chr01 of G. raimondii, and ChrA03 of G. hirsutum. Finally, the transfer of norgDNAs is phylogenetically sporadic, as there often is little norgDNA similarity between closely related species.
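The distinction between positive and reverse matches described above depends only on whether the start-to-end coordinates run in the same direction in both genomes. A minimal sketch of that classification follows; the function name and the example coordinates are hypothetical and are not taken from the alignments in this study.

```python
def match_orientation(donor_start, donor_end, receptor_start, receptor_end):
    """Classify a match by comparing coordinate direction in the two genomes.

    'positive': both intervals ascend, or both descend.
    'reverse' : one interval ascends while the other descends.
    """
    donor_ascending = donor_end >= donor_start
    receptor_ascending = receptor_end >= receptor_start
    return "positive" if donor_ascending == receptor_ascending else "reverse"


# Hypothetical (donor_start, donor_end, receptor_start, receptor_end) tuples.
matches = [
    (100, 600, 20_000, 20_500),    # both ascending
    (900, 400, 51_000, 50_500),    # both descending
    (100, 600, 75_500, 75_000),    # donor ascending, receptor descending
]
for m in matches:
    print(m, match_orientation(*m))
```

Counting the two labels per chromosome would give the kind of summary reported for G. raimondii above.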
[Table 2: Repeat variation in the nuclear genomes of four cotton species. Columns: Genome size (Mb), Repeat size (Mb), Repeat percentage (%), Numt length (kb), Numt percentage (%), Nupt length (kb), Nupt percentage (%)]
Low expression levels of organellar genes in nucleus (norgDNAs)
[Table: Transcription levels of 14 organellar genes and nuclear homologs in two G. hirsutum varieties. Columns: Organellar genes/nuclear homologs, DNA lengths (bp)]
NorgDNA changes during the evolution of diploids and allopolyploids
Variable rates of IGT occur between the intracellular genomes of Gossypium
In our study, we characterized the rate and direction (i.e., nucleus ↔ chloroplast, nucleus ↔ mitochondrion, chloroplast ↔ mitochondrion) of IGT between the intracellular genomes of four Gossypium species. We detected most mitochondrial genes in the present mitogenomes, which suggests that most genes (i.e., protein-encoding, rRNA, and tRNA genes) in Gossypium mitochondrial genomes are highly conserved [29, 102, 103, 104]. For example, the mitochondrial genome of Gossypium retains some genes encoding both complex II (succinate dehydrogenase, sdh genes) and ribosomal subunits (rpl and rps genes) that have been massively lost in other angiosperms [9, 17], e.g., sdh3 and sdh4, rpl10 and rps10. Meanwhile, we found all chloroplast genes in the plastids of all four cotton species, which validates the general conservation of chloroplast genes, as in most plant species. While all three intracellular genomes were involved in IGT, the only directions of IGT not detected were nuclear or mitochondrial transfers into the chloroplast. This is consistent with observations from most other plants, with two notable exceptions, i.e., Daucus carota and Asclepias syriaca, where intracellular transfers into the chloroplast have been reported. Together, these observations suggest that the plastid genome lacks an active mechanism to integrate exogenous sequences.
Overall, we found that substantially more sequences were integrated into the nucleus than into the mitochondrion, a phenomenon that may be due to the mechanisms by which sequences are integrated into each genome, limitations on mitochondrial genome size, or both. Nearly all mitochondrial and chloroplast genes experienced transfer to the nucleus in Gossypium, which is a pattern seen in most other angiosperms, and at a transfer rate that is 100-fold greater than that into the mitochondrial genome.
Non-additive effects of allopolyploidization on IGT in Gossypium
Genome doubling via polyploidy has numerous consequences. Polyploidy can alter the size, content, and complexity of the genome, consequently affecting genetic variation, stress adaptation, biological complexity, speciation, biodiversity and evolutionary novelty. Myriad genomic consequences have been documented for polyploidy [110, 111, 112]; however, the cyto-nuclear effects of allopolyploidy in particular (which results from hybridization of divergent species) have been underexplored, and little is known regarding the influence of polyploidy on IGT. Here, we evaluate IGT for two allopolyploid species, G. hirsutum and G. barbadense, which arose from a single polyploidization event about 1–2 million years ago involving the ancestors of the two diploid cottons sequenced here. We found that one diploid ancestral species (here, G. raimondii) experienced more nupt truncation and/or loss than the other diploid ancestral species (here, G. arboreum), and that the resulting polyploids retain nearly all of the nupts found in the diploids, similar to the extensive retention found in G. arboreum. These patterns are consistent with the general observation from a survey of 21 land plants. Conversely, mitochondrial-to-nuclear IGTs are massively lost in the allopolyploid species. That is, only a few decayed numts are retained in the allopolyploid species, fewer than are retained in either diploid progenitor, and those that were retained were more commonly the older numts shared between diploid species.
The total length of retained numts and nupts did not approach additivity for either polyploid species, G. hirsutum (AD1) or G. barbadense (AD2), with the possible exception of nupts in G. hirsutum whose total length was 80% of that found in their representative model diploid parents, G. raimondii (D5) and G. arboreum (A2). This may reflect insertions in the model diploid parents, either after divergence from the polyploid or the true polyploid progenitors, or it may represent differential decay in the polyploids, both upon formation and over time. Interestingly, the ratio of nupt to numt in the two allotetraploid cotton species was twice that in both diploid cotton species, indicating a possible shift in relative nupt and numt incorporation and/or retention (toward nupt) in polyploid cotton. This could be partially explained by the challenge in uniquely identifying mitochondrial-derived repeats from the background repeats of the nuclear genome, particularly as these degrade over time. Our preliminary results (above) suggest similar norgDNA integration rates in Gossypium and Brassica, but more data are needed to understand the influences of ploidy on nupt and numt integration and degradation dynamics. While this hints at the differences among allopolyploid systems with respect to IGT, it is premature to draw general, more widely applicable conclusions. Clearly, this area requires further study to understand the evolutionary implications of norgDNAs, the patterns and processes by which they evolve in different biological systems, and the influences of ploidy on integration and degradation.
Patterns of IGT among intracellular genomes
Here we found that the mitochondrial atp6-trnW intergenic region and the chloroplast inverted repeat region represent hotspots for IGT source material, i.e., these regions were frequently transferred to other intracellular genomes. In the mitochondrial genome, these (and other) transfers were frequently associated with 10 bp microhomologies, a phenomenon that was observed across distantly related genera (as 2-6 bp microhomologies).
Detected numts were found in various states of completeness, due either to the length transferred or to subsequent decay. Here we found that most numts remain full-length after insertion, suggesting that the mechanism responsible for numt generation may preferentially operate on full-length genes; however, a few genes did transfer to the nucleus as partial sequences, indicating that partial genes are not excluded from transfer. Using a phylogenetic approach, we also detected genes that were transferred intact but decayed afterwards, e.g., nad9 and atp8. Because decaying numts/nupts become increasingly difficult to detect as they lose sequence similarity to their source, there is a natural bias toward detection of younger, more intact, and/or conserved insertions. Here, however, we describe several numts/nupts that have survived the basal-most radiation of the genus, approximately 5-10 Mya. Further insight into the potential reasons for these uncommon retentions will require additional functional study.
Contributions of IGT to genome expansion
The underlying causes of genome size variation represent an old question for the nuclear genome and a relatively recent one for the mitochondrial genome. With respect to the latter, we found that both nuclear and chloroplast sequences are correlated with mitogenome expansion, concordant with the view that contamination of plant mitochondrial genomes with nuclear and chloroplast DNA is at the heart of mitogenome expansion in plants. As most plant genomes are composed of massive amounts of repetitive sequence, it is tempting to suggest that nuclear-derived repeats should represent the most frequent transfers; however, correlations between repetitive sequence characteristics (e.g., number, total amount) and mitogenome size were weak. Therefore, although nuclear repeat-derived transfers do contribute to mitogenome size increase, they cannot fully explain the correlation between nuclear-to-mitochondrial transfer and mitogenome expansion.
With respect to nuclear genome size variation, it is commonly accepted that repetitive content underlies most of the size variation among species; however, the contributions of other sources of genome size expansion are less well characterized. Here we found that while nuclear repeats do contribute the most (more than half) to genome size differences among species (i.e., 55.60, 68.50, 67.20 and 69.11% in G. raimondii, G. arboreum, G. hirsutum and G. barbadense, respectively), the contribution of numts and nupts to genome size is not insignificant (Table 2). The presence of numts and of nupts was positively correlated with nuclear genome size, with nupts affecting genome size to a somewhat greater extent (i.e., 0.27, 0.20, 0.20 and 0.13% in G. raimondii, G. arboreum, G. hirsutum and G. barbadense, respectively) than numts (i.e., 0.15, 0.11, 0.05 and 0.04% in G. raimondii, G. arboreum, G. hirsutum and G. barbadense, respectively; Table 2). We also found a positive correlation between nuclear repeats and norgDNAs, which may reflect a greater ability for norgDNAs to successfully integrate into genomes with larger gene-free regions.
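The percentages quoted above also allow a rough check of the nupt-to-numt balance in each species. The sketch below simply divides the two percentages given in the text; because those percentages are rounded to two decimals, the resulting ratios are approximate and may differ slightly from ratios computed from the underlying sequence lengths.

```python
species = ["G. raimondii", "G. arboreum", "G. hirsutum", "G. barbadense"]
nupt_pct = [0.27, 0.20, 0.20, 0.13]   # nupt share of nuclear genome (%), from the text
numt_pct = [0.15, 0.11, 0.05, 0.04]   # numt share of nuclear genome (%), from the text

# Ratio of nupt to numt content per species (approximate, from rounded inputs).
ratios = {name: p / m for name, p, m in zip(species, nupt_pct, numt_pct)}

for name, ratio in ratios.items():
    print(f"{name}: nupt/numt = {ratio:.2f}")
```

Under these rounded inputs the diploids fall near 1.8 and the allotetraploids between roughly 3 and 4, in line with the shift toward nupts described for the polyploids.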
The consequences of organelle-to-nuclear transfers are largely unknown
Organelles have a history of functional transfers to the nucleus, some of which are conserved among distantly related lineages and others which are lineage specific. These are in addition to non-functional transfers, which may represent sequences varying in size and content from gene fragments to large regions of the organellar genome [4, 14, 18, 19, 46, 101, 115, 116]. While many recent organellar-derived sequences are inactive and/or nonfunctional, some transfers to the nucleus may have function [11, 16, 17, 117], and both functional and non-functional transfers can have consequences for intracellular metabolism and genome evolution [14, 17, 18, 114, 117, 118]. We analyzed the expression of organellar genes and their norgDNAs as a proxy for function, finding that those organelle-derived genes were generally expressed at a far lower level than their organellar counterparts, suggesting limited functional potential. Therefore, while expression of norgDNA does appear to occur, it may reflect leaky transcription rather than function. Expressed and potentially functional norgDNAs, however, do occur in plants (despite the lack of a nuclear promoter), cautioning against a ubiquitous assumption of non-function. Additionally, as we noted, characterization of norgDNA expression needs to acknowledge the high similarity of organelle-derived reads to those from nuclear integrants.
Evolutionary asymmetry of norgDNAs in the allopolyploidization process of Gossypium
During polyploid evolution, many of the numts present in the diploids have been lost, whereas nearly all nupts in the diploids are still retained in current polyploids. Given that differences in assembly quality among the genomes represent a possible alternative explanation for the presence vs. absence of specific genes, we compared our results to a new analysis using the more recently released and even higher quality genome assemblies for G. hirsutum and G. barbadense. This analysis mostly reiterated our results, but with some differences. In G. hirsutum, there are three additional numts (sdh3, rpl5 and rps7) and one lost numt (nad4), as well as one intact nupt (matK, previously inferred as not intact) and one decayed nupt (rps12, previously inferred as intact). In G. barbadense, there are six new numts (sdh4, cob, cox1, rpl5, rps7 and rps14) and four lost numts (nad3, nad4L, rps12 and mttB), and four newly decayed nupts (ycf2, ycf3, ycf4 and rps12) that were intact using the earlier released assembly. These results both emphasize the potential impact of assembly quality on inference of numt gain and loss and confirm that our general conclusions about differential loss and gain in the polyploids are valid. In addition, we also detected some previously published large-fragment transfers, such as the complete mitochondrial genome transfers in Chr01 of G. raimondii and ChrA03 of G. hirsutum [64, 93] and the numt on Chr2 of A. thaliana [61, 101], as well as additional, newly detected transfers, further validating the reliability and robustness of our methods. Therefore, we conclude that the numt and nupt dynamism reported here is a genuine biological phenomenon.
This study concludes that among the three intracellular genomes, most IGT was from organelles into the nuclear genome in two cultivated allopolyploids and their ancestral diploid cotton species; however, both nuclear retrotransposons and chloroplast tRNA genes integrated into mitochondrial genomes at a rate sufficient to correlate with mitogenome size increase. We detected hotspot regions for both the source of IGT (e.g., the atp6-trnW intergenic region) and the destination, which require further study in diverse plants to determine the patterns and generality of these observations. We also found that following allopolyploidy, there was a striking asymmetry in IGT retention in the nuclear genome, with most numts being lost but most nupts retained. While it is tempting to attribute the loss of these fragments to parental origin, in that paternally-derived norgDNAs could potentially be interfering, and therefore deleterious, we saw no bias in loss with respect to parental origin. As this is the first report of the relationship between intergenomic gene transfer and allotetraploidy, data from additional polyploid systems are required to understand the evolutionary dynamics of IGT in polyploids.
Plant materials and genome data
We used two varieties of upland cotton (G. hirsutum), Xinluzao 11 (X11) and Xinluzhong 42 (X42) for expression analysis. The seeds of X11 and X42 were provided by our own laboratory. X11 (original name: Yuzao 202) was introduced by Bole Seed Station from the Institute of Cash Crops, Henan Academy of Agricultural Sciences, Henan, China, in 1994. After many years of breeding trial, it was approved by the Crop Variety Approval Committee of Xinjiang Autonomous Region, Xinjiang, China, in 1999, and named as Xinluzao 11. X42 was bred by the Institute of Cash Crops of Xinjiang Academy of Agricultural Sciences and approved by the Crop Variety Approval Committee of Xinjiang Autonomous Region, Xinjiang, China, in 2009.
The chloroplast, mitochondrial and nuclear genome sequences of two diploids (G. raimondii and G. arboreum), and two allotetraploids (G. hirsutum and G. barbadense) were downloaded from the NCBI database (accession numbers listed in Additional file 1).
Identification of intergenomic-transfer genes
For each species in the study, we performed pairwise comparisons of chloroplast or mitochondrial genes and nuclear chromosome sequences using BLAST (command code 1 in Additional file 9). We set the e-value threshold to 1e−5 and required a minimum length of 100 bp for high-identity matches (95%). We also identified the mitochondrial insertions of chloroplast DNA (mtpts) using local BLASTN (version 2.2.23) with a 50-bp minimum match length (identity > 95%, coverage > 90%). Transfers were cataloged as full-length genes or pseudogenes; pseudogenes were those lacking full length or carrying mutations that result in premature stop codons.
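The thresholds described above can be applied to BLAST output with a short filter. The sketch below is illustrative only: it assumes the hits were written in BLAST's standard 12-column tabular format (the source does not state which output format was used), it treats the 95% identity threshold as inclusive, and the function name and example rows are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue=1e-5, min_length=100, min_identity=95.0):
    """Keep hits passing the e-value, alignment-length and identity thresholds.

    Assumes the standard tabular column order: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        pident, length, evalue = float(row[2]), int(row[3]), float(row[10])
        if evalue <= max_evalue and length >= min_length and pident >= min_identity:
            hits.append({"query": row[0], "subject": row[1],
                         "identity": pident, "length": length, "evalue": evalue})
    return hits


# Hypothetical rows, NOT results from the study.
example = "\n".join("\t".join(fields) for fields in [
    ["nad9", "Chr01", "99.1", "573", "5", "0", "1", "573", "1000", "1572", "0.0", "1030"],
    ["cox2", "Chr05", "96.0", "80", "3", "0", "1", "80", "200", "279", "1e-30", "130"],
    ["atp9", "Chr07", "91.5", "225", "19", "0", "1", "225", "900", "1124", "1e-70", "290"],
])
for hit in filter_hits(example):
    print(hit["query"], hit["subject"], hit["identity"], hit["length"])
```

The mtpt search could reuse the same function with `min_length=50`; the coverage criterion would additionally require the gene length, which is not part of the default columns.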
Detection of nuclear transposable elements and repeats in mitochondrial genomes
We detected nuclear transposable elements from nuclear sources using RepeatMasker (command code 2 in Additional file 9) (http://www.repeatmasker.org) with a custom Gossypium-enriched repeat database for the four cotton species studied. We used two-tailed t-tests to evaluate the significance of differences among the different types. The repeats in the mitochondrial genome were identified with the repeat-match algorithm (command code 3 in Additional file 9) in MUMmer. Specific parameters include: -f (use the forward strand only), -n (minimum match length; default 20), and -t (only output tandem repeats).
IGT hotspot analyses in Gossypium
We performed dot matrix comparisons between the mitochondrial or chloroplast genomes and nuclear chromosomes of four Gossypium species using the nucmer program of MUMmer. We set a 100-bp minimum size for an exact match and a 500-bp minimum interval between every two matches. We calculated the middle positions of all Gossypium organellar insertions into nuclear chromosomes to tabulate transfer hotspots. We then drew frequency distribution charts in R (command code 4 in Additional file 9) (https://www.r-project.org/).
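The midpoint tabulation described above can be sketched as follows. This is an illustration in Python rather than the R code used for the published charts; the bin size of 1 Mb and the example intervals are assumptions chosen for demonstration, and the function names are hypothetical.

```python
from collections import Counter


def midpoint(start, end):
    """Middle position of an insertion interval (coordinate order does not matter)."""
    return (start + end) // 2


def bin_counts(intervals, bin_size):
    """Count insertion midpoints per fixed-size bin along one chromosome."""
    counts = Counter()
    for start, end in intervals:
        counts[midpoint(start, end) // bin_size] += 1
    return dict(sorted(counts.items()))


# Hypothetical insertion intervals (bp) on a single chromosome.
insertions = [(100, 300), (900_000, 1_100_000), (1_500_000, 1_500_400)]
print(bin_counts(insertions, bin_size=1_000_000))
```

Bins with unusually high counts would correspond to candidate integration hotspots; the resulting table can be plotted as a frequency distribution in any plotting tool.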
Expression analysis of norgDNAs
Total RNA was extracted from leaves of two accessions of upland cotton, X11 and X42, using an improved cetyltrimethylammonium bromide (CTAB) and sodium dodecyl sulfate (SDS) method, and was sequenced on an Illumina HiSeq2500 at Shanghai Hanyu Biotech Co., Ltd. Sequencing libraries were generated using the Illumina TruSeq RNA Sample Preparation Kit (Illumina, USA) following the manufacturer’s recommendations, and four index codes were added to diagnose the sample origins (nuclear or organellar) for each sequence. Following experimental confirmation of concentration and purity, poly-(T) oligo-attached magnetic beads were utilized for nuclear mRNA enrichment. Fragments, preferentially 200-300 bp in length, were enriched using the Illumina PCR Primer Cocktail in a 10-cycle PCR amplification to form cDNA libraries. Finally, libraries were paired-end sequenced in one lane, yielding 4 Gb of clean reads per sample with an average length of 125 nt. RNA sequencing data quality was checked using FastQC. The reads were mapped to the norgDNA homologs using bowtie 2 (command code 5 in Additional file 7), and samtools idxstats (command code 6 in Additional file 9) was then used to calculate the read counts of each gene. RPKM values were used to estimate relative expression. Expressed paired-end reads were mapped onto their respective consensus sequences using BWA 0.7.10-r789; the results were then transformed into BAM files using SAMtools view, and structural variations (SVs) and InDels were visualized using the Integrative Genomics Viewer. The total RNA of X42 and X11 was reverse-transcribed using two types of primers, i.e., oligo dT primer and random 6-mers, to capture both nuclear expression (NE) and organellar expression (OE). The cDNA produced by the oligo dT primer represents nuclear expression, whereas cDNA produced by the random 6-mers represents a combination of nuclear and organellar genes (NOE). OE is equal to NOE minus NE.
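The RPKM normalization mentioned above can be sketched from per-reference read counts. The example below assumes the four-column, tab-separated output of samtools idxstats (reference name, sequence length, mapped reads, unmapped reads) and uses the sum of mapped reads across the listed references as the library size, which is an assumption; the study may have used a different denominator. The function name and counts are hypothetical.

```python
def rpkm_from_idxstats(idxstats_text):
    """Compute RPKM per reference from idxstats-style text.

    RPKM = mapped_reads * 1e9 / (reference_length_bp * total_mapped_reads)
    """
    records = []
    for line in idxstats_text.strip().splitlines():
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*" or int(length) == 0:
            continue  # skip the summary row for reads without a reference
        records.append((name, int(length), int(mapped)))
    total_mapped = sum(mapped for _, _, mapped in records)
    if total_mapped == 0:
        return {name: 0.0 for name, _, _ in records}
    return {name: mapped * 1e9 / (length * total_mapped)
            for name, length, mapped in records}


# Hypothetical counts, NOT data from the study.
example_idxstats = "geneA\t1000\t500\t0\ngeneB\t2000\t500\t0\n*\t0\t0\t42\n"
print(rpkm_from_idxstats(example_idxstats))
```

Because both references receive the same number of reads in this toy input, the longer one ends up with half the RPKM of the shorter one, which is the length correction the measure is designed to provide.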
Both kinds of cDNA were used as templates for qRT-PCR. qRT-PCR experiments were conducted using the SYBR Premix Ex Taq™ (Tli RNaseH Plus) RR420A kit (TaKaRa) on an Applied Biosystems 7500 Real-Time PCR System. The procedure consisted of three stages: stage 1: 95 °C, 30 s, 1 cycle; stage 2: 95 °C, 5 s, 60 °C, 35 s, 40 cycles; stage 3: 95 °C, 15 s, 60 °C, 1 min, 95 °C, 35 s, 1 cycle. Using the cotton housekeeping gene UBQ7 as the internal control, we analysed the relative expression levels of two organellar genes and their nuclear copies using the 2^−ΔΔCt method. Each sample was repeated three times.
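The 2^−ΔΔCt calculation and the subtraction OE = NOE − NE can be sketched as below. This is a hedged example rather than the authors' procedure: the Ct values are invented, the choice of calibrator sample is an assumption (the source does not name one), and applying the subtraction to relative expression values rather than to some other quantity is also an assumption.

```python
from statistics import mean


def relative_expression(target_ct, ref_ct, calib_target_ct, calib_ref_ct):
    """Relative expression by 2^-ddCt, averaging technical replicates first."""
    # dCt: target gene normalized to the internal control (UBQ7 in the text).
    d_sample = mean(target_ct) - mean(ref_ct)
    d_calib = mean(calib_target_ct) - mean(calib_ref_ct)
    # ddCt: sample normalized to the (assumed) calibrator.
    return 2 ** -(d_sample - d_calib)


def organellar_expression(noe, ne):
    """OE = NOE - NE, applied here to relative expression values (assumption)."""
    return noe - ne


# Invented triplicate Ct values, for illustration only.
fold = relative_expression(
    target_ct=[24.0, 24.0, 24.0],
    ref_ct=[20.0, 20.0, 20.0],
    calib_target_ct=[26.0, 26.0, 26.0],
    calib_ref_ct=[20.0, 20.0, 20.0],
)
oe = organellar_expression(noe=5.0, ne=1.5)
```

Here the sample's ΔCt is 4 and the calibrator's is 6, so ΔΔCt is −2 and the target is four times as abundant as in the calibrator.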
NZ analyzed the data, interpreted the results, and prepared the manuscript. ZC participated in data collection, the original analysis, and discussion. JFW and CEG participated in discussion, data analysis, and manuscript revision. JH conceived the experimental design, provided the research platform, and revised the manuscript. All authors approved the final manuscript.
This research was supported by the National Natural Science Foundation of China (31671741) and the National Key R & D Program for Crop Breeding (2016YFD0101305). The funding bodies played no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript. Each fund supported experimental costs and paper revision fees.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.