The wild sweetpotato (Ipomoea trifida) genome provides insights into storage root development
- 350 Downloads
Sweetpotato (Ipomoea batatas (L.) Lam.) is the seventh most important crop in the world and is mainly cultivated for its underground storage root (SR). The genetic studies of this species have been hindered by a lack of high-quality reference sequence due to its complex genome structure. Diploid Ipomoea trifida is the closest relative and putative progenitor of sweetpotato, which is considered a model species for sweetpotato, including genetic, cytological, and physiological analyses.
Here, we generated the chromosome-scale genome sequence of SR-forming diploid I. trifida var. Y22 with high heterozygosity (2.20%). Although the chromosome-based synteny analysis revealed that the I. trifida shared conserved karyotype with Ipomoea nil after the separation, I. trifida had a much smaller genome than I. nil due to more efficient eliminations of LTR-retrotransposons and lack of species-specific amplification bursts of LTR-RTs. A comparison with four non-SR-forming species showed that the evolution of the beta-amylase gene family may be related to SR formation. We further investigated the relationship of the key gene BMY11 (with identity 47.12% to beta-amylase 1) with this important agronomic trait by both gene expression profiling and quantitative trait locus (QTL) mapping. And combining SR morphology and structure, gene expression profiling and qPCR results, we deduced that the products of the activity of BMY11 in splitting starch granules and be recycled to synthesize larger granules, contributing to starch accumulation and SR swelling. Moreover, we found the expression pattern of BMY11, sporamin proteins and the key genes involved in carbohydrate metabolism and stele lignification were similar to that of sweetpotato during the SR development.
We constructed the high-quality genome reference of the highly heterozygous I. trifida through a combined approach and this genome enables a better resolution of the genomics feature and genome evolutions of this species. Sweetpotato SR development genes can be identified in I. trifida and these genes perform similar functions and patterns, showed that the diploid I. trifida var. Y22 with typical SR could be considered an ideal model for the studies of sweetpotato SR development.
KeywordsIpomoea trifida genome Sweetpotato Evolution Storage root development QTL BMY11 (beta-amylase)
Sweetpotato (Ipomoea batatas (L.) Lam.), which is mainly cultivated for its underground storage root (SR), was found in the Americas approximately 8000–10,000 years ago and domesticated at least 4000 years ago . This plant was then spread around the world, which could be traced back to the beginning of pre-Columbian times . Because it is easy to grow, with high yield and an abundance of starch and nutrients, it has become an important part of diets around the world. In recent years, the global production of sweetpotato was more than 100 million tons (http://www.fao.org/faostat/en/#home); it has become one of the three major root and tuber crops and the seventh most important food crop in the world [3, 4, 5]. Understanding the mechanism of SR formation and development is of pivotal importance for further improving sweetpotato yield . To date, sporamin proteins, Dof-type zinc finger proteins, MADS-box proteins and KNOX proteins have been shown to be associated with SR development [7, 8, 9]. The lack of genomic information has slowed research into SR development . The candidate genes corresponding to many sweetpotato quantitative trait loci (QTL) remain elusive . The newly released haplotype sweetpotato genome  provides an additional resource to help reach this goal, but the chromosome-scale assembly was performed according to the Ipomoea nil genome, which might not be suitable for genomics research, such as QTL investigations. Sweetpotato has a large number of small chromosomes (2n = 6x = 90, B1B1B2B2B2B2), with highly repetitive elements and high heterozygosity, and it contains a homogenous B2subgenome [11, 12], which causes difficulty in generating high-quality sequences at the chromosome level .
Among approximately 700 species of the genus Ipomoea , I. trifida is the closest wild relative of sweetpotato [12, 15, 16]. Artificial crossing and cytogenetic research suggest that hexaploid I. trifida arose from diploid I. leucantha and tetraploid I. littoralis, while tetraploid I. littoralis was an autopolyploid from the I. leucantha B genome [17, 18]. I. leucantha and I. littoralis should be considered the diploid and tetraploid forms of I. trifida based on interspecific hybridization and cytogenetics, respectively [18, 19]. The artificial hexaploid I. trifida produced from diploid and tetraploid I. trifida has the same chromosome types as sweetpotato, and thus, sweetpotato may derive from hexaploid I. trifida [18, 19, 20]. Moreover, data from noncoding chloroplast and nuclear ITS sequences and nuclear SSRs supported an autopolyploid origin of sweetpotato from a progenitor that shared the diploid I. trifida genome . Triploid I. trifida may have provided a bridge from diploid and tetraploid to hexaploid I. trifida [16, 19, 20]. Our recent result of restriction-site-associated DNA sequencing (RAD-seq) indicated that sweetpotato originated from hexaploid I. trifida, and that hexaploid I. trifida evolved from tetraploid I. trifida and diploid I. trifida . However, Wx intron variations support an allohexaploid origin of sweetpotato from I. tenuissima and tetraploid I. littoralis Blume or tetraploid I. tabascana, and the I. tenuissima derived earlier from diploid I. trifida and an unidentified species . Besides, the results of the newly released haplotype-resolved sweetpotato genome also suggested that sweetpotato may the result of an initial cross between a tetraploid progenitor and a diploid I. trifida progenitor . Either it is the most likely progenitor of sweetpotato or a part of its genome has introgressed into that of sweetpotato . In any case, diploid I. trifida is at a critical point in the complex evolutionary history of sweetpotato, and the origin of sweetpotato remains disputed. Therefore, many studies have focused on diploid I. trifida; a genome survey has been reported , and another genome assembly can be viewed online (http://sweetpotato.plantbiology.msu.edu/new.shtml). Previous studies have mainly focused on the fibrous root (FR), pencil root (PR) or thick root (TR) of I. trifida [20, 24, 25, 26, 27, 28]. Few studies have reported on the SR, possibly because of a lack of diploid material with SR, and use of the genome to study SR development has not yet been reported.
Here, we report a high-quality, chromosome-anchored reference genome of the diploid I. trifida var. Y22 , which has typical SR and is similar to sweetpotato (Additional file 1: Figure S1). We de novo assembled the highly heterozygous genome (2.20%) with a combined strategy, and 30,227 gene models were predicted. We found that a whole-genome triplication (WGT) occurred before its speciation, approximately 74.5 million years ago (MYA). Through an integrated analysis of gene family evolution, root comparative transcriptomes, QTL mapping, qPCR, we found that the new gene BMY11 (with identity 47.12% to beta-amylase 1) may play a key role in the process of SR development. The key genes between Y22 and sweetpotato have a similar expression pattern, and sweetpotato SR development genes can be identified by QTL mapping of the diploid I. trifida genetic population. This work will be very helpful to further understand the complex evolutionary history and SR development mechanisms of I. trifida and sweetpotato.
Genome assembly, validation and annotation
We constructed a 180 bp paired-end (PE) library and sequenced the Y22 genome using Illumina PE125, which yielded 62.27 Gb of data for genome survey analysis. K-mer frequency  distribution analysis showed that the genome of Y22 was 476.6 Mb, consistent with the estimate based on flow cytometry , and the heterozygosity and the proportion of repeat sequences were 2.20 and 48.42%, respectively (Additional file 2: Table S1, Additional file 1: Figures S2 and S3). To achieve a high-quality and chromosome-scale reference genome, 125.6 Gb of clean Illumina shotgun reads (approximately 264-fold coverage of the genome) and 537 Mb of Illumina Moleculo synthetic long reads (approximately 1-fold coverage of the genome) (Additional file 2: Table S2) were initially assembled into 431.57 Mb of sequence with a contig N50 of 26.50 Kb and a scaffold N50 of 580.68 Kb (Additional file 2: Table S3). Then, we incorporated 10.05 Gb (21-fold coverage of the assembly) of PacBio RS II data with an N50 read length of 17.03 Kb to increase sequence continuity. The contig and scaffold N50 of the final assembly were 54.49 Kb and 607.92 Kb, respectively (Additional file 2: Table S4). Additionally, 96.71% of the I. trifida genome was covered by the assembled 460.93 Mb scaffolds. A genotype-by-sequencing (GBS) [31, 32] genetic map was constructed using 200 true F1 individuals (Y25 × Y22) (Additional file 1: Figures S1 and S4; Additional file 2: Tables S5 and S6), and 400.44 Mb of sequence (86.88% of the final scaffolds) was successfully anchored to 15 chromosomes (Additional file 1: Figures S5 and S6), which could be considered as a better reference for I. trifida than the recent released genome (373.4 Mb scaffolds were anchored to 15 linkage groups) .
The quality and completeness of the assembly were evaluated by various datasets. First, we mapped the short insert size library reads to the assembled genome; the mapping rate was 95.2, and 95.76% of the assembled genome had more than 20-fold coverage (Additional file 2: Table S7). The GC content and sequence depth distribution calculated for the 10 k non-overlapping sliding window of the assembled genome showed that the assembled genome was not contaminated (Additional file 1: Figures S7 and S8). Second, the full-length transcripts assembled from 7.2 Gb of leaf RNA-seq reads by Trinity  were aligned back to the assembled genome. The results showed that more than 98.80% of the full-length transcript sequences could be mapped back onto one scaffold with sequence coverage more than 50% (Additional file 2: Table S8), which was comparable to the results of the highly heterozygous assembly . Third, 98.39% of 248 core eukaryotic genes (Cluster of Essential Genes (CEG) database)  could be aligned back to the genome assembly with high confidence (Additional file 2: Table S9). And the 93.9% of the BUSCO  conserved genes were complete in the assembly. These results indicated that our assembled genome achieved complete coverage of the conserved genes.
Comparative genomics analysis
Whole-genome duplication (WGD) events in I. trifida were investigated using paralogous regions detected by protein sequence similarity to reveal their importance in genome evolution . We identified 1856 intra diploid I. trifida syntenic blocks (with at least 5 genes per block), which contained 17,630 genes, accounting for ~ 58.33% of the total genes. A four-fold transversion (4dTv) analysis indicated that the WGD occurred at the peak around 0.31, approximately 74.5 MYA (Fig. 2a, b) which is similar to the reported result . Notably, these WGD/WGT events of this historical period may have enabled the survival of the most recent common ancestor of these species across the Cretaceous-Paleogene boundary (Fig. 2a), as attested in many other angiosperms . A gene block comparison of I. trifida versus itself also showed numerous duplicated gene pairs (Fig. 1), which is clear structural evidence of a WGT event. We found that chromosomes 4, 9, 13 and 15 of I. trifida each obviously have two additional paralogous segments from other chromosomes (Fig. 2c). This result indicated that a WGT event occurred in the Ipomoea genome instead of the reported WGD . To confirm this WGT event, we further compared the genome of I. trifida with that of Coffea canephora, which has no lineage-specific WGD/WGT event except the older gamma genome triplication in the common ancestor of most eudicots . As expected, the structural analysis showed that one C. canephora region tends to correspond to three segments of I. trifida. More specifically, chromosome 1 of C. canephora has a syntenic relationship with chromosomes 7, 9 and 11 of I. trifida, and chromosome 4 of C. canephora has a syntenic relationship with chromosomes 5, 12 and 13 of I. trifida (Fig. 2c). Furthermore, the mutual coverage of C. canephora and I. trifida achieved the maximal value under a corresponding relationship of 1:3. Similarly, comparing I. trifida with Vitis vinifera, the genome coverage reached a maximal value of 93.2% for I. trifida and of 95.8% for V. vinifera at the ratio of 3:1, which also supports the results of the above WGT analysis (Additional file 2: Table S18).
The expanded and contracted gene families of I. trifida were identified using CAFE  among I. trifida and four non-SR-forming species, including A. thaliana, S. lycopersicum, C. canephora and I. nil. There were 910 expanded gene families enriched in 22 pathways (p value< 0.05). Notably, some of these families have been reported to have functions in carbohydrate metabolism, for example, glycosaminoglycan degradation, carbon metabolism, and carbon fixation in photosynthetic organisms (Additional file 2: Table S19). Among these expanded gene families, we found that several of them may be important for SR development based on the functional annotation, such as beta-amylase genes [3, 44], MADS-box genes [45, 46], ethylene-responsive transcription factor (ERF) genes , beta-galactosidase genes , IAA-related genes [49, 50] and gibberellic acid (GA)-related genes such as GA20ox and gibberellin-regulated protein  (Fig. 2d). Besides, there were 100 contracted gene families enriched in 15 KEGG pathways, such as plant-pathogen interaction, ABC transporters, and ascorbate and aldarate metabolism (Additional file 2: Table S20).
Identification of key genes in SR development
To further investigate the key genes responding to SR starch accumulation, we used the starch contents of the dried roots of 200 true F1 individuals and two parents for QTL analysis with a GBS genetic map (Additional file 1: Figures S17 and S18). We obtained five QTLs, which were all located in chromosome 3, including 39 genes (Additional file 2: Table S25). Four of them were upregulated, and one was BMY11 (Itr.Sc0000035.86, with identity 47.12% to beta-amylase 1, named BMY11) (Fig. 3b). Notably, among the four upregulated genes, beta-amylase gene was identified above among both the expanded gene families in the genome and the differentially upregulated genes. We examined all the members of this gene family in I. trifida and four other plants, i.e. I. nil, S. lycopersicum, C. canephora, and A. thaliana, and compared them. We found that the beta-amylase gene number in I. trifida was higher than that in non-SR-forming plants (Additional file 1: Figure S19). In the beta-amylase phylogenetic tree, I. trifida possessed the same or more number of beta-amylase gene members than I. nil expect two clade, which were the clade of BMY11, containing two I. nil genes (INIL06g37601 and INIL06g37600) due to its specific tandem duplication (the distance of these two genes are 3350 bases), and the clade containing three I. nil genes (INIL15g27806, INIL15g27808 and INIL10g41052) (Fig. 3c).
BMY11 and SR development
We found one homologous transcript (identity = 98.85%) was also identified using microarray hybridization during the process of SR development in sweetpotato var. Guangshu87 (Additional file 2: Table S26) . one BMY11 (gene ID: TU52177, identity = 99.39%) in the haplotype-resolved sweetpotato assembly . Through full-length transcript identification sweetpotato var. Xushu18, we found that BMY11 could exist in more than one copy in the cultivated sweetpotato genome (Additional file 2: Table S27). Similar to Y22, the expressions of BMY11 in PR and FR were higher than that of in SR, which also indicated that it keep an appropriate activity is beneficial to SR swelling. Based on qPCR results, the expression of BMY11 in the transverse SR sections of Xushu18 was much lower than that in Y22 SR (Additional file 1: Figure S20). These expression differences indicated that the lower expression of BMY11 may be beneficial for greater starch accumulation and contribute to rapid SR swelling and larger tuberous root formation in cultivated sweetpotato. A similar expression pattern was found in potato, i.e., beta-amylase activity in doubled monoploid (derived from a primitive South American cultivar) tubers was 5 to 10-fold higher than that in a diploid breeding line (more closely resembling commercially cultivated tetraploid potato) tubers . Similar phenomena may have occurred due to natural evolution in the beta-amylase gene family and the relative lower expression of this gene might be essential for the tuberous root crops.
In this research, we used a combined strategy to overcome the problem of high heterozygosity and provide a chromosome-scale reference genome sequence of much higher quality than the previous survey results  and the recent released genome , which could be considered as a better reference for diploid I. trifida. The assembled genome enables us to characterize genomic features in this species and compare the genome with other published plant genomes, such as cultivated sweetpotato and its closely related species, I. nil [11, 37]. For example, I. trifida harbours fewer repeat sequences than I. nil and shows more efficient elimination of LTR retrotransposons, resulting in a smaller genome. Using new evidence, we clarify that both I. trifida and I. nil underwent a WGT instead of the reported WGD  long before their divergence. Furthermore, naturally occurring horizontal gene transfer (HGT) in plants has been reported infrequently, but diploid I. trifida (including Y22, Contig_2131, identity = 92.66%) commonly contains IbT-DNA2 . This result indicates that diploid I. trifida, as a naturally transgenic plant, could have existed at least nearly 1.3 MYA . Sweetpotato also contains IbT-DNA2 [62, 63]; therefore, this sequence may have been inherited from diploid I. trifida. All these findings enable a better understanding of the genome evolution of I. trifida, and the high-quality genome of Y22 should be considered a valuable resource for investigation of the genome evolution of sweetpotato and of the genus Ipomoea in general.
Within the genus Ipomoea, sweetpotato is one of the most important crops, with a global annual production of more than 100 million tons. The released haplotype-resolved genome of sweetpotato was generated by anchoring 75.7% of the scaffolds to 15 pseudochromosomes based on gene and sequence synteny between sweetpotato and I. nil . This compressed monoploid (~ 836 Mb) was also considered as valuable resources for investigating the complexity of chromosome sequence composition in sweetpotato  and this reference sequence for sweetpotato can be improved with long-read sequencing technology, Hi-C sequencing combining with novel assembly algorithm [11, 39, 64]. The hexaploid genome has proven to be very difficult to assemble and impedes its genetic research in genomic level; to find a proper genetic research model could be a short way. As the closest wild relative and putative progenitor, diploid I. trifida has been considered a model species for sweetpotato, including genetic, cytological, and physiological analyses . With the high-quality genome of I. trifida, we further demonstrated this concept and identified the key genes associated with an important agronomic trait (SR) with extensive evidence: (1) Y22 can be transplanted like sweetpotato and develop SR (Fig. 3a, Additional file 1: Figure S1), and the SRs of both species contained the highly specific protein sporamin expressed  (Additional file 1: Figure S15, Additional file 2: Table S23); (2) the comparative transcriptome analysis of four typical roots provided strong evidence to show that upregulation of genes involved in carbohydrate metabolism and downregulation of those involved in stele lignification, an expression pattern similar to that of sweetpotato, played key roles in the development of Y22 SRs [4, 59, 65] (Additional file 1: Figures S14 and S16; Additional file 2: Table S24); (3) the key gene BMY11 was associated with SR swelling by comparative transcriptomics and QTL mapping, and BMY11 has a similar expression pattern in sweetpotato (Additional file 1: Figure S20) . All these findings in our work could accelerate starch biosynthesis study in the genus Ipomoea, and the diploid I. trifida var. Y22 could be considered an ideal model for future studies of sweetpotato SR development.
We generated the chromosome-scale genome sequence of SR-forming diploid I. trifida var. Y22 with high heterozygosity (2.20%) by integrating whole-genome shotgun reads, single-molecule long reads (PacBio RS II) and GBS genetic maps; and this genome provides a better resolution to its genomics feature. Comparative genomics analysis showed that a whole-genome triplication event occurred in diploid Ipomoea genome, and enables a better understanding the genome evolutions of this species. We found that the key gene BMY11 (with identity 47.12% to beta-amylase 1) may contribute to starch accumulation and SR development by combining with the analysis of gene family expansion, QTL mapping, differentially gene expression profiling, morphology and structure of SRs and qPCR of BMY11. Sweetpotato SR development genes can be identified in I. trifida and these genes perform similar functions and patterns, showed that the diploid I. trifida var. Y22 could be considered an ideal model for the studies of sweetpotato SR development.
Plant materials and sequencing
Y22 (Additional file 1: Figure S1a) is a clone of diploid I. trifida seeds (CIP No.: PC98_1 (698014), female parent 2X P96124.5, male parent PC) with good SR-forming characteristics. Y25 (Additional file 1: Figure S1b) is a clone of diploid I. trifida seeds (CIP No.: 696153, female parent 2X 6.1, male parent OP) that does not form SR (Additional file 1: Figure S1). Xushu18 is a well-known cultivated sweetpotato variety in China. The plant materials were planted in the experimental greenhouse of the Institute of Biotechnology and Nuclear Technology, Sichuan Academy of Agricultural Sciences (Sichuan AAS), Chengdu, Sichuan province, China, and insect netting and yellow sticky paper were used for strict pest control. The F1 genetic population of diploid I. trifida was constructed with Y25 (female parent) and Y22 (male parent).
Y22 was used for the whole-genome sequencing and assembly. High-quality genomic DNA was isolated from the leaves using the Qiagen DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA). First, PE libraries, with insert sizes of 230 bp, 350 bp and 500 bp, were constructed using the NEBNext Ultra DNA Library Prep Kit following the manufacturer’s instructions. Then, mate-pair libraries, with insert sizes of 2 Kb, 5 Kb, 10 Kb and 20 Kb, were generated through circularization by Cre-Lox recombination . Synthetic long reads were generated by Moleculo chemistry. All the above libraries were subjected to the Illumina HiSeq 2500 platform to produce PE 2 × 125 bp reads. Second, the RS II platform with PacBio P6-C4 chemistry was used to generate single-molecule long reads.
For starch content testing and transcriptome sequencing, we took cuttings from Y22 and transplanted them in sandy soil at the Southern Experiment Station (SES) of Sichuan AAS, Sanya, Hainan province, China. The four typical stages of SR development were AR, ISR (diameter = ~ 2 mm), YSR (5–8 mm) and MSR (>20 mm). To examine these stages, the roots were sampled at approximately 25 (S0), 50 (S1), 85 (S2) and 120 (S3) days after transplantation (DAT) (Fig. 3a). Each sample consisted of three repeats. Starch content was tested according to Chinese Testing Standard NY/T11–1985. Total RNA was extracted using TRIzol reagent (Invitrogen) and treated with RNase-free DNase I (Promega, USA). The corresponding RNA-seq libraries were generated using the NEBNext Ultra RNA Library Prep Kit (NEB, USA). Then, all the libraries were sequenced by the Illumina HiSeq Xten platform.
For tag-based sequencing, the F1 population (Y25 × Y22) was planted at the SES of Sichuan AAS, Sanya, China. Genomic DNA was extracted from fresh leaves of the individual F1 plants and parents using a DNA extraction kit (Tiangen Biotech Co. Ltd., Beijing) and dissolved in 1× TE buffer (10 mM Tris-HCl and 1 mM EDTA, pH 8.0). The individuals were randomly selected from the F1 population and confirmed by simple sequence repeat (SSR) markers; the SSRs were designed by a Perl script according to the assembled Y22 genome (Additional file 1: Figure S4 and Additional file 2: Table S5). A total of 202 GBS libraries, including two parent libraries and 200 F1 individual libraries, were prepared according the reference method . Finally, PE sequencing was performed on the selected tags using the Illumina HiSeq Xten platform.
For Xushu18 transcriptome sequencing, we sampled the flowers, stems, leaves and SRs, extracted the total RNA using TRIzol reagent (Invitrogen) and treated it with RNase-free DNase I (Promega, USA). The RNA concentration was determined by Nanodrop (Thermo Scientific), and the different RNAs were mixed in equal proportions to construct the RNA-seq library. The corresponding RNA-seq libraries were generated using the NEBNext Ultra RNA Library Prep Kit (NEB, USA). Then, the libraries were sequenced on the Illumina HiSeq 2500 and PacBio RS II platforms.
Genome assembly and construction of pseudochromosomes
To obtain high-quality reads, we first filtered all the following reads: (a) reads that contained ‘N’ as more than 10% of the nucleotides; (b) reads that contained adapter sequences; (c) duplicated reads generated by PCR amplification. After quality control, all of the PE and mate-pair reads were assembled with the software Platanus  with the default parameters. Second, Illumina Moleculo synthetic long reads were used to perform gap filling with the program PBJelly  with the default parameters. Third, PacBio single-molecule long reads were used to further extend the sequence continuity. Fourth, HaploMerger  was employed to reduce the sequence redundancy caused by heterozygosity.
The assembled Y22 scaffolds were used as a reference genome to identify SNPs (single nucleotide polymorphism) in Y22, Y25 and their progeny. Variant calling was performed for all samples using the GATK  software. The parental polymorphic markers were classified into eight segregation patterns (ab × cd, ef × eg, hk × hk, lm × ll, nn × np, aa × bb, ab × cc and cc × ab). For the F1 population, segregation patterns were chosen for the genetic map . Prior to map construction, the markers with segregation distortion (p < 0.05), integrity (> 75%), or abnormal bases were filtered. The remaining markers were converted to bin markers using an in-house script. Linkage groups were constructed according to physical position with the JoinMap 4.0 software  and determined using a minimum LOD value of 5.0 and a maximum recombination of 45%. Because a large number of segregating SNP loci were involved in the present linkage analysis, the Kosambi mapping function was used to translate recombination frequencies into map distances. The final marker order of each linkage group was verified by the software program RECORD . The parents, Y25 and Y22, were sequenced with average depths of 29.11× and 24.68×, respectively, which were higher than those of the progeny (average 11.77×). In total, 489,692 and 178,112 SNP loci were detected in Y25 and Y22, respectively. The final map consisted of 6306 bin markers containing 15,526 SNPs spanning 3156.55 cM in 15 linkage groups (Additional file 1: Figure S5 and Additional file 2: Table S6). The average genetic distance between SNP markers was 0.50 cM. Then, the scaffolds were anchored to pseudochromosomes according to the locations of the markers using the constructed linkage map (Additional file 1: Figure S6).
To assess the accuracy of the assembled genome sequences, we selected the small-fragment library reads and used BWA software  to map the reads to the assembled genome. Subsequently, we calculated the mapping rate, the coverage degree and the genome depth. To assess gene structure integrity in the assembly, the transcripts were assembled by Trinity  with the parameters -ss 0.5 -jc 0, −minkmercov 2 -minglue 2. The EST sequences were aligned to the assembled genome using BLAT  with the default parameters. CEGMA (Core Eukaryotic Genes Mapping Approach)  was used to define 248 conserved genes that were also used to evaluate the completeness of gene sequences in the final assembly.
Repetitive sequence annotation methods were classified into homologous sequence alignment and ab initio prediction. Homologous sequence alignment methods were based on the repeated sequence database Repbase . We used RepeatMasker and RepeatProteinMask  to identify sequences that were similar to known repeat sequences. We also used LTR_FINDER , Piler , RepeatScout , and RepeatModeler (http://www.repeatmasker.org/RepeatModeler/) to build the de novo repeat database. Then, we used RepeatMasker  to identify repeats according to the established repeat database.
Gene prediction was based on an integration of de novo prediction, homology-based prediction and RNA-seq prediction. Gene structure de novo prediction was carried out using Augustus , GlimmerHMM , GeneScan , Geneid  and SNAP  software. Homology-based prediction included protein-based homology searches from closely related or model species. In total, we used 11 homologous species, including Arabidopsis thaliana, Solanum tuberosum, Solanum lycopersicum, Capsicum annuum, Sesamum indicum, Beta vulgaris, Vitis vinifera, Manihot esculenta, Nelumbo nucifera., Raphanus sativus and Solanum pennellii. Other evidence supporting this prediction includes the homologous EST or cDNA data that we used to align the predicted gene structure by BLAT .
The RNA-seq prediction used experimental RNA-seq data to predict genes. Based on the above prediction results and combined with the transcriptome comparison data, the gene set predicted by each method was integrated into a non-redundant system using EVidenceModeler (EVM) . We used PASA  and the transcriptome assembly results to correct the results of the EVM annotation and to add information such as UTRs and splice variants to obtain the final gene set. Putative gene functions were assigned according to the best match of the alignments using BLASTP (E-value≤1e-5) to four databases: InterPro, KEGG, Swiss-Prot and TrEMBL. We identified candidate ncRNAs in the assembled I. trifida genome by comparing them with known ncRNA libraries or by structural prediction.
The protein sequences of seven plant species, including A. thaliana, M. esculenta, I. nil, C. canephora, O. sativa, S. lycopersicum and S. tuberosum, were downloaded. Then, the gene set of each species was filtered as follows: (a) when multiple transcripts were present in one gene, only the longest transcript in the coding region was used for further analysis; (b) genes encoding proteins of less than 30 amino acids were filtered out. Then, we evaluated the similarity between the protein sequences of all the species through BLASTP with the E-value 1e-5. The protein datasets of all seven species and the Y22 protein dataset were clustered into paralogous and orthologous datasets using the program OrthoMCL  with the inflation parameter 1.5.
After gene family clustering, we aligned all 1930 single-copy gene protein sequences by MUSCLE  and combined all the alignment results to create an alignment supermatrix. Then, an eight-species phylogenetic tree was constructed using RAxML  with the maximum likelihood method, and the number of bootstrap samples was set to 100. O. sativa was designated as an outgroup of the phylogenetic tree. Using the single-copy gene families, the divergence time estimates were made using MCMCtree in the PAML [90, 91] package. The MCMCtree running parameters were as follows: burn-in: 5,000,000, sample-number: 1,000,000, sample-frequency: 50. The time correction points were O. sativa and A. thaliana, 140–200 MYA; A. thaliana and S. tuberosum, 101–156 MYA; C. canephora and S. lycopersicum, 83–89 MYA; S. tuberosum and S. lycopersicum, 7.2–7.4 MYA. All time correction points were derived from the TimeTree website (http://www.timetree.org/).
Genome synteny and whole-genome duplication (WGD)
BLASTP was implemented with an E-value of 1e-5 between different species, including I. trifida to I. trifida, S. tuberosum to S. tuberosum, I. nil to I. nil, C. canephora to C. canephora, I. trifida to S. tuberosum, I. trifida to I. nil and I. nil to S. tuberosum. MCscan  was used to search for collinear segments within each comparison pair. Then, the four-fold transversion (4dTv) ratio for each gene pair in the block of each comparison group was calculated from concatenated nucleotide alignments with MUSCLE . WGD was estimated using the 4dTv ratio distribution. The synteny blocks between chromosomes were visualized by Circos.
Gene expression profiling of roots
After sequencing, raw RNA reads were filtered and trimmed to yield clean reads. All RNA reads were mapped to the Y22 genome by TopHat2  with the following parameters: --max-intron-length 500,000, −-read-gap-length 10, −-read-edit-dist 15, −-max-insertion-length 5 and --max-deletion-length 5. The read count of each sample was calculated by HTSeq . Then, RPKM was calculated by its definition (reads per kilobase per million mapped reads). Differentially expressed genes of different comparison groups were defined using DESeq  with ajusted Pvalue(Padj) < 0.05.
Definition and identification of starch metabolism genes in Y22
We defined and identified starch metabolism genes using a combined method. Starch metabolism genes were defined as upstream or downstream genes involved in starch synthesis or in the starch synthesis pathway. The key enzyme genes in this study were identified by the following method: first, the protein sequences of these genes in A. thaliana were downloaded from NCBI (https://www.ncbi.nlm.nih.gov/). The candidate genes were identified by BLASTP with an E-value of 1e-5. Second, the candidate genes were filtered by identity, and overlaps were removed. Protein domains of homologous species and Y22 candidate genes were predicted by PFAM (http://pfam.xfam.org/). Only the candidate genes with the same protein domains as those in homologous species were kept. Transcription factors were identified by iTAK software .
Phylogenetic reconstruction of beta-amylase (BMY) gene family
For the identification of BMY gene family members in I. trifida, I.nil genome and other genomes, the protein sets in these species were aligned against to known beta-amylase proteins from A. thaliana by BLASTp. For each protein, only the alignment with an E-value of 1e-5 and identity > 50% was retained. These genes with retained alignments to responding A. thaliana proteins were further required to possess the PF01373 PFAM domains and then were considered as initial candidate BMY genes in these genomes. In order to eliminate the inaccuracy of this artificial selection condition (identity > 50% might be too strict for I. trifida and I.nil), we redo the BMY gene identification for these species by replacing the known A. thaliana beta-amylase proteins with their respective protein sequences of initially detected BMY genes and then follow the same process. Then, all of retained BMY genes were aligned to each other using MUSCLE and the phylogenetic tree for BMY genes was constructed using RAxML with the maximum likelihood method.
The whole roots of 200 F1 individuals were dug up and washed at 130 DAT (Additional file 1: Figure S17). The whole roots were cut into 3 mm slices or 3 cm pieces and dried at 80 °C to a stable weight. The whole roots were comminuted until all fragments would pass through a 0.17 mm mesh sieve. Then, all the samples were sent to the Analysis and Determination Centre of Sichuan AAS for starch testing according to Chinese Testing Standard NY/T11–1985. The starch contents of the dry roots ranged from 13.91 to 47.32% (Additional file 1: Figure S18). The root starch content data were constructed into a data matrix, which was used for QTL analysis by the Windows QTL Cartographer V2.5 software . Phenotypic variance and QTL detection were calculated with a multiple QTL mapping model (MQM). The phenotypic threshold of LOD scores for evaluating the statistical significance of QTL effects was determined using 1000 permutations and a threshold of p = 0.05. LOD values 3.0 and above were considered to indicate QTL loci.
The total RNA isolated by the RNAprep Pure Plant Kit (Tiangen Biotech, Beijing) was used to synthesize the first-strand cDNA by oligo (dT)18 and random hexamer primers with the ReverTra Ace qPCR RT Master Mix (Toyobo, Japan). Quantitative real-time reverse transcription PCR (qPCR) was carried out using diluted cDNA and SYBR® Green Real-time PCR Master Mix (Toyobo, Japan) in the Bio-Rad iCycler MyiQ Real-Time PCR System. The qPCR cycle profile included one cycle of 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s, 58 °C for 30 s, and a final melt curve profile (65–95 °C). The changes in gene expression were calculated relative to Actin using the 2−ΔΔCt method (Additional file 2: Table S28). Each data point represents the average of three repeats.
Root anatomic structure
The fresh roots were fixed in 4% neutral-buffered formalin for 48 h and then dehydrated and embedded in paraffin wax. Eight-micrometre sections were cut and placed on silane-coated slides to fix the samples. After drying at 60 °C, the sections were dewaxed and rehydrated. The sections were prepared for starch staining and Safranin-Fast Green staining. For starch staining, the sections were stained with I-KI. For Safranin-Fast Green staining, the sections were stained with the Safranin solution for 60–120 min and destained with gradient alcohol. The sections were placed into Fast Green solution for 30–60 s, dehydrated and mounted with resin. All the images were collected using a 3DHISTECH scanner (Pannoramic MIDI) and the software CaseViewer was used to view the image data (https://www.3dhistech.com/caseviewer).
We are very grateful to Prof. Yan Pei and Prof. Suqiong Xiang (Southwest University, China) for share the I. trifida seeds, and very grateful to Prof. Jianmin Wan (Chinese Academy of Agricultural Sciences, China), associate Prof. Hangxing Ren (Chongqing Academy of Animal Sciences, China) and Prof. Haoyang Cai (School of Life Sciences, Sichuan University, China) for helpful comments.
This work was supported by the National Natural Science Foundation of China (31471448, 31101119, 91417305, 31470342), the earmarked fund for China Agriculture Research System (CARS-11-B04, CARS-10-B5), Sichuan Financial Innovation Project, China (2015LWJJ-001, 2016QNJJ-001, 2016ZYPZ-005, 2015KXJJ-002, 2015JSCX-007), Sichuan Breeding Cooperation Project, China (2016NYZ0032, 2016NZ0103) and the Major State Basic Research Development Program of China (973 Program) (2015CB150100).
Availability of data and materials
The datasets generated and analyzed during the current study are available in the database of National Center for Biotechnology Information (NCBI), under the BioProject number: PRJNA362521. The plant materials are available from the corresponding author. The datasets generated and analyzed from GBS and phenotype of F1 population during the current study are not publicly available due these data in the process of doing another study but are available from the corresponding author on reasonable request.
ML and HL designed the work. ML managed the project. YL, HY, AL, WS and SQ prepared plants, constructed genetic populations and tested phenotypes. WT, FL, SY, and HZ isolated nucleic acids and identified F1 populations. LZ and JL performed sequencing. JF, ZP, ZW and ZZ analyzed the genetic map and QTLs. WT, WX, ZP and CW assembled and analyzed genome sequence. WX, MP, CD and KW performed gene prediction, functional annotation and comparative genomics analysis. ML, MP, WX and YZ performed comparative transcriptome analysis. CZ, YL, ZP and HZ performed gene family analysis, gene expression analysis. ML wrote the manuscript, HL and JZ revised the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 1.Woolfe JA. Sweetpotato: an untapped food resource. Cambridge: Cambridge University Press; 1992.Google Scholar
- 20.Lu S, Liu Q, Li W. Sweetpotato breeding. Beijing: Chinese Agricultural Press; 1998.Google Scholar
- 41.Vanneste K, Maere S, Van de Peer Y. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Philos Trans R Soc Lond Ser B Biol Sci. 2014;369(1648). https://doi.org/10.1098/rstb.2013.0353.
- 46.Noh SA, Lee HS, Huh EJ, Huh GH, Paek KH, Shin JS, Bae JM. SRD1 is involved in the auxin-mediated initial thickening growth of storage root by enhancing proliferation of metaxylem and cambium cells in sweetpotato (Ipomoea batatas). J Exp Bot. 2010;61(5):1337–49.PubMedPubMedCentralCrossRefGoogle Scholar
- 48.Goujon T, Minic Z, El Amrani A, Lerouxel O, Aletti E, Lapierre C, Joseleau JP, Jouanin L. AtBXL1, a novel higher plant (Arabidopsis thaliana) putative beta-xylosidase gene, is involved in secondary cell wall metabolism and plant development. Plant J. 2003;33(4):677–90.PubMedCrossRefPubMedCentralGoogle Scholar
- 53.McGregor CE, He L, Ali RM, Sosinski B, Jankowicz J, Burg K, LaBonte DR. The effect of replicate number and image analysis method on sweetpotato [Ipomoea batatas (L.) Lam.] cDNA microarray results. Plant Mol. Biol. Rep. 2005;23: 367-381.Google Scholar
- 55.Dyson BC, Allwood JW, Feil R, Xu Y, Miller M, Bowsher CG, Goodacre R, Lunn JE, Johnson GN. Acclimation of metabolism to light in Arabidopsis thaliana: the glucose 6-phosphate/phosphate translocator GPT2 directs metabolic acclimation. Plant Cell Environ. 2015;38(7):1404–17.PubMedPubMedCentralCrossRefGoogle Scholar
- 61.Qin Y, Wang Y, Duan C-Q, Zhang D-P. β-Amylase is predominantly localized to plastids in the developing tuberous root of sweet potato. Acta Bot Sin. 2003;45(5):581–8.Google Scholar
- 62.Kyndt T, Quispe D, Zhai H, Jarret R, Ghislain M, Liu Q, Gheysen G, Kreuze JF. The genome of cultivated sweet potato contains agrobacterium T-DNAs with expressed genes: an example of a naturally transgenic food crop. Proc Natl Acad Sci U S A. 2015;112(18):5844–9.PubMedPubMedCentralCrossRefGoogle Scholar
- 64.Zimin AV, Puiu D, Luo MC, Zhu T, Koren S, Marcais G, Yorke JA, Dvorak J, Salzberg SL. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 2017;27(5):787–92.PubMedPubMedCentralCrossRefGoogle Scholar
- 65.Zhang K, Wu Z, Tang D, Luo K, Lu H, Liu Y, Dong J, Wang X, Lv C, Wang J, et al. Comparative transcriptome analysis reveals critical function of sucrose metabolism related-enzymes in starch accumulation in the storage root of sweet potato. Front Plant Sci. 2017;8:914.PubMedPubMedCentralCrossRefGoogle Scholar
- 66.Hardy S, Kitamura M, Harris-Stansil T, Dai Y, Phipps ML. Construction of adenovirus vectors through Cre-lox recombination. J Virol. 1997;7(3):1842–9.Google Scholar
- 96.Zheng Y, Jiao C, Sun H, Rosli HG, Pombo MA, Zhang P, Banf M, Dai X, Martin GB, Giovannoni JJ, et al. iTAK: a program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol Plant. 2016;9(12):1667–70.PubMedCrossRefGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.