Allele-aware chromosome-level genome assembly of the autohexaploid Diospyros kaki Thunb

Li, Huawei; Sun, Peng; Wang, Yiru; Zhang, Zhongren; Yang, Jun; Suo, Yujing; Han, Weijuan; Diao, Songfeng; Li, Fangdong; Fu, Jianmin

doi:10.1038/s41597-023-02175-2

Data Descriptor
Open access
Published: 11 May 2023

Volume 10, article number 270, (2023)

Scientific Data

Abstract

The genetic improvement of persimmon (Diospyros kaki Thunb.), one of the most important fruit trees, remains challenging owing to the lack of reference genomes. In this study, we generated an allele-aware chromosome-level genome assembly for the autohexaploid persimmon ‘Xiaoguotianshi’ (Chinese-PCNA type) using PacBio CCS and Hi-C technologies. The final assembly was 4.52 Gb in size, with a contig N50 of 5.28 Mb and a scaffold N50 of 44.01 Mb; 4.06 Gb (89.87%) of the assembly was anchored onto 90 chromosome-level pseudomolecules comprising 15 homologous groups with six allelic chromosomes each. A total of 153,288 protein-coding genes were predicted, of which 98.60% were functionally annotated. Repetitive sequences accounted for 64.02% of the genome, and 110,480 rRNA, 12,297 tRNA, 1,483 miRNA, and 3,510 snRNA genes were also identified. This genome assembly fills a knowledge gap for autohexaploid persimmon and should facilitate studies of the regulatory mechanisms underlying the economically important traits of persimmon, as well as breeding programs.

Background & Summary

Persimmon (Diospyros kaki Thunb.), a member of the Ebenaceae, is an important fruit tree species that originated in East Asia and was introduced to Europe and America between the 18th and 20th centuries¹. In 2020, the global cultivated area of persimmon reached 1.01 million ha, with a total yield of 4.24 million tons (www.fao.org). The persimmon industry is hampered by labor-intensive artificial de-astringency treatment, transportation difficulties, short shelf life, and limited processing options².

To strengthen the persimmon industry, superior cultivars suitable for both fresh consumption and processing are urgently needed. Persimmon cultivars are generally classified as either pollination-constant non-astringent (PCNA) or non-PCNA^3,4, based on the natural de-astringency capacity of the fruit at commercial maturity (fruit with mature peel color that has not yet softened). The PCNA group comprises the Japanese-type PCNA (JPCNA) and Chinese-type PCNA (CPCNA), whose fruits are edible without any artificial de-astringency treatment and have high market value⁵. The non-PCNA group includes the pollination-variant non-astringent (PVNA), pollination-constant astringent (PCA), and pollination-variant astringent (PVA) types¹.

In PCNA breeding programs, inbreeding depression limits efficiency because of the high genetic similarity among JPCNA cultivars^6,7. Modern molecular breeding methods, including CRISPR/Cas9 genome editing, are potentially effective for developing superior new cultivars, but their use first requires accurate knowledge of the molecular mechanisms that regulate key traits. The natural de-astringency capacity of JPCNA is controlled by a recessive allele at a single locus, ASTRINGENCY (AST)^8,9,10. In contrast, previous studies suggested that the natural de-astringency capacity of CPCNA is controlled by dominant alleles^11,12, indicating that CPCNA types may be more effective for breeding new superior PCNA cultivars, although the precise natural de-astringency mechanism of CPCNA remains unknown.

Most persimmon cultivars are hexaploid (2n = 6x = 90). Owing to the lack of genomic data, the molecular mechanisms underlying the natural de-astringency of CPCNA and other key traits of hexaploid persimmon, including fruit size, shape, and flavor, remain poorly understood. The published genomes of the close relatives Diospyros lotus (2n = 2x = 30) and Diospyros oleifera (2n = 2x = 30) have supported research on persimmon biology^13,14,15,16. However, hexaploid and diploid persimmons are different species with distinct genomes. When a diploid genome is used as the reference, information on the regulatory mechanisms of some traits in hexaploid persimmon is limited, which reduces breeding efficiency. A hexaploid persimmon genome assembly is therefore urgently needed for both basic and applied research.

The assembly of polyploid genomes is a major technical challenge owing to repeat content, transposable elements, high heterozygosity, and gene content¹⁷. Because homologous chromosomes in autopolyploids are genetically closer than the subgenomes of allopolyploids, autopolyploid assemblies are more susceptible to the misassignment of fragments among homologs. With advances in sequencing and assembly technology, the autopolyploid genomes of several plants have been reported, such as Ipomoea batatas¹⁸, Saccharum spontaneum^19,20, Medicago sativa²¹, and Solanum tuberosum²², providing references for the present assembly.

In this study, we used PacBio circular consensus sequencing (CCS) and high-throughput chromosome conformation capture (Hi-C) technologies to generate an allele-aware chromosome-level genome assembly for D. kaki. This genomic resource provides a molecular platform for future research and precision breeding programs.

Methods

Sampling and sequencing

‘Xiaoguotianshi’ is one of the five varieties of the CPCNA persimmon ‘Luotiantianshi’; it has good taste and a higher soluble solids content than other CPCNA persimmons. Young leaves of D. kaki ‘Xiaoguotianshi’ and D. lotus (wild germplasm) were collected from the Persimmon Germplasm Resources Nursery of the Research Institute of Non-timber Forestry, Chinese Academy of Forestry (Yuanyang County, Henan Province, China; 34°55′18″–34°56′27″N, 113°46′14″–113°47′35″E).

Genomic DNA was extracted from young leaves of D. kaki using a DNAsecure Plant Kit (TIANGEN, Beijing, China). Sequencing libraries with 350-bp inserts were constructed following the manufacturer’s instructions (Illumina, San Diego, CA, USA) and sequenced on the Illumina HiSeq X platform.

For PacBio sequencing, 15-kb SMRTbell libraries were constructed using the SMRTbell Express Template Prep Kit 2.0 following the manufacturer’s instructions (PacBio, CA, USA). The libraries were sequenced on a PacBio Sequel II system, and HiFi reads were generated using the CCS tool (https://github.com/PacificBiosciences/ccs; v6.0.0) with the parameters --min-passes 3 and --min-rq 0.99.
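
As an illustration of the two CCS thresholds quoted above, the Python sketch below converts a predicted read accuracy (rq) into a Phred-scaled quality and applies both cut-offs to a few reads. It is a minimal sketch, assuming the thresholds act as a simple per-read conjunction; the ccs tool computes rq internally, and the read records and the helper names rq_to_phred and passes_hifi_filter are hypothetical.

```python
import math

def rq_to_phred(rq: float) -> float:
    """Convert a predicted read accuracy (0 <= rq < 1) into a Phred-scaled quality."""
    if not 0.0 <= rq < 1.0:
        raise ValueError("rq must be in [0, 1)")
    return -10.0 * math.log10(1.0 - rq)

def passes_hifi_filter(num_passes: int, rq: float,
                       min_passes: int = 3, min_rq: float = 0.99) -> bool:
    """Keep a read only if it meets both thresholds quoted in the text."""
    return num_passes >= min_passes and rq >= min_rq

# Hypothetical (read_id, passes, rq) records, for illustration only.
reads = [("r1", 5, 0.999), ("r2", 2, 0.998), ("r3", 8, 0.985)]
kept = [read_id for read_id, n_passes, rq in reads if passes_hifi_filter(n_passes, rq)]
print(kept)                         # ['r1']
print(round(rq_to_phred(0.99), 1))  # 20.0
```

An rq of 0.99 thus corresponds to Q20, an expected error rate of about one base in 100.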

For the Hi-C library, chromatin was fixed with formaldehyde. Leaf cells were lysed, and the fixed chromatin was digested with the restriction endonuclease HindIII. The 5′ overhangs were filled in with biotin-labeled nucleotides, and the resulting blunt ends were ligated to each other using DNA ligase. Proteins were digested with protease to release the DNA from the crosslinks. The purified DNA was sheared into 350-bp fragments and ligated to adaptors²³. Biotin-labeled fragments were captured with streptavidin beads and, after PCR enrichment, the libraries were sequenced on an Illumina HiSeq X instrument.
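
HindIII recognizes AAGCTT and cuts between the two adenines, leaving a 5′ AGCT overhang. Filling in this overhang and ligating the blunt ends creates a characteristic junction that marks genuine Hi-C ligation products. The sketch below is a minimal illustration, assuming a palindromic enzyme that leaves a 5′ overhang; it locates HindIII sites in a sequence and derives that junction. The function names and the example contig are hypothetical.

```python
HINDIII_SITE = "AAGCTT"  # recognition sequence; HindIII cuts A^AGCTT
HINDIII_CUT = 1          # top-strand cut position within the site

def find_sites(seq: str, site: str = HINDIII_SITE) -> list[int]:
    """Return 0-based start positions of every occurrence of the site in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def fill_in_junction(site: str = HINDIII_SITE, cut: int = HINDIII_CUT) -> str:
    """Junction left by fill-in of a 5' overhang and blunt ligation (palindromic site).

    After fill-in, the upstream fragment ends with site[:len(site) - cut] and the
    downstream fragment starts with site[cut:]; ligation joins the two.
    """
    return site[: len(site) - cut] + site[cut:]

contig = "TTAAGCTTGGCCAAGCTTAA"  # hypothetical sequence
print(find_sites(contig))      # [2, 12]
print(fill_in_junction())      # AAGCTAGCTT
```

Read pairs containing AAGCTAGCTT therefore carry direct evidence of a fill-in ligation event at a HindIII site.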

For RNA sequencing, total RNA was extracted from leaf, stem, and fruit tissues using an RNAprep Pure Plant Kit (TIANGEN, Beijing, China), and genomic DNA contamination was removed using RNase-Free DNase I (TIANGEN, Beijing, China). RNA integrity was evaluated on 1.0% agarose gels stained with ethidium bromide (EB), and RNA quality and quantity were assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies, CA, USA). High-integrity RNA was then used to construct cDNA libraries for Illumina and PacBio sequencing. The cDNA libraries were constructed using the NEBNext Ultra RNA Library Prep Kit (NEB, MA, USA) for Illumina and the SMRTbell Express Template Prep Kit 2.0 (PacBio, CA, USA) for PacBio, following the manufacturers’ instructions. The libraries were sequenced on the Illumina HiSeq X and PacBio Sequel platforms.

Genome size estimation

K-mer frequency analysis was used to estimate genome characteristics²⁴. The genome size of D. kaki was calculated from k-mer (k = 27) statistics using the modified Lander–Waterman algorithm: the total number of k-mers was divided by the overall sequencing depth, which corresponds to the peak of the k-mer frequency curve. The genome size was estimated as G = (N × (L − K + 1) − B)/D, where N is the total number of sequence reads, L is the average read length, K is the k-mer length (27 bp)²⁵, B is the total number of low-frequency k-mers (frequency ≤ 1 in this analysis), D is the overall depth estimated from the k-mer distribution, and G is the genome size. Heterozygosity was reflected in the distribution of the number of distinct 27-mers. Based on a total of 222,144,314,592 27-mers and a peak 27-mer depth of 49, the estimated genome size was 4533.56 Mb (Fig. 1).
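
The genome size formula above reduces to a single division once the k-mer count is known. The sketch below reproduces the 4533.56 Mb estimate, assuming that the reported 222,144,314,592 27-mers already equal the numerator N × (L − K + 1) − B; the function names are illustrative.

```python
def total_kmers(n_reads: int, mean_read_len: float, k: int, low_freq_kmers: int) -> float:
    """N x (L - K + 1) - B: k-mers contributed by all reads minus low-frequency (error) k-mers."""
    return n_reads * (mean_read_len - k + 1) - low_freq_kmers

def genome_size(kmer_count: float, peak_depth: float) -> float:
    """G = k-mer count / D, where D is the k-mer depth at the main peak of the histogram."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return kmer_count / peak_depth

# Values reported in the text; the k-mer count is assumed to be the full numerator.
g_bp = genome_size(222_144_314_592, 49)
print(f"{g_bp / 1e6:.2f} Mb")  # 4533.56 Mb
```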

The genome size of the sequenced individual was confirmed by flow cytometry. Approximately 20–50 mg of fresh leaves of D. kaki and D. lotus were chopped with a razor blade in 1 ml of LB01 buffer (15 mM Tris, 2 mM Na2EDTA, 0.5 mM spermine tetrahydrochloride, 80 mM KCl, 20 mM NaCl, 0.1% (vol/vol) Triton X-100; adjusted to pH 7.5 with 1 M NaOH; β-mercaptoethanol added to 15 mM). The nuclear suspension was collected by gentle pipetting and filtered through a 400-mesh nylon strainer. The samples were stained with 100 μg/ml PI and 100 μg/ml RNase in an ice bath for 10 min before analysis on a MoFlo-XDP flow cytometer (Beckman Coulter Inc., USA).

Nuclear fluorescence was measured on the MoFlo-XDP high-speed flow cytometer with a 70-μm ceramic nozzle at a sheath pressure of 60 psi. PI fluorescence was excited with a 488-nm solid-state laser and detected through a 625/26-nm HQ band-pass filter. Debris, cell fragments, and dead cells were excluded by gating on FL3-Height/SSC-Height, and singlets were discriminated from doublets using FL3-Height/FL3-Area. Flow cytometry estimated the genome size of D. kaki at 4.61 Gb (Fig. 2).
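
The text does not state how the 4.61 Gb value was derived from the fluorescence data. A common approach, assumed here, is the relative fluorescence method: the sample's genome size equals the reference standard's size multiplied by the ratio of their G0/G1 peak positions. Treating co-processed material such as D. lotus as the standard is an assumption, and the peak positions and standard size below are hypothetical placeholders, not measurements from this study.

```python
def genome_size_from_peaks(sample_peak: float, standard_peak: float,
                           standard_size_gb: float) -> float:
    """Relative fluorescence method: size = standard size x (sample peak / standard peak).

    Peaks are the mean PI fluorescence of the G0/G1 nuclei of sample and standard,
    ideally measured together in the same run.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_size_gb * sample_peak / standard_peak

# Hypothetical placeholders, not measurements from this study.
print(genome_size_from_peaks(sample_peak=460.0, standard_peak=100.0, standard_size_gb=1.0))  # 4.6
```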

Genome assembly

In total, 179.09 Gb of PacBio HiFi reads (8 SMRT cells; 39.53X coverage) and 445.72 Gb of Hi-C paired-end reads (98.39X coverage) were obtained (Table 1). The D. kaki genome was assembled from the PacBio HiFi reads with hifiasm (v0.13-r308)²⁶ using default parameters. After the initial assembly, the Hi-C reads were aligned to the contigs using the mem algorithm of the Burrows-Wheeler Aligner (BWA)²⁷, and pseudo-chromosomes were constructed with ALLHiC (v0.9.8)²⁰ using the parameters -K 90 --minREs 50 --maxlinkdensity 3 --NonInformativeRatio 2. Hi-C scaffolding yielded 90 chromosome-length scaffolds. The final assembly was 4.52 Gb in size, with a contig N50 of 5.28 Mb and a scaffold N50 of 44.01 Mb; 4.06 Gb (89.87%) of the assembly was anchored onto 90 chromosome-level pseudomolecules comprising 15 homologous groups with six allelic chromosomes each. Chromosomes were assigned to haplotypes according to chromosome length (Tables 2, 3; Figs. 3, 4).
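
Two summary statistics in this paragraph can be reproduced with short calculations: sequencing depth (sequenced bases divided by genome size) and N50 (the length at which sequences of that length or longer contain at least half of the assembled bases). The sketch below is a minimal illustration; the contig lengths are hypothetical, and the genome size of about 4.53 Gb is an assumption chosen because it reproduces the reported 39.53X and 98.39X depths.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise RuntimeError("unreachable: the running sum always reaches the total")

def depth(total_bases_gb: float, genome_size_gb: float) -> float:
    """Average sequencing depth (X) = sequenced bases / genome size."""
    return total_bases_gb / genome_size_gb

GENOME_SIZE_GB = 4.53  # assumption: reproduces the depths reported in the text

print(n50([80, 70, 50, 40, 30, 20, 10]))             # 70 (hypothetical lengths)
print(f"HiFi: {depth(179.09, GENOME_SIZE_GB):.2f}X")  # HiFi: 39.53X
print(f"Hi-C: {depth(445.72, GENOME_SIZE_GB):.2f}X")  # Hi-C: 98.39X
```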

Table 1 Statistics of data for genome assemblies of D. kaki.

Table 2 Summary of D. kaki genome assembly.

Table 3 Statistics of chromosome length in D. kaki genome.

Repetitive sequence annotation

Transposable elements (TEs) in the D. kaki genome were identified by combining de novo and homology-based approaches. For the de novo approach, RepeatScout (v1.0.5; https://github.com/mmcco/RepeatScout)²⁸, RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html), and LTR_FINDER (v1.0.7; https://github.com/xzhub/LTR_Finder)²⁹ were used to build a de novo repeat library. For the homology-based approach, RepeatMasker (v3.3.0; http://www.repeatmasker.org/) was run against the Repbase TE library (http://www.girinst.org/server/RepBase/)³⁰ with a score cut-off of 225, and RepeatProteinMask (v4.0.5; http://www.repeatmasker.org/) was run against the TE protein database³¹ with a p-value cut-off of 1e-4. Tandem repeats were identified using Tandem Repeats Finder (v4.0.7; https://tandem.bu.edu/trf/)³² with the parameters matching weight 2, mismatch penalty 7, indel penalty 7, minimum alignment score 50, and maximum period size 2000. In total, 2.90 Gb of repetitive elements, occupying 64.02% of the D. kaki genome, were annotated (Table 4). Long terminal repeat (LTR) retrotransposons were the most abundant class (51.28% of the genome), whereas the DNA, LINE, and SINE classes accounted for 5.93%, 2.66%, and 0.03% of the genome, respectively (Table 5).
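
Because repeats were annotated by several tools, overlapping annotations must be merged before summing; otherwise, bases covered by more than one tool would be counted twice. A minimal sketch of that interval-union step follows, assuming half-open [start, end) coordinates; the hit lists are hypothetical, and the code is a generic approach rather than the authors' script.

```python
from collections import defaultdict

def merged_coverage(intervals):
    """Bases covered by the union of half-open [start, end) intervals, merged per sequence."""
    by_seq = defaultdict(list)
    for seq_id, start, end in intervals:
        if end > start:
            by_seq[seq_id].append((start, end))
    covered = 0
    for spans in by_seq.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:                 # gap: close the current block and start a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered

# Hypothetical hits from three annotation sources.
repeatmasker_hits = [("chr1", 0, 100), ("chr1", 500, 700)]
proteinmask_hits = [("chr1", 50, 150)]
trf_hits = [("chr1", 650, 800), ("chr2", 0, 10)]
all_hits = repeatmasker_hits + proteinmask_hits + trf_hits

raw = sum(end - start for _, start, end in all_hits)
print(raw, merged_coverage(all_hits))  # 560 460
```

Here the raw hit lengths sum to 560 bp, but only 460 bp are covered once overlaps are merged.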

Table 4 Summary of repeat sequences in D. kaki genome.

Table 5 Summary of TE sequences in D. kaki genome.

Gene prediction and annotation

Homology-based, de novo, and transcriptome-based methods were combined to predict protein-coding genes in the D. kaki genome. Protein sequences from five plant genomes (Arabidopsis thaliana, D. oleifera, D. lotus, Actinidia chinensis, and Camellia sinensis) were downloaded from Ensembl Plants (http://plants.ensembl.org/index.html) and NCBI (https://www.ncbi.nlm.nih.gov/) and aligned to the D. kaki genome assembly using TBLASTN³³ with an E-value cut-off of 1e-5. The BLAST hits were joined using Solar³⁴, and GeneWise (https://www.ebi.ac.uk/Tools/psa/genewise) was used to predict the exact gene structure in the genomic region of each BLAST hit (Homo-set)³⁵. Published RNA-seq data from female flowers and fruits at different developmental stages, together with 0.33 Gb of newly generated RNA-seq data from young leaves and stems of ‘Xiaoguotianshi’ (three biological replicates), were mapped to the D. kaki genome using HISAT2 (https://daehwankimlab.github.io/hisat2/, v2.2.1)³⁶ and assembled with Cufflinks (http://cole-trapnell-lab.github.io/cufflinks/, v2.1.1)³⁷ (Cufflinks-set; Table 6). A total of 70.54 Gb of Iso-seq data from PacBio transcriptome sequencing of pooled young leaves, stems, and fruits of ‘Xiaoguotianshi’ (three biological replicates) were used to generate pseudo-ESTs. These pseudo-ESTs were mapped to the assembly, and gene models were predicted using PASA (http://pasapipeline.github.io/)³⁸ (Table 6). This gene set, denoted the PASA-T-set, was used to train the ab initio gene prediction programs. Five ab initio programs, namely Augustus (http://augustus.gobics.de/, v3.2.3), GENSCAN (http://genes.mit.edu/GENSCAN.html, v1.0), GlimmerHMM (http://ccb.jhu.edu/software/glimmerhmm/, v3.0.1), geneid (http://genome.crg.es/software/geneid/), and SNAP (http://korflab.ucdavis.edu/software.html), were used to predict coding regions in the repeat-masked genome^39,40,41,42. Gene models from the Homo-set, Cufflinks-set, PASA-T-set, and ab initio predictions were combined using EVidenceModeler (EVM; http://evidencemodeler.sourceforge.net/) into a non-redundant set of gene structures⁴³.
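
The homology step begins by filtering protein-to-genome alignments at the stated E-value cut-off and grouping hits that may belong to the same locus. The Python sketch below assumes TBLASTN output in the default BLAST+ tabular format (-outfmt 6, 12 columns); the protein and chromosome identifiers are hypothetical, and the sketch does not reproduce Solar's chaining or GeneWise's gene-structure prediction.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(handle, max_evalue=1e-5):
    """Yield hits with E-value <= max_evalue as dicts keyed by column name."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue:
            yield hit

def group_by_locus(hits):
    """Group hits by (protein, genomic sequence, strand); sstart > send marks the minus strand."""
    groups = defaultdict(list)
    for hit in hits:
        sstart, send = int(hit["sstart"]), int(hit["send"])
        strand = "+" if send >= sstart else "-"
        key = (hit["qseqid"], hit["sseqid"], strand)
        groups[key].append((min(sstart, send), max(sstart, send)))
    for spans in groups.values():
        spans.sort()
    return dict(groups)

# Hypothetical TBLASTN output: two significant hits and one above the E-value cut-off.
EXAMPLE_TSV = (
    "protA\tchr01A\t85.0\t120\t10\t1\t1\t120\t10000\t10360\t1e-40\t200\n"
    "protA\tchr01A\t80.0\t60\t8\t0\t121\t180\t10900\t11080\t2e-12\t90\n"
    "protB\tchr02B\t30.0\t40\t20\t2\t5\t44\t500\t381\t0.01\t30\n"
)
hits = list(filter_hits(io.StringIO(EXAMPLE_TSV)))
print(group_by_locus(hits))
# {('protA', 'chr01A', '+'): [(10000, 10360), (10900, 11080)]}
```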

Table 6 Statistics of RNA-seq and Iso-seq.

Functional annotation of the protein-coding genes was performed using BLASTP (E-value 1e-05) against two integrated protein sequence databases⁴⁴, SwissProt (http://web.expasy.org/docs/swiss-prot_guideline.html) and NR (ftp://ftp.ncbi.nih.gov/blast/db/). Protein domains were annotated by searching the InterPro (http://www.ebi.ac.uk/interpro/, v32.0) and Pfam (https://pfam-legacy.xfam.org/, v27.0) databases using InterProScan (v4.8) and HMMER (http://www.hmmer.org/, v3.1), respectively^45,46,47,48. Gene Ontology (GO; http://www.geneontology.org/page/go-database) terms for each gene were obtained from the corresponding InterPro or Pfam entries. Pathways in which the genes might be involved were assigned by BLAST searches against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.kegg.jp/kegg/kegg1.html, release 53) with an E-value cut-off of 1e-05. In total, 153,288 protein-coding genes were predicted, with an average gene length of 7,397.94 bp and an average CDS length of 1,153.82 bp (Table 7); 135,446 of these genes were anchored to the 90 chromosomes (Table 8). On average, each predicted gene contained 5.01 exons with an average exon length of 230.33 bp (Table 7). Overall, 98.60% of the genes were functionally annotated through similarity searches against homologous sequences and protein domains (Table 9).
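
The averages reported in this paragraph are simple aggregates over gene models. The sketch below computes them from GFF3 lines, assuming each gene has a single mRNA and counting each CDS segment as a coding exon. That interpretation is an assumption, although it is consistent with the reported figures: 5.01 exons × 230.33 bp ≈ 1,154 bp, close to the 1,153.82 bp average CDS length. The example records are hypothetical.

```python
from collections import defaultdict

def gene_stats(gff_lines):
    """Average gene length, CDS length, coding exons per gene and exon length from GFF3 lines.

    Assumes one mRNA per gene and treats each CDS segment as one coding exon.
    """
    gene_len = {}
    cds_segments = defaultdict(list)  # mRNA ID -> lengths of its CDS segments
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        tags = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        length = end - start + 1  # GFF3 coordinates are 1-based and inclusive
        if ftype == "gene":
            gene_len[tags["ID"]] = length
        elif ftype == "CDS":
            cds_segments[tags["Parent"]].append(length)
    per_mrna_cds = [sum(segs) for segs in cds_segments.values()]
    all_segments = [seg for segs in cds_segments.values() for seg in segs]
    return {
        "genes": len(gene_len),
        "mean_gene_len": sum(gene_len.values()) / len(gene_len),
        "mean_cds_len": sum(per_mrna_cds) / len(per_mrna_cds),
        "mean_exons_per_gene": len(all_segments) / len(cds_segments),
        "mean_exon_len": sum(all_segments) / len(all_segments),
    }

def _row(ftype, start, end, attrs, seqid="chr01A"):
    """Build one hypothetical GFF3 line (tab-separated, 9 columns)."""
    return "\t".join([seqid, "example", ftype, str(start), str(end), ".", "+", ".", attrs])

EXAMPLE_GFF = [
    "##gff-version 3",
    _row("gene", 1, 1000, "ID=g1"),
    _row("mRNA", 1, 1000, "ID=g1.t1;Parent=g1"),
    _row("CDS", 1, 100, "ID=g1.t1.cds;Parent=g1.t1"),
    _row("CDS", 201, 400, "ID=g1.t1.cds;Parent=g1.t1"),
    _row("gene", 2001, 2600, "ID=g2"),
    _row("mRNA", 2001, 2600, "ID=g2.t1;Parent=g2"),
    _row("CDS", 2001, 2150, "ID=g2.t1.cds;Parent=g2.t1"),
    _row("CDS", 2301, 2400, "ID=g2.t1.cds;Parent=g2.t1"),
    _row("CDS", 2501, 2600, "ID=g2.t1.cds;Parent=g2.t1"),
]
stats = gene_stats(EXAMPLE_GFF)
print(stats)
```

By construction, mean exons per gene multiplied by mean exon length equals the mean CDS length (2.5 × 130 = 325 bp in this toy example).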

Table 7 Summary of gene structure prediction in D. kaki genome.

Table 8 Statistics of chromosome gene number in D. kaki genome.

Table 9 Statistics of gene function annotation in D. kaki genome.

tRNA genes were identified using tRNAscan-SE⁴⁹. rRNA fragments were predicted by aligning rRNA sequences to the genome using BLASTN with an E-value cut-off of 1e-10. miRNA and snRNA genes were predicted using INFERNAL⁵⁰ against the Rfam database (release 9.1)⁵¹. In total, 110,480 rRNA, 12,297 tRNA, 1,483 miRNA, and 3,510 snRNA genes were annotated (Table 10).

Table 10 Statistics of non-coding RNA in D. kaki genome.

Data Records

Raw genome and transcriptome sequencing data of D. kaki have been deposited in the NCBI SRA database under BioProject ID PRJNA810977. The SRA accession numbers of the PacBio HiFi sequencing data are SRR18500470⁵², SRR18500471⁵³, SRR18500472⁵⁴, SRR18500473⁵⁵, SRR18500474⁵⁶, SRR18500475⁵⁷, SRR18500476⁵⁸, and SRR18500477⁵⁹. The SRA accession numbers of the Hi-C sequencing data are SRR18500481⁶⁰, SRR18500482⁶¹, SRR18500483⁶², SRR18500484⁶³, SRR18500485⁶⁴, SRR18500486⁶⁵, SRR18500487⁶⁶, and SRR18500488⁶⁷. The SRA accession numbers of the Illumina sequencing data are SRR18500479⁶⁸ and SRR18500480⁶⁹. The SRA accession number of the Iso-seq data is SRR18500463⁷⁰. The SRA accession numbers of some of the RNA-seq data are SRR18500464⁷¹, SRR18500465⁷², SRR18500466⁷³, SRR18500478⁷⁴, SRR18500489⁷⁵, SRR18500490⁷⁶, and SRP151715⁷⁷. The remaining RNA-seq data have been deposited in the NCBI SRA database under accession numbers SRR16371984⁷⁸, SRR16371985⁷⁹, SRR16371986⁸⁰, SRR16371987⁸¹, SRR16371988⁸², SRR16371989⁸³, SRR16371990⁸⁴, SRR16371991⁸⁵, SRR16371992⁸⁶, SRR16371993⁸⁷, SRR16371994⁸⁸, SRR16371995⁸⁹, SRR16371996⁹⁰, SRR16371997⁹¹, and SRR16371997⁹², which are associated with BioProject ID PRJNA771936. The assembled genome sequence has been deposited in GenBank under accession number JAQSGO000000000⁹³. Other data, including the gene structure annotation, predicted CDS and protein sequences, and annotations of TEs, tandem repeats, and tRNA, miRNA, snRNA, and rRNA genes, are available from the FigShare database⁹⁴.
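
For readers who want to retrieve the raw data, the accession ranges above can be expanded programmatically. The sketch below builds the run lists for the PacBio HiFi, Hi-C, Illumina, and Iso-seq data and prints download commands for the NCBI SRA Toolkit (prefetch and fasterq-dump), which is assumed to be installed separately. The helper names are hypothetical, and very large runs may require raising prefetch's default maximum download size (see the SRA Toolkit documentation for your version).

```python
def accession_range(prefix: str, first: int, last: int) -> list[str]:
    """Expand an inclusive numeric accession range, e.g. SRR18500470..SRR18500477."""
    return [f"{prefix}{number}" for number in range(first, last + 1)]

# Run accessions as listed in the Data Records section.
DATASETS = {
    "pacbio_hifi": accession_range("SRR", 18500470, 18500477),
    "hic": accession_range("SRR", 18500481, 18500488),
    "illumina": ["SRR18500479", "SRR18500480"],
    "isoseq": ["SRR18500463"],
}

def download_commands(accessions: list[str]) -> list[str]:
    """Shell commands for the SRA Toolkit: fetch each run, then convert it to FASTQ."""
    commands = []
    for acc in accessions:
        commands.append(f"prefetch {acc}")
        commands.append(f"fasterq-dump {acc}")
    return commands

if __name__ == "__main__":
    for name, runs in DATASETS.items():
        print(f"# {name}: {len(runs)} run(s)")
        for cmd in download_commands(runs):
            print(cmd)
```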

Technical Validation

Assessment of genome assembly completeness using CEGMA indicated 95.56% coverage of the conserved core eukaryotic genes (haplotypes: DkaA 89.92%; DkaB 90.73%; DkaC 91.13%; DkaD 86.69%; DkaE 89.11%; DkaF 86.69%). BUSCO (v5.2.2; embryophyta_odb10 database)⁹⁵ analysis indicated that the genome and the gene set were 99.50% (haplotypes: DkaA 92.70%; DkaB 93.50%; DkaC 92.10%; DkaD 90.10%; DkaE 90.20%; DkaF 84.60%) and 97.50% (haplotypes: DkaA 87.10%; DkaB 86.60%; DkaC 86.10%; DkaD 83.90%; DkaE 84.60%; DkaF 79.80%) complete, respectively (Tables 11, 12). The lower completeness of each haplotype relative to the whole assembly shows that individual haplotypes lack some genes present elsewhere in the genome. In addition, 99.86% of the high-quality short reads mapped back to the assembly (haplotypes: DkaA 94.88%; DkaB 94.59%; DkaC 93.70%; DkaD 93.94%; DkaE 92.97%; DkaF 90.09%) (Table 13). Together, these assessments indicate that the D. kaki genome assembly is complete and of high quality.
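
BUSCO reports completeness as a one-line summary of complete (single-copy plus duplicated), fragmented, and missing percentages out of n searched genes. The sketch below parses a BUSCO v5-style summary line and converts a percentage back into an approximate gene count. The summary string is hypothetical: only its C value (92.7%, the DkaA genome result) comes from the text, and n = 1,614 is the number of BUSCOs in embryophyta_odb10. In an allele-aware hexaploid assembly, many BUSCOs would be expected to be reported as duplicated, which is one reason the per-haplotype results are informative.

```python
import re

# One-line BUSCO summary, e.g. "C:92.7%[S:88.0%,D:4.7%],F:1.5%,M:5.8%,n:1614".
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)

def parse_busco(summary: str) -> dict:
    """Return the C, S, D, F, M percentages as floats and n as an int."""
    match = SUMMARY_RE.search(summary)
    if match is None:
        raise ValueError("not a BUSCO one-line summary")
    result = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result

def to_count(percent: float, n: int) -> int:
    """Convert a BUSCO percentage back into an approximate number of genes."""
    return round(percent / 100 * n)

# Hypothetical summary: only C = 92.7% (DkaA genome) is taken from the text.
dka_a = parse_busco("C:92.7%[S:88.0%,D:4.7%],F:1.5%,M:5.8%,n:1614")
print(dka_a["C"], to_count(dka_a["C"], dka_a["n"]))  # 92.7 1496
```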

Table 11 Assessment of the completeness of the genome assembly.

Table 12 Assessment of the completeness of the gene set.

Table 13 Coverage statistics of D. kaki genome.

Comparative genomic analysis revealed a clear 6-to-1 syntenic relationship between D. kaki and D. oleifera (Fig. 5), further supporting the high quality of the D. kaki assembly.
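
A 6-to-1 relationship means that each D. oleifera chromosome shares collinear blocks with six D. kaki chromosomes, one from each haplotype. Assuming syntenic blocks have already been computed with a collinearity tool and reduced to (D. oleifera chromosome, D. kaki chromosome) pairs, the ratio can be tallied as in the sketch below; the chromosome names and block list are hypothetical.

```python
from collections import Counter, defaultdict

def syntenic_depth(block_pairs):
    """Map each reference chromosome to the number of distinct query chromosomes syntenic with it."""
    partners = defaultdict(set)
    for ref_chr, query_chr in block_pairs:
        partners[ref_chr].add(query_chr)
    return {ref_chr: len(query_chrs) for ref_chr, query_chrs in partners.items()}

# Hypothetical block pairs: (D. oleifera chromosome, D. kaki chromosome).
EXAMPLE_BLOCKS = (
    [("Do01", f"Dk01{hap}") for hap in "ABCDEF"]
    + [("Do02", f"Dk02{hap}") for hap in "ABCDEF"]
    + [("Do01", "Dk01A")]  # a second block on an already-counted chromosome pair
)
depths = syntenic_depth(EXAMPLE_BLOCKS)
print(depths)                    # {'Do01': 6, 'Do02': 6}
print(Counter(depths.values()))  # Counter({6: 2})
```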

Code availability

All software used in this work is publicly available, and the parameters are described in the Methods section. All commands were executed according to the manuals and protocols of the corresponding bioinformatics software.

References

1. Yonemori, K., Sugiura, A. & Yamada, M. Persimmon Genetics and Breeding. Plant Breed. Rev. 19, 191–225 (2000).
2. Itamura, H., Zheng, Q. & Akaura, K. Industry and research on persimmon in Japan. III Int. Symp. Persimmon 685, 37–44 (2004).
3. Akagi, T., Katayama-Ikegami, A. & Yonemori, K. Proanthocyanidin biosynthesis of persimmon (Diospyros kaki Thunb.) fruit. Sci. Hortic. 130, 373–380 (2011).
4. Xu, J., Zhang, Q., Xu, L., Guo, D. & Luo, Z. Recent developments in deastringency mechanism of persimmon fruit. Acta Hortic. Sin. 43, 1653 (2016).
5. Sato, A. & Yamada, M. Persimmon breeding in Japan for pollination-constant non-astringent (PCNA) type with marker-assisted selection. Breed. Sci. 66, 60–68 (2016).
6. Yamada, M., Yamane, H. & Ukai, Y. Genetic analysis of Japanese persimmon fruit weight. J. Am. Soc. Hortic. Sci. 119, 1298–1302 (1994).
7. Pei, X., Zhang, Q., Guo, D., Liu, J. & Luo, Z. Development of genetic improvement in Chinese PCNA persimmon. J. Fruit Sci. 32, 313–321 (2015).
8. Akagi, T., Tao, R., Tsujimoto, T., Kono, A. & Yonemori, K. Fine genotyping of a highly polymorphic ASTRINGENCY-linked locus reveals variable hexasomic inheritance in persimmon (Diospyros kaki Thunb.) cultivars. Tree Genet. Genomes 8, 195–204 (2012).
9. Ikeda, I., Yamada, M., Kurihara, A. & Nishida, T. Inheritance of astringency in Japanese persimmon. J. Jpn. Soc. Hortic. Sci. 54, 39–45 (1985).
10. Yamada, M. & Sato, A. Segregation for fruit astringency type in progenies derived from crosses of ‘Nishimurawase’ × pollination constant non-astringent genotypes in oriental persimmon (Diospyros kaki Thunb.). Sci. Hortic. 92, 107–111 (2002).
11. Ikegami, A. et al. Segregations of astringent progenies in the F1 populations derived from crosses between a Chinese pollination-constant nonastringent (PCNA) ‘Luo tian tian shi’, and Japanese PCNA and pollination-constant astringent (PCA) cultivars of Japanese origin. HortScience 41, 561–563 (2006).
12. Ikegami, A., Yonemori, K., Sugiura, A., Sato, A. & Yamada, M. Segregation of astringency in F1 progenies derived from crosses between pollination-constant, nonastringent persimmon cultivars. HortScience 39, 371–374 (2004).
13. Sun, P. et al. Genomics-based discrimination of 2n gamete formation mechanisms in polyploids: a case study in nonaploid Diospyros kaki ‘Akiou’. G3 Genes Genomes Genet. 11, jkab188 (2021).
14. Akagi, T. et al. The persimmon genome reveals clues to the evolution of a lineage-specific sex determination system in plants. PLoS Genet. 16, e1008566 (2020).
15. Suo, Y. et al. A high-quality chromosomal genome assembly of Diospyros oleifera Cheng. GigaScience 9, giz164 (2020).
16. Zhu, Q. et al. The persimmon (Diospyros oleifera Cheng) genome provides new insights into the inheritance of astringency and ancestral evolution. Hortic. Res. 6, 1–15 (2019).
17. Kyriakidou, M., Tai, H. H., Anglin, N. L., Ellis, D. & Strömvik, M. V. Current strategies of polyploid plant genome sequence assembly. Front. Plant Sci. 9, 1660 (2018).
18. Yang, J. et al. Haplotype-resolved sweet potato genome traces back its hexaploidization history. Nat. Plants 3, 696–703 (2017).
19. Zhang, J. et al. Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L. Nat. Genet. 50, 1565–1573 (2018).
20. Zhang, X., Zhang, S., Zhao, Q., Ming, R. & Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019).
21. Chen, H. et al. Allele-aware chromosome-level genome assembly and efficient transgene-free genome editing for the autotetraploid cultivated alfalfa. Nat. Commun. 11, 1–11 (2020).
22. Sun, H. et al. Chromosome-scale and haplotype-resolved genome assembly of a tetraploid potato cultivar. Nat. Genet. 54, 342–348 (2022).
23. Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).
24. Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Preprint at arXiv:1308.2012 (2013).
25. Zhang, Q. et al. The genome of Prunus mume. Nat. Commun. 3, 1–8 (2012).
26. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
27. Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).
28. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005).
29. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
30. Kohany, O., Gentles, A. J., Hankus, L. & Jurka, J. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics 7, 1–7 (2006).
31. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics 5, 4.10.1–4.10.14 (2004).
32. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
33. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
34. Yu, X., Zheng, H., Wang, J., Wang, W. & Su, B. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics 88, 745–751 (2006).
35. Cook, C. E., Bergman, M. T., Cochrane, G., Apweiler, R. & Birney, E. The European Bioinformatics Institute in 2017: data coordination and integration. Nucleic Acids Res. 46, D21–D29 (2018).
36. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
37. Ghosh, S. & Chan, C.-K. K. Analysis of RNA-Seq data using TopHat and Cufflinks. Plant Bioinformatics: Methods Protoc. 339–361 (2016).
38. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
39. Blanco, E., Parra, G. & Guigó, R. Using geneid to identify genes. Curr. Protoc. Bioinformatics 18, 4.3.1–4.3.28 (2007).
40. Burge, C. B. & Karlin, S. Finding the genes in genomic DNA. Curr. Opin. Struct. Biol. 8, 346–354 (1998).
41. Keller, O., Kollmar, M., Stanke, M. & Waack, S. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27, 757–763 (2011).
42. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
43. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, 1–22 (2008).
44. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
45. Finn, R. D. et al. InterPro in 2017—beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199 (2017).
46. Finn, R. D. et al. HMMER web server: 2015 update. Nucleic Acids Res. 43, W30–W38 (2015).
47. Finn, R. D. et al. Pfam protein families database. Nucleic Acids Res. 40, D290–D301 (2010).
48. Zdobnov, E. M. & Apweiler, R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848 (2001).
49. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
50. Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
51. Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A. & Bateman, A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
52. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500470 (2023).
53. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500471 (2023).
54. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500472 (2023).
55. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500473 (2023).
56. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500474 (2023).
57. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500475 (2023).
58. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500476 (2023).
59. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500477 (2023).
60. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500481 (2023).
61. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500482 (2023).
62. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500483 (2023).
63. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500484 (2023).
64. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500485 (2023).
65. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500486 (2023).
66. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500487 (2023).
67. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500488 (2023).
68. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500479 (2023).
69. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500480 (2023).
70. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500463 (2023).
71. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500464 (2023).
72. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500465 (2023).
73. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500466 (2023).
74. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500478 (2023).
75. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500489 (2023).
76. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR18500490 (2023).
77. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP151715 (2019).
78. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371984 (2022).
79. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371985 (2022).
80. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371986 (2022).
81. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371987 (2022).
82. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371988 (2022).
83. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371989 (2022).
84. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371990 (2022).
85. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371991 (2022).
86. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371992 (2022).
87. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371993 (2022).
88. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371994 (2022).
89. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371995 (2022).
90. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371996 (2022).
91. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371997 (2022).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR16371998 (2022).
NCBI GenBank https://identifiers.org/ncbi/insdc:JAQSGO000000000 (2023).
Li, H., Sun, P., Wang, Y. & Zhang, Z. Persimmon genome. Figshare https://doi.org/10.6084/m9.figshare.19447256 (2023).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

Acknowledgements

This work was supported by the National Key R&D Program of China (2018YFD1000606).

Author information

Authors and Affiliations

Research Institute of Non-timber Forestry, Chinese Academy of Forestry, No. 3 Weiwu Road, Jinshui District, Zhengzhou, 450003, China
Huawei Li, Peng Sun, Yiru Wang, Yujing Suo, Weijuan Han, Songfeng Diao, Fangdong Li & Jianmin Fu
Key Laboratory of Non-timber Forest Germplasm Enhancement & Utilization of State Administration of Forestry and Grassland, No. 3 Weiwu Road, Jinshui District, Zhengzhou, 450003, China
Huawei Li, Peng Sun, Yiru Wang, Yujing Suo, Weijuan Han, Songfeng Diao, Fangdong Li & Jianmin Fu
Key Laboratory of Cultivation and Protection for Non-Wood Forest Trees, Ministry of Education, Central South University of Forestry and Technology, No. 498 Shaoshan South Road, Changsha, 410004, China
Huawei Li
Novogene Bioinformatics Institute, Beijing, 100083, China
Zhongren Zhang
Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, 3888 Chenhua Road, Shanghai, 201602, China
Jun Yang
Henan Key Laboratory of Germplasm Innovation and Utilization of Eco-economic Woody Plant, Pingdingshan University, Pingdingshan, 467000, China
Jianmin Fu

Authors

Huawei Li
Peng Sun
Yiru Wang
Zhongren Zhang
Jun Yang
Yujing Suo
Weijuan Han
Songfeng Diao
Fangdong Li
Jianmin Fu

Contributions

Fu, J. and Li, F. designed the project and are the corresponding authors of this paper. Han, W., Suo, Y. and Diao, S. collected the samples. Li, H., Sun, P., Wang, Y. and Zhang, Z. conducted the genome assembly, analyzed the data and wrote the manuscript; they contributed equally to this study. Yang, J. provided intellectual insights. All authors critically commented on and approved the manuscript.

Corresponding authors

Correspondence to Fangdong Li or Jianmin Fu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, H., Sun, P., Wang, Y. et al. Allele-aware chromosome-level genome assembly of the autohexaploid Diospyros kaki Thunb. Sci Data 10, 270 (2023). https://doi.org/10.1038/s41597-023-02175-2

Received: 14 October 2022
Accepted: 21 April 2023
Published: 11 May 2023
DOI: https://doi.org/10.1038/s41597-023-02175-2
Springer Nature Limited

Background & Summary