Chromosome-level genome assembly and annotation of rare and endangered tropical bivalve, Tridacna crocea

Li, Jun; Ma, Haitao; Qin, Yanpin; Zhao, Zhen; Niu, Yongchao; Lian, Jianmin; Li, Jiang; Noor, Zohaib; Guo, Shuming; Yu, Ziniu; Zhang, Yuehuan

doi:10.1038/s41597-024-03014-8

Data Descriptor
Open access
Published: 10 February 2024

Volume 11, article number 186, (2024)

Scientific Data

Jun Li^1,2,3,4^na1,
Haitao Ma^1,2,3,4^na1,
Yanpin Qin^1,2,3,4^na1,
Zhen Zhao^1,2,
Yongchao Niu⁵,
Jianmin Lian⁵,
Jiang Li⁵,
Zohaib Noor^1,2,6,
Shuming Guo^1,6,
Ziniu Yu^1,2,3,4 &
…
Yuehuan Zhang^1,2,3,4

Abstract

Tridacna crocea is an ecologically important marine bivalve inhabiting tropical coral reef waters. High-quality, publicly available genomic resources will help clarify the population structure and genetic diversity of giant clams. This study reports a high-quality chromosome-scale T. crocea genome sequence of 1.30 Gb, with a scaffold N50 and contig N50 of 56.38 Mb and 1.29 Mb, respectively, assembled by combining PacBio long reads and Hi-C sequencing data. Repetitive sequences cover 71.60% of the total length, and a total of 25,440 protein-coding genes were annotated. A total of 1,963 non-coding RNAs (ncRNAs) were identified in the T. crocea genome, including 62 microRNAs (miRNAs), 58 small nuclear RNAs (snRNAs), 83 ribosomal RNAs (rRNAs), and 1,760 transfer RNAs (tRNAs). Phylogenetic analysis revealed that giant clams diverged from oyster about 505.7 Mya during the evolution of bivalves. The genome assembly presented here provides valuable genomic resources to enhance our understanding of the genetic diversity and population structure of giant clams.

Background & Summary

Giant clams are tropical marine shellfish mainly distributed in the Indian Ocean, Western Pacific, and South China Sea. There are twelve species of giant clams, divided into two genera, with 10 species in Tridacna and 2 in Hippopus¹. They play a crucial role in coral reef ecosystems, contributing over 60% of the biomass of coral reef ecosystems². Giant clams support coral reef biodiversity, offer habitats and breeding and feeding grounds to various marine organisms, and have extremely important ecological value^3,4. Giant clams are hermaphrodites, initially functioning as males and later developing female gonads and functioning as both male and female⁵. To avoid self-fertilization, giant clams first release sperm and then eggs⁶. Bivalves often form symbiotic associations with bacteria, algae, and other marine fauna⁷. Giant clams have a symbiotic relationship with zooxanthellae. Unlike the intracellular symbiosis in stony corals, the zooxanthellae in clams are intercellular and live within the mantle⁸. The symbionts supply nutrients to the host through photosynthesis while also obtaining some essential nutrients from the host. Notably, symbionts are not transmitted vertically and must be acquired from the environment during the second larval stage, the veliger⁹. Additionally, some deep-sea bivalves engage in symbiosis with chemosynthetic bacteria, which are the primary producers of deep-sea cold seeps and vents¹⁰.

Among Tridacna species, T. crocea is the smallest, with a maximum shell length of no more than 20 cm. It grows at a rate of about 4 cm per year and reaches sexual maturity in 1–2 years¹¹. The shell is shallow, with two sides of equal shape and size. Despite its slow growth and small size, T. crocea is known for its vibrant colors and beautiful appearance, making it valuable in food markets, the aquarium trade and tropical coral reef ecosystems¹². Moreover, its photoautotrophic characteristics contribute to oxygen production, benefiting marine organisms¹³. However, anthropogenic disturbances, such as global warming, habitat destruction and over-harvesting, have led to declining giant clam populations, resulting in giant clams being listed on the IUCN Red List (IUCN, 2007).

Despite the ecological importance of giant clams, their genomic features have remained unclear. Previous molecular studies of giant clams have focused on phylogeographical patterns^14,15, as well as the expression and functional analysis of specific genes^16,17. Limited transcriptome data are available^18,19. Recently, a genomic survey of T. crocea estimated the genome size, predicted the unique content, and provided partial annotations and assemblies²⁰. The lack of genomic information has hindered the study of the evolutionary and ecological characteristics of giant clams. Recently, Pacific Biosciences (PacBio) high-fidelity (HiFi) reads have been successfully applied to various complex species and sex chromosomes, such as cultivated apple (highly heterozygous)²¹, cultivated alfalfa (autotetraploid)²², and the human X chromosome²³. In the present study, the chromosome-level genome of T. crocea was analyzed for the first time using PacBio HiFi reads, Phase Genomics Proximo Hi-C technology, and Illumina short-read sequencing. To infer the relationship between T. crocea and other bivalves, gene prediction, functional annotation and phylogenetic analysis were performed. The genome sequence of the giant clam is an important resource for genetic and breeding studies.

Methods

Experimental samples collection and sequencing

T. crocea were sampled from a tropical marine biological research station in Sanya, Hainan province. The giant clams were immediately anaesthetized, and muscle was extracted for DNA isolation using the modified cetyltrimethylammonium bromide (CTAB) method. The quality and quantity of genomic DNA were assessed using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific) and a Qubit 2.0 fluorometer (Thermo Fisher Scientific). DNA integrity was confirmed using a 0.8% agarose gel.

Three distinct genome libraries were created and sequenced in accordance with the manufacturers' instructions to produce a chromosome-scale assembly of the giant clam: (i) following the standard PacBio methodology, PCR-free SMRTbell DNA libraries were created using the BluePippin size selection system, and the PacBio Sequel system was used to produce long reads; (ii) Hi-C chromosome conformation capture reads were prepared with the Phase Genomics Proximo Hi-C (Animal) Prep Kit and sequenced; (iii) purified DNA was sheared using a focused ultrasonicator (Covaris) and then used for 350-bp paired-end library construction with the Next Ultra DNA library prep kit (NEB), and short reads (150 bp in length) were sequenced on the Illumina NovaSeq 6000 platform. RNA was extracted from the giant clam mantle and sequenced on the Illumina NovaSeq platform to support gene annotation. To construct a high-quality reference genome for Tridacna crocea, whole-genome sequencing generated ~167× PacBio Sequel long reads (218.24 Gb) (Table 1), ~105× Hi-C reads (136.70 Gb) and ~45× Illumina paired-end reads (58.50 Gb) (Table 2).
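
The fold-coverage figures quoted above follow from dividing each data yield by the genome size. The sketch below illustrates that arithmetic; it assumes the rounded 1.30 Gb assembly size as the denominator (the text does not state which genome size was used), so the results only approximate the reported values. The function and variable names are illustrative.

```python
def sequencing_depth(yield_gb: float, genome_size_gb: float) -> float:
    """Return fold coverage: total sequenced bases divided by genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return yield_gb / genome_size_gb


# Assumption: the rounded assembly size (1.30 Gb) is used as the genome size.
GENOME_SIZE_GB = 1.30

# Data yields in Gb, as reported in the text.
yields_gb = {"PacBio": 218.24, "Hi-C": 136.70, "Illumina": 58.50}

depths = {name: sequencing_depth(gb, GENOME_SIZE_GB) for name, gb in yields_gb.items()}

for name, depth in depths.items():
    print(f"{name}: {depth:.2f}x")
```

With a rounded denominator the PacBio depth comes out slightly above 167×, which suggests that the reported figure was derived from an unrounded genome size.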

Table 1 Statistics of PacBio whole-genome sequencing data.

Table 2 Statistics of Illumina data.

Genome assembly with Pacbio data and Hi-C data

The PacBio reads were first assembled with the FALCON software package (v2.0.5)²⁴ to build the primary contigs and alternate haplotigs (alternative sequences for regions within the primary contigs where heterozygosity was detectable with the long reads). The Arrow tool (v2.2.2), as implemented in SMRT Link 6.0 (Pacific Biosciences of California, Inc.), was used to polish the contigs. The FALCON-Phase software (v0.2.0-beta) was then used to perform Hi-C-based contig phasing, resulting in phased, diploid contigs. The chromosome-scale scaffolds were constructed from the phased contigs using Phase Genomics' Proximo Hi-C genome scaffolding platform²⁵. Subsequently, Juicebox (v1.8.8)²⁶ was used for a round of polishing to fix minor mistakes in chromosome assignment, ordering, and orientation introduced during chromosomal scaffolding. After a draft set of scaffolds was generated, FALCON-Phase was run again for Hi-C-based scaffold phasing. The Illumina sequencing data were then used to further improve the assembly with Pilon (v1.22)²⁷. In summary, the PacBio reads were first assembled with FALCON into an initial contig assembly, which was then integrated with Phase Genomics Hi-C data to order and orient contigs into chromosome-scale scaffolds. About 78.88% of the 1.30 Gb final Tridacna crocea assembly was assigned to 18 superscaffolds (Fig. 1), with a scaffold N50 and contig N50 of 56.38 Mb and 1.29 Mb, respectively (Table 3). The length distribution of PacBio long reads indicates that the peak length is longer than 4 kb (Fig. S1). This result is consistent with the results of other aquatic animals^28,29,30,31,32.
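
For readers unfamiliar with the N50 statistic reported above, the following minimal sketch shows how it is conventionally computed from a list of contig or scaffold lengths. It is not taken from the authors' pipeline, and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

Applied to the real assembly, the same function would be run once on contig lengths and once on scaffold lengths to obtain the two values in Table 3.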

Table 3 Features of Tridacna crocea genome.

Repeat annotation

The Tridacna crocea genome contains a large number of repeat sequences, which can be divided into two categories according to their distribution pattern: tandem repeats and interspersed repeats. Tandem repeats were identified using GMATA³³ and Tandem Repeats Finder (TRF, version 4.07b)³⁴ with default parameters. The interspersed repeat content of the Tridacna crocea genome was identified using two methods: de novo repeat identification and searching for known repeats against existing databases. RepeatModeler (v1.0.11) and MITE-hunter³⁵ were used to predict repeat sequences de novo. The homology-based approach applied RepeatMasker (version 1.331) (http://www.repeatmasker.org/) and the Repbase database³⁶ to identify TE repeats in the assembled genome. The results showed that 71.60% of the assembly consisted of repetitive sequences (Table 4, Fig. 2). This proportion is higher than that of related mollusks, such as Patinopecten yessoensis (39%)³⁷, Crassostrea gigas (43%)³⁸ and Sinonovacula constricta (40%)²⁹. Given that repetitive sequences are the main drivers of genome expansion, this is consistent with T. crocea having a larger genome than these three related species (Table 5). Among the repetitive sequences, transposable elements (TEs) accounted for 55.83% of the T. crocea genome, with DNA transposons being the predominant type (37.68% of the genome).
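
Because several tools annotate repeats independently, their predictions can overlap, and a total repeat percentage is normally computed on merged, non-overlapping intervals. The sketch below shows one common way to do this. It is an assumption about how such a figure can be derived, not a description of how the 71.60% value was obtained, and the coordinates are hypothetical.

```python
def merged_length(intervals):
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before opening a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


def repeat_fraction(intervals, sequence_length):
    """Fraction of a sequence covered by the merged repeat intervals."""
    return merged_length(intervals) / sequence_length


# Hypothetical repeat calls from two tools on a 1,000-bp sequence.
toy_repeats = [(0, 100), (50, 150), (200, 250)]
toy_fraction = repeat_fraction(toy_repeats, 1000)
print(f"{toy_fraction:.1%}")
```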

Table 4 Repeat content in the assembled Tridacna crocea genome.

Table 5 Features of Mollusk assemblies.

Gene prediction and functional annotation

Gene prediction in the repeat-masked genome combined three approaches: reference-guided transcriptome assembly, homology search and ab initio prediction. For homology-based prediction, proteins of four mollusks (Crassostrea gigas, Crassostrea virginica, Mizuhopecten yessoensis, Octopus bimaculoides) were downloaded from the NCBI database, and GeMoMa³⁹ was used to align the homologous peptides to the assembly and derive gene structure information. For RNA-seq-based gene prediction, filtered mRNA-seq reads were aligned to the reference genome using STAR⁴⁰. The transcripts were then assembled using StringTie2⁴¹, and open reading frames (ORFs) were predicted using PASA⁴². For the de novo prediction, RNA-seq reads were de novo assembled using StringTie and analyzed with PASA to produce a training set. Augustus⁴³ with default parameters was then used for ab initio gene prediction with the training set. Finally, EVidenceModeler (EVM)⁴⁴ was used to produce an integrated gene set, from which genes containing TEs were removed using the TransposonPSI package (http://transposonpsi.sourceforge.net/), and miscoded genes were further filtered. Untranslated regions (UTRs) and alternative splicing regions were determined using PASA based on RNA-seq assemblies. We retained the longest transcript for each locus, and regions outside of the ORFs were designated UTRs. We predicted 25,440 protein-coding genes with an average gene length of 25,946 bp and an average of 8.43 exons per gene. Functional annotation based on public databases (including SwissProt, NR, KEGG, KOG and Gene Ontology) showed that 23,017 (90.48%) genes could be classified by at least one of the databases (Fig. 3). In addition, we annotated four types of non-coding RNAs in the T. crocea assembly: microRNA (miRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and small nuclear RNA (snRNA).

The tRNA genes were predicted using tRNAscan-SE (version 1.3.1)⁴⁵, a tool for improved tRNA detection, with default parameters. The rRNA fragments were predicted by aligning to invertebrate template rRNA sequences using BlastN (version 2.2.24) at an E-value of 1e-5. The snRNAs and miRNAs were identified using INFERNAL (version 1.1.1)⁴⁶ to search against the Rfam database (release 12.0). A total of 1,963 non-coding RNAs (ncRNAs) were identified in the Tridacna crocea genome, including 62 miRNAs, 58 snRNAs, 83 rRNAs, and 1,760 tRNAs (Table 6).
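
The step of retaining the longest transcript per locus, mentioned in the gene prediction paragraph above, can be sketched as follows. The input layout (gene identifier, transcript identifier, transcript length) and all identifiers are hypothetical; in practice these values would be parsed from the annotation file.

```python
def longest_transcript_per_gene(transcripts):
    """Pick one representative transcript per gene: the longest one.

    `transcripts` is an iterable of (gene_id, transcript_id, length) tuples.
    Ties are resolved in favour of the transcript seen first.
    """
    best = {}
    for gene_id, transcript_id, length in transcripts:
        kept = best.get(gene_id)
        if kept is None or length > kept[1]:
            best[gene_id] = (transcript_id, length)
    return {gene_id: transcript_id for gene_id, (transcript_id, _) in best.items()}


# Hypothetical records for two loci.
toy_transcripts = [
    ("gene1", "gene1.t1", 1200),
    ("gene1", "gene1.t2", 1850),
    ("gene2", "gene2.t1", 900),
]
representatives = longest_transcript_per_gene(toy_transcripts)
print(representatives)
```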

Table 6 Non-coding RNAs in the Tridacna crocea assembly.

Comparative genomic and phylogenetic analysis

We clustered the protein-coding genes into gene families for T. crocea, Aplysia californica (GCF_000002075.1), Crassostrea gigas (GCF_902806645.1), Crassostrea virginica (GCF_002022765.2), Helobdella robusta (GCF_000326865.1), Lottia gigantea (GCF_000327385.1), Mizuhopecten yessoensis (GCF_002113885.1), Octopus bimaculoides (GCF_001194135.1), Drosophila melanogaster (GCF_000001215.4), Homo sapiens (GCF_000001405.39) and Nematostella vectensis (GCF_000209225.1) (Table 7). A total of 27,422 gene families were identified, of which 3,109 were shared by all eleven species. Compared with the other ten species, the T. crocea assembly contains 347 specific gene families (Fig. 3). Among these T. crocea-specific families, 953 genes are supported by functional annotation evidence. These T. crocea-specific genes were significantly (P < 0.05) enriched in gene ontology (GO) categories related to zinc ion binding, extracellular ligand-gated ion channel activity, integral component of membrane, and ion transport (Table 8).
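
Given a clustering of genes into families, the counts of families shared by all species and of families specific to one species reduce to simple set comparisons. The sketch below assumes the clustering is available as a mapping from family identifier to per-species gene lists, with only species that have at least one member listed. This layout and all identifiers are hypothetical, and the clustering software itself is not shown.

```python
def summarize_families(families, focal_species):
    """Summarize a gene-family clustering.

    `families` maps family_id -> {species: [gene ids]}; a species appears
    in a family only if it has at least one member gene there.
    Returns (families shared by all species, families specific to the
    focal species, number of focal genes in those specific families).
    """
    all_species = set()
    for members in families.values():
        all_species.update(members)

    shared = sorted(f for f, m in families.items() if set(m) == all_species)
    specific = sorted(f for f, m in families.items() if set(m) == {focal_species})
    specific_genes = sum(len(families[f][focal_species]) for f in specific)
    return shared, specific, specific_genes


# Hypothetical clustering of three species, A, B and C.
toy_families = {
    "F1": {"A": ["a1"], "B": ["b1"], "C": ["c1", "c2"]},
    "F2": {"A": ["a2", "a3"]},
    "F3": {"A": ["a4"], "B": ["b2"]},
}
shared, specific, specific_genes = summarize_families(toy_families, "A")
print(shared, specific, specific_genes)
```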

Table 7 Statistical analysis of gene families.

Table 8 GO enrichment of positive selection genes in Tridacna crocea.

A phylogenetic tree was constructed using the eleven animal species (Fig. 4). Protein sequences were extracted from each family and concatenated to form one supergene for each species, and the maximum likelihood method⁴⁷ was used to reconstruct the phylogenetic tree. The divergence times among the eleven animals were estimated using the MCMCtree program (version 4.4) as implemented in the Phylogenetic Analysis by Maximum Likelihood (PAML) package⁴⁸, with a correlated rates clock and the JC69 nucleotide substitution model. The divergence time between T. crocea and M. yessoensis was estimated to be about 505.7 million years ago (MYA). Compared with the common ancestor of T. crocea, M. yessoensis, C. gigas and C. virginica, Tridacna crocea shows 93 gene family expansion events and 15 gene family contraction events. The expanded genes in T. crocea are related to “DNA replication” (GO:0006260), “DNA-directed DNA polymerase activity” (GO:0003887), “nucleotide binding” (GO:0000166), “methyltransferase activity” (GO:0008168), and so on. In contrast, the contracted genes in T. crocea were significantly (P < 0.05) enriched in GO terms for “iron ion binding” (GO:0005506), “heme binding” (GO:0020037), “oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen” (GO:0016705), and “oxidation-reduction process” (GO:0055114).
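
The concatenation of per-family sequences into one supergene per species can be sketched as below. The sketch assumes that each family has already been aligned so that all sequences within a family have equal length, and that a species missing from a family is padded with gap characters. Neither detail is stated in the text, and the sequences shown are hypothetical.

```python
def concatenate_alignments(alignments, species):
    """Concatenate per-family alignments into one supergene per species.

    `alignments` is a list of {species: aligned sequence} dicts; within a
    family all sequences must have the same length. A species absent from
    a family is padded with gaps (an assumption of this sketch).
    """
    parts = {sp: [] for sp in species}
    for alignment in alignments:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError("sequences within a family must be aligned to equal length")
        width = widths.pop()
        for sp in species:
            parts[sp].append(alignment.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Hypothetical aligned protein fragments for two families and three species.
toy_alignments = [
    {"A": "MKT-L", "B": "MKTAL", "C": "MRTAL"},
    {"A": "GGS", "B": "GAS"},
]
supergenes = concatenate_alignments(toy_alignments, ["A", "B", "C"])
print(supergenes)
```

The resulting supermatrix, in which every species has a sequence of the same total length, is the kind of input expected by maximum likelihood tree-building programs.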

Data Records

The raw Illumina, PacBio, Hi-C sequencing and full-length transcriptome data are deposited in the NCBI SRA database under the accession numbers SRR17137644⁴⁹, SRR17137645⁵⁰, SRR17137643⁵¹, and SRR25651021⁵², respectively. The genome assembly and annotations are available from Figshare^53,54, and the genome assembly is also deposited at NCBI under accession number GCA_032873355.1⁵⁵.

Technical Validation

Evaluation of the genome assembly

The Hi-C heatmap supports the accuracy of the genome assembly, with relatively independent Hi-C signals observed between the 18 pseudo-chromosomes (Fig. 1B). The completeness of the genome assembly was assessed using the conserved metazoan gene set “metazoan_odb10” from the Benchmarking Universal Single-Copy Orthologs (BUSCO) v4.054. The assembly showed a high level of completeness (94.2% complete BUSCOs): 74.2% were complete and single-copy, 20% complete and duplicated, 0.6% fragmented, and 5.2% missing (Table 9). This demonstrates the completeness and conservation of gene content in the giant clam genome assembly, which achieves one of the best BUSCO scores observed among reported mollusks. These results suggest that the quality of this genome assembly is high.
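
The relationship between the BUSCO categories reported above can be checked with a few lines of arithmetic: the complete fraction is the sum of the single-copy and duplicated fractions, and all categories together should account for 100% of the gene set. The function name and the summary string layout below are illustrative.

```python
def busco_summary(single, duplicated, fragmented, missing, tolerance=0.1):
    """Combine BUSCO category percentages and check that they sum to 100."""
    total = single + duplicated + fragmented + missing
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"categories sum to {total:.1f}%, expected 100%")
    complete = single + duplicated
    text = (f"C:{complete:.1f}%[S:{single:.1f}%,D:{duplicated:.1f}%],"
            f"F:{fragmented:.1f}%,M:{missing:.1f}%")
    return {"complete": complete, "text": text}


# Percentages reported in the text for the T. crocea assembly.
summary = busco_summary(single=74.2, duplicated=20.0, fragmented=0.6, missing=5.2)
print(summary["text"])
```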

Table 9 Statistics of the Tridacna crocea assembly gene space with the 978 BUSCO metazoa gene set.

Genome annotation and phylogenetic analysis

Gene function information, motifs and protein domains were assigned by comparison with public databases, including Gene Ontology, KOG, SwissProt, KEGG and NR (Table 10). The InterProScan program⁵⁶ with default parameters was used to identify the GO terms and putative domains of genes. For the other four databases, the EVidenceModeler-integrated protein sequences were compared against each database using BLASTp⁵⁷ with an E-value cutoff of 1e-5. Results from the five database searches were concatenated.
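
Combining the per-database results amounts to taking the union of the annotated gene sets, which gives the number of genes with at least one annotation. The sketch below illustrates this with hypothetical gene identifiers and then recomputes the reported percentage from the counts given in this paper (23,017 annotated genes out of 25,440).

```python
def annotation_summary(hits_by_database, total_genes):
    """Count genes annotated by at least one database and their percentage."""
    annotated = set().union(*hits_by_database.values())
    return len(annotated), 100.0 * len(annotated) / total_genes


# Hypothetical per-database annotation results for a four-gene toy set.
toy_hits = {
    "SwissProt": {"g1", "g2"},
    "KEGG": {"g2", "g3"},
    "GO": set(),
}
toy_count, toy_percent = annotation_summary(toy_hits, total_genes=4)
print(toy_count, toy_percent)

# Percentage of annotated genes from the counts reported in the text.
reported_percent = round(100.0 * 23017 / 25440, 2)
print(reported_percent)
```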

Table 10 Functional annotation of the predicted genes in the assembly of Tridacna crocea.

The maximum likelihood method was used to reconstruct the phylogenetic tree following Guindon et al.⁴⁷. The divergence times among the eleven animals were estimated with the MCMCtree program (version 4.4) of the Phylogenetic Analysis by Maximum Likelihood (PAML) package⁴⁸, with a correlated rates clock and the JC69 nucleotide substitution model. The TimeTree database was used to obtain the calibration time for the divergence between Octopus bimaculoides and Crassostrea gigas (~554 MYA)⁵⁸.

Code availability

All data processing commands and pipelines are executed according to instructions and guidelines provided by relevant bioinformatics software. No custom scripts or code were used in this study.

References

Neo, M. L., Eckman, W., Vicentuan, K., Teo, S. L. M. & Todd, P. A. The ecological significance of giant clams in coral reef ecosystems. Biol Conserv 181, 111–123, https://doi.org/10.1016/j.biocon.2014.11.004 (2015).
Harzhauser, M., Mandic, O., Piller, W. E., Reuter, M. & Kroh, A. Tracing back the origin of the Indo-Pacific mollusc fauna: Basal Tridacninae from the Oligocene and Miocene of the Sultanate of Oman. Palaeontology 51, 199–213, https://doi.org/10.1111/j.1475-4983.2007.00742.x (2008).
Perry, C. T. et al. Estimating rates of biologically driven coral reef framework production and erosion: a new census-based carbonate budget methodology and applications to the reefs of Bonaire. Coral Reefs 31, 853–868, https://doi.org/10.1007/s00338-012-0901-4 (2012).
Mallela, J. & Perry, C. T. Calcium carbonate budgets for two coral reefs affected by different terrestrial runoff regimes, Rio Bueno, Jamaica. Coral Reefs 26, 129–145, https://doi.org/10.1007/s00338-006-0169-7 (2007).
Mies, M. & Sumida, P. Giant Clam Aquaculture: a Review on Induced Spawning and Larval Rearing. International Journal of Marine Science 2, 62–69 (2012).
Braley, R. D. Serotonin-Induced Spawning in Giant Clams (Bivalvia, Tridacnidae). Aquaculture 47, 321–325, https://doi.org/10.1016/0044-8486(85)90217-0 (1985).
Dubilier, N., Bergin, C. & Lott, C. Symbiotic diversity in marine animals: the art of harnessing chemosynthesis. Nature reviews. Microbiology 6, 725–740, https://doi.org/10.1038/nrmicro1992 (2008).
Norton, J., Shepherd, M., Long, H. & Fitt, W. The Zooxanthellal Tubular System in the Giant Clam. Biological Bulletin 183, https://doi.org/10.2307/1542028 (1992).
Mies, M. Evolution, diversity, distribution and the endangered future of the giant clam-Symbiodiniaceae association. Coral Reefs 38, https://doi.org/10.1007/s00338-019-01857-x (2019).
Guo, Y. et al. Hologenome analysis reveals independent evolution to chemosymbiosis by deep-sea bivalves. BMC biology 21, 51, https://doi.org/10.1186/s12915-023-01551-z (2023).
Zhou, Z. et al. Artificial interspecific hybridization of two giant clams, Tridacna squamosa and Tridacna crocea, in the south China sea. Aquaculture 515, 734581, https://doi.org/10.1016/j.aquaculture.2019.734581 (2020).
Li, Y. Q. et al. Study on the Individual Coloring Mechanism of Iridescent Cells in the Mantle of the Boring Giant Clam. Front Mar Sci 9, https://doi.org/10.3389/Fmars.2022.883678 (2022).
Li, J. et al. Assessment of the juvenile vulnerability of symbiont-bearing giant clams to ocean acidification. Sci Total Environ 812, https://doi.org/10.1016/j.scitotenv.2021.152265 (2022).
Cai, S. Y., Mu, W. D., Wang, H., Chen, J. W. & Zhang, H. B. Sequence and phylogenetic analysis of the mitochondrial genome of giant clam, Tridacna crocea (Tridacninae: Tridacna). Mitochondrial DNA B 4, 1032–1033, https://doi.org/10.1080/23802359.2019.1579071 (2019).
Ma, H. T. et al. Molecular phylogeny and divergence time estimates for native giant clams (Cardiidae: Tridacninae) in the Asia-Pacific: Evidence from mitochondrial genomes and nuclear 18S rRNA genes. Front Mar Sci 9, https://doi.org/10.3389/Fmars.2022.964202 (2022).
Zhou, Y. Y. et al. Developmental Expression Pattern of the Piwi1 Gene, Timing of Sex Differentiation and Maturation in Artificially Produced Juvenile Boring Giant Clam, Tridacna crocea. Front Mar Sci 9, https://doi.org/10.3389/Fmars.2022.883661 (2022).
Zhou, Y. et al. Examination of the role of the forkhead box L2 (Foxl2) in gonadal and embryonic development in the boring giant clam Tridacna crocea. Aquaculture 560, 738554, https://doi.org/10.1016/j.aquaculture.2022.738554 (2022).
Zhou, Z., Liu, Z., Wang, L., Luo, J. & Li, H. Oxidative stress, apoptosis activation and symbiosis disruption in giant clam Tridacna crocea under high temperature. Fish & Shellfish Immunology 84, 451–457, https://doi.org/10.1016/j.fsi.2018.10.033 (2019).
Xu, D. et al. Mechanistic molecular responses of the giant clam Tridacna crocea to Vibrio coralliilyticus challenge. Plos One 15, https://doi.org/10.1371/journal.pone.0231399 (2020).
Baeza, J. A., Neo, M. L. & Huang, D. Genomic Survey and Resources for the Boring Giant Clam Tridacna crocea. Genes (Basel) 13, https://doi.org/10.3390/genes13050903 (2022).
Sun, X. P. et al. Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication. Nat Genet 52, 1423–1432, https://doi.org/10.1038/s41588-020-00723-9 (2020).
Chen, H. T. et al. Allele-aware chromosome-level genome assembly and efficient transgene-free genome editing for the autotetraploid cultivated alfalfa. Nat Commun 11, https://doi.org/10.1038/S41467-020-16338-X (2020).
Miga, K. H. et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585, 79-+, https://doi.org/10.1038/s41586-020-2547-7 (2020).
Chin, C. S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods 13, 1050-+, https://doi.org/10.1038/Nmeth.4035 (2016).
Bickhart, D. M. et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet 49, 643-+, https://doi.org/10.1038/ng.3802 (2017).
Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst 3, 99–101, https://doi.org/10.1016/j.cels.2015.07.012 (2016).
Walker, B. J. et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. Plos One 9, https://doi.org/10.1371/journal.pone.0112963 (2014).
Gomes-dos-Santos, A. et al. The Crown Pearl: a draft genome assembly of the European freshwater pearl mussel Margaritifera margaritifera (Linnaeus, 1758). DNA Research 28, https://doi.org/10.1093/dnares/dsab002 (2021).
Ran, Z. et al. Chromosome-level genome assembly of the razor clam Sinonovacula constricta (Lamarck, 1818). Mol Ecol Resour 19, 1647–1658, https://doi.org/10.1111/1755-0998.13086 (2019).
Gallardo-Escarate, C. et al. Chromosome-Level Genome Assembly of the Blue Mussel Mytilus chilensis Reveals Molecular Signatures Facing the Marine Environment. Genes-Basel 14, https://doi.org/10.3390/Genes14040876 (2023).
Bai, C.-M. et al. Chromosomal-level assembly of the blood clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C. Gigascience 8, https://doi.org/10.1093/gigascience/giz067 (2019).
Kim, J. et al. Chromosome-Level Genome Assembly of the Butter Clam Saxidomus purpuratus. Genome Biology and Evolution 14, https://doi.org/10.1093/gbe/evac106 (2022).
Wang, X. W. & Wang, L. GMATA: An Integrated Software Package for Genome-Scale SSR Mining, Marker Development and VIewing. Front Plant Sci 7, https://doi.org/10.3389/Fpls.2016.01350 (2016).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580, https://doi.org/10.1093/Nar/27.2.573 (1999).
Han, Y. J. & Wessler, S. R. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res 38, https://doi.org/10.1093/nar/gkq862 (2010).
Bao, W. D., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA-Uk 6, https://doi.org/10.1186/s13100-015-0041-9 (2015).
Wang, S. et al. Scallop genome provides insights into evolution of bilaterian karyotype and development. Nat Ecol Evol 1, 120, https://doi.org/10.1038/s41559-017-0120 (2017).
Penaloza, C. et al. A chromosome-level genome assembly for the Pacific oyster Crassostrea gigas. Gigascience 10, https://doi.org/10.1093/gigascience/giab020 (2021).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res 44, https://doi.org/10.1093/nar/gkw092 (2016).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15, https://doi.org/10.1186/S13059-014-0550-8 (2014).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol 20, https://doi.org/10.1186/S13059-019-1910-1 (2019).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31, 5654–5666, https://doi.org/10.1093/nar/gkg770 (2003).
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439, https://doi.org/10.1093/nar/gkl200 (2006).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol 9, https://doi.org/10.1186/Gb-2008-9-1-R7 (2008).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25, 955–964, https://doi.org/10.1093/Nar/25.5.955 (1997).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935, https://doi.org/10.1093/bioinformatics/btt509 (2013).
Guindon, S. et al. New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Syst Biol 59, 307–321, https://doi.org/10.1093/sysbio/syq010 (2010).
Yang, Z. H. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13, 555–556 (1997).
Li, J. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR17137644 (2023).
Li, J. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR17137645 (2023).
Li, J. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR17137643 (2023).
Li, J. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR25651021 (2023).
Li, J. Chromosome-level genome assembly and annotation of rare and endangered tropical bivalve, Tridacna crocea. figshare. Dataset. https://doi.org/10.6084/m9.figshare.24264643 (2023).
Li, J. Chromosome-level genome assembly and annotation of rare and endangered tropical bivalve, Tridacna crocea. figshare. Dataset. https://doi.org/10.6084/m9.figshare.24264646 (2023).
Li, J. NCBI Genbank https://identifiers.org/insdc.gca:GCA_032873355.1 (2023).
Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res 33, W116–W120, https://doi.org/10.1093/nar/gki442 (2005).
McGinnis, S. & Madden, T. L. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 32, W20–W25, https://doi.org/10.1093/nar/gkh435 (2004).
Hedges, S. B., Dudley, J. & Kumar, S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22, 2971–2972 (2006).

Acknowledgements

This research was supported by National Key Research and Development Program of China (2022YFC3102002); Guangzhou Science and Technology Project (2023B03J00165; 202206010133); Guangdong Provincial Key Research and Development Program (2021B0202020003); the National Science Foundation of China (32002387); the Project of Sanya Yazhou Bay Science and Technology City; Science and Technology Project of Guangdong Provincial Department of Natural Resources (GDNRC[2022]40); Guangdong Basic and Applied Basic Research Foundation (2023A1515010944; 2022A1515010203); the Open Foundation of the State Key Laboratory of Loess and Quaternary Geology (SKLLQG2213); National Marine Genetic Resource Center; the earmarked fund for CARS-49; and the Science and Technology Planning Project of Guangdong Province, China (2023B1212060047).

Author information

These authors contributed equally: Jun Li, Haitao Ma, Yanpin Qin.

Authors and Affiliations

Key Laboratory of Tropical Marine Bio-resources and Ecology, Guangdong Provincial Key Laboratory of Applied Marine Biology, Innovation Academy of South China Sea Ecology and Environmental Engineering, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, 510301, China
Jun Li, Haitao Ma, Yanpin Qin, Zhen Zhao, Zohaib Noor, Shuming Guo, Ziniu Yu & Yuehuan Zhang
Hainan Key Laboratory of Tropical Marine Biotechnology, Hainan Sanya Marine Ecosystem National Observation and Research Station, Sanya, 572024, China
Jun Li, Haitao Ma, Yanpin Qin, Zhen Zhao, Zohaib Noor, Ziniu Yu & Yuehuan Zhang
Daya Bay Marine Biology Research Station, Chinese Academy of Sciences, Shenzhen, 518124, China
Jun Li, Haitao Ma, Yanpin Qin, Ziniu Yu & Yuehuan Zhang
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, 519015, China
Jun Li, Haitao Ma, Yanpin Qin, Ziniu Yu & Yuehuan Zhang
Biozeron Shenzhen, Inc, Shenzhen, 518000, China
Yongchao Niu, Jianmin Lian & Jiang Li
University of Chinese Academy of Sciences, Beijing, 100049, China
Zohaib Noor & Shuming Guo

Contributions

Jun Li, Yuehuan Zhang, and Ziniu Yu conceived and designed this study. Yanpin Qin and Haitao Ma collected the samples. Yongchao Niu and Zhen Zhao assembled and annotated the genome and performed the gene family and genome evolutionary analyses. Jun Li, Yanpin Qin, Jianmin Lian, Zohaib Noor, Shuming Guo, Gongpengyang Shi and Jiang Li performed bioinformatic analyses. Jun Li wrote the manuscript. Yanpin Qin and Haitao Ma revised it. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Ziniu Yu or Yuehuan Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Fig S1

Rights and permissions

Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, J., Ma, H., Qin, Y. et al. Chromosome-level genome assembly and annotation of rare and endangered tropical bivalve, Tridacna crocea. Sci Data 11, 186 (2024). https://doi.org/10.1038/s41597-024-03014-8

Received: 11 October 2023
Accepted: 24 January 2024
Published: 10 February 2024
DOI: https://doi.org/10.1038/s41597-024-03014-8
Springer Nature Limited
