The first chromosomal-level genome assembly and annotation of white suckerfish Remora albescens

Zhou, Chaowei; Liu, Qi; Qu, Yinquan; Qiao, Ying; Gao, Tianxiang; Wang, Danyang

doi:10.1038/s41597-024-03363-4

Data Descriptor
Open access
Published: 22 May 2024

Volume 11, article number 523 (2024)

Scientific Data

Chaowei Zhou¹,²,
Qi Liu³,
Yinquan Qu⁴,
Ying Qiao⁵,
Tianxiang Gao⁴ &
…
Danyang Wang⁶ (ORCID: orcid.org/0009-0007-2366-7923)

Abstract

Remora albescens, also known as the white suckerfish, is recognized for its distinctive suction-cup attachment behavior and medicinal significance. In this study, we produced a high-quality chromosome-level genome assembly of R. albescens by integrating 23.87 Gb of PacBio long reads, 64.54 Gb of T7 short reads, and 88.63 Gb of Hi-C data. We first constructed a contig-level genome assembly totaling 605.30 Mb with a contig N50 of 23.12 Mb. Using Hi-C data, approximately 99.68% (603.38 Mb) of the contig-level genome was then assigned to 23 pseudo-chromosomes. By integrating homology-based predictions, ab initio predictions, and RNA-sequencing evidence, we identified 22,445 protein-coding genes, of which 96.36% (21,629 genes) were functionally annotated. The genome assembly achieved an estimated completeness of 98.1% according to BUSCO analysis. This work provides a genomic resource for R. albescens, laying a solid foundation for future investigations into the genomics, biology, and medicinal importance of this species.

Background & Summary

Remora albescens, the white suckerfish or white remora, belongs to the family Echeneidae, order Carangiformes, and inhabits warm seas (Fig. 1). Like other members of the Echeneidae, white suckerfish have evolved a sucking disc from the front dorsal fin, which extends from the top of the head to the tips of the pectoral fins and consists of 13–14 plates¹. This adaptation enables them to adhere to smooth surfaces through suction, and they spend the majority of their lives clinging to a host animal, such as a manta ray or a shark². They frequently attach to the body of the host, as well as inside its gill chamber and mouth². The relationship between a white suckerfish and its host is typically considered a form of commensalism, specifically phoresy. Besides these unique biological characteristics, white suckerfish are used in traditional Chinese medicine for their reported positive effects on lung and spleen-stomach health³, which gives them considerable medicinal and commercial value.

High-quality reference genomes are instrumental in understanding and screening the genetic basis of, and the variation linked to, important traits. This knowledge makes it possible to study and use the biological characteristics of a species for various purposes. Until now, the genome of the white suckerfish had not been sequenced, which impeded exploration of the genetic basis of its biological features and behaviours. A high-quality chromosome-level reference genome will therefore contribute to a better understanding of the genetic mechanisms underlying the medicinal value of R. albescens.

In this study, by integrating PacBio high-fidelity (HiFi) long reads, T7 paired-end short reads and high-throughput chromatin conformation capture (Hi-C) sequencing data (Table 1), we present the first chromosomal-level genome assembly of R. albescens. The assembly yielded a genome of 605.30 Mb, composed of 158 contigs, with a contig N50 length of 23.12 Mb. In total, 603.38 Mb, covering 99.68% of the contig-level genome, was anchored onto 23 chromosomes using Hi-C data. BUSCO analysis indicated that the final assembly contained 3,571 (98.1%) complete BUSCOs. This high-quality chromosomal-level reference genome establishes a valuable foundation for understanding the biological characteristics of R. albescens and for further research into its medicinal value.

Table 1 Statistics of sequencing data for Remora albescens genome assembly and annotation.

Table 2 Comparison of the R. albescens genome assembly metrics with the E. naucrates.

Methods

Fish sample collection and preparation

A single fish, measuring 18 centimeters in length, was obtained from the northern South China Sea in June 2022 (Fig. 1). The fish was collected in accordance with the guidelines and regulations of the Animal Care and Use Committee of Fisheries College of Zhejiang Ocean University (Animal Ethics no. 1067). Tissues from the R. albescens specimen were collected and preserved in liquid nitrogen until DNA or RNA extraction. Muscle and liver tissues were used for DNA sequencing for the genome assembly, and kidney, spleen, fin, gill and sucker tissues were used for RNA sequencing.

WGS BGISEQ library and PacBio library construction, sequencing and contig-level assembly

Genomic DNA was extracted from muscle tissue following a standard phenol/chloroform extraction protocol and used to prepare the whole-genome sequencing (WGS) libraries.

To obtain BGISEQ short reads, the DNA sample was evaluated by 1% agarose gel electrophoresis and the Pultton DNA/Protein Analyzer (Plextech). A paired-end library with an insert size of 300–350 bp was then constructed following the standard BGISEQ protocol. Afterward, the DNA sample was purified, quantified, and sequenced from both ends on the BGISEQ-T7 sequencing platform, yielding a total of 66.21 Gb of raw reads (Table 1). Filtering with fastp v0.23.2⁴ under default parameters, to remove low-quality reads, short reads, adapters and redundant sequences, left 64.54 Gb of clean reads (Table 1). K-mer analysis with GCE v1.0.0⁵ was then used to estimate the genome size and heterozygosity of R. albescens, which were 563 Mb and 0.63%, respectively (Fig. 2).
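The k-mer estimate rests on a simple idea: the total number of k-mers in the reads, divided by the sequencing depth at the main peak of the k-mer frequency histogram, approximates the genome size. The Python sketch below illustrates only that idea. It is not the model implemented in GCE, which also accounts for heterozygosity and repeats, and the function names and the `min_depth` cutoff are hypothetical choices made for illustration.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return {depth: number of distinct k-mers observed exactly `depth` times}.

    Simplification (assumption): k-mers are counted on the forward strand only,
    without collapsing reverse complements as real k-mer counters usually do.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak depth).

    k-mers seen fewer than `min_depth` times are treated as sequencing errors
    and ignored; this cutoff is an illustrative assumption.
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    peak_depth = max(usable, key=usable.get)          # depth with most distinct k-mers
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

With a histogram whose main peak sits at depth 20, the estimate is the number of k-mer occurrences above the error cutoff divided by 20.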

To obtain PacBio long reads, the DNA sample was first evaluated using Nanodrop, Qubit and agarose gel electrophoresis. A library with a fragment size of 20 kb was then created with the SMRTBell template preparation kit 1.0 following the manufacturer’s instructions and sequenced on the PacBio Sequel II platform in Circular Consensus Sequence (CCS) mode. After removing low-quality sequences using the CCS v6.0.0 algorithm with default parameters, a total of 23.87 Gb of high-accuracy reads with an N50 of 18.88 kb was obtained. From these HiFi reads, the initial contigs were assembled using hifiasm v0.16.1⁶ and purge_haplotigs⁷ with default settings. The assembly yielded a 605.30 Mb genome with a maximum contig size of 51.46 Mb.
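For readers unfamiliar with the N50 statistic used for both the reads and the contigs: it is the length L such that sequences of length at least L together contain at least half of all bases. A minimal Python sketch of that definition follows; the function name is ours, and tools such as QUAST compute the same quantity directly from the assembly FASTA.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downward until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

For example, for lengths 10, 8, 6, 4 and 2 (30 bases in total), the two longest sequences already cover 18 bases, so the N50 is 8.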

Hi-C library preparation, sequencing and chromosomal-level assembly

The contigs obtained in the previous step were anchored onto chromosomes using Hi-C data. Briefly, 1 g of liver tissue from R. albescens was treated with 1% formaldehyde for 20 minutes at 20–25 °C to cross-link the proteins involved in chromatin interactions. Next, DNA was digested using MboI and the overhangs of the resulting restriction fragments were labeled with biotinylated nucleotides, after which they were ligated within a confined volume. After cross-link reversal, the ligated DNA was purified and fragmented to a size range of 300–500 bp. Ligation junctions were then pulled down with streptavidin beads and sequenced from both ends on the BGISEQ-T7 sequencing platform, producing a total of 88.75 Gb of raw data (Table 1). After removing low-quality sequences and adapters with fastp v0.23.2⁴, and retaining only read pairs in which both reads were longer than 50 bp, 88.63 Gb of clean data remained (Table 1). We used the HiCUP pipeline⁸ to obtain a reliable, nonredundant contig interaction matrix, and then anchored the contigs onto chromosomes using the 3D-DNA pipeline⁹. Juicebox Assembly Tools¹⁰ was used for manual correction of chromosome inversions and translocations. Finally, 603.38 Mb (~99.68%) of the contig-level assembled sequences were positioned onto 23 pseudo-chromosomes (Fig. 3A).
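The notion of a nonredundant contig interaction matrix can be illustrated with a short Python sketch: duplicate read pairs are collapsed, and each remaining pair adds one contact between the two contigs its ends map to. This is a toy illustration under our own assumptions, not the HiCUP or 3D-DNA implementation; the input tuple layout and the function name are hypothetical.

```python
from collections import Counter


def contact_counts(pairs):
    """Count deduplicated Hi-C contacts between contigs.

    `pairs` is an iterable of (contig_a, pos_a, contig_b, pos_b) tuples, one per
    mapped read pair (hypothetical layout). Pairs with identical end
    coordinates are treated as duplicates and counted once.
    """
    seen = set()
    counts = Counter()
    for contig_a, pos_a, contig_b, pos_b in pairs:
        # Sort the two ends so that (A, B) and (B, A) are the same contact.
        end1, end2 = sorted([(contig_a, pos_a), (contig_b, pos_b)])
        if (end1, end2) in seen:
            continue
        seen.add((end1, end2))
        counts[(end1[0], end2[0])] += 1
    return counts
```

Scaffolders exploit the fact that contigs from the same chromosome, and especially adjacent ones, share far more contacts than contigs from different chromosomes.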

RNA library construction and sequencing

Total RNA was extracted from five tissues of R. albescens (kidney, spleen, fin, gill and sucker) using TRIzol reagent (Invitrogen). RNA quality was evaluated with the NanoDrop ND-1000 spectrophotometer (Labtech) and the 2100 Bioanalyzer (Agilent Technologies). Paired-end reads were sequenced on the BGISEQ-T7 platform. Overall, 6.01 Gb of clean data was obtained after filtering with fastp v0.23.2⁴ under default settings to remove low-quality and short reads and to trim adapters and polyG tails (Table 1).

Repetitive elements annotation

Repeat elements in the R. albescens genome were identified using a dual approach that combined homology-based searches and ab initio predictions. The ab initio prediction of repeat elements was carried out with two tools, Tandem Repeat Finder v4.09¹¹ and LTR_FINDER_parallel v1.1¹¹, with default parameters. Newly discovered repeats were then predicted using RepeatMasker v4.0.9¹², based on the de novo repetitive sequence library constructed with LTR_FINDER_parallel and RepeatModeler v2.0¹³. RepeatMasker v4.0.9 and RepeatProteinMask v4.1.0 (http://www.repeatmasker.org) were used to identify known repeat elements with the Repbase v20181026 database¹⁴. In total, 18.04% of the R. albescens genome was identified as repetitive sequence (Fig. 3B). Among these repeat elements, DNA transposons, LTRs, LINEs, and SINEs constituted 6.98%, 2.49%, 5.41%, and 1.69% of the genome, respectively (Table 3).
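Because several tools annotate the same genome, their hits overlap, so the combined repeat content is the union of all annotated intervals rather than the sum of per-tool totals. The Python sketch below shows one way to compute that union on a single sequence. It is an illustration under our assumptions (half-open `(start, end)` coordinates, a hypothetical function name), not the procedure the authors report.

```python
def covered_bases(intervals):
    """Return the number of bases covered by the union of half-open intervals.

    `intervals` is an iterable of (start, end) pairs on one sequence, with
    start inclusive and end exclusive. Overlapping or touching intervals are
    merged so that no base is counted twice.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```

Dividing the covered bases, summed over all sequences, by the assembly length gives a repeat percentage comparable to the 18.04% reported here.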

Table 3 Statistics on transposable elements in the R. albescens genome.

Gene prediction and annotation

Using the repeat-masked genome, three strategies (ab initio prediction, homology-based prediction and RNA-sequencing evidence) were employed to predict protein-coding genes in the R. albescens genome. Ab initio prediction was conducted with Augustus v3.3.2¹⁵ and Genscan¹⁶. Homology-based prediction relied on protein sequences from several annotated species: Seriola lalandi (RefSeq assembly accession: GCF_002814215.2), Seriola dumerili (RefSeq assembly accession: GCF_002260705.1), Echeneis naucrates (RefSeq assembly accession: GCF_900963305.1), Takifugu rubripes (RefSeq assembly accession: GCF_901000725.2), Gasterosteus aculeatus (RefSeq assembly accession: GCF_016920845.1), and Danio rerio (RefSeq assembly accession: GCF_000002035.6). These protein sequences were retrieved from the NCBI database and aligned to the R. albescens genome with tblastn (e-value ≤ 1e-5). The homologous genomic sequences were then aligned with the corresponding proteins using Genewise v2.4.0¹⁷ to predict detailed gene structures. The RNA-seq data were aligned to the assembled genome using HISAT2 v2.1.0¹⁸ with default settings, and transcripts were predicted using StringTie v1.3.5¹⁹ and TransDecoder v5.1.0 (https://github.com/TransDecoder/TransDecoder) with default settings. The three sets of gene model predictions were merged using MAKER v2.31.10²⁰. We then refined the gene set using HiFAP (Wuhan OneMore Tech Co., Ltd., https://www.onemore-tech.com/) with high-quality transcripts and homology annotation results, resulting in a final set of 22,445 protein-coding genes (Fig. 3B and Table 4).
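The e-value cutoff applied to the tblastn alignments can be sketched as a simple filter. The Python example below assumes the search was written in the standard 12-column BLAST tabular format (`-outfmt 6`), in which the e-value is the 11th column; the paper does not state the output format, so that layout and the function name are assumptions made for illustration.

```python
def filter_hits(lines, max_evalue=1e-5):
    """Keep BLAST tabular hits whose e-value is at most `max_evalue`.

    Assumes the default 12-column `-outfmt 6` layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    Returns the kept rows as lists of string fields.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 holds the e-value
            kept.append(fields)
    return kept
```

In practice the same threshold can be applied at search time with the BLAST+ `-evalue` option, so a post-hoc filter like this is mainly useful when re-screening existing results.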

Table 4 Statistics of gene predictions in the R. albescens genome.

Functional annotation of the predicted protein-coding genes was performed with DIAMOND v2.0.8²¹ in BLASTp mode (e-value ≤ 1e-5) against six databases: Swiss-Prot v2023-03-01²², NCBI nonredundant protein (NR) v2023-04-01, Kyoto Encyclopedia of Genes and Genomes (KEGG) v2023-01-01 (http://www.genome.jp/kegg/), TrEMBL v2023-03-01 (http://www.uniprot.org), eukaryotic orthologous groups of proteins (KOG) v2003-03-01²³ and AnimalTFDB v4.0 (http://bioinfo.life.hust.edu.cn/AnimalTFDB4/?#/). Additionally, protein domains were predicted against the InterPro and Pfam databases using InterProScan v5.61-93.0²⁴ with the parameters “--goterms --pathways -dp”. As a result, 96.36% (21,629 genes) of the predicted genes were successfully annotated (Table 5).
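The overall annotation rate counts a gene as annotated if it has a hit in at least one database, that is, it is the union of the per-database gene sets rather than their sum. A minimal Python sketch of that bookkeeping follows; the function name and the input layout (a dictionary mapping database name to a set of gene IDs) are hypothetical.

```python
def annotation_summary(all_genes, hits_by_database):
    """Return (number annotated, percentage annotated) across all databases.

    `all_genes` is an iterable of gene IDs; `hits_by_database` maps a database
    name to the set of gene IDs with at least one hit in that database.
    A gene counts once, however many databases annotate it.
    """
    genes = set(all_genes)
    annotated = set().union(*hits_by_database.values()) & genes
    percentage = 100.0 * len(annotated) / len(genes) if genes else 0.0
    return len(annotated), percentage
```

Applied to the figures reported here, 21,629 annotated genes out of 22,445 corresponds to 96.36%.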

Table 5 Summary of functional annotations for predicted genes of the R. albescens genome.

Non-coding RNA prediction and annotation

Using the miRBase²⁵ and Rfam²⁶ databases, microRNAs (miRNAs), ribosomal RNAs (rRNAs) and small nuclear RNAs (snRNAs) were annotated with INFERNAL v1.1²⁷. Transfer RNAs (tRNAs) were predicted using tRNAscan-SE v1.3.1²⁸. In total, 829 miRNAs, 1,832 rRNAs, 820 snRNAs and 7,033 tRNAs were predicted in the R. albescens genome (Table 6).

Table 6 Statistics of ncRNA in the R. albescens genome.

Data Records

The raw sequencing data for R. albescens generated in this study are available from the Sequence Read Archive (SRA) under BioProject number PRJNA1036795, which includes WGS T7 sequencing data (SRR26831100²⁹), PacBio HiFi sequencing data (SRR26831099³⁰), Hi-C sequencing data (SRR26831098³¹), and RNA sequencing data (SRR28537587³²). The assembled genome of R. albescens has been deposited in GenBank under accession JAXCVL000000000³³. Additionally, files containing the assembled genome, protein-coding gene annotation, non-coding RNA prediction, and repeat annotation of R. albescens have been made available in the Figshare database³⁴.

Technical Validation

We first assessed the continuity of the R. albescens genome assembly using QUAST v5.2.0³⁵. The contig N50 reaches 23.12 Mb and the genome contains few gaps (1.75 per 100 kbp), indicating better assembly continuity than that of a closely related species (Echeneis naucrates: GCA_900963305.1) (Table 2). Next, we remapped the T7 clean short reads and PacBio clean long reads to the R. albescens genome using BWA³⁶ and Minimap2³⁷, respectively, yielding mapping rates of 99.83% and 99.96% and coverage rates (at least 4X) of 99.61% and 99.76% (Table 7). Furthermore, the completeness of the R. albescens genome was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO, v5.1.0)³⁸ with the actinopterygii_odb10 database. The genome assembly contained 3,571 (98.1%) complete BUSCO genes, comprising 3,551 (97.55%) single-copy and 20 (0.55%) duplicated BUSCO genes, plus 11 (0.3%) fragmented BUSCO genes (Table 8). Collectively, these assessments indicate that the R. albescens assembly is a high-quality reference genome.
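The BUSCO percentages can be checked with simple arithmetic. The Python sketch below assumes that the actinopterygii_odb10 lineage set contains 3,640 BUSCOs in total; that total is not stated in the text and is inferred here from the reported counts and percentages, so treat it, the derived number of missing genes, and the function name as assumptions.

```python
def busco_summary(single, duplicated, fragmented, total):
    """Return BUSCO category counts and percentages (rounded to 2 decimals).

    Complete = single-copy + duplicated; missing = total - complete - fragmented.
    """
    complete = single + duplicated
    missing = total - complete - fragmented
    counts = {
        "complete": complete,
        "single": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    percentages = {name: round(100.0 * n / total, 2) for name, n in counts.items()}
    return counts, percentages


# Counts from the text; total of 3,640 is an assumption (see lead-in).
counts, percentages = busco_summary(single=3551, duplicated=20, fragmented=11, total=3640)
```

Under that assumption, the complete, single-copy, duplicated and fragmented percentages round to the 98.1%, 97.55%, 0.55% and 0.3% reported above.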

Table 7 Statistics of T7 and PacBio data remapped to the R. albescens genome.

Table 8 Statistics of BUSCO assessment in the R. albescens genome.

Code availability

No custom code was developed in this study. Data analyses were conducted following the guidelines and protocols provided by the developers of the respective bioinformatics tools, as detailed in the Methods section.

References

1. Schwartz, F. J. Five species of sharksuckers (family Echeneidae) in North Carolina. JNCAS 120, 44–49 (2004).
2. O’Toole, B. Phylogeny of the species of the superfamily Echeneoidea (Perciformes: Carangoidei: Echeneidae, Rachycentridae, and Coryphaenidae), with an interpretation of echeneid hitchhiking behaviour. Can J Zool 80, 596–623 (2002).
3. Tang, W. C. Chinese medicinal materials from the sea. (1987).
4. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, 884–890 (2018).
5. Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. Quant Biol 35, 62–67 (2013).
6. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175 (2021).
7. Roach, M. J., Schmidt, S. A. & Borneman, A. R. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19, 460 (2018).
8. Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res 4, 1310 (2015).
9. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
10. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst 3, 95–98 (2016).
11. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999).
12. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics Chapter 4, Unit 4 10 (2004).
13. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117, 9451–9457 (2020).
14. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110, 462–467 (2005).
15. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, 435–439 (2006).
16. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J Mol Biol 268, 78–94 (1997).
17. Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res 14, 988–995 (2004).
18. Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11, 1650–1667 (2016).
19. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290–295 (2015).
20. Cantarel, B. L. et al. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res 18, 188–196 (2008).
21. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015).
22. Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31, 365–370 (2003).
23. Tatusov, R. L. et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41 (2003).
24. Zdobnov, E. M. & Apweiler, R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847–848 (2001).
25. Kozomara, A., Birgaoanu, M. & Griffiths-Jones, S. miRBase: from microRNA sequences to function. Nucleic Acids Res 47, 155–162 (2019).
26. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res 33, 121–124 (2005).
27. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
28. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25, 955–964 (1997).
29. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26831100 (2023).
30. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26831099 (2023).
31. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26831098 (2023).
32. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28537587 (2023).
33. NCBI GenBank https://identifiers.org/ncbi/insdc:JAXCVL000000000 (2023).
34. Wang, D. Y. et al. Chromosome-level genome assembly and annotation of Remora albescens. figshare https://doi.org/10.6084/m9.figshare.24624144.v1 (2024).
35. Gurevich, A., Saveliev, V., Vyahhi, N. & Tesler, G. QUAST: quality assessment tool for genome assemblies. Bioinformatics 29, 1072–1075 (2013).
36. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
37. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
38. Waterhouse, R. M. et al. BUSCO Applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol 35, 543–548 (2018).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (41976083).

Author information

Authors and Affiliations

College of Fisheries, Southwest University, Chongqing, 402460, China
Chaowei Zhou
Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City, Southwest University, Chongqing, 402460, China
Chaowei Zhou
Wuhan Onemore-tech Co., Ltd, Wuhan, 430000, China
Qi Liu
Fisheries College, Zhejiang Ocean University, Zhoushan, 316022, China
Yinquan Qu & Tianxiang Gao
Key Laboratory of Tropical Marine Ecosystem and Bioresource, Fourth Institute of Oceanography, Ministry of Natural Resources, Beihai, 536000, China
Ying Qiao
MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao, 266100, China
Danyang Wang

Contributions

D.W. and T.G. conceived and designed the study. C.Z. and T.G. conducted animal work and prepared biological samples. C.Z., Q.L. and Y.Q. performed the genome assembly and analysis. D.W., T.G. and C.Z. wrote the paper. D.W., T.G., C.Z., Q.L. and Y.Q. revised the paper.

Corresponding authors

Correspondence to Tianxiang Gao or Danyang Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zhou, C., Liu, Q., Qu, Y. et al. The first chromosomal-level genome assembly and annotation of white suckerfish Remora albescens. Sci Data 11, 523 (2024). https://doi.org/10.1038/s41597-024-03363-4

Received: 05 December 2023
Accepted: 09 May 2024
Published: 22 May 2024
DOI: https://doi.org/10.1038/s41597-024-03363-4
Springer Nature Limited
