Chromosome-scale genome assemblies of sexually dimorphic male and female Acrossocheilus fasciatus

Yuan, Yixin; Zhong, Tianxing; Wang, Yifei; Yang, Jinquan; Gui, Lang; Shen, Yubang; Zhou, Jiajun; Chung-Davidson, Yu-Wen; Li, Weiming; Xu, Jinkai; Li, Jiale; Li, Mingyou; Ren, Jianfeng

doi:10.1038/s41597-024-03504-9


Data Descriptor
Open access
Published: 21 June 2024

Volume 11, article number 653 (2024)

Scientific Data


Abstract

Acrossocheilus fasciatus is a stream-dwelling fish species of the Barbinae subfamily. It is valued for its colorfully striped appearance and delicious meat. This species is also characterized by apparent sexual dimorphism and toxic ova. Biological and aquaculture research on A. fasciatus is hindered by the lack of a high-quality reference genome. Here, we report chromosome-level genome assemblies of the male and female A. fasciatus. The HiFi-only genome assemblies of the female and male individuals were 899.13 Mb (N50 length of 32.58 Mb) and 885.68 Mb (N50 length of 33.06 Mb), respectively. Notably, 96.15% and 98.35% of the assembled sequences of the female and male genomes, respectively, were anchored onto 25 chromosomes using Hi-C data. We annotated the female assembly as a reference genome and identified a total of 400.62 Mb (44.56%) of repetitive sequences, 27,392 protein-coding genes, and 35,869 ncRNAs. The high-quality male and female reference genomes will provide genomic resources for developing sex-specific molecular markers, inform single-sex breeding, and help elucidate the genetic mechanisms of sexual dimorphism.


Background & Summary

The Barbinae is a subfamily of the Cyprinidae, the largest family of freshwater fishes. This subfamily contains the most complex and diverse fish groups within the Cyprinidae¹, and its members vary widely in morphology and habits. For example, Sinocyclocheilus rhinocerous dwells in caves and has evolved traits adapted to cave life². Genome sequences of several Barbinae species, including three species of the genus Sinocyclocheilus (S. grahami, S. rhinocerous, and S. anshuiensis), Poropuntius huangchuchieni, Puntigrus tetrazona, and Onychostoma macrolepis, have been deciphered, largely because of their phylogenetic features and notable evolutionary status^2,3,4. Most species in the Barbinae have undergone a whole-genome duplication after the third-round, teleost-specific genome duplication (TGD) event, which generated tetraploids and even hexaploids⁵. However, some species remain diploid and retain the original chromosome number 2n=50, such as O. macrolepis, P. huangchuchieni, and P. tetrazona^3,4,6. Acrossocheilus fasciatus is also a diploid species in the Barbinae, with a chromosome number of 2n=50⁷. It is mainly found in streams south of the Yangtze River and is extremely popular in recreational fisheries because of its colorful appearance with six dark stripes. It is a local delicacy and is considered highly nutritious⁸ by people in southeast China, especially in Zhejiang Province. However, because of its small size and slow growth rate⁹, this fish is always in short supply and has great market prospects. In addition, A. fasciatus is ichthyootoxic, with toxic ova¹⁰. The structures of the toxins remain unknown. Furthermore, it is sexually dimorphic in both body mass and appearance (Fig. 1). The weight of a two-year-old mature female is approximately 1.5 times that of a mature male¹¹. In mature males, the six black transverse stripes gradually fade as secondary sex characteristics such as the pearl organs and redness of the abdomen appear, whereas females always retain the transverse stripes.

Despite its biological and economic importance, the genomic resources of A. fasciatus are limited. Previous studies on A. fasciatus have focused on mitochondrial DNA or transcriptomes^12,13,14,15,16. In this study, we sequenced, assembled, and annotated chromosome-scale genomes of the male and female A. fasciatus using PacBio HiFi reads and high-throughput chromosome conformation capture (Hi-C) technologies. The genome size of the female A. fasciatus was estimated to be about 880.6 Mb through k-mer frequency distribution analysis with 126.33 Gb (~143×) of Illumina clean data. The female and male genomes were independently assembled into contigs with PacBio HiFi reads. The female genome assembly spans 899.13 Mb with a contig N50 length of 32.58 Mb, built from 62.01 Gb (~70×) of PacBio HiFi clean reads. The male genome assembly spans 885.68 Mb with a contig N50 length of 33.06 Mb, built from 97.67 Gb (~111×) of HiFi clean reads. Of the contig sequences, 96.15% of the female genome (contigs N50 length = 32.35 Mb; scaffolds N50 length = 33.86 Mb) and 98.35% of the male genome (contigs N50 length = 32.84 Mb; scaffolds N50 length = 33.78 Mb) were anchored onto 25 chromosomes using Hi-C data (Supplementary Table 1). Finally, the female genome was annotated as a reference genome with 44.56% (400.62 Mb) of repetitive sequences, 27,392 protein-coding genes, and 35,869 ncRNAs. The female and male genome assemblies reported here provide genomic resources for the development of sex-specific molecular markers and single-sex breeding, as well as for a better understanding of the mechanisms of sexual dimorphism.

Methods

Sample collection

Two-year-old female and male adults of A. fasciatus were randomly sampled from the second-generation progeny of selective breeding performed in Dingxin Ecological Agriculture Co., Ltd. (Xiuning County, Huangshan City of Anhui Province, China). The sampled fish were euthanized with MS-222 (Sigma-Aldrich, #A5040) and dissected on ice. Eight tissues including the brain, gill, heart, intestine, liver, ovary, muscle, and skin of one female (body length = 16.23 cm, body weight = 43.56 g) were collected and immediately frozen in liquid nitrogen and then stored at −80 °C until DNA and RNA extraction. The blood and muscle tissues of one male (body length = 13.05 cm, body weight = 26.73 g) were collected for DNA extraction.

DNA extraction and sequencing for genomes

The high-molecular weight (HMW) genomic DNA from the female muscle and the male blood of A. fasciatus was extracted using the phenol/chloroform method¹⁷. The quality and quantity of the extracted DNA were assessed using 1.0% agarose gel electrophoresis and a Qubit 4.0 fluorometer (Thermo Fisher Scientific, USA).

For PacBio sequencing, the high-quality DNA (main band > 30 kb) was randomly sheared into 15–18 kb fragments with a Covaris g-TUBE (Woburn, Massachusetts, USA), and SMRTbell libraries were then constructed using the PacBio HiFi Express Template Prep Kit 2.0 according to the manufacturer’s instructions¹⁸ (Pacific Biosciences, Menlo Park, CA, USA). For the female genome assembly, we generated two cells of HiFi clean reads totaling 62.01 Gb (~70×) with an N50 read length of 14.12 kb on the PacBio Sequel IIe platform. For the male genome assembly, we generated a single cell of HiFi reads totaling 97.67 Gb (~111×) with an N50 read length of 13.96 kb on the PacBio Revio platform (Table 1). For Illumina sequencing, the DNA was randomly sheared into ~350 bp fragments using a Covaris ultrasonicator. Libraries were constructed using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, #E7370L) and sequenced on the NovaSeq 6000 platform (Illumina, Inc., San Diego, CA, USA) in paired-end (PE) 150 bp mode. In total, 126.33 Gb (~143×) of Illumina short reads were obtained to survey the female genome (Table 1).
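The fold-coverage figures quoted above can be reproduced with a simple ratio of data yield to genome size. The sketch below is illustrative only; it assumes the 880.6 Mb k-mer estimate of the female genome is the denominator, which is consistent with the reported values but is not stated explicitly in the text. The function name is hypothetical.

```python
def sequencing_depth(total_bases_gb: float, genome_size_mb: float) -> float:
    """Return fold coverage for a data yield in Gb and a genome size in Mb."""
    return total_bases_gb * 1000.0 / genome_size_mb


# Assumption: the k-mer-based estimate of the female genome size (in Mb)
# is used as the denominator for all three data sets.
GENOME_SIZE_MB = 880.6

DATASETS = [
    ("Illumina survey (female)", 126.33),
    ("PacBio HiFi (female)", 62.01),
    ("PacBio HiFi (male)", 97.67),
]

for label, yield_gb in DATASETS:
    depth = sequencing_depth(yield_gb, GENOME_SIZE_MB)
    print(f"{label}: ~{depth:.0f}x")
```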

Table 1 Statistics of the sequencing data for A. fasciatus genomes.


For genome scaffolding, Hi-C libraries were prepared from muscle tissues of the same female and male individuals used for PacBio genome sequencing. Hi-C library construction, including cell crosslinking, cell lysis, chromatin digestion (MboI), biotin labeling, proximal chromatin DNA ligation, and DNA purification, was performed according to the standard protocol described previously^19,20. After quality control with an Agilent 2100 Bioanalyzer and qPCR, the Hi-C libraries were sequenced in PE 150 bp mode on the Illumina NovaSeq 6000 platform. In total, 137.24 Gb (~152×) and 104.69 Gb (~116×) of raw reads were generated for the female and male genomes, respectively (Table 1).

RNA extraction and transcriptome sequencing

Total RNA was extracted separately from each of the eight sampled tissues of the female A. fasciatus (brain, gill, heart, intestine, liver, ovary, muscle, and skin) using TRIzol™ reagent (Thermo Fisher Scientific, USA). The resulting RNA was treated with DNase I (NEB, USA) to remove genomic DNA.

To facilitate genome annotation, both Iso-Seq and RNA-Seq were performed. For PacBio Iso-Seq, the concentration, integrity, and purity of the RNA isolated from each tissue of the female were confirmed using Qubit, Agilent 2100, and NanoDrop instruments, and the RNAs were then pooled at equimolar concentrations. A double-stranded cDNA library was prepared with the SMARTer® PCR cDNA Synthesis Kit (Clontech, USA) and sequenced on the PacBio Sequel IIe platform. After filtering and processing with SMRT Link v11.0 (https://www.pacb.com/support/software-downloads/) with the parameter minLength=50, a total of 20.25 Gb of subread data were generated (Table 1). For Illumina RNA-Seq, eight cDNA libraries, one per tissue, were constructed independently and sequenced on the Illumina NovaSeq 6000. A total of 56.32 Gb of clean data were generated after removing reads containing adapters, reads with more than 10% unknown nucleotides (Ns), and low-quality reads (more than 20% of bases with Phred quality < 5) (Table 1).
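The read-filtering criteria described above (adapter content, more than 10% Ns, more than 20% of bases with Phred quality below 5) can be expressed as a small predicate. This is a minimal sketch and not the in-house filtering script, which is not public; the function name, the exact-substring adapter check, and the example adapter sequence are assumptions made for illustration.

```python
from typing import Optional, Sequence


def passes_filter(
    seq: str,
    phred: Sequence[int],
    adapter: Optional[str] = None,
    max_n_frac: float = 0.10,
    max_lowq_frac: float = 0.20,
    min_phred: int = 5,
) -> bool:
    """Return True if a read survives the three filters described in the text."""
    if not seq or len(seq) != len(phred):
        return False
    # Assumption: adapter contamination is detected as an exact substring.
    if adapter and adapter in seq:
        return False
    n_frac = seq.upper().count("N") / len(seq)
    lowq_frac = sum(q < min_phred for q in phred) / len(seq)
    # Reads are removed only when a fraction is MORE than its threshold.
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


# Toy reads (hypothetical data).
good = passes_filter("ACGTACGTAC", [30] * 10)
too_many_n = passes_filter("NNACGTACGT", [30] * 10)
low_quality = passes_filter("ACGTACGTAC", [2, 2, 2] + [30] * 7)
print(good, too_many_n, low_quality)
```

For paired-end data, a pair would typically be dropped when either mate fails; that policy is likewise an assumption here.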

De novo genome assembly with PacBio HiFi reads and Hi-C technologies

Before de novo assembly, the size of the female genome was estimated by k-mer analysis of the Illumina reads. The Illumina clean reads were filtered to remove redundancy with the in-house script redup.v2 developed by Novogene (Beijing, China) and then used to calculate the k-mer frequency distribution with k = 17 using Jellyfish v2.2.7^21,22. Based on the formula genome size = k-mer number / peak depth, the female genome size of A. fasciatus was estimated to be 880.6 Mb, with a heterozygosity rate of 0.53% and a repeat rate of 47.82% (Supplementary Fig. 1).
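The formula genome size = k-mer number / peak depth can be illustrated on a k-mer frequency histogram (depth mapped to the number of distinct k-mers observed at that depth). The sketch below uses a toy histogram rather than the study's data, and the `min_depth` cutoff for discarding low-depth error k-mers is an assumption; the actual survey pipeline may handle errors and heterozygous peaks differently.

```python
from typing import Dict, Tuple


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> Tuple[float, int]:
    """Estimate genome size as total k-mer count divided by the peak depth.

    histogram maps k-mer depth -> number of distinct k-mers at that depth.
    Depths below min_depth are treated as sequencing errors (an assumption).
    """
    usable = {d: c for d, c in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical): an error tail at depth 1-2 and a main peak at 20.
TOY_HISTOGRAM = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}

size, peak = estimate_genome_size(TOY_HISTOGRAM)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mer positions")
```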

PacBio HiFi reads from the female and male individuals were assembled separately into contigs using Hifiasm v0.16.1²³ with default parameters. A total of 110 female contigs were built, with a total length of 899,126,031 bp and an N50 length of 32.58 Mb. A total of 174 male contigs were built, with a total length of 885,680,593 bp and an N50 length of 33.06 Mb.
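For readers unfamiliar with the N50 statistic reported for the contigs, it is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch on toy lengths (not the actual contig lengths, which are not listed in the text):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


# Toy example: total = 30, half = 15; 10 + 8 = 18 >= 15, so N50 is 8.
print(n50([10, 8, 6, 4, 2]))
```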

The Hi-C raw reads were processed to remove read pairs containing adapters or low-quality bases (more than 20% of bases with Phred quality < 5) and then quality-controlled with HiCUP²⁴. Subsequently, the contigs were anchored into 25 pseudo-chromosomes using the ALLHiC pipeline²⁵ with the clean Hi-C data (Fig. 2a). Juicebox was used to manually correct the assembly based on chromosome interaction strength (Supplementary Fig. 2)²⁶. As a result, 84 scaffolds of the female genome were generated with a total length of 899,129,631 bp and an N50 length of 33.86 Mb, of which 96.15% (864,515,734 bp) was anchored onto 25 chromosomes (Tables 2, 3). For the male genome, 167 scaffolds were generated with a total length of 885,681,293 bp and an N50 length of 33.78 Mb, of which 98.35% (871,084,321 bp) was anchored onto 25 chromosomes (Tables 2, 3). Finally, we obtained high-quality chromosome-level male and female reference genomes with Hi-C technologies²⁰ for analysis of genome characteristics (Fig. 2b).
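The anchoring rates follow directly from the base-pair totals given in this paragraph. The short check below is a sketch using those reported totals; the function and dictionary names are illustrative.

```python
def anchored_percent(anchored_bp: int, total_bp: int) -> float:
    """Percentage of the scaffolded assembly placed on chromosomes."""
    return 100.0 * anchored_bp / total_bp


# Base-pair totals as reported in the text.
ASSEMBLIES = {
    "female": (864_515_734, 899_129_631),
    "male": (871_084_321, 885_681_293),
}

for sex, (anchored, total) in ASSEMBLIES.items():
    print(f"{sex}: {anchored_percent(anchored, total):.2f}% anchored")
```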

Table 2 Statistics of the 25 chromosomes in both female and male genomes of A. fasciatus.


Table 3 Statistics of scaffolds anchored in both the female and male genomes of A. fasciatus.


Genomic synteny analysis

To assign chromosome IDs to the A. fasciatus genomes and assess the accuracy of the genome assemblies, we performed genomic synteny analysis among zebrafish (Danio rerio) and the female and male A. fasciatus. For synteny analysis between the zebrafish and female A. fasciatus assemblies, MUMmer²⁷ (v4.0.0beta2) was used to match the maximal unique sequences between the genomes with the parameter “--mincluster 500”. The matched sequence sets were filtered by removing sets with sequence similarity below 80%. For synteny analysis between the female and male assemblies of A. fasciatus, the matched sequence sets were filtered by removing sets with sequence similarity below 95% and length below 10 kb. Genomic synteny graphs were generated from the matched sets using RectChr v1.36 (https://github.com/BGI-shenzhen/RectChr) (Fig. 2c). The synteny graphs indicated a moderate level of collinearity with minor rearrangements between the genomes of zebrafish and A. fasciatus, and high consistency between the female and male A. fasciatus assemblies, supporting the accuracy of both assemblies. No obvious chromosome structure variation was observed between the female and male genomes.
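The two filtering regimes described above can be sketched as a single function with different thresholds. This is an illustration, not the authors' script: the `Match` record, the 1-based inclusive coordinates, and the reading that a match must satisfy both the identity and the length threshold to be kept are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Match:
    """One aligned block between a reference and a query sequence (hypothetical record)."""
    ref_name: str
    ref_start: int  # assumed 1-based, inclusive
    ref_end: int
    qry_name: str
    identity: float  # percent identity

    @property
    def length(self) -> int:
        return abs(self.ref_end - self.ref_start) + 1


def filter_matches(matches: Iterable[Match], min_identity: float, min_length: int = 0) -> List[Match]:
    """Keep matches that meet both the identity and the length threshold."""
    return [m for m in matches if m.identity >= min_identity and m.length >= min_length]


# Toy matches (hypothetical data).
MATCHES = [
    Match("chr1", 1, 20_000, "chr1", 99.1),
    Match("chr1", 30_001, 35_000, "chr1", 99.5),
    Match("chr2", 1, 15_000, "chr2", 85.0),
]

within_species = filter_matches(MATCHES, min_identity=95.0, min_length=10_000)
cross_species = filter_matches(MATCHES, min_identity=80.0)
print(len(within_species), len(cross_species))
```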

Repeat annotation of the female genome

The repeat sequences consisted mainly of interspersed repeats (mostly transposable elements, TEs) and tandem repeats. TEs in the female A. fasciatus genome were identified using a strategy combining homology alignment and ab initio search. Tandem repeats were predicted ab initio using TRF²⁸. First, homology-based prediction of TEs was performed against the Repbase²⁹ database using RepeatMasker and RepeatProteinMask³⁰ (https://www.repeatmasker.org/) with default parameters. Second, de novo repetitive elements were identified by LTR_FINDER³¹, RepeatScout³², and RepeatModeler³³ with default parameters. All repeat sequences longer than 100 bp and with a gap (‘N’) content of less than 5% constituted the de novo TE library. Finally, a customized library (a non-redundant combination of the homology-based and de novo TE libraries) was used for a homology search with RepeatMasker to identify TEs. As a result, extensive repeat sequences, including tandem repeats and interspersed repeats, were detected in the genome, accounting for approximately 44.56% (400.62 Mb) of the genome (Table 4), which is close to the repeat rate of 47.82% estimated by the genome survey. The tandem repeat sequences were 57.51 Mb in length, accounting for 6.40% of the genome (Table 4).
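The inclusion rule for the de novo TE library (length > 100 bp and gap ‘N’ content below 5%) can be written as a simple predicate. The sketch below is illustrative; the function name and toy sequences are hypothetical, and the authors' actual filtering implementation is not described in the text.

```python
def keep_repeat(seq: str, min_len: int = 100, max_n_frac: float = 0.05) -> bool:
    """Return True if a candidate repeat is longer than min_len and has < max_n_frac Ns."""
    if len(seq) <= min_len:
        return False
    n_frac = seq.upper().count("N") / len(seq)
    return n_frac < max_n_frac


# Toy candidates (hypothetical sequences).
CANDIDATES = {
    "long_clean": "A" * 196 + "N" * 4,   # 200 bp, 2% N  -> kept
    "too_short": "A" * 100,              # exactly 100 bp -> dropped
    "too_gappy": "A" * 95 + "N" * 6,     # 101 bp, ~5.9% N -> dropped
}

library = {name: seq for name, seq in CANDIDATES.items() if keep_repeat(seq)}
print(sorted(library))
```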

Table 4 Statistics of the repetitive sequences in female genome of A. fasciatus.


Gene prediction and functional annotation

Three strategies were used to predict gene structures in the female genome: homology searching, ab initio prediction, and transcriptome-assisted prediction. For homology searching, the protein sequences of Danio rerio, Ctenopharyngodon idella, Megalobrama amblycephala, Poropuntius huangchuchieni, Puntigrus tetrazona, Onychostoma macrolepis, and Oryzias latipes were downloaded from the NCBI database (https://ftp.ncbi.nlm.nih.gov/genomes/refseq). Protein sequences were aligned to the genome using TBLASTN (v2.2.26; E-value ≤ 1e-5)³⁴, and the matched proteins were then aligned to the homologous genome sequences with GeneWise (v2.4.1)³⁵ to obtain accurate spliced alignments and predict the gene structure contained in each protein region. For ab initio gene prediction, AUGUSTUS³⁶ (v3.2.3), GeneID³⁷ (v1.4), GENSCAN³⁸ (v1.0), GlimmerHMM³⁹ (v3.04), and SNAP⁴⁰ (2013-11-29) were used in an automated gene prediction pipeline. For RNA-sequencing-assisted prediction, transcriptome read assemblies were generated with Trinity (v2.1.1) for the genome annotation⁴¹. To optimize the genome annotation, the RNA-Seq reads from different tissues were aligned to the genome sequences using HISAT (v2.0.4) with default parameters to identify exon regions and splice positions⁴². The alignment results were then used as input for Cufflinks (v2.2.1) with default parameters for genome-based transcript assembly⁴³. The non-redundant reference gene set was generated by merging the genes predicted by the three methods with EvidenceModeler (EVM, v1.1.1) and was then further annotated with PASA (Program to Assemble Spliced Alignments)⁴⁴. As a result, we identified 27,392 protein-coding genes in the female reference genome (Table 5, Supplementary Fig. 3a).

Table 5 Statistics of gene structure prediction in female genome of A. fasciatus.


Gene functions were assigned according to the best match obtained by aligning the protein sequences to the Swiss-Prot⁴⁵ database (http://www.uniprot.org/) using BLASTP (E-value ≤ 1e-5). Motifs and domains were annotated using InterProScan⁴⁶ (v5.31) (https://www.ebi.ac.uk/interpro/). Gene Ontology (GO) IDs for each gene were assigned according to the corresponding InterPro entry. Protein functions were predicted by transferring annotations from the closest BLAST hit (E-value ≤ 1e-5) in the Swiss-Prot database and the closest DIAMOND (v0.8.22)/BLAST hit (E-value < 1e-5) in the NR database (ftp://ftp.ncbi.nih.gov/blast/db). We also mapped the gene set to KEGG pathways and identified the best match for each gene⁴⁷. As a result, 96.1% of the 27,392 predicted protein-coding genes have functional annotations (Supplementary Fig. 3b).
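Best-match annotation transfer of the kind described above can be sketched on BLAST tabular output (the standard 12-column format, in which columns 11 and 12 hold the E-value and bit score). The sketch below assumes that "best match" means the highest bit score among hits with E-value ≤ 1e-5; the text does not specify the tie-breaking rule, and the example rows and identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(rows: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to its (subject, evalue, bitscore) with the highest bit score."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank and comment lines
        fields = row.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical tabular rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
EXAMPLE_ROWS = [
    "gene1\tP1\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200",
    "gene1\tP2\t95.0\t100\t2\t0\t1\t100\t1\t100\t1e-60\t250",
    "gene2\tP3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
]

hits = best_hits(EXAMPLE_ROWS)
print(hits)
```

Here gene2 receives no annotation because its only hit exceeds the E-value cutoff.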

For non-coding RNA (ncRNA) annotation, tRNAs were predicted using tRNAscan-SE⁴⁸. Because rRNAs are highly conserved, rRNA sequences of Homo sapiens were used as references, and rRNAs were predicted using BLASTN (E-value ≤ 1e-5). Other ncRNAs, including miRNAs and snRNAs, were identified by searching against the Rfam database with default parameters using Infernal⁴⁹. In total, 35,869 ncRNAs were identified, including 2,588 miRNAs, 18,386 tRNAs, 12,709 rRNAs, and 2,186 snRNAs (Supplementary Table 2).

Furthermore, the male genome of A. fasciatus was annotated with Liftoff⁵⁰, a tool that maps gene annotations from a reference genome to a target genome, using the annotation of the female genome as the reference.

Data Records

All raw sequencing data for genome assembly have been deposited in the NCBI database (https://www.ncbi.nlm.nih.gov/bioproject). Specifically, for the female genome, the Illumina WGS data (SRR26993408⁵¹-SRR26993409⁵²), PacBio WGS data (SRR26993393⁵³-SRR26993394⁵⁴), transcriptome data (SRR26993400-SRR26993407^55,56,57,58,59,60,61,62, SRR26993392⁶³), and Hi-C data (SRR26993395-SRR26993399^64,65,66,67,68) were deposited under the BioProject accession number PRJNA1045882. For the male genome, the PacBio WGS data (SRR27126179⁶⁹) and Hi-C data (SRR27588553⁷⁰) were deposited under the BioProject accession number PRJNA1049304. The final assembled genomes of A. fasciatus have been deposited at GenBank under the accession numbers JAXUIB000000000 (female)⁷¹ and JAZDCR000000000 (male)⁷². In addition, all data, including the male and female genome sequences and annotation files, are accessible through Figshare⁷³.

Technical Validation

Benchmarking Universal Single-Copy Orthologues (BUSCO)⁷⁴, Core Eukaryotic Genes Mapping Approach (CEGMA)⁷⁵, and Merqury⁷⁶ were used to evaluate the genome assemblies. BUSCO (v5.2.2) was used to evaluate the completeness of the genome assemblies with the vertebrata database (vertebrata_odb10). Out of the 3,354 orthologous genes, 3,304 (98.5%) were identified as complete, 16 (0.5%) as fragmented, and 34 (1%) as missing in the female genome assembly (Fig. 3a). In the male genome assembly, 3,301 (98.5%) genes were identified as complete, 19 (0.5%) as fragmented, and 34 (1%) as missing (Fig. 3b). CEGMA (v2.5) was also used to evaluate genome completeness. Out of the 248 core eukaryotic genes, 235 (94.76%) and 233 (93.95%) were identified in the female and male genomes, respectively (Supplementary Table 3). To further assess the completeness of the genome assemblies, we identified telomeric repeats in both the female and male genomes using tidk (v0.2.41) (https://zenodo.org/records/10091385) with Cypriniformes-specific telomeric repeat sequences. Telomeric repeat sequences could be identified at almost all chromosome ends (Supplementary Fig. 4). These results indicate a very high level of completeness of the genome assemblies.
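Completeness percentages of this kind are simple ratios of recovered genes to the size of the benchmark set. The sketch below recomputes the CEGMA figures for both assemblies and the BUSCO breakdown for the female assembly from the counts given above; the helper name is illustrative.

```python
def percent(count: int, total: int, digits: int = 2) -> float:
    """Return count/total as a percentage rounded to the given number of digits."""
    return round(100.0 * count / total, digits)


# CEGMA: 248 core eukaryotic genes (counts as reported in the text).
CEGMA_TOTAL = 248
cegma_female = percent(235, CEGMA_TOTAL)
cegma_male = percent(233, CEGMA_TOTAL)

# BUSCO, female assembly: 3,354 vertebrata_odb10 orthologues.
BUSCO_TOTAL = 3354
busco_female = {
    "complete": percent(3304, BUSCO_TOTAL, 1),
    "fragmented": percent(16, BUSCO_TOTAL, 1),
    "missing": percent(34, BUSCO_TOTAL, 1),
}

print(cegma_female, cegma_male, busco_female)
```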

To evaluate the quality and accuracy of the female genome assembly, we employed a three-step validation process. First, the Illumina short reads from the genome survey were mapped to the genome assembly using BWA-MEM (v0.7.8)⁷⁷ with default parameters, and SAMtools⁷⁷ was then used for SNP calling. As a result, 99.30% of reads were mapped to the genome with approximately 99.95% coverage. Second, the base quality value (QV) of the genome sequences was quantified using Merqury, resulting in a QV score of 52.22. These results indicate a high-quality genome assembly. Third, the GC skew of the genome assembly was calculated in 10 kb sliding windows using SOAP.coverage (v2.7.7)⁷⁸. The GC content was 37.49% with no obvious separation, indicating no foreign contamination in the genome (Supplementary Fig. 5).
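A consensus QV is a Phred-scaled value, QV = -10 · log10(error rate), so the reported score can be converted into an approximate per-base error probability. The sketch below performs only this conversion; it does not reproduce how Merqury derives the QV from k-mers, and the function names are illustrative.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10.0 ** (-qv / 10.0)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10.0 * math.log10(error_rate)


REPORTED_QV = 52.22  # value reported for the female assembly

error_rate = qv_to_error_rate(REPORTED_QV)
print(f"error rate ~ {error_rate:.2e} per base, "
      f"or about {error_rate * 1e6:.1f} errors per Mb")
```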

Code availability

No custom code was developed for this study. The tools used for read quality control are non-open scripts developed by Novogene (Beijing, China). All bioinformatics tools and pipelines were run following the instructions in their manuals and protocols. The software versions and corresponding parameters are described in the Methods section.

References

Zheng, L. P., Yang, J. X. & Chen, X. Y. Molecular phylogeny and systematics of the Barbinae (Teleostei: Cyprinidae) in China inferred from mitochondrial DNA sequences. Biochem. Syst. Ecol. 68, 250–259 (2016).
Yang, J. X. et al. The Sinocyclocheilus cavefish genome provides insights into cave adaptation. BMC Biol. 14, (1) (2016).
Chen, L. et al. Chromosome-level genome of Poropuntius huangchuchieni provides a diploid progenitor-like reference genome for the allotetraploid Cyprinus carpio. Mol. Ecol. Resour. 21, 1658–1669 (2021).
Li, J. T. et al. Parallel subgenome structure and divergent expression evolution of allo-tetraploid common carp and goldfish. Nat. Genet. 53, 1493–1503 (2021).
Xu, M. R. X. et al. Maternal dominance contributes to subgenome differentiation in allopolyploid fishes. Nat. Commun. 14 (2023).
Cui, W. Y. et al. Embryonic development and phylogenetic analysis of Puntius tetrazona. Journal of Fisheries of China (in Chinese) 44, 1286–1295 (2020).
Jiang, J., Li, M. Y. & Wu, E. M. Chromosome karyotyping of Acrossocheilus fasciatus. Freshwater Fisheries of China (in Chinese) 39, 77–79 (2009).
Yu, Y. Y., Zhou, J. B., Zhang, Y. M. & Li, M. Y. The nutritional compositions and evalution of wild and cultured Acrossocheilus fasciatus. Journal of Fishery Sciences of China (in Chinese) 31, 207–210 (2012).
Yan, Y. Z. et al. Life-history strategies of Acrossocheilus fasciatus (Barbinae, Cyprinidae) in the Huishui Stream of the Qingyi watershed, China. Ichthyol. Res. 59, 202–211 (2012).
Wu, H. L. New records of toxic and medicinal fishes in China. (China Agriculture Press, 2002).
Zhang, Y. M., Cheng, S., Jiang, J. H., Lei, S. Y. & Yang, L. J. Primary study on the growth of Acrossocheilus fasciatus in cultivation. Journal of Shanghai Ocean University (in Chinese) 21, 542–548 (2012).
Zhou, M. Y. et al. Historical landscape evolution shaped the phylogeography and population history of the cyprinid fishes of Acrossocheilus (Cypriniformes: Cyprinidae) according to mitochondrial DNA in Zhejiang Province, China. Diversity (Basel) 15 (2023).
Wei, Z. Z., Fang, Y., Shi, W., Chu, Z. J. & Zhao, B. Transcriptional modulation reveals physiological responses to temperature adaptation in Acrossocheilus fasciatus. Int. J. Mol. Sci. 24 (2023).
Wei, W. B. et al. Integrated mRNA and miRNA expression profile analysis of female and male gonads in Acrossocheilus fasciatus. Biology 11 (2022).
Wang, L. et al. Influences of chronic copper exposure on intestinal histology, antioxidative and immune status, and transcriptomic response in freshwater grouper (Acrossocheilus fasciatus). Fish Shellfish Immunol. 139 (2023).
Wang, L. et al. Dietary berberine against intestinal oxidative stress, inflammation response, and microbiota disturbance caused by chronic copper exposure in freshwater grouper (Acrossocheilus fasciatus). Fish Shellfish Immunol. 139 (2023).
Green, M. R. & Sambrook, J. Isolation of High-Molecular-Weight DNA using organic solvents. Cold Spring Harb. Protoc. 2017, pdb.prot093450 (2017).
Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).
Belton, J. M. et al. Hi-C: A comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Rao, S. S. P. et al. A 3D Map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Li, R. Q. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Cheng, H. Y., Concepcion, G. T., Feng, X. W., Zhang, H. W. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 18, 170–175 (2021).
Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Research 4, 1310 (2015).
Zhang, X. T., Zhang, S. C., Zhao, Q., Ming, R. & Tang, H. B. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Delcher, A. L., Phillippy, A., Carlton, J. & Salzberg, S. L. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30, 2478–2483 (2002).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–580 (1999).
Jurka, J. et al. Repbase update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. Chapter 4, Unit 4.10 (2004).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, I351–I358 (2005).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Res. 14, 988–995 (2004).
Stanke, M. & Waack, S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, II215–II225 (2003).
Parra, G., Blanco, E. & Guigó, R. GeneID in Drosophila. Genome Res. 10, 511–515 (2000).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J.Mol. Biol. 268, 78–94 (1997).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5 (2004).
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 12, 357–360 (2015).
Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–U174 (2010).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
Mulder, N. & Apweiler, R. InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol. Biol. (Clifton, N.J.) 396, 59–70 (2007).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993408 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993409 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993393 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993394 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993400 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993401 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993402 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993403 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993404 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993405 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993406 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993407 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993392 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993395 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993396 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993397 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993398 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26993399 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27126179 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27588553 (2023).
NCBI GenBank https://identifiers.org/ncbi/insdc:JAXUIB000000000 (2023).
NCBI GenBank https://identifiers.org/ncbi/insdc:JAZDCR000000000 (2023).
Yuan, Y. X. The genome annotations of Acrossocheilus fasciatus. figshare https://doi.org/10.6084/m9.figshare.24995825 (2023).
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).

Acknowledgements

This work was financially supported by the National Key Research and Development Program of China (No. 2022YFD2400102) and the National Natural Science Foundation of China (No. 31872207).

Author information

These authors contributed equally: Yixin Yuan, Tianxing Zhong.

Authors and Affiliations

Key Laboratory of Freshwater Aquatic Genetic Resources certificated by the Ministry of Agriculture and Rural Affairs, Shanghai Ocean University, Shanghai, 201306, China
Yixin Yuan, Tianxing Zhong, Yifei Wang, Jinquan Yang, Lang Gui, Yubang Shen, Jiale Li, Mingyou Li & Jianfeng Ren
Zhejiang Forest Resource Monitoring Center, Hangzhou, 310020, China
Jiajun Zhou
Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, 48824, USA
Yu-Wen Chung-Davidson & Weiming Li
Huangshan Dingxin Ecological Agriculture Co., Ltd, Huangshan, 245431, China
Jinkai Xu

Contributions

J.F.R., M.Y.L. and J.L.L. conceived and supervised the study. T.X.Z., Y.F.W. and J.K.X. collected the samples. Y.X.Y., T.X.Z. and J.F.R. performed the bioinformatics analysis. Y.X.Y., T.X.Z. and J.F.R. drafted the manuscript. J.Q.Y., L.G., Y.B.S., J.J.Z., Y.-W.C.-D. and W.M.L. reviewed and revised the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Mingyou Li or Jianfeng Ren.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yuan, Y., Zhong, T., Wang, Y. et al. Chromosome-scale genome assemblies of sexually dimorphic male and female Acrossocheilus fasciatus. Sci Data 11, 653 (2024). https://doi.org/10.1038/s41597-024-03504-9

Received: 05 February 2024
Accepted: 10 June 2024
Published: 21 June 2024
DOI: https://doi.org/10.1038/s41597-024-03504-9
Springer Nature Limited
