The Utilization of Reference-Guided Assembly and In Silico Libraries Improves the Draft Genome of Clarias batrachus and Culter alburnus

  • Research
  • Marine Biotechnology

Abstract

Long-read sequencing technologies can generate more contiguous genome assemblies than short-read methods, but their higher cost is often a significant barrier. To address this, we explore mapping-based genome assembly and reference-guided assembly as cost-effective alternatives and assess how well each improves the contiguity of the Clarias batrachus and Culter alburnus draft genomes. Our findings show that an iterative mapping strategy reduces assembly errors. Specifically, the mismatches per 100 kbp value for the C. batrachus genome decreased from 2447.20 to 2432.67 after three iterations, with a minimum of 2422.67 after two iterations. The N50 value for the C. batrachus genome increased from 362,143 bp to 1,315,126 bp, with a maximum of 1,315,403 bp after two iterations. With reference-guided assembly, we achieved mismatches per 100 kbp values of 3.70 for C. batrachus and 0.34 for C. alburnus, and the N50 values for the C. batrachus and C. alburnus genomes increased from 362,143 bp and 3,686,385 bp to 2,026,888 bp and 43,735,735 bp, respectively. Finally, we successfully applied the improved C. batrachus and C. alburnus genomes, using the combined approach of Ragout and RagTag, to comparative genomic studies. By comparing mapping-based and reference-guided genome assembly methods, we clarify the specific contributions of reference-guided assembly to reducing assembly errors and improving assembly continuity and integrity. These results establish reference-guided assembly and the use of in silico libraries as a promising and suitable approach for comparative genomics studies.
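As a rough illustration of the two metrics quoted in the abstract, the following Python sketch computes N50 from a list of sequence lengths and a mismatch rate scaled to 100 kbp. The function names, the toy input values, and the assumption that the mismatch rate is normalized by the number of aligned bases are ours for illustration and are not taken from the article.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that pushes the running sum past half.
        if 2 * running >= total:
            return length
    return 0  # empty input


def mismatches_per_100kbp(mismatches: int, aligned_bases: int) -> float:
    """Scale a mismatch count to a rate per 100,000 aligned bases.

    Assumption: the denominator is the number of aligned bases.
    """
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return mismatches * 100_000 / aligned_bases


if __name__ == "__main__":
    # Toy lengths, not data from the study.
    print(n50([2, 3, 4, 5, 6]))  # prints 5
    print(mismatches_per_100kbp(37, 1_000_000))
```

Because N50 depends only on the sorted length distribution, joining existing contigs into longer scaffolds can raise it sharply without changing the per-base mismatch rate, which is why the abstract reports the two values side by side.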

Availability of Data and Material

This study utilized several data sets which are publicly available in the NCBI Sequence Read Archive database and European Nucleotide Archive (ENA) database, including SRR7440018, GCA_003987875, GCA_013621035, GCA_011419295, GCA_009869775, GCA_024489055, and GCA_018812025.

Code Availability

Not applicable.

Funding

This work was supported by the Science & Technology Innovation Program of Hangzhou Academy of Agricultural Sciences (grant number 2022HNCT-01).

Author information

Contributions

Kai Liu and Nan Xie conducted the experiments. Kai Liu analyzed the data and wrote the manuscript. Yuxi Wang and Xinyi Liu participated in the data collection. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kai Liu.

Ethics declarations

Ethics Approval

Approval from the Science and Technology Bureau of China and the Department of Wildlife Administration is not required for the experiments conducted in this paper when the fish in question are neither rare nor near extinction (first- or second-class state protection level). All activities comply with China's Wildlife Protection and Fishery Law.

Consent to Participate

The participant has consented to participation in the manuscript.

Consent for Publication

The participant has consented to the submission of the manuscript to the journal.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, K., Xie, N., Wang, Y. et al. The Utilization of Reference-Guided Assembly and In Silico Libraries Improves the Draft Genome of Clarias batrachus and Culter alburnus. Mar Biotechnol 25, 907–917 (2023). https://doi.org/10.1007/s10126-023-10248-x

  • DOI: https://doi.org/10.1007/s10126-023-10248-x
