A chromosome-level genome assembly of Prosopocoilus inquinatus Westwood, 1848 (Coleoptera: Lucanidae)

Pang, Bo; Zhan, Zhihong; Wang, Yunchao

doi:10.1038/s41597-024-03647-9

Data Descriptor
Open access
Published: 20 July 2024

Volume 11, article number 808, (2024)

Scientific Data

Bo Pang¹,
Zhihong Zhan² &
Yunchao Wang³

Abstract

Lucanidae (Coleoptera: Scarabaeoidea) are fascinating beetles that exhibit significant sexual dimorphism and are widely used as models in studies of beetle evolution. However, the lack of high-quality genomes limits our understanding of Lucanidae. Here, we present a chromosome-level genome assembly of a widespread species, Prosopocoilus inquinatus, generated by combining PacBio HiFi, Illumina, and Hi-C data. The assembly is 649.73 Mb in size with a scaffold N50 of 59.50 Mb, and 99.6% (647.13 Mb) of the assembly was anchored onto 12 chromosomes. BUSCO analysis of the genome shows a completeness of 99.6% (n = 1,367), including 1,347 (98.5%) single-copy BUSCOs and 15 (1.1%) duplicated BUSCOs. Genome annotation identified 61.41% repeat elements and 13,452 predicted protein-coding genes. This high-quality Lucanidae genome provides valuable genomic information for the study of stag beetles.

Background & Summary

The stag beetles (Coleoptera: Lucanidae) form a family in the superfamily Scarabaeoidea that comprises around 1,500 species worldwide¹. Most stag beetle species exhibit significant intraspecific or even interspecific sexual dimorphism, in which males tend to have extremely impressive mandibles used to fight and to attract females in the wild. Stag beetles have therefore received much attention since Linnaeus first described Scarabaeus parallelipipedus from Europe (later transferred to the genus Dorcus)². Many lucanid species have been chosen as models for studies of behavior and functional morphology, and their fascinating mandibles make them popular pets and valued items in private collections³,⁴,⁵,⁶,⁷. In the wild, most stag beetles are closely associated with forest ecosystems, as their saproxylic larvae usually feed on decaying logs and other litter, such as leaves or fungi⁸,⁹,¹⁰.

The geographical distribution and species diversity of Lucanidae are centered on the Indomalayan and Palearctic regions; 33 genera and nearly 400 species are known from China¹¹,¹²,¹³. Current research on stag beetles focuses primarily on taxonomy and phylogeny, including new species descriptions and mitochondrial genome studies⁷,¹¹,¹²,¹³,¹⁴. Our understanding of stag beetle genomes, and of high-quality genome assemblies in particular, remains in its infancy. Only one genome, that of Dorcus hopei, has been reported¹⁵. Given the sharply increasing number of genome assemblies for other beetles, more high-quality genome assemblies for stag beetles are needed.

To advance knowledge of the taxonomy, evolution, and ecology of Lucanidae, we present a chromosome-level genome of a widespread species, Prosopocoilus inquinatus (Westwood, 1848), assembled from a combination of PacBio HiFi, Illumina, and Hi-C data. The genome annotation, covering repeats, non-coding RNAs (ncRNAs), and protein-coding genes (PCGs), is also reported. The high-quality genome of P. inquinatus provides valuable genomic information for the study of Lucanidae.

Methods

Sample collection and sequencing

A single male P. inquinatus specimen was collected for DNA and RNA sequencing on April 30, 2023, in Motuo County, Xizang, China. Muscle tissue from the pronotum and posterior abdomen was extracted from the specimen and washed in phosphate-buffered saline (PBS) solution for five minutes to remove any external contaminants. The specimen was then transferred into liquid nitrogen, frozen for at least 20 minutes, and kept at −80 °C for temporary storage until sequencing.

Genomic DNA (gDNA) was extracted from the specimen using the FastPure® Blood/Cell/Tissue/Bacteria DNA Isolation Mini Kit (Vazyme Biotech Co., Ltd, Nanjing, China). High molecular weight (HMW) gDNA was sheared to 15 kb with the Megaruptor™ device (Diagenode, Liege, Belgium) and enriched using AMPure PB beads. A PCR-free short-read library for whole genome sequencing (WGS) was prepared using the TruSeq DNA PCR-Free Kit. A PacBio HiFi 15 kb library was prepared using the SMRTbell™ Express Template Prep Kit 2.0 and sequenced on the PacBio Sequel II platform. The Hi-C library was prepared by digesting the extracted DNA with the MboI restriction enzyme. RNA was extracted from the specimen using TRIzol™ Reagent (Invitrogen, Carlsbad, CA, USA). RNA-seq libraries were constructed using the VAHTS mRNA-seq v2 Library Prep Kit (Vazyme, Nanjing, China). All short-read libraries were sequenced on the Illumina NovaSeq 6000 platform, and long reads of the RNA library were sequenced on the Nanopore PromethION platform. Berry Genomics (Beijing, China) carried out all library construction and sequencing. In total, we obtained 272.73 Gb of sequencing data: 109.10 Gb (152.68×) of Illumina reads, 42.50 Gb (65.41×) of PacBio HiFi reads, 101.03 Gb (155.40×) of Hi-C data, and 20.10 Gb of transcriptome data, the latter comprising 9.72 Gb of short reads and 10.38 Gb of long reads (RNA-ONT) (Table 1).
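As a rough illustration of how the yield figures relate to one another, the sketch below sums the per-library yields and converts the HiFi yield into fold coverage. The denominator is an assumption: the text does not state which genome size was used for the reported depths, so the final assembly size (649.73 Mb) is used here, and the function name `depth` is purely illustrative.

```python
# Sequencing yields in gigabases (Gb), copied from the paragraph above.
yields_gb = {
    "Illumina WGS": 109.10,
    "PacBio HiFi": 42.50,
    "Hi-C": 101.03,
    "RNA short reads": 9.72,
    "RNA long reads (ONT)": 10.38,
}

# Assumption: depth is computed against the final assembly size.
ASSEMBLY_SIZE_MB = 649.73


def depth(yield_gb: float, genome_mb: float) -> float:
    """Return fold coverage as sequencing yield divided by genome size."""
    return yield_gb * 1000.0 / genome_mb


total_gb = round(sum(yields_gb.values()), 2)
hifi_depth = round(depth(yields_gb["PacBio HiFi"], ASSEMBLY_SIZE_MB), 2)

print(f"total yield: {total_gb} Gb")
print(f"HiFi depth: {hifi_depth}x")
```

Depths computed this way depend on the chosen genome size, so values obtained with a different estimate (for example a k-mer based one) will differ.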

Table 1 Statistics of the sequencing data generated for Prosopocoilus inquinatus.


De novo genome assembly

Raw genomic Illumina reads generated for the genome survey were quality-controlled with Fastp v0.23.2¹⁶ to remove adaptors, duplicates, and low-quality reads.

Raw PacBio HiFi reads were assembled into a primary assembly using Hifiasm v0.19.8¹⁷. The raw HiFi reads were then mapped back to the assembly using Minimap2 v2.24¹⁸ to calculate the mapping rate. The primary assembly was polished in one round with NextPolish2 v0.2.0¹⁹.

Raw Hi-C data were quality-controlled and deduplicated using Chromap v0.2.5-r473²⁰. The clean Hi-C reads were then aligned to the primary assembly for haplotype identification and division. Contigs were anchored and oriented onto chromosomes using YaHS v1.2²¹ and Juicer v1.6.2²². The resulting scaffolds were reviewed, and assembly errors were corrected manually in Juicebox v1.11.08²³. To distinguish the autosomes from the sex chromosomes, the raw HiFi reads were remapped to the final assembly with Minimap2, and the coverage of each chromosome was calculated using SAMtools v1.9²⁴ by dividing the amount of mapped read data by the chromosome length. In addition, the X chromosome was identified through chromosome synteny with the model beetle Tribolium castaneum and the related species Trypoxylus dichotomus, because the X chromosome is relatively conserved among insects²⁵. Syntenic blocks were identified using MCScanX²⁶ and TBtools²⁷. In conclusion, the X chromosome was identified by its coverage, which was around half that of the other chromosomes (Table 3), and confirmed by its high synteny with the X chromosomes of other beetles (Fig. 2).
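The coverage criterion can be sketched as follows. This is a minimal illustration rather than the procedure actually run: the function name, the 0.35 to 0.65 ratio window, and the autosomal depths are hypothetical, and only the chromosome 12 long-read depth (37.02) is taken from this paper.

```python
from statistics import median


def flag_half_coverage(coverage, low=0.35, high=0.65):
    """Return chromosomes whose depth is roughly half the median depth.

    In a male sample with a single X chromosome, the X is expected to
    show about half the read depth of the autosomes.
    """
    reference = median(coverage.values())
    return [
        name
        for name, cov in coverage.items()
        if low <= cov / reference <= high
    ]


# Hypothetical autosomal depths; only the chr12 value is from the paper.
example = {f"chr{i}": 66.0 for i in range(1, 12)}
example["chr12"] = 37.02

print(flag_half_coverage(example))
```

Using the median as the reference keeps the single hemizygous chromosome from pulling the baseline down, which a mean would do.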

To ensure a high-quality assembly, potential contaminants were detected and removed using local software and NCBI screening. We focused on human, bacterial, viral, and plant sequences. Possible contaminants were detected using MMseqs2 v11²⁸, which performs BLASTN-like searches, against the NCBI nucleotide database and the UniVec database. Potential vector contaminants were also specifically identified by blastn (BLAST+ v2.11.0²⁹) against the UniVec database. Sequences with over 90% hits in the databases above were considered contaminants, and sequences with over 80% hits were rechecked by online BLASTN analysis against the NCBI nucleotide database. The final genome assembly was also uploaded to NCBI for contaminant screening. No notable contaminant was found in our assembly, reflecting the quality of sample preparation and the accuracy of sequencing.
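The two thresholds above amount to a three-way triage, sketched below. How "percent hits" is measured is not specified in the text; the sketch assumes it is the fraction of a sequence covered by database hits, and the function and label names are illustrative only.

```python
def triage(hit_fraction: float) -> str:
    """Classify a sequence by the fraction of it covered by database hits.

    Assumption: hit_fraction is a value between 0 and 1.
    """
    if hit_fraction > 0.90:
        return "contaminant"  # removed from the assembly
    if hit_fraction > 0.80:
        return "recheck"      # verified by online BLASTN
    return "keep"


for value in (0.95, 0.85, 0.10):
    print(value, triage(value))
```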

The final P. inquinatus genome assembly reached the chromosome level, with a total size of 649.73 Mb in 174 scaffolds and 195 contigs (Table 2). The scaffold and contig N50 lengths were 59.5 Mb and 26.36 Mb, respectively. The GC content of the P. inquinatus genome was 35.67%. Most contigs (612.12 Mb, 94.21%) were firmly anchored and oriented onto 12 chromosomes. The coverage of every chromosome was computed (Table 3). One chromosome, number 12, had a coverage of 37.02 for long-read sequencing and 88.58 for short-read sequencing, around half that of the other chromosomes (Table 3). Chromosome 12 was therefore considered the X chromosome of P. inquinatus. The assembled chromosomes, 11 autosomes and the X chromosome, ranged in length from 17.22 to 75.68 Mb (Tables 2, 3; Fig. 1). Compared with the assembly of a related species, Trypoxylus dichotomus³⁰ (Scarabaeidae) (636.37 Mb in genome size and 35.11% GC content), P. inquinatus has a larger genome and a higher GC content (Table 4).
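For readers unfamiliar with the N50 statistic reported above, a minimal sketch of its standard definition follows: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The toy lengths are invented for illustration and are not the P. inquinatus scaffolds.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example: total 250, half 125; 80 + 70 = 150 reaches half, so N50 = 70.
toy_lengths = [30, 80, 20, 70, 50]
print(n50(toy_lengths))
```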

Table 2 Genome assembly statistics for Prosopocoilus inquinatus.


Table 3 Chromosome status of Prosopocoilus inquinatus.


Table 4 Genome assembly and annotation statistics for Prosopocoilus inquinatus and a related species, Trypoxylus dichotomus (Scarabaeidae).


Genome annotation

A de novo species-specific repeat library for P. inquinatus was built with RepeatModeler v2.0.4³¹ and combined with RepBase-20230909³² to form a custom library. Repeat elements in the P. inquinatus genome were identified and masked with RepeatMasker v4.1.4³³ by aligning the genome against the custom library. The RepeatMasker analysis showed that the P. inquinatus genome contains approximately 62.19% repetitive elements, including unclassified elements (42.02%), LTR elements (8.36%), DNA transposons (7.33%), LINEs (1.77%), simple repeats (0.68%), and other elements (S Table). The density of each element type, including simple repeats and transposable elements (TEs), is shown for each chromosome (Fig. 3). Compared with T. dichotomus, P. inquinatus has a considerably larger proportion of unclassified (42.02% versus 16.67%) and LTR (8.36% versus 1.24%) elements but a considerably smaller proportion of DNA transposons, LINEs, and SINEs (Table 4).

Non-coding RNAs (ncRNAs) and transfer RNAs (tRNAs) in P. inquinatus were identified with Infernal v1.1.4³⁴ and tRNAscan-SE v2.0.9³⁵, respectively. In total, 1,857 ncRNAs were annotated in the P. inquinatus genome, including four long non-coding RNAs, six ribozymes, 55 small nuclear RNAs, 93 microRNAs, 344 other ncRNAs, 351 tRNAs, and 1,004 ribosomal RNAs (Table 4). The number of ncRNAs in P. inquinatus was about 2.8 times that in T. dichotomus (Table 4).

Protein-coding genes (PCGs) in P. inquinatus were annotated with MAKER v3.01.03³⁶ by integrating transcript evidence, ab initio gene predictions, and homologous proteins. RNA-seq reads were aligned to the genome with HISAT2 v2.2.1³⁷, and the alignments were assembled into transcripts by genome-guided assembly with StringTie v2.1.6³⁸. BRAKER v3.0.3³⁹ was used to obtain ab initio gene predictions with GeneMark-ETP⁴⁰ and Augustus v3.4.0⁴¹, which were trained automatically on the RNA-seq alignments and on reference proteins from the OrthoDB v11 database⁴². GeMoMa v1.9⁴³ produced protein-homology-based predictions from the proteins of five insect species: two coleopteran species related to Lucanidae, Tribolium castaneum (GCF_000002335.3⁴⁴) and Coccinella septempunctata (GCF_907165205.1⁴⁵), and three species from other insect orders, namely the dipteran Drosophila melanogaster (GCA_000001215.4⁴⁶), the hymenopteran Apis mellifera (GCA_003254395.2⁴⁷), and the neuropteran Chrysoperla carnea (GCA_905475395.1⁴⁸) (Table 5). The BRAKER and GeMoMa results were then combined and supplied to MAKER as the ab initio input. The final annotation comprised 13,452 PCGs with an average length of 17,401.8 bp (Table 6).

Table 5 Species taxonomic information and accession code of all samples used in this study.


Table 6 Summary statistics of genome annotations in the Prosopocoilus inquinatus genome.


Gene functions were annotated by searching the UniProtKB (SwissProt and TrEMBL) 20190527 database with Diamond v2.0.11.1⁴⁹. Protein domains were identified with eggNOG-mapper v2.1.9⁵⁰ and InterProScan 5.60–92.0⁵¹, which also provided Gene Ontology (GO) and KEGG pathway annotations. Five databases, Pfam⁵², SMART⁵³, Superfamily⁵⁴, Gene3D⁵⁵, and CDD⁵⁶, were searched in InterProScan. Functional annotation indicated that P. inquinatus contained 11,656 COG categories, 7,087 GO terms, 4,924 KEGG pathways, and 2,838 Enzyme Codes, based on the integrated InterProScan and eggNOG annotations (Table 6).

Data Records

The raw sequencing data and genome assembly of Prosopocoilus inquinatus have been deposited at the National Center for Biotechnology Information (NCBI). The Illumina, PacBio, Hi-C, transcriptome short-read, and transcriptome long-read data can be found under accession numbers SRR27127825⁵⁷, SRR27243604⁵⁸, SRR27127828⁵⁹, SRR27127827⁶⁰, and SRR27127826⁶¹, respectively, under BioProject accession number PRJNA1015594 and BioSample accession number SAMN37358649. The assembled genome has been deposited in NCBI GenBank under accession number GCA_036172665.1⁶². The annotation results for repeated sequences, gene structure, and functional prediction have been deposited in the Figshare database⁶³.

Technical Validation

Berry Genomics (Beijing, China) carried out the DNA extraction. DNA concentration was measured with both NanoDrop and Qubit during the extraction process (Table 7). The extraction yielded 86 ng/μl by NanoDrop and 44.65 ng/μl by Qubit. The 260/280 and 260/230 absorbance ratios of the sample were 1.78 and 1.85, respectively.

Table 7 DNA extraction of the Prosopocoilus inquinatus.


Two methods were used to evaluate the quality of the genome assembly. First, BUSCO v5.4.4⁶⁴ was used to assess assembly completeness against the reference Insecta gene set (n = 1,367) in euk_genome_met mode. The final genome assembly showed a BUSCO completeness of 99.6%, including 1,347 (98.5%) single-copy BUSCOs, 15 (1.1%) duplicated BUSCOs, 1 (0.1%) fragmented BUSCO, and 4 (0.3%) missing BUSCOs. Second, Merqury v1.3⁶⁵ was used to identify possible sequence errors in the de novo assembly based on efficient k-mer set operations and QV score calculation. The k-mer completeness of the assembly was 94.2%, and the QV score was 46.60. The k-mer completeness and the QV score reflect high base-level accuracy and, together with the BUSCO results, indicate the high completeness and accuracy of our genome assembly. The final annotation was also validated with BUSCO in protein mode against the reference Insecta gene set (n = 1,367). The annotated gene set showed a BUSCO completeness of 99.6%, including 1,079 (78.9%) single-copy BUSCOs, 283 (20.7%) duplicated BUSCOs, 1 (0.1%) fragmented BUSCO, and 4 (0.3%) missing BUSCOs. The mapping rate was also measured to assess assembly accuracy. The mapping rates for PacBio, Illumina, RNA short reads, and RNA long reads were 99.6%, 96.51%, 96.93%, and 97.59%, respectively. Taken together, these evaluations reflect the high quality of the genome assembly.
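The relationship between the reported counts and percentages, and the meaning of the QV score, can be checked with a short sketch. It uses the protein-mode BUSCO counts given above and the standard Phred-scaled definition of QV (QV = −10·log10 of the per-base error rate); the function names are illustrative and not part of BUSCO or Merqury.

```python
def busco_percentages(single, duplicated, fragmented, missing):
    """Convert BUSCO category counts into percentages of the gene set."""
    total = single + duplicated + fragmented + missing

    def pct(count):
        return round(100.0 * count / total, 1)

    return {
        "complete": pct(single + duplicated),
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


def qv_to_error_rate(qv):
    """Convert a Phred-scaled consensus quality value to an error rate."""
    return 10 ** (-qv / 10)


# Protein-mode BUSCO counts for the annotation (n = 1,367).
annotation = busco_percentages(1079, 283, 1, 4)
print(annotation)

# QV 46.60 corresponds to roughly two errors per 100,000 bases.
print(f"{qv_to_error_rate(46.60):.2e}")
```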

Code availability

All commands and pipelines used in data processing were executed according to the manuals and protocols of the corresponding bioinformatic software. The software settings and parameters were as follows: (1) Fastp v0.23.2: ‘-D’ (drop duplicated reads), ‘-g’ (poly-G tail trimming), ‘-x’ (poly-X trimming at 3′ ends), ‘-5’ (move a sliding window from the 5′ end to the tail), ‘-u 10’ (unqualified percentage limit), ‘-c’ (overlapped bases correction); (2) Hifiasm v0.19.8: ‘-l2’ (strongly remove haplotig duplications); (3) Minimap2 v2.24: default parameters; (4) NextPolish2 v0.2.0: default parameters; (5) YaHS v1.2: default parameters; (6) Juicer v1.6.2: default parameters; (7) Juicebox v1.11.08: default parameters; (8) MMseqs2 v11: default parameters with ‘--search-type 3’ and ‘--min-seq-id 0.8’ for potential contaminants; (9) SAMtools v1.9: default parameters; (10) RepeatModeler v2.0.4: ‘-LTRStruct’ (LTR discovery pipeline); (11) RepeatMasker v4.1.4: default parameters; (12) Infernal v1.1.4: default parameters; (13) tRNAscan-SE v2.0.9: ‘EukHighConfidenceFilter’ script with default parameters; (14) MAKER v3.01.03: default parameters; (15) HISAT2 v2.2.1: default parameters; (16) StringTie v2.1.6: default parameters; (17) BRAKER v3.0.3: default parameters; (18) GeneMark-ETP: default parameters; (19) Augustus v3.4.0: default parameters; (20) GeMoMa v1.9: ‘GeMoMa.m = 15000’ and ‘ERE.c = false’ with otherwise default parameters; (21) Diamond v2.0.11.1: default parameters; (22) eggNOG-mapper v2.1.9: default parameters; (23) InterProScan 5.60–92.0: default parameters.
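To show how the listed options translate into command lines, the sketch below assembles the Fastp and Hifiasm invocations as argument lists without running them. The option flags are those listed above plus the tools' standard input and output options; all file names, the output prefix, and the thread count are hypothetical placeholders, and the exact commands that were run are not given in this paper.

```python
import shlex


def fastp_command(r1, r2, out1, out2):
    """Build a paired-end fastp call with the options listed above."""
    return [
        "fastp",
        "-i", r1, "-I", r2,      # paired-end input files
        "-o", out1, "-O", out2,  # paired-end output files
        "-D",                    # drop duplicated reads
        "-g",                    # poly-G tail trimming
        "-x",                    # poly-X trimming at 3' ends
        "-5",                    # sliding-window trimming from the 5' end
        "-u", "10",              # unqualified percentage limit
        "-c",                    # overlapped bases correction
    ]


def hifiasm_command(reads, prefix, threads=16):
    """Build a hifiasm call with purge level 2 ('-l2')."""
    return ["hifiasm", "-o", prefix, "-t", str(threads), "-l2", reads]


# Hypothetical file names, used only to render the commands as text.
print(shlex.join(fastp_command("raw_R1.fq.gz", "raw_R2.fq.gz",
                               "clean_R1.fq.gz", "clean_R2.fq.gz")))
print(shlex.join(hifiasm_command("hifi_reads.fq.gz", "p_inquinatus")))
```

Keeping the commands as argument lists means they can be passed directly to `subprocess.run` without shell quoting issues if the tools are installed.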

References

1. Fujita, H. The Lucanid Beetles of the World. Mushi-sha, Tokyo (2010).
2. Linnaeus, C. Systema Naturae per regna tria naturae, secundum classes, ordines, genera, species cum characteribus, differentiis, synonymis locis. Tomus I. Editio X. Laurentii Salvi, Holmiae (1758).
3. Kojima, H. Breeding Technique of Lucanid Beetles. Mushi-sha, Tokyo (1996).
4. New, T. R. Beetles in Conservation. Oxford University Press (2010).
5. Gotoh, H. et al. Developmental link between sex and nutrition; doublesex regulates sex-specific mandible growth via juvenile hormone signaling in stag beetles. PLoS Genet. 10, e1004098 (2014).
6. Goyens, J., Dirckx, J. & Aerts, P. Costly sexual dimorphism in Cyclommatus metallifer stag beetles. Funct. Ecol. 29, 35–43 (2015).
7. Liu, J., Li, C., You, S. & Wan, X. The first complete mitogenome of Cyclommatus stag beetles (Coleoptera: Lucanidae) with the phylogenetic implications. Entomotaxonomia. 39, 294–299 (2017).
8. Araya, K. Relationship between the decay types of dead wood and occurrence of lucanid beetles (Coleoptera: Lucanidae). Appl. Entomol. Zool. 28, 27–33 (1993).
9. Tanahashi, M., Matsushita, N. & Togashi, K. Are stag beetles fungivorous? J. Insect Physiol. 55, 983–988 (2009).
10. Songvorawit, N., Butcher, B. A. & Chaisuekul, C. Decaying Wood preference of stag beetles (Coleoptera: Lucanidae) in a tropical dry-Evergreen Forest. Environ. Entomol. 46, 1322–1328 (2017).
11. Huang, H. & Chen, C. C. Stag Beetles of China I. Formosa Press, Taipei (2010).
12. Huang, H. & Chen, C. C. Stag Beetles of China II. Formosa Press, Taipei (2013).
13. Huang, H. & Chen, C. C. Stag Beetles of China III. Formosa Press, Taipei (2017).
14. Zhou, L. Y., Zhan, Z. H., Zhu, X. L. & Wan, X. Multilocus phylogeny and species delimitation suggest synonymies of two Lucanus Scopoli, 1763 (Coleoptera, Lucanidae) species names. ZooKeys. 1135, 139–155 (2023).
15. Li, X. et al. The first chromosome-level genome of the stag beetle Dorcus hopei Saunders, 1854 (Coleoptera: Lucanidae). Sci Data. 11, 396 (2024).
16. Chen, S. F., Zhou, Y. Q., Chen, Y. R. & Gu, J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34(17), i884–i890 (2018).
17. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18, 170–175 (2021).
18. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34, 3094–3100 (2018).
19. Hu, J. et al. NextPolish2: a repeat-aware polishing tool for genomes assembled using HiFi long reads. bioRxiv. 2023.04.26.538352 (2023).
20. Zhang, H. et al. Fast alignment and preprocessing of chromatin profiles with Chromap. Nat Communications. 12(1), 1–6 (2021).
21. Zhou, C. X., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics. 39(1), btac808 (2023).
22. Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98 (2016).
23. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 356, 92–95 (2017).
24. Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience. 10(2), giab008 (2021).
25. Toups, M. A. & Vicoso, B. The X chromosome of insects likely predates the origin of class Insecta. Evolution. 77(11), 2504–2511 (2023).
26. Wang, Y. et al. MCScanX: A toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
27. Chen, C. et al. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data. Mol. Plant. 13, 1194–1202 (2020).
28. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
29. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
30. Wang, Q. Y., Liu, L. W., Zhang, S. J., Wu, H. & Huang, J. H. A chromosome-level genome assembly and intestinal transcriptome of Trypoxylus dichotomus (Coleoptera: Scarabaeidae). GigaScience. 11, giac059 (2022).
31. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
32. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA. 6, 11 (2015).
33. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. Available online: http://www.repeatmasker.org (accessed on 14 October 2023) (2013–2015).
34. Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29, 2933–2935 (2013).
35. Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods Mol Biol. 1962, 1–14 (2019).
36. Holt, C. & Yandell, M. MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics. 12, 491 (2011).
37. Kim, D., Langmead, B. & Salzberg, S. L. HISAT: A fast spliced aligner with low memory requirements. Nat. Methods. 12, 357–360 (2015).
38. Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 278 (2019).
39. Bruna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform. 3, lqaa108 (2021).
40. Bruna, T., Lomsadze, A. & Borodovsky, M. GeneMark-ETP: automatic gene finding in eukaryotic genomes in consistency with extrinsic data. bioRxiv. 13.524024. (2023).
41. Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Res. 32, W309–W312 (2004).
42. Kuznetsov, D. et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res. 51(D1), D445–D451 (2023).
43. Keilwagen, J., Hartung, F., Paulini, M., Twardziok, S. O. & Grau, J. Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi. BMC Bioinformatics. 19, 189 (2018).
44. Herndon, N. et al. Enhanced genome assembly and a new official gene set for Tribolium castaneum. BMC Genomics. 21, 47 (2020).
45. Crowley, L. et al. The genome sequence of the seven-spotted ladybird, Coccinella septempunctata Linnaeus, 1758. Wellcome Open Res. 6, 319 (2021).
46. Hoskins, R. A. et al. The Release 6 reference sequence of the Drosophila melanogaster genome. Genome Res. 25, 445–458 (2015).
47. Wallberg, A. et al. A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds. BMC Genomics. 20(1), 275 (2019).
48. Wang, Y. et al. The first chromosome-level genome assembly of a green lacewing Chrysopa pallens and its implication for biological control. Mol Ecol Resour. 22(2), 755–767 (2021).
49. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 12, 59–60 (2015).
50. Huerta-Cepas, J. et al. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
51. Finn, R. D. et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199 (2017).
52. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).
53. Letunic, I. & Bork, P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 46, D493–D496 (2018).
54. Wilson, D. et al. SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380–D386 (2009).
55. Lewis, T. E. et al. Gene3D: Extensive Prediction of Globular Domains in Proteins. Nucleic Acids Res. 46, D1282 (2018).
56. Marchler-Bauer, A. et al. CDD/SPARCLE: Functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 45, D200–D203 (2017).
57. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27127825 (2024).
58. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27243604 (2024).
59. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27127828 (2024).
60. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27127827 (2024).
61. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR27127826 (2024).
62. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_036172665.1 (2024).
63. Bo, P. Genome Annotation. figshare https://doi.org/10.6084/m9.figshare.24635889.v1 (2024).
64. Waterhouse, R. M. et al. BUSCO Applications from Quality Assessments to Gene Prediction and Phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
65. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).


Acknowledgements

This research was supported by the key research and development program of the Tibet Autonomous Region, investigating invasive species in Xizang agricultural fields and developing key prevention and control techniques for important invasive species under project XZ202201ZY0002N, and Plan Project of Zunyi Science and Technology NO.ZSKHZ [2023]148.

Author information

Authors and Affiliations

Plant Protection Department, College of Agriculture and Animal Husbandry of Xizang Autonomous Region, Lhasa, 850000, China
Bo Pang
Department of Entomology, College of Plant Protection, Nanjing Agricultural University, Nanjing, 210095, China
Zhihong Zhan
College of Biology and Agriculture, Zunyi Normal University, Zunyi, 563006, China
Yunchao Wang


Contributions

B.P. contributed to the research design. B.P. collected the samples. B.P. analyzed the data. B.P. and Z.Z.H. wrote the draft manuscript, and W.Y.C. revised it. All co-authors contributed to this manuscript and approved it.

Corresponding author

Correspondence to Bo Pang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

S Table Repeat annotation in the Prosopocoilus inquinatus genome.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Pang, B., Zhan, Z. & Wang, Y. A chromosome-level genome assembly of Prosopocoilus inquinatus Westwood, 1848 (Coleoptera: Lucanidae). Sci Data 11, 808 (2024). https://doi.org/10.1038/s41597-024-03647-9


Received: 26 January 2024
Accepted: 12 July 2024
Published: 20 July 2024
DOI: https://doi.org/10.1038/s41597-024-03647-9
Springer Nature Limited
