Chromosome-level genome of the long-tailed marine-living ornate spiny lobster, Panulirus ornatus

Ren, Xianyun; Sun, Dongfang; Lv, Jianjian; Gao, Baoquan; Jia, Shaoting; Bian, Xueqiong; Zhao, Kuangcheng; Li, Jitao; Liu, Ping; Li, Jian

doi:10.1038/s41597-024-03512-9

Data Descriptor
Open access
Published: 22 June 2024

Volume 11, article number 662, (2024)

Scientific Data

Xianyun Ren (ORCID: orcid.org/0000-0001-9884-4874)^1,2^,
Dongfang Sun^1,2^,
Jianjian Lv^1,2^,
Baoquan Gao^1,2^,
Shaoting Jia^1,2^,
Xueqiong Bian^1,2,3^,
Kuangcheng Zhao^1,2^,
Jitao Li^1,2^,
Ping Liu^1,2^ &
Jian Li^1,2^

Abstract

Conservation efforts to protect rare and endangered aquatic species have recently intensified. Nevertheless, the ornate spiny lobster (Panulirus ornatus), which is prevalent in Indo-Pacific waters, has been largely overlooked. In the absence of a detailed genomic reference, the conservation and population genetics of this crustacean remain poorly understood. Here, we assembled a chromosome-level genome for P. ornatus. The genome, among the most complete available for lobsters, spans 2.65 Gb with a scaffold N50 of 51.05 Mb, and 99.11% of the sequence is anchored to 73 chromosomes. The ornate spiny lobster genome comprises 65.67% repeat sequences and 22,752 protein-coding genes, 99.20% of which are functionally annotated. The P. ornatus assembly provides a valuable resource for comparative crustacean genomics and endangered species conservation, and lays the groundwork for future research on the speciation, ecology, and evolution of the ornate spiny lobster.

Background & Summary

Lobsters are prized marine resources, highly sought after in global fisheries for their economic and culinary value. This has made them a major focus of research in biology, fisheries science, and taxonomy¹. The marine lobster family currently comprises 49 recognized species in 11 genera². Lobsters are large benthic invertebrates with exceptionally long lifespans; some species are estimated to live more than 50 years and possibly up to 100 years³. However, high market demand has led to intensive overfishing. Few countries have implemented effective management strategies to ensure sustainable harvests, and inadequate enforcement of fishing and marketing regulations has put significant strain on lobster populations in many regions. To safeguard these valuable species and ensure their long-term sustainability, there is therefore an urgent need to explore and implement alternative management approaches, such as co-management⁴.

The ornate spiny lobster, Panulirus ornatus, is an endangered species that inhabits coral reefs and inshore habitats and is widely distributed in China, the South Pacific, and the Indian Ocean (Fig. 1a,b). It is one of the most valued and highly priced species in global fisheries and aquaculture⁵, and is consequently overexploited in unregulated fisheries^6,7. On February 5, 2021, P. ornatus was listed as a Second Class species on China’s List of National Key Protected Wild Animals. This was a notable conservation milestone, making P. ornatus the first crustacean to be recognized and included in this list⁸. Like many other valuable marine species around the globe, the ornate spiny lobster faces several critical threats, including marine pollution, injuries from fishing activities, loss of vital habitats, and declining fish resources⁹. The combined effects of global climate change and human activities exacerbate these challenges, posing significant risks to the survival and health of lobsters¹⁰. Overall, the population of P. ornatus is declining, and further conservation measures for this species are imperative.

Previous attempts to sequence the genome of this species produced an incomplete and fragmented assembly: the genome size was estimated at 3.23 Gb, but only 1.93 Gb was assembled, with a contig N50 of 5,451 bp, which limited its usefulness for further research¹¹. Here, we achieved the first chromosome-level genome assembly for an endangered lobster species by combining Illumina short reads, PacBio long reads, and Hi-C data (Fig. 2). The project generated 182.90 Gb of Illumina short-read data, 115.67 Gb of PacBio continuous long-read (CLR) data, and 456.71 Gb of Hi-C data, yielding an assembled genome of 2.65 Gb with a scaffold N50 of 51.05 Mb (Tables 1 and 2). This high-quality assembly expands the genomic resources available for crustaceans and provides essential data for the protection of this species.

Table 1 Statistics of the sequencing data.

Table 2 Assembly statistics of the ornate spiny lobster.

Methods

Sample collection and nucleic acid extraction

We collected adult male P. ornatus from Huangliu Co., Ltd. in Sanya, Hainan, China. Muscle tissue samples were washed three times with sterile phosphate-buffered saline (PBS), flash-frozen in liquid nitrogen, and stored at −80 °C. Total genomic DNA (gDNA) for the genome survey and for construction of the genome sequencing libraries was extracted using the AMPure bead cleanup kit (Beckman Coulter, High Wycombe, UK) following the manufacturer’s instructions. In addition, total RNA was extracted from eight tissues (testis, intestine, hepatopancreas, hemocytes, muscle, gills, heart, and eyestalk) of the same individual using TRIzol reagent according to the manufacturer’s instructions, and was used for RNA-seq to support genome annotation. The integrity and quality of the extracted nucleic acids were evaluated by 1.5% agarose gel electrophoresis, and nucleic acid concentrations were quantified using a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA).

Library construction and sequencing

A short-read library with an insert size of 350 bp was prepared using the NEBNext Ultra DNA Library Prep Kit (NEB, USA) following the manufacturer’s recommendations, and sequenced on the Illumina platform to generate 2 × 150 bp paired-end reads. For PacBio sequencing, SMRTbell libraries were constructed from genomic DNA following the manufacturer’s guidelines and sequenced using single-molecule real-time (SMRT) technology on a PacBio Sequel platform. Together, these efforts generated 182.90 Gb of Illumina short-read data and 292.02 Gb of raw continuous long reads (CLR), providing a combined coverage of approximately 179-fold of the P. ornatus genome (Table 1).
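As a quick check on the coverage figure, sequencing depth is the total number of sequenced bases divided by the genome size. The minimal sketch below assumes, as our own reading of the numbers rather than a statement in the text, that the 179-fold value combines the Illumina data with the raw CLR data and uses the 2.65 Gb assembly size as the denominator; the function name is illustrative.

```python
def fold_coverage(total_bases_gb: float, genome_size_gb: float) -> float:
    """Sequencing depth: total sequenced bases divided by genome size (same units)."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return total_bases_gb / genome_size_gb


# Data volumes reported in the text, in Gb.
ILLUMINA_GB = 182.90
PACBIO_RAW_CLR_GB = 292.02
ASSEMBLY_GB = 2.65  # assumption: assembled size used as the denominator

illumina_depth = fold_coverage(ILLUMINA_GB, ASSEMBLY_GB)
clr_depth = fold_coverage(PACBIO_RAW_CLR_GB, ASSEMBLY_GB)
combined_depth = fold_coverage(ILLUMINA_GB + PACBIO_RAW_CLR_GB, ASSEMBLY_GB)

print(f"Illumina: {illumina_depth:.1f}x, raw CLR: {clr_depth:.1f}x, "
      f"combined: {combined_depth:.1f}x")
```

Under that assumption the combined depth rounds to about 179-fold; using the larger k-mer survey estimate as the denominator would give a somewhat lower value.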

For Hi-C library construction, cross-linked high-molecular-weight (HMW) gDNA was digested with the MboI restriction enzyme. After biotin labeling of the 5′ overhangs and blunt-end ligation, the DNA was physically sheared into 300–500 bp fragments. The Hi-C library was sequenced in 2 × 150 bp paired-end mode on the Illumina NovaSeq 6000 platform, yielding 456.71 Gb of raw paired-end reads.

For transcriptome sequencing, RNA-seq libraries were constructed using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA), strictly following the manufacturer’s recommendations, and sequenced on the Illumina HiSeq 6000 platform to generate 2 × 150 bp reads. This produced 54.38 Gb of clean paired-end short reads (Table 1).
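MboI recognizes the four-base site GATC and cuts immediately before the G, leaving a GATC 5′ overhang. After the overhangs are filled in and the blunt ends are ligated, a Hi-C ligation junction reads GATCGATC, which is the junction sequence Hi-C pipelines such as Juicer associate with MboI or DpnII libraries. The hypothetical helpers below sketch how cut sites, restriction fragments, and ligation junctions can be located in a sequence; they illustrate the chemistry and are not part of the authors' pipeline.

```python
MBOI_SITE = "GATC"             # MboI recognition site; the cut is before the G (^GATC)
MBOI_JUNCTION = MBOI_SITE * 2  # filled-in, blunt-ligated MboI ends give GATCGATC


def find_sites(seq: str, site: str = MBOI_SITE) -> list[int]:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    pos = seq.find(site)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(site, pos + 1)
    return positions


def restriction_fragments(seq: str, site: str = MBOI_SITE) -> list[str]:
    """Split `seq` at each top-strand cut position (immediately before each site)."""
    cuts = [0] + find_sites(seq, site) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]


def has_ligation_junction(read: str, junction: str = MBOI_JUNCTION) -> bool:
    """True if the read spans a Hi-C ligation junction."""
    return junction in read.upper()


example = "AAGATCTTTGATCAA"
print(find_sites(example), restriction_fragments(example))
print(has_ligation_junction("tttGATCGATCccc"))
```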

Genome survey and assembly

Before assembly, adapter sequences and low-quality reads were removed from the Illumina data using fastp (version 0.23.1)¹², and only clean reads were retained for the genome survey and assembly. To estimate key genomic characteristics, including genome size, heterozygosity, and repeat content, we analyzed the 17-mer frequency distribution using SOAPec (version 2.01)¹³ and GenomeScope (version 2.0)¹⁴. With a main peak depth of 59, the genome size of P. ornatus was estimated at 2,917.34 Mb, with a heterozygosity of 0.92% and a repeat content of 63.86%. These results are detailed in Table S1 and Fig. S1.
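The genome-size estimate follows standard k-mer reasoning: if a genome of size G is sequenced to a k-mer depth D, the total number of k-mers counted is about G × D, so G ≈ total k-mers / peak depth. The minimal sketch below assumes forward-strand (non-canonical) k-mers and a simple depth cutoff to discard error k-mers, and it ignores the heterozygosity and repeat modelling that GenomeScope performs; all function names are illustrative. As a back-calculation only, the reported peak of 59 and size of 2,917.34 Mb imply roughly 172 Gb of counted k-mers.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count forward-strand k-mers across reads (real tools count canonical k-mers)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def depth_histogram(kmer_counts):
    """Map each k-mer depth to the number of distinct k-mers seen at that depth."""
    return Counter(kmer_counts.values())


def peak_depth(hist, min_depth=2):
    """Depth holding the most distinct k-mers, ignoring the error tail below min_depth."""
    candidates = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not candidates:
        raise ValueError("no k-mers at or above min_depth")
    return max(candidates, key=candidates.get)


def estimate_genome_size(hist, peak, min_depth=2):
    """Genome size ~ total k-mer occurrences (error tail excluded) / peak depth."""
    total_kmers = sum(depth * n for depth, n in hist.items() if depth >= min_depth)
    return total_kmers / peak


# Toy histogram: an error peak at depth 1 and a main peak at depth 59.
toy_hist = {1: 1000, 58: 10, 59: 50, 60: 10}
peak = peak_depth(toy_hist)
print(peak, estimate_genome_size(toy_hist, peak))
```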

To assemble the P. ornatus genome, we used two independent assemblers, Wtdbg2 (version 2.5)¹⁵ and Flye (version 2.9)¹⁶. Each produced an initial assembly with default parameters, which was then polished with Arrow (version 8.0)¹⁷. Arrow is a consensus algorithm that generates highly accurate consensus sequences from PacBio subreads. After polishing, the Wtdbg2 and Flye assemblies were merged into a single consensus assembly using Quickmerge (version 0.3)¹⁸, a tool designed to combine multiple genome assemblies. The merged assembly was further polished with two rounds of Arrow, using PacBio subreads, and two rounds of Pilon (version 1.22)¹⁹, using Illumina short reads, all with default parameters. The resulting assembly comprised 8,061 contigs with a total length of 2,651,872,113 bp (Table 2).
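Contig count, total length, and N50 are the basic contiguity statistics summarized in Table 2. N50 is the length L such that sequences of length L or longer together contain at least half of the assembled bases. The minimal sketch below computes these values from FASTA text, assuming a plain, uncompressed FASTA input; the function names are ours.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(lines):
    """Record count, total length (bp) and N50 for a FASTA file's lines."""
    lengths = fasta_lengths(lines)
    return {"count": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


example = [">ctg1", "ACGTACGTAC", ">ctg2", "ACGTA", "CGT", ">ctg3", "ACG"]
print(assembly_stats(example))
```

The same function applies unchanged to scaffold sequences, giving the scaffold N50.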

Hi-C scaffolding

For Hi-C scaffolding, adapters and low-quality bases were first removed from the raw Hi-C reads using fastp (version 0.23.1)²⁰ with the parameters -q 20 -l 50. The filtered reads were then aligned to the preliminary assembly using the Juicer pipeline²¹. After alignment, the 3D-DNA pipeline²² was used to group the contigs into chromosomes and to order and orient the contigs within each chromosome. To improve accuracy, assembly errors were corrected manually using Juicebox Assembly Tools (version 2.13.06)²¹. Scaffolding anchored 2,628.95 Mb of the assembly to 73 chromosomes (Fig. 3), accounting for 99.11% of the total assembly (Table S2). The scaffold N50, a measure of assembly contiguity, reached 51.05 Mb in the final assembly (Table 2). Notably, 14 chromosomes each contain no more than 30 gaps (Table 3).
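In fastp, -q sets the Phred score at which a base counts as qualified (default 15), -l sets the minimum read length (default 15), and a read is also discarded when the share of unqualified bases exceeds the -u limit (default 40%). The simplified sketch below mimics only this quality and length filtering for a read pair, using -q 20 -l 50 as in the text. It omits adapter trimming, N-base limits, and the other fastp steps, assumes Phred+33 quality encoding, and discards a pair if either mate fails; the function names are hypothetical.

```python
def phred33(quality: str) -> list[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def read_passes(seq: str, quality: str, min_qual: int = 20,
                max_unqualified_pct: float = 40.0, min_length: int = 50) -> bool:
    """Approximate fastp-style filter: length check plus unqualified-base percentage."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < min_length:
        return False
    unqualified = sum(1 for score in phred33(quality) if score < min_qual)
    return unqualified * 100.0 <= max_unqualified_pct * len(seq)


def pair_passes(r1, r2, **kwargs) -> bool:
    """Keep a read pair only if both mates pass; each mate is a (seq, quality) tuple."""
    return read_passes(*r1, **kwargs) and read_passes(*r2, **kwargs)


good = ("A" * 60, "I" * 60)   # Phred 40 at every base
poor = ("A" * 60, "#" * 60)   # Phred 2 at every base
short = ("A" * 40, "I" * 40)  # shorter than the 50 bp length requirement
print(pair_passes(good, good), pair_passes(good, poor), pair_passes(good, short))
```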

Table 3 Assembly statistics for the chromosomes.

Genomic repeat annotation

We identified repeat sequences in the P. ornatus genome using both homology-based and de novo strategies²³. First, we merged the de novo predicted repeat library with the Repbase database of known repeats²⁴. We used RepeatScout (version 1.0.5)²³, RepeatModeler (version 2.0.1)²⁵, Piler (version 1.0)²⁶, and LTR-FINDER (version 1.0.6)²⁷ to identify transposable element (TE) families, after which we used RepeatMasker (version 4.1.0)²⁵, RepeatProteinMask (version 4.1.0), and TRF (version 4.0.9)²⁸ to classify the different repetitive elements by aligning the P. ornatus genome sequences against the integrated database. After removing redundancy among the results of these three methods, we found that repeat sequences constitute 65.67% of the P. ornatus genome (Table S3). In addition, we calculated the Kimura divergence values of TEs using the script calcDivergenceFromAlign.pl³⁰ and generated TE landscapes with createRepeatLandscape.pl²⁹. Among the identified repeat elements, DNA elements accounted for 4.58% of the genome and long interspersed nuclear elements (LINEs) for 40.30%, whereas short interspersed nuclear elements (SINEs) and long terminal repeats (LTRs) accounted for 0.01% and 30.07%, respectively (Table 4 and Fig. 4).
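TE landscapes are built from the Kimura two-parameter distance between each repeat copy and its consensus, d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transition and transversion differences. The sketch below computes this plain formula for two aligned sequences, skipping gaps and ambiguous bases. To our understanding, RepeatMasker's calcDivergenceFromAlign.pl also adjusts for CpG sites by default, which this sketch does not attempt; the function name is illustrative.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("divergence too high for the Kimura correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


consensus = "ACGTACGTAC"
te_copy = "GCGTACGTAC"  # one A->G transition across 10 sites
print(round(kimura_2p(consensus, te_copy), 4))
```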

Table 4 Classification of repetitive sequences in the P. ornatus genome.

To annotate noncoding RNAs (ncRNAs) in the P. ornatus genome, we used dedicated tools for each ncRNA class. tRNAs were predicted with tRNAscan-SE (version 1.4)³¹ and rRNAs with BLAST (version 2.2.26)³². Other ncRNAs, such as miRNAs and snRNAs, were identified by searching against the Rfam database³³ with INFERNAL (version 1.0)³⁴. In total, we identified four types of ncRNA in the P. ornatus genome: 12,771 miRNAs, 5,187 tRNAs, 1,716 rRNAs, and 1,296 snRNAs (Table 5).

Table 5 Classification of ncRNAs in the P. ornatus genome.

Protein-coding gene prediction and annotation

To predict gene structures in the P. ornatus genome, we combined de novo, homology-based, and transcriptome-based approaches. For de novo prediction, we used Augustus (v3.2.3)³⁵, GlimmerHMM (v3.02)³⁶, SNAP (v2013.11.29)³⁷, Geneid (v1.4)³⁸, and Genscan (v1.0)³⁹ to predict gene structures directly from the genome sequence. For homology-based annotation, the protein sequences of Portunus trituberculatus (swimming crab), Cherax quadricarinatus (Australian red claw crayfish), Penaeus vannamei (Pacific white shrimp), Procambarus virginalis (marbled crayfish), Homo sapiens (human), Drosophila melanogaster (fruit fly), Tribolium castaneum (red flour beetle), Caenorhabditis elegans (nematode), and Crassostrea gigas (Pacific oyster) were downloaded from NCBI GenBank and aligned against the spiny lobster genome using BLAST (v2.2.26)³² and GeneWise (v2.4.1)⁴⁰. Combining these independent lines of evidence improves the completeness and accuracy of protein-coding gene prediction. Depending on the reference species, homology-based prediction identified between 5,087 and 58,220 genes across the nine species (Table 6). We also analyzed the lengths of genes, CDS, exons, and introns in P. ornatus and compared them with those of five other species (Fig. 5). The average lengths in P. ornatus were 29,875.91 bp for transcripts, 1,420.49 bp for CDS, 257.65 bp for exons, and 6,300.84 bp for introns (Table S4).
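Average transcript, CDS, exon, and intron lengths of the kind summarized in Table S4 can be computed from a GFF3 annotation. The minimal sketch below assumes a GFF3 file in which exon and CDS features point to their mRNA through the Parent attribute, and it derives introns as the gaps between consecutive exons of the same transcript. It is not the authors' script, and the names and the toy annotation are illustrative.

```python
from collections import defaultdict
from statistics import mean


def parse_gff3(lines):
    """Collect mRNA spans and exon/CDS segments keyed by transcript ID."""
    transcripts = {}
    exons = defaultdict(list)
    cds = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        if ftype == "mRNA" and "ID" in attrs:
            transcripts[attrs["ID"]] = (start, end)
        elif ftype in ("exon", "CDS"):
            target = exons if ftype == "exon" else cds
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    target[parent].append((start, end))
    return transcripts, exons, cds


def _avg(values):
    return mean(values) if values else 0.0


def structure_stats(transcripts, exons, cds):
    """Mean transcript, per-transcript CDS, exon and intron lengths (1-based, inclusive)."""
    transcript_lengths = [end - start + 1 for start, end in transcripts.values()]
    cds_lengths = [sum(end - start + 1 for start, end in segs) for segs in cds.values()]
    exon_lengths = [end - start + 1 for segs in exons.values() for start, end in segs]
    intron_lengths = []
    for segs in exons.values():
        ordered = sorted(segs)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
            intron_lengths.append(next_start - prev_end - 1)
    return {"transcript": _avg(transcript_lengths), "cds": _avg(cds_lengths),
            "exon": _avg(exon_lengths), "intron": _avg(intron_lengths)}


def _row(ftype, start, end, attrs):
    # Hypothetical feature on a made-up sequence "chr1".
    return "\t".join(["chr1", "demo", ftype, str(start), str(end), ".", "+", ".", attrs])


example_gff = [
    "##gff-version 3",
    _row("mRNA", 1, 1000, "ID=t1"),
    _row("exon", 1, 100, "Parent=t1"),
    _row("exon", 401, 500, "Parent=t1"),
    _row("exon", 901, 1000, "Parent=t1"),
    _row("CDS", 51, 100, "Parent=t1"),
    _row("CDS", 401, 500, "Parent=t1"),
    _row("CDS", 901, 950, "Parent=t1"),
]
print(structure_stats(*parse_gff3(example_gff)))
```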

Table 6 Statistical analyses of gene structure annotation of the P. ornatus genome.

Clean RNA-seq reads were assembled with two approaches: genome-guided transcript assembly and de novo assembly with Trinity (version 2.11.0)⁴¹. Open reading frames (ORFs) were identified using PASA (version 2.1.0)⁴², and the gene sets predicted by the different methods were merged with EvidenceModeler (version 1.1.1)⁴³ into a comprehensive, non-redundant set of 22,752 protein-coding genes (Table 7 and Fig. 6a).
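ORF calling inside PASA is more involved than a simple scan; as a much simpler illustration of what ORF identification means, the sketch below searches all six reading frames of a transcript for the longest stretch from an ATG start codon to the next in-frame stop codon. It is a toy under stated assumptions (standard genetic code, ATG-only starts, complete ORFs only) and not PASA's algorithm; the function names are hypothetical.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf_on_strand(seq: str) -> str:
    """Longest ATG...stop ORF in the three forward reading frames of `seq`."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def longest_orf(seq: str) -> str:
    """Longest complete ORF on either strand (ties go to the forward strand)."""
    forward = longest_orf_on_strand(seq)
    reverse = longest_orf_on_strand(reverse_complement(seq))
    return forward if len(forward) >= len(reverse) else reverse


print(longest_orf("CCATGAAATTTTAGCC"))
```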

Table 7 Statistical analysis of functional gene annotations of the P. ornatus genome.

We functionally annotated the protein-coding genes by aligning them against several protein databases, including SwissProt⁴⁶, NCBI non-redundant protein (NR), KEGG⁴⁷, InterPro⁴⁸, Gene Ontology (GO)⁴⁹, and Pfam⁵⁰, using BLASTP (version 2.2.26)⁴⁴ and DIAMOND (version 0.8.22)⁴⁵ with an E-value cutoff of 1E-5. Protein domains and motifs were further annotated using InterProScan (version 5.52-86.0)⁵¹. In total, 22,568 (99.20%) of the 22,752 predicted genes were annotated by at least one of these databases (Table 7), and 14,884 genes (65.42% of all predicted genes) were supported by all four databases compared in Fig. 6b.
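The annotation rate is the fraction of predicted genes with at least one significant hit. The sketch below assumes BLAST or DIAMOND tabular output in the standard 12-column layout (outfmt 6), in which the query ID is the first column and the E-value the eleventh. It keeps queries with a hit at or below the 1E-5 cutoff, then takes the union and intersection across databases, as in a Venn diagram. Database names and inputs are hypothetical.

```python
def significant_queries(tabular_lines, evalue_cutoff=1e-5):
    """Query IDs with at least one hit at or below the E-value cutoff (outfmt 6)."""
    hits = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if float(cols[10]) <= evalue_cutoff:  # column 11 = evalue
            hits.add(cols[0])                 # column 1 = qseqid
    return hits


def annotation_summary(hits_by_db, total_genes):
    """Genes annotated by any database, by every database, and the any-database rate (%)."""
    any_db = set().union(*hits_by_db.values())
    all_db = set.intersection(*hits_by_db.values()) if hits_by_db else set()
    return len(any_db), len(all_db), 100.0 * len(any_db) / total_genes


def _hit(query, evalue):
    # Minimal 12-column outfmt 6 row; only qseqid and evalue matter here.
    return "\t".join([query, "subj", "90.0", "100", "10", "0",
                      "1", "100", "1", "100", evalue, "200"])


hits_by_db = {
    "db_a": significant_queries([_hit("g1", "1e-30"), _hit("g2", "1e-8"), _hit("g3", "0.01")]),
    "db_b": significant_queries([_hit("g1", "2e-12"), _hit("g3", "1e-6")]),
}
print(annotation_summary(hits_by_db, total_genes=4))
```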

Data Records

We deposited the genomic Illumina sequencing data in the SRA at NCBI SRR26801482⁵² and SRR26801483⁵³.

We deposited the genomic PacBio sequencing data in the SRA at NCBI SRR26801477⁵⁴ and SRR26801478⁵⁵.

We deposited the transcriptomic sequencing data in the SRA at NCBI SRR26945899–SRR26945906^56–63.

We deposited the Hi-C sequencing data in the SRA at NCBI SRR26801479–SRR26801481^64,65,66.

This Whole Genome Shotgun project has been deposited at GenBank under the accession https://identifiers.org/ncbi/insdc.gca:GCA_036320965.1⁶⁷. The version described in this paper is version ASM3632096v1. The final chromosome assembly and genome annotation files are also available in Figshare⁶⁸.

Technical Validation

Evaluation of genome assembly and annotation

We evaluated the quality of the P. ornatus genome assembly using multiple methods. First, a Benchmarking Universal Single-Copy Orthologs (BUSCO, version 3.0.2)⁶⁹ assessment against the arthropoda_odb9 dataset, which uses tblastn, AUGUSTUS, and HMMER, found 93.6% of the conserved single-copy orthologs to be complete and 3.2% fragmented in P. ornatus (96.8% present in total), indicating a highly complete assembly (Table S5). Second, the Core Eukaryotic Genes Mapping Approach (CEGMA, version 2.5)⁷⁰ identified homologs of 226 of the 248 highly conserved core eukaryotic genes (91.13%) in P. ornatus, further supporting the completeness of the assembly (Table S6). Finally, aligning the Illumina reads to the nuclear genome with BWA (version 0.7.8)⁷¹ yielded a mapping rate of 97.85% and a coverage rate of 96.80%, indicating high assembly integrity and homogeneous sequencing data (Table S7). Together, these results indicate that the P. ornatus genome assembly is of high quality.
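BUSCO reports completeness as percentages of the lineage's single-copy orthologs in four categories, complete single-copy (S), complete duplicated (D), fragmented (F), and missing (M), summarized on one line in the form C:x%[S:x%,D:x%],F:x%,M:x%,n:N. The sketch below formats such a line from raw counts and reproduces the CEGMA ratio quoted above. The counts in the BUSCO example are made up for illustration and are not those in Table S5.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format BUSCO-style category counts as a one-line percentage summary."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO groups counted")

    def pct(count: int) -> float:
        return 100.0 * count / n

    complete = single + duplicated
    return (f"C:{pct(complete):.1f}%[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
            f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}")


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimal places."""
    return round(100.0 * part / whole, 2)


print(busco_summary(single=90, duplicated=5, fragmented=3, missing=2))  # illustrative counts
print(percent(226, 248))  # CEGMA core genes recovered in P. ornatus
```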

Collinearity analysis

For whole-genome synteny comparison, we aligned the chromosome-level genomes of two decapod species, Penaeus chinensis and Procambarus clarkii, to the P. ornatus genome assembly using LASTZ (version 1.02.00)⁷² with default parameters. Nearly all of the 73 chromosome-level scaffolds of P. ornatus showed significant similarity to corresponding chromosomes of P. chinensis and P. clarkii (Fig. 7). This correspondence supports the accuracy of the P. ornatus sequencing and assembly and its suitability for comparative and phylogenetic analyses.
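Chromosome correspondence of the kind shown in Fig. 7 can be summarized by assigning each P. ornatus chromosome the partner chromosome with which it shares the most aligned bases. The sketch below assumes the LASTZ alignments have already been reduced to (query chromosome, target chromosome, aligned length) tuples; that parsing step, the chromosome names, and the function name are hypothetical.

```python
from collections import defaultdict


def best_partners(blocks):
    """For each query chromosome, return (best target chromosome, share of aligned bases)."""
    totals = defaultdict(lambda: defaultdict(int))
    for query_chr, target_chr, aligned_bp in blocks:
        totals[query_chr][target_chr] += aligned_bp
    partners = {}
    for query_chr, targets in totals.items():
        best_target, best_bp = max(targets.items(), key=lambda item: item[1])
        partners[query_chr] = (best_target, best_bp / sum(targets.values()))
    return partners


blocks = [
    ("Por_chr1", "Tgt_chr7", 900_000),
    ("Por_chr1", "Tgt_chr2", 100_000),
    ("Por_chr2", "Tgt_chr2", 500_000),
]
print(best_partners(blocks))
```

A share close to 1 indicates a one-to-one chromosome correspondence, while lower shares point to rearrangements or fragmented alignments.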

In conclusion, we successfully assembled a high-quality chromosome-level genome of P. ornatus. This newly generated reference genome represents a significant contribution to our knowledge of lobster genetic diversity. It will not only advance comparative evolutionary studies but also play a crucial role in conservation efforts for this endangered species.

Code availability

We detail all commands and pipelines employed for data processing in the methods section. For any software where specific parameters were not mentioned, we used the default settings recommended by the software developers. The core code is available at https://github.com/sundongfang/Chromosome-level-genome-of-Panulirus-ornatus.

References

1. Radhakrishnan, E. V. et al. Lobsters: biology, fisheries and aquaculture. Springer Nature Singapore Pte Limited (2019).
2. Chan, T. Y. Updated checklist of the world’s marine lobsters. In Lobsters: biology, fisheries and aquaculture (pp. 35–64). Springer, Singapore (2019).
3. Vogt, G. Ageing and longevity in the Decapoda (Crustacea): a review. Zool. Anz. 251, 1–25 (2012).
4. Vogt, G. How to minimize formation and growth of tumours: potential benefits of decapod crustaceans for cancer research. Int. J. Cancer 123, 2727–2734 (2008).
5. Priyambodo, B., Jones, C. M. & Sammut, J. Assessment of the lobster puerulus (Panulirus homarus and Panulirus ornatus, Decapoda: Palinuridae) resource of Indonesia and its potential for sustainable harvest for aquaculture. Aquaculture 528, 735563 (2020).
6. Sachlikidis, N. G., Jones, C. M. & Seymour, J. E. The effect of temperature on the incubation of eggs of the tropical rock lobster Panulirus ornatus. Aquaculture 305, 79–83 (2010).
7. Lewis, C. L., Fitzgibbon, Q. P., Smith, G. G., Elizur, A. & Ventura, T. Transcriptomic analysis and time to hatch visual prediction of embryo development in the ornate spiny lobster (Panulirus ornatus). Front. Mar. Sci. 9, 1009 (2022).
8. Chen, J. F., Wu, X. J., Lin, H. & Cui, G. F. A comparative analysis of the List of State Key Protected Wild Animals and other wildlife protection lists. Biodiversity Science 31, 22639 (2023).
9. Bauer, R. T. Fisheries and aquaculture. In Shrimps: Their Diversity, Intriguing Adaptations and Varied Lifestyles (pp. 583–655). Springer International Publishing, Cham (2023).
10. Leiva, L. et al. European lobster larval development and fitness under a temperature gradient and ocean acidification. Front. Physiol. 13, 809929 (2022).
11. Veldsman, W. P. et al. Comparative genomics of the coconut crab and other decapod crustaceans: exploring the molecular basis of terrestrial adaptation. BMC Genomics 22, 1–15 (2021).
12. Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2, e107 (2023).
13. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
14. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
15. Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).
16. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
17. Zhao, H., Lai, Z. & Chen, Y. Global-and-local-structure-based neural network for fault detection. Neural Networks 118, 43–53 (2019).
18. Chakraborty, M., Baldwin-Brown, J. G., Long, A. D. & Emerson, J. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res. 44, e147 (2016).
19. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
20. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
21. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
22. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
23. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl. 1), i351–i358 (2005).
24. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
25. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.10 (2009).
26. Edgar, R. C. & Myers, E. W. PILER: identification and classification of genomic repeats. Bioinformatics 21(Suppl. 1), i152–i158 (2005).
27. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
28. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
29. Hubley, R. GitHub repository, https://github.com/rmhubley/RepeatMasker/blob/master/util/createRepeatLandscape.pl (2023).
30. Rosen, J. GitHub repository, https://github.com/rmhubley/RepeatMasker/blob/master/util/calcDivergenceFromAlign.pl (2020).
31. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).
32. Mount, D. W. Using the Basic Local Alignment Search Tool (BLAST). CSH Protoc. 2007, pdb.top17 (2007).
33. Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
34. Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
35. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–W439 (2006).
36. Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
37. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 1–9 (2004).
38. Blanco, E., Parra, G. & Guigó, R. Using geneid to identify genes. Curr. Protoc. Bioinformatics Chapter 4, Unit 4.3 (2007).
39. Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
40. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
41. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
42. Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
43. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
44. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
45. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
46. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
47. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
48. Finn, R. D. et al. InterPro in 2017: beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199 (2017).
49. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
50. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230 (2014).
51. Mulder, N. & Apweiler, R. InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol. Biol. 396, 59–70 (2007).
52. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26801482 (2023).
53. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26801483 (2023).
54. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26801477 (2023).
55. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26801478 (2023).
56. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26945899 (2023).
57. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26945900 (2023).
58. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26945901 (2023).
59. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26945902 (2023).
60. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26945903 (2023).
61. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26945904 (2023).
62. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26945905 (2023).
63. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26945906 (2023).
64. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26801479 (2023).
65. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26801480 (2023).
66. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR26801481 (2023).
67. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_036320965.1 (2024).
68. Ren, X. Y. The chromosome-level genome of the long-tailed marine-living ornate spiny lobster, Panulirus ornatus. Figshare https://doi.org/10.6084/m9.figshare.24654915.v1 (2023).
69. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
70. Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
71. Li, H. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
72. Harris, R. S. Improved pairwise alignment of genomic DNA. Ph.D. dissertation, The Pennsylvania State University, Pennsylvania (2007).

Acknowledgements

This work was supported by the project of the First National Survey of Aquaculture Germplasm Resources in the Yellow and Bohai Seas (17210247), the China Agriculture Research System (CARS-48), the Basic scientific research business expenses of Chinese Academy of Fishery Sciences of “Innovation team project of ecological aquaculture in seawater pond” (2020td46), and Central Public-interest Scientific Institution Basal Research Fund, CAFS (2023TD50).

Author information

These authors contributed equally: Xianyun Ren, Dongfang Sun, Jianjian Lv.

Authors and Affiliations

National Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, Shandong, 266071, China
Xianyun Ren, Dongfang Sun, Jianjian Lv, Baoquan Gao, Shaoting Jia, Xueqiong Bian, Kuangcheng Zhao, Jitao Li, Ping Liu & Jian Li
Laboratory for Marine Fisheries Science and Food Production Processes, Laoshan Laboratory, Qingdao, Shandong, 266237, China
Xianyun Ren, Dongfang Sun, Jianjian Lv, Baoquan Gao, Shaoting Jia, Xueqiong Bian, Kuangcheng Zhao, Jitao Li, Ping Liu & Jian Li
College of Fisheries and Life Science, Shanghai Ocean University, Shanghai, PR China
Xueqiong Bian

Contributions

P.L. and J.L. conceived the study and supervised the project. X.Y.R. collected the sample and wrote the manuscript. D.F.S. and J.J.L. performed the data analysis and data uploading. B.Q.G. and J.T.L. supervised this work and assisted in data analysis. S.T.J., X.Q.B. and K.C.Z. collected the samples. All authors contributed to the final manuscript editing.

Corresponding authors

Correspondence to Ping Liu or Jian Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Figure S1, Figure S2

Table S1, Table S2, Table S3, Table S4, Table S5, Table S6, Table S7

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ren, X., Sun, D., Lv, J. et al. Chromosome-level genome of the long-tailed marine-living ornate spiny lobster, Panulirus ornatus. Sci Data 11, 662 (2024). https://doi.org/10.1038/s41597-024-03512-9

Received: 02 February 2024
Accepted: 12 June 2024
Published: 22 June 2024
DOI: https://doi.org/10.1038/s41597-024-03512-9
Springer Nature Limited
