Abstract
Understanding the genomic underpinnings of thermal adaptation is a hot topic in eco-evolutionary studies of parasites. Marine heteroxenous parasites have complex life cycles encompassing a free-living larval stage, an ectothermic intermediate host and a homeothermic definitive host, thus representing compelling systems for the study of thermal adaptation. The Antarctic anisakid Contracaecum osculatum sp. D is a marine parasite able to survive and thrive both at very cold and warm temperatures within the environment and its hosts. Here, a de novo transcriptome of C. osculatum sp. D was generated for the first time, by performing RNA-Seq experiments on a set of individuals exposed to temperatures experienced by the nematode during its life cycle. The analysis generated 425,954,724 reads, which were assembled and then annotated. The high-quality assembly was validated, achieving over 88% mapping against the transcriptome. The transcriptome of this parasite will represent a valuable genomic resource for future studies aimed at disentangling the genomic architecture of thermal tolerance and metabolic pathways related to temperature stress.
Background & Summary
Temperature is a critical factor in marine environments that plays a significant role in shaping the evolution of life1,2. In particular, fluctuations in temperature experienced in the marine realm pose significant challenges for the organisms. They have evolved numerous adaptations to maintain homeostasis across a wide thermal range, including modifications to membrane lipids, enzymes, and metabolic pathways, as well as changes in gene expression profiles1,2.
Recent advances in -omics sciences have allowed researchers to investigate the genomic foundations of thermal adaptation processes, which are highly relevant in evolutionary research. Genome and transcriptome sequencing and assembly have yielded valuable insights into the genetic mechanisms that enable diverse marine organisms to survive and thrive in environments with extreme conditions3,4,5,6. However, while extensive research has been conducted on organisms that thrive in extreme temperatures, there is currently a notable gap in studies focusing on the molecular mechanisms that underlie the response to thermal fluctuations.
Marine anisakid parasites, which are particularly subject to temperature fluctuations because of their association with both ectothermic and/or homeothermic hosts7, provide a particularly interesting system for studying this phenomenon. These parasites employ a range of strategies, including heat-tolerant and freeze-avoiding mechanisms, to ensure their survival and adaptability1. The Antarctic anisakid nematode Contracaecum osculatum sp. D inhabits the Earth’s coldest marine ecosystem, the Antarctic Sea8, and exhibits exceptional thermal adaptability. From the first larval stage to the adult stage, this parasite species experiences a range of temperatures, and it is capable of surviving and thriving at both cold and warm temperatures during the different stages of its heteroxenous life cycle. Specifically, it undergoes a free-living larval stage, a third larval stage in ectothermic hosts (invertebrates and icefish), and the adult stage in homeothermic pinniped hosts (i.e. the Weddell seal, Leptonychotes weddellii)9. As a consequence, C. osculatum sp. D may be exposed to subzero temperatures, likely exhibits tolerance to thermal stress to maintain homeostasis, and possesses adaptive features that allow it to thrive at the body temperature of its definitive host.
Temperature can have significant effects on the development of the parasite in seawater, its relationship with the host, and its transmission to its hosts. Temperatures experienced during different stages of the life cycle can directly or indirectly drive the population dynamics of the infection cycles of the parasite. Despite these challenges, the parasite species has been found to maintain a consistently high population size in its intermediate/paratenic and definitive hosts over time (from 1995 to 2014), indicating that parasite fitness has been successfully maintained through evolutionary adaptations8,9.
Recent advances in the -omic sciences applied to marine anisakid nematodes10,11,12,13,14,15,16,17 provide new opportunities for investigating the genetic basis of the evolutionary traits that underlie the adaptation of these heteroxenous parasites to their hosts.
Therefore, the aim of the present study was to provide a transcriptomic resource for investigating the genetic underpinnings of thermal adaptation, by disclosing putative genes and metabolic pathways involved in the thermal adaptation hypothesised for the Antarctic parasite C. osculatum sp. D throughout its life cycle.
The resource generated by the present analysis will support a deeper understanding of the evolutionary forces that have shaped the genomic architecture of thermally adapted marine organisms, particularly these anisakid parasites, and provide valuable insights into their survival strategies in the challenging marine environment. Moreover, these advances in -omic sciences hold great promise for uncovering the molecular basis of thermal adaptation in marine organisms and for promoting conservation efforts aimed at protecting Antarctic marine ecosystems, considering the ongoing environmental change in the oceans.
Methods
Sample collection and RNA preparation
Contracaecum osculatum (sensu lato) third-stage larvae (L3) were extracted from the body cavity of the icefish Chionodraco hamatus, caught in the Ross Sea, Antarctica. The parasitological examination was conducted on-site at the Zucchelli Station in the Ross Sea, during the expedition financed by the Italian PNRA-MUR 2019 (National Antarctic Research Program-Ministry of University and Research). Larvae were carefully removed using scissors and tweezers. The L3 were then examined for integrity under a dissecting microscope, and their vitality was assessed based on their spontaneous movements. Live, intact larvae were washed following a previously reported procedure17. Larvae were then cultured under different thermal profiles, which are believed to mimic the temperature conditions experienced by the Antarctic anisakid during its life cycle (i.e., −2 °C, 1 °C, 37 °C) (Table 1). At specific time intervals, N = 54 L3 were promptly preserved in RNAlater solution until further analyses. RNA and DNA were extracted from whole L3 using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), according to the manufacturer’s instructions with some modifications as previously reported18. DNA was used to identify L3 at the species level following procedures previously reported8. RNA was treated with DNase (DNase I, Invitrogen) according to the manufacturer’s instructions. Subsequently, RNA from each group of three individuals belonging to the same experimental condition was pooled. The quality and concentration of RNA were evaluated using a spectrophotometer (NanoDrop® 8000, Thermo Fisher Scientific, US) and a Bioanalyzer (Agilent 2100, Agilent Technologies, Santa Clara, USA).
Library preparation and sequencing
mRNA sequencing libraries were prepared using the Illumina TruSeq stranded mRNA library prep kit, according to the manufacturer’s instructions. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads and then fragmented. The resulting cleaved RNA fragments, primed with random hexamers, were reverse transcribed into first-strand cDNA using reverse transcriptase; second-strand cDNA was then synthesised with dUTP in place of dTTP. The incorporation of dUTP quenches the second strand during amplification, since the polymerase cannot extend beyond this nucleotide. The generated cDNA fragments were further processed by adding a single ‘A’ base and ligating the adapter. The resulting products were then purified and enriched using PCR to generate a final strand-specific cDNA library. The quality of the amplified libraries was assessed using capillary electrophoresis (Bioanalyzer, Agilent).
After quantification by qPCR using the SYBR Green PCR Master Mix (Applied Biosystems), the index-tagged libraries were pooled in equimolar amounts. Cluster generation took place in the flow cell using the cBot automated cluster generation system (Illumina). Subsequently, the flow cell was loaded onto the NovaSeq 6000 sequencing system (Illumina) for sequencing, employing a read length of 2 × 100 base pairs (bp). The sequencing data are available at the NCBI Sequence Read Archive (Table 1).
Pre-assembly processing stage
The RNA-seq data obtained from C. osculatum sp. D L3 were then processed for transcriptome assembly. The workflow of the bioinformatics pipelines, adapted from two previous studies19,20, is illustrated in Figure 1. The bioinformatic analyses were conducted using the High-Performance Computing resources supplied by ELIXIR-IT HPC@CINECA21,22,23,24,25. A total of 425,954,724 pairs of reads were generated, all of which underwent cleaning and quality analysis. Read quality control was performed by running FastQC (v. 0.11.5) (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc) before and after the trimming phase. Detailed quality results are freely available in the figshare archive (Table 2). Quality evaluation metrics of the trimmed data were aggregated for all samples into a single report for concise visualisation using the software tool MultiQC (v. 1.9)26. Raw reads underwent a quality trimming phase using Trimmomatic (v. 0.39)27 to remove low-quality bases and adapter sequences, with the following parameters: SLIDINGWINDOW:4:15, MINLEN:36 and HEADCROP:13. Unpaired reads were discarded and excluded from the rest of the assembly pipeline. After the cleaning and removal of low-quality reads, a total of 387,489,135 reads were retained for de novo transcriptome assembly, corresponding to 92% of the raw reads (Table 1).
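To make the three trimming parameters concrete, the following is a simplified Python sketch of what each step does to a single read. It is not Trimmomatic's implementation: the function names are hypothetical, the sliding-window rule is reduced to its basic form, and the steps are applied in the order in which they are listed above, which is an assumption (Trimmomatic applies steps in the order given on its command line, which is not reported here).

```python
def sliding_window_cut(quals, window=4, min_avg=15):
    """Return how many 5' bases to keep: the read is cut at the start of the
    first window whose mean Phred quality falls below min_avg."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def trim_read(seq, quals, window=4, min_avg=15, min_len=36, head_crop=13):
    """Apply the three steps in the order listed in the text (an assumption).

    Returns the trimmed (seq, quals) pair, or None if the read is dropped.
    """
    keep = sliding_window_cut(quals, window, min_avg)  # SLIDINGWINDOW:4:15
    seq, quals = seq[:keep], quals[:keep]
    if len(seq) < min_len:                             # MINLEN:36
        return None
    return seq[head_crop:], quals[head_crop:]          # HEADCROP:13


# Toy read: 50 high-quality bases followed by 10 low-quality bases.
read = trim_read("A" * 60, [35] * 50 + [5] * 10)
print(None if read is None else len(read[0]))
```

In a paired-end run, a pair would be retained only if both mates survive, which corresponds to discarding the unpaired reads as described above.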
De novo transcriptome assembly and quality assessment
Given the unavailability of a reference genome for C. osculatum sp. D, a de novo transcriptome assembly procedure was carried out. To ensure the construction of an optimized transcriptome and avoid chimeric transcripts, we used rnaSPAdes28, a tool for de novo transcriptome assembly implemented in the SPAdes package (v. 3.14.1). rnaSPAdes automatically detected two k-mer sizes, approximately one-third and half of the maximum read length (the detected k-mer sizes were 45 and 67 nucleotides, respectively). A total of 237,314 assembled contigs were generated from the rnaSPAdes runs, with an N50 of 1,783 bp (Table 3). Two filtering steps were performed to remove assembly redundancies. The first step was performed by running CD-HIT-est (v. 4.8.1) on the rnaSPAdes output; the result was uploaded to figshare (Table 2). The final assembly was produced by Corset (v. 1.06)29, a tool we employed based on its successful application in a previous study30. The Corset output showed an N50 of 1,871 bp (Table 3).
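For readers unfamiliar with the N50 statistic reported for both assemblies, the sketch below shows the standard definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative only and are not taken from the assembly described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one contig length")


# Toy example: total length 20, half is 10; 6 + 5 = 11 reaches it.
print(n50([2, 3, 4, 5, 6]))  # 5
```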
Thanks to the two-step process of removing redundancies, which effectively reduces assembly chimeras and improves the accuracy of subsequent analyses, the final assembly contained about 20% of the original transcripts.
The validation process encompassed two distinct phases applied to the assembly outcomes. The initial phase aimed at evaluating the preliminary assembly, followed by another phase aimed at appraising the quality of the final, non-redundant assembly. Two distinct tools were employed: TransRate (v. 1.0.3)31 and BUSCO (Benchmarking Universal Single-Copy Orthologs) (v. 5.4.4)32. These tools generated an array of metrics, serving as a valuable compass for identifying potential errors within the assembly process and offering evidence about the quality of the de novo assembled transcriptome. The TransRate results (Table 3) also include ‘good contigs’ values, which represent the number of contigs in the assembly classified as high quality by the validator, and the ‘p good contigs’ value, which indicates the percentage of high quality contigs compared to the total number of contigs in the assembly. BUSCO provides a quantitative measure of transcriptome quality and completeness, founded on evolutionarily derived predictions of gene content from databases housing nearly universal and ultra conserved protein orthologs. The analysis of gene content was carried out by conducting BUSCO assessments on three orthologous gene databases: Nematoda, Metazoa, and Eukaryota. The completeness of the transcriptome by BUSCO is reported in Table 4. Figure 2 illustrates completed, fragmented and missing genes mapped from the three databases.
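BUSCO reports its result as a one-line summary of complete (C), single-copy (S), duplicated (D), fragmented (F) and missing (M) percentages. As a minimal sketch of how such a line could be turned into numbers for tabulation, the Python function below parses that notation; the function name is hypothetical and the example values are invented for illustration, not the values in Table 4.

```python
import re

# Pattern for the BUSCO one-line summary notation,
# e.g. "C:90.0%[S:80.0%,D:10.0%],F:6.0%,M:4.0%,n:1000".
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Extract the C/S/D/F/M percentages and the number of BUSCOs searched."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    result = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    return result


# Invented example values, for illustration only.
summary = parse_busco_summary("C:90.0%[S:80.0%,D:10.0%],F:6.0%,M:4.0%,n:1000")
print(summary["C"], summary["n"])
```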
Generation of the full-length transcriptomes
After the validation and evaluation phase, the output of the assembly process was used as input for the CD-HIT-est program33. This hierarchical clustering tool was employed to reduce redundancy among transcripts and to address fragmented assemblies in the de novo assembly process, yielding unique genes. CD-HIT-est was executed with default settings, resulting in a 95% similarity threshold. To refine the final transcriptome dataset, an additional hierarchical clustering phase was performed using Corset, which groups related transcripts based on their expression patterns, effectively identifying and merging isoforms and transcriptional variants. This phase significantly improves the accuracy and completeness of the transcriptome assembly. Subsequently, the output of Corset was validated with BUSCO, and quality was further assessed with HISAT2 (v. 2.1)34 by mapping the trimmed reads to the reference transcriptome (unigenes). The HISAT2 mapping rate was at least 88% (Fig. 3), indicating the fraction of RNA-seq reads represented in the assembled transcriptome.
Fig. 3: For each sample, the total paired reads are shown in blue, the paired reads remaining after adapter removal and quality trimming are shown in orange, and the trimmed paired reads mapped against the de novo assembled Contracaecum osculatum sp. D transcriptome are shown in green.
The results of all validation phases are shown in Table 2 and discussed in the “Technical Validation” section.
The Corset output was processed with TransDecoder (v. 5.7.0)35,36, a current standard tool for identifying long open reading frames (ORFs) within assembled transcripts, using default parameters. TransDecoder performs ORF prediction on both transcript strands, irrespective of the sequenced library. Additionally, it evaluates ORF completeness and discerns potential 5′ end incompleteness by detecting any length of amino acid (AA) codons downstream of a start codon (M) without a stop codon. The “Longest ORF” criterion was employed, which identifies the earliest 5′ AUG codon as the translation start site.
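The “Longest ORF” idea can be illustrated with a short Python sketch. It is not TransDecoder's algorithm: it scans the three frames of both strands, opens an ORF at the earliest in-frame ATG (the DNA equivalent of AUG), closes it at the first in-frame stop codon, and returns the longest complete ORF found. Partial ORFs, minimum-length filters and coding-potential scoring are deliberately left out, and all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def complete_orfs(seq):
    """Yield ATG..stop ORFs from the three forward frames of seq.

    Within a frame, an ORF starts at the earliest ATG seen since the
    previous stop codon, so later in-frame ATGs do not restart it.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOP_CODONS and start is not None:
                yield seq[start:i + 3]
                start = None


def longest_orf(seq):
    """Return the longest complete ORF on either strand ('' if none)."""
    seq = seq.upper()
    candidates = list(complete_orfs(seq))
    candidates += list(complete_orfs(reverse_complement(seq)))
    return max(candidates, key=len, default="")


# The ORF starts at the first ATG, not at the internal one.
print(longest_orf("ccatgaaaatgccctaggg"))  # ATGAAAATGCCCTAG
```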
Transcriptome annotation
We employed different kinds of annotation for the de novo assembly. We used DIAMOND37, an open-source algorithm based on double indexing that is reported to be 20,000 times faster than BLASTX on short reads while having a similar degree of sensitivity. Like BLASTX, DIAMOND attempts to determine exhaustively all significant alignments for a given query. Most sequence comparison programs, including BLASTX, follow the seed-and-extend paradigm. In this two-phase approach, the program first searches for matches of seeds (short stretches of the query sequence) in the reference database, and then runs an ‘extend’ phase that aims to compute a full alignment. The following parameter settings were applied: DIAMOND-fast: DIAMOND BLASTX -t 48 -k 250 -min-score 40; DIAMOND-sensitive: DIAMOND BLASTX -t 48 -k 250 -sensitive -min-score 40.
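The seed-and-extend paradigm can be sketched in a few lines of Python. This toy version is only meant to show the two phases and is far simpler than BLASTX or DIAMOND: seeds are exact k-mers, the extension is ungapped and stops at the first mismatch, and no scoring matrix, spaced seeds or double indexing are involved. The function names and the example sequences are invented.

```python
def build_seed_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def extend_seed(query, reference, q, r, k):
    """Extend an exact seed match left and right until the first mismatch.

    Returns (query_start, reference_start, match_length).
    """
    q_left, r_left = q, r
    while q_left > 0 and r_left > 0 and query[q_left - 1] == reference[r_left - 1]:
        q_left -= 1
        r_left -= 1
    q_right, r_right = q + k, r + k
    while (q_right < len(query) and r_right < len(reference)
           and query[q_right] == reference[r_right]):
        q_right += 1
        r_right += 1
    return q_left, r_left, q_right - q_left


def seed_and_extend(query, reference, k=3):
    """Return the longest exact match found by seeding and extending."""
    index = build_seed_index(reference, k)
    best = (0, 0, 0)
    for q in range(len(query) - k + 1):              # phase 1: seed lookup
        for r in index.get(query[q:q + k], []):
            hit = extend_seed(query, reference, q, r, k)  # phase 2: extend
            if hit[2] > best[2]:
                best = hit
    return best


# Invented protein-like strings; the shared stretch is "AYIAKQ".
print(seed_and_extend("XXAYIAKQZZ", "MKTAYIAKQRQISFVK"))  # (2, 3, 6)
```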
Contigs were aligned with DIAMOND against the NCBI non-redundant (NR) protein database, which includes the non-redundant RefSeq proteins. In addition, the SwissProt and TrEMBL databases were also used to retrieve the best matching annotations for contigs. An annotation matrix was then generated by selecting the best hit for each database.
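Selecting the best hit per database can be sketched as follows. The sketch assumes that the search results are available in the 12-column BLAST tabular layout (query, subject, identity, length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score), and that "best" means the highest bit score; neither the output format nor the ranking criterion is stated in the text, and the function names and toy identifiers are hypothetical.

```python
def best_hits(lines):
    """Return {query: (subject, bitscore)} keeping the top-scoring hit."""
    best = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def annotation_matrix(results_by_db):
    """Combine per-database best hits into {query: {database: subject}}."""
    matrix = {}
    for database, lines in results_by_db.items():
        for query, (subject, _score) in best_hits(lines).items():
            matrix.setdefault(query, {})[database] = subject
    return matrix


def row(query, subject, bitscore):
    """Build one toy tabular line; the middle columns are placeholders."""
    middle = ["90.0", "100", "5", "0", "1", "300", "1", "100", "1e-50"]
    return "\t".join([query, subject] + middle + [str(bitscore)])


toy = {
    "Nr": [row("t1", "nr_A", 200.0), row("t1", "nr_B", 250.0)],
    "SwissProt": [row("t1", "sp_X", 120.0), row("t2", "sp_Y", 80.0)],
}
print(annotation_matrix(toy))
```

Transcripts without a hit in a given database simply lack that key, so the set of keys per transcript also shows which databases annotated it.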
After conducting the BLASTX analysis against Nr, TrEMBL and SwissProt, we obtained annotations for 29,694 (80.3%), 29,904 (80.9%) and 20,660 (55.9%) contigs, respectively. Similarly, using BLASTP versus Nr, TrEMBL and SwissProt, we annotated 24,366 (65.9%), 24,600 (66.5%) and 17,239 (46.6%) contigs, respectively.
All the information on the resulting datasets is summarised in Table 5. The overview of data files and data sets produced in this study are summarised in Table 2.
The BLASTX annotation yielded a total of 20,655 sequences mapped simultaneously to the three interrogated databases (i.e., Nr, SwissProt and TrEMBL). The BLASTP annotation yielded a total of 17,236 sequences mapped simultaneously to the three databases. Venn diagrams are presented in Fig. 4, showing the overlap of the annotations across the different databases for both DIAMOND BLASTX (Fig. 4a) and DIAMOND BLASTP (Fig. 4b). Furthermore, the ten most represented species and the top ten gene product hits obtained with BLASTX and BLASTP, respectively, by mapping the transcripts against the reference database Nr are shown in Figs. 5 and 6. The BLASTX approach translates nucleotide sequences in all six frames (three forward and three reverse) into protein sequences before conducting the search against protein databases, making it more exhaustive than the BLASTP approach, which directly aligns protein sequences against protein databases.
The total number of unigenes obtained from the transcriptome assembly was also mapped onto another database of functional annotations: EggNOG (Evolutionary genealogy of genes: Non-supervised Orthologous Groups)38. The EggNOG database incorporates various taxonomic levels of orthologous groups (OG) of proteins with functional annotations, using an algorithm that builds on previous orthologous group (COG) methodologies. This database offers detailed functional information for genes within each orthologous group and includes a wide range of sequenced genomes from different species, providing a robust evolutionary context for our data analysis. Of the 36,985 total predicted ORFs, 16,968 (45.9%) were annotated in the EggNOG database. For details, see Table 2.
Comparison with closest species through the orthologs
We compared the predicted ORFs from the de novo transcriptome of C. osculatum sp. D with both the predicted ORFs of the transcriptome of Anisakis pegreffii16 and the transcripts of A. simplex (sensu stricto)39. The reference transcriptome of A. simplex (s.s.) was produced with GffRead40, an open-source program to manipulate GFF and GTF format files. The identification and orthology grouping of all the proteins of the three species were performed using OrthoFinder (v. 2.5.5)41. This approach also served to assess the completeness of the assembly based on sequence similarity. OrthoFinder detects orthogroups, each defined as the set of genes descended from a single gene in the last common ancestor of the species considered42. Orthogroup detection demonstrated considerable overlap in transcript sequences among all three species: A. pegreffii, A. simplex (s.s.) and C. osculatum sp. D. More than 20% (8,348) of the transcripts identified as putative orthologs were shared between all three species (Fig. 7). We found that 15,452 transcripts (36.9%) in A. pegreffii, 4,049 transcripts (9.7%) in A. simplex (s.s.), and 3,378 transcripts (8.1%) in C. osculatum sp. D were classified as species-specific. Thus, the marked level of sequence overlap observed between transcriptomes further validates the completeness and quality of the assembly presented in this study.
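The distinction between shared and species-specific orthogroups can be sketched from a per-orthogroup gene count table of the kind OrthoFinder produces. The Python sketch below is an assumption about how such counts could be summarised, not the procedure used to obtain the figures above: the function name, the species labels and the toy counts are invented, and sequences left unassigned to any orthogroup are ignored.

```python
def classify_orthogroups(gene_counts):
    """Summarise orthogroups given {orthogroup: {species: gene_count}}.

    Every inner dict is assumed to list all species (with 0 for absence).
    Returns (genes_in_groups_shared_by_all, {species: species_specific_genes}).
    """
    shared_by_all = 0
    species_specific = {}
    for counts in gene_counts.values():
        present = [species for species, n in counts.items() if n > 0]
        n_genes = sum(counts.values())
        if len(present) == len(counts):
            shared_by_all += n_genes          # present in every species
        elif len(present) == 1:
            only = present[0]                 # present in a single species
            species_specific[only] = species_specific.get(only, 0) + n_genes
    return shared_by_all, species_specific


# Invented toy counts for three hypothetical species labels.
toy = {
    "OG1": {"Ap": 2, "As": 1, "Co": 1},
    "OG2": {"Ap": 3, "As": 0, "Co": 0},
    "OG3": {"Ap": 0, "As": 1, "Co": 2},
}
print(classify_orthogroups(toy))  # (4, {'Ap': 3})
```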
Data Records
All raw data generated in this project have been deposited in the National Center for Biotechnology Information Sequence Read Archive (NCBI SRA, PRJNA934921)43. The de novo transcriptome assembly was deposited on figshare (Table 2) and at NCBI (GKNQ00000000)42 after the Foreign Contamination Screen (FCS). All files produced in the transcriptome assembly and annotation were deposited in the figshare archive (Table 2).
Technical Validation
The data quality was assessed using FastQC before and after trimming. In the FastQC results, the average quality scores at every base position exceeded the threshold of 35 (Image file 1, Table 2). Validation of the transcriptome assembly was performed using two validation tools: BUSCO and TransRate. The results of the validation processes are presented in Table 3. BUSCO analysis was performed on three databases: Nematoda, Metazoa and Eukaryota. The details of the BUSCO analysis are listed in Table 4. Complete (C), Missing (M) and Fragmented (F) genes are plotted in Figure 2. An additional validation assessment was performed by mapping the trimmed reads against the de novo assembled transcriptome of C. osculatum sp. D using the HISAT2 tool (Fig. 3). To further assess the quality of the transcriptome assembly, the number of full-length assembled transcripts was evaluated44. The results show a total of 5,188 proteins with coverage exceeding 90% of their respective protein lengths. The final transcriptome (unigenes) after contamination screening comprised a total of 43,673 transcripts, with an N50 of 1,867 bp. The BUSCO evaluation attested a completeness of over 80% for each database interrogated.
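The full-length criterion (alignment covering more than 90% of the matched protein) can be expressed as a short calculation. The Python sketch below assumes each hit is available as a tuple of protein identifier, alignment start and end on the protein, and protein length; this input layout, the function name and the toy values are assumptions for illustration and do not reproduce the script cited above.

```python
def count_full_length(hits, min_coverage=90.0):
    """Count proteins whose best hit covers more than min_coverage percent.

    hits: iterable of (protein_id, subject_start, subject_end, subject_length).
    """
    best_coverage = {}
    for protein_id, s_start, s_end, s_len in hits:
        aligned = abs(s_end - s_start) + 1          # aligned residues
        coverage = 100.0 * aligned / s_len          # percent of the protein
        best_coverage[protein_id] = max(best_coverage.get(protein_id, 0.0), coverage)
    return sum(1 for pct in best_coverage.values() if pct > min_coverage)


# Invented hits: only P1 exceeds 90% (95 of 100 residues aligned).
toy_hits = [
    ("P1", 1, 95, 100),
    ("P1", 10, 50, 100),
    ("P2", 1, 90, 100),
    ("P3", 5, 60, 200),
]
print(count_full_length(toy_hits))  # 1
```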
Quality control of annotation
Functional annotation of the transcriptome was accomplished with DIAMOND and EggNOG. The application of DIAMOND for annotation purposes led to the identification of 20,655 predicted ORFs (for BLASTX analyses) and 17,236 predicted ORFs (for BLASTP analyses) shared between the three databases used: Nr, SwissProt and TrEMBL. Finally, from the EggNOG analysis, we obtained COG (Clusters of Orthologous Groups) annotations and KEGG (Kyoto Encyclopedia of Genes and Genomes) annotations for 16,968 ORFs, representing 45.9% of the total.
Code availability
The article includes a comprehensive list of software programs employed for various tasks, such as de novo transcriptome assembly, pre- and post-assembly procedures, and transcriptome annotation, all of which are specified alongside their respective versions within the Methods section. If specific parameter details are not provided, the programs were used with their default settings.
References
Wharton, D. Parasites and low temperatures. Parasitology 119, S7–S17 (1999).
D’Amico, S. et al. Molecular basis of cold adaptation. Phil. Trans. R. Soc. Lond. B 357, 917–925 (2002).
Shin, S. C. et al. The genome sequence of the Antarctic bullhead notothen reveals evolutionary adaptations to a cold environment. Genome Biol. 15, 1–14 (2014).
Bargelloni, L. et al. Draft genome assembly and transcriptome data of the icefish Chionodraco myersi reveal the key role of mitochondria for a life without hemoglobin at subzero temperatures. Commun. Biol. 2, 443 (2019).
Kim, B. M. et al. Antarctic blackfin icefish genome reveals adaptations to extreme environments. Nat. Ecol. Evol. 3, 469–478 (2019).
Lauritano, C., Roncalli, V., Ambrosino, L., Cieslak, M. C. & Ianora, A. First De Novo Transcriptome of the copepod Rhincalanus gigas from Antarctic waters. Biology 9, 410 (2020).
Timi, J. T. & Poulin, R. Why ignoring parasites in fish ecology is a mistake. Int. J. Parasitol. 50, 755–761 (2020).
Mattiucci, S. et al. Temporal stability of parasite distribution and genetic variability values of Contracaecum osculatum sp. D and C. osculatum sp. E (Nematoda: Anisakidae) from fish of the Ross Sea (Antarctica). Int. J. Parasitol. Parasites Wildl. 4, 356–367 (2015).
Mattiucci, S. & Nascetti, G. Advances and trends in the molecular systematics of anisakid nematodes, with implications for their evolutionary ecology and host—parasite co-evolutionary processes. Adv. Parasitol. 66, 47–148 (2008).
Arcos, S. C. et al. Proteomic profiling and characterization of differential allergens in the nematodes Anisakis simplex sensu stricto and A. pegreffii. Proteomics 14, 1547–1568 (2014).
Llorens, C. et al. Functional insights into the infective larval stage of Anisakis simplex s.s., Anisakis pegreffii and their hybrids based on gene expression patterns. BMC Genom. 19, 59 (2018).
Nam, U. H., Kim, J. O. & Kim, J. O. De novo transcriptome sequencing and analysis of Anisakis pegreffii (Nematoda: Anisakidae) third-stage and fourth-stage larvae. J. Nematol. 52, e2020–e2041 (2020).
Cavallero, S. et al. Tissue-specific transcriptomes of Anisakis simplex (sensu stricto) and Anisakis pegreffii reveal potential molecular mechanisms involved in pathogenicity. Parasites Vectors 11, 31 (2018).
Cavallero, S., Bellini, I., Pizzarelli, A., Arcà, B. & D’Amelio, S. A miRNAs catalogue from third-stage larvae and extracellular vesicles of Anisakis pegreffii provides new clues for host-parasite interplay. Sci. Rep. 12, 9667 (2022).
Trumbić, Ž., Hrabar, J., Palevich, N., Carbone, V. & Mladineo, I. Molecular and evolutionary basis for survival, its failure, and virulence factors of the zoonotic nematode Anisakis pegreffii. Genomics 113, 2891–2905 (2021).
Palomba, M. et al. De novo transcriptome assembly and annotation of the third stage larvae of the zoonotic parasite Anisakis pegreffii. BMC Res. Notes 15, 223 (2022).
Palomba, M. et al. Proteomic characterization of extracellular vesicles released by third stage larvae of the zoonotic parasite Anisakis pegreffii (Nematoda: Anisakidae). Front. Cell. Infect. Microbiol. 13, 261 (2023).
Palomba, M. et al. Gene expression profiles of antigenic proteins of third stage larvae of the zoonotic nematode Anisakis pegreffii in response to temperature conditions. Parasite 26, 52 (2019).
Libro, P. et al. First brain de-novo transcriptome of Tyrrenhian tree frog, Hyla sarda, for the study of dispersal-related behavioral variation. Front. Ecol. Evol. 10, 1–6 (2022).
Libro, P. et al. De novo transcriptome assembly and annotation for gene discovery in Salamandra salamandra at the larval stage. Sci. Data 10, 330 (2023).
Castrignanò, T. et al. ELIXIR-IT HPC@ CINECA: high-performance computing resources for the bioinformatics community. BMC Bioinformatics 21, 1–17 (2020).
Picardi, E., D’Antonio, M., Carrabino, D., Castrignanò, T. & Pesole, G. ExpEdit: a webserver to explore human RNA editing in RNA-Seq experiments. Bioinformatics 27, 1311–1312 (2011).
Chiara, M. et al. CoVaCS: a consensus variant calling system. BMC Genom. 19, 1–9 (2018).
Castrignano, T. et al. ASPIC: a web resource for alternative splicing prediction and transcript isoforms characterization. Nucleic Acids Research 34, W440–W443 (2006).
Castrignanò, T. et al. The MEPS server for identifying protein conformational epitopes. BMC bioinformatics 8, 1–5 (2007).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Bushmanova, E., Antipov, D., Lapidus, A. & Prjibelski, A. D. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. Gigascience 8, giz100 (2019).
Davidson, N. M. & Oshlack, A. Corset: enabling differential gene expression analysis for de novo assembled transcriptomes. Genome Biol. 15, 1–14 (2014).
Chiocchio, A. et al. Brain de novo transcriptome assembly of a toad species showing polymorphic anti-predatory behaviour. Sci. Data 9, 619 (2022).
Smith-Unna, R., Boursnell, C., Patro, R., Hibberd, J. M. & Kelly, S. TransRate: Reference-free quality assessment of de novo transcriptome assemblies. Genome Res. 26, 1134–1144 (2016).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Signal, B. & Kahlke, T. Borf: Improved ORF prediction in de-novo assembled transcriptome annotation. BioRxiv 2021–04 (2021).
Tang, S., Lomsadze, A. & Borodovsky, M. Identification of protein-coding regions in RNA transcripts. Nucleic Acids Res. 43, 78 (2015).
Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Muller, J. et al. EggNOG v2. 0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations. Nucleic Acids Res. 38, D190–D195 (2010).
Łopieńska-Biernat, E., Molcan, T., Paukszto, Ł., Jastrzębski, J. P. & Myszczyński, K. Modelling studies determing the mode of action of anthelmintics inhibiting in vitro trehalose-6-phosphate phosphatase (TPP) of Anisakis simplex s.l. Exp. Parasitol. 184, 46–56 (2018).
Pertea, G. & Pertea, M. GFF utilities: GffRead and GffCompare. F1000 Res. 9, 304 (2020).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 1–14 (2019).
Palomba, M. et al. Contracaecum osculatum, transcriptome shotgun assembly. GenBank https://identifiers.org/ncbi/insdc:GKNQ00000000 (2023).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP422483 (2023).
Counting Full Length Trinity Transcripts. https://github.com/trinityrnaseq/trinityrnaseq/wiki/Counting-Full-Length-Trinity-Transcripts.
Palomba, M. et al. Cosculatum sp. D. figshare. https://doi.org/10.6084/m9.figshare.c.6701190.v1 (2023).
Acknowledgements
This study was supported by the Ministry of University and Research (MUR), Italian Program of Research in Antarctica (PNRA), PNRA19_00125 and the Research project implemented under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.4 - Call for tender No. 3138 of 16 December 2021, rectified by Decree n.3175 of 18 December 2021 of Italian Ministry of University and Research funded by the European Union - Next Generation EU. Project code CN_00000033, Concession Decree No. 1034 of 17 June 2022 adopted by the Italian Ministry of University and Research, CUP J83C22000860007 (Department of Ecological and Biological Sciences, University of Tuscia), Project title “National Biodiversity Future Center – NBFC”.
Author information
Contributions
M.P. and S.M. conceived the study, designed the experiments and coordinated the sample collection; A.M. performed the sample collection and the experiments; M.P. and X.R.-G. performed the RNA extraction; M.P., S.M. and D.C. coordinated the sequencing; T.C. designed and coordinated the bioinformatic analysis; T.C., P.L. and J.D.M. performed reads quality assessment, reads alignment on the transcriptome, transcriptome annotation and validation; M.P., P.L., J.D.M., T.C., D.C. and S.M. wrote the manuscript; all authors reviewed the manuscript; S.M. is the PI of the financed study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Palomba, M., Libro, P., Di Martino, J. et al. De novo transcriptome assembly of an Antarctic nematode for the study of thermal adaptation in marine parasites. Sci Data 10, 720 (2023). https://doi.org/10.1038/s41597-023-02591-4