Abstract
The Indian black clam Villorita cyprinoides Gray, 1825, is an economically valuable estuarine bivalve that faces challenges from multiple stressors and anthropogenic pressures. However, limited genomic resources have hindered molecular investigations into the impact of these stressors on clam populations. Here, we have generated the first transcriptomic reference datasets for V. cyprinoides to address this knowledge gap. A total of 25,040,592 and 22,486,217 Illumina paired-end reads generated from two individuals were assembled using Trinity and rnaSPAdes. From the 47,607 transcripts identified as Coding Domain Sequences, 37,487 returned positive BLAST hits against six different databases. Additionally, a total of 14,063 Simple Sequence Repeats were identified using GMATA. This study significantly enhances the genetic understanding of V. cyprinoides, a potential candidate for aquaculture that supports the livelihoods of many people dependent on small-scale fisheries. The data generated provide insights into broader genealogical connections within the family Cyrenidae through comparative transcriptomics. Furthermore, this transcriptional profile serves as baseline data for future studies in toxicological and conservation genetics.
Background & Summary
Estuarine regions are ecotones with rich biodiversity and great ecological and economic value1. Among the wide range of species communities within an estuarine system, the malacofauna represents a prominent fisheries resource that contributes significantly to Small Scale Fisheries (SSF)2,3,4,5. In addition, these sentinel soft-bottom bivalve communities provide various ecosystem services, including biomonitoring of pollution and assessment of toxin accumulation, while improving water quality through filtration6,7. Because estuaries are biodiversity-rich ecosystems, the detrimental effects of climatic changes and anthropogenic pressures on estuarine communities, particularly bivalves, are considerable8,9. These factors have a pronounced effect on endemic as well as economically exploited species, which brings us to the species of interest of this work: Villorita cyprinoides Gray, 1825.
Popularly known as the Indian black clam, Villorita cyprinoides is an endemic cyrenid clam inhabiting the estuaries of Peninsular India. Being a readily available and affordable protein source, this artisanal fishery accounts for more than 70 percent of the Indian clam fishery, making it economically valuable and overutilized10. However, the wild population of the clam faces significant challenges including multiple climatic stressors and anthropogenic pressures such as overfishing and habitat destruction. This inland fishery has been overexploited, and rapid and fragmentary population decline has been reported despite the implementation of a minimum legal catch size of 10 mm as a preventive measure11,12. Niche fragmentation, climate change, and associated environmental stressors have drastically altered larval development and threatened the existence of this organism13. To rejuvenate black clam resources in fishery areas, immediate actions such as re-laying have been undertaken, and initial breeding standardization studies are in progress14. However, with limited genomic resources, it is difficult to ensure the conservation and sustainability of Villorita given the significant risk its population currently faces.
Apart from its evident economic importance, Villorita is also considered an excellent sentinel organism for monitoring ecotoxicological changes and ecosystem health15,16. Recent studies have discovered elevated levels of metals in black clams, leading to concerns about the overall health of the ecosystem17. Moreover, numerous earlier investigations have emphasized the substantial accumulation of, and susceptibility of black clams to, biologically essential metals such as zinc (Zn) and copper (Cu)18,19. Though they are reported to exhibit resilience to environmental perturbations, various climatic stressors are directly influencing their survival11. To address these issues, molecular responses, fluctuations in the internal environment, signaling pathways, and associated genes need to be monitored when clams are probed for pollutants and environmental stressors20. This requires the elucidation of robust Reference Transcriptomic Datasets (RTD) that provide a basic picture of the transcripts produced by a non-model organism21. In the past decades, high-throughput next-generation sequencing (NGS) technologies have enabled large-scale sequencing of genomic/transcriptomic data with efficiency and low cost22, making “omics” data more accessible23 and genetic-level investigation easier. However, the transcriptome profile of brackish water clams has received less attention24,25.
In this study, we aim to characterize the first comprehensive transcriptome sets of the commercially important bivalve V. cyprinoides, endemic to peninsular India. A total of 25,040,592 and 22,486,217 raw paired-end reads were generated from the two V. cyprinoides samples, respectively, using Illumina short-read sequencing. These reads were then assembled using the de novo transcript reconstruction method. Despite the lack of a reference genome, the transcriptomic data obtained in this study will serve as an important genomic resource and act as a prerequisite for further ecotoxicological, gene expression, and population studies on this species.
Methods
Ethics statement
No specific permits were required to collect and study the clam from the described fields. The species is not on any endangered or protected list, and the collection of samples is therefore not restricted. The experimental protocols used to conduct this study were approved by the Institutional Animal Ethical Committee of the ICAR Central Marine Fisheries Research Institute (CMFRI), Kochi. Additionally, the methodologies followed the ARRIVE (Animal Research: Reporting of In Vivo Experiments) guidelines available at http://arriveguidelines.org.
Study area and sample collection
Healthy adult black clam specimens with a shell length (SL) of 30 ± 3.5 mm and a total body weight (BW) of 35–40 grams were collected using kolli (an indigenous hand rake net) from Vembanad Lake, a Ramsar site in Kerala, India in March 2021. The study locations in Vembanad Lake included the southern, predominantly freshwater region, Muhamma: 9°36′18.6″N 76°22′01.2″E and the northern saltwater region, Vaikom: 9°44′35.8″N 76°23′03.4″E, divided seasonally by the Thaneermukkom barrage10. Individuals were primarily identified using malacological literature26. A single specimen from each sampling site was randomly selected, and tissue samples of the gill, foot, adductor muscle, mantle, and gonad were dissected and stored separately in RNAlater (Sigma). The remaining tissue samples were fixed in absolute ethanol for molecular confirmation of species identity. The tubes were transported to the laboratory at 4 °C and kept at −80 °C until RNA extraction. An overview of the workflow is shown in Fig. 1.
DNA extraction and barcoding
Total genomic DNA was extracted from ethanol-preserved tissue using the Qiagen DNeasy Blood and Tissue Kit (QIAGEN, Valencia, CA, USA), and the barcode region of mitochondrial cytochrome c oxidase subunit I (COI) was amplified with the following PCR protocol: initial denaturation for 5 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 30 s at 42 °C, and 1 min at 72 °C, with a final extension of 5 min at 72 °C, using the primer set LCO1490/HCO219827. The amplified PCR products were purified, bi-directionally sequenced, edited, and aligned in MEGA 1128. The COI sequences generated were searched against the GenBank database using BLASTn to confirm species identification.
RNA extraction and mRNA library preparation and sequencing
Total RNA from each tissue sample (gill, foot, adductor muscle, mantle, and gonad tissues from each clam) was extracted, purified, and quantified separately. RNA isolation was carried out using TRIzol Reagent (Invitrogen) following the manufacturer’s instructions, and the RNA was treated with RNase-free DNase I (TaKaRa). The quality and quantity of isolated RNA were confirmed using 0.8% denaturing agarose gel, Qubit Fluorometer 3.0 (ThermoFisher), and Agilent 4200 Bioanalyzer (Agilent Technologies, USA). RNA (1.5 µg) from the five tissues of each clam was pooled in equimolar concentration to prepare two RNA-seq libraries (VcypMt2, VcypVt2) using the TruSeq Standard mRNA-seq library preparation kit following the manufacturer’s protocol (Illumina, USA). Barcoded libraries were then sequenced on the Illumina NovaSeq 6000 platform in 150 base pair (bp) paired-end (PE) mode.
De novo transcriptome assembly, refinement, and quality assessment
Illumina sequences generated from the two black clam specimens were processed separately, and default settings were used for all software analyses unless otherwise stated. First, FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to assess per-base quality, overrepresented sequences, and adapter content in the raw reads. Adapters and poly-A tails were trimmed from both the leading and trailing ends, and ambiguous reads (N > 5%) were removed. High-quality reads with a Phred score >30 and a minimum read length >50 bp were retained using Trimmomatic v0.3529. De novo transcriptome assemblies of clean reads were performed using Trinity v2.20 and rnaSPAdes v3.9 with a k-mer of 2530,31. Redundancies in the primary assembled transcriptomes were removed by CD-HIT v4.6.532.
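The retention criteria stated above (Phred score >30, read length >50 bp, no more than 5% ambiguous bases) can be illustrated with a minimal Python sketch. This is not the Trimmomatic implementation used in the study: the function names, the Phred+33 encoding, and the use of the mean per-read quality as the quality criterion are all assumptions made for illustration only.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def passes_filters(seq, qual, min_mean_q=30, min_len=50, max_n_frac=0.05):
    """Return True if a read passes the length, ambiguity, and quality checks.

    Hypothetical helper: the thresholds mirror the values in the text, but
    applying them to the mean read quality is an illustrative assumption.
    """
    if len(seq) <= min_len:                              # keep reads longer than 50 bp
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:   # drop reads with >5% N
        return False
    return mean(phred_scores(qual)) > min_mean_q         # keep mean Phred > 30
```

For example, a 60 bp read with no ambiguous bases and a uniform quality character of `I` (Phred 40 under the assumed encoding) is retained, whereas the same read with quality character `5` (Phred 20) is discarded.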
Bowtie2 v2.3.5 and the Benchmarking Universal Single-Copy Orthologs (BUSCO) software were used to assess the quality and completeness of the assembled transcriptomes33,34. To determine how well the reads were represented in the transcriptome assemblies, the Illumina RNA-seq reads were mapped back to the corresponding assemblies with Bowtie2. The alignment parameters included the end-to-end sensitive mode and a maximum number of mismatches of N = 1. The percentage of single-copy orthologous genes in the de novo assemblies was determined by BUSCO through comparison with the mollusca_odb10 dataset.
Functional annotation
Open Reading Frames (ORFs) and coding regions in the non-redundant transcripts were predicted by TransDecoder v3.0.0 with default parameters35. The predicted protein-coding regions were then searched for homology using BLASTx and BLASTp against the NCBI nr (https://www.ncbi.nlm.nih.gov/), UniProtKB (https://www.ebi.ac.uk/uniprot/), RefSeq (https://www.ncbi.nlm.nih.gov/refseq/) and Pfam databases with an e-value threshold of 1e-536,37. Gene Ontology (GO) annotations were also performed using EggNOG, KEGG, and Metascape (https://metascape.org/), which assign transcripts to cellular components, molecular functions, and biological processes38. Functional protein domains were then assigned using InterProScan v4.039.
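As a hedged illustration of how an e-value threshold of 1e-5 is applied to homology search results, the sketch below filters BLAST tabular output (the standard 12-column `-outfmt 6` layout, with the e-value in column 11 and the bit score in column 12) and keeps the highest-scoring hit per query. The function name and the best-hit rule are assumptions for illustration; the study does not state how multiple hits per transcript were resolved.

```python
def best_hits(lines, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for hits passing the cutoff.

    Expects BLAST tabular (outfmt 6) lines. Hypothetical helper: keeping the
    hit with the highest bit score per query is an illustrative choice.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):     # skip blanks and comments
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:                  # discard hits above the threshold
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

In practice the function could be fed an open file handle for a BLAST result file, since it only iterates over lines.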
Repetitive sequence identification
GMATA (Genome-wide Microsatellite Analyzing Towards Application) was used to identify simple sequence repeats (SSRs) with a motif length of 2–10 bp and a minimum of five repeats in the generated transcriptome40.
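The search criteria can be sketched with a regular expression, reading them as a motif of 2–10 bp repeated at least five times in tandem. This is a minimal illustration and not GMATA's implementation; the function name and the exclusion of single-base (homopolymer) motifs are assumptions.

```python
import re

# A motif of 2-10 bases (shortest tried first) followed by at least
# four further tandem copies, i.e. five or more repeats in total.
SSR_PATTERN = re.compile(r"([ACGT]{2,10}?)\1{4,}")


def find_ssrs(sequence):
    """Return a list of (start, motif, repeat_count) for SSRs in a sequence.

    Hypothetical helper: motifs made of a single repeated base are skipped,
    which is an illustrative assumption rather than a documented rule.
    """
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:        # skip homopolymer runs such as AAAA...
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits
```

Grouping the returned motifs by length would give the di-, tri-, and higher-order repeat classes into which SSRs are conventionally tallied.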
Data analysis and visualization
The data analysis and visualization were performed via R software using the packages ggplot2, tidyr, dplyr, stringi, plotrix, forcats, ggVennDiagram, venn, ggpolypath, and RColorBrewer41.
Data Records
The COI amplicons from morphologically identified specimens were sequenced by Sanger sequencing and submitted to GenBank with accession numbers OP99965342 and OP99965443. All the Illumina sequencing reads were submitted to NCBI-SRA (National Center for Biotechnology Information-Sequence Read Archive) under BioProject ID PRJNA910160 with accession numbers SRR2257746244 and SRR2257746345. The corresponding BioSample IDs are SAMN32114842 and SAMN32114843. The transcriptome assembly and annotations of V. cyprinoides were shared through the Figshare platform46.
Technical Validation
The RNA isolated from different tissues with prominent and consistent 18S rRNA bands was used for library preparation. After Illumina paired-end sequencing of the two libraries and quality filtering, the retained high-quality clean reads were used to reconstruct transcriptomes using the reference-free de novo transcriptome assembly method. Redundancies in the transcriptome assemblies were removed by eliminating repetitive regions, contigs, and singletons with CD-HIT. A comparison of the statistics of the de novo transcriptome assemblies produced by the Trinity and rnaSPAdes assemblers is shown in Table 1. The completeness of the final assemblies was evaluated using BUSCO, and the percentages of single-copy, missing, and fragmented BUSCO groups are depicted in Fig. 2. The transcripts annotated from Coding Domain Sequences (CDSs) identified through the six databases (UniProtKB, NCBI, Pfam, KOG, KEGG, and RefSeq) are shown in Fig. 3. In addition, statistics of functional annotation against the different databases are summarized in Table 2. The categorization of GO terms into biological process (BP), molecular function (MF), and cellular component (CC) is shown in Fig. 4. The KOG (EuKaryotic Orthologous Groups) functional classification groups the transcripts into 24 KOG categories, as depicted in Fig. 5, while Fig. 6 shows the transcripts mapped to five KEGG pathways. In addition, the ten most common protein domains obtained from the transcripts using the InterPro member databases (Hamap, Pfam, Prints, ProSiteProfiles, SUPERFAMILY, PANTHER, PIRSF, ProSitePatterns, SMART, TIGRFAM) are depicted in Fig. 7. Among the detected SSRs, the frequencies of di-, tri-, tetra-, penta-, and hexanucleotides were 62.7%, 26.7%, 9.5%, 0.89% and 0.02%, respectively (Fig. 8, Tables S1–S3). The most prevalent di- and trinucleotide repeats were also identified from the data (Table S4).
These datasets expand indispensable transcriptomic resources for future research on functional genomics, gene characterization, and expression profiling in V. cyprinoides, as well as expanding molecular data for evolutionary studies in the family Cyrenidae.
Code availability
No custom code was generated in this study. The bioinformatics tools implemented in this study and their versions, settings, and parameters are described in the Methods section. Where no specific settings are given for a particular tool, default parameters were employed.
References
Barbier, E. B. et al. The value of estuarine and coastal ecosystem services. Ecol. Monogr. 81, 169–193 (2011).
Ramachandra, T. V, Subash Chandran, M. D. & Joshi, N. V. Edible Bivalves of Central West Coast, Uttara Kannada District, Karnataka, India - Sahyadri Conservation Series 15 (ENVIS Technical Report: 48). 1–21 (2012).
Wijsman, J., Troost, K., Fang, J. & Roncarati, A. Trends and challenges. Global Production of Marine Bivalves. (2018).
Dumbauld, B. R., Ruesink, J. L. & Rumrill, S. S. The ecological role of bivalve shellfish aquaculture in the estuarine environment: A review with application to oyster and clam culture in West Coast (USA) estuaries. Aquaculture. 290, 196–223 (2009).
Chuku, E. O. et al. The Estuarine and Mangrove Ecosystem-Based Shellfisheries of West Africa: Spotlighting Women-Led Fisheries Livelihoods. USAID women shellfishers and food security project. coastal resources center, graduate school of oceanography, University of Rhode island. Narragansett, RI, USA (2021).
McLeod, I. M. et al. Can Bivalve Habitat Restoration Improve Degraded Estuaries? Coasts and Estuaries: The Future 427–442 https://doi.org/10.1016/B978-0-12-814003-1.00025-3 (2019).
Vaughn, C. C. & Hoellein, T. J. Bivalve Impacts in Freshwater and Marine Ecosystems. Annu. Rev. Ecol. Evol. Syst. 49, 73–93 (2018).
Bramwell, G. et al. A review of the potential effects of climate change on disseminated neoplasia with an emphasis on efficient detection in marine bivalve populations. Sci. Total Environ. 775, 145134 (2021).
Castro-Olivares, A. et al. Does global warming threaten small-scale bivalve fisheries in NW Spain? Mar. Environ. Res. 180 (2022).
Rahuman, S., Jeena, N. S., Asokan, P. K., Vidya, R. & Vijayagopal, P. Mitogenomic architecture of the multivalent endemic black clam (Villorita cyprinoides) and its phylogenetic implications. Sci. Rep. 10, 1–16 (2020).
Paul, T. T. et al. Assessing vulnerability and adopting alternative climate resilient strategies for livelihood security and sustainable management of aquatic biodiversity of Vembanad lake in India. Journal of Water and Climate Change. 12, 1310–1326 (2021).
Suja, N. & Mohamed, K. S. The black clam, Villorita cyprinoides, fishery in the State of Kerala, India. Mar. Fish. Rev. 72, 48–61 (2010).
Nagarathinam, A. et al. Implications of an extensive salt water barrage on the distribution of black clam in a tropical estuarine system, Southwest coast of India. Oceanologia. 63, 343–355 (2021).
Rahuman, S. et al. Induced spawning, larval rearing, and innate diet analysis of Indian black clam, Villorita cyprinoides: An alternative to resource restoration through aquaculture. Aquaculture. 590, 741101 (2024).
Paul, T. T., Shyam, S. S., Manoharan, V. S. & Unnithan, U. Identification and evaluation of ecosystem services provided by clam (Villorita cyprinoides) fisheries in wetland. Indian Journal of Tropical Biodiversity. 23, 21–29 (2015).
George, R., Martin, G. D., Nair, S. M. & Chandramohanakumar, N. Biomonitoring of trace metal pollution using the bivalve molluscs, Villorita cyprinoides, from the Cochin backwaters. Environ. Monit. Assess. 185, 10317–10331 (2013).
Neethu, K. V. et al. A multibiomarker approach to assess lead toxicity on the black clam, Villorita cyprinoides (Gray, 1825), from Cochin estuarine system (CES), southwest coast, India. Environ Sci Pollut Res. 28, 1775–1788 (2021).
Raveenderan, R. & Sujatha, C. Quantization of specific trace metals in bivalve, Villorita cyprinoides var. cochinensis in the Cochin estuary (2011).
Don Xavier, N. D. et al. Eliciting heavy metal contamination on selected native organisms from Cochin estuary using contemporary biomarker approach. J Earth Syst Sci. 130, 174 (2021).
Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
Morillon, A. & Gautheret, D. Bridging the gap between reference and real transcriptomes. Genome Biol. 20, 1–7 (2019).
Zhang, J., Chiodini, R., Badr, A. & Zhang, G. The impact of next-generation sequencing on genomics. Journal of genetics and genomics. 38, 109 (2011).
Shendure, J. et al. DNA sequencing at 40: past, present and future. Nature. 550, 345–353 (2017).
Amil-Ruiz, F. et al. Constructing a de novo transcriptome and a reference proteome for the bivalve Scrobicularia plana: Comparative analysis of different assembly strategies and proteomic analysis. Genomics. 113, 1543–1553 (2021).
Dong, Y. et al. The chromosomal-level genome assembly and comprehensive transcriptomes of Chinese razor clam (Sinonovacula constricta) with deep-burrowing life style and broad-range salinity adaptation. bioRxiv 735142 https://doi.org/10.1101/735142 (2019).
N.V. Rao, S. Indian Seashells: Part-2 Bivalvia. (Zoological Survey of India, 2017).
Folmer, O., Black, M., Hoeh, W., Lutz, R. & Vrijenhoek, R. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol. Marine Biol. Biotechnol. 3, 294–299 (1994).
Tamura, K., Stecher, G. & Kumar, S. MEGA11: Molecular Evolutionary Genetics Analysis Version 11. Mol Biol Evol. 38, 3022–3027 (2021).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30, 2114–2120 (2014).
Bushmanova, E., Antipov, D., Lapidus, A. & Prjibelski, A. D. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. GigaScience. 8, 1–13 (2019).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 22, 1658–1659 (2006).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31, 3210–3212 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 9, 357–359 (2012).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Finn, R. D. et al. The Pfam protein families database. Nucleic Acids Res. 38, D211–D222 (2010).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Huerta-Cepas, J. et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).
Quevillon, E. et al. InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120 (2005).
Wang, X. & Wang, L. GMATA: An integrated software package for genome-scale SSR mining, marker development and viewing. Front. Plant Sci. 7, 1350 (2016).
R Core Team. R: A language and environment for statistical computing. R foundation for statistical computing. (2010).
NCBI GenBank https://identifiers.org/ncbi/insdc:OP999653 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc:OP999654 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR22577462 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR22577463 (2024).
Rahuman, S. et al. Transcriptomic approach on Villorita cyprinoides (black clam). Figshare https://doi.org/10.6084/m9.figshare.25240684 (2024).
Acknowledgements
The authors would like to sincerely thank the Director of ICAR-CMFRI, Kochi and the Head of Marine Biotechnology, Fish Nutrition, and Health Division for their support and facilities provided for the research. We also thank Dr. P. Vijayagopal, former Head of the Division, for providing research facilities. This work is a part of the Ph.D. thesis of S.R. and the author acknowledges the Council of Scientific and Industrial Research (CSIR), India for the fellowship received for this research.
Author information
Contributions
S.R. conceptualized the experiment. S.R. and J.N.S. collected specimens, processed the samples, and wrote the paper. S.R., W.S. and E.V. analyzed data, and A.P.K. critically reviewed the manuscript and coordinated the work.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Rahuman, S., N. S., J., Sebastian, W. et al. Tidings from the Tides–De novo transcriptome assembly of the endemic estuarine bivalve Villorita cyprinoides. Sci Data 11, 723 (2024). https://doi.org/10.1038/s41597-024-03541-4