The PeptideAtlas of a widely cultivated fish Labeo rohita: A resource for the Aquaculture Community

Nissa, Mehar Un; Reddy, Panga Jaipal; Pinto, Nevil; Sun, Zhi; Ghosh, Biplab; Moritz, Robert L.; Goswami, Mukunda; Srivastava, Sanjeeva

doi:10.1038/s41597-022-01259-9

Data Descriptor
Open access
Published: 13 April 2022

Volume 9, article number 171 (2022)

Scientific Data


Abstract

Labeo rohita (Rohu) is one of the most important fish species produced in world aquaculture. Integrative omics research provides a strong platform for understanding basic biology and translating this knowledge into sustainable solutions for tackling disease outbreaks, increasing productivity and ensuring food security. Mass spectrometry-based proteomics has opened new directions for understanding biology. Very little proteomics work has been done on 'Rohu', which limits the resources available to the aquaculture community. Here, we used extensive mass spectrometry-based proteomic profiling data from 17 histologically normal tissues, plasma and embryos of Rohu to develop an open-source PeptideAtlas. The current build of the "Rohu PeptideAtlas" has mass spectrometric evidence for 6015 high-confidence canonical proteins at 1% false discovery rate, 2.9 million PSMs and ~150 thousand peptides. This is the first open-source proteomics repository for an aquaculture species. The 'Rohu PeptideAtlas' should promote basic and applied aquaculture research to address the critical challenge of ensuring nutritional security for a growing population.

Measurement(s)	Proteins and Peptides
Technology Type(s)	Mass Spectrometry
Sample Characteristic - Organism	Labeo rohita
Sample Characteristic - Location	India

Background & Summary

The average annual increase in global fish consumption has outpaced population growth. Fish account for 20% of global animal protein consumption, underscoring their importance for global food security and nutrition. India ranks second in global aquaculture production, and Indian major carps (IMCs) contribute more than 75% of its aquaculture economy¹. Labeo rohita (Rohu) is an IMC and is among the top eleven finfish species produced in world aquaculture¹. With the emergence of genomic information for Rohu, research on this species has entered the post-genomic era, in which transcriptomics, proteomics and metabolomics can be used to address key issues such as safety, quality and health in aquaculture.

Proteomic approaches have been applied in diverse areas of fish research, including developmental biology, physiology, disease mechanisms, the impact of stress inducers² and the effects of dietary supplements on overall physiology^3,4. In zebrafish and Xiphophorus sp., proteomics studies have revealed the role of phosphorylated Ezrin in gastrulation⁵ and of peroxiredoxins in an animal model of human melanoma⁶. Proteomics can also identify sensitive and specific markers for assessing the quality of fish and fishery-related products⁷. The effects of pesticide mixtures and temperature increase have also been explored in goldfish (Carassius auratus)⁸. Together, these findings show the importance of proteomic characterization of fish for addressing questions ranging from basic biology to ecological, environmental and food-related issues.

Mass spectrometry (MS)-based proteomic approaches are increasingly used to address complex biological questions, often in combination with other omics disciplines (e.g., genomics, transcriptomics, metabolomics)^9,10. Proteome reference maps for many organisms, such as human and zebrafish, have been generated using high-resolution mass spectrometry^11,12,13. A recent report of the Rohu genome predicted 26,400 protein-coding genes¹⁴. However, proteomics studies in Rohu are rare, and most have focused on a single tissue in isolation^15,16.

Data repositories such as PeptideAtlas¹⁷, PRIDE¹⁸ and the Global Proteome Machine Database¹⁹ support the planning of MS-based experiments in biomedical research. The PeptideAtlas project provides a large collection of available MS-based proteomics data processed through a consistent, rigorous analysis pipeline. Apart from the model organism zebrafish, no other aquaculture species is yet well represented in any publicly available proteomics database. To address this gap, we performed extensive proteomic profiling of 17 histologically normal Rohu tissues, embryo and plasma using high-resolution, high-mass-accuracy mass spectrometry. Here, we provide mass spectrometric evidence for more than 150 thousand peptides corresponding to 6015 high-confidence canonical proteins at 1% FDR. This dataset was used to develop the PeptideAtlas repository for Rohu. To our knowledge, this is the first such extensive open-source peptide dataset for Rohu.

This work could serve as a basis for proteomic research on genes related to fish health, for example studies of fertility, muscle quality and molecular alterations under stress conditions²⁰. The PeptideAtlas interface is user-friendly and useful for designing targeted proteomic experiments, because candidate peptides or transitions can be evaluated for targeted proteomics-based diagnostic assays of fish disease, safety and quality. The dataset can also be used to generate spectral libraries for designing and validating targeted proteomics experiments. We believe this extensive proteomic sequence information will complement the genomic information and help basic and applied research in the fisheries and aquaculture sectors move faster.

Methods

Fish collection and acclimatisation

Healthy three-month-old L. rohita fingerlings weighing 10 ± 2 g were collected from the Powarkheda Regional Centre of ICAR-CIFE, Madhya Pradesh, India. Fingerlings were acclimatised under laboratory conditions of continuous (24 h) aeration, 12 h of daylight, 10% daily water exchange, a water temperature of 28–30 °C and feeding twice daily at 2% of body weight. After seven days of acclimatisation, five healthy fish were starved for one day in an aquarium and then euthanised for sample collection. Nineteen sample types were collected (Table 1): one whole-embryo sample, blood plasma and 17 tissues. Fifteen of the tissues were collected from fingerlings, whereas plasma and gonadal tissues were collected from adult fish. Plasma was collected from female fish, and embryos were sampled four days after fertilisation. Samples were stored at −80 °C until further use.

Table 1 Tissue types and sampling details.


Protein extraction for in-depth proteomic profiling

For protein extraction, organ-wise samples collected from individual fish were pooled. Tissues were lysed in urea buffer containing 8 M urea, 50 mM Tris-HCl, 1 mM MgCl₂ and 75 mM NaCl. For fifteen tissues (spleen, spinal cord, skin, scales, muscle, male gonad, liver, kidney, heart, gut, gill, female gonad, eye, brain and air bladder), proteins were extracted with the pH shift solubilisation method²⁰, using urea buffer at three pH values (2.5, 8 and 13). Lysis buffer (300 µl) was added to approximately 75–100 mg of tissue, followed by two to three rounds of sonication (Vibra-Cell™ Ultrasonic Liquid Processor VCX 130, Sonics). Samples were then homogenised by bead beating with zirconium/silica beads (Cat. No. 11079110z) for 90 s and centrifuged at 8000 rpm for 15 min at 4 °C to obtain a clear, protein-containing supernatant. Whole embryos were processed using the Trizol method²¹ of protein extraction. The plasma sample was taken directly for downstream analysis, without any depletion.

Protein quantification and quality check on SDS-PAGE

Protein concentration was determined by the Bradford assay with bovine serum albumin (BSA) as the standard: absorbance was measured at 595 nm, a standard curve was plotted from the BSA dilutions, and the concentration of each unknown sample was determined from it. To check the quality of the protein extracts, 15 µg of protein per sample was resolved by 1-dimensional SDS-PAGE on a mini vertical gel (Bio-Rad Mini PROTEAN® 3 Cell, Bio-Rad Laboratories) according to the Laemmli protocol²². Because the extracted proteins were in urea-containing buffer, samples were not heated before SDS-PAGE, to avoid the risk of carbamylation. Electrophoresis was run for 1–2 h, after which the gel was stained with Coomassie blue R350 in methanol and acetic acid and destained to visualise the protein bands (Supplementary Fig. S1a).
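The standard-curve step can be sketched as below. This is a minimal illustration, assuming the BSA standards lie within the linear range of the assay (the Bradford response becomes non-linear at higher concentrations); the function names and the example readings are hypothetical, not the authors' data.

```python
def fit_standard_curve(concentrations, absorbances):
    """Least-squares fit of A595 = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(absorbances):
        raise ValueError("need at least two paired standards")
    mean_c = sum(concentrations) / n
    mean_a = sum(absorbances) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (a - mean_a)
              for c, a in zip(concentrations, absorbances))
    slope = sxy / sxx
    intercept = mean_a - slope * mean_c
    return slope, intercept


def concentration_from_a595(a595, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve; units follow those of the BSA standards."""
    return (a595 - intercept) / slope * dilution_factor


def loading_volume_ul(target_ug, conc_mg_per_ml):
    """Volume (µL) holding target_ug of protein; 1 mg/mL equals 1 µg/µL."""
    return target_ug / conc_mg_per_ml


if __name__ == "__main__":
    # Hypothetical BSA standards (mg/mL) and their A595 readings.
    slope, intercept = fit_standard_curve([0.0, 0.25, 0.5, 1.0],
                                          [0.05, 0.20, 0.35, 0.65])
    conc = concentration_from_a595(0.41, slope, intercept)
    print(f"{conc:.2f} mg/mL; load {loading_volume_ul(15, conc):.1f} µL for 15 µg")
```

The last helper converts a measured concentration into the volume needed for the 15 µg SDS-PAGE load described above.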

Fractionation, in-gel digestion and peptide preparation

For in-gel digestion, 30 µg of protein from each sample was run on SDS-PAGE as above, but electrophoresis was stopped after only 30–40 minutes, i.e., after ~1 cm of migration into the resolving gel. Each sample was run in duplicate, and at least six slices per lane were excised (Fig. 1a); for the plasma sample, 11 gel fractions were processed. Before digestion, gel pieces were destained by alternating treatments with ammonium bicarbonate (NH₄HCO₃) buffer and acetonitrile (ACN), and proteins were then reduced with dithiothreitol (DTT) and alkylated with iodoacetamide (IAA). Proteins were digested with trypsin at an enzyme-to-protein ratio of ~1:30 (w/w) for 16–18 hours, after which peptides were extracted from the gel pieces using an increasing gradient of ACN. Peptides were desalted using C18 Empore™ SPE Disks (Merck) and quantified by the Scopes method²³, and 1 µg of peptide was subjected to mass spectrometric analysis.
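The arithmetic behind the enzyme ratio and the peptide quantification can be sketched as follows. This assumes the common form of the Scopes A205 method: a fixed coefficient of 31 for a 1 mg/mL solution in a 1 cm cell, or, when A280 is also measured, Scopes' correction 27.0 + 120 × (A280/A205). The helper names are hypothetical, and in practice A205 readings must be taken on suitably diluted samples in a buffer that does not itself absorb strongly at 205 nm.

```python
def scopes_a205_coefficient(a205=None, a280=None):
    """A205 of a 1 mg/mL solution in a 1 cm cell.

    Without an A280 reading the commonly used constant 31 is returned;
    with one, Scopes' correction 27.0 + 120 * (A280 / A205) is applied.
    """
    if a205 is None or a280 is None:
        return 31.0
    return 27.0 + 120.0 * (a280 / a205)


def peptide_mg_per_ml(a205, a280=None, path_cm=1.0, dilution_factor=1.0):
    """Peptide concentration (mg/mL) from A205 via the Beer-Lambert law."""
    coefficient = scopes_a205_coefficient(a205, a280)
    return a205 / (coefficient * path_cm) * dilution_factor


def volume_for_ug(target_ug, conc_mg_per_ml):
    """Volume (µL) containing target_ug; 1 mg/mL equals 1 µg/µL."""
    return target_ug / conc_mg_per_ml


def trypsin_ug(protein_ug, protein_per_enzyme=30.0):
    """Trypsin mass for a 1:protein_per_enzyme (w/w) enzyme-to-protein ratio."""
    return protein_ug / protein_per_enzyme
```

For the 30 µg protein load used here, `trypsin_ug(30)` gives 1 µg of trypsin, and `volume_for_ug(1, peptide_mg_per_ml(a205))` gives the volume holding the 1 µg of peptide injected for LC-MS/MS.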

Data-dependent acquisition by liquid chromatography tandem mass spectrometry (LC-MS/MS)

Peptides obtained by in-gel digestion were separated on an EASY-nLC 1200 nano-flow liquid chromatography system (Fig. 1a). One µg of desalted peptides was loaded at 5 µl/min onto a pre-analytical trap column (Thermo Scientific, PN 164564-CMD, nanoViper C18 Acclaim PepMap 100, 5 µm, 100 Å, 100 µm × 2 cm). Peptides were resolved on the analytical column (Thermo Scientific, PN ES903, PepMap RSLC C18, 75 μm × 50 cm, 2 μm particles, 100 Å pore size) at 300 nl/min over a 120-min gradient of solvent B (80% ACN with 0.1% formic acid (FA)). Mass spectrometric data were acquired with the Orbitrap mass analyser in DDA mode over a full-scan range of 375–1700 m/z at a resolution of 60,000. Dynamic exclusion was set to 40 s with a mass tolerance of ±10, and the isolation window for MS2 precursors was 1.2 Da. Higher-energy collisional dissociation (HCD) was used for MS/MS fragmentation. The AGC targets were 400,000 for MS1 and 10,000 for MS2. A lock mass of 445.12003 m/z was used for internal calibration in positive ion mode.

The mass spectrometric data used here to develop the PeptideAtlas of Labeo rohita were also used for tissue-wise profiling of post-translational modifications (PTMs) and comparative protein expression analysis, as reported in our recent study²⁴.

Protein identification, TPP analysis and PeptideAtlas assembly

The raw mass spectrometry data (.raw) generated on the Orbitrap Fusion mass spectrometer were converted to .mzML files using msConvert 3.0.5533²⁵. The mzML files were searched with Comet (2019.01 rev. 1)²⁶ against the L. rohita NCBI protein database. This database contains protein sequences translated from the coding sequences (CDS) predicted from the whole-genome sequence of Labeo rohita (BioProject PRJNA437789), with locus tag IDs (prefix Rohu_) and EMBL-Bank/GenBank/DDBJ CDS IDs (prefix RXN). The UniProt proteome for this species (Proteome ID UP000290572) assigns a UniProt protein identifier to each CDS. The NCBI database had 32,687 entries, while the UniProt database, downloaded on 16 August 2019, had 32,379 entries and is a subset of the NCBI database. The initial Comet search used the NCBI database, whereas all downstream steps, including protein identification and PeptideAtlas assembly, used a combined NCBI and UniProt database, so that proteins not yet included in UniProt are also covered in the PeptideAtlas build.
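One simple way to combine two overlapping FASTA databases without duplicating sequences is sketched below. It is an illustration only, assuming entries are merged on identical sequence and that the accession is the first whitespace-separated token of each header; it is not the PeptideAtlas pipeline's own merging code, and the function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def combine_nonredundant(*databases):
    """Merge databases, grouping all accessions that share one sequence.

    Returns (accessions, sequence) pairs in order of first appearance, so an
    NCBI-only entry survives and a UniProt duplicate just adds its accession.
    """
    by_sequence = {}
    for records in databases:
        for header, sequence in records:
            accession = (header.split() or [""])[0]
            by_sequence.setdefault(sequence, []).append(accession)
    return [(accessions, seq) for seq, accessions in by_sequence.items()]


def write_fasta(entries):
    """Serialise (accessions, sequence) pairs back to FASTA text."""
    lines = []
    for accessions, seq in entries:
        lines.append(">" + " ".join(accessions))
        lines.append(seq)
    return "\n".join(lines) + "\n"
```

Keeping every accession for a shared sequence means a peptide matching that sequence can still be reported under both the RXN and the UniProt identifiers.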

An equal number of decoy sequences, together with contaminant sequences, were added to the protein database. Decoys were generated with the "randomize sequences and interleave entries" decoy algorithm, and contaminant sequences were taken from the common Repository of Adventitious Proteins (cRAP) database (http://www.thegpm.org/crap/). The parameters used for data analysis in the Trans-Proteomic Pipeline (TPP) suite were: peptide mass tolerance 20 ppm; fragment ion bin tolerance 0.05 m/z; monoisotopic mass offset 0.0 m/z; up to two missed cleavages; fully tryptic and semi-tryptic peptides; oxidation of tryptophan and methionine (+15.994915 Da) as variable modifications; and carbamidomethylation of cysteine (+57.021464 Da) as a static modification. Protein identification was performed using TPP v5.2.0 (Flammagenitus)²⁷. PeptideProphet and iProphet were used to score peptide-spectrum matches (PSMs) in individual files, and iProphet was used to score distinct peptides across the combined PeptideProphet files. Finally, ProteinProphet was used for protein identification based on the iProphet output, and identifications were accepted at less than 1% FDR^28,29,30. The whole workflow is shown in Fig. 1b.
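The idea behind a "randomize and interleave" decoy database can be sketched as follows: each entry is followed immediately by a shuffled copy with the same length and amino-acid composition. This is a minimal sketch under stated assumptions, not the TPP tool's actual implementation; the `DECOY_` prefix, the fixed seed and the choice to also decoy contaminant entries are hypothetical.

```python
import random


def randomized_interleaved_decoys(records, prefix="DECOY_", seed=1):
    """Return target and shuffled-decoy entries, alternating target/decoy.

    Each decoy keeps the amino-acid composition and length of its target,
    so the target and decoy halves of the database are the same size.
    """
    rng = random.Random(seed)  # fixed seed makes the database reproducible
    combined = []
    for header, sequence in records:
        residues = list(sequence)
        rng.shuffle(residues)
        combined.append((header, sequence))
        combined.append((prefix + header, "".join(residues)))
    return combined
```

Because every decoy has the same composition as its target, decoy hits occur at roughly the rate expected for random matches, which is what makes decoy counts usable as an FDR estimate.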

Chimeric spectra were assessed by reanalysing the iProphet files with the reSpect algorithm³¹. In brief, a reSpect search was performed on the iProphet files with the precursor mass tolerance increased to 3.0 Da, using a minimum iProphet probability of 0.0; TPP analysis was then performed as described above, and the reSpect and TPP analysis cycle was repeated once. The PeptideAtlas processing pipeline was used to build the PeptideAtlas by combining the iProphet results from the regular TPP and reSpect searches. For each experiment, spectra were filtered at a variable probability threshold to obtain a constant peptide-spectrum match (PSM) FDR of 0.0008%. The statistically significant results were organised into the "Rohu PeptideAtlas", which is built and maintained by the Institute for Systems Biology (ISB) at http://www.peptideatlas.org/builds/rohu/.
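Filtering "at variable probability to get a constant PSM FDR" can be illustrated with the usual model-based estimate, in which the expected number of false PSMs above a cutoff is the sum of (1 − p) over the accepted PSMs. The sketch below, with hypothetical function names, searches for the lowest cutoff that keeps this estimate at or below a target such as 0.0008% (8e-6); it is an illustration of the principle, not the PeptideAtlas pipeline code.

```python
def model_fdr(probabilities, threshold):
    """Expected false-discovery fraction among PSMs with p >= threshold."""
    kept = [p for p in probabilities if p >= threshold]
    if not kept:
        return 0.0
    return sum(1.0 - p for p in kept) / len(kept)


def threshold_for_fdr(probabilities, target_fdr):
    """Lowest probability cutoff whose model-based FDR stays <= target_fdr.

    Returns None if even the best-scoring PSMs exceed the target.
    """
    ordered = sorted(probabilities, reverse=True)
    best = None
    expected_false = 0.0
    for i, p in enumerate(ordered, start=1):
        expected_false += 1.0 - p
        # evaluate only after the last of a run of tied probabilities
        if i < len(ordered) and ordered[i] == p:
            continue
        if expected_false / i <= target_fdr:
            best = p
        else:
            break  # the FDR can only grow as lower probabilities are added
    return best
```

Because each newly added PSM has a probability no higher than the ones already accepted, the running FDR never decreases, so the loop can stop at the first cutoff that exceeds the target. With a target as strict as 8e-6, the resulting cutoff sits very close to 1, in line with the later statement that thresholds are typically above 0.99.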

Ortholog analysis for the identified proteome

Ortholog analysis of all canonical proteins was performed with the eggNOG-mapper genome-wide functional annotation tool³² (http://eggnog-mapper.embl.de/). First, FASTA sequences for all protein IDs were retrieved from UniProt³³ and used as the input list (Supplementary Table S1). The taxonomic scope was set to Actinopterygii, the orthology restriction to 'transfer annotation from any ortholog', and the seed ortholog detection threshold to 0.001.

Acquisition of selected reaction monitoring (SRM) data for targeted verification

Targeted proteomic data were acquired on a Thermo TSQ Altis triple quadrupole mass spectrometer coupled to a Thermo Vanquish HPLC system, in SRM/MRM (selected/multiple reaction monitoring) acquisition mode. Peptides were separated by reverse-phase chromatography on a Hypersil GOLD analytical column (Thermo Fisher Scientific, 100 × 2 mm, C18) at a flow rate of 450 µl/min. One µg of desalted peptide sample was loaded onto the column and run for 10 minutes. The mobile phases were 0.1% formic acid (FA) in Milli-Q water (solvent A) and 80% acetonitrile (ACN) with 0.1% FA (solvent B). Throughout the run, the column temperature was 45 °C and the cycle time was 2 s. Data were analysed with Skyline-daily³⁴ (version 20.2.1).

Data Records

Data record 1

The mass spectrometry data from the DDA-MS experiments include raw files (.raw) for 19 different fish sample types (Supplementary File S1). These data, along with the protein databases (.fasta), have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository and can be accessed through the identifier PXD026377 at https://www.ebi.ac.uk/pride/archive/projects/PXD026377³⁵. The Comet search parameter file and the MAYU statistical report (.xlsx) are provided in Supplementary Files S2 and S3, respectively. The identified peptides are listed in Supplementary Table S2. Details of the identified proteins and peptides, along with interactive data and visualisations, are available in PeptideAtlas at https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/buildDetails?atlas_build_id=500³⁶.

Data record 2

The targeted mass spectrometry data include spectral library files (.blib), the target peptide list selected based on PeptideAtlas data (.xls), instrument raw files (.raw) and the Skyline documents with imported results (.sky, .view, .skyd, .skyl). All Skyline documents, raw files and the spectral library have been deposited on the Panorama web server³⁷. The target peptides and transition lists are also given in Supplementary Tables S3 to S5.

Data record 3

In the eggNOG-based ortholog analysis³², the canonical proteins were mapped to orthologs covering a wide range of cellular processes and metabolic functions. Around 97% of the mapped orthologs belong to Actinopterygii, the class of ray-finned fishes, and most of them were linked to signal transduction mechanisms. This information is shown in Fig. 2, Table 2 and Supplementary Table S1.
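A tally like that in Table 2 can be produced from a tab-separated annotation table with one row per protein and a column of one-letter functional category codes. The sketch below assumes such a column is named `COG_category`, that lines starting with `##` are comments and that the header line may begin with `#`. These format details are assumptions to check against the actual eggNOG-mapper output file, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def count_functional_categories(table_text, column="COG_category"):
    """Count proteins per one-letter functional category in a TSV table.

    A protein annotated with several letters (e.g. "KT") is counted once
    under each letter; empty or "-" annotations are skipped.
    """
    lines = [ln for ln in table_text.splitlines() if not ln.startswith("##")]
    if lines and lines[0].startswith("#"):
        lines[0] = lines[0][1:]  # treat a '#'-prefixed line as the header
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    counts = Counter()
    for row in reader:
        letters = (row.get(column) or "").strip()
        if letters in ("", "-"):
            continue
        counts.update(set(letters))
    return counts
```

Counting multi-letter annotations under each letter means category totals can sum to more than the number of proteins, which should be kept in mind when reading percentages.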

Table 2 Distribution of identified canonical proteins across various orthologs*.


Technical Validation

Building and validation of an extensive PeptideAtlas for Labeo rohita

Targeted proteomics is an emerging approach for acquiring qualitative and quantitative information on selected proteins. A targeted experiment is generally hypothesis-driven and starts from a precise list of protein/peptide targets to be monitored. PeptideAtlas is a compendium of observed peptides that can serve as an important resource for designing targeted experiments or for validating protein/peptide targets arising from shotgun experiments. To generate the PeptideAtlas resource for Rohu, the DDA-MS dataset was analysed against a combined non-redundant UniProt and NCBI database of Labeo rohita (see Methods). To limit false-positive identifications, we used the MAYU tool³⁸ at both the protein and peptide levels. MAYU estimates false discovery rates (FDRs) for protein identifications (protFDR), peptide identifications (pepFDR) and peptide-spectrum matches (mFDR). Each experiment was thresholded at the probability that yields an iProphet model-based PSM FDR of 0.0008%. The exact probability varies between experiments, depending on how well the model separates correct from incorrect identifications, but is typically greater than 0.99. Decoy identifications were retained throughout the procedure and used to compute the final decoy-based FDRs, and the model-based PSM FDR threshold was adjusted if the final decoy-based protein FDR exceeded 1%. For protein identification, based on the iProphet output, identifications were accepted at less than 1% FDR.
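The basic decoy-counting check that decides whether a threshold must be tightened can be sketched as below. It uses the common estimate FDR ≈ decoys / targets among accepted entries. MAYU itself uses a more elaborate protein-level model, so this shows only the underlying idea; the function names and the (score, is_decoy) input format are hypothetical.

```python
def decoy_fdr(entries, min_score):
    """Decoy-estimated FDR among entries scoring at or above min_score.

    entries: iterable of (score, is_decoy) pairs. The estimate is
    decoys / targets, the common target-decoy approximation when the
    database holds equal numbers of target and decoy sequences.
    """
    targets = decoys = 0
    for score, is_decoy in entries:
        if score >= min_score:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    return decoys / targets if targets else 0.0


def needs_tightening(protein_entries, min_score, max_protein_fdr=0.01):
    """True if the decoy-based protein FDR exceeds the allowed maximum."""
    return decoy_fdr(protein_entries, min_score) > max_protein_fdr
```

In the workflow described above, a `True` result would correspond to lowering the allowed model-based PSM FDR and repeating the filtering until the protein-level decoy estimate falls to 1% or below.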

This resulted in the identification of 6015 high-confidence canonical proteins, along with 667 indistinguishable representative proteins, 671 marginally distinguished proteins, 768 representative proteins and 1165 other proteins. The overall summary of the Rohu PeptideAtlas is shown in Table 3. Briefly, the current build contains more than 2.96 million peptide MS/MS spectra identified at a PSM FDR of ≤0.0008%, corresponding to 150,781 distinct peptides at a peptide-level FDR of 0.18% (Fig. 3a). These peptides support all proteins identified at less than 1% protein-level FDR. Every tissue except muscle, fin, scale and plasma contributed ~15,000–20,000 peptides and ~2000–3000 canonical proteins to the build (Fig. 3b). Most identified peptides were doubly or triply charged, 10–20 amino acids long and had no missed cleavages (Fig. 3c,d, Supplementary Fig. S1b). Each canonical protein has at least 2 unique peptides, and ~93% have 3 or more (Fig. 3d, Supplementary Table S2). In terms of sequence coverage, observed peptides spanned >30% of the protein sequence for ~54% of canonical proteins and >60% for 22% of them (Fig. 3e, Supplementary Table S2). PeptideAtlas is a user-friendly portal through which researchers can access protein- and peptide-level information. The Rohu PeptideAtlas therefore provides detailed information on all identified proteins and peptides that can support discovery experiments as well as the design of targeted assays for L. rohita.
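Sequence coverage, as used in the percentages above, is the fraction of a protein's residues covered by at least one observed peptide. A minimal sketch, with a hypothetical function name, that uses exact string matching (so isobaric Leu/Ile are not treated as equivalent):

```python
def sequence_coverage(protein, peptides):
    """Percentage of protein residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in set(peptides):
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence, not just the first
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)
```

Overlapping peptides are counted only once per residue, so coverage reflects distinct sequence observed rather than the number of peptides.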

Table 3 Organ-wise numerical summary for the data in Labeo rohita PeptideAtlas.


Protein and peptide search in Labeo rohita PeptideAtlas

For any targeted experiment, proteotypic peptides are the ideal targets, and they can be selected using several scores assigned to each peptide in PeptideAtlas. Each protein entry has a dynamic page that provides mass spectral information and peptide modification details, such as the total number of observed peptides and a graphical representation of the protein coverage by each observed peptide. All observed peptides are also listed in a table ranked by their empirical suitability score (ESS) and empirical observability score (EOS) (Fig. 4a). ESS is a measure of the incidence of observing a protein/peptide in a given sample, while EOS represents how suitable the observed peptide is for proteotypic detection of the protein from which it was obtained. Peptides that have a high EOS and map to a unique protein are the most suitable candidates for identifying or quantifying a protein in a given sample. The protein view page also lists all tissues/samples in which the protein was detected.
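The "maps to a unique protein" criterion can be illustrated by checking each candidate peptide against every database sequence and ranking the unique ones by a score. PeptideAtlas already reports this mapping, so the sketch is only explanatory. The function names are hypothetical, and the scores passed in would be values such as the EOS read from the protein view page.

```python
def proteins_containing(peptide, proteins):
    """Accessions of all protein sequences that contain the peptide."""
    return sorted(acc for acc, seq in proteins.items() if peptide in seq)


def proteotypic_candidates(peptide_scores, proteins):
    """Peptides that map to exactly one protein, ranked by score (desc).

    peptide_scores: dict peptide -> score (e.g. a PeptideAtlas EOS value).
    proteins: dict accession -> protein sequence.
    """
    unique = {pep: score for pep, score in peptide_scores.items()
              if len(proteins_containing(pep, proteins)) == 1}
    return sorted(unique, key=lambda pep: unique[pep], reverse=True)
```

A peptide shared by two database entries is dropped even if it scores highly, because its signal could not be attributed to a single protein.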

For each observed peptide, a peptide view page presents all available information, including its alignment to the protein, genome mapping and modification sites (if any), together with the peptide spectra from each sample in which it was observed (Fig. 4b). Spectral quality can be assessed in the Lorikeet spectral viewer, which displays the spectrum of each peptide together with a table of the precursor mass, all product ion masses and the detected product ions.

Utility of PeptideAtlas information in SRM based targeted proteomic experiments

A set of peptides was selected for targeted verification using selected reaction monitoring (SRM), and the results were matched against the spectral library to assess the reliability of the data. This section illustrates the value of PeptideAtlas for targeted experiments. We performed a targeted experiment for two proteins in female gonad tissue; similar experiments can be designed and validated using PeptideAtlas information for all studied Rohu tissues. The SRM-based verification experiment followed the steps below.

Generation of spectral library

The PepXML (.pepXML) files from the Comet search of the female gonad sample were used to create a non-redundant spectral library in Skyline³⁴, using the 'Build' option on the Library tab of Peptide Settings. The resulting .blib file was selected for the experiment.
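A non-redundant library keeps one representative spectrum per peptide ion. The sketch below assumes the representative is the highest-probability PSM for each (modified sequence, charge) pair, which is a common rule but is stated here as an assumption rather than a description of Skyline's internal library builder. The `PSM` record and its fields are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class PSM(NamedTuple):
    modified_peptide: str
    charge: int
    probability: float
    spectrum_id: str


def nonredundant_library(psms: Iterable[PSM]) -> Dict[Tuple[str, int], PSM]:
    """Keep the highest-probability spectrum for each (peptide, charge)."""
    best: Dict[Tuple[str, int], PSM] = {}
    for psm in psms:
        key = (psm.modified_peptide, psm.charge)
        if key not in best or psm.probability > best[key].probability:
            best[key] = psm
    return best
```

Keying on charge as well as sequence keeps separate entries for, for example, the doubly and triply charged precursors of the same peptide, which fragment differently.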

Peptide and transition selection

Two proteins, elongation factor 1 alpha (EF1 alpha; A0A498N236) and zona pellucida sperm-binding protein 3-like (zp3; A0A498NTM4), were selected for targeted verification. Only peptides unique to these proteins and without missed cleavages were considered. The selected peptides had an ESS of at least 0.4 and lengths of 8 to 30 amino acids (Supplementary Table S3). Using Skyline, 593 transitions corresponding to 30 peptides and 44 precursors of the selected proteins were found in the spectral library of the female gonad sample. These were exported as two transition lists (TL1, 305 transitions; TL2, 288 transitions) for building the SRM methods (Supplementary Tables S4, S5).
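The peptide selection criteria above can be expressed as a simple filter. This is a sketch with hypothetical names and input format: missed cleavages are counted with the simple trypsin rule (an internal K or R not followed by P), and the ESS value and protein-mapping count would come from PeptideAtlas.

```python
def missed_cleavages(sequence):
    """Internal K/R not followed by P (simple trypsin rule)."""
    return sum(1 for i, aa in enumerate(sequence[:-1])
               if aa in "KR" and sequence[i + 1] != "P")


def select_targets(candidates, min_ess=0.4, min_len=8, max_len=30):
    """Keep unique, fully cleaved peptides with ESS >= min_ess and suitable length.

    candidates: iterable of dicts with keys 'sequence', 'ess' and
    'n_proteins' (number of database proteins the peptide maps to).
    """
    selected = []
    for cand in candidates:
        seq = cand["sequence"]
        if (cand["n_proteins"] == 1
                and missed_cleavages(seq) == 0
                and cand["ess"] >= min_ess
                and min_len <= len(seq) <= max_len):
            selected.append(seq)
    return selected
```

The C-terminal residue is excluded from the missed-cleavage count because a tryptic peptide is expected to end in K or R.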

Performing an SRM based targeted proteomics experiment

The SRM experiment was performed on a Thermo TSQ Altis triple quadrupole mass spectrometer. The transition lists for the selected peptides were used to create the corresponding targeted methods. Peptides from female gonad tissue were analysed in replicate with each method (R1 and R2 for both transition lists) using a 10-minute liquid chromatography gradient (see Methods). The acquired data were imported into Skyline for analysis, using the same document from which the transition lists had been exported.

Validation of spectral information using the spectral library

Several criteria are generally combined to identify peptides correctly in a targeted experiment. The gold standard is a heavy-labelled peptide that co-elutes with the peptide of interest. However, when heavy-labelled peptides are unavailable, as is the case in many laboratory experiments, matching fragment ions to a spectral library may be the best way to identify the peptide of interest unambiguously³⁹. In spectral library matching, the observed spectra are compared with the spectra in the library and a similarity score, the dot product (dotp), is calculated. The dotp score is based on the normalised spectral contrast angle and provides a measure of peak detection confidence. It ranges from 0 (lowest similarity) to 1 (highest similarity, confident identification)⁴⁰.
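The normalised spectral contrast angle can be sketched as follows: compute the cosine between the library and observed fragment intensity vectors, convert it to an angle θ and report 1 − 2θ/π, which is 1 for identical relative intensities and 0 for non-overlapping patterns. Whether intensities are square-root transformed first is an assumption exposed as a flag, and Skyline's exact preprocessing may differ. The `best_peak` helper, also hypothetical, shows how such a score could choose between candidate peaks.

```python
import math


def normalized_contrast_angle(observed, library, sqrt_transform=True):
    """Similarity in [0, 1] between matched fragment intensity vectors.

    Computes 1 - 2 * theta / pi, where theta is the angle between the two
    (optionally square-root-transformed) intensity vectors.
    """
    if len(observed) != len(library):
        raise ValueError("vectors must list the same fragment ions")
    transform = math.sqrt if sqrt_transform else float
    a = [transform(x) for x in observed]
    b = [transform(x) for x in library]
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    cosine = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def best_peak(peaks, library):
    """Retention time whose fragment peak areas best match the library.

    peaks: dict retention_time -> list of fragment peak areas, in the
    same fragment order as the library intensities.
    """
    return max(peaks, key=lambda rt: normalized_contrast_angle(peaks[rt], library))
```

Because the score depends only on the angle between the vectors, it is insensitive to overall signal intensity and reflects only the relative pattern of fragment abundances.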

To determine which peptides of the selected proteins were confidently detected, we imported the results into Skyline. For each precursor, singly and doubly charged product ions from y2 to the last ion were considered. The spectral information was compared with the spectral library created from the PeptideAtlas resource to confirm the reliability of the data, using the dot product metric (dotp), which measures the similarity between library spectra and query peaks⁴¹. Based on peak shape, peak area and co-elution of fragment ions, many peptides gave consistent results in both replicate runs with acceptable dotp values. Peak areas and intensities were consistent between replicate runs, and no peaks were observed in the blank runs. Table 4 shows the dotp values for the doubly and/or triply charged precursors of the targeted peptides, together with their ESS and EOS scores. For example, the peptides IGGVGTVPVGK and EVAVDFQMR matched the spectral library with dotp values above 0.8 and 0.9, respectively, in both replicates (Fig. 5a,b). Several other peptides likewise showed a single, unambiguous peak.
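The y2-to-last-ion product ions considered here can be computed from the peptide sequence. The sketch below assumes unmodified peptides apart from the static carbamidomethyl cysteine used in the database search (methionine oxidation is ignored), and it takes "last ion" to mean y(n−1) for an n-residue peptide. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da); Cys includes the static
# carbamidomethyl modification (+57.021464) used in the database search.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919 + 57.021464,
    "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
    "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(sequence, charge):
    """m/z of the [M + zH]z+ precursor ion."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (mass + charge * PROTON) / charge


def y_ion_series(sequence, charges=(1, 2), first=2):
    """(label, charge, m/z) for y_first up to y_(n-1)."""
    ions = []
    for i in range(first, len(sequence)):
        neutral = sum(RESIDUE_MASS[aa] for aa in sequence[-i:]) + WATER
        for z in charges:
            ions.append((f"y{i}", z, (neutral + z * PROTON) / z))
    return ions
```

For the 9-residue peptide EVAVDFQMR this yields y2 to y8 at charges 1+ and 2+, i.e. 14 candidate product ions per precursor before any filtering by the library.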

Table 4 List of peptides selected for SRM based verification along with some details from PeptideAtlas and match score (dotp*) with spectral library.


However, for several peptides, multiple peaks were observed across the LC gradient. These peaks had good shape and co-eluting fragment ions, which made it difficult to identify the correct peak without a corresponding heavy-labelled peptide. In such cases, spectral libraries are important for determining the best match, as they provide reliable and representative fragmentation patterns. For instance, for the peptide GEFEAGISR, two peaks were obtained in both replicate runs, at retention times of 4.6 min and 5.9 min (Fig. 5c,d). Based on the match with the spectral library (created from the female gonad PepXML files) in both runs, the peak at 4.6 min is likely the true peak, as its dotp values (0.85/0.84) are much higher than that of the peak at 5.9 min (0.34).

Usage Notes

Development and evaluation of a comprehensive PeptideAtlas for Labeo rohita

In the present study, we developed an open resource for fish proteome analysis for the scientific community, based on high-resolution mass spectrometry data from 19 different sample types of L. rohita (Rohu) obtained with different protein extraction methods and sample fractionation. To our knowledge, this is the first comprehensive proteome analysis of this fish (PTM information will be added to PeptideAtlas as part of another study). The building and evaluation of the Rohu PeptideAtlas are described in detail in the Methods section.

A valuable resource for designing targeted proteomics experiments

SRM- or MRM-based targeted proteomic experiments require unique transitions for the targeted proteins (or peptides) for accurate quantification. PeptideAtlas is a valuable resource for selecting unique peptides and their transitions using its built-in tools. It also provides information on the most frequently observed tryptic peptides across a wide range of sample types and types of mass spectrometer. Through its interactive interface, users can visualise individual and consensus spectra, select single or multiple target peptides/proteins, and export them with their transitions in .csv/.tsv format, which can be imported directly into the mass spectrometer software for SRM/MRM experiments.

A valuable resource for spectral library generation and data search

The Rohu PeptideAtlas build is dynamic and can be updated whenever new proteomics datasets are generated in-house or uploaded to public repositories such as PRIDE and MassIVE⁴². The data in PeptideAtlas, the TPP output files (.pepXML) used to generate it and the PeptideAtlas results can be used to build spectral libraries with SpectraST, a tool integrated in the TPP. Spectral libraries are collections of experimentally identified peptide spectra used for accurate and precise identification/quantification of peptides/proteins in DIA/SWATH analysis or in SRM/MRM data analysis.

A valuable resource for proteogenomic analysis and annotation

Accurate genome annotation remains challenging despite advances in technology and algorithms. Integrating high-resolution mass spectrometry with genomic data can improve gene annotations. The Rohu genome was sequenced recently; its preliminary annotations are uncurated and include several hypothetical proteins and pseudogenes¹⁴. Currently, only two proteins in the UniProt database of Labeo rohita are reviewed, and both have protein evidence (PE) level 2, i.e., experimental evidence at transcript level. No protein has PE level 1, which represents evidence at protein level. The current dataset can help UniProt curators by providing mass spectrometry-based, protein-level evidence for the Labeo rohita proteome. Gene annotation has been reported to improve with the help of mass spectrometric data^43,44. Tanner et al. used tandem mass spectra of human peptides to validate 11,000 introns and 39,000 exons at the translational level and to identify novel exons and splicing events⁴⁵. Similarly, the peptide dataset in the Rohu PeptideAtlas could help to improve the genome annotation and may provide evidence for pseudogenes, alternative splicing events, extended exons and hypothetical proteins.

Code availability

The authors do not have code specific to this work to disclose.

References

1. FAO. “Sustainability in action.” State of World Fisheries and Aquaculture. Food and Agriculture Organization of the United Nations, Rome, Italy (2020).
2. Forne, I., Abian, J. & Cerda, J. Fish proteome analysis: model organisms and non-sequenced species. Proteomics 10, 858–872 (2010).
3. Cerqueira, M. et al. How tryptophan levels in plant-based aquafeeds affect fish physiology, metabolism and proteome. Journal of proteomics 221, 103782 (2020).
4. Ghaedi, G., Keyvanshokooh, S., Azarm, H. M. & Akhlaghi, M. Proteomic analysis of muscle tissue from rainbow trout (Oncorhynchus mykiss) fed dietary β-glucan. Iranian journal of veterinary research 17, 184 (2016).
5. Link, V. et al. Identification of regulators of germ layer morphogenesis using proteomics in zebrafish. Journal of cell science 119, 2073–2083 (2006).
6. Lokaj, K. et al. Quantitative differential proteome analysis in an animal model for human melanoma. J Proteome Res 8, 1818–1827 (2009).
7. Pedreschi, R., Hertog, M., Lilley, K. S. & Nicolai, B. Proteomics for the food industry: opportunities and challenges. Critical reviews in food science and nutrition 50, 680–692 (2010).
8. Gandar, A. et al. Proteome response of fish under multiple stress exposure: Effects of pesticide mixtures and temperature increase. Aquat Toxicol 184, 61–77 (2017).
9. Williams, E. G. et al. Systems proteomics of liver mitochondria function. Science 352 (2016).
10. Chick, J. M. et al. Defining the consequences of genetic variation on a proteome-wide scale. Nature 534, 500–505 (2016).
11. Kim, M. S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).
12. Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
13. Kelkar, D. S. et al. Annotation of the zebrafish genome through an integrated transcriptomic and proteomic analysis. Molecular & cellular proteomics 13, 3184–3198 (2014).
14. Das, P. et al. De novo assembly and genome-wide SNP discovery in Rohu Carp, Labeo rohita. Frontiers in genetics 11, 386 (2020).
15. Goswami, M. et al. Proteomics Analysis of Liver Tissue of Labeo rohita. Current Proteomics 12, 56–62 (2015).
16. Banerjee, S. et al. Identification of potential biomarkers of hepatotoxicity by plasma proteome analysis of arsenic-exposed carp Labeo rohita. Journal of hazardous materials 336, 71–80 (2017).
17. Deutsch, E. W., Lam, H. & Aebersold, R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO reports 9, 429–434 (2008).
18. Vizcaíno, J. A. et al. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics 9, 4276–4283 (2009).
19. Craig, R., Cortens, J. P. & Beavis, R. C. Open source system for analyzing, validating, and storing protein identification data. Journal of proteome research 3, 1234–1242 (2004).
20. Surasani, V. K. R., Tyagi, A. & Kudre, T. Recovery of proteins from rohu processing waste using pH shift method: characterization of isolates. Journal of aquatic food product technology 26, 356–365 (2017).
21. Jaipal Reddy, P. et al. A simple protein extraction method for proteomic analysis of diverse biological specimens. Current proteomics 10, 298–311 (2013).
22. Laemmli, U. K. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680–685 (1970).
23. Scopes, R. Measurement of protein by spectrophotometry at 205 nm. Analytical biochemistry 59, 277–282 (1974).
24. Nissa, M. U. et al. Organ-Based Proteome and Post-Translational Modification Profiling of a Widely Cultivated Tropical Water Fish, Labeo rohita. Journal of proteome research (2021).
25. Chambers, M. C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nature biotechnology 30, 918–920 (2012).
26. Eng, J. K., Jahan, T. A. & Hoopmann, M. R. Comet: an open-source MS/MS sequence database search tool. Proteomics 13, 22–24 (2013).
27. Deutsch, E. W. et al. Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics. PROTEOMICS–Clinical Applications 9, 745–754 (2015).
28. Deutsch, E. W. et al. State of the human proteome in 2014/2015 as viewed through PeptideAtlas: enhancing accuracy and coverage through the AtlasProphet. Journal of proteome research 14, 3461–3473 (2015).
29. Shteynberg, D. et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Molecular & cellular proteomics 10, M111.007690 (2011).
30. Nesvizhskii, A. I., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Analytical chemistry 75, 4646–4658 (2003).
31. Shteynberg, D. et al. reSpect: software for identification of high and low abundance ion species in chimeric tandem mass spectra. Journal of the American Society for Mass Spectrometry 26, 1837–1847 (2015).
32. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic acids research 47, D309–D314 (2019).
33. Apweiler, R. et al. UniProt: the Universal Protein knowledgebase. Nucleic Acids Research 32, D115–D119 (2004).
34. MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).
35. Nissa, M. U. Proteomic profiling of Labeo Rohita; a widely cultivated fish. PRIDE Archive https://www.ebi.ac.uk/pride/archive/projects/PXD026377 (2022).
36. Labeo rohita PeptideAtlas. PeptideAtlas https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/buildDetails?atlas_build_id=500 (2022).
37. Srivastava, S. Multiple reaction monitoring (MRM) based data for targeted validation of proteins in Labeo rohita. Panorama Public https://panoramaweb.org/rohufemalegonad.url (2022).
38. Reiter, L. et al. Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. Molecular & cellular proteomics: MCP 8, 2405–2417 (2009).
39. Grossegesse, M., Nitsche, A., Schaade, L. & Doellinger, J. Application of spectral library prediction for parallel reaction monitoring of viral peptides. Proteomics 21, 2000226 (2021).
40. Pino, L. K. et al. The Skyline ecosystem: Informatics for quantitative mass spectrometry proteomics. Mass spectrometry reviews 39, 229–244 (2020).
41. Frewen, B. E., Merrihew, G. E., Wu, C. C., Noble, W. S. & MacCoss, M. J. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal Chem 78, 5678–5684 (2006).
42. Choi, M. et al. MassIVE.quant: a community resource of quantitative mass spectrometry–based proteomics datasets. Nature methods 17, 981–984 (2020).
43. Forne, I., Abian, J. & Cerda, J. Fish proteome analysis: Model organisms and non-sequenced species. Proteomics 10, 858–872 (2010).
44. De Souza, G. A. et al. High accuracy mass spectrometry analysis as a tool to verify and improve gene annotation using Mycobacterium tuberculosis as an example. BMC genomics 9, 1–13 (2008).
45. Tanner, S. et al. Improving gene annotation using peptide mass spectrometry. Genome research 17, 231–239 (2007).

Acknowledgements

This work was supported by the Department of Biotechnology (BT/PR15285/AAQ/3/753/2015), Govt. of India, to S.S. and M.G. M.N. was supported by the University Grants Commission (UGC). R.M. acknowledges the US National Institutes of Health, National Institute of General Medical Sciences under grant No. GM087221, the Office of the Director 1S10OD026936, the National Institute on Aging grant U19AG023122 and NSF award 1920268. We thank the Director General, Indian Council of Agricultural Research, and the Director, ICAR-Central Institute of Fisheries Education, Mumbai, for their support and facilities. We acknowledge MASS-FIITB at IIT Bombay, supported by the Department of Biotechnology (BT/PR13114/INF/22/206/2015), for mass spectrometric data acquisition. We thank Mr. Saicharan Ghantasala, Deeptarup Biswas and Medha Gayathri J Pai for helpful suggestions and technical support.

Author information

These authors contributed equally: Panga Jaipal Reddy, Nevil Pinto.

Authors and Affiliations

Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
Mehar Un Nissa & Sanjeeva Srivastava
Institute for Systems Biology, Seattle, WA, 98109, USA
Panga Jaipal Reddy, Zhi Sun & Robert L. Moritz
Central Institute of Fisheries Education, Indian Council of Agricultural Research, Versova, Mumbai, Maharashtra, 400061, India
Nevil Pinto & Mukunda Goswami
Regional Centre for Biotechnology, Faridabad, 121001, India
Biplab Ghosh

Contributions

Concept and design: M.N., S.S., R.M. and M.G. Maintenance and sampling: N.P., M.N. Method development and Data acquisition: M.N., N.P. Data analysis and Interpretation: M.N., P.J., Z.S., N.P., B.G., R.M. Constructing database: J.R., Z.S., M.N. Writing: Original draft: M.N., N.P., J.R., Z.S., M.G., S.S., R.M. Writing: Review and editing: M.N., N.P., B.G., J.R., Z.S., R.M., M.G., S.S.

Corresponding authors

Correspondence to Mukunda Goswami or Sanjeeva Srivastava.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figure 1

Supplementary File S1

Supplementary File S2

Supplementary File S3

Supplementary Table S1

Supplementary Table S2

Supplementary Table S3

Supplementary Table S4

Supplementary Table S5

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Nissa, M.U., Reddy, P.J., Pinto, N. et al. The PeptideAtlas of a widely cultivated fish Labeo rohita: A resource for the Aquaculture Community. Sci Data 9, 171 (2022). https://doi.org/10.1038/s41597-022-01259-9

Received: 05 August 2021
Accepted: 11 March 2022
Published: 13 April 2022
DOI: https://doi.org/10.1038/s41597-022-01259-9
Springer Nature Limited
