Abstract
Mass spectrometry-based methods allow the direct, comprehensive analysis of expressed proteins and their quantification across different conditions. In general, however, proteins are identified by matching experimental mass spectra to theoretical spectra of peptide sequences derived from genomic databases. This conventional approach limits the applicability of proteomic methodologies to species for which a genome reference sequence is available. Recently, RNA sequencing (RNA-Seq) has become a valuable tool to overcome this limitation, either by de novo construction of databases for organisms for which no DNA sequence is available or by refining existing genomic databases with transcriptomic data. Here we present a generic pipeline for using transcriptomic data in proteomics experiments. In particular, we show how to efficiently supply proteomic analysis workflows with sample-specific RNA-Seq databases. This approach is useful for the proteomic analysis of organisms that have not yet been sequenced and of complex microbial metatranscriptomes/metaproteomes (for example, in the human body), and for refining current proteomics data analysis, which relies solely on the genomic sequence and predicted gene expression rather than on validated gene products. Finally, the approach used in this protocol can help to improve the data quality of conventional proteomics experiments, which can be affected by genetic variation or splicing events.
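The core idea of turning transcript sequences into a searchable protein database can be illustrated with a minimal sketch. It is not the pipeline of this protocol: it assumes that assembled transcripts are already available as plain nucleotide strings, uses the standard genetic code, and keeps every methionine-initiated open reading frame above a length cutoff in all six frames. The function names (`six_frame_orfs`, `to_fasta`), the FASTA header layout, and the `min_len` default are hypothetical choices made for illustration only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(transcript, min_len=6):
    """Collect Met-initiated ORFs of at least min_len residues in six frames.

    Hypothetical helper: returns (frame_label, peptide) tuples.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            # Stop codons ('*') delimit candidate reading frames.
            for segment in protein.split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_len:
                    orfs.append((f"{strand}{frame + 1}", segment[start:]))
    return orfs


def to_fasta(transcripts, min_len=6):
    """Render ORFs from a {name: sequence} mapping as FASTA text."""
    lines = []
    for name, seq in transcripts.items():
        for n, (frame, peptide) in enumerate(six_frame_orfs(seq, min_len), 1):
            lines.append(f">{name}_frame{frame}_orf{n}")
            lines.append(peptide)
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy transcript, not real data.
    print(to_fasta({"tx1": "ATGGCTAAAGGTCTGTAA"}, min_len=5))
```

In practice, a dedicated transcriptome assembler and ORF finder would replace this toy translation step, and the resulting FASTA file would be passed to a database search engine as the sample-specific database.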
Acknowledgement
Our work was supported by the German Ministry for Education and Research (BMBF, grant numbers 0315082 and 01EA1303 to S.S.), the European Union (FP7/2007-2013) under grant agreement no. 262055 (ESGI), and the Max Planck Society. This work is part of the Ph.D. thesis of T.L.
Copyright information
© 2016 Springer Science+Business Media New York
About this protocol
Cite this protocol
Luge, T., Sauer, S. (2016). Generating Sample-Specific Databases for Mass Spectrometry-Based Proteomic Analysis by Using RNA Sequencing. In: Reinders, J. (eds) Proteomics in Systems Biology. Methods in Molecular Biology, vol 1394. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3341-9_16
DOI: https://doi.org/10.1007/978-1-4939-3341-9_16
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3339-6
Online ISBN: 978-1-4939-3341-9
eBook Packages: Springer Protocols