Generating Sample-Specific Databases for Mass Spectrometry-Based Proteomic Analysis by Using RNA Sequencing

Luge, Toni; Sauer, Sascha

doi:10.1007/978-1-4939-3341-9_16


Protocol
First Online: 24 December 2015

Part of the book series: Methods in Molecular Biology (MIMB, volume 1394)

Abstract

Mass spectrometry-based methods allow the direct, comprehensive analysis of expressed proteins and their quantification across different conditions. In general, however, proteins are identified by matching experimental mass spectra to theoretical spectra derived from genomic databases of the organism under study. This conventional approach limits the applicability of proteomic methodologies to species for which a genome reference sequence is available. Recently, RNA sequencing (RNA-Seq) has become a valuable tool to overcome this limitation, either by de novo construction of databases for organisms for which no DNA sequence is available or by refining existing genomic databases with transcriptomic data. Here we present a generic pipeline for using transcriptomic data in proteomics experiments. In particular, we show how to efficiently supply proteomic analysis workflows with sample-specific RNA-Seq databases. This approach is useful for the proteomic analysis of as yet unsequenced organisms and of complex microbial metatranscriptomes/metaproteomes (for example, in the human body), and for refining current proteomics data analysis, which relies solely on the genomic sequence and predicted gene expression rather than on validated gene products. Finally, the approach used in this protocol can help to improve the data quality of conventional proteomics experiments, which can be affected by genetic variation or splicing events.
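
As a rough illustration of the database-construction idea described in the abstract, the following Python sketch translates assembled transcript sequences in all six reading frames and writes the stop-to-stop segments above a length cutoff as FASTA entries that a search engine could read. It is not the authors' pipeline: the function names, the `min_len` cutoff, and the FASTA header format are assumptions made for illustration, and a real workflow would typically rely on dedicated assembly and ORF-prediction tools.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_peptides(transcript, min_len=7):
    """Yield (tag, sequence) for stop-to-stop segments in all six frames.

    min_len is a hypothetical cutoff; segments shorter than this are dropped.
    """
    seq = transcript.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            for idx, segment in enumerate(protein.split("*")):
                if len(segment) >= min_len:
                    yield f"{strand}{frame + 1}_{idx}", segment


def build_database(transcripts, min_len=7):
    """Return FASTA text for a dict mapping transcript name to DNA sequence."""
    lines = []
    for name, seq in transcripts.items():
        for tag, pep in six_frame_peptides(seq, min_len):
            lines.append(f">{name}_{tag}")  # illustrative header format
            lines.append(pep)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy transcript standing in for an assembled contig.
    demo = {"contig1": "ATGGCTAAAGGTGAACTGTTCACCGGTTAA"}
    print(build_database(demo), end="")
```

Translating all six frames keeps the sketch independent of any strand or start-codon annotation, at the cost of a larger search space; in practice, restricting entries to the longest open reading frame per transcript is a common way to limit database size.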

Acknowledgement

Our work was supported by the German Ministry for Education and Research (BMBF, grant numbers 0315082 and 01EA1303 to S.S.), the European Union (FP7/2007-2013) under grant agreement n° 262055 (ESGI), and the Max Planck Society. This work is part of the Ph.D. thesis of T.L.

Author information

Authors and Affiliations

Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Ihnestrasse 63-73, 14195, Berlin, Germany
Toni Luge & Sascha Sauer

Corresponding author

Correspondence to Sascha Sauer.

Editor information

Editors and Affiliations

University of Regensburg Institute of Functional Genomics, Regensburg, Germany
Jörg Reinders

About this protocol

Cite this protocol

Luge, T., Sauer, S. (2016). Generating Sample-Specific Databases for Mass Spectrometry-Based Proteomic Analysis by Using RNA Sequencing. In: Reinders, J. (ed) Proteomics in Systems Biology. Methods in Molecular Biology, vol 1394. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3341-9_16

DOI: https://doi.org/10.1007/978-1-4939-3341-9_16
Published: 24 December 2015
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3339-6
Online ISBN: 978-1-4939-3341-9
eBook Packages: Springer Protocols