Generating Sample-Specific Databases for Mass Spectrometry-Based Proteomic Analysis by Using RNA Sequencing

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1394))

Abstract

Mass spectrometry-based methods allow the direct, comprehensive analysis of expressed proteins and their quantification across different conditions. In general, however, proteins are identified by matching experimental mass spectra to theoretical spectra of peptide sequences derived from genomic databases. This conventional approach limits the applicability of proteomic methods to species for which a reference genome sequence is available. RNA sequencing (RNA-Seq) has recently become a valuable tool for overcoming this limitation, either by de novo construction of databases for organisms for which no DNA sequence is available or by refining existing genomic databases with transcriptomic data. Here we present a generic pipeline for using transcriptomic data in proteomics experiments. In particular, we show how to efficiently supply proteomic analysis workflows with sample-specific RNA-Seq databases. This approach is useful for the proteomic analysis of as yet unsequenced organisms and of complex microbial metatranscriptomes/metaproteomes (for example, in the human body), and for refining current proteomics data analysis that relies solely on the genomic sequence and predicted gene expression rather than on validated gene products. Finally, the approach used in this protocol can help improve the data quality of conventional proteomics experiments, which can be affected by genetic variation or splicing events.
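
To make the idea of a sample-specific database concrete, the following is a minimal sketch of one step that such a pipeline could include: translating assembled transcript sequences in all six reading frames and writing the resulting open reading frames as a protein FASTA database for a search engine. It is an illustration under stated assumptions, not the protocol's actual implementation. The function names, the stop-to-stop ORF definition, the `min_len` cutoff, and the header format are all hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq, min_len=30):
    """Yield (frame_tag, peptide) for stop-to-stop segments in six frames.

    min_len is a hypothetical cutoff in amino acids, chosen for illustration.
    """
    seq = seq.upper()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            for idx, segment in enumerate(protein.split("*")):
                if len(segment) >= min_len:
                    yield f"{strand}{frame + 1}_{idx}", segment


def build_database(transcripts, min_len=30):
    """Return FASTA text with one entry per unique translated segment."""
    seen = set()
    entries = []
    for name, seq in transcripts.items():
        for tag, peptide in six_frame_orfs(seq, min_len):
            if peptide in seen:  # skip redundant sequences
                continue
            seen.add(peptide)
            entries.append(f">{name}|frame{tag}\n{peptide}")
    return "\n".join(entries) + "\n"
```

In this sketch, `build_database({"contig1": "ATGGCCAAATAG"}, min_len=3)` would produce FASTA text that includes an entry for the forward frame 1 peptide `MAK`. A real workflow would typically read transcripts from an assembler's FASTA output and apply a much longer minimum length.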

Acknowledgement

Our work was supported by the German Ministry for Education and Research (BMBF, grant numbers 0315082 and 01EA1303 to S.S.), the European Union (FP7/2007-2013) under grant agreement no. 262055 (ESGI), and the Max Planck Society. This work is part of the Ph.D. thesis of T.L.

Author information

Corresponding author

Correspondence to Sascha Sauer.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Luge, T., Sauer, S. (2016). Generating Sample-Specific Databases for Mass Spectrometry-Based Proteomic Analysis by Using RNA Sequencing. In: Reinders, J. (eds) Proteomics in Systems Biology. Methods in Molecular Biology, vol 1394. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3341-9_16

  • DOI: https://doi.org/10.1007/978-1-4939-3341-9_16

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3339-6

  • Online ISBN: 978-1-4939-3341-9

  • eBook Packages: Springer Protocols
