Statistical Analysis of Post-Translational Modifications Quantified by Label-Free Proteomics Across Multiple Biological Conditions with R: Illustration from SARS-CoV-2 Infected Cells

  • Protocol
Statistical Analysis of Proteomic Data

Part of the book series: Methods in Molecular Biology (MIMB, volume 2426)

Abstract

Protein post-translational modifications (PTMs) are essential elements of cellular communication. Variations in their abundance can affect cellular pathways, leading to cellular disorders and diseases. A widely used method for revealing PTM-mediated regulatory networks is the label-free quantitation (LFQ) of PTMs by high-resolution mass spectrometry. The raw data from such experiments are generally processed with dedicated software such as MaxQuant, MassChroQ, or Proline, which produces data matrices containing the quantified intensity of each identified modified peptide. Statistical analyses are then necessary (1) to ensure that the quantified data are of sufficient quality and reproducibility, and (2) to highlight the modified peptides that are differentially abundant between the biological conditions under study. The objective of this chapter is therefore to provide a complete pipeline, implemented in R, for analyzing the quantified values of modified peptides in the presence of two or more biological conditions. We illustrate the pipeline starting from MaxQuant outputs from the analysis of A549-ACE2 cells infected by SARS-CoV-2 at different time points, a data set freely available on PRIDE (PXD020019).

Acknowledgements

The author thanks Mariette Matondo, Thibaut Douché, Thibault Chaze, and Magalie Duchateau of the Proteomics platform of the Institut Pasteur for fruitful discussions about proteomics and phosphoproteomics.

Corresponding author

Correspondence to Quentin Giai Gianetto.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Giai Gianetto, Q. (2023). Statistical Analysis of Post-Translational Modifications Quantified by Label-Free Proteomics Across Multiple Biological Conditions with R: Illustration from SARS-CoV-2 Infected Cells. In: Burger, T. (ed.) Statistical Analysis of Proteomic Data. Methods in Molecular Biology, vol 2426. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1967-4_12

  • DOI: https://doi.org/10.1007/978-1-0716-1967-4_12

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1966-7

  • Online ISBN: 978-1-0716-1967-4

  • eBook Packages: Springer Protocols
