Abstract
In proteomic differential analysis, FDR control is often performed through a multiple testing correction (i.e., an adjustment of the original p-values). In this protocol, we apply a recent alternative method based on so-called knockoff filters. It shares interesting conceptual similarities with the target–decoy competition procedure classically used in proteomics for FDR control at the peptide identification step. To provide practitioners with a unified understanding of FDR control in proteomics, we apply the knockoff procedure to real and simulated quantitative datasets. Leveraging these comparisons, we propose adapting the knockoff procedure to better fit the specificities of quantitative proteomic data (mainly, very few samples). The performance of the knockoff procedure is compared with that of the classical Benjamini–Hochberg procedure, thereby shedding new light on the strengths and weaknesses of target–decoy competition.
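To make the two families of FDR control mentioned above more concrete, the following minimal Python sketch places them side by side. It is an illustration under stated assumptions, not the implementation used in the protocol: the function names `bh_adjust` and `competition_threshold`, the `offset` parameter, and the toy inputs are hypothetical. The first function computes Benjamini–Hochberg adjusted p-values. The second computes the data-dependent threshold used by competition-based procedures, where a positive score means the original variable (or target) beat its knockoff (or decoy), and a negative score means the opposite.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def competition_threshold(scores: Sequence[float], q: float,
                          offset: int = 1) -> float:
    """Smallest t > 0 such that
    (offset + #{W <= -t}) / max(1, #{W >= t}) <= q.

    Assumption: W > 0 when the original variable (target) wins and
    W < 0 when its knockoff (decoy) wins. offset=1 corresponds to the
    conservative "knockoff+" variant. Returns infinity when no
    threshold satisfies the bound, i.e. nothing is selected.
    """
    candidates = sorted({abs(w) for w in scores if w != 0})
    for t in candidates:
        losers = sum(1 for w in scores if w <= -t)   # knockoff/decoy wins
        winners = sum(1 for w in scores if w >= t)   # original/target wins
        if (offset + losers) / max(1, winners) <= q:
            return t
    return float("inf")


if __name__ == "__main__":
    # Toy inputs, for illustration only.
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
    w = [5.0, 4.0, 3.5, 3.0, -2.5, 2.0, 1.5, -1.0, 0.5, 4.5, 6.0, 7.0]
    t = competition_threshold(w, q=0.2)
    print(t, [x for x in w if x >= t])
```

With a Benjamini–Hochberg correction, a discovery is any item whose adjusted p-value falls below the target FDR level. With the competition-based threshold, a discovery is any item whose score is at least `t`. On the toy scores, no threshold reaches a 5% level because the ratio cannot fall below 1 divided by the number of positive scores, which hints at why small problem sizes matter for this kind of procedure.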
Acknowledgements
This work was supported by grants from the French National Research Agency: ProFI project (ANR-10-INBS-08), GRAL project (ANR-10-LABX-49-01), DATA@UGA and SYMER projects (ANR-15-IDEX-02) and MIAI @ Grenoble Alpes (ANR-19-P3IA-0003).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Etourneau, L., Varoquaux, N., Burger, T. (2023). Unveiling the Links Between Peptide Identification and Differential Analysis FDR Controls by Means of a Practical Introduction to Knockoff Filters. In: Burger, T. (eds) Statistical Analysis of Proteomic Data. Methods in Molecular Biology, vol 2426. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1967-4_1
DOI: https://doi.org/10.1007/978-1-0716-1967-4_1
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1966-7
Online ISBN: 978-1-0716-1967-4
eBook Packages: Springer Protocols