Assessing Significance of Peptide Spectrum Matches in Proteomics: A Multiple Testing Approach

Ghosh, Debashis

doi:10.1007/s12561-009-9012-3

Published: 09 October 2009

Volume 1, pages 199–213, (2009)
Statistics in Biosciences

Debashis Ghosh¹

Abstract

In the analysis of data from proteomic mass spectrometry experiments, an important issue is determining which of the observed peptide spectrum matches (PSMs) represent true positives. We view this problem through a multiple testing framework and develop procedures for identifying true PSMs. A key feature that makes the problem relatively distinct from the differential expression problem in microarray analysis is that the null distribution can potentially be estimated from the data. However, this invalidates many of the asymptotic results in the statistical literature. We prove some new key results for this problem using empirical process theory. We also develop a new multiple testing procedure that employs multivariate information from the peptide sequence searches. The proposed methods are studied using a real data set as well as simulated data.
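
As a rough illustration of the general idea of testing PSM scores against a null distribution estimated from data, the sketch below computes empirical p-values from a sample of null scores and then applies the Benjamini-Hochberg step-up rule. It is not the procedure developed in the article. The function names, the assumption that a separate sample of null scores is available (for example, from a decoy database search), and the convention that larger scores indicate better matches are all assumptions made for this example.

```python
from bisect import bisect_left


def empirical_p_values(scores, null_scores):
    """Empirical p-value of each score against a sample of null scores.

    Assumes larger scores mean stronger matches (an assumption of this sketch).
    The +1 correction keeps every p-value strictly positive.
    """
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    p_values = []
    for s in scores:
        # Count null scores greater than or equal to s.
        n_ge = n - bisect_left(null_sorted, s)
        p_values.append((n_ge + 1) / (n + 1))
    return p_values


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up rule; returns one rejection flag per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical toy data: 99 null scores and 4 observed PSM scores.
null_scores = list(range(1, 100))
observed = [150, 140, 50, 10]
p = empirical_p_values(observed, null_scores)
accepted = benjamini_hochberg(p, alpha=0.05)
```

Because the null sample is itself random, the p-values here are estimates. The abstract notes that estimating the null from the data is what invalidates many standard asymptotic results, so this sketch should be read only as a picture of the setup, not as a procedure with guaranteed error control.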

Author information

Authors and Affiliations

Department of Statistics, Penn State University, University Park, PA, 16802, USA
Debashis Ghosh

Corresponding author

Correspondence to Debashis Ghosh.

About this article

Cite this article

Ghosh, D. Assessing Significance of Peptide Spectrum Matches in Proteomics: A Multiple Testing Approach. Stat Biosci 1, 199–213 (2009). https://doi.org/10.1007/s12561-009-9012-3

Received: 02 June 2009
Accepted: 24 September 2009
Published: 09 October 2009
Issue Date: November 2009
DOI: https://doi.org/10.1007/s12561-009-9012-3
