Progressive Calibration and Averaging for Tandem Mass Spectrometry Statistical Confidence Estimation: Why Settle for a Single Decoy?

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10229)

Abstract

Estimating the false discovery rate (FDR) among a list of tandem mass spectrum identifications is most commonly done through target-decoy competition (TDC). Here we offer two new methods that can use an arbitrarily small number of additional randomly drawn decoy databases to improve TDC. Specifically, “Partial Calibration” utilizes a new meta-scoring scheme that allows us to gradually benefit from the increase in the number of identifications that calibration yields, and “Averaged TDC” (a-TDC) reduces the liberal bias of TDC for small FDR values and its variability throughout. Combining a-TDC with “Progressive Calibration” (PC), which attempts to find the “right” number of decoys required for calibration, we see a substantial impact in real datasets: when analyzing the Plasmodium falciparum data, it typically yields almost the entire 17% increase in discoveries that “full calibration” yields (at FDR level 0.05) while using 60 times fewer decoys. Our methods are further validated using a novel realistic simulation scheme and, importantly, they apply more generally to the problem of controlling the FDR among discoveries from searching an incomplete database.
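
For readers unfamiliar with TDC, the following is a minimal sketch of the single-decoy estimate that the abstract takes as its baseline. It is not the a-TDC or calibration procedure of the paper, whose details the abstract does not give. The function names, the tie-breaking rule, and the plain decoy-wins over target-wins ratio are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def tdc_fdr_estimate(
    target_scores: Sequence[float],
    decoy_scores: Sequence[float],
    threshold: float,
) -> Tuple[int, int, float]:
    """Estimate the FDR at a score threshold via target-decoy competition.

    target_scores[i] and decoy_scores[i] are the best target and best decoy
    match scores for spectrum i (higher is better).
    Returns (target_wins, decoy_wins, estimated_fdr).
    """
    if len(target_scores) != len(decoy_scores):
        raise ValueError("need one target and one decoy score per spectrum")

    target_wins = 0
    decoy_wins = 0
    for t, d in zip(target_scores, decoy_scores):
        # Competition: only the better of the two matches survives.
        # Ties go to the target here, an arbitrary choice for this sketch.
        if t >= d:
            if t >= threshold:
                target_wins += 1
        elif d >= threshold:
            decoy_wins += 1

    # Decoy wins above the threshold stand in for false target wins.
    fdr = min(1.0, decoy_wins / max(target_wins, 1))
    return target_wins, decoy_wins, fdr


def tdc_over_decoy_sets(
    target_scores: Sequence[float],
    decoy_score_sets: Sequence[Sequence[float]],
    threshold: float,
) -> List[float]:
    """Repeat the single-decoy estimate once per decoy set.

    This only exposes how the estimate varies from one decoy draw to the
    next; it does not implement the averaging scheme of the paper.
    """
    return [
        tdc_fdr_estimate(target_scores, decoys, threshold)[2]
        for decoys in decoy_score_sets
    ]
```

Running `tdc_over_decoy_sets` on the same target scores with several decoy draws shows how much a single-decoy estimate can move between draws, which is the variability that averaging over decoys is meant to reduce.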

Keywords

Tandem mass spectrometry; Spectrum identification; False discovery rate; Calibration

Supplementary material

Supplementary material 1: 440120_1_En_7_MOESM1_ESM.pdf (PDF, 2118 KB)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Mathematics and Statistics F07, University of Sydney, Sydney, Australia
  2. Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, Seattle, USA
