Target-Decoy Search Strategy for Mass Spectrometry-Based Proteomics

  • Joshua E. Elias
  • Steven P. Gygi
Part of the Methods in Molecular Biology™ book series (MIMB, volume 604)


Accurate and precise methods for estimating incorrect peptide and protein identifications are crucial for effective large-scale proteome analyses by tandem mass spectrometry. The target-decoy search strategy has emerged as a simple, effective tool for generating such estimations. This strategy is based on the premise that obvious, necessarily incorrect “decoy” sequences added to the search space will correspond with incorrect search results that might otherwise be deemed to be correct. With this knowledge, it is possible not only to estimate how many incorrect results are in a final data set but also to use decoy hits to guide the design of filtering criteria that sensitively partition a data set into correct and incorrect identifications.
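The estimation and filtering ideas described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the chapter's own implementation: it assumes decoys are made by reversing target sequences, that targets and decoys are searched together, and that an incorrect match is equally likely to land on a target or a decoy, so the number of decoy hits approximates the number of incorrect target hits. The function names (`make_decoy`, `estimate_fdr`, `score_threshold`) and the example scores are hypothetical.

```python
from typing import List, Optional, Tuple


def make_decoy(sequence: str) -> str:
    """Build a decoy by reversing a target sequence (one common choice; assumed here)."""
    return sequence[::-1]


def estimate_fdr(n_target: int, n_decoy: int) -> float:
    """Estimate the fraction of incorrect hits among accepted target hits.

    Assumes each decoy hit stands in for one incorrect target hit.
    """
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


def score_threshold(
    psms: List[Tuple[float, bool]], max_fdr: float
) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is at most max_fdr.

    psms holds (score, is_decoy) pairs; higher scores are assumed to be better.
    Returns None if no cutoff satisfies the requested rate.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    n_target = n_decoy = 0
    best = None
    for i, (score, is_decoy) in enumerate(ranked):
        if is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # Only evaluate a cutoff once every hit sharing this score is counted.
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie and n_target > 0 and estimate_fdr(n_target, n_decoy) <= max_fdr:
            best = score
    return best


# Hypothetical search results: (score, is_decoy)
example = [
    (9.0, False), (8.0, False), (7.0, False), (6.0, True),
    (5.0, False), (4.0, True), (3.0, True),
]
cutoff = score_threshold(example, max_fdr=0.25)
```

With the hypothetical scores shown, accepting every hit scoring at least `cutoff` keeps four target hits and one decoy hit, for an estimated rate of 0.25. Other conventions exist for converting decoy counts into an error estimate, so the ratio used here should be read as one option rather than the definitive formula.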

Key words

Proteomics; Target-decoy; False positive; False discovery; Mass spectrometry; Estimation



This work was supported in part by National Institutes of Health (NIH) GM67945 and HG00041 (S.P.G.).



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Cell Biology, Harvard Medical School, Boston, USA
  2. Taplin Biological Mass Spectrometry Facility, Harvard Medical School, Boston, USA
