Automated Data Integration and Determination of Posttranslational Modifications with the Protein Inference Engine

  • Stuart R. Jefferys
  • Morgan C. Giddings
Part of the Methods in Molecular Biology book series (MIMB, volume 694)


This chapter describes using the Protein Inference Engine (PIE) to integrate various types of data, especially top-down and bottom-up mass spectrometry (MS) data, to describe a protein’s posttranslational modifications (PTMs). PTMs include cleavage events, such as the N-terminal loss of methionine, and residue modifications, such as phosphorylation. Modifications are key elements in many biological processes but are difficult to study, because no single, general method adequately characterizes a protein’s PTMs; manual integration of data from several MS experiments is usually required. The PIE is designed to automate this integration using a guess-and-refine process similar to the way an expert combines data by hand. The PIE repeatedly “imagines” a possible modification set, evaluates it against the available data, and then tries to improve on it. After many rounds of refinement, the resulting modification set is proposed as a candidate answer. Multiple candidate answers are generated to obtain both best and near-best answers. Near-best answers are crucial both for allowing proteins with more than one supported modification pattern (isoforms) and for obtaining robust results from incomplete and inconsistent data.
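The guess-and-refine loop described above can be illustrated with a toy simulated annealing sketch in Python. This is not PIE's code, data model, or scoring function. The state representation, the `score` penalty weights, the cooling schedule, and all function names are hypothetical choices made for illustration. Only the two monoisotopic mass shifts are standard values.

```python
import math
import random

# Standard monoisotopic mass shifts in daltons.
PHOSPHO = 79.96633      # phosphorylation (+HPO3)
MET_LOSS = -131.04049   # loss of the N-terminal methionine residue


def mass_shift(state):
    """Total mass change implied by a modification set.

    state = (met_cleaved: bool, phospho_sites: frozenset of residue positions)
    """
    met_cleaved, sites = state
    return (MET_LOSS if met_cleaved else 0.0) + PHOSPHO * len(sites)


def score(state, observed_shift, peptide_sites):
    """Hypothetical score, lower is better: intact-mass (top-down) error plus
    a penalty for each site seen in toy bottom-up data that the state omits."""
    _, sites = state
    mass_error = abs(mass_shift(state) - observed_shift)
    missing = len(peptide_sites - sites)
    return mass_error + 10.0 * missing


def propose(state, candidate_sites, rng):
    """One small random change: toggle Met cleavage or one phosphosite."""
    met_cleaved, sites = state
    choice = rng.randrange(len(candidate_sites) + 1)
    if choice == len(candidate_sites):
        return (not met_cleaved, sites)
    return (met_cleaved, sites ^ {candidate_sites[choice]})


def anneal(observed_shift, peptide_sites, candidate_sites,
           steps=2000, t_start=50.0, t_end=0.01, seed=0):
    """Refine one guess with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current = (False, frozenset())          # initial guess: unmodified protein
    current_score = score(current, observed_shift, peptide_sites)
    best, best_score = current, current_score
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = propose(current, candidate_sites, rng)
        cand_score = score(candidate, observed_shift, peptide_sites)
        delta = cand_score - current_score
        # Always accept improvements; accept worse guesses with a
        # probability that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, cand_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


def candidate_answers(observed_shift, peptide_sites, candidate_sites, runs=5):
    """Run several independent refinements and rank the distinct results."""
    results = {}
    for seed in range(runs):
        state, s = anneal(observed_shift, peptide_sites, candidate_sites, seed=seed)
        results[state] = s
    return sorted(results.items(), key=lambda item: item[1])
```

Calling `candidate_answers(observed_shift, frozenset({5}), [5, 12, 20])` returns a ranked list of `(modification set, score)` pairs. Keeping the lower-ranked entries, rather than only the top one, mirrors the role the chapter gives to near-best answers.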

The goal of this chapter is to walk the reader through installing and using the downloadable version of PIE, both in general and by means of a specific, detailed example. The example integrates several types of experimental and background (prior) data. It is not a “perfect-world” scenario, but has been designed to illustrate several real-world difficulties that may be encountered when trying to analyze imperfect data.

Key words

PTM, MCMC, Simulated annealing, Proteomics, Top-down, Bottom-up, Data integration, PIE



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Department of Bioinformatics & Computational Biology, The University of North Carolina at Chapel Hill, Chapel Hill, USA
  2. Departments of Microbiology & Immunology and Biomedical Engineering, The University of North Carolina at Chapel Hill, Chapel Hill, USA
