Identification of Single Amino Acid Substitutions in Proteogenomics

  • Mini-Review
Biochemistry (Moscow)

Abstract

An important aim of proteogenomics, which combines data from high-throughput nucleic acid and protein analysis, is the reliable identification of single amino acid substitutions, a major type of coding genome variant. Exact knowledge of deviations from the consensus genome can be used in several biomedical fields, such as studies of the expression of mutated proteins in cancer, deciphering heterozygosity mechanisms, identification of neoantigens for anticancer vaccine production, and the search for RNA editing sites at the proteome level. Generating this knowledge requires processing large data arrays from high-resolution mass spectrometry, from which information on single-point protein variation is often difficult to extract. Accordingly, a significant problem in proteogenomic analysis is the high rate of false positive identifications of variant-containing peptides. Here we review recently suggested approaches to processing high-quality proteomics data that may provide more reliable identification of single amino acid substitutions, especially in distinguishing them from residue modifications occurring in vitro and in vivo. Optimized methods for assessing the false discovery rate save the instrumental and computational time spent on validating interesting findings of amino acid polymorphism by orthogonal methods.
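To illustrate why substitutions can be confused with residue modifications, the following minimal sketch lists single amino acid substitutions whose mass shift coincides with that of a common modification. It is not the authors' method. The residue masses are standard monoisotopic values rounded to five decimals, the three modification mass shifts are the commonly tabulated ones, and the function name `ambiguous_substitutions` and the 0.001 Da tolerance are hypothetical choices made for this example. Only mass coincidence is checked, not whether the modification is chemically plausible on the given residue.

```python
from itertools import permutations

# Standard monoisotopic residue masses (Da), rounded to five decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Monoisotopic mass shifts (Da) of three common modifications.
MOD_DELTA = {
    "Deamidation": 0.984016,
    "Oxidation": 15.994915,
    "Methylation": 14.015650,
}


def ambiguous_substitutions(tol_da=0.001):
    """Return {modification: [(from, to), ...]} for substitutions whose
    mass shift equals the modification shift within tol_da (assumed value)."""
    hits = {name: [] for name in MOD_DELTA}
    for src, dst in permutations(RESIDUE_MASS, 2):
        delta = RESIDUE_MASS[dst] - RESIDUE_MASS[src]
        for name, mod_delta in MOD_DELTA.items():
            if abs(delta - mod_delta) <= tol_da:
                hits[name].append((src, dst))
    return hits


if __name__ == "__main__":
    for name, pairs in ambiguous_substitutions().items():
        listed = ", ".join(f"{a}->{b}" for a, b in pairs)
        print(f"{name}: {listed}")
```

Under these assumptions, a peptide carrying an Asn to Asp substitution has the same mass as the deamidated reference peptide, so precursor mass alone cannot separate the two interpretations.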

Abbreviations

ADAR: adenosine deaminase acting on RNA

DDA: data-dependent acquisition

DIA: data-independent acquisition

FDR: false discovery rate

NGS: next-generation sequencing

SNV: single nucleotide variant

Author information

Correspondence to S. A. Moshkovskii.

Additional information

Original Russian Text © S. A. Moshkovskii, M. V. Ivanov, K. G. Kuznetsova, M. V. Gorshkov, 2018, published in Biokhimiya, 2018, Vol. 83, No. 3, pp. 368–378.

Cite this article

Moshkovskii, S.A., Ivanov, M.V., Kuznetsova, K.G. et al. Identification of Single Amino Acid Substitutions in Proteogenomics. Biochemistry Moscow 83, 250–258 (2018). https://doi.org/10.1134/S0006297918030057

  • DOI: https://doi.org/10.1134/S0006297918030057
