Constrained De Novo Sequencing of neo-Epitope Peptides Using Tandem Mass Spectrometry

  • Sujun Li
  • Alex DeCourcy
  • Haixu Tang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10812)


Neoepitope peptides are newly formed antigens presented by major histocompatibility complex class I (MHC-I) molecules on cell surfaces. Cells presenting neoepitope peptides are recognized and subsequently killed by cytotoxic T-cells. Immunopeptidomic approaches use proteomic technologies to characterize the peptide repertoire (including neoepitopes) associated with the MHC-I molecules on the surface of tumor cells, providing critical information for designing effective immunotherapy strategies. We developed a novel constrained de novo sequencing algorithm to identify neoepitope peptides from tandem mass spectra acquired in immunopeptidomic analyses. Our method assigns prior probabilities to putative peptides according to position-specific scoring matrices (PSSMs) that represent the sequence preferences recognized by MHC-I molecules. We implemented a dynamic programming algorithm to determine the peptide sequences with the optimal posterior matching score for each given MS/MS spectrum. As in conventional de novo peptide sequencing, the dynamic programming algorithm allows an efficient search of the entire peptide sequence space. On an LC-MS/MS dataset, our algorithm detected neoepitope peptides bound by HLA-C*0501 molecules with performance superior to that of database search approaches and existing general-purpose de novo peptide sequencing algorithms.
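The abstract does not specify the scoring function, the PSSM, or the mass model, so the following is only a minimal sketch of the general idea: a dynamic program over (position, prefix mass) that maximizes the sum of a position-specific log-prior and a spectrum match score. It assumes a fixed peptide length, nominal integer residue masses for a toy alphabet, and a hypothetical `peak_score` dictionary mapping prefix masses to log-likelihood-style rewards. The names `best_peptide`, `peak_score`, and `log_prior` are illustrative and are not taken from the paper.

```python
import math

# Nominal (integer) residue masses for a toy alphabet. This is an assumption
# made for illustration; real tools use monoisotopic masses and tolerances.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101,
                "L": 113, "D": 115, "K": 128, "E": 129, "F": 147}


def best_peptide(peak_score, total_mass, log_prior):
    """Return (sequence, score) maximizing log-prior plus spectrum score.

    peak_score: dict mapping a nominal prefix mass to a reward (hypothetical
                stand-in for a real spectrum likelihood).
    total_mass: sum of residue masses of the unknown peptide.
    log_prior:  list with one dict per position, residue -> log probability
                (a PSSM-like prior); its length fixes the peptide length.
    Returns None if no peptide of that length has the requested mass.
    """
    length = len(log_prior)
    # table[i][m] = (best score, last residue) over prefixes of i residues
    # whose masses sum to m.
    table = [dict() for _ in range(length + 1)]
    table[0][0] = (0.0, None)
    for i in range(length):
        for mass, (score, _) in table[i].items():
            for aa, prior in log_prior[i].items():
                new_mass = mass + RESIDUE_MASS[aa]
                if new_mass > total_mass:
                    continue
                new_score = score + prior + peak_score.get(new_mass, 0.0)
                best = table[i + 1].get(new_mass)
                if best is None or new_score > best[0]:
                    table[i + 1][new_mass] = (new_score, aa)
    if total_mass not in table[length]:
        return None
    # Trace back: the previous prefix mass is the current mass minus the
    # mass of the stored last residue.
    residues, mass = [], total_mass
    for i in range(length, 0, -1):
        _, aa = table[i][mass]
        residues.append(aa)
        mass -= RESIDUE_MASS[aa]
    return "".join(reversed(residues)), table[length][total_mass][0]


# Toy usage: a uniform prior over three positions and two observed prefix peaks.
uniform = [{aa: math.log(1 / len(RESIDUE_MASS)) for aa in RESIDUE_MASS}
           for _ in range(3)]
print(best_peptide({71: 2.0, 184: 1.0}, 312, uniform))
```

Under a uniform prior the result is driven by the spectrum alone; replacing one position's prior with a strongly peaked distribution shifts the optimum toward peptides that fit the assumed binding motif, which is the effect the constrained approach relies on.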


Keywords: De novo, Neo-epitope, Mass spectrometry, Proteomics



This work was supported by the NIH grant 1R01AI108888 and the Indiana University Precision Health Initiative (IU-PHI).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Informatics, Computing and Engineering, Indiana University, Bloomington, USA
