GA-Novo: De Novo Peptide Sequencing via Tandem Mass Spectrometry Using Genetic Algorithm

  • Samaneh Azari
  • Bing Xue
  • Mengjie Zhang
  • Lifeng Peng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11454)


Proteomics is the large-scale analysis of proteins. The common method for identifying proteins and characterising their amino acid sequences is to digest the proteins into peptides, analyse the peptides using mass spectrometry, and assign the resulting tandem mass spectra (MS/MS) to peptides using database search tools. However, database search algorithms depend heavily on a reference protein database and cannot identify peptides and proteins that are not in it. De novo sequencing algorithms were developed to overcome this problem by reconstructing the peptide sequence directly from an MS/MS spectrum, without using any protein database. Current de novo sequencing algorithms often fail to construct completely matched sequences and produce only partial matches. In this study, we propose a genetic algorithm based method, GA-Novo, to solve the complex optimisation task of de novo peptide sequencing, with the aim of constructing full-length sequences. Given an MS/MS spectrum, GA-Novo optimises amino acid sequences to best fit the input spectrum. On the testing dataset, GA-Novo outperforms PEAKS, the most commonly used software for this task, constructing an 8% higher number of fully matched peptide sequences and achieving 4% higher recall on partially matched sequences.
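To make the idea of optimising a sequence against a spectrum concrete, the following is a minimal toy sketch of a genetic algorithm for this kind of task. It is not GA-Novo itself: the abstract does not specify the encoding, fitness function, or genetic operators, so everything here is an assumption made for illustration. In particular, the function names (`theoretical_ions`, `fitness`, `evolve`), the fixed peptide length, the reduced amino acid alphabet, the peak-counting fitness over singly charged b- and y-ions, and all parameter values are hypothetical.

```python
import random

# Monoisotopic residue masses (Da) for a subset of amino acids.
# Assumption: a reduced alphabet keeps the toy search space small.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "F": 147.06841,
    "R": 156.10111,
}
WATER = 18.01056   # mass of H2O (Da)
PROTON = 1.00728   # mass of a proton (Da)
ALPHABET = sorted(RESIDUE_MASS)


def theoretical_ions(peptide):
    """Return singly charged b- and y-ion m/z values for a peptide string."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b-ion: prefix of length i
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y-ion: complementary suffix
    return ions


def fitness(peptide, spectrum, tol=0.02):
    """Count theoretical ions lying within tol (Da) of some observed peak."""
    return sum(
        any(abs(ion - peak) <= tol for peak in spectrum)
        for ion in theoretical_ions(peptide)
    )


def evolve(spectrum, length, pop_size=200, generations=100, mut_rate=0.1, seed=0):
    """Toy GA: truncation selection, one-point crossover, per-residue mutation."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda p: fitness(p, spectrum), reverse=True)
        parents = ranked[: pop_size // 5]   # keep the top 20% as parents
        children = list(parents[:2])        # elitism: carry over the two best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover position
            child = a[:cut] + b[cut:]
            child = "".join(
                rng.choice(ALPHABET) if rng.random() < mut_rate else aa
                for aa in child
            )
            children.append(child)
        pop = children
    return max(pop, key=lambda p: fitness(p, spectrum))


if __name__ == "__main__":
    # Synthetic, noise-free spectrum generated from a known peptide.
    observed = theoretical_ions("GAVLK")
    print(evolve(observed, length=5, pop_size=50, generations=20))
```

A real de novo sequencer would also have to handle the precursor mass constraint, variable peptide length, noisy and missing peaks, peak intensities, and other ion types, none of which this sketch attempts.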


Keywords: Genetic algorithm, Tandem mass spectrometry, De novo sequencing, Proteomics



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Samaneh Azari (1)
  • Bing Xue (1)
  • Mengjie Zhang (1)
  • Lifeng Peng (2)
  1. School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand
  2. Centre for Biodiscovery and School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
