Phylogenetic Analysis Using Protein Mass Spectrometry

  • Shiyong Ma
  • Kevin M. Downard
  • Jason W. H. Wong
Part of the Methods in Molecular Biology book series (MIMB, volume 1549)


Advances in molecular biology have made comparative analysis of DNA sequences the cornerstone of the study of molecular evolution and phylogenetics. Nevertheless, protein mass spectrometry offers unique opportunities for phylogenetic analysis of organisms from which DNA is difficult or costly to obtain. To date, methods of phylogenetic analysis using protein mass spectrometry fall into three categories: (1) de novo protein sequencing followed by classical phylogenetic reconstruction, (2) direct phylogenetic reconstruction using proteolytic peptide mass maps, and (3) mapping of mass spectral data onto classical phylogenetic trees. In this chapter, we briefly describe the three methods and provide a protocol for each, along with relevant tools and algorithms.
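As a rough illustration of the idea behind category (2), the sketch below compares proteolytic peptide mass maps directly and derives a pairwise distance matrix from them. It is not the algorithm of any published tool: the greedy ppm-tolerance matching, the Jaccard-style distance, the function names, the 10 ppm default, and the three example mass lists are all assumptions chosen for illustration.

```python
from itertools import combinations


def shared_peaks(masses_a, masses_b, tol_ppm=10.0):
    """Count one-to-one matches between two mass lists within tol_ppm.

    Uses a greedy two-pointer walk over the sorted lists (an assumption
    made for simplicity; other matching schemes are possible).
    """
    a, b = sorted(masses_a), sorted(masses_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        tol = a[i] * tol_ppm * 1e-6  # absolute tolerance at this mass
        if abs(a[i] - b[j]) <= tol:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def mass_map_distance(masses_a, masses_b, tol_ppm=10.0):
    """Jaccard-style distance: 1 - shared / (size of the union)."""
    shared = shared_peaks(masses_a, masses_b, tol_ppm)
    union = len(masses_a) + len(masses_b) - shared
    return 1.0 - shared / union if union else 0.0


def distance_matrix(mass_maps, tol_ppm=10.0):
    """Return sorted sample names and a symmetric dict of pairwise distances."""
    names = sorted(mass_maps)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = mass_map_distance(mass_maps[x], mass_maps[y], tol_ppm)
        dist[(x, y)] = dist[(y, x)] = d
    return names, dist


# Hypothetical monoisotopic peptide masses (Da) for three samples.
mass_maps = {
    "strain_A": [1000.50, 1200.60, 1500.75, 1800.90],
    "strain_B": [1000.50, 1200.60, 1500.75, 1814.92],
    "strain_C": [1000.50, 1228.63, 1530.76, 1814.92],
}

names, dist = distance_matrix(mass_maps)
for x, y in combinations(names, 2):
    print(f"{x} vs {y}: {dist[(x, y)]:.3f}")
```

A matrix of this kind could then be passed to any standard distance-based tree-building method, such as neighbor joining or UPGMA, to obtain a tree without sequencing the underlying proteins.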

Key words

Phylogenetics, De novo sequencing, Mass mapping, Molecular evolution, Phylogenetic tree, Mass tree



Some of the results presented and cited in this chapter were supported by funds from an Australian Research Council Discovery Project Grant (DP120101167) awarded to K.M.D. and J.W.H.W. S.M. is supported by China Scholarship Council and UNSW Australia Tuition Fee scholarships. J.W.H.W. is supported by an Australian Research Council Future Fellowship (FT130100096).



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  • Shiyong Ma (1, 2)
  • Kevin M. Downard (1, 2)
  • Jason W. H. Wong (1, 2)
  1. Prince of Wales Clinical School, UNSW Australia, Sydney, Australia
  2. Lowy Cancer Research Centre, UNSW, Kensington, Australia
