Phylogenetic and Other Conservation-Based Approaches to Predict Protein Functional Sites

  • Heval Atas
  • Nurcan Tuncbag
  • Tunca Doğan
Part of the Methods in Molecular Biology book series (MIMB, volume 1762)


Proteins use their functional regions to carry out various activities, including binding to other proteins, nucleic acids, or drugs. The functional sites of proteins tend to be more conserved than the rest of the protein surface. Therefore, detecting conserved residues through phylogenetic analysis is a general approach to predicting functionally critical residues. In this chapter, we describe some of the available methods for predicting functional sites and demonstrate a complete pipeline with alternative tools at several steps. We explain the standard procedure and all intermediate stages, including homology detection with a BLAST search, multiple sequence alignment (MSA), and construction of a phylogenetic tree for a given query sequence. Additionally, we demonstrate the prediction results of these methods on a case study. Finally, we discuss possible challenges and bottlenecks throughout the pipeline. Our step-by-step description of functional site prediction may be a helpful resource for researchers interested in identifying protein functional sites for use in drug discovery research.
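The core idea above, that conserved columns of an MSA point to candidate functional residues, can be illustrated with a minimal Python sketch. It assumes the MSA has already been built (for example from homologs retrieved by BLAST) and is supplied as equal-length gapped strings with the query sequence first. Each column is scored by normalized Shannon entropy, a simple stand-in chosen for illustration; it ignores the phylogenetic tree and is not the scoring scheme of any particular tool described in the chapter. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps ("-") and other symbols are not counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column: str, gap_penalty: bool = True) -> float:
    """Conservation score in [0, 1] for one MSA column (normalized Shannon entropy).

    1.0 means every counted sequence has the same residue; 0.0 means all 20
    amino acids are equally frequent. If gap_penalty is True, the score is
    scaled by the fraction of sequences with a standard amino acid here.
    """
    residues = [c for c in column.upper() if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in counts.values())
    score = 1.0 - entropy / math.log(len(AMINO_ACIDS))
    if gap_penalty:
        score *= n / len(column)
    return score


def conservation_profile(msa: list[str], query_index: int = 0) -> list[tuple[int, str, float]]:
    """Score every column and report it in the query's own residue numbering (1-based)."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    query_pos = 0
    for col in range(length):
        query_char = msa[query_index][col]
        if query_char == "-":
            continue  # gap in the query: no query residue to annotate
        query_pos += 1
        column = "".join(seq[col] for seq in msa)
        profile.append((query_pos, query_char, column_conservation(column)))
    return profile


def top_conserved(msa: list[str], k: int = 3, query_index: int = 0) -> list[tuple[int, str, float]]:
    """Return the k most conserved query residues (ties broken by position)."""
    profile = conservation_profile(msa, query_index)
    return sorted(profile, key=lambda item: (-item[2], item[0]))[:k]


# Hypothetical toy alignment (query first); not data from the chapter.
TOY_MSA = [
    "MKHLAG-D",
    "MKHIAGCD",
    "MRHLSG-D",
    "MKHVAGAD",
]

if __name__ == "__main__":
    for pos, residue, score in conservation_profile(TOY_MSA):
        print(f"{residue}{pos}\t{score:.3f}")
```

On the toy alignment, M1, H3, G6 and D7 score 1.0, while the variable positions (K2, L4, A5) score lower; the column that is a gap in the query is skipped. Because plain entropy treats every sequence as independent, a cluster of near-identical homologs can inflate apparent conservation, which is why sequence weighting or tree-based scoring is generally preferred in practice.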

Key words

Drug discovery; Evolutionary conservation; Functional site; Multiple sequence alignment; Phylogenetic analysis; Predictive approaches



H.A. acknowledges the TUBITAK 2211 Doctoral Fellowship Program. N.T. thanks the TUBITAK Career Development Program (Project no: 117E192). T.D. acknowledges the TUBITAK BIDEB 2218 Program.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey
  2. Cancer Systems Biology Laboratory (CanSyL), METU, Ankara, Turkey
  3. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
