Using Phylogenetic Profiles to Predict Functional Relationships

  • Matteo Pellegrini
Part of the Methods in Molecular Biology book series (MIMB, volume 804)


Phylogenetic profiling involves the comparison of phylogenetic data across gene families. Phylogenetic trees, or related data structures, can be constructed for specific gene families using a wide variety of tools and approaches. These data are then compared to determine which families have correlated or coupled evolution. The underlying assumption is that in certain cases these couplings allow us to infer that two families are functionally related, that is, that their functions in the cell are coupled. Although this technique can be applied to noncoding genes, it is more commonly used to assess the function of protein-coding genes. Examples of functionally related proteins include subunits of protein complexes and enzymes that perform consecutive steps along biochemical pathways. We hypothesize that the deletion of one such family from a genome would indirectly affect the function of the other. Dozens of different implementations of the phylogenetic profiling technique have been developed over the past decade. These range from the first simple approaches, which describe phylogenetic profiles as binary vectors, to the most complex ones, which attempt to model the coevolution of protein families on a phylogenetic tree. We discuss a set of these implementations and present the software and databases that are available to perform phylogenetic profiling.
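The simplest representation mentioned above, a phylogenetic profile as a binary vector, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the genome labels, the family names (famA to famD), and the presence/absence values are invented, and the two scores shown (Hamming distance and a hypergeometric co-occurrence probability) are common choices for comparing binary profiles rather than the specific method of any implementation discussed in the chapter.

```python
from itertools import combinations
from math import comb

# Hypothetical data: each family has one entry per genome,
# 1 if a homolog is present in that genome and 0 if it is absent.
GENOMES = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
PROFILES = {
    "famA": (1, 1, 0, 1, 0, 0, 1, 0),
    "famB": (1, 1, 0, 1, 0, 0, 1, 0),
    "famC": (0, 0, 1, 0, 1, 1, 0, 1),
    "famD": (1, 0, 0, 1, 1, 0, 1, 0),
}


def hamming(p, q):
    """Number of genomes in which the two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def cooccurrence_pvalue(p, q):
    """Hypergeometric probability of seeing at least the observed number
    of shared genomes if the two families were distributed independently."""
    n_genomes = len(p)
    n_p, n_q = sum(p), sum(q)
    shared = sum(a and b for a, b in zip(p, q))
    tail = sum(
        comb(n_p, i) * comb(n_genomes - n_p, n_q - i)
        for i in range(shared, min(n_p, n_q) + 1)
    )
    return tail / comb(n_genomes, n_q)


def rank_pairs(profiles):
    """All family pairs, most strongly co-occurring (smallest p-value) first."""
    scored = [
        (cooccurrence_pvalue(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(scored)


if __name__ == "__main__":
    for pvalue, a, b in rank_pairs(PROFILES):
        print(f"{a} {b} hamming={hamming(PROFILES[a], PROFILES[b])} p={pvalue:.4f}")
```

Pairs with identical or nearly identical profiles rank first and would be the candidates for a functional link. With only eight genomes the probabilities are coarse; real analyses use many more genomes and must account for the fact that related genomes are not independent observations, which is what the tree-based approaches address.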

Key words

Phylogenetic profiles; Coevolution; Functional associations; Comparative genomics; Coevolving proteins



The author wishes to acknowledge the UCLA-DOE Institute for Genomics and Proteomics for support.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, USA
