
Ortholog Detection Using the Reciprocal Smallest Distance Algorithm

Protocol
Part of the Methods In Molecular Biology™ book series (MIMB, volume 396)

Summary

All protein-coding genes have a phylogenetic history that, when understood, can lead to deep insights into the diversification or conservation of function, the evolution of developmental complexity, and the molecular basis of disease. An important part of reconstructing the relationships among genes in different organisms is an accurate method for finding orthologs, together with an accurate measure of evolutionary divergence. The present chapter details such a method, the reciprocal smallest distance (RSD) algorithm. This approach improves upon the common procedure of taking reciprocal best Basic Local Alignment Search Tool (BLAST) hits (RBH) to identify orthologs: it uses global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. RSD finds many putative orthologs missed by RBH because it is less likely to be misled by the presence of close paralogs. The package offers considerable flexibility in its parameter settings, allowing the user, among other things, to search for increasingly distant orthologs between highly divergent species. This flexibility makes the tool a unique and powerful addition to other available approaches for ortholog detection.
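The reciprocity test at the core of the approach can be sketched as follows. This is a minimal illustration, not the RSD package itself: the gene names and distance values are invented, the function names are hypothetical, and the distances are simply hard-coded, whereas the method described above would obtain them by maximum likelihood estimation on global alignments.

```python
from typing import Dict, List, Optional, Tuple

# Invented distances between genes of genome A (a*) and genome B (b*).
# Assumption: in the real method these come from maximum likelihood
# estimates on global alignments; here they are toy values.
DISTANCES: Dict[Tuple[str, str], float] = {
    ("a1", "b1"): 0.20, ("a1", "b2"): 0.35,
    ("a2", "b1"): 0.25, ("a2", "b2"): 0.90,
    ("a3", "b3"): 0.10,
}


def smallest_distance_partner(
    query: str, distances: Dict[Tuple[str, str], float], query_side: int
) -> Optional[str]:
    """Return the gene in the other genome closest to `query`, or None.

    query_side is 0 if `query` belongs to genome A, 1 if it belongs to B.
    """
    candidates = {
        pair[1 - query_side]: d
        for pair, d in distances.items()
        if pair[query_side] == query
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def reciprocal_smallest_distance(
    distances: Dict[Tuple[str, str], float]
) -> List[Tuple[str, str, float]]:
    """Keep (a, b) only if b is closest to a AND a is closest to b."""
    pairs = []
    for a in sorted({a for a, _ in distances}):
        b = smallest_distance_partner(a, distances, 0)
        if b is not None and smallest_distance_partner(b, distances, 1) == a:
            pairs.append((a, b, distances[(a, b)]))
    return pairs


if __name__ == "__main__":
    for a, b, d in reciprocal_smallest_distance(DISTANCES):
        print(f"{a} <-> {b} (distance {d})")
```

In this toy table, a2 is closest to b1, but b1 is closer to a1, so a2 is left without a putative ortholog; only pairs that are mutually closest are reported.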

Keywords

Orthologs; orthology; reciprocal smallest distance; reciprocal BLAST hit; maximum likelihood; molecular phylogenetics; phylogeny

Notes

Acknowledgments

Many thanks to the past and present members of the Computational Biology Initiative who provided advice and expertise: I-Hsien Wu, Tom Monaghan, Jian Pu, Saurav Singh, and Leon Peshkin. This material is based upon work supported by the National Science Foundation under Grant No. DBI 0543480.


Copyright information

© Humana Press Inc. 2007

Authors and Affiliations

  1. Department of Systems Biology, Harvard Medical School, Harvard
  2. Department of Systems Biology, Harvard Medical School, USA
