
Ortholog Detection Using the Reciprocal Smallest Distance Algorithm

Protocol
Comparative Genomics

Part of the book series: Methods In Molecular Biology™ (MIMB, volume 396)

Summary

All protein-coding genes have a phylogenetic history that, when understood, can lead to deep insights into the diversification or conservation of function, the evolution of developmental complexity, and the molecular basis of disease. An important part of reconstructing the relationships among genes in different organisms is an accurate method for finding orthologs, together with an accurate measure of evolutionary diversification. This chapter details such a method, the reciprocal smallest distance algorithm (RSD). RSD improves on the common procedure of taking reciprocal best Basic Local Alignment Search Tool hits (RBH) by using global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. RSD finds many putative orthologs that RBH misses because it is less likely to be misled by close paralogs in the genomes being compared. The package offers great flexibility in exploring parameter settings, allowing the user, among other things, to search for increasingly distant orthologs between highly divergent species. This flexibility makes RSD a unique and powerful addition to the other available approaches for ortholog detection.
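
The reciprocal test at the heart of RSD can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the RSD package: the function names and toy sequences are hypothetical, the sequences are assumed to be already globally aligned, every sequence in the other genome is scored (no BLAST prefilter), and a Poisson-corrected distance, d = -ln(1 - p), stands in for the maximum likelihood distance that RSD actually estimates. A pair is accepted only when each sequence is the other's smallest-distance match.

```python
import math
from typing import Dict, List, Optional, Tuple


def poisson_distance(seq1: str, seq2: str) -> float:
    """Poisson-corrected distance between two pre-aligned protein sequences.

    Stand-in (assumption) for RSD's maximum likelihood distance:
    gap columns are skipped, p is the fraction of differing sites,
    and d = -ln(1 - p).
    """
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        return math.inf
    p = sum(x != y for x, y in pairs) / len(pairs)
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def smallest_distance_hit(query: str, genome: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Return (id, distance) of the sequence in `genome` closest to `query`."""
    best_id, best_d = None, math.inf
    for gene_id, seq in genome.items():
        d = poisson_distance(query, seq)
        if d < best_d:  # strict '<' keeps the first of any tied candidates
            best_id, best_d = gene_id, d
    return best_id, best_d


def reciprocal_smallest_distance(genome_a: Dict[str, str],
                                 genome_b: Dict[str, str]) -> List[Tuple[str, str, float]]:
    """Keep (a, b, d) only when a and b are each other's smallest-distance match."""
    orthologs = []
    for a_id, a_seq in genome_a.items():
        b_id, d = smallest_distance_hit(a_seq, genome_b)  # forward search A -> B
        if b_id is None:
            continue
        back_id, _ = smallest_distance_hit(genome_b[b_id], genome_a)  # reverse search B -> A
        if back_id == a_id:
            orthologs.append((a_id, b_id, d))
    return orthologs


# Hypothetical toy genomes: a1, a2 and a3 are paralogs; sequences are treated as pre-aligned.
GENOME_A = {"a1": "MKTAYIAKQR", "a2": "MKTAYLGKER", "a3": "MKTAYIAKER"}
GENOME_B = {"b1": "MKTAYIAKQK", "b2": "MKSAYLGKER"}

for a_id, b_id, d in reciprocal_smallest_distance(GENOME_A, GENOME_B):
    print(a_id, b_id, f"{d:.4f}")
# a1 b1 0.1054
# a2 b2 0.1054
```

In this toy case a3's closest match in genome B is b1, but b1 is closer to a1 than to a3, so the reverse search rejects a3. Breaking ties by input order is an arbitrary choice made for the sketch.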



Acknowledgments

Many thanks to past and present members of the Computational Biology Initiative who provided advice and expertise: I-Hsien Wu, Tom Monaghan, Jian Pu, Saurav Singh, and Leon Peshkin. This material is based upon work supported by the National Science Foundation under Grant No. DBI 0543480.


Copyright information

© 2007 Humana Press Inc.

About this protocol

Cite this protocol

Wall, D.P., DeLuca, T. (2007). Ortholog Detection Using the Reciprocal Smallest Distance Algorithm. In: Bergman, N.H. (eds) Comparative Genomics. Methods In Molecular Biology™, vol 396. Humana Press. https://doi.org/10.1007/978-1-59745-515-2_7


  • DOI: https://doi.org/10.1007/978-1-59745-515-2_7

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-934115-37-4

  • Online ISBN: 978-1-59745-515-2

  • eBook Packages: Springer Protocols
