Summary
All protein-coding genes have a phylogenetic history that, when understood, can lead to deep insights into the diversification or conservation of function, the evolution of developmental complexity, and the molecular basis of disease. One important part of reconstructing the relationships among genes in different organisms is an accurate method for finding orthologs, together with an accurate measure of evolutionary diversification. The present chapter details such a method, called the reciprocal smallest distance (RSD) algorithm. This approach improves upon the common procedure of identifying orthologs as reciprocal best Basic Local Alignment Search Tool (BLAST) hits (RBH) by using global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. RSD finds many putative orthologs missed by RBH because it is less likely to be misled by the presence of close paralogs in genomes. The package offers considerable flexibility in its parameter settings, allowing the user to search for increasingly distant orthologs between highly divergent species, among other advantages. This flexibility makes the tool a unique and powerful addition to other available approaches for ortholog detection.
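The reciprocity test at the heart of this idea can be illustrated with a minimal Python sketch. It is not the RSD package itself and rests on loud simplifying assumptions: the function names (`p_distance`, `closest`, `reciprocal_smallest_distance`) are hypothetical, every sequence in the other genome is treated as a candidate instead of a filtered set of BLAST hits, and a toy mismatch fraction on pre-aligned, equal-length sequences stands in for global alignment followed by maximum likelihood distance estimation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Genome = Dict[str, str]  # hypothetical representation: sequence ID -> protein sequence


def p_distance(a: str, b: str) -> float:
    """Toy stand-in distance: fraction of mismatched positions.

    Assumes the two sequences are already aligned and of equal length.
    A real analysis would align globally and estimate a maximum
    likelihood distance instead.
    """
    if len(a) != len(b):
        raise ValueError("toy distance expects pre-aligned, equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest(query: str, genome: Genome,
            distance: Callable[[str, str], float]) -> Optional[str]:
    """Return the ID of the sequence in `genome` with the smallest distance to `query`."""
    if not genome:
        return None
    # Ties are broken by ID so the result is deterministic.
    return min(genome, key=lambda seq_id: (distance(query, genome[seq_id]), seq_id))


def reciprocal_smallest_distance(
    genome_a: Genome,
    genome_b: Genome,
    distance: Callable[[str, str], float] = p_distance,
) -> List[Tuple[str, str]]:
    """Pair a with b only if each is the other's smallest-distance partner."""
    pairs = []
    for a_id, a_seq in genome_a.items():
        b_id = closest(a_seq, genome_b, distance)
        if b_id is None:
            continue
        # Reciprocal step: search back from b into genome A.
        if closest(genome_b[b_id], genome_a, distance) == a_id:
            pairs.append((a_id, b_id))
    return pairs
```

Passing a different `distance` callable is how a more realistic measure could be substituted; the reciprocity check itself does not depend on how the distance is computed.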
Acknowledgments
Many thanks to the past and present members of the Computational Biology Initiative who provided advice and expertise: I-Hsien Wu, Tom Monaghan, Jian Pu, Saurav Singh, and Leon Peshkin. This material is based upon work supported by the National Science Foundation under Grant No. DBI 0543480.
Copyright information
© 2007 Humana Press Inc.
About this protocol
Cite this protocol
Wall, D.P., DeLuca, T. (2007). Ortholog Detection Using the Reciprocal Smallest Distance Algorithm. In: Bergman, N.H. (eds) Comparative Genomics. Methods In Molecular Biology™, vol 396. Humana Press. https://doi.org/10.1007/978-1-59745-515-2_7
DOI: https://doi.org/10.1007/978-1-59745-515-2_7
Publisher Name: Humana Press
Print ISBN: 978-1-934115-37-4
Online ISBN: 978-1-59745-515-2
eBook Packages: Springer Protocols