Comparative Genomics pp 95-110
Ortholog Detection Using the Reciprocal Smallest Distance Algorithm
Summary
All protein-coding genes have a phylogenetic history that, when understood, can lead to deep insights into the diversification or conservation of function, the evolution of developmental complexity, and the molecular basis of disease. Reconstructing the relationships among genes in different organisms requires both an accurate method for finding orthologs and an accurate measure of evolutionary diversification. The present chapter details such a method, called the reciprocal smallest distance (RSD) algorithm. This approach improves upon the common procedure of identifying orthologs as reciprocal best Basic Local Alignment Search Tool (BLAST) hits (RBH): it uses global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. RSD finds many putative orthologs missed by RBH because it is less likely to be misled by the presence of close paralogs in genomes. The package also lets the user vary its parameter settings widely, which makes it possible to search for increasingly distant orthologs between highly divergent species, among other advantages. This flexibility makes the tool a unique and powerful addition to other available approaches for ortholog detection.
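The reciprocity criterion at the core of this idea can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes the pairwise evolutionary distances have already been estimated elsewhere and are supplied in a single hypothetical table, and the names `Distances`, `closest`, and `reciprocal_smallest_pairs` are invented for illustration. The alignment and maximum likelihood steps are deliberately left out.

```python
from typing import Dict, Optional, Tuple

# Assumption: dist[(a, b)] holds a precomputed evolutionary distance between
# protein a (genome A) and protein b (genome B). How it was estimated is
# outside the scope of this sketch.
Distances = Dict[Tuple[str, str], float]


def closest(query: str, dist: Distances, query_in_a: bool) -> Optional[str]:
    """Return the partner in the other genome with the smallest distance."""
    best, best_d = None, float("inf")
    for (a, b), d in dist.items():
        q, partner = (a, b) if query_in_a else (b, a)
        if q == query and d < best_d:
            best, best_d = partner, d
    return best


def reciprocal_smallest_pairs(dist: Distances) -> Dict[str, str]:
    """Keep (a, b) only if each is the other's smallest-distance partner."""
    pairs = {}
    for a in sorted({a for a, _ in dist}):
        b = closest(a, dist, query_in_a=True)
        if b is not None and closest(b, dist, query_in_a=False) == a:
            pairs[a] = b
    return pairs


# Toy data (made up): a1 and a2 both sit nearest to b1, but b1 is nearest
# to a2, so only the a2/b1 pair satisfies the reciprocal condition.
example = {
    ("a1", "b1"): 0.2,
    ("a1", "b2"): 0.5,
    ("a2", "b1"): 0.1,
    ("a2", "b2"): 0.9,
}
print(reciprocal_smallest_pairs(example))
```

In this toy case `a1` receives no ortholog call, because its nearest partner `b1` is closer to `a2`. A single shared distance table is a simplification chosen for brevity; a real pipeline may build the forward and reverse candidate sets separately.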
Keywords
Orthologs; orthology; reciprocal smallest distance; reciprocal BLAST hit; maximum likelihood; molecular phylogenetics; phylogeny
Notes
Acknowledgments
Many thanks to the past and present members of the Computational Biology Initiative who provided advice and expertise: I-Hsien Wu, Tom Monaghan, Jian Pu, Saurav Singh, and Leon Peshkin. This material is based upon work supported by the National Science Foundation under Grant No. DBI 0543480.