A Parsimony Approach to Genome-Wide Ortholog Assignment

  • Zheng Fu
  • Xin Chen
  • Vladimir Vacic
  • Peng Nan
  • Yang Zhong
  • Tao Jiang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics, since many computational methods for solving various biological problems critically rely on bona fide orthologs as input. While ortholog assignment is usually done by sequence similarity search, we recently proposed a new combinatorial approach that combines sequence similarity and genome rearrangement. This paper continues the development of that approach and unites genome rearrangement events and (post-speciation) duplication events in a single framework under the parsimony principle. In this framework, orthologous genes are assumed to correspond to each other in the most parsimonious evolutionary scenario involving both genome rearrangement and (post-speciation) gene duplication. Besides several original algorithmic contributions, the enhanced method allows for the detection of inparalogs. Following this approach, we have implemented a high-throughput system for genome-scale ortholog assignment, called MSOAR, and applied it to the human and mouse genomes. The results show that MSOAR finds 99 more true orthologs than the INPARANOID program. We have also compared MSOAR with the iterated exemplar algorithm on simulated data and found that MSOAR performed very well in terms of assignment accuracy. These test results indicate that our approach is very promising for genome-wide ortholog assignment.
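To make the parsimony principle concrete, the following toy sketch pairs gene copies between two genomes so that the total number of evolutionary events is minimized. It is not MSOAR's algorithm. It assumes a single linear chromosome per genome, uses the breakpoint count as a simple stand-in for a true rearrangement distance, charges every unmatched gene copy one duplication event (so leftover copies become inparalog candidates), and brute-forces all pairings, which is exponential and only usable on tiny inputs. All function names are hypothetical.

```python
from itertools import permutations, product
from collections import defaultdict


def parse(gene):
    """Split a signed gene token such as '+a' or '-b' into (family, sign)."""
    sign = -1 if gene[0] == "-" else 1
    return gene[1:], sign


def breakpoints(signed_perm):
    """Breakpoint count of a signed permutation of 1..n, framed by 0 and n+1."""
    ext = [0] + signed_perm + [len(signed_perm) + 1]
    return sum(1 for x, y in zip(ext, ext[1:]) if y - x != 1)


def family_matchings(pos_a, pos_b):
    """Yield every maximum one-to-one pairing of the copies of one gene family."""
    if len(pos_a) <= len(pos_b):
        for chosen in permutations(pos_b, len(pos_a)):
            yield list(zip(pos_a, chosen))
    else:
        for chosen in permutations(pos_a, len(pos_b)):
            yield list(zip(chosen, pos_b))


def parsimony_assignment(genome_a, genome_b):
    """Brute-force the pairing minimizing breakpoints + unmatched copies.

    Toy model (assumption): each unmatched copy costs one duplication event,
    and breakpoints approximate the rearrangement cost.
    Returns (cost, list of (index_in_a, index_in_b) ortholog pairs).
    """
    a = [parse(g) for g in genome_a]
    b = [parse(g) for g in genome_b]
    pos_a, pos_b = defaultdict(list), defaultdict(list)
    for i, (fam, _) in enumerate(a):
        pos_a[fam].append(i)
    for j, (fam, _) in enumerate(b):
        pos_b[fam].append(j)
    families = sorted(set(pos_a) | set(pos_b))

    best = None
    for combo in product(*(family_matchings(pos_a[f], pos_b[f]) for f in families)):
        # Sorted by index in genome A, so `pairs` reads genome A left to right.
        pairs = sorted(p for fam_pairs in combo for p in fam_pairs)
        # Relabel matched genes 1..n by their order in genome B (B becomes identity).
        order_in_b = sorted(pairs, key=lambda p: p[1])
        label = {pair: k + 1 for k, pair in enumerate(order_in_b)}
        # Relative orientation of each matched pair is sign_a * sign_b.
        perm = [label[(i, j)] * a[i][1] * b[j][1] for i, j in pairs]
        unmatched = len(a) + len(b) - 2 * len(pairs)
        cost = breakpoints(perm) + unmatched
        if best is None or cost < best[0]:
            best = (cost, pairs)
    return best


print(parsimony_assignment(["+a", "+b", "+c", "+b"], ["+a", "+b", "+c"]))
```

For the example call, pairing the first copy of `b` in genome A with `b` in genome B keeps the gene order intact (no breakpoints), so the sketch returns `(1, [(0, 0), (1, 1), (2, 2)])` and leaves the second copy of `b` (index 3) unmatched as the inparalog candidate. Pairing the other copy instead would introduce three breakpoints and cost 4.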







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Zheng Fu (1)
  • Xin Chen (2)
  • Vladimir Vacic (1)
  • Peng Nan (3)
  • Yang Zhong (1)
  • Tao Jiang (1, 4)

  1. Computer Science Department, University of California, Riverside
  2. School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
  3. Shanghai Center for Bioinformatics Technology, Shanghai, China
  4. Tsinghua University, Beijing, China
