Some Algorithmic Challenges in Genome-Wide Ortholog Assignment

  • Survey
Journal of Computer Science and Technology

Abstract

Genome-scale assignment of orthologous genes is a fundamental and challenging problem in computational biology, with a wide range of applications in comparative genomics, functional genomics, and systems biology. Many methods based on sequence similarity, phylogenetic analysis, chromosomal syntenic information, and genome rearrangement have been proposed in recent years for ortholog assignment. Although the results of these methods largely agree with each other, they may still contain significant differences. In this article, we consider the recently proposed parsimony approach for assigning orthologs between closely related genomes based on genome rearrangement, which essentially attempts to transform one genome into another using the smallest number of genome rearrangement events (reversal, translocation, fusion, and fission) together with gene duplication events. We highlight some of the challenging algorithmic problems that arise in the approach, including (i) minimum common substring partition, (ii) signed reversal distance with duplicates, and (iii) signed transposition distance with duplicates. We review the most recent progress towards solving these problems and pose some open questions. We also discuss possible extensions of the approach to the simultaneous comparison of multiple genomes.
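
To make problem (ii) concrete, the following is a minimal illustrative sketch, not the method surveyed in the article. It assumes a unichromosomal genome written as a tuple of signed integers, where the absolute value names a gene family and the sign gives the strand, so repeated absolute values model duplicates. The function names `reverse_segment` and `reversal_distance` are hypothetical. The search is a brute-force breadth-first search that is exponential in genome length and only usable on toy inputs.

```python
from collections import deque


def reverse_segment(genome, i, j):
    """Apply a signed reversal: reverse genome[i..j] (inclusive) and flip every sign."""
    flipped = tuple(-g for g in reversed(genome[i:j + 1]))
    return genome[:i] + flipped + genome[j + 1:]


def reversal_distance(source, target):
    """Return the minimum number of signed reversals turning source into target.

    Brute-force breadth-first search over all gene orders; exponential time,
    intended only to illustrate the definition on very small genomes.
    Duplicated genes are allowed (the same absolute value may repeat).
    """
    source, target = tuple(source), tuple(target)
    # Both genomes must contain the same multiset of gene families.
    if sorted(map(abs, source)) != sorted(map(abs, target)):
        raise ValueError("genomes must have the same gene content")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        genome, dist = queue.popleft()
        if genome == target:
            return dist
        n = len(genome)
        # Try every possible reversal of a contiguous segment.
        for i in range(n):
            for j in range(i, n):
                nxt = reverse_segment(genome, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    # Gene family 1 occurs twice in each genome.
    print(reversal_distance((1, 2, -1), (1, -2, -1)))
```

Because the search space grows exponentially with genome length, a sketch like this only serves to pin down the definition; it says nothing about how such distances are computed or approximated at genome scale.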

Author information

Corresponding author

Correspondence to Tao Jiang.

Additional information

This work is supported by the NSF of USA under Grant No. IIS-0711129.

About this article

Cite this article

Jiang, T. Some Algorithmic Challenges in Genome-Wide Ortholog Assignment. J. Comput. Sci. Technol. 25, 42–52 (2010). https://doi.org/10.1007/s11390-010-9304-6

  • DOI: https://doi.org/10.1007/s11390-010-9304-6
