Abstract
Genome-scale assignment of orthologous genes is a fundamental and challenging problem in computational biology, with a wide range of applications in comparative genomics, functional genomics, and systems biology. Many methods based on sequence similarity, phylogenetic analysis, chromosomal syntenic information, and genome rearrangement have been proposed in recent years for ortholog assignment. Although the results of these methods largely agree with each other, they can still differ significantly. In this article, we consider the recently proposed parsimony approach for assigning orthologs between closely related genomes based on genome rearrangement, which essentially attempts to transform one genome into the other with the smallest number of genome rearrangement events, including reversal, translocation, fusion, and fission, as well as gene duplication events. We will highlight some of the challenging algorithmic problems that arise in the approach, including (i) minimum common substring partition, (ii) signed reversal distance with duplicates, and (iii) signed transposition distance with duplicates. The most recent progress towards the solution of these problems will be reviewed and some open questions will be posed. We will also discuss some possible extensions of the approach to the simultaneous comparison of multiple genomes.
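To make problem (i) concrete, the sketch below is a greedy heuristic for the unsigned version of minimum common substring partition. It takes two related gene sequences, meaning the same multiset of genes, and repeatedly picks the longest common substring that avoids already-chosen blocks until both sequences are fully partitioned. The function name greedy_mcsp and the (start in a, start in b, length) block format are hypothetical choices made for illustration. The signed variant relevant to genomes would also have to match reversed blocks with flipped signs, which this sketch omits. The greedy choice is also not guaranteed to produce the minimum number of blocks.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Block = Tuple[int, int, int]  # (start in a, start in b, length)


def greedy_mcsp(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[Block]:
    """Hypothetical greedy heuristic for unsigned minimum common (sub)string partition.

    a and b must be related: the same multiset of genes. The result is a set of
    disjoint blocks covering both sequences with a[s:s+l] == b[t:t+l] for each block.
    Not guaranteed to use the minimum number of blocks.
    """
    if Counter(a) != Counter(b):
        raise ValueError("a and b must contain the same multiset of genes")
    used_a = [False] * len(a)
    used_b = [False] * len(b)
    blocks: List[Block] = []
    while not all(used_a):
        # Longest common substring that uses only positions not yet in a block,
        # found with the classic O(len(a) * len(b)) common-suffix-length table.
        best_len, end_a, end_b = 0, 0, 0
        prev = [0] * (len(b) + 1)
        for i in range(1, len(a) + 1):
            cur = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                if not used_a[i - 1] and not used_b[j - 1] and a[i - 1] == b[j - 1]:
                    cur[j] = prev[j - 1] + 1
                    if cur[j] > best_len:
                        best_len, end_a, end_b = cur[j], i, j
            prev = cur
        # Equal remaining multisets guarantee best_len >= 1, so the loop terminates.
        for k in range(1, best_len + 1):
            used_a[end_a - k] = True
            used_b[end_b - k] = True
        blocks.append((end_a - best_len, end_b - best_len, best_len))
    return blocks
```

For example, greedy_mcsp("abcab", "ababc") returns [(0, 2, 3), (3, 0, 2)]. This corresponds to the two blocks "abc" and "ab".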
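On toy inputs, the parsimony principle behind problem (ii) can be illustrated by brute force. The sketch tries every one-to-one matching between the copies of each duplicated gene family in the two genomes and turns each matching into a pair of signed permutations. It then keeps the matching with the smallest exact signed reversal distance. This is a hypothetical sketch, not the algorithm used in the approach described above. It makes several assumptions: both genomes contain the same gene families with the same copy numbers, and gene families are nonzero integers whose sign encodes the strand. It also models reversals only. The helper names reversal_distance and parsimonious_assignment are illustrative. Both the exhaustive search and the breadth-first reversal distance are exponential, so the sketch is usable only for a handful of genes.

```python
from collections import defaultdict, deque
from itertools import permutations, product
from typing import Dict, List, Optional, Sequence, Tuple

Perm = Tuple[int, ...]


def reversal_distance(start: Perm, target: Perm) -> int:
    """Exact signed reversal distance via breadth-first search (tiny inputs only)."""
    n = len(start)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        perm, d = queue.popleft()
        if perm == target:
            return d
        for i in range(n):
            for j in range(i, n):
                # A signed reversal inverts the order and the orientation of perm[i..j].
                segment = tuple(-x for x in reversed(perm[i:j + 1]))
                nxt = perm[:i] + segment + perm[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    raise ValueError("target is not reachable from start")


def parsimonious_assignment(
    a: Sequence[int], b: Sequence[int]
) -> Tuple[int, Dict[int, int]]:
    """Hypothetical helper: match gene copies of a to b so as to minimize reversals.

    Genes are nonzero gene-family ids; the sign encodes the strand.
    Returns (reversal distance, {position in a: position in b}).
    """
    pos_a: Dict[int, List[int]] = defaultdict(list)
    pos_b: Dict[int, List[int]] = defaultdict(list)
    for i, g in enumerate(a):
        pos_a[abs(g)].append(i)
    for j, g in enumerate(b):
        pos_b[abs(g)].append(j)
    if {f: len(p) for f, p in pos_a.items()} != {f: len(p) for f, p in pos_b.items()}:
        raise ValueError("genomes must have identical gene-family content")

    def signed(label: int, gene: int) -> int:
        return label if gene > 0 else -label

    # Each copy in b is labeled by its position, so b becomes the target permutation.
    target = tuple(signed(j + 1, g) for j, g in enumerate(b))
    families = sorted(pos_a)
    best_d: Optional[int] = None
    best_match: Dict[int, int] = {}
    # Enumerate every one-to-one matching of copies within each gene family.
    for choice in product(*(permutations(pos_b[f]) for f in families)):
        match = {i: j for f, js in zip(families, choice) for i, j in zip(pos_a[f], js)}
        start = tuple(signed(match[i] + 1, g) for i, g in enumerate(a))
        d = reversal_distance(start, target)
        if best_d is None or d < best_d:
            best_d, best_match = d, match
    assert best_d is not None  # product() always yields at least one matching
    return best_d, best_match
```

For instance, parsimonious_assignment([1, -2, 3, 1], [1, 2, 3, 1]) returns (1, {0: 0, 1: 1, 2: 2, 3: 3}). Matching the two copies of family 1 in order requires only the single reversal of gene 2, whereas the crossed matching requires more.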
Additional information
This work is supported by the NSF of USA under Grant No. IIS-0711129.
Cite this article
Jiang, T. Some Algorithmic Challenges in Genome-Wide Ortholog Assignment. J. Comput. Sci. Technol. 25, 42–52 (2010). https://doi.org/10.1007/s11390-010-9304-6