Using Permutation Patterns for Content-Based Phylogeny

  • Md Enamul Karim
  • Laxmi Parida
  • Arun Lakhotia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4146)


When the same set of genes appear in different orders on the chromosomes, they form a permutation pattern. Permutation patterns have been used to identify potential haplogroups in mammalian data [8]. They also have been successfully used to detect phylogenetic relationships between computer viruses [9]. In this paper we explore the use of these patterns as a content similarity measure and use this in inferring phylogenies from genome rearrangement data in polynomial time. The method uses a function of the cardinality of the set of common maximal permutation patterns as a proxy for evolutionary “proximity” between genomes. We introduce Pi-logen, a phylogeny tool based on this method. We summarize results of feasibility study for this scheme on synthetic data by (1) content verification and (2) ancestor prediction. We also successfully infer phylogenies on series of synthetic data and on chloroplast gene order of Campanulaceae data.


Synthetic Data Reference Tree Edit Operation Ancestral Genome Breakpoint Distance 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. 1.
    Kececioglu, J., Sankoff, D.: Efficient bounds for oriented chromosome inversion distance. In: Crochemore, M., Gusfield, D. (eds.) CPM 1994. LNCS, vol. 807, pp. 307–325. Springer, Heidelberg (1994)Google Scholar
  2. 2.
    Blanchette, M., Bourque, G., Sankoff, D.: Breakpoint phylogenies. In: Miyano, S., Takagi, T. (eds.) In Genome Informatics Workshop (GIW 1997), pp. 25–34. University Academy Press, Tokyo (1997)Google Scholar
  3. 3.
    Bourque, G., Pevzner, P.A.: Genome-Scale Evolution: Reconstructing Gene Orders in the Ancestral Species. Genome Research 12(1), 26–36 (2002)Google Scholar
  4. 4.
    Caprara, A.: Formulations and complexity of multiple sorting by reversals. In: Istrail, S., et al. (eds.) Proceedings of the Third Annual International Conference on Computational Molecular Biology RECOMB, pp. 84–93. ACM Press, Lyon, France (1999)CrossRefGoogle Scholar
  5. 5.
    Cosner, M.E., et al.: An Empirical Comparison of Phylogenetic Methods on Chloroplast Gene Order Data in Campanulaceae. In: Sankoff, D., Nadeau, J. (eds.) Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment, and the Evolution of Gene Families, pp. 99–121. Kluwer Academic Publishers, Dordrecht (2000)Google Scholar
  6. 6.
    Eres, R., Landau, G.M., Parida, L.: Combinatorial approach to automatic discovery of cluster patterns. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 139–150. Springer, Heidelberg (2003)CrossRefGoogle Scholar
  7. 7.
    Kauffman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis (1990)Google Scholar
  8. 8.
    Landau, G.M., Parida, L., Weimann, O.: Using PQ Trees for Comparative Genomics. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 128–143. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  9. 9.
    Karim, M.E., Walenstein, A., Lakhotia, A., Parida, L.: Malware phylogeny generation using permutations of code. European Journal of Computer Virology 1, 1–11 (2005)CrossRefGoogle Scholar
  10. 10.
    Nadeau, J., Taylor, B.: Lengths of chromosomal segments conserved since divergence of man and mouse. Proc. Natl.Acad. Sci. PNAS 81, 814–818 (1984)CrossRefGoogle Scholar
  11. 11.
    Watterson, G., Ewens, W., Hall, T., Morgan, A.: The chromosome inversion problem. J. Theor. Biol. 99, 1–7 (1982)CrossRefGoogle Scholar
  12. 12.
    Peer, I., Shamir, R.: The median problems for breakpoints are NP-complete, Electronic Colloquium on Computational Complexity Technical Report 98-071 (1998),
  13. 13.
    Hannenhalli, S., Pevzner, P.: Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In: Proc. of the 27th Annual Symposium on Theory of Computing STOC, pp. 178–189 (1995)Google Scholar
  14. 14.
    Sankoff, D.: Edit distance for genome comparison based on non-local operations. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 121–135. Springer, Heidelberg (1992)Google Scholar
  15. 15.
    Berman, P., Hannenhalli, S.: Fast sorting by reversal. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 168–185. Springer, Heidelberg (1996)Google Scholar
  16. 16.
    Kaplan, H., Shamir, R., Tarjan, R.: Faster and simpler algorithm for sorting signed permutations by reversals. In: Proc. of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms SODA, pp. 344–351 (1997)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Md Enamul Karim
    • 1
  • Laxmi Parida
    • 2
  • Arun Lakhotia
    • 1
  1. 1.Center for Advanced Computer StudiesUniversity of Louisiana at LafayetteUSA
  2. 2.Computational Biology Center, IBM T J Watson Research CenterYorktown HeightsUSA

Personalised recommendations