Controlling Size When Aligning Multiple Genomic Sequences with Duplications

  • Minmei Hou
  • Piotr Berman
  • Louxin Zhang
  • Webb Miller
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4175)


For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.
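The graph formulation above can be made concrete with a small sketch. It reads vertices as genomic segments and edges as orthologous pairs, so each clique is a group of segments that could share one alignment; that reading is an interpretation of the abstract. The greedy strategy and the function name `greedy_edge_clique_cover` are illustrative assumptions, not one of the two heuristics the paper evaluates, and the result is not guaranteed to be a minimum cover.

```python
from itertools import combinations


def greedy_edge_clique_cover(edges):
    """Return a list of cliques (vertex sets) that together contain every edge.

    Assumes a simple undirected graph (no self-loops) given as (u, v) pairs.
    Hypothetical greedy sketch: the cover is valid but not necessarily minimum.
    """
    # Build adjacency sets.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    uncovered = {frozenset(e) for e in edges}
    cliques = []

    for u, v in edges:
        if frozenset((u, v)) not in uncovered:
            continue  # already inside an earlier clique

        # Grow a clique from this edge; candidates are adjacent to all members.
        clique = {u, v}
        candidates = adj[u] & adj[v]
        while candidates:
            # Prefer the candidate that covers the most still-uncovered edges.
            best = max(
                sorted(candidates, key=repr),
                key=lambda w: sum(frozenset((w, x)) in uncovered for x in clique),
            )
            clique.add(best)
            candidates &= adj[best]

        # Every pair inside the clique is an edge, so mark them all covered.
        for a, b in combinations(clique, 2):
            uncovered.discard(frozenset((a, b)))
        cliques.append(clique)

    return cliques


# Toy example with made-up segment names: h1, m1, d1 are mutually
# orthologous, and the duplicate h2 is orthologous only to m1.
example_edges = [("h1", "m1"), ("h1", "d1"), ("m1", "d1"), ("h2", "m1")]
example_cover = greedy_edge_clique_cover(example_edges)
```

On this toy input the sketch returns two cliques, {h1, m1, d1} and {h2, m1}: two alignments instead of four pairwise ones, with every orthologous pair still represented.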


Keywords: Pairwise Alignment, Orthologous Pair, Clique Partition, Alignment Graph, Clique Cover
(These keywords were added by machine and not by the authors.)





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Minmei Hou (1)
  • Piotr Berman (1)
  • Louxin Zhang (2)
  • Webb Miller (1)
  1. Department of Computer Science and Engineering, Penn State, University Park, USA
  2. Department of Mathematics, National University of Singapore, Singapore
