Annals of Operations Research, Volume 148, Issue 1, pp 167–187

Novel evolutionary models and applications to sequence alignment problems

  • Eva K. Lee
  • Todd Easton
  • Kapil Gupta

Abstract

In this paper, we present a novel graph-theoretical approach for representing a wide variety of sequence analysis problems within a single model. The model allows incorporation of the operations “insertion”, “deletion”, and “substitution”, and of various parameters such as relative distances and weights. Conceptually, we refer to the problem as the minimum weight common mutated sequence (MWCMS) problem. The MWCMS model has many applications, including the multiple sequence alignment problem, phylogenetic analysis, the DNA sequencing problem, and the sequence comparison problem, which together encompass a core set of very difficult problems in computational biology. Thus the model presented in this paper lays out a mathematical modeling framework that allows one to investigate theoretical and computational issues, and to forge new advances for these distinct but related problems.
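
To make the three operations concrete, the following is a minimal sketch of a weighted edit distance between two sequences, computed with the classical dynamic program. It is only a pairwise illustration and not the MWCMS model itself; the function name `weighted_edit_distance` and the weight parameters `w_ins`, `w_del`, and `w_sub` are assumptions introduced here.

```python
def weighted_edit_distance(a, b, w_ins=1.0, w_del=1.0, w_sub=1.0):
    """Minimum total weight of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b (illustrative weights)."""
    m, n = len(a), len(b)
    # dist[i][j] holds the cheapest way to turn a[:i] into b[:j].
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * w_del  # delete every symbol of a[:i]
    for j in range(1, n + 1):
        dist[0][j] = j * w_ins  # insert every symbol of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Matching symbols cost nothing; mismatches cost one substitution.
            diag = 0.0 if a[i - 1] == b[j - 1] else w_sub
            dist[i][j] = min(
                dist[i - 1][j] + w_del,     # delete a[i-1]
                dist[i][j - 1] + w_ins,     # insert b[j-1]
                dist[i - 1][j - 1] + diag,  # match or substitute
            )
    return dist[m][n]
```

With unit weights this reduces to the usual edit distance. Raising `w_sub` above `w_ins + w_del` makes the program prefer a deletion followed by an insertion over a substitution, which is one way relative weights can change the preferred mutation.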

Through the introduction of supernodes, and the multi-layer supergraph, we proved that MWCMS is \({NP}\)-complete. Furthermore, it was shown that a conflict graph derived from the multi-layer supergraph has the property that a solution to the associated node-packing problem of the conflict graph corresponds to a solution of the MWCMS problem. In this case, we proved that when the number of input sequences is a constant, MWCMS is polynomial-time solvable. We also demonstrated that some well-known combinatorial problems can be viewed as special cases of the MWCMS problem. In particular, we presented theoretical results implied by the MWCMS theory for the minimum weight supersequence problem, the minimum weight superstring problem, and the longest common subsequence problem.
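
As an illustration of one of the special cases named above, the following sketch computes a longest common subsequence of two sequences with the textbook dynamic program. It shows the polynomial-time behaviour for a fixed number of input sequences (here, two) and is not the supergraph construction used in the paper; the function name `longest_common_subsequence` is an assumption introduced here.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # length[i][j] is the LCS length of the prefixes a[:i] and b[:j].
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                length[i + 1][j + 1] = length[i][j] + 1
            else:
                length[i + 1][j + 1] = max(length[i][j + 1], length[i + 1][j])
    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

The table has \((m+1)(n+1)\) entries, so the running time is proportional to the product of the two sequence lengths. Extending the same table to \(k\) sequences gives a running time that is polynomial for constant \(k\) but exponential in \(k\).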

Two integer programming formulations were presented and a simple yet elegant decomposition heuristic was introduced. The integer programming instances have proven to be computationally intensive. Consequently, research involving simultaneous column and row generation and parallel computing will be explored. The heuristic algorithm, introduced herein for multiple sequence alignment, overcomes the order-dependent drawbacks of many existing algorithms, and is capable of returning good sequence alignments within reasonable computational time. It is able to return the optimal alignment for multiple sequences of length less than 1500 base pairs within 30 minutes. Its decomposition-based structure lends itself naturally to parallel distributed computing, and we continue to explore its flexibility and scalability in a massively parallel environment.
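
The two formulations from the paper are not reproduced in this abstract. For orientation only, the standard weighted node-packing integer program on a generic graph is sketched below. The symbols \(G = (V, E)\), \(w_v\), and \(x_v\) are assumptions introduced here, and how the MWCMS weights map onto this objective is not stated in the abstract.

\[
\begin{aligned}
\max \quad & \sum_{v \in V} w_v x_v \\
\text{s.t.} \quad & x_u + x_v \le 1 && \text{for all } \{u, v\} \in E, \\
& x_v \in \{0, 1\} && \text{for all } v \in V.
\end{aligned}
\]

Each edge constraint forbids selecting both endpoints of a conflict, so any feasible \(x\) picks a set of pairwise non-adjacent nodes.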

Keywords

Evolutionary distance problem; Multiple sequence alignment; Phylogenetic analysis; DNA sequencing; Sequence comparison; Minimum weight common mutated sequence; Supernode; Conflict graph; Node-packing polytope

Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  1. Center for Operations Research in Medicine, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia
  2. Winship Cancer Institute, Emory University School of Medicine, Atlanta, Georgia
