Abstract
In recent years, many bioinformatics techniques have been developed for the central and complex problem of optimally aligning biological sequences. In this paper we propose a new optimization approach based on DC (Difference of Convex functions) programming and DCA (DC Algorithm) for multiple sequence alignment (MSA) in its equivalent binary linear programming formulation, the “Maximum Weight Trace” problem. This problem is first recast as a polyhedral DC program by means of exact penalty techniques in DC programming. Our customized DCA, which requires solving only a few linear programs, is original in that it converges after finitely many iterations to a binary solution even though it works in a continuous domain. To scale up to large-scale MSA instances, a constraint generation technique is introduced into DCA. Preliminary computational experiments on benchmark data show the efficiency of the proposed algorithm, DCAMSA, which generally outperforms some standard algorithms.
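The mechanism described above (exact penalty, then a DCA in which every iteration is one linear program) can be illustrated on a toy binary linear program. This is a minimal sketch under stated assumptions, not the authors' DCAMSA. A 0-1 knapsack stands in for the Maximum Weight Trace problem, because the LP over K = {x in [0,1]^n : a^T x <= b} with positive weights a can be solved greedily without an external solver. The penalty p(x) = sum_i min(x_i, 1 - x_i) is an assumption: it is one standard concave polyhedral choice that makes min_{x in K} c^T x + t*p(x) a polyhedral DC program g - h, with g(x) = c^T x + indicator_K(x) and h(x) = t * sum_i max(-x_i, x_i - 1). The value t = 10 is arbitrary, and the helper names solve_lp, subgradient_h and dca_binary are hypothetical. Constraint generation is not shown.

```python
# Toy DCA sketch (hypothetical helpers, not the paper's code) for
#   min c^T x  s.t.  x in K = {x in [0,1]^n : a^T x <= b},  x binary,
# using the ASSUMED polyhedral exact penalty p(x) = sum_i min(x_i, 1 - x_i):
#   c^T x + t*p(x) = g(x) - h(x),  g(x) = c^T x + indicator_K(x),
#   h(x) = t * sum_i max(-x_i, x_i - 1)  (convex).


def solve_lp(d, a, b):
    """Minimise d^T x over K with the greedy fractional-knapsack rule (a > 0)."""
    n = len(d)
    x = [0.0] * n
    remaining = float(b)
    # Only negative-cost coordinates are worth raising; take them in order of
    # cost decrease per unit of weight (most negative d_i / a_i first).
    order = sorted((i for i in range(n) if d[i] < 0), key=lambda i: d[i] / a[i])
    for i in order:
        if remaining <= 1e-12:
            break
        x[i] = min(1.0, remaining / a[i])
        remaining -= x[i] * a[i]
    return x


def subgradient_h(x, t):
    """Return one element of the subdifferential of h at x."""
    # Slope -t where -x_i is the active piece (x_i < 1/2), +t where x_i - 1 is
    # active (x_i > 1/2); at x_i = 1/2 any value in [-t, t] is valid, we pick -t.
    return [t if xi > 0.5 else -t for xi in x]


def dca_binary(c, a, b, t, max_iter=100):
    """Run DCA on the penalised problem, starting from the LP relaxation."""
    x = solve_lp(c, a, b)
    for _ in range(max_iter):
        y = subgradient_h(x, t)
        # DCA step: x_new in argmin_x g(x) - <y, x> = argmin_{x in K} (c - y)^T x,
        # i.e. one linear program per iteration.
        x_new = solve_lp([ci - yi for ci, yi in zip(c, y)], a, b)
        if x_new == x:  # fixed point reached: stop
            break
        x = x_new
    return x


values, weights, capacity = [10, 7, 1], [4, 3, 3], 5
c = [-v for v in values]  # maximise total value = minimise -value
print(solve_lp(c, weights, capacity))          # [1.0, 0.3333333333333333, 0.0]
print(dca_binary(c, weights, capacity, t=10))  # [1.0, 0.0, 0.0]
```

On this instance the LP relaxation is fractional, and one DCA step moves to the binary point [1, 0, 0], which is the optimal 0-1 solution here. In general, DCA only guarantees convergence to a critical point of the penalized problem, and on other toy instances this sketch can stop at a fractional vertex. The finite convergence to a binary solution stated in the abstract relies on the paper's own analysis of the MSA formulation.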
Cite this article
Le Thi, H.A., Pham Dinh, T. & Belghiti, M. DCA based algorithms for multiple sequence alignment (MSA). Cent Eur J Oper Res 22, 501–524 (2014). https://doi.org/10.1007/s10100-013-0324-5