DCA based algorithms for multiple sequence alignment (MSA)

Le Thi, Hoai An; Pham Dinh, Tao; Belghiti, Moulay

doi:10.1007/s10100-013-0324-5

Original Paper
Published: 01 September 2013

Volume 22, pages 501–524 (2014)
Central European Journal of Operations Research

Hoai An Le Thi¹,
Tao Pham Dinh² &
Moulay Belghiti²

Abstract

In recent years, many techniques in bioinformatics have been developed for the central and complex problem of optimally aligning biological sequences. In this paper we propose a new optimization approach based on DC (Difference of Convex functions) programming and the DC Algorithm (DCA) for multiple sequence alignment (MSA), formulated as an equivalent binary linear program known as the "Maximum Weight Trace" problem. This problem is first recast as a polyhedral DC program using exact penalty techniques in DC programming. Our customized DCA, which requires solving a few linear programs, is original in that it converges after finitely many iterations to a binary solution while working in a continuous domain. To scale up to large MSA instances, a constraint generation technique is introduced into DCA. Preliminary computational experiments on benchmark data show the efficiency of the proposed algorithm, DCAMSA, which generally outperforms some standard algorithms.
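The penalty-plus-DCA scheme described in the abstract can be illustrated on a generic binary linear program, min c^T x subject to A x <= b, x in {0,1}^n (a maximum weight problem fits this form with c = -w). This is a minimal sketch under stated assumptions, not the authors' DCAMSA: it omits the Maximum Weight Trace constraints and the constraint generation step, and it assumes the piecewise-linear penalty p(x) = sum_i min(x_i, 1 - x_i). With K = {x : A x <= b, 0 <= x <= 1}, the penalized problem min c^T x + t p(x) over K is then a polyhedral DC program g - h with g(x) = c^T x + indicator_K(x) and h(x) = t sum_i max(-x_i, x_i - 1). Each DCA step picks y^k in the subdifferential of h at x^k and solves the linear program min (c - y^k)^T x over K. The helper names simplex_min and dca_binary_lp, the tie-break at x_i = 1/2, and the small dense simplex (Bland's rule, which needs b >= 0 so the origin is feasible) are hypothetical choices made for this example.

```python
from typing import List, Sequence


def simplex_min(d: Sequence[float], A: Sequence[Sequence[float]],
                b: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Minimise d.x subject to A x <= b, x >= 0, assuming b >= 0.

    Dense tableau simplex with Bland's rule; the slack basis is feasible
    because b >= 0, so no phase-one step is needed.
    """
    m, n = len(A), len(d)
    # Tableau rows are [A | I | b]; the cost row holds reduced costs.
    T = [[float(v) for v in A[i]] + [1.0 if k == i else 0.0 for k in range(m)]
         + [float(b[i])] for i in range(m)]
    cost = [float(v) for v in d] + [0.0] * (m + 1)
    basis = [n + i for i in range(m)]
    while True:
        # Bland's rule: the lowest-index column with negative reduced cost enters.
        entering = next((j for j in range(n + m) if cost[j] < -eps), None)
        if entering is None:
            break
        rows = [i for i in range(m) if T[i][entering] > eps]
        if not rows:
            raise ValueError("LP is unbounded")
        # Ratio test; ties go to the lowest-index basic variable (Bland).
        leave = min(rows, key=lambda i: (T[i][-1] / T[i][entering], basis[i]))
        pivot = T[leave][entering]
        T[leave] = [v / pivot for v in T[leave]]
        for i in range(m):
            f = T[i][entering]
            if i != leave and f != 0.0:
                T[i] = [a - f * p for a, p in zip(T[i], T[leave])]
        f = cost[entering]
        cost = [a - f * p for a, p in zip(cost, T[leave])]
        basis[leave] = entering
    x = [0.0] * (n + m)
    for i, j in enumerate(basis):
        x[j] = T[i][-1]
    return x[:n]


def dca_binary_lp(c: Sequence[float], A: Sequence[Sequence[float]],
                  b: Sequence[float], t: float, x0: Sequence[float],
                  max_iter: int = 100) -> List[float]:
    """DCA sketch for min c.x + t * sum_i min(x_i, 1 - x_i) over
    K = {A x <= b, 0 <= x <= 1}, written as g - h with
    g = c.x + indicator_K and h = t * sum_i max(-x_i, x_i - 1)."""
    n = len(c)
    # Add the box constraints x_i <= 1 (x >= 0 is implicit in simplex_min).
    A_box = [list(row) for row in A] + [
        [1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
    b_box = list(b) + [1.0] * n
    x = [float(v) for v in x0]
    for _ in range(max_iter):
        # y^k in the subdifferential of h at x^k: -t where x_i < 1/2, +t where
        # x_i > 1/2; taking +t at exactly 1/2 is a hypothetical tie-break.
        y = [t if xi >= 0.5 else -t for xi in x]
        # x^{k+1} minimises g(x) - <y^k, x>, i.e. the LP min (c - y^k).x over K.
        x_new = simplex_min([ci - yi for ci, yi in zip(c, y)], A_box, b_box)
        if all(abs(u - v) <= 1e-9 for u, v in zip(x_new, x)):
            break  # fixed point: the same LP would be solved again
        x = x_new
    return x


# Three mutually conflicting variables with weights w = (3, 2, 2), so c = -w.
c = [-3.0, -2.0, -2.0]
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
b = [1, 1, 1]
print(simplex_min(c, A, b))                               # [0.5, 0.5, 0.5]
print(dca_binary_lp(c, A, b, t=2.5, x0=[0.0, 0.0, 0.0]))  # [1.0, 0.0, 0.0]
```

In this toy instance the LP relaxation optimum is fractional, (0.5, 0.5, 0.5), while the sketch stops after two LP solves at the binary point (1, 0, 0). DCA is a local method, so the outcome depends on x0 and t: started exactly at (0.5, 0.5, 0.5), this sketch stays there because of its tie-break rule.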

Author information

Authors and Affiliations

Laboratory of Theoretical and Applied Computer Science, UFR MIM, University of Lorraine, Ile du Saulcy, 57045 Metz, France
Hoai An Le Thi
Laboratory of Mathematics, National Institute for Applied Sciences-Rouen, Avenue de l’Université, 76801 Saint-Etienne-du-Rouvray Cedex, France
Tao Pham Dinh & Moulay Belghiti

Corresponding author

Correspondence to Hoai An Le Thi.

About this article

Cite this article

Le Thi, H.A., Pham Dinh, T. & Belghiti, M. DCA based algorithms for multiple sequence alignment (MSA). Cent Eur J Oper Res 22, 501–524 (2014). https://doi.org/10.1007/s10100-013-0324-5

Issue Date: September 2014
DOI: https://doi.org/10.1007/s10100-013-0324-5
