DCA based algorithms for multiple sequence alignment (MSA)

  • Original Paper
Central European Journal of Operations Research

Abstract

In recent years, many bioinformatics techniques have been developed for the central and complex problem of optimally aligning biological sequences. In this paper we propose a new optimization approach, based on DC (Difference of Convex functions) programming and the DC Algorithm (DCA), for multiple sequence alignment (MSA) in its equivalent binary linear programming formulation, known as the “Maximum Weight Trace” problem. This problem is first recast as a polyhedral DC program by means of exact penalty techniques in DC programming. Our customized DCA, which requires solving a few linear programs, is original in that it converges after finitely many iterations to a binary solution even though it works in a continuous domain. To scale up to large MSA instances, a constraint generation technique is incorporated into DCA. Preliminary computational experiments on benchmark data show the efficiency of the proposed algorithm, DCAMSA, which generally outperforms some standard algorithms.
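
The penalty-and-linearisation idea summarised above can be illustrated on a toy binary linear program. The sketch below is not the authors' DCAMSA and does not model the Maximum Weight Trace problem. It assumes a single knapsack constraint so that each linear program can be solved greedily without an external solver, and it assumes the commonly used concave penalty t · Σ min(x_i, 1 − x_i) to replace the binary constraint. The function names, the instance and the penalty value are illustrative assumptions only.

```python
def solve_lp_knapsack(weights, sizes, capacity):
    """Maximise sum(weights[i] * x[i]) subject to
    sum(sizes[i] * x[i]) <= capacity and 0 <= x[i] <= 1 (all sizes > 0).

    Greedy filling by weight/size ratio is exact for this continuous LP.
    """
    x = [0.0] * len(weights)
    remaining = capacity
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] / sizes[i], reverse=True)
    for i in order:
        # Items with non-positive weight never help; a full knapsack ends the fill.
        if weights[i] <= 0 or remaining <= 0:
            break
        x[i] = min(1.0, remaining / sizes[i])
        remaining -= sizes[i] * x[i]
    return x


def dca_binary_knapsack(weights, sizes, capacity, penalty, max_iter=50):
    """DCA sketch for: maximise weights . x over the knapsack polytope,
    with the binary constraint replaced by the concave penalty
    penalty * sum(min(x_i, 1 - x_i)).

    DC split (in minimisation form): g = -weights . x + indicator of the polytope,
    h = penalty * sum(max(-x_i, x_i - 1)). Each iteration linearises h
    and solves one LP.
    """
    x = solve_lp_knapsack(weights, sizes, capacity)  # start from the LP relaxation
    for _ in range(max_iter):
        # A subgradient of h at x: +penalty where x_i > 0.5, -penalty otherwise
        # (the choice at exactly 0.5 is arbitrary).
        y = [penalty if xi > 0.5 else -penalty for xi in x]
        shifted = [w + yi for w, yi in zip(weights, y)]
        x_next = solve_lp_knapsack(shifted, sizes, capacity)
        if x_next == x:  # the iterate no longer changes: stop
            break
        x = x_next
    return x


# Hypothetical toy instance (not taken from the paper).
weights, sizes, capacity = [10, 7, 4], [5, 4, 3], 6
relaxed = solve_lp_knapsack(weights, sizes, capacity)
binary = dca_binary_knapsack(weights, sizes, capacity, penalty=20)
print(relaxed)  # [1.0, 0.25, 0.0]
print(binary)   # [1.0, 0.0, 0.0]
```

On this instance the LP relaxation is fractional, and two DCA iterations, each a single LP solve, yield a binary point while every iterate stays in the continuous polytope. This toy sketch carries no guarantee of reaching a binary point on other instances or with other penalty values.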

Author information

Corresponding author

Correspondence to Hoai An Le Thi.

About this article

Cite this article

Le Thi, H.A., Pham Dinh, T. & Belghiti, M. DCA based algorithms for multiple sequence alignment (MSA). Cent Eur J Oper Res 22, 501–524 (2014). https://doi.org/10.1007/s10100-013-0324-5

  • DOI: https://doi.org/10.1007/s10100-013-0324-5
