Journal of Combinatorial Optimization, Volume 3, Issue 2–3, pp 247–275

An Approximation Algorithm for Alignment of Multiple Sequences using Motif Discovery

  • Laxmi Parida
  • Aris Floratos
  • Isidore Rigoutsos

DOI: 10.1023/A:1009841927822

Cite this article as:
Parida, L., Floratos, A. & Rigoutsos, I. Journal of Combinatorial Optimization (1999) 3: 247. doi:10.1023/A:1009841927822

Abstract

Given a set of N sequences, the Multiple Sequence Alignment problem is to align these N sequences, possibly with gaps, in a way that brings out the best commonality among them. The quality of an alignment is usually measured by penalizing mismatches and gaps and rewarding matches, using appropriate weight functions. However, for larger values of N, additional constraints are required to give meaningful alignments. We identify a user-controlled parameter, an alignment number K (2 ≤ K ≤ N): this additional requirement constrains the alignment to have at least K sequences agree on a character, whenever possible. We identify a natural optimization problem for this approach, called the K-MSA problem, and show that it is MAX SNP hard. We give a natural extension of this problem that incorporates “biological relevance” by using motifs (common patterns in the sequences), and give an approximation algorithm for this extended problem in terms of the motifs in the data. MUSCA is an implementation of this approach. Our experimental results indicate that the approach is efficient, particularly on large numbers of long sequences, and gives good alignments when tested on biological data such as DNA and protein sequences.
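To make the role of the alignment number K concrete, the following is a minimal sketch that checks, column by column, whether at least K rows of an already-computed alignment agree on a character. It is an illustration of the agreement criterion only, not the K-MSA algorithm or MUSCA; the function name `column_agreement`, the gap symbol `-`, and the toy sequences are assumptions made for this example.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol; the abstract does not fix a notation


def column_agreement(alignment, k):
    """For each column, return the character shared by at least k rows, else None.

    `alignment` is a list of equal-length strings (rows of an alignment).
    Gaps are ignored when counting agreement.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    result = []
    for col in range(width):
        # Count the non-gap characters appearing in this column.
        counts = Counter(row[col] for row in alignment if row[col] != GAP)
        if counts:
            char, count = counts.most_common(1)[0]
            result.append(char if count >= k else None)
        else:
            result.append(None)
    return result


# Toy alignment of N = 4 sequences (hypothetical data).
rows = ["AC-GT", "ACGGT", "A-CGA", "TCCG-"]
print(column_agreement(rows, 3))  # ['A', 'C', None, 'G', None]
print(column_agreement(rows, 2))  # ['A', 'C', 'C', 'G', 'T']
```

With K = 3 only three of the five columns qualify, while K = 2 is satisfied everywhere, which shows how a larger K imposes a stricter notion of agreement on the same alignment.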

multiple sequence alignment, alignment number, protein sequences, motif discovery, MAX SNP hard, approximate algorithm, set covering problem

Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Laxmi Parida (1)
  • Aris Floratos (1)
  • Isidore Rigoutsos (1)
  1. Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, USA
