MOSAL: software tools for multiobjective sequence alignment

Paquete, Luís; Matias, Pedro; Abbasi, Maryam; Pinheiro, Miguel

doi:10.1186/1751-0473-9-2

Brief reports
Open access
Published: 08 January 2014

Volume 9, article number 2, (2014)

Source Code for Biology and Medicine

Luís Paquete¹, Pedro Matias¹, Maryam Abbasi¹ & Miguel Pinheiro²

Abstract

Multiobjective sequence alignment has the advantage of providing a set of alignments that represent the trade-off between performing insertions/deletions and matching symbols from both sequences. Each of these alignments provides a potential explanation of the relationship between the sequences. We introduce MOSAL, a software tool that provides an open-source implementation and an on-line application for multiobjective pairwise sequence alignment.

Background

Sequence alignment is at the core of many bioinformatics applications. It aims to identify regions of similarity in biological sequences, such as nucleotide or amino acid sequences. The procedure consists of inserting gaps between the residues so that similar symbols from several sequences become aligned. For two sequences, dynamic programming algorithms can compute the optimal alignment efficiently [1]. For searches against very large DNA or protein databases, however, heuristic approaches such as BLAST and FASTA are used instead [2, 3]. See [4] for an extensive review from a computational point of view.

All of these approaches rely on the a priori definition of coefficients that are assigned to the components of the score function. These weights are set by default in most software packages for sequence alignment and are usually not modified by the practitioner. However, there is considerable disagreement about how to weight each component, and a small change in the weights can lead to a completely different alignment.
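To make the role of the weights concrete, the following is a minimal sketch of the global-alignment dynamic program of [1], written in Python for brevity. The function name and the default weights (match, mismatch, gap) are illustrative assumptions and are not taken from MOSAL or from any particular package.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    The three weights are hypothetical defaults chosen for illustration;
    a linear gap penalty is assumed.
    """
    n, m = len(a), len(b)
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # a[i-1] aligned with a gap
                cur[j - 1] + gap,   # b[j-1] aligned with a gap
            )
        prev = cur
    return prev[m]


if __name__ == "__main__":
    print(needleman_wunsch_score("AB", "BA"))         # gap weight -2
    print(needleman_wunsch_score("AB", "BA", gap=0))  # gap weight 0
```

For the toy pair AB and BA, a gap weight of -2 makes the gap-free alignment with two mismatches optimal (score -2), whereas a gap weight of 0 makes an alignment with one match and two gaps optimal (score 1). The optimum therefore depends on a weight that the practitioner rarely inspects.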

One way of overcoming the problem of setting weights is to consider a multiobjective formulation, in which the practitioner is provided with a set of optimal alignments representing the trade-off between components of the score function, for instance, the substitution score given by a substitution matrix and the number of gaps. In this case, an alignment is optimal if no other alignment is at least as good in both components (substitution score at least as high, number of gaps at least as low) and strictly better in at least one. Usually there is not one but several alignments for which this notion of optimality holds; the set of all such alignments is called the Pareto optimal alignment set.
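The notion of optimality described above can be sketched as a dominance test on score vectors. The example below assumes each alignment is summarized by a pair (substitution score, number of gaps), with the first component maximized and the second minimized; the function names and the sample values are hypothetical.

```python
def dominates(p, q):
    """True if p is at least as good as q in both components and differs from q.

    p and q are (substitution_score, gaps): higher score is better,
    fewer gaps is better.
    """
    return p[0] >= q[0] and p[1] <= q[1] and p != q


def pareto_set(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Made-up score vectors for five candidate alignments.
    candidates = [(10, 0), (13, 3), (20, 4), (12, 3), (20, 6)]
    print(pareto_set(candidates))
```

In this made-up example, (12, 3) is discarded because (13, 3) has a higher score with the same number of gaps, and (20, 6) is discarded because (20, 4) reaches the same score with fewer gaps. The three remaining vectors are mutually incomparable.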

Under a multiobjective formulation, no weights need to be set. Moreover, according to a classical result in multiobjective optimization [5], this optimal set contains not only all of the optima of a weighted sum formulation, but also other alignments that the weighted sum approach cannot find for any choice of weights. Each of these alignments can be seen as a potential explanation of the relationship between the sequences and may be of interest to the practitioner for a more in-depth analysis. In fact, several other problems in bioinformatics have already been reformulated from a multiobjective point of view [6].
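A small numerical illustration of this result, under stated assumptions: three hypothetical score vectors (substitution score, gaps) are all Pareto optimal, yet one of them is never returned by a weighted sum of the form score minus w times gaps, whatever the non-negative weight w. The values and function name are invented for the example.

```python
from fractions import Fraction


def weighted_sum_optima(points, weights):
    """Collect every point that maximizes score - w * gaps for some w."""
    found = set()
    for w in weights:
        best = max(s - w * g for s, g in points)
        found.update(p for p in points if p[0] - w * p[1] == best)
    return found


if __name__ == "__main__":
    # All three vectors are mutually non-dominated (Pareto optimal).
    points = [(10, 0), (13, 3), (20, 4)]
    # Exact rational weights 0.0, 0.1, ..., 10.0 avoid rounding issues.
    weights = [Fraction(k, 10) for k in range(101)]
    print(sorted(weighted_sum_optima(points, weights)))
```

The vector (13, 3) would need w ≤ 1 to beat (10, 0) and w ≥ 7 to beat (20, 4), which is impossible, so only (10, 0) and (20, 4) are ever found. Such points are commonly called unsupported Pareto optima.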

A multiobjective approach to pairwise sequence alignment has been explored by several researchers, from both a problem formulation and an algorithmic point of view [7–11]. Recently, it has been applied to the construction of phylogenetic trees, where it has been shown to provide information complementary to that obtained by common methods [9].

Implementations

MOSAL is a software tool that builds on the problem formulation given in [9]. It aims to provide an open-source implementation and an on-line application where this implementation can be tested. The web-server is available at http://mosal.dei.uc.pt, is physically located at the Department of Informatics Engineering, University of Coimbra, and is one of the outcomes of a nationally funded research project on multiobjective sequence alignment.

Code

The code is written in C and provided under a GNU General Public License. A makefile is available for compilation under GNU/Linux. The implementation can be set up for several multiobjective score functions as described in [9]: maximization of the number of matches or of the substitution score, and minimization of the number of gaps or of indels.
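As a rough illustration of one such score function (maximize matches, minimize indels), the sketch below extends the usual dynamic program so that each cell stores a set of non-dominated score vectors instead of a single value. It is written in Python for brevity, is not MOSAL's C implementation, omits the speed-up techniques of [9], and returns only score vectors, not the alignments; all names are hypothetical.

```python
def pareto_filter(vectors):
    """Keep the non-dominated (matches, indels) vectors.

    After sorting by matches (descending) and indels (ascending), a vector
    is non-dominated only if it has strictly fewer indels than every
    vector kept so far.
    """
    kept = []
    for mt, d in sorted(set(vectors), key=lambda v: (-v[0], v[1])):
        if not kept or d < kept[-1][1]:
            kept.append((mt, d))
    return kept


def bicriteria_scores(a, b):
    """Pareto optimal (matches, indels) vectors over all global alignments."""
    n, m = len(a), len(b)
    front = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                # Only indels are possible along the borders.
                front[i][j] = [(0, i + j)]
                continue
            hit = 1 if a[i - 1] == b[j - 1] else 0
            cand = [(mt + hit, d) for mt, d in front[i - 1][j - 1]]  # pair symbols
            cand += [(mt, d + 1) for mt, d in front[i - 1][j]]       # indel in b
            cand += [(mt, d + 1) for mt, d in front[i][j - 1]]       # indel in a
            front[i][j] = pareto_filter(cand)
    return front[n][m]


if __name__ == "__main__":
    print(bicriteria_scores("AB", "BA"))
```

For the sequences AB and BA the sketch returns two trade-off points: one match at the cost of two indels, or no match with no indels. Filtering at every cell is valid here because both components are additive along an alignment, so a dominated partial alignment can never be extended into a non-dominated complete one.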

Speed-up techniques described in [9] are also implemented and can be parameterized, in particular the maximum size of the lower bound set for the pruning technique. This parameter should be defined with some care: if it is too small, the pruning has a reduced effect, and if it is too large, an excessive number of comparisons may reduce the advantage of pruning in terms of CPU time. For most of the benchmarks tested, a value of 10 seems to be the most appropriate [9].

The command line options available are described in Table 1. The implementation outputs the Pareto optimal set of alignments and the corresponding score function values by default.

Table 1 Command line options

On-line application

The web-server also provides an on-line application, written in PHP, that is available for sequences of up to 2000 symbols. Four steps are needed to produce the set of Pareto optimal alignments:

Step 1: Insertion of each sequence in FASTA format in a text box. The user can choose either the Protein or the DNA sequence type with a switch button.
Step 2: Choice of the score function with switch buttons. The user can choose either matches or substitution score for the first score function component and either indels or gaps for the second score function component. If substitution score is chosen, the user can select a substitution score matrix (PAM 100 or 250, or BLOSUM 62, 75, 80 or 85, if the Protein option was chosen in the previous step) or provide one in a predefined text format.
Step 3: Choice of the sequence alignment options: with or without the alignments and with or without the pruning technique. If pruning is chosen, the number of bounds must be provided (10 is given by default). The option without alignments provides only the score function values of the alignments.
Step 4: Submission to the server, with the option of sending an e-mail to the user with the output files.
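For readers preparing input for Step 1, the following is a minimal sketch of reading FASTA-formatted text and checking it against the 2000-symbol limit mentioned above. It is an illustration in Python, not the PHP code of the on-line application; the function names and the constant name are assumptions.

```python
MAX_SYMBOLS = 2000  # limit stated for the on-line application


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def within_limit(sequence, limit=MAX_SYMBOLS):
    """True if the sequence is short enough for the on-line application."""
    return len(sequence) <= limit


if __name__ == "__main__":
    sample = ">seq1 example\nACGT\nAC\n>seq2\nGGTA\n"
    for name, seq in parse_fasta(sample):
        print(name, seq, within_limit(seq))
```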

Once the Pareto optimal alignment set is computed, the score function values are shown in an interactive plot; the user can zoom and choose a given point to see the corresponding alignment (see Figure 1). No information about the submissions is stored in the web-server. During the benchmark testing, the application returned the output in less than 10 seconds for the largest sequence sizes.

A visualization tool in the on-line application makes it possible to view all the alignments and the corresponding score function values produced by the implementation or by the on-line application. The coloring scheme used in the Sequence Manipulation Suite (see http://www.bioinformatics.org/sms2/) is also applied here to help identify potential regions of interest in the alignments.

Conclusions

MOSAL provides a set of tools for the practitioner to perform a more in-depth analysis of the relation between a pair of biological sequences. The multiobjective formulation explored by the framework provides further insight into the confidence of the alignments obtained by common methods; for instance, a large number of optimal scores suggests that a single alignment may be insufficient to understand the relation between the sequences and that further investigation is required. Moreover, the output can be used to construct phylogenetic trees, as suggested in [9].

References

[1] Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970, 48 (3): 443-453. 10.1016/0022-2836(70)90057-4.
[2] Altschul S, Gish W, Miller W, Myers E, Lipman D: A basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
[3] Lipman D, Pearson W: Rapid and sensitive protein similarity searches. Science. 1985, 227: 1435-1441. 10.1126/science.2983426.
[4] Gusfield D: Algorithms on Strings, Trees, and Sequences. Computer Science and Computational Biology. 1997, New York: Cambridge University Press
[5] Ehrgott M: Multicriteria optimization. 2005, Berlin: Springer
[6] Handl J, Kell DB, Knowles JD: Multiobjective Optimization in Bioinformatics and Computational Biology. IEEE/ACM Trans Comput Biol Bioinform. 2007, 4 (2): 279-292.
[7] Roytberg M, Semionenkov M, Tabolina O: Pareto-optimal alignment of biological sequences. Biophysics. 1999, 44 (4): 565-577.
[8] Taneda A: Multi-objective pairwise RNA sequence alignment. Bioinformatics. 2010, 26 (19): 2383-2390. 10.1093/bioinformatics/btq439.
[9] Abbasi M, Paquete L, Liefooghe A, Pinheiro M, Matias P: Improvements on bicriteria pairwise sequence alignment: algorithms and applications. Bioinformatics. 2013, 29 (8): 996-1003. 10.1093/bioinformatics/btt098.
[10] DeRonne K, Karypis G: Pareto optimal pairwise sequence alignment. IEEE/ACM Trans Comput Biol Bioinform. 2013, 10 (2): 481-493.
[11] Schnattinger T, Schöning U, Kestler H: Structural RNA alignment by multi-objective optimization. Bioinformatics. 2013, 29 (13): 1607-1613. 10.1093/bioinformatics/btt188.

Acknowledgements

This work was supported by the Fundação para a Ciência e Tecnologia, project MOSAL - Multiobjective sequence alignment (PTDC/EIA-CCO/098674/2008) and by FEDER, Programa Operacional Factores de Competitividade do QREN, ref. COMPETE: FCOMP-01-0124-FEDER-010024.

Author information

Authors and Affiliations

CISUC, Department of Informatics Engineering, University of Coimbra, Polo II, 3030-290, Coimbra, Portugal
Luís Paquete, Pedro Matias & Maryam Abbasi
School of Medicine, University of St. Andrews, KY16 9TF St. Andrews, North Haugh, UK
Miguel Pinheiro

Corresponding author

Correspondence to Miguel Pinheiro.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

LP is the principal investigator of the project; PM implemented the application on the web-server; MA developed the problem formulation and algorithms; MP tested the framework and suggested improvements. All authors read and approved the final manuscript.

About this article

Cite this article

Paquete, L., Matias, P., Abbasi, M. et al. MOSAL: software tools for multiobjective sequence alignment. Source Code Biol Med 9, 2 (2014). https://doi.org/10.1186/1751-0473-9-2

Received: 03 September 2013
Accepted: 05 January 2014
Published: 08 January 2014
DOI: https://doi.org/10.1186/1751-0473-9-2
