A Parallel Algorithm for Multiple Biological Sequence Alignment

Andalon-Garcia, Irma R.; Chavoya, Arturo; Meda-Campaña, M. E.

doi:10.1007/978-3-642-28792-3_31

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7223)

Included in the following conference series:

International Conference on Information Processing in Cells and Tissues

Abstract

The search for a multiple sequence alignment (MSA) is a well-known problem in bioinformatics that consists of finding an alignment of three or more biological sequences. In this paper, we propose a parallel iterative algorithm for the global alignment of multiple biological sequences. In this algorithm, a number of processes work independently and concurrently, each searching for the best MSA of a set of sequences. The algorithm uses a Longest Common Subsequence (LCS) technique to generate an initial MSA. An iterative process then improves the MSA by applying a number of operators that have been implemented to produce more accurate alignments. Simulations were carried out using sequences from the UniProtKB protein database. A preliminary performance analysis and a comparison with several common MSA methods show promising results. The implementation was developed on a cluster platform using the standard Message Passing Interface (MPI) library.
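The abstract does not say how the LCS technique is applied to build the first MSA. As a rough illustration only, the sketch below computes the longest common subsequence of two sequences with the textbook dynamic-programming recurrence. The function name and the example sequences are assumptions made for this sketch and are not taken from the paper; it is not the authors' implementation.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover the subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to exercise the function.
    print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

In an alignment setting, the matched characters of an LCS can serve as anchor columns, with gaps inserted between them. Whether the paper uses LCS this way is not stated in the abstract.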

Author information

Authors and Affiliations

Department of Information Systems, Universidad de Guadalajara, Periferico Norte 799-L308, Zapopan, Jal., Mexico, 45100
Irma R. Andalon-Garcia, Arturo Chavoya & M. E. Meda-Campaña

Editor information

Editors and Affiliations

Department of Electronics, University of York, YO10 5DD, York, UK
Michael A. Lones, Stephen L. Smith, James A. Walker & Martin A. Trefzer
MRC Laboratory of Molecular Biology, Hills Road, CB2 0QH, Cambridge, UK
Sarah Teichmann
The Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
Felix Naef

About this paper

Cite this paper

Andalon-Garcia, I.R., Chavoya, A., Meda-Campaña, M.E. (2012). A Parallel Algorithm for Multiple Biological Sequence Alignment. In: Lones, M.A., Smith, S.L., Teichmann, S., Naef, F., Walker, J.A., Trefzer, M.A. (eds) Information Processing in Cells and Tissues. IPCAT 2012. Lecture Notes in Computer Science, vol 7223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28792-3_31

DOI: https://doi.org/10.1007/978-3-642-28792-3_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28791-6
Online ISBN: 978-3-642-28792-3
eBook Packages: Computer Science, Computer Science (R0)