An Algorithm for Multiple and Global Alignments

  • Mourad Elloumi
  • Ahmed Mokaddem
Part of the Communications in Computer and Information Science book series (CCIS, volume 13)

Abstract

In this paper, we develop a new algorithm to construct Multiple and Global Alignments (MGA) of primary structures, i.e., strings coding biological macromolecules. The construction of such an alignment is based on the construction of a (longest) Approximate Common Subsequence (ACS), made up of the longer approximate substrings that appear at approximately the same positions in all the strings. This ACS represents an MGA. Constructing such alignments is a way to find homologies between biological macromolecules. Our algorithm has a time complexity of $O(N^2 \cdot L^2 \cdot (\log L)^2)$, where N is the number of strings and L is the length of the longest string.
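
To illustrate the idea that a common subsequence can represent a global alignment, the following is a minimal sketch for the much simpler case of two strings and an exact (not approximate) longest common subsequence. It is not the algorithm of this paper: the function name `lcs_alignment`, the gap symbol and the dynamic-programming formulation are assumptions chosen for illustration only.

```python
from typing import Tuple


def lcs_alignment(a: str, b: str, gap: str = "-") -> Tuple[str, str, int]:
    """Align two strings along one exact longest common subsequence (LCS).

    Illustrative only: exact matching, two strings, hypothetical helper name.
    Returns the two gapped rows and the length of the LCS.
    """
    n, m = len(a), len(b)
    # dp[i][j] = length of an LCS of the suffixes a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start; matched characters share a column,
    # every other character is placed opposite a gap.
    row_a, row_b = [], []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            row_a.append(a[i])
            row_b.append(b[j])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            row_a.append(a[i])
            row_b.append(gap)
            i += 1
        else:
            row_a.append(gap)
            row_b.append(b[j])
            j += 1
    # Flush whatever remains of either string against gaps.
    while i < n:
        row_a.append(a[i])
        row_b.append(gap)
        i += 1
    while j < m:
        row_a.append(gap)
        row_b.append(b[j])
        j += 1
    return "".join(row_a), "".join(row_b), dp[0][0]


if __name__ == "__main__":
    top, bottom, length = lcs_alignment("ACGT", "AGT")
    print(top)
    print(bottom)
    print(length)
```

This pairwise sketch runs in time proportional to the product of the two string lengths. The setting described in the abstract differs in that it handles N strings and approximate substrings.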

Keywords

Strings; multiple and global alignments; common subsequence; divide-and-conquer strategy; algorithms; complexities


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Mourad Elloumi (1)
  • Ahmed Mokaddem (1)
  1. Higher School of Sciences and Technologies of Tunis, Research Unit of Technologies of Information and Communication, Tunis, Tunisia
