Abstract
A dynamic programming algorithm to find all optimal alignments of DNA subsequences is described. The alignments use not only substitutions, insertions and deletions of nucleotides but also inversions (reversed complements) of substrings of the sequences. The inversion alignments themselves contain substitutions, insertions and deletions of nucleotides. We study the problem of alignment with non-intersecting inversions. To provide a computationally efficient algorithm, we restrict candidate inversions to the K highest scoring inversions. An algorithm to find the J best non-intersecting alignments with inversions is also described. The new algorithm is applied to the regions of mitochondrial DNA of Drosophila yakuba and mouse coding for URF6 and cytochrome b, and the inversion of the URF6 gene is found. The open problem of intersecting inversions is discussed.
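As a rough illustration of the building blocks named in the abstract, the sketch below scores a pair of DNA segments twice: once directly, and once after replacing the first segment with its reversed complement, which is how an inversion candidate could be scored. It is not the authors' algorithm: it omits the restriction to the K highest scoring inversions, the non-intersection constraint, and the search for the J best alignments. The function names and the scoring values (match 1, mismatch -1, linear gap -2) are assumptions chosen for the example.

```python
def reverse_complement(seq):
    """Return the reversed complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman recurrence, linear gaps).

    The scoring values are illustrative assumptions, not the paper's.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                # start a new local alignment
                prev[j - 1] + s,  # substitution or match
                prev[j] + gap,    # gap in b (deletion from a)
                cur[j - 1] + gap, # gap in a (insertion from b)
            )
            best = max(best, cur[j])
        prev = cur
    return best


def inversion_score(a, b, **scoring):
    """Score a against b after inverting a (hypothetical helper)."""
    return local_score(reverse_complement(a), b, **scoring)


if __name__ == "__main__":
    x, y = "AAAA", "TTTT"
    print(local_score(x, y), inversion_score(x, y))
```

Here the direct comparison of `AAAA` with `TTTT` finds no similarity, while the inverted comparison matches over the full length. An alignment method that allows inversions is meant to detect exactly this kind of signal.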
Cite this article
Schöniger, M., Waterman, M.S. A local algorithm for DNA sequence alignment with inversions. Bltn Mathcal Biology 54, 521–536 (1992). https://doi.org/10.1007/BF02459633
DOI: https://doi.org/10.1007/BF02459633