Abstract
DNA/protein sequence alignments in computational molecular biology depend heavily on the settings of penalties for substitutions, insertions/deletions and gaps. Inappropriate choice of parameters causes irrelevant matches (“noise”) to be reported, thus obscuring biologically relevant matches. In practice, biologists frequently compare sequences in a few iterations, starting from a vague idea about appropriate parameters, then refining parameters to reduce noise. This procedure often helps to delineate biologically interesting similarities and to substantially reduce laborious analysis. This paper provides a computational underpinning for such iterative noise filtration in alignment graphs. Our main results assume that a preliminary “noisy” alignment, computed with reasonable but ad hoc parameters, is given; the problem is to modify the parameters to reduce noise. We present fast algorithms to refine penalty parameters and describe an application of these algorithms.
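To make the dependence on penalty settings concrete, the sketch below scores the same pair of sequences under two gap penalties. It uses a textbook Smith-Waterman local alignment with a linear gap penalty and simply recomputes from scratch for each setting; it is not the refinement algorithm of this paper. The function names, parameter names and example sequences are illustrative assumptions.

```python
def local_alignment_score(a, b, match=1.0, mismatch=1.0, gap=1.0):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    match is added per identical pair; mismatch and gap are subtracted.
    All three parameters are illustrative, not taken from the paper.
    """
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for i in range(1, len(a) + 1):
        curr = [0.0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else -mismatch
            curr[j] = max(
                0.0,                  # start a new local alignment here
                prev[j - 1] + sub,    # substitution or match
                prev[j] - gap,        # gap in b
                curr[j - 1] - gap,    # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def scores_by_gap_penalty(a, b, gap_penalties):
    """Rescore one sequence pair under each candidate gap penalty."""
    return {g: local_alignment_score(a, b, gap=g) for g in gap_penalties}


if __name__ == "__main__":
    # Hypothetical sequences differing by one inserted base.
    print(scores_by_gap_penalty("ACGTACGT", "ACGTTACGT", [1.0, 5.0]))
```

With a cheap gap the best local alignment spans the inserted base; with an expensive gap the score falls back to the best ungapped run. Matches that survive only under permissive penalties are the kind of candidates a stricter second pass would filter out.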
Keywords
- Decomposition Tree
- Optimal Alignment
- Locus Control Region
- Alignment Graph
- Computational Molecular Biology
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
The research was supported in part by the National Science Foundation under grant DIR-9106510.
The research was supported in part by the National Science Foundation under grant CCR-9308567 and the National Institutes of Health under grant R01 HG00987.
The research was supported in part by the National Institutes of Health under grant R01 LM05110.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, X., Pevzner, P.A., Miller, W. (1994). Parametric recomputing in alignment graphs. In: Crochemore, M., Gusfield, D. (eds) Combinatorial Pattern Matching. CPM 1994. Lecture Notes in Computer Science, vol 807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58094-8_8
DOI: https://doi.org/10.1007/3-540-58094-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58094-2
Online ISBN: 978-3-540-48450-9
eBook Packages: Springer Book Archive
