Parametric recomputing in alignment graphs

  • Xiaoqiu Huang
  • Pavel A. Pevzner
  • Webb Miller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 807)


DNA/protein sequence alignments in computational molecular biology depend heavily on the settings of penalties for substitutions, insertions/deletions and gaps. Inappropriate choice of parameters causes irrelevant matches (“noise”) to be reported, thus obscuring biologically relevant matches. In practice, biologists frequently compare sequences in a few iterations, starting from a vague idea about appropriate parameters, then refining parameters to reduce noise. This procedure often helps to delineate biologically interesting similarities and to substantially reduce laborious analysis. This paper provides a computational underpinning for such iterative noise filtration in alignment graphs. Our main results assume that a preliminary “noisy” alignment, computed with reasonable but ad hoc parameters, is given; the problem is to modify the parameters to reduce noise. We present fast algorithms to refine penalty parameters and describe an application of these algorithms.
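To illustrate how strongly an alignment depends on the penalty settings, here is a minimal sketch of a standard global alignment score with a linear indel penalty. It is not the algorithm of this paper; the function name `align_score` and the parameters `match`, `mismatch` and `indel` are assumptions chosen for illustration.

```python
def align_score(a, b, match=1.0, mismatch=1.0, indel=1.0):
    """Optimal global alignment score of strings a and b.

    Each matching column earns `match`; each mismatching column costs
    `mismatch`; each inserted or deleted character costs `indel`.
    (Hypothetical helper for illustration, using a textbook dynamic program.)
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [-indel * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [-indel * i]  # prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else -mismatch
            cur.append(max(prev[j - 1] + sub,   # substitution or match
                           prev[j] - indel,     # delete a[i-1]
                           cur[j - 1] - indel)) # insert b[j-1]
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Same pair of sequences, three different indel penalties.
    for g in (0.25, 1.0, 2.0):
        print(g, align_score("ACGT", "AGCT", indel=g))
```

For `ACGT` against `AGCT`, a cheap indel penalty favors an alignment with two indels and three matches, while an indel penalty of 2 makes the gap-free alignment with two mismatches optimal. The reported alignment therefore changes with the parameters, which is the sensitivity that iterative parameter refinement has to deal with.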


Keywords: Decomposition Tree; Optimal Alignment; Locus Control Region; Alignment Graph; Computational Molecular Biology





Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Xiaoqiu Huang (1)
  • Pavel A. Pevzner (2)
  • Webb Miller (2)
  1. Department of Computer Science, Michigan Technological University, Houghton
  2. Computer Science Department, The Pennsylvania State University, University Park
