Abstract
A comparison of two sequences may uncover multiple regions of local similarity. While the significance of each local alignment may be evaluated independently, sometimes a combined assessment is appropriate. This paper discusses a variety of statistical and algorithmic issues that such an assessment presents.
Copyright information
© 1997 Springer Science+Business Media New York
About this chapter
Cite this chapter
Altschul, S.F. (1997). Evaluating the Statistical Significance of Multiple Distinct Local Alignments. In: Suhai, S. (ed.) Theoretical and Computational Methods in Genome Research. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5903-0_1
DOI: https://doi.org/10.1007/978-1-4615-5903-0_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7708-5
Online ISBN: 978-1-4615-5903-0
eBook Packages: Springer Book Archive