Alignment Statistics for Long-Range Correlated Genomic Sequences
It is well known that the base composition along eukaryotic genomes is long-range correlated. Here, we investigate the effect of such long-range correlations on alignment score statistics. We model the correlated score-landscape by means of a Gaussian approximation. In this framework, we can calculate the corrections to the scale parameter λ of the extreme value distribution of alignment scores. To evaluate our approximate analytic results, we perform a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find that the mean and the exponential tail of the score distribution are in fact influenced by the correlations along the sequences. Therefore, the significance of measured alignment scores in biological sequences will change upon incorporation of the correlations in the null model.
Keywords: Null Model, Gaussian Approximation, Score Distribution, Alignment Score, Global Alignment
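For orientation, the scale parameter λ mentioned in the abstract has a standard baseline value when the sequences are uncorrelated. For gapless local alignment of independent letters, λ is the unique positive solution of Σ p_a p_b exp(λ s(a,b)) = 1. The sketch below solves that equation by bisection. It is not the authors' method: the function name, the uniform base frequencies, and the match/mismatch scores of +1/-1 are illustrative assumptions, and the corrections due to long-range correlations are not modelled here.

```python
import math

def uncorrelated_lambda(freqs, score, tol=1e-12):
    """Positive root of sum_{a,b} p_a p_b exp(lam * s(a,b)) = 1.

    Assumes gapless local alignment of independent (uncorrelated) letters;
    freqs maps each letter to its probability, score(a, b) is the pair score.
    """
    letters = list(freqs)
    pairs = [(freqs[a] * freqs[b], score(a, b)) for a in letters for b in letters]

    # A positive root exists only if the expected score is negative
    # and at least one pair has a positive score.
    if sum(p * s for p, s in pairs) >= 0:
        raise ValueError("expected pair score must be negative")
    if not any(s > 0 and p > 0 for p, s in pairs):
        raise ValueError("at least one pair score must be positive")

    def f(lam):
        return sum(p * math.exp(lam * s) for p, s in pairs) - 1.0

    # f is convex with f(0) = 0 and f'(0) < 0, so f < 0 on (0, root).
    hi = 1.0
    while f(hi) <= 0:
        hi *= 2.0          # expand until the root is bracketed
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid       # root lies above mid
        else:
            hi = mid       # root lies at or below mid
    return 0.5 * (lo + hi)

# Illustrative (assumed) null model: uniform bases, match +1, mismatch -1.
UNIFORM = {base: 0.25 for base in "ACGT"}

def match_mismatch(a, b):
    return 1 if a == b else -1

if __name__ == "__main__":
    print(uncorrelated_lambda(UNIFORM, match_mismatch))  # ln 3, about 1.0986
```

With these assumed scores the equation reduces to e^λ/4 + 3e^{-λ}/4 = 1, whose positive solution is λ = ln 3, which gives a quick check of the solver.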