Alignment Statistics for Long-Range Correlated Genomic Sequences

  • Philipp W. Messer
  • Ralf Bundschuh
  • Martin Vingron
  • Peter F. Arndt
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


It is well known that the base composition along eukaryotic genomes is long-range correlated. Here, we investigate the effect of such long-range correlations on alignment score statistics. We model the correlated score-landscape by means of a Gaussian approximation. In this framework, we can calculate the corrections to the scale parameter λ of the extreme value distribution of alignment scores. To evaluate our approximate analytic results, we perform a detailed numerical study based on a simple algorithm to efficiently generate long-range correlated random sequences. We find that the mean and the exponential tail of the score distribution are in fact influenced by the correlations along the sequences. Therefore, the significance of measured alignment scores in biological sequences will change upon incorporation of the correlations in the null model.
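
For orientation, the following is a minimal sketch of the uncorrelated baseline that the corrections to λ are measured against. It is not the authors' calculation. It assumes gapless local alignment, a uniform four-letter alphabet, and a hypothetical match/mismatch scoring of +1/-1. Under those assumptions, standard Karlin-Altschul theory gives λ as the positive root of p·exp(λ·match) + (1-p)·exp(λ·mismatch) = 1, where p is the match probability. The function names and parameters are illustrative only.

```python
import math
import random


def karlin_altschul_lambda(match=1.0, mismatch=-1.0, p_match=0.25, tol=1e-12):
    """Positive root of p*exp(lam*match) + (1-p)*exp(lam*mismatch) = 1.

    Assumes i.i.d. (uncorrelated) letters and a negative expected score
    per aligned pair, so that a unique positive root exists.
    """
    def f(lam):
        return (p_match * math.exp(lam * match)
                + (1.0 - p_match) * math.exp(lam * mismatch) - 1.0)

    lo, hi = 1e-6, 1.0
    while f(hi) < 0.0:          # expand the bracket until f changes sign
        hi *= 2.0
    while hi - lo > tol:        # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gapless_local_score(a, b, match=1, mismatch=-1):
    """Best-scoring gapless local alignment of a and b.

    Scans every diagonal and keeps a running sum that resets at zero.
    """
    best = 0
    n, m = len(a), len(b)
    for d in range(-(n - 1), m):        # d = j - i labels the diagonal
        i = max(0, -d)
        j = i + d
        run = 0
        while i < n and j < m:
            run = max(0, run + (match if a[i] == b[j] else mismatch))
            best = max(best, run)
            i += 1
            j += 1
    return best


def sample_scores(length, trials, seed=0):
    """Scores of random, uncorrelated sequence pairs (the simple null model)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        a = [rng.choice("ACGT") for _ in range(length)]
        b = [rng.choice("ACGT") for _ in range(length)]
        scores.append(gapless_local_score(a, b))
    return scores
```

With these assumed parameters the root is λ = ln 3, so `karlin_altschul_lambda()` should agree with `math.log(3)` to within the tolerance. A histogram of `sample_scores(...)` can then be compared against an exponential tail with that decay rate. Replacing the i.i.d. generator with a correlated one is the step at which the effect described in the abstract would appear.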


Keywords: Null Model, Gaussian Approximation, Score Distribution, Alignment Score, Global Alignment
These keywords were added by machine and not by the authors.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Philipp W. Messer (1)
  • Ralf Bundschuh (2)
  • Martin Vingron (1)
  • Peter F. Arndt (1)

  1. Max Planck Institute for Molecular Genetics, Berlin, Germany
  2. Department of Physics, Ohio State University, Columbus, USA