Journal of Mathematical Biology, Volume 69, Issue 1, pp 147–182

Approximation of sojourn-times via maximal couplings: motif frequency distributions

Article

DOI: 10.1007/s00285-013-0690-6

Cite this article as:
Lladser, M.E. & Chestnut, S.R. J. Math. Biol. (2014) 69: 147. doi:10.1007/s00285-013-0690-6

Abstract

Sojourn-times provide a versatile framework to assess the statistical significance of motifs in genome-wide searches even under non-Markovian background models. However, the large state spaces encountered in genomic sequence analyses make the exact calculation of sojourn-time distributions computationally intractable in long sequences. Here, we use coupling and analytic combinatoric techniques to approximate these distributions in the general setting of Polish state spaces, which encompass discrete state spaces. Our approximations are accompanied by explicit, easy-to-compute error bounds in total variation distance. Broadly speaking, if \({\mathsf{T}}_n\) is the random number of times a Markov chain visits a certain subset \({\mathsf{T}}\) of states in its first \(n\) transitions, then we can usually approximate the distribution of \({\mathsf{T}}_n\) for \(n\) of order \((1-\alpha )^{-m}\), where \(m\) is the largest integer for which the exact distribution of \({\mathsf{T}}_m\) is accessible and \(0\le \alpha \le 1\) is an ergodicity coefficient associated with the probability transition kernel of the chain. This gives access to approximations of sojourn-times in the intermediate regime where \(n\) is perhaps too large for exact calculations, but too small to rely on Normal approximations or on the stationarity assumptions underlying Poisson and compound Poisson approximations. As proof of concept, we approximate the distribution of the number of matches with a motif in promoter regions of C. elegans. Mathematical properties of the proposed ergodicity coefficients and connections with additive functionals of homogeneous Markov chains as well as ergodicity of non-homogeneous Markov chains are also explored.
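To make the objects in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It computes the exact distribution of \({\mathsf{T}}_n\) for a small finite-state chain by tracking the joint law of the current state and the running visit count, which is the kind of exact calculation that becomes infeasible for large \(n\) and large state spaces. Several things here are assumptions made for illustration: the function names `sojourn_time_distribution` and `doeblin_coefficient` are hypothetical, visits are counted over \(X_1,\ldots,X_n\) (the initial state is excluded), and the classical Doeblin coefficient \(\sum_j \min_i p(i,j)\) is used as a stand-in for \(\alpha\); the ergodicity coefficients defined in the article may differ.

```python
def sojourn_time_distribution(P, init, target, n):
    """Exact law of the number of visits to `target` among X_1..X_n.

    P      -- transition matrix as a list of rows (finite state space)
    init   -- initial distribution of X_0
    target -- set of state indices playing the role of the subset T
    n      -- number of transitions
    Assumption: the initial state X_0 is not counted as a visit.
    """
    k = len(P)
    # joint[s][c] = P(current state is s, and c visits to target so far)
    joint = [[0.0] * (n + 1) for _ in range(k)]
    for s in range(k):
        joint[s][0] = init[s]
    for _ in range(n):
        new = [[0.0] * (n + 1) for _ in range(k)]
        for r in range(k):
            for c in range(n):  # the count is at most n - 1 before a step
                mass = joint[r][c]
                if mass == 0.0:
                    continue
                for s in range(k):
                    # Entering a target state increments the visit count.
                    shift = 1 if s in target else 0
                    new[s][c + shift] += mass * P[r][s]
        joint = new
    # Marginalize over the final state to get the law of the count.
    return [sum(joint[s][c] for s in range(k)) for c in range(n + 1)]


def doeblin_coefficient(P):
    """Classical Doeblin coefficient: sum over columns of the column minimum.

    Assumption: used only as an illustrative ergodicity coefficient.
    """
    k = len(P)
    return sum(min(P[r][s] for r in range(k)) for s in range(k))


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.2, 0.8]]
    dist = sojourn_time_distribution(P, [1.0, 0.0], {1}, 2)
    print([round(p, 4) for p in dist])
    print(round(doeblin_coefficient(P), 4))
```

For this two-state chain started in state 0 with target set {1}, a hand calculation gives probabilities 0.25, 0.35 and 0.40 for zero, one and two visits in two transitions, and a Doeblin coefficient of 0.7. The sketch costs on the order of \(n^2 k^2\) operations for \(k\) states, which illustrates why exact computation is limited to moderate \(m\).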

Keywords

Additive functionals of Markov chains; Embedding technique; Ergodicity coefficients; Motifs; Non-homogeneous Markov chains; Non-asymptotic approximations of distributions; Patterns; Sojourn-times; Wolfgang Doeblin

Mathematics Subject Classification (2000)

Primary: 60J22, 62E17, 62L20, 65C40; Secondary: 92D20, 60J05

Supplementary material

285_2013_690_MOESM1_ESM.xlsx (55 kb)
Supplementary material 1 (xlsx 55 KB)
285_2013_690_MOESM2_ESM.xlsx (55 kb)
Supplementary material 2 (xlsx 54 KB)
285_2013_690_MOESM3_ESM.xlsx (55 kb)
Supplementary material 3 (xlsx 54 KB)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Applied Mathematics, University of Colorado, Boulder, USA
  2. Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, USA
