RNA Pol II transcription model and interpretation of GRO-seq data

Journal of Mathematical Biology

Abstract

A mixture model and statistical method are proposed to interpret the distribution of reads from a nascent transcriptional assay, such as global run-on sequencing (GRO-seq) data. The model is annotation-agnostic and leverages current understanding of the behavior of RNA polymerase II. Briefly, it assumes that polymerase loads at key positions (transcription start sites) within the genome. Once loaded, polymerase either remains in the initiation form (with some probability) or transitions into an elongating form (with the remaining probability). The model can be fit genome-wide, allowing patterns of Pol II behavior to be assessed on each distinct transcript. Furthermore, it provides, for the first time, a principled approach to distinguishing the initiation signal from the elongation signal; in particular, it implies a data-driven method for calculating the pausing index, a commonly used metric of RNA polymerase II behavior. We demonstrate that this approach improves on existing analyses of GRO-seq data and yields new biological insight into the impact of knocking down the Male Specific Lethal (MSL) complex in Drosophila melanogaster.
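
The loading picture above can be illustrated with a toy simulation. The Python sketch below is hypothetical throughout and is not the paper's fitted model: it assumes, purely for illustration, that reads from polymerase remaining in the initiation form follow a DoubleGeometric(tss, u, d) law around a single transcription start site (see the appendix), and that reads from elongating polymerase are uniform over a gene body of fixed length on one strand. It also computes one common window-based variant of the pausing index (read density within a window around the TSS divided by read density in the rest of the gene body) with a hypothetical 300 bp window; window definitions differ across studies. All function and parameter names are illustrative.

```python
import math
import random


def sample_geometric(p: float, rng: random.Random) -> int:
    """Geometric on {0, 1, 2, ...} with P(k) = p * (1 - p)**k, via inverse transform."""
    if p >= 1.0:
        return 0
    return math.floor(math.log1p(-rng.random()) / math.log1p(-p))


def simulate_reads(n, tss, pi_init, u, d, gene_length, seed=0):
    """Toy two-component mixture (hypothetical parametrization, for illustration only)."""
    rng = random.Random(seed)
    positions, labels = [], []
    for _ in range(n):
        if rng.random() < pi_init:
            # Initiation form: assumed DoubleGeometric(tss, u, d) around the TSS.
            positions.append(tss - sample_geometric(u, rng) + sample_geometric(d, rng))
            labels.append("init")
        else:
            # Elongating form: assumed uniform over [tss, tss + gene_length).
            positions.append(tss + rng.randrange(gene_length))
            labels.append("elong")
    return positions, labels


def window_pausing_index(positions, tss, gene_length, window=300):
    """Window-based pausing index: density in [tss-window, tss+window] / density in the body."""
    prox = sum(1 for x in positions if tss - window <= x <= tss + window)
    body = sum(1 for x in positions if tss + window < x < tss + gene_length)
    prox_density = prox / (2 * window + 1)
    body_density = body / (gene_length - window - 1)
    return prox_density / body_density


positions, labels = simulate_reads(20000, tss=1000, pi_init=0.3, u=0.05, d=0.05,
                                   gene_length=10000)
print("window-based pausing index:", round(window_pausing_index(positions, 1000, 10000), 2))
print("true initiation fraction in this simulation:", labels.count("init") / len(labels))
```

Because every simulated read carries its component label, the initiation fraction is known exactly here, whereas the window-based index counts elongation reads that happen to fall inside the promoter window as well. This is the kind of mixing that a model separating the initiation and elongation signals is designed to avoid.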

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Acknowledgments

We would like to thank Josephina Hendrix for assistance with analysis of publicly available datasets. This work was funded in part by an NSF IGERT grant (number 1144807) (MEL, JGA, RDD), a Sie Postdoctoral Fellowship (MAA), the Boettcher Foundation’s Webb-Waring Biomedical Research program (RDD) and an NSF ABI grant DBI-12624L0 (RDD). The authors acknowledge the BioFrontiers Computing Core at the University of Colorado Boulder for providing High Performance Computing resources (NIH 1S10OD012300) supported by BioFrontiers’ IT.

Author information

Corresponding author

Correspondence to Manuel E. Lladser.

Ethics declarations

Conflict of interest

No competing financial interests exist.

Appendix: Double geometric distribution

A random variable X is said to have a (possibly asymmetric) Double Geometric distribution with parameters \((u,d)\) when it has the same distribution as \((-U)+D\), where U and D are independent Geometric random variables on \(\{0,1,2,\ldots\}\) with success probabilities u and d, i.e., with means \((1/u-1)\) and \((1/d-1)\), respectively. In particular, the probability mass function of X is

$$\begin{aligned} p_{u,d}(k)=\frac{ud}{u+d-ud}\cdot \left\{ \begin{array}{lcl} (1-u)^{-k} &{} \quad \hbox {for} &{} k\le 0;\\ (1-d)^k &{} \quad \hbox {for} &{} k\ge 0. \end{array}\right. \end{aligned}$$
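For completeness, here is a short derivation of this mass function. It uses \(\Pr(U=j)=u(1-u)^j\) and \(\Pr(D=j)=d(1-d)^j\) for \(j\ge 0\), which is the parametrization implied by the stated means. For \(k\ge 0\), conditioning on U gives

$$\begin{aligned} \Pr(X=k) &= \sum_{j\ge 0}\Pr(U=j)\,\Pr(D=k+j) = \sum_{j\ge 0} u(1-u)^j\, d(1-d)^{k+j}\\ &= ud\,(1-d)^k \sum_{j\ge 0}\big[(1-u)(1-d)\big]^j = \frac{ud\,(1-d)^k}{1-(1-u)(1-d)} = \frac{ud}{u+d-ud}\,(1-d)^k. \end{aligned}$$

The case \(k\le 0\) is symmetric: conditioning on D gives \(\sum_{j\ge 0} d(1-d)^j\,u(1-u)^{j-k} = \frac{ud}{u+d-ud}\,(1-u)^{-k}\). At \(k=0\) both expressions equal \(ud/(u+d-ud)\), so the two cases agree.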

In this case, we write \(X\sim DoubleGeometric(u,d)\). More generally, given an integer i, we write \(X\sim DoubleGeometric(i,u,d)\) to mean that \((X-i)\sim DoubleGeometric(u,d)\).
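The definition can be checked numerically with a minimal Python sketch that uses only the standard library. It samples \(X\sim DoubleGeometric(i,u,d)\) as \(i-U+D\), drawing each Geometric variable by inverse-transform sampling, and compares empirical frequencies with \(p_{u,d}(k-i)\). The function names and the values \(u=0.3\), \(d=0.1\), \(i=5\) are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_geometric(p: float, rng: random.Random) -> int:
    """Geometric on {0, 1, 2, ...} with P(G = k) = p * (1 - p)**k (mean 1/p - 1).

    Inverse transform: P(G >= k) = (1 - p)**k, so G = floor(log(1 - V) / log(1 - p))
    for V uniform on [0, 1).
    """
    if p >= 1.0:
        return 0
    return math.floor(math.log1p(-rng.random()) / math.log1p(-p))


def double_geometric_pmf(k: int, u: float, d: float, i: int = 0) -> float:
    """p_{u,d}(k - i), the mass function of DoubleGeometric(i, u, d)."""
    c = u * d / (u + d - u * d)
    m = k - i
    return c * (1 - u) ** (-m) if m <= 0 else c * (1 - d) ** m


def sample_double_geometric(u: float, d: float, i: int, rng: random.Random) -> int:
    """Draw X = i - U + D with U ~ Geometric(u) and D ~ Geometric(d) independent."""
    return i - sample_geometric(u, rng) + sample_geometric(d, rng)


u, d, i = 0.3, 0.1, 5
rng = random.Random(0)
n = 200_000
draws = [sample_double_geometric(u, d, i, rng) for _ in range(n)]

# Total mass over a wide window; the tails beyond +/-500 are negligible here.
total = sum(double_geometric_pmf(k, u, d, i) for k in range(i - 500, i + 501))
print(f"pmf mass over [i-500, i+500]: {total:.12f}")
for k in range(i - 3, i + 4):
    print(k, round(draws.count(k) / n, 4), round(double_geometric_pmf(k, u, d, i), 4))
```

The sample mean should also be close to \(i + 1/d - 1/u\), since \(E[X] = i + E[D] - E[U]\).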

If \(X\sim DoubleGeometric(i,u,d)\), then

  1. conditionally on \(X\le i\), \((i-X)\sim Geometric(u)\); and

  2. conditionally on \(X\ge i\), \((X-i)\sim Geometric(d)\).

These two properties justify the Double Geometric terminology.
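Both properties can be verified exactly with rational arithmetic. Writing \(c=ud/(u+d-ud)\) and summing the geometric series in the mass function gives \(\Pr(X\le i)=c/u\) and \(\Pr(X\ge i)=c/d\). The sketch below uses Python's `fractions.Fraction` with illustrative values \(u=3/10\), \(d=1/10\), \(i=5\) (assumptions for the example) and checks the first 50 terms of each conditional law; the helper names are hypothetical.

```python
from fractions import Fraction


def dg_pmf(k, u, d, i=0):
    """Mass function of DoubleGeometric(i, u, d), evaluated exactly."""
    c = u * d / (u + d - u * d)
    m = k - i
    return c * (1 - u) ** (-m) if m <= 0 else c * (1 - d) ** m


def geometric_pmf(j, p):
    """Geometric on {0, 1, 2, ...}: P(j) = p * (1 - p)**j."""
    return p * (1 - p) ** j


u, d, i = Fraction(3, 10), Fraction(1, 10), 5
c = u * d / (u + d - u * d)
p_le = c / u  # P(X <= i) = c * sum_{j>=0} (1 - u)**j
p_ge = c / d  # P(X >= i) = c * sum_{j>=0} (1 - d)**j

# Inclusion-exclusion: P(X <= i) + P(X >= i) - P(X = i) must equal 1.
assert p_le + p_ge - dg_pmf(i, u, d, i) == 1

for j in range(50):
    # Property (1): P(i - X = j | X <= i) = u (1 - u)**j.
    assert dg_pmf(i - j, u, d, i) / p_le == geometric_pmf(j, u)
    # Property (2): P(X - i = j | X >= i) = d (1 - d)**j.
    assert dg_pmf(i + j, u, d, i) / p_ge == geometric_pmf(j, d)

print("conditional laws match Geometric(u) and Geometric(d) for j = 0..49")
```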

About this article

Cite this article

Lladser, M.E., Azofeifa, J.G., Allen, M.A. et al. RNA Pol II transcription model and interpretation of GRO-seq data. J. Math. Biol. 74, 77–97 (2017). https://doi.org/10.1007/s00285-016-1014-4

  • DOI: https://doi.org/10.1007/s00285-016-1014-4
