Statistical Inference on Three-Dimensional Structure of Genome by Truncated Poisson Architecture Model

  • Conference paper
Ordered Data Analysis, Modeling and Health Research Methods

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 149)

Abstract

In recent years, next-generation sequencing technology, coupled with an assay that is capable of detecting genome-wide chromatin interactions, has produced a massive amount of data and led to a greater understanding of long-range, or spatial, gene regulation mechanisms. Hence, the traditional one-dimensional linear view of a genome, which is especially prevalent in statistical and mathematical modeling, is inadequate in many genomic studies. Instead, studying genomic functions requires estimating the three-dimensional (3D) structure of a genome. The availability of genome-wide interaction data necessitates the development of analytical methods to recover the underlying 3D spatial chromatin structure, but challenges abound. One particular issue is the excess of zeros, especially with higher-resolution or inter-chromosomal data. This raises questions about the appropriateness of using the Poisson distribution to model such data. In this article, we introduce a truncated Poisson Architecture Model (tPAM) to directly model sequencing counts with many zeros. We carried out an extensive simulation study to evaluate tPAM and to compare its performance with an existing method that uses the Poisson distribution to model the counts. We applied tPAM to reconstruct the underlying 3D structures of two data sets, one human and one mouse, to demonstrate its utility. The analysis of the human data set considered chromosomes 14 and 22 jointly, thereby illustrating tPAM’s capability of analyzing inter-chromosomal data. The mouse analysis, in contrast, focused on a region of chromosome 2 to evaluate tPAM’s performance in recovering structure with loci in different topologically associated domains.

Acknowledgments

This work was supported in part by the National Science Foundation grants DMS-1042946 and DMS-1220772, the National Institutes of Health Grant 1R01GM114142-01, and by the Bisa Research Grant of Keimyung University in 2014. This material was also based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Author information

Corresponding author

Correspondence to Shili Lin.

Appendices

1.1 A. Isometric Transformation

To make \(\varOmega \) uniquely estimable, instead of incorporating the restrictions on \(\varOmega \) into the prior, we employed a group of isometric (distance-preserving) mappings. Suppose we sample \(\varOmega ^t\) at iteration t. For simplicity, we let \(\varOmega \) denote the transformed architecture throughout the rest of this appendix.

  1. Step 1.

    \(\mathbf {p}_1 \rightarrow (0,0,0)\).

    To place \(\mathbf {p}_1^t\) at the origin (0, 0, 0), we apply a translation operation \(\mathscr {R}_\tau \) such that

    $$\begin{aligned} \mathscr {R}_{\tau }: {} \mathbf {p}_i^t \rightarrow \mathbf {p}_i^t - \mathbf {p}_1^t. \end{aligned}$$
    (18)

    Let \(\varOmega =\{\mathbf {p}_1,\ldots ,\mathbf {p}_n\}\) be the translated architecture.

  2. Step 2.

    \(\mathbf {p}_n \rightarrow (p^x_n,0,0)\) with \(p^x_n>0\).

    1. a.

      \(\mathbf {p}_n \rightarrow (p^x_n,0,p^z_n)\).

      To place \(\mathbf {p} _{n}\) on the xz-plane, we apply a rotation operation \(\mathscr {R}_{\mathring{z}}\) with associated matrix \(R_{\mathring{z}}\), a clockwise-rotation matrix about the z-axis:

      $$\begin{aligned} R_{\mathring{z}}=\left[ \begin{array}{ccc} \cos \phi _1 &{} \sin \phi _1 &{} 0 \\ -\sin \phi _1 &{} \cos \phi _1 &{} 0 \\ 0 &{} 0 &{} 1 \end{array} \right] , \end{aligned}$$

      where,

      $$\begin{aligned} \cos \phi _1 = p^{x}_n /\sqrt{(p_n^{x})^2+(p_n^{y})^2},\\ \sin \phi _1 = p^{y}_n /\sqrt{(p_n^{x})^2+(p_n^{y})^2}. \end{aligned}$$

      Let \(\varOmega =\{\mathbf {p}_1,\ldots ,\mathbf {p}_n\}\) be the rotated architecture.

    2. b.

      \(\mathbf {p}_n \rightarrow (p^x_n,0,0)\).

      To place \(\mathbf {p} _{n}\) on the x-axis, we apply a rotation operation \(\mathscr {R}_{\mathring{y}}\) with associated matrix \(R_{\mathring{y}}\), a clockwise-rotation matrix around the y-axis:

      $$\begin{aligned} R_{\mathring{y}}=\left[ \begin{array}{ccc} \cos \phi _2 &{} 0 &{} \sin \phi _2 \\ 0 &{} 1 &{} 0 \\ -\sin \phi _2 &{} 0 &{} \cos \phi _2 \end{array} \right] , \end{aligned}$$

      where

      $$\begin{aligned} \cos \phi _2 = p^{x}_n /\sqrt{(p_n^{x})^2+(p_n^{z})^2},\\ \sin \phi _2 = p^{z}_n /\sqrt{(p_n^{x})^2+(p_n^{z})^2}. \end{aligned}$$

      Let \(\varOmega =\{\mathbf {p}_1,\ldots ,\mathbf {p}_n\}\) be the rotated architecture.

  3. Step 3.

    \(\mathbf {p}_2 \rightarrow (p^x_2,0,p^z_2)\) with \(p^z_2>0\).

    To place \(\mathbf {p}_{2}\) on the xz-plane, we apply a counter-clockwise rotation about the x-axis \(\mathscr {R}_{\mathring{x}}\) with associated matrix \(R_{\mathring{x}}\):

    $$\begin{aligned} R_{\mathring{x}}=\left[ \begin{array}{ccc} 1 &{} 0 &{} 0 \\ 0 &{} \cos \phi _3 &{} -\sin \phi _3 \\ 0 &{} \sin \phi _3 &{} \cos \phi _3 \end{array} \right] , \end{aligned}$$

    where

    $$\begin{aligned} \cos \phi _3 = p^{z}_2 /\sqrt{(p_2^{y})^2+(p_2^{z})^2},\\ \sin \phi _3 = p^{y}_2 /\sqrt{(p_2^{y})^2+(p_2^{z})^2}. \end{aligned}$$

    Let \(\varOmega =\{\mathbf {p}_1,\ldots ,\mathbf {p}_n\}\) be the rotated architecture.

  4. Step 4.

    \(\mathbf {p}_3 \rightarrow (p^x_3,p^y_3,p^z_3)\) such that \(p^y_3>0\).

    To satisfy \(p _{3}^{y}>0\), if \(p_3^{y} < 0\), reflect every \(\mathbf {p}_i\) across the xz-plane:

    $$\begin{aligned} \mathscr {R}_{rfl}: {} p_i^y \rightarrow -p_i^y. \end{aligned}$$
    (19)

Let the transformation \(\mathscr {I}\) be the composite of the five isometric transformations \(\mathscr {R}_{\tau }\), \(\mathscr {R}_{\mathring{z}}\), \(\mathscr {R}_{\mathring{y}}\), \(\mathscr {R}_{\mathring{x}}\), and \(\mathscr {R}_{rfl}\), applied in the following order: \(\mathscr {I}\equiv \mathscr {R}_{rfl} \mathscr {R}_{\mathring{x}} \mathscr {R}_{\mathring{y}}\mathscr {R}_{\mathring{z}}\mathscr {R}_{\tau }\). Then \(\mathscr {I}\) is an isometric (distance-preserving) transformation, and the transformed coordinates satisfy the following estimability conditions on \(\mathbf {p}\): \(\mathbf {p}_{1}=(0,0,0)\), \(\mathbf {p}_{2}=(p_{2}^{x},0,p _{2}^{z})\) with \(p_2^{z}>0\), \(\mathbf {p}_{3}=(p_{3}^{x},p_{3}^{y},p_{3}^{z})\) with \(p_{3}^{y}>0\), and \(\mathbf {p}_{n}=(p_n^x,0,0)\) with \(p_{n}^{x}>0\).
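The composite transformation described in Steps 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `canonicalize` and `_rotate` are hypothetical, an architecture is assumed to be stored as a list of [x, y, z] lists, and degenerate configurations (for example, a locus that already lies on the rotation axis) are simply left unrotated.

```python
import math

def _rotate(points, a, b, c, s):
    """Apply the planar rotation [[c, s], [-s, c]] to coordinates (a, b) of every point."""
    out = []
    for p in points:
        q = list(p)
        q[a] = c * p[a] + s * p[b]
        q[b] = -s * p[a] + c * p[b]
        out.append(q)
    return out

def canonicalize(points):
    """Map a sampled architecture to the frame fixed by the estimability conditions."""
    # Step 1: translate so that p_1 sits at the origin.
    first = points[0]
    pts = [[u - v for u, v in zip(p, first)] for p in points]
    # Step 2a: rotate about the z-axis so that p_n lies in the xz-plane.
    x, y, _ = pts[-1]
    r = math.hypot(x, y)
    if r > 0:
        pts = _rotate(pts, 0, 1, x / r, y / r)
    # Step 2b: rotate about the y-axis so that p_n lies on the positive x-axis.
    x, _, z = pts[-1]
    r = math.hypot(x, z)
    if r > 0:
        pts = _rotate(pts, 0, 2, x / r, z / r)
    # Step 3: rotate about the x-axis so that p_2 lies in the xz-plane with z > 0.
    _, y, z = pts[1]
    r = math.hypot(y, z)
    if r > 0:
        pts = _rotate(pts, 2, 1, z / r, y / r)
    # Step 4: reflect across the xz-plane if p_3 has a negative y-coordinate.
    if pts[2][1] < 0:
        pts = [[p[0], -p[1], p[2]] for p in pts]
    return pts
```

Because every step is a translation, a rotation, or a reflection, all pairwise distances between loci are unchanged. Applying `canonicalize` to each sampled architecture therefore leaves the distances that enter the likelihood untouched while removing the arbitrary position and orientation of the structure.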

1.2 B. Leapfrog Method for Hamiltonian MCMC

In the second stage of Hamiltonian MCMC, we simultaneously update \((\mathbf {p}_i, \mathbf {v}_i)\) to obtain a proposal vector \((\mathbf {p}_i^*, \mathbf {v}_i^*)\) using a leapfrog method which involves a leap scale \(\varepsilon \) and a repetition number L:

  1. (1)

    For each coordinate (x, y, and z), update \(v_i^{x}, v_i^{y}, v_i^{z}\) as

    $$\begin{aligned} v_i^{(.)} \leftarrow v_i^{(.)} + \frac{1}{2}\varepsilon \frac{d \log p(p_i^{(.)} |\mathbf {y},\vartheta _{-\mathbf {p}_i})}{d p_i^{(.)}}. \end{aligned}$$
    (20)
  2. (2)

    Repeat the following updates \(L-1\) times:

    $$\begin{aligned} v_i^{(.)} \leftarrow v_i^{(.)} + \frac{1}{2}\varepsilon \frac{d \log p(p_i^{(.)} |\mathbf {y},\vartheta _{-\mathbf {p}_i})}{d p_i^{(.)}},\;\;\;\;\; p_i^{(.)} \leftarrow p_i^{(.)}+\varepsilon v^{(.)}_i. \end{aligned}$$
    (21)
  3. (3)

    Update \(v_i^{x}, v_i^{y}, v_i^{z}\) as

    $$\begin{aligned} v_i^{(.)} \leftarrow v_i^{(.)} + \frac{1}{2}\varepsilon \frac{d \log p(p_i^{(.)} |\mathbf {y},\vartheta _{-\mathbf {p}_i})}{d p_i^{(.)}}. \end{aligned}$$
    (22)
  4. (4)

    The updated \(\mathbf {p}_i\) and \(\mathbf {v} _i\) constitute a proposal vector \((\mathbf {p}_i^*, \mathbf {v}_i^*)\).

    In the leapfrog method, the essential quantities to evaluate are

    $$\begin{aligned} \frac{d \log p(p_i^x |\mathbf {y},\vartheta _{-\mathbf {p}_i})}{d p_i^x} = \sum _{j \ne i }\left( y_{ij}-\lambda _{ij}\frac{e^{\lambda _{ij}}}{e^{\lambda _{ij}}-1}\right) \alpha _1\frac{p_i ^x-p_j^x}{\delta _{ij}^2},\end{aligned}$$
    (23)
    $$\begin{aligned} \frac{d \log p(p_i^y |\mathbf {y},\vartheta _{-\mathbf {p}_i})}{d p_i^y} = \sum _{j \ne i }\left( y_{ij}-\lambda _{ij}\frac{e^{\lambda _{ij}}}{e^{\lambda _{ij}}-1}\right) \alpha _1\frac{p_i ^y-p_j^y}{\delta _{ij}^2},\end{aligned}$$
    (24)
    $$\begin{aligned} \frac{d \log p(p_i^z |\mathbf {y},\vartheta _{-\mathbf {p}_i})}{d p_i^z} = \sum _{j \ne i }\left( y_{ij}-\lambda _{ij}\frac{e^{\lambda _{ij}}}{e^{\lambda _{ij}}-1}\right) \alpha _1\frac{p_i ^z-p_j^z}{\delta _{ij}^2}. \end{aligned}$$
    (25)
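The gradients in (23)–(25) and a leapfrog update built on them can be sketched as follows. Several things here are assumptions rather than statements from the text: the link \(\lambda _{ij}=\exp (\alpha _0+\alpha _1\log \delta _{ij})\) is inferred from the form of the gradient, any additional covariates are ignored, the prior on \(\mathbf {p}_i\) is taken to be flat so that only the zero-truncated Poisson log-likelihood contributes, and all function names are hypothetical. The integrator is written in the standard textbook leapfrog form (half-step for the velocity, alternating full steps, final half-step for the velocity), which may order the updates slightly differently from a literal reading of steps (1) to (3).

```python
import math

def rate(pi, pj, a0, a1):
    """Assumed link: lambda_ij = exp(a0 + a1 * log(delta_ij))."""
    return math.exp(a0 + a1 * math.log(math.dist(pi, pj)))

def log_post(i, P, Y, a0, a1):
    """Zero-truncated Poisson log-likelihood terms involving locus i (flat prior assumed)."""
    total = 0.0
    for j in range(len(P)):
        if j != i:
            lam = rate(P[i], P[j], a0, a1)
            # log pmf up to a constant: y*log(lam) - log(exp(lam) - 1)
            total += Y[i][j] * math.log(lam) - math.log(math.expm1(lam))
    return total

def grad_log_post(i, P, Y, a0, a1):
    """Gradient of log_post with respect to p_i, following Eqs. (23)-(25)."""
    g = [0.0, 0.0, 0.0]
    for j in range(len(P)):
        if j != i:
            d2 = math.dist(P[i], P[j]) ** 2
            lam = rate(P[i], P[j], a0, a1)
            # lam * e^lam / (e^lam - 1) rewritten as lam / (1 - e^(-lam))
            w = (Y[i][j] - lam / (1.0 - math.exp(-lam))) * a1 / d2
            for k in range(3):
                g[k] += w * (P[i][k] - P[j][k])
    return g

def leapfrog(i, P, v, eps, L, Y, a0, a1):
    """Standard leapfrog proposal for locus i with leap scale eps and L position updates."""
    P = [list(p) for p in P]  # work on a copy so the current state is untouched
    g = grad_log_post(i, P, Y, a0, a1)
    v = [vk + 0.5 * eps * gk for vk, gk in zip(v, g)]  # initial half-step
    for step in range(L):
        P[i] = [pk + eps * vk for pk, vk in zip(P[i], v)]  # full position step
        g = grad_log_post(i, P, Y, a0, a1)
        scale = eps if step < L - 1 else 0.5 * eps  # final velocity update is a half-step
        v = [vk + scale * gk for vk, gk in zip(v, g)]
    return P[i], v
```

A useful sanity check for any such implementation is to compare `grad_log_post` with a central finite difference of `log_post`, and to confirm that running `leapfrog` forward, negating the velocity, and running it again returns the starting point.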

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Park, J., Lin, S. (2015). Statistical Inference on Three-Dimensional Structure of Genome by Truncated Poisson Architecture Model. In: Choudhary, P., Nagaraja, C., Ng, H. (eds) Ordered Data Analysis, Modeling and Health Research Methods. Springer Proceedings in Mathematics & Statistics, vol 149. Springer, Cham. https://doi.org/10.1007/978-3-319-25433-3_15
