Abstract
In recent years, next-generation sequencing technology, coupled with an assay capable of detecting genome-wide chromatin interactions, has produced a massive amount of data and led to a greater understanding of long-range, or spatial, gene regulation mechanisms. Hence, the traditional one-dimensional linear view of a genome, which is especially prevalent in statistical and mathematical modeling, is inadequate in many genomic studies. Instead, studying genomic functions requires estimating the three-dimensional (3D) structure of a genome. The availability of genome-wide interaction data necessitates the development of analytical methods to recover the underlying 3D spatial chromatin structure, but challenges abound. One particular issue is the excess of zeros, especially in higher-resolution or inter-chromosomal data. This raises questions about the appropriateness of using the Poisson distribution to model such data. In this article, we introduce a truncated Poisson Architecture Model (tPAM) to directly model sequencing counts with many zeros. We carried out an extensive simulation study to evaluate tPAM and to compare its performance with an existing method that uses the Poisson distribution to model the counts. We applied tPAM to reconstruct the underlying 3D structures of two data sets, one human and one mouse, to demonstrate its utility. The analysis of the human data set considered chromosomes 14 and 22 jointly, thereby illustrating tPAM’s capability of analyzing inter-chromosomal data. The mouse analysis, on the other hand, focused on a region of chromosome 2 to evaluate tPAM’s performance in recovering structure with loci in different topologically associating domains.
Acknowledgments
This work was supported in part by the National Science Foundation grants DMS-1042946 and DMS-1220772, the National Institutes of Health grant 1R01GM114142-01, and by the Bisa Research Grant of Keimyung University in 2014. This material was also based upon work partially supported by the National Science Foundation under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Appendices
1.1 A. Isometric Transformation
To make \(\varOmega \) uniquely estimable, instead of incorporating restrictions on \(\varOmega \) into the prior, we apply a sequence of isometric (distance-preserving) mappings to each sampled architecture. Suppose we sample \(\varOmega ^t=\{\mathbf{p}_1^t,\ldots ,\mathbf{p}_n^t\}\) at iteration t. For simplicity, we let \(\varOmega \) denote the transformed architecture after each step throughout the rest of this appendix.
Step 1. \(\mathbf{p}_1 \rightarrow (0,0,0)\).
To place \(\mathbf{p}_1^t\) at the origin (0, 0, 0), we apply a translation operation \(\mathscr{R}_\tau \) such that
$$\begin{aligned} \mathscr{R}_{\tau }: \mathbf{p}_i^t \rightarrow \mathbf{p}_i^t - \mathbf{p}_1^t. \end{aligned} \tag{18}$$
Let \(\varOmega =\{\mathbf{p}_1,\ldots ,\mathbf{p}_n\}\) be the translated architecture.
Step 2. \(\mathbf{p}_n \rightarrow (p^x_n,0,0)\) with \(p^x_n>0\).
a. \(\mathbf{p}_n \rightarrow (p^x_n,0,p^z_n)\).
To place \(\mathbf{p}_n\) on the xz-plane, we apply a rotation operation \(\mathscr{R}_{\mathring{z}}\) with associated matrix \(R_{\mathring{z}}\), a clockwise rotation about the z-axis that sends \(\mathbf{p}_n\) to the xz-plane:
$$\begin{aligned} R_{\mathring{z}}=\left[ \begin{array}{ccc} \cos \phi _1 & \sin \phi _1 & 0 \\ -\sin \phi _1 & \cos \phi _1 & 0 \\ 0 & 0 & 1 \end{array} \right] , \end{aligned}$$
where
$$\begin{aligned} \cos \phi _1 = p^{x}_n /\sqrt{(p_n^{x})^2+(p_n^{y})^2},\\ \sin \phi _1 = p^{y}_n /\sqrt{(p_n^{x})^2+(p_n^{y})^2}. \end{aligned}$$
Let \(\varOmega =\{\mathbf{p}_1,\ldots ,\mathbf{p}_n\}\) be the rotated architecture.
b. \(\mathbf{p}_n \rightarrow (p^x_n,0,0)\).
To place \(\mathbf{p}_n\) on the x-axis, we apply a rotation operation \(\mathscr{R}_{\mathring{y}}\) with associated matrix \(R_{\mathring{y}}\), a clockwise rotation about the y-axis:
$$\begin{aligned} R_{\mathring{y}}=\left[ \begin{array}{ccc} \cos \phi _2 & 0 & \sin \phi _2 \\ 0 & 1 & 0 \\ -\sin \phi _2 & 0 & \cos \phi _2 \end{array} \right] , \end{aligned}$$
where
$$\begin{aligned} \cos \phi _2 = p^{x}_n /\sqrt{(p_n^{x})^2+(p_n^{z})^2},\\ \sin \phi _2 = p^{z}_n /\sqrt{(p_n^{x})^2+(p_n^{z})^2}. \end{aligned}$$
Let \(\varOmega =\{\mathbf{p}_1,\ldots ,\mathbf{p}_n\}\) be the rotated architecture.
Step 3. \(\mathbf{p}_2 \rightarrow (p^x_2,0,p^z_2)\) with \(p^z_2>0\).
To place \(\mathbf{p}_{2}\) on the xz-plane, we apply a counter-clockwise rotation about the x-axis \(\mathscr{R}_{\mathring{x}}\) with associated matrix \(R_{\mathring{x}}\):
$$\begin{aligned} R_{\mathring{x}}=\left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & \cos \phi _3 & -\sin \phi _3 \\ 0 & \sin \phi _3 & \cos \phi _3 \end{array} \right] , \end{aligned}$$
where
$$\begin{aligned} \cos \phi _3 = p^{z}_2 /\sqrt{(p_2^{y})^2+(p_2^{z})^2},\\ \sin \phi _3 = p^{y}_2 /\sqrt{(p_2^{y})^2+(p_2^{z})^2}. \end{aligned}$$
Let \(\varOmega =\{\mathbf{p}_1,\ldots ,\mathbf{p}_n\}\) be the rotated architecture. Because this rotation is about the x-axis, \(\mathbf{p}_n\) stays on the x-axis.
Step 4. \(\mathbf{p}_3 \rightarrow (p^x_3,p^y_3,p^z_3)\) such that \(p^y_3>0\).
To satisfy \(p_{3}^{y}>0\), if \(p_3^{y} < 0\), reflect every \(\mathbf{p}_i\) across the xz-plane:
$$\begin{aligned} \mathscr{R}_{rfl}: p_i^y \rightarrow -p_i^y. \end{aligned} \tag{19}$$
Let transformation \(\mathscr{I}\) be the composite of the five isometric transformations \(\mathscr{R}_{\tau }\), \(\mathscr{R}_{\mathring{z}}\), \(\mathscr{R}_{\mathring{y}}\), \(\mathscr{R}_{\mathring{x}}\), and \(\mathscr{R}_{rfl}\), applied in that order: \(\mathscr{I}\equiv \mathscr{R}_{rfl} \mathscr{R}_{\mathring{x}} \mathscr{R}_{\mathring{y}}\mathscr{R}_{\mathring{z}}\mathscr{R}_{\tau }\). Then \(\mathscr{I}\) is an isometric (distance-preserving) transformation, and the transformed coordinates satisfy the following estimability conditions on \(\varOmega \): \(\mathbf{p}_{1}=(0,0,0)\), \(\mathbf{p}_{2}=(p_{2}^{x},0,p_{2}^{z})\) with \(p_2^{z}>0\), \(\mathbf{p}_{3}=(p_{3}^{x},p_{3}^{y},p_{3}^{z})\) with \(p_{3}^{y}>0\), and \(\mathbf{p}_{n}=(p_n^x,0,0)\) with \(p_{n}^{x}>0\).
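Steps 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `standardize` and `_cos_sin`, the list-of-tuples representation, and the toy coordinates in `omega` are all hypothetical, and degenerate configurations (for example, \(\mathbf{p}_n\) coinciding with \(\mathbf{p}_1\)) are handled only by falling back to the identity rotation.

```python
import math

def _cos_sin(a, b):
    """Return (a/r, b/r) with r = sqrt(a^2 + b^2); identity rotation if r == 0."""
    r = math.hypot(a, b)
    return (1.0, 0.0) if r == 0.0 else (a / r, b / r)

def standardize(points):
    """Apply Steps 1-4 to a list of (x, y, z) loci and return a new list."""
    # Step 1: translate so that p_1 is at the origin.
    x0, y0, z0 = points[0]
    P = [(x - x0, y - y0, z - z0) for x, y, z in points]

    # Step 2a: rotate about the z-axis so that p_n lands on the xz-plane.
    c, s = _cos_sin(P[-1][0], P[-1][1])
    P = [(c * x + s * y, -s * x + c * y, z) for x, y, z in P]

    # Step 2b: rotate about the y-axis so that p_n lands on the positive x-axis.
    c, s = _cos_sin(P[-1][0], P[-1][2])
    P = [(c * x + s * z, y, -s * x + c * z) for x, y, z in P]

    # Step 3: rotate about the x-axis so that p_2 lands on the xz-plane, z > 0.
    c, s = _cos_sin(P[1][2], P[1][1])
    P = [(x, c * y - s * z, s * y + c * z) for x, y, z in P]

    # Step 4: reflect across the xz-plane if p_3 has a negative y-coordinate.
    if P[2][1] < 0:
        P = [(x, -y, z) for x, y, z in P]
    return P

# Hypothetical toy architecture with five loci (illustration only).
omega = [(1.0, 2.0, 3.0), (0.5, -1.0, 2.0), (2.0, 0.0, -1.0),
         (-1.5, 1.0, 0.5), (3.0, -2.0, 1.0)]
canon = standardize(omega)
```

Since every step is a translation, rotation, or reflection, all pairwise distances in `canon` match those in `omega`, so the likelihood (which depends on the architecture only through distances) is unchanged by the mapping.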
1.2 B. Leapfrog Method for Hamiltonian MCMC
In the second stage of Hamiltonian MCMC, we simultaneously update \((\mathbf {p}_i, \mathbf {v}_i)\) to obtain a proposal vector \((\mathbf {p}_i^*, \mathbf {v}_i^*)\) using a leapfrog method which involves a leap scale \(\varepsilon \) and a repetition number L:
(1) For each of x, y, z, update \(v_i^{x}, v_i^{y}, v_i^{z}\) as
$$\begin{aligned} v_i^{(.)} \leftarrow v_i^{(.)} + \frac{1}{2}\varepsilon \frac{d \log p(p_i^{(.)} |\mathbf{y},\vartheta _{-\mathbf{p}_i})}{d p_i^{(.)}}. \end{aligned} \tag{20}$$
(2) Repeat the following updates \(L-1\) times:
$$\begin{aligned} v_i^{(.)} \leftarrow v_i^{(.)} + \frac{1}{2}\varepsilon \frac{d \log p(p_i^{(.)} |\mathbf{y},\vartheta _{-\mathbf{p}_i})}{d p_i^{(.)}},\;\;\;\;\; p_i^{(.)} \leftarrow p_i^{(.)}+\varepsilon v^{(.)}_i. \end{aligned} \tag{21}$$
(3) Update \(v_i^{x}, v_i^{y}, v_i^{z}\) as
$$\begin{aligned} v_i^{(.)} \leftarrow v_i^{(.)} + \frac{1}{2}\varepsilon \frac{d \log p(p_i^{(.)} |\mathbf{y},\vartheta _{-\mathbf{p}_i})}{d p_i^{(.)}}. \end{aligned} \tag{22}$$
(4) The updated \(\mathbf{p}_i\) and \(\mathbf{v}_i\) constitute a proposal vector \((\mathbf{p}_i^*, \mathbf{v}_i^*)\).
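For orientation, a generic leapfrog integrator can be sketched as follows. This is a sketch under stated assumptions rather than a transcription of the listing above: it uses the standard textbook ordering (an opening half step for the velocity, alternating full steps, and a closing half step), which may not match Eqs. (20)-(22) term by term; it assumes unit mass; and the names `leapfrog` and `grad_std_normal`, together with the standard-normal toy target, are hypothetical stand-ins for the model's actual conditional log-posterior gradient.

```python
def leapfrog(p, v, grad_log_post, eps, L):
    """Standard leapfrog: return a proposal (p*, v*) after L steps of size eps."""
    p, v = list(p), list(v)          # copy so the caller's state is untouched
    g = grad_log_post(p)
    # Opening half step for the velocity.
    v = [vi + 0.5 * eps * gi for vi, gi in zip(v, g)]
    for step in range(L):
        # Full step for the position using the current velocity.
        p = [pi + eps * vi for pi, vi in zip(p, v)]
        g = grad_log_post(p)
        # Full velocity step between position updates, half step at the very end.
        scale = eps if step < L - 1 else 0.5 * eps
        v = [vi + scale * gi for vi, gi in zip(v, g)]
    return p, v

def grad_std_normal(p):
    """Toy gradient of a standard-normal log density (assumption, illustration only)."""
    return [-x for x in p]

p0, v0 = [1.0, -0.5, 0.25], [0.3, 0.1, -0.2]
p_star, v_star = leapfrog(p0, v0, grad_std_normal, eps=0.05, L=20)
```

With this ordering the map is time-reversible: running it from \((\mathbf{p}^*, -\mathbf{v}^*)\) returns to \((\mathbf{p}, -\mathbf{v})\) up to round-off, which is the property the Metropolis acceptance step relies on.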
In the leapfrog method, the essential quantities to evaluate are
$$\begin{aligned} \frac{d \log p(p_i^x |\mathbf{y},\vartheta _{-\mathbf{p}_i})}{d p_i^x} = \sum _{j \ne i }\left( y_{ij}-\lambda _{ij}\frac{e^{\lambda _{ij}}}{e^{\lambda _{ij}}-1}\right) \alpha _1\frac{p_i^x-p_j^x}{\delta _{ij}^2}, \end{aligned} \tag{23}$$
$$\begin{aligned} \frac{d \log p(p_i^y |\mathbf{y},\vartheta _{-\mathbf{p}_i})}{d p_i^y} = \sum _{j \ne i }\left( y_{ij}-\lambda _{ij}\frac{e^{\lambda _{ij}}}{e^{\lambda _{ij}}-1}\right) \alpha _1\frac{p_i^y-p_j^y}{\delta _{ij}^2}, \end{aligned} \tag{24}$$
$$\begin{aligned} \frac{d \log p(p_i^z |\mathbf{y},\vartheta _{-\mathbf{p}_i})}{d p_i^z} = \sum _{j \ne i }\left( y_{ij}-\lambda _{ij}\frac{e^{\lambda _{ij}}}{e^{\lambda _{ij}}-1}\right) \alpha _1\frac{p_i^z-p_j^z}{\delta _{ij}^2}. \end{aligned} \tag{25}$$
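The gradients in Eqs. (23)-(25) can be checked numerically with a short sketch. Several things here are assumptions not stated in this appendix: the intensity is taken to be \(\log \lambda _{ij} = \alpha _0 + \alpha _1 \log \delta _{ij}\) with \(\delta _{ij}\) the Euclidean distance between loci i and j (which gives \(\partial \log \lambda _{ij}/\partial p_i^x = \alpha _1 (p_i^x-p_j^x)/\delta _{ij}^2\)), any additional covariates and the prior on \(\mathbf{p}_i\) are ignored, and the function names and toy data are hypothetical.

```python
import math

def log_lik(P, y, i, alpha0, alpha1):
    """Zero-truncated Poisson log-likelihood terms involving locus i (up to a constant)."""
    total = 0.0
    for j, pj in enumerate(P):
        if j == i:
            continue
        delta = math.dist(P[i], pj)
        lam = math.exp(alpha0 + alpha1 * math.log(delta))
        # log of lam^y / (e^lam - 1), dropping the -log(y!) constant.
        total += y[i][j] * math.log(lam) - math.log(math.expm1(lam))
    return total

def grad_log_lik(P, y, i, alpha0, alpha1):
    """Gradient with respect to (p_i^x, p_i^y, p_i^z), following Eqs. (23)-(25)."""
    g = [0.0, 0.0, 0.0]
    for j, pj in enumerate(P):
        if j == i:
            continue
        delta2 = sum((a - b) ** 2 for a, b in zip(P[i], pj))
        lam = math.exp(alpha0 + 0.5 * alpha1 * math.log(delta2))
        # y_ij - lam * e^lam / (e^lam - 1), with e^lam / (e^lam - 1) = 1 / (1 - e^-lam).
        w = y[i][j] - lam / (-math.expm1(-lam))
        for k in range(3):
            g[k] += w * alpha1 * (P[i][k] - pj[k]) / delta2
    return g

# Hypothetical toy data: four loci and a symmetric matrix of positive counts.
P = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2), (0.3, 1.2, -0.4), (2.0, 0.0, 0.0)]
y = [[0, 5, 3, 1], [5, 0, 4, 2], [3, 4, 0, 2], [1, 2, 2, 0]]
alpha0, alpha1 = 1.0, -1.0
g = grad_log_lik(P, y, 1, alpha0, alpha1)
```

The factor \(e^{\lambda _{ij}}/(e^{\lambda _{ij}}-1)\) is what distinguishes the truncated model from the ordinary Poisson case, where the weight would be simply \(y_{ij}-\lambda _{ij}\). A function such as `grad_log_lik` is what a leapfrog routine would call at each step.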
© 2015 Springer International Publishing Switzerland
Park, J., Lin, S. (2015). Statistical Inference on Three-Dimensional Structure of Genome by Truncated Poisson Architecture Model. In: Choudhary, P., Nagaraja, C., Ng, H. (eds) Ordered Data Analysis, Modeling and Health Research Methods. Springer Proceedings in Mathematics & Statistics, vol 149. Springer, Cham. https://doi.org/10.1007/978-3-319-25433-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25431-9
Online ISBN: 978-3-319-25433-3
eBook Packages: Mathematics and Statistics (R0)