GDNorm: An Improved Poisson Regression Model for Reducing Biases in Hi-C Data

  • Conference paper
Algorithms in Bioinformatics (WABI 2014)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8701)

Abstract

As a revolutionary tool, the Hi-C technology can be used to capture genomic segments that have close spatial proximity in three-dimensional space and enable the study of chromosome structures at an unprecedentedly high throughput and resolution. However, during the experimental steps of Hi-C, systematic biases from different sources are often introduced into the resultant data (i.e., reads or read counts). Several bias reduction methods have been proposed recently. Although both systematic biases and spatial distance are known to be key factors determining the number of observed chromatin interactions, the existing bias reduction methods in the literature do not include spatial distance explicitly in their computational models for estimating the interactions. In this work, we propose an improved Poisson regression model that takes spatial distance into consideration, together with an efficient gradient-descent-based algorithm, GDNorm, for reducing biases in Hi-C data. GDNorm has been tested on both simulated and real Hi-C data, and its performance has been compared with that of the state-of-the-art bias reduction methods. The experimental results show that our improved Poisson model is able to provide more accurate normalized contact frequencies (measured in read counts) between interacting genomic segments and thus a more accurate chromosome structure prediction when combined with a chromosome structure determination method such as ChromSDE. Moreover, assessed by recently published data from human lymphoblastoid and mouse embryonic stem cell lines, GDNorm achieves the highest reproducibility between the biological replicates of the cell lines. The normalized contact frequencies obtained by GDNorm are well correlated with the spatial distance measured by fluorescent in situ hybridization (FISH) experiments. In addition to accurate bias reduction, GDNorm has the highest time efficiency on the real data. GDNorm is implemented in C++ and available at http://www.cs.ucr.edu/~yyang027/gdnorm.htm
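
The abstract does not spell out the model, so the following is only a minimal sketch of the general technique it names: a log-linear Poisson regression fitted by plain gradient descent. It is not the GDNorm model or its C++ implementation. The function names, the two covariates (a bias-like term and a distance-like term), the coefficients, the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
import math
import random


def fit_poisson_gd(X, y, lr=0.05, n_iter=2000):
    """Fit log(mu_i) = beta . x_i by gradient descent.

    Minimizes the mean Poisson negative log-likelihood, whose gradient
    with respect to beta is mean((mu_i - y_i) * x_i).
    Generic sketch only; not the model used by GDNorm.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            mu = math.exp(sum(b * v for b, v in zip(beta, xi)))
            for j in range(p):
                grad[j] += (mu - yi) * xi[j] / n
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical coefficients: intercept, bias-like term, distance-like term.
    true_beta = [1.0, 0.5, -0.8]
    X, y = [], []
    for _ in range(200):
        row = [1.0, rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        mu = math.exp(sum(b * v for b, v in zip(true_beta, row)))
        X.append(row)
        y.append(sample_poisson(mu, rng))
    print("estimated coefficients:", fit_poisson_gd(X, y))
```

With simulated counts, the estimates should land near the coefficients used to generate the data, up to sampling noise. A fixed step size is used here for simplicity; a practical implementation would likely add a convergence check or step-size control.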

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, EW., Jiang, T. (2014). GDNorm: An Improved Poisson Regression Model for Reducing Biases in Hi-C Data. In: Brown, D., Morgenstern, B. (eds) Algorithms in Bioinformatics. WABI 2014. Lecture Notes in Computer Science, vol 8701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44753-6_20

  • DOI: https://doi.org/10.1007/978-3-662-44753-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44752-9

  • Online ISBN: 978-3-662-44753-6

  • eBook Packages: Computer Science (R0)
