Quantifying the Impact of Non-coding Variants on Transcription Factor-DNA Binding

  • Conference paper
  • First Online:
Research in Computational Molecular Biology (RECOMB 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10229)

Abstract

Many recent studies have emphasized the importance of genetic variants and mutations in cancer and other complex human diseases. The overwhelming majority of these variants occur in non-coding portions of the genome, where they can have a functional impact by disrupting regulatory interactions between transcription factors (TFs) and DNA. Here, we present a method for assessing the impact of non-coding mutations on TF-DNA interactions, based on regression models of DNA-binding specificity trained on high-throughput in vitro data. We use ordinary least squares (OLS) to estimate the parameters of the binding model for each TF, and we show that our predictions of TF binding changes due to DNA mutations correlate well with measured changes in gene expression. In addition, by leveraging distributional results associated with OLS estimation, for each predicted change in TF binding we also compute a normalized score (z-score) and a significance value (p-value) reflecting our confidence that the mutation affects TF binding. We use this approach to analyze a large set of pathogenic non-coding variants, and we show that these variants lead to significant differences in TF binding between alleles, compared to a control set of common variants. Our results therefore indicate a strong regulatory component among the pathogenic non-coding variants identified to date.
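The OLS-based scoring described in the abstract can be illustrated with a minimal sketch. It assumes a simple position-specific mononucleotide model (an intercept plus C/G/T indicators at each position of a fixed-length site) fitted to synthetic scores. The paper's actual features and its in vitro training data are not described here, so `encode`, `fit_ols` and `binding_change` are hypothetical names and this is not the authors' implementation. The key step is that, for a fixed contrast vector d = x_alt - x_ref, the OLS estimate dᵀβ̂ has variance σ² dᵀ(XᵀX)⁻¹d, which yields a z-score and a p-value for each predicted binding change.

```python
import math
import numpy as np

BASES = "ACGT"


def encode(seq):
    """Hypothetical position-specific 1-mer features: an intercept plus, at each
    position, indicators for C, G and T (A is the baseline level, which keeps the
    design matrix full rank)."""
    x = [1.0]
    for base in seq:
        x.extend(1.0 if base == b else 0.0 for b in "CGT")
    return np.array(x)


def fit_ols(seqs, y):
    """OLS fit; returns coefficients, residual variance and (X^T X)^-1."""
    X = np.vstack([encode(s) for s in seqs])
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = float(resid @ resid) / (n - p)  # unbiased noise-variance estimate
    xtx_inv = np.linalg.inv(X.T @ X)
    return beta, sigma2, xtx_inv


def binding_change(ref, alt, beta, sigma2, xtx_inv):
    """Predicted binding change (alt minus ref), its z-score and two-sided p-value.

    For a fixed contrast d = x_alt - x_ref, Var(d^T beta_hat) = sigma^2 d^T (X^T X)^-1 d.
    The p-value uses a normal approximation (an assumption; the exact small-sample
    reference distribution is Student's t with n - p degrees of freedom).
    """
    if len(ref) != len(alt) or ref == alt:
        raise ValueError("ref and alt must be distinct sequences of equal length")
    d = encode(alt) - encode(ref)
    delta = float(d @ beta)
    se = math.sqrt(sigma2 * float(d @ xtx_inv @ d))
    z = delta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return delta, z, p_value


# Synthetic demo (made-up data, not from the paper): 8-bp sites whose only true
# effect is +2.0 for a G at position 3 (0-based), plus Gaussian noise.
rng = np.random.default_rng(0)
L = 8
seqs = ["".join(rng.choice(list(BASES), size=L)) for _ in range(2000)]
true_beta = np.zeros(1 + 3 * L)
true_beta[1 + 3 * 3 + 1] = 2.0
y = [encode(s) @ true_beta + rng.normal(0.0, 0.1) for s in seqs]

beta, sigma2, xtx_inv = fit_ols(seqs, y)
delta, z, p = binding_change("AAAAAAAA", "AAAGAAAA", beta, sigma2, xtx_inv)
null_delta, null_z, null_p = binding_change("AAAAAAAA", "CAAAAAAA", beta, sigma2, xtx_inv)
```

In this 1-mer model the contrast d is nonzero only at the mutated position, so the predicted change depends only on the coefficients at that position. With richer k-mer features, d would span every feature overlapping the variant, but the variance formula and the resulting z-score are computed the same way.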

J. Zhao and D. Li contributed equally to this work.

Acknowledgements

This research was supported in part by award numbers P01CA142538 from the National Cancer Institute and R01GM117106 from the National Institute of General Medical Sciences (to RG). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Correspondence to Raluca Gordân.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zhao, J., Li, D., Seo, J., Allen, A.S., Gordân, R. (2017). Quantifying the Impact of Non-coding Variants on Transcription Factor-DNA Binding. In: Sahinalp, S. (eds) Research in Computational Molecular Biology. RECOMB 2017. Lecture Notes in Computer Science, vol 10229. Springer, Cham. https://doi.org/10.1007/978-3-319-56970-3_21

  • DOI: https://doi.org/10.1007/978-3-319-56970-3_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56969-7

  • Online ISBN: 978-3-319-56970-3

  • eBook Packages: Computer Science, Computer Science (R0)
