Quantifying the Impact of Non-coding Variants on Transcription Factor-DNA Binding

  • Jingkang Zhao
  • Dongshunyi Li
  • Jungkyun Seo
  • Andrew S. Allen
  • Raluca Gordân
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10229)


Many recent studies have emphasized the importance of genetic variants and mutations in cancer and other complex human diseases. The overwhelming majority of these variants occur in non-coding portions of the genome, where they can have a functional impact by disrupting regulatory interactions between transcription factors (TFs) and DNA. Here, we present a method for assessing the impact of non-coding mutations on TF-DNA interactions, based on regression models of DNA-binding specificity trained on high-throughput in vitro data. We use ordinary least squares (OLS) to estimate the parameters of the binding model for each TF, and we show that our predictions of TF binding changes due to DNA mutations correlate well with measured changes in gene expression. In addition, by leveraging distributional results associated with OLS estimation, for each predicted change in TF binding we also compute a normalized score (z-score) and a significance value (p-value) reflecting our confidence that the mutation affects TF binding. We use this approach to analyze a large set of pathogenic non-coding variants, and we show that, compared to a control set of common variants, these variants lead to significant differences in TF binding between alleles. Taken together, our results indicate that there is a strong regulatory component to the pathogenic non-coding variants identified thus far.
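The OLS-based scoring idea can be illustrated with a minimal sketch. Because a predicted binding change is a linear contrast d^T β̂ of the OLS estimates (d is the difference between the alternate- and reference-allele feature vectors), its variance is σ̂² d^T (X^T X)^{-1} d, which gives a z-score and a p-value. Everything concrete here is an assumption, not the authors' method: the simple additive mononucleotide feature map (`encode`), the synthetic data from a hypothetical `simulate` function standing in for real in vitro measurements, the hypothetical `TRUE_BETA` coefficients, and the normal approximation used for the p-value.

```python
import itertools
import math
import random

BASES = "ACGT"


def encode(seq):
    """Hypothetical additive feature map: an intercept plus C/G/T indicators
    per position, with A as the baseline so X^T X stays invertible."""
    x = [1.0]
    for base in seq:
        x.extend(1.0 if base == b else 0.0 for b in "CGT")
    return x


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_ols(seqs, y):
    """OLS fit; returns beta_hat, (X^T X)^-1 and the residual variance estimate."""
    X = [encode(s) for s in seqs]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (len(X) - p)
    return beta, xtx_inv, sigma2


def binding_change(ref, alt, beta, xtx_inv, sigma2):
    """Predicted change in binding (alt minus ref), its z-score, and a
    two-sided p-value under a normal approximation (an assumption)."""
    d = [a - r for a, r in zip(encode(alt), encode(ref))]
    delta = sum(b * v for b, v in zip(beta, d))
    var = sigma2 * sum(d[i] * xtx_inv[i][j] * d[j]
                       for i in range(len(d)) for j in range(len(d)))
    z = delta / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return delta, z, p_value


def simulate(true_beta, length=4, noise_sd=0.05, seed=0):
    """Synthetic stand-in for in vitro data: every k-mer of the given length,
    scored by the additive model plus Gaussian noise."""
    rng = random.Random(seed)
    seqs = ["".join(s) for s in itertools.product(BASES, repeat=length)]
    y = [sum(b * v for b, v in zip(true_beta, encode(s))) + rng.gauss(0.0, noise_sd)
         for s in seqs]
    return seqs, y


# Hypothetical "true" coefficients: intercept, then (C, G, T) for positions 1-4.
TRUE_BETA = [1.0,
             0.2, -1.0, 0.1,
             0.5, 0.0, -0.3,
             -0.4, 0.3, 0.0,
             0.0, 0.1, -0.2]

seqs, y = simulate(TRUE_BETA)
beta, xtx_inv, sigma2 = fit_ols(seqs, y)
# A -> G at position 1 (true effect -1.0) versus A -> C at position 4 (true effect 0.0).
delta, z, p_value = binding_change("AACA", "GACA", beta, xtx_inv, sigma2)
delta_null, z_null, p_null = binding_change("AACA", "AACC", beta, xtx_inv, sigma2)
print(f"A>G pos 1: delta={delta:.3f} z={z:.1f} p={p_value:.2g}")
print(f"A>C pos 4: delta={delta_null:.3f} z={z_null:.1f} p={p_null:.2g}")
```

The design choice worth noting is that the z-score uses the full (X^T X)^{-1} matrix rather than a single coefficient's standard error, so the same function also handles variants that change several features at once (for example, richer k-mer features that overlap the mutated position).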


Keywords: TF-DNA binding; Non-coding variants; Regression models



This research was supported in part by award numbers P01CA142538 from the National Cancer Institute and R01GM117106 from the National Institute of General Medical Sciences (to RG). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jingkang Zhao (1, 2)
  • Dongshunyi Li (3)
  • Jungkyun Seo (2)
  • Andrew S. Allen (1, 3)
  • Raluca Gordân (1, 3, 4)

  1. Center for Genomic and Computational Biology, Duke University, Durham, USA
  2. Program in Computational Biology and Bioinformatics, Duke University, Durham, USA
  3. Department of Biostatistics and Bioinformatics, Duke University, Durham, USA
  4. Department of Computer Science, Duke University, Durham, USA