Determination of specificity influencing residues for key transcription factor families

Abstract

Transcription factors (TFs) are major modulators of transcription and, consequently, of downstream cellular processes. The binding of a TF to specific regulatory elements is governed by its DNA-binding specificity. Because the specificities of many TFs with known sequences remain undetermined, frameworks that predict specificity from sequence are highly desirable. Key inputs to such frameworks are the protein residues that modulate the specificity of the TF under consideration. Simple measures such as mutual information (MI) fail to delineate these specificity-influencing residues (SIRs) from alignments because of constraints imposed by the three-dimensional structure of the protein: structural restraints on the evolution of the amino-acid sequence lead to the identification of false SIRs. In this manuscript, we extended three methods that have been used to disentangle spurious indirect protein residue-residue contacts from direct contacts (direct information, PSICOV and adjusted mutual information) to identify SIRs from joint alignments of amino-acid sequences and specificities. Using these methods, we predicted SIRs for the homeodomain (HD), helix-loop-helix, LacI and GntR families of TFs and compared the results with those obtained using MI. Using various measures, we show that the three methods perform comparably to one another and better than MI. The implications of these methods for specificity prediction frameworks are discussed. The methods are implemented as an R package, which is available along with the alignments at http://stormo.wustl.edu/SpecPred.
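To make the MI baseline concrete, the sketch below scores each alignment column against a specificity "column" using plug-in mutual information. It then applies one common adjustment, the average product correction (APC), which subtracts MI(a,·)·MI(b,·)/MI(·,·) from each pair. This is a minimal illustration in Python, not the authors' R package. Several pieces are assumptions made only for illustration: encoding specificity as one extra categorical column (for example, the preferred base at a single binding-site position), treating APC as the "adjusted MI" variant, and the function names `mutual_information`, `mi_matrix`, `apc_correct` and `score_sirs`, which are all hypothetical. Sequence reweighting and pseudocounts are omitted.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Plug-in MI (in nats) between two equal-length symbol sequences."""
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))
    pa, pb = Counter(col_a), Counter(col_b)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (pa[a] * pb[b]))
    return mi


def mi_matrix(columns):
    """Symmetric matrix of pairwise MI between all columns (diagonal left at 0)."""
    k = len(columns)
    m = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            m[i][j] = m[j][i] = mutual_information(columns[i], columns[j])
    return m


def apc_correct(m):
    """Average product correction: MI(a,b) - MI(a,.) * MI(b,.) / MI(.,.)."""
    k = len(m)
    # Mean MI of each column with every other column (off-diagonal only)
    row_mean = [sum(m[i][j] for j in range(k) if j != i) / (k - 1) for i in range(k)]
    overall = sum(row_mean) / k  # equals the mean of all off-diagonal entries
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j:
                apc = row_mean[i] * row_mean[j] / overall if overall > 0 else 0.0
                out[i][j] = m[i][j] - apc
    return out


def score_sirs(sequences, specificities):
    """Return (raw MI, APC-adjusted MI) of each protein column vs. specificity."""
    length = len(sequences[0])
    columns = [[s[i] for s in sequences] for i in range(length)]
    columns.append(list(specificities))  # specificity appended as one more column
    raw = mi_matrix(columns)
    adj = apc_correct(raw)
    spec = length  # index of the specificity column
    return [(raw[i][spec], adj[i][spec]) for i in range(length)]


if __name__ == "__main__":
    # Toy joint alignment: column 0 tracks specificity, column 1 does not.
    seqs = ["AK", "AR", "GK", "GR"]
    spec = ["T", "T", "C", "C"]
    for pos, (mi, ami) in enumerate(score_sirs(seqs, spec)):
        print(f"column {pos}: MI={mi:.3f}  APC-adjusted={ami:.3f}")
```

On this four-sequence toy alignment, column 0 receives MI = ln 2 and column 1 receives 0, and the ranking survives the correction. With so few columns, APC shrinks the top score sharply, so the example shows only the mechanics. Direct information and PSICOV go further than APC by fitting a global model (a Potts model or a sparse inverse covariance, respectively), and they are not reproduced here.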

Author information

Correspondence to Ronak Y. Patel or Gary D. Stormo.

Cite this article

Patel, R.Y., Garde, C. & Stormo, G.D. Determination of specificity influencing residues for key transcription factor families. Quant Biol 3, 115–123 (2015). https://doi.org/10.1007/s40484-015-0045-y

Keywords

  • protein-DNA interactions
  • residue co-variance
  • motifs
  • co-evolution
  • feature selection
  • direct information
  • specificity determinants