A Feature-Based Approach to Modeling Protein-DNA Interactions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4453)

Abstract

Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position-specific scoring matrix (PSSM), which assumes independence between binding positions. In many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF-DNA interactions, based on Markov networks. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our models and devise an algorithm for learning their structural features from binding site data. We evaluate our approach on synthetic data and then apply it to binding site and ChIP-chip data from yeast. We reveal sequence features that are present in the binding specificities of yeast TFs, and show that FMMs explain the binding data significantly better than PSSMs.
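
To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It compares a PSSM, where each position contributes independently, with a toy log-linear model whose features may span several positions. The function names, the feature encoding, the example weights, and the brute-force normalization over all sequences are assumptions chosen for illustration; the paper's own formulation and learning algorithm are not reproduced here.

```python
import itertools
import math

ALPHABET = "ACGT"


def pssm_log_prob(seq, pssm):
    """Log-probability under a PSSM: a sum of independent per-position terms."""
    return sum(math.log(pssm[i][base]) for i, base in enumerate(seq))


def fmm_log_prob(seq, features):
    """Log-probability under a toy feature-based log-linear model.

    `features` is a list of (weight, {position: base}) pairs (a hypothetical
    encoding). A feature fires when every listed position holds the given base,
    so a single feature can couple several positions.
    """
    def score(s):
        return sum(weight for weight, pattern in features
                   if all(s[pos] == base for pos, base in pattern.items()))

    # Brute-force partition function over all 4^L sequences; only feasible
    # for short sites and used here purely for illustration.
    log_z = math.log(sum(math.exp(score(s))
                         for s in itertools.product(ALPHABET, repeat=len(seq))))
    return score(seq) - log_z


# Hypothetical length-2 site in which the two positions tend to agree (AA or CC).
pssm = [{"A": 0.45, "C": 0.45, "G": 0.05, "T": 0.05} for _ in range(2)]
features = [
    (2.0, {0: "A", 1: "A"}),  # pairwise feature spanning both positions
    (2.0, {0: "C", 1: "C"}),
]

# The PSSM scores AA and AC identically; the pairwise features separate them.
print(pssm_log_prob("AA", pssm), pssm_log_prob("AC", pssm))
print(fmm_log_prob("AA", features), fmm_log_prob("AC", features))
```

In this toy setting the PSSM cannot tell the matched site AA from the mismatched site AC, because its marginals are the same at both positions, whereas the model with multi-position features assigns AA the higher probability.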

Editor information

Terry Speed Haiyan Huang

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Sharon, E., Segal, E. (2007). A Feature-Based Approach to Modeling Protein-DNA Interactions. In: Speed, T., Huang, H. (eds) Research in Computational Molecular Biology. RECOMB 2007. Lecture Notes in Computer Science, vol 4453. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71681-5_6

  • DOI: https://doi.org/10.1007/978-3-540-71681-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71680-8

  • Online ISBN: 978-3-540-71681-5

  • eBook Packages: Computer Science, Computer Science (R0)
