A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites

Barash, Yoseph; Bejerano, Gill; Friedman, Nir

doi:10.1007/3-540-44696-6_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2149)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The recent flood of genomic and post-genomic data opens the way for computational methods that elucidate the key components of these mechanisms. One important consequence is the ability to use microarray expression data to recognize groups of co-expressed genes. We then wish to identify, in silico, putative transcription factor binding sites in the promoter regions of these genes that might explain their co-regulation and hint at possible regulators. In this paper we describe a simple and fast, yet powerful, two-stage approach to this task. Using a rigorous hyper-geometric statistical analysis and a straightforward computational procedure, we find small conserved sequence kernels. These are then stochastically expanded into position-specific scoring matrices (PSSMs) using an EM-like procedure. We demonstrate the utility and speed of our methods by applying them to several data sets from the recent literature. We also compare these results with those of MEME when run on the same sets.
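The abstract names the statistic used to find kernels but not its exact form. As a hedged illustration only, the Python sketch below scores exact k-mer kernels with a one-sided hyper-geometric tail: of N promoters genome-wide, K contain a given kernel, and we ask how surprising it is that x of the n promoters in a co-expressed cluster contain it, i.e. P(X >= x) = sum over i from x to min(n, K) of C(K, i) C(N - K, n - i) / C(N, n). The assumptions are ours, not the paper's: kernels are exact k-mers (no reverse complements or wildcards), a gene counts once no matter how often the kernel recurs in its promoter, the cluster is a subset of the genome-wide promoter set, and multiple testing is controlled with the Benjamini-Hochberg procedure listed in the references. All function names are hypothetical, and the EM-like PSSM expansion stage is not shown.

```python
# Hypothetical sketch (not the authors' code): exact k-mer kernels, per-gene
# presence/absence counts, hyper-geometric tail p-values, and an assumed
# Benjamini-Hochberg correction across all kernels seen genome-wide.
from collections import Counter
from math import comb


def hypergeom_tail(N: int, K: int, n: int, x: int) -> float:
    """P(X >= x) for X ~ Hypergeometric: n genes drawn from N, K carry the kernel."""
    lo, hi = max(x, 0), min(n, K)
    if lo > hi:
        return 0.0
    # Exact integer arithmetic; math.comb(a, b) is 0 when b > a,
    # so infeasible terms simply vanish from the sum.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return numerator / comb(N, n)


def genes_per_kmer(promoters: list[str], k: int) -> Counter:
    """Count, for each k-mer, how many promoters contain it at least once."""
    counts: Counter = Counter()
    for seq in promoters:
        # A set, so repeated occurrences within one promoter count once.
        counts.update({seq[j:j + k] for j in range(len(seq) - k + 1)})
    return counts


def benjamini_hochberg(pvals: dict[str, float], alpha: float) -> list[tuple[str, float]]:
    """Return the (kernel, p) pairs rejected by the BH step-up procedure at FDR alpha."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank  # largest rank passing the BH threshold
    return ranked[:cutoff]


def enriched_kernels(promoters: list[str], cluster_ids: set[int], k: int = 6,
                     alpha: float = 0.05) -> list[tuple[str, float]]:
    """Score every k-mer seen genome-wide for over-representation in the cluster."""
    N, n = len(promoters), len(cluster_ids)
    background = genes_per_kmer(promoters, k)
    in_cluster = genes_per_kmer([promoters[i] for i in sorted(cluster_ids)], k)
    # Counter returns 0 for kernels absent from the cluster, giving p = 1.
    pvals = {kmer: hypergeom_tail(N, K, n, in_cluster[kmer])
             for kmer, K in background.items()}
    return benjamini_hochberg(pvals, alpha)
```

For example, with promoters = ["TTTTGG"] * 3 + ["CCCCCC"] * 7, calling enriched_kernels(promoters, {0, 1, 2}, k=4) reports TTTT, TTTG and TTGG, each with p = 1/120 (about 0.0083), while CCCC, absent from the cluster, gets p = 1 and is not reported. Exact integer binomials avoid the rounding error of summing many tiny floating-point terms; for genome-scale inputs a library routine such as SciPy's hypergeom survival function would be a reasonable substitute.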

References

T. L. Bailey and C. Elkan. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 2, pages 28–36, 1994.
Y. Barash and N. Friedman. Context-specific Bayesian clustering for gene expression data. In Proc. Ann. Int. Conf. Comput. Mol. Biol., volume 5, pages 12–21, 2001.
Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Statistical Society B, 57:289–300, 1995.
A. Brazma, I. Jonassen, J. Vilo, and E. Ukkonen. Predicting gene regulatory elements in silico on a genomic scale. Genome Res., 8:1202–15, 1998.
J. Buhler and M. Tompa. Finding motifs using random projections. In Proc. Ann. Int. Conf. Comput. Mol. Biol., volume 5, pages 69–76, 2001.
H. J. Bussemaker, H. Li, and E. D. Siggia. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. PNAS, 97(18):10096–100, 2000.
H. J. Bussemaker, H. Li, and E. D. Siggia. Regulatory element detection using a probabilistic segmentation model. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 8, pages 67–74, 2000.
S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P. Brown, and I. Herskowitz. The transcriptional program of sporulation in budding yeast. Science, 282:699–705, 1998.
J. DeRisi, V. Iyer, and P. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278:680–686, 1997.
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
R. Durrett. Probability: Theory and Examples. Wadsworth & Brooks/Cole, California, 1991.
M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. PNAS, 95:14863–14868, 1998.
D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern-recognition methods for DNA sequences: analysis of promoter sequences from Escherichia coli. J. Mol. Biol., 186:117–28, 1985.
A. P. Gasch, P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, and P. O. Brown. Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 11:4241–4257, 2000.
T. R. Hughes, M. J. Marton, A. R. Jones, C. J. Roberts, R. Stoughton, C. D. Armour, H. A. Bennett, E. Coffey, H. Dai, Y. D. He, M. J. Kidd, A. M. King, M. R. Meyer, D. Slade, P. Y. Lum, S. B. Stepaniants, D. D. Shoemaker, D. Gachotte, K. Chakraburtty, J. Simon, M. Bard, and S. H. Friend. Functional discovery via a compendium of expression profiles. Cell, 102(1):109–26, 2000.
V. R. Iyer, C. E. Horak, C. S. Scafe, D. Botstein, M. Snyder, and P. O. Brown. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature, 409:533–538, 2001.
L. J. Jensen and S. Knudsen. Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation. Bioinformatics, 16:326–333, 2000.
C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262:208–214, 1993.
P. A. Pevzner and S. H. Sze. Combinatorial approaches to finding subtle signals in DNA sequences. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 8, pages 269–78, 2000.
F. P. Roth, J. D. Hughes, P. W. Estep, and G. M. Church. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol., 16:939–945, 1998.
S. Sinha and M. Tompa. A statistical method for finding transcription factor binding sites. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 8, pages 344–54, 2000.
P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9(12):3273–97, 1998.
S. Tavazoie, J. D. Hughes, M. J. Campbell, R. J. Cho, and G. M. Church. Systematic determination of genetic network architecture. Nat. Genet., 22(3):281–5, 1999. Comment in: Nat. Genet., 22(3):213–5, 1999.
J. van Helden, B. Andre, and J. Collado-Vides. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol., 281(5):827–42, 1998.
J. van Helden, A. F. Rios, and J. Collado-Vides. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucl. Acids Res., 28(8):1808–18, 2000.
J. Vilo, A. Brazma, I. Jonassen, A. Robinson, and E. Ukkonen. Mining for putative regulatory elements in the yeast genome using gene expression data. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 8, pages 384–94, 2000.
F. Wolfertstetter, K. Frech, G. Herrmann, and T. Werner. Identification of functional elements in unaligned nucleic acid sequences by a novel tuple search algorithm. Comput. Appl. Biosci., 12(1):71–80, 1996.

Author information

Authors and Affiliations

School of Computer Science & Engineering, The Hebrew University, Jerusalem, 91904, Israel
Yoseph Barash, Gill Bejerano & Nir Friedman

Editor information

Editors and Affiliations

LIRMM, 161 rue Ada, 34392, Montpellier, France
Olivier Gascuel
Department of Computer Science, University of New Mexico, Albuquerque, NM, 87131, USA
Bernard M. E. Moret

About this paper

Cite this paper

Barash, Y., Bejerano, G., Friedman, N. (2001). A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites. In: Gascuel, O., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2001. Lecture Notes in Computer Science, vol 2149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44696-6_22

DOI: https://doi.org/10.1007/3-540-44696-6_22
Published: 17 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42516-8
Online ISBN: 978-3-540-44696-5
eBook Packages: Springer Book Archive