Abstract
A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The recent flood of genomic and post-genomic data opens the way for computational methods that elucidate the key components of these mechanisms. One important consequence is the ability to recognize groups of genes that are co-expressed, using microarray expression data. We then wish to identify, in silico, putative transcription factor binding sites in the promoter regions of these genes that might explain the coregulation and hint at possible regulators. In this paper we describe a simple and fast, yet powerful, two-stage approach to this task. Using a rigorous hyper-geometric statistical analysis and a straightforward computational procedure, we find small conserved sequence kernels. These are then stochastically expanded into position-specific scoring matrices (PSSMs) using an EM-like procedure. We demonstrate the utility and speed of our methods by applying them to several data sets from the recent literature. We also compare these results with those of MEME when run on the same data sets.
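To make the first stage concrete, here is a minimal Python sketch of hyper-geometric kernel scoring, assuming the simplest possible setup: every DNA word of a fixed length is tested, a gene "contains" a word if the word occurs exactly on the forward strand of its promoter, and the p-value is the hyper-geometric tail probability of seeing at least that many containing genes in the co-expressed cluster. The function names (`hypergeom_tail`, `genes_containing`, `score_kernels`), the forward-strand-only scan, the fixed word length, and the toy sequences are illustrative assumptions, not the authors' implementation, and the EM-like PSSM expansion of the second stage is not shown.

```python
from __future__ import annotations

from itertools import product
from math import comb


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K of them contain
    the word, and a cluster of n genes is drawn without replacement."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def genes_containing(promoters: dict[str, str], word: str) -> set[str]:
    """Genes whose promoter contains `word`.
    Assumption (for illustration): forward strand only, exact matches only."""
    return {gene for gene, seq in promoters.items() if word in seq}


def score_kernels(promoters: dict[str, str], cluster: set[str], length: int = 4):
    """Rank every DNA word of `length` letters by its hypergeometric
    enrichment among the promoters of `cluster` (lowest p-value first)."""
    N, n = len(promoters), len(cluster)
    scored = []
    for letters in product("ACGT", repeat=length):
        word = "".join(letters)
        with_word = genes_containing(promoters, word)
        k = len(with_word & cluster)  # cluster genes containing the word
        if k == 0:
            continue  # a word absent from the cluster cannot be enriched
        scored.append((hypergeom_tail(N, len(with_word), n, k), word, k, len(with_word)))
    scored.sort()  # lowest p-value first; ties broken alphabetically by word
    return scored


# Toy example (hypothetical sequences, not data from the paper).
promoters = {
    "g1": "TTGACGTCAA",
    "g2": "CCGACGTCGG",
    "g3": "ATGACGTCAT",
    "g4": "AAAAAAAAAA",
    "g5": "CCCCCCCCCC",
    "g6": "GGGGTTTTAA",
}
cluster = {"g1", "g2", "g3"}
top = score_kernels(promoters, cluster, length=4)
for p_value, word, k, K in top[:3]:
    print(f"{word}  in {k}/{len(cluster)} cluster genes, {K} genes overall, p = {p_value:.3f}")
```

On this toy input the three words shared by every cluster promoter and absent elsewhere (ACGT, CGTC, GACG) tie at p = 1/20 = 0.05. Because all 4^length words are tested at once, raw p-values like these would still need a multiple-testing correction, for instance the false discovery rate procedure of Benjamini and Hochberg. The same tail probability is also available as `scipy.stats.hypergeom.sf(k - 1, N, K, n)` if SciPy is preferred over `math.comb`.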
References
T. L. Bailey and C. Elkan. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 2, pages 28–36, 1994.
Y. Barash and N. Friedman. Context-specific Bayesian clustering for gene expression data. In Proc. Ann. Int. Conf. Comput. Mol. Biol., volume 5, pages 12–21, 2001.
Y. Benjamini and Y. Hochberg. Controlling the False Discovery Rate: a practical and powerful approach to multiple testing. J. Royal Statistical Society B, 57:289–300, 1995.
A. Brazma, I. Jonassen, J. Vilo, and E. Ukkonen. Predicting gene regulatory elements in silico on a genomic scale. Genome Res., 8:1202–15, 1998.
J. Buhler and M. Tompa. Finding motifs using random projections. In Proc. Ann. Int. Conf. Comput. Mol. Biol., volume 5, pages 69–76, 2001.
H. J. Bussemaker, H. Li, and E. D. Siggia. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. PNAS, 97(18):10096–100, 2000.
H. J. Bussemaker, H. Li, and E. D. Siggia. Regulatory element detection using a probabilistic segmentation model. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 8, pages 67–74, 2000.
S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P. Brown, and I. Herskowitz. The transcriptional program of sporulation in budding yeast. Science, 282:699–705, 1998.
J. DeRisi, V. Iyer, and P. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278:680–686, 1997.
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
R. Durrett. Probability: Theory and Examples. Wadsworth & Brooks/Cole, California, 1991.
M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. PNAS, 95:14863–14868, 1998.
D. J. Galas, M. Eggert, and M. S. Waterman. Rigorous pattern-recognition methods for DNA sequences: analysis of promoter sequences from Escherichia coli. J. Mol. Biol., 186:117–28, 1985.
A. P. Gasch, P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen, G. Storz, D. Botstein, and P. O. Brown. Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 11:4241–4257, 2000.
T. R. Hughes, M. J. Marton, A. R. Jones, C. J. Roberts, R. Stoughton, C. D. Armour, H. A. Bennett, E. Coffey, H. Dai, Y. D. He, M. J. Kidd, A. M. King, M. R. Meyer, D. Slade, P. Y. Lum, S. B. Stepaniants, D. D. Shoemaker, D. Gachotte, K. Chakraburtty, J. Simon, M. Bard, and S. H. Friend. Functional discovery via a compendium of expression profiles. Cell, 102(1):109–26, 2000.
V. R. Iyer, C. E. Horak, C. S. Scafe, D. Botstein, M. Snyder, and P. O. Brown. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature, 409:533–538, 2001.
L. J. Jensen and S. Knudsen. Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation. Bioinformatics, 16:326–333, 2000.
C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262:208–214, 1993.
P. A. Pevzner and S. H. Sze. Combinatorial approaches to finding subtle signals in DNA sequences. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 8, pages 269–78, 2000.
F. P. Roth, J. D. Hughes, P. W. Estep, and G. M. Church. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol., 16:939–945, 1998.
S. Sinha and M. Tompa. A statistical method for finding transcription factor binding sites. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 8, pages 344–54, 2000.
P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, 9(12):3273–97, 1998.
S. Tavazoie, J. D. Hughes, M. J. Campbell, R. J. Cho, and G. M. Church. Systematic determination of genetic network architecture. Nat. Genet., 22(3):281–5, 1999. Comment in: Nat. Genet., 22(3):213–5, 1999.
J. van Helden, B. Andre, and J. Collado-Vides. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol., 281(5):827–42, 1998.
J. van Helden, A. F. Rios, and J. Collado-Vides. Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucl. Acids Res., 28(8):1808–18, 2000.
J. Vilo, A. Brazma, I. Jonassen, A. Robinson, and E. Ukkonen. Mining for putative regulatory elements in the yeast genome using gene expression data. In Proc. Int. Conf. Intell. Syst. Mol. Biol., volume 8, pages 384–94, 2000.
F. Wolfertstetter, K. Frech, G. Herrmann, and T. Werner. Identification of functional elements in unaligned nucleic acid sequences by a novel tuple search algorithm. Comput. Appl. Biosci., 12(1):71–80, 1996.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barash, Y., Bejerano, G., Friedman, N. (2001). A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites. In: Gascuel, O., Moret, B.M.E. (eds) Algorithms in Bioinformatics. WABI 2001. Lecture Notes in Computer Science, vol 2149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44696-6_22
DOI: https://doi.org/10.1007/3-540-44696-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42516-8
Online ISBN: 978-3-540-44696-5
eBook Packages: Springer Book Archive