Algorithms in Bioinformatics

Volume 2149 of the series Lecture Notes in Computer Science, pp. 278-293

A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites

  • Yoseph Barash, School of Computer Science & Engineering, The Hebrew University
  • Gill Bejerano, School of Computer Science & Engineering, The Hebrew University
  • Nir Friedman, School of Computer Science & Engineering, The Hebrew University

Abstract

A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The recent flood of genomic and post-genomic data opens the way for computational methods that elucidate the key components of these mechanisms. One important consequence is the ability to recognize groups of co-expressed genes using microarray expression data. We then wish to identify, in silico, putative transcription factor binding sites in the promoter regions of these genes that might explain the co-regulation and hint at possible regulators. In this paper we describe a simple and fast, yet powerful, two-stage approach to this task. Using a rigorous hyper-geometric statistical analysis and a straightforward computational procedure, we find small conserved sequence kernels. These are then stochastically expanded into position-specific scoring matrices (PSSMs) using an EM-like procedure. We demonstrate the utility and speed of our method by applying it to several data sets from the recent literature. We also compare these results with those of MEME run on the same sets.
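The abstract names the first stage only at a high level: a hyper-geometric test that flags short sequence kernels over-represented in the promoters of a co-expressed cluster. The sketch below illustrates that general idea under simplifying assumptions that are ours, not the paper's. Kernels are exact k-mers read from one strand only, each promoter is scored by presence or absence of a k-mer, and the full promoter set serves as the background population, with the cluster assumed to be a subset of it. The function names `hypergeom_tail`, `kmers_in` and `rank_kmers` are hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric: N promoters in total, K of them
    containing the k-mer, n promoters drawn (the cluster)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def kmers_in(seq, k):
    """Set of distinct k-mers in seq (one strand, exact matches only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_kmers(promoters, cluster, k):
    """Rank k-mers seen in the cluster by hyper-geometric enrichment.

    promoters: dict gene -> promoter sequence (the background population)
    cluster:   set of gene names, assumed to be a subset of promoters
    Returns a list of (p_value, kmer), most significant first.
    """
    N, n = len(promoters), len(cluster)
    total_counts = Counter()    # number of promoters containing each k-mer
    cluster_counts = Counter()  # same count restricted to the cluster
    for gene, seq in promoters.items():
        present = kmers_in(seq, k)
        total_counts.update(present)
        if gene in cluster:
            cluster_counts.update(present)
    scores = [(hypergeom_tail(N, total_counts[km], n, c), km)
              for km, c in cluster_counts.items()]
    scores.sort()
    return scores


if __name__ == "__main__":
    promoters = {"g1": "ACGTACGT", "g2": "ACGTTTTT",
                 "g3": "TTTTTTTT", "g4": "GGGGGGGG"}
    for p, km in rank_kmers(promoters, {"g1", "g2"}, 4)[:3]:
        print(f"{km}\tp={p:.3f}")
```

In this toy run "ACGT" occurs in both cluster promoters and nowhere else, so it ranks first with p = 1/6. Counting promoters rather than occurrences keeps a k-mer repeated many times in a single promoter from inflating its score. The sketch applies no multiple-testing correction and does not perform the paper's second, EM-like stage of expanding kernels into PSSMs.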