Volume 35, Issue 6, pp 1053-1067
Date: 22 Mar 2007

Context Specific Transcription Factor Prediction

Abstract

One of the goals of systems biology is the identification of regulatory mechanisms that govern an organism’s response to external stimuli. Transcription factors have been hypothesized to be major contributors to that response, and a great deal of work has been done to predict the set of transcription factors that regulate a given gene. Most current methods seek to identify possible binding sites from genomic sequence. Initial attempts at predicting transcription factors from genomic sequence suffered from a high rate of false positives. Making the problem more difficult, it has also been shown that even when predicted binding sites are false positives, the transcription factors can still be shown to bind to the corresponding sequences in vitro. One method for rectifying this is phylogenetic analysis, in which only regions that show high evolutionary conservation are analyzed. However, such an approach may be too stringent because of the level of degeneracy shown in transcription factor binding site position weight matrices. Because of this degeneracy, only a few bases may need to be conserved across species. Therefore, a sequence that does not show a high level of evolutionary conservation may still show high affinity for the same transcription factor. In predicting transcription factor binding, we explore the notion that “Co-expression implies co-regulation” [Allocco et al. BMC Bioinformatics 5:18, 2004]. When multiple genes require similar transcription factor binding sites, there exists a basis for eliminating false positives. This method allows for the selection of transcription factor binding sites that are active under a given experimental paradigm, thereby allowing us to indirectly incorporate the effects of chromosome and recognition site presentation upon transcription factor binding prediction.
Rather than having to rationalize that a few transcription factor binding sites are over-represented in a cluster of genes, one can show that a few transcription factors are active in the cluster of genes that have been grouped together. Although the method focuses on predicting experiment-specific transcription factor binding sites, applying it iteratively across different experiments could yield a comprehensive set of transcription factor binding sites that regulate the various dynamic responses shown by biological systems under a variety of conditions, and hence a more comprehensive model of transcriptional regulation.
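
The two ideas in the abstract, degenerate position weight matrices and the use of co-expressed genes to discard false positives, can be illustrated with a small sketch. This is not the authors' implementation: the toy matrices, the promoter sequences, the uniform background frequency, the score threshold, and the "minimum fraction of genes" rule are all assumptions made up for illustration, and the names log_odds, has_site and shared_factors are hypothetical. Sequences are assumed to contain only uppercase A, C, G and T.

```python
import math

# Assumed uniform background base frequency (illustrative only).
BACKGROUND = 0.25

# Hypothetical toy position weight matrices: one dict of base
# probabilities per position. In TF1 the third position is fully
# degenerate, so any base there scores the same.
PWMS = {
    "TF1": [
        {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
        {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
        {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    ],
    "TF2": [{"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05}] * 4,
}

# Hypothetical promoter sequences for one cluster of co-expressed genes.
PROMOTERS = {
    "geneA": "CCAGCTCC",
    "geneB": "TTAGGTTT",
    "geneC": "CCCCCCCC",
}


def log_odds(pwm, site):
    """Log2-odds score of a candidate site against the background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, site))


def has_site(pwm, seq, threshold):
    """True if any window of the sequence scores at or above the threshold."""
    width = len(pwm)
    return any(
        log_odds(pwm, seq[i:i + width]) >= threshold
        for i in range(len(seq) - width + 1)
    )


def shared_factors(pwms, promoters, threshold, min_fraction):
    """Keep factors with a predicted site in at least min_fraction of the genes."""
    kept = {}
    for name, pwm in pwms.items():
        genes = [g for g, seq in promoters.items() if has_site(pwm, seq, threshold)]
        if len(genes) / len(promoters) >= min_fraction:
            kept[name] = genes
    return kept


RESULT = shared_factors(PWMS, PROMOTERS, threshold=5.0, min_fraction=0.6)
print(RESULT)
```

With these toy inputs, TF1 matches geneA (AGCT) and geneB (AGGT) even though the two sites differ at the degenerate position, so it is retained, while TF2 has a predicted site in only one of the three genes and is dropped. A real analysis would need measured expression clusters, curated matrices and a statistical test in place of the fixed fraction used here.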