The identification of transcription factor binding sites (TFBS) is an important initial step in determining the DNA signals that regulate transcription of the genome. We tested the performance of three distinct computational methods for identifying TFBS in the human genome sequence, judged by their ability to recover the locations of experimentally determined, uniquely mapped TFBS taken from the TRANSFAC database. All three methods attempt to filter the candidate TFBS produced by aligning positional weight matrices that describe the binding site, and employ either (i) a P-value threshold for accepting a site, (ii) an over-representation measure of neighboring sites, or (iii) conservation with the mouse genome together with P-value thresholds. The results show that the best recognition of TFBS is achieved by combining two strategies: identifying TFBS in regions of human–mouse conservation, and applying a high-stringency P-value threshold to TFBS identified in non-coding regions that are not conserved. Additionally, we find that only half of the 481 experimentally mapped sites lie in sequence regions conserved with mouse, but the predictive power of the binding site identification method is up to threefold higher in the conserved regions.
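To make the combined strategy concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a log-odds positional weight matrix built from base counts with a pseudocount, a uniform background, and a P-value defined as the fraction of all possible k-mers scoring at least as high as the observed window. The function names, the toy count matrix, the example sequence, and both cutoffs are hypothetical and chosen only for illustration. A lenient cutoff is applied inside regions marked as conserved and a stringent one elsewhere.

```python
from itertools import product
from math import log2

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log-odds scores (uniform background assumed)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: log2(((column[b] + pseudocount) / total) / background)
                       for b in BASES})
    return matrix

def score(matrix, site):
    """Sum the log-odds score of each base in a window of matching length."""
    return sum(col[base] for col, base in zip(matrix, site))

def p_value(matrix, observed):
    """Fraction of all k-mers scoring >= observed; exhaustive, so only for short motifs."""
    k = len(matrix)
    hits = sum(1 for kmer in product(BASES, repeat=k)
               if score(matrix, kmer) >= observed - 1e-12)
    return hits / 4 ** k

def scan(matrix, sequence, conserved, p_conserved, p_other):
    """Report windows passing a lenient cutoff inside conserved intervals
    (half-open [lo, hi) coordinates) and a stringent cutoff elsewhere."""
    k = len(matrix)
    found = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases
        inside = any(lo <= start and start + k <= hi for lo, hi in conserved)
        cutoff = p_conserved if inside else p_other
        p = p_value(matrix, score(matrix, window))
        if p <= cutoff:
            found.append((start, window, p))
    return found

# Hypothetical toy motif "TATA" built from 10 invented aligned sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_matrix(toy_counts)
for start, window, p in scan(pwm, "TATCGGTATA", conserved=[(0, 4)],
                             p_conserved=0.06, p_other=0.004):
    print(start, window, p)
```

In this toy setting an imperfect match is accepted only when it falls inside the conserved interval, while an exact match can still be reported from unconserved sequence because it passes the stringent cutoff. Real matrices are far longer than four positions, so the exhaustive P-value computation here would need to be replaced by a sampled or dynamically computed score distribution.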
Levy, S., Hannenhalli, S. Identification of transcription factor binding sites in the human genome sequence. Mamm Genome 13, 510–514 (2002). https://doi.org/10.1007/s00335-002-2175-6