Advances in Computational Identification and Modeling of DNA Regulatory Elements in the Human Genome
Identification of DNA regulatory elements in the human genome remains a significant challenge. Variation in these regulatory elements can contribute to disease by altering protein levels. Enhancers constitute an important class of these DNA regulatory elements, and a major component of current research is focused on a more complete understanding of enhancer function and on improved techniques for enhancer detection. We recently developed a computational approach to identify enhancers from primary DNA sequence using a k-mer-based support vector machine (kmer-SVM) framework. Here we show that the kmer-SVM model can accurately predict tissue-specific enhancer activity without any prior knowledge of transcription factor (TF) binding sites. We adapt this approach to predict genomic TF binding data generated by the ENCODE project, showing that genomic MYC binding can be accurately predicted from local DNA sequence with the kmer-SVM. We find similar accuracy with an SVM using position weight matrices (PWMs) representing known TF binding specificities. By integrating ChIP-seq and expression data, we show that while much of MYC binding is shared between ENCODE cell types and is promoter-proximal, cell-type-specific MYC binding is distal and is correlated with enhanced cell-specific expression of nearby (~50 kb) genes. The distinction between shared and cell-specific MYC binding is determined by DNA sequence variation around the canonical MYC binding site, which by itself cannot distinguish cell-specific binding events. These results suggest that tissue-specific enhancer activity is specified by primary DNA sequence, that local sequence context controls tissue-specific activity through cooperative TF interactions, and that local context sequence features can be identified from genomic binding data.
Keywords: computational biology, genomics, transcriptional regulation, enhancers
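To make the idea of predicting from primary sequence concrete, the following is a minimal sketch of a k-mer feature map and the normalized spectrum kernel it induces. It is an illustration under stated assumptions, not the authors' implementation: the choice of k = 6, the collapsing of each k-mer with its reverse complement, and all function names are hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    """Collapse a k-mer and its reverse complement to one key (assumed here)."""
    return min(kmer, reverse_complement(kmer))


def kmer_counts(seq: str, k: int = 6) -> Counter:
    """Count canonical k-mers in seq; k = 6 is an illustrative default."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows containing N or any other non-ACGT symbol.
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def spectrum_kernel(a: str, b: str, k: int = 6) -> float:
    """Cosine-normalized dot product of the two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(v * cb[key] for key, v in ca.items())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy sequences for demonstration only; CACGTG is the canonical E-box motif.
    site = "TTGACCACGTGGTCAA"
    print(kmer_counts(site).most_common(3))
    print(spectrum_kernel(site, reverse_complement(site)))
```

A kernel matrix built this way from positive (bound or active) and negative (background) sequences could then be handed to any SVM solver that accepts precomputed kernels. Because the features are plain k-mer counts, no prior TF binding model is needed, which is the property the abstract emphasizes.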