Prediction of Transcription Factor Families Using DNA Sequence Features
- Cite this paper as:
- Anand A., Fogel G.B., Pugalenthi G., Suganthan P.N. (2008) Prediction of Transcription Factor Families Using DNA Sequence Features. In: Chetty M., Ngom A., Ahmad S. (eds) Pattern Recognition in Bioinformatics. PRIB 2008. Lecture Notes in Computer Science, vol 5265. Springer, Berlin, Heidelberg
Understanding the mechanisms of protein-DNA interaction is of critical importance in biology. Transcription factor (TF) binding to a specific DNA sequence depends on at least two factors: A protein-level DNA-binding domain and a nucleotide-level specific sequence serving as a TF binding site. TFs have been classified into families based on these factors. TFs within each family bind to specific nucleotide sequences in a very similar fashion. Identification of the TF family that might bind at a particular nucleotide sequence requires a machine learning approach. Here we considered two sets of features based on DNA sequences and their physicochemical properties and applied a one-versus-all SVM (OVA-SVM) with class-wise optimized features to identify TF family-specific features in DNA sequences. Using this approach, a mean prediction accuracy of ~80% was achieved, which represents an improvement of ~7% over previous approaches on the same data.
KeywordsTranscription factor family prediction multi-class classification
Unable to display preview. Download preview PDF.