Integrating binding site predictions using meta classification methods

  • Y. Sun
  • M. Robinson
  • R. Adams
  • A. G. Rust
  • P. Kaye
  • N. Davey


Even the best current algorithms for transcription factor binding site prediction are severely limited in accuracy. Because these algorithms belong to different classes, there is good reason to believe that combining their predictions could improve prediction quality. In this paper, we apply single layer networks and support vector machines to the predictions of 12 key algorithms. The input vectors are built from a ‘window’ of consecutive results, so that each prediction is classified in the context of its neighbours. We further improve the classification result with under- and over-sampling techniques. We find that, by integrating the 12 base algorithms, support vector machines and single layer networks can give better binding site predictions.
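The windowing and resampling steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the function names `make_windows` and `rebalance`, the zero padding at sequence ends, the window half-width, and the use of plain duplication for over-sampling and random selection for under-sampling. The toy data uses two base-algorithm scores per position instead of 12 to keep it short.

```python
import random


def make_windows(predictions, labels, half_width=1):
    """Build one input vector per position from a window of neighbouring results.

    predictions: list of per-position lists, one score per base algorithm.
    labels: list of 0/1 per position (1 = binding site, the minority class).
    Positions beyond either end are zero-padded (an illustrative assumption).
    """
    n_algos = len(predictions[0])
    pad = [0.0] * n_algos
    vectors = []
    for i in range(len(predictions)):
        window = []
        for j in range(i - half_width, i + half_width + 1):
            inside = 0 <= j < len(predictions)
            window.extend(predictions[j] if inside else pad)
        vectors.append(window)
    return vectors, list(labels)


def rebalance(vectors, labels, minority_copies=2, majority_keep=0.5, seed=0):
    """Over-sample the minority class by duplication and randomly
    under-sample the majority class (a simple stand-in technique)."""
    rng = random.Random(seed)
    minority = [v for v, y in zip(vectors, labels) if y == 1]
    majority = [v for v, y in zip(vectors, labels) if y == 0]
    # Keep at least one majority example, but never more than exist.
    k = min(len(majority), max(1, int(len(majority) * majority_keep)))
    kept = rng.sample(majority, k)
    data = [(v, 1) for v in minority for _ in range(minority_copies)]
    data += [(v, 0) for v in kept]
    rng.shuffle(data)
    return [v for v, _ in data], [y for _, y in data]


# Toy example: 4 sequence positions, 2 hypothetical base-algorithm scores each.
preds = [[0, 1], [1, 1], [0, 0], [1, 0]]
labels = [0, 1, 0, 0]
X, y = make_windows(preds, labels, half_width=1)
Xb, yb = rebalance(X, y)
```

The rebalanced vectors `Xb` and labels `yb` would then be passed to a classifier such as a single layer network or a support vector machine. Resampling should be applied to training data only, so that evaluation reflects the original class imbalance.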


  • Support Vector Machine
  • Minority Class
  • Site Prediction
  • Imbalanced Dataset
  • Window Input

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag/Wien 2005

Authors and Affiliations

  • Y. Sun
    • 1
  • M. Robinson
    • 1
  • R. Adams
    • 1
  • A. G. Rust
    • 2
  • P. Kaye
    • 1
  • N. Davey
    • 1
  1. Science and technology research school, University of Hertfordshire, UK
  2. Institute of Systems Biology, Seattle, USA
