Prediction of Binding Sites in the Mouse Genome Using Support Vector Machines

  • Yi Sun
  • Mark Robinson
  • Rod Adams
  • Alistair Rust
  • Neil Davey
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5164)


Computational prediction of cis-regulatory binding sites is widely acknowledged to be difficult. Many different algorithms for locating binding sites are in current use, but most of them produce a high rate of false positive predictions. Moreover, many algorithmic approaches are inherently limited in the range of binding sites they can reliably predict. We propose using support vector machines (SVMs) to predict binding sites by combining multiple sources of evidence. To deal with the imbalanced nature of the data, we combine random under-sampling of the majority class with the synthetic minority over-sampling technique (SMOTE). In addition, we remove some of the final predicted binding sites on the basis of their biological plausibility. The results show that the combined prediction significantly improves on the performance of each of the individual prediction algorithms.
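The abstract does not say how the "multiple sources of evidence" are turned into SVM inputs. A minimal sketch, assuming each individual algorithm gives one score (or 0/1 call) per base and that each base is described by the scores of every algorithm in a small window centred on it, could look like this. The function name `window_features` and the parameter `half_width` are hypothetical; the resulting vectors would then be passed to an SVM library (not shown).

```python
from typing import List, Sequence


def window_features(predictions: Sequence[Sequence[float]],
                    half_width: int) -> List[List[float]]:
    """Build one feature vector per base from several algorithms' outputs.

    Hypothetical encoding (an assumption, not the authors' stated method):
    predictions[a][i] is the score of algorithm a at base i. Each vector
    concatenates, for every algorithm, its scores in a window of
    2 * half_width + 1 bases centred on i; positions past either end of
    the sequence are padded with 0.0.
    """
    if not predictions:
        return []
    n_bases = len(predictions[0])
    if any(len(p) != n_bases for p in predictions):
        raise ValueError("all algorithms must score the same sequence")
    features = []
    for i in range(n_bases):
        vec = []
        for algo in predictions:
            for offset in range(-half_width, half_width + 1):
                j = i + offset
                vec.append(float(algo[j]) if 0 <= j < n_bases else 0.0)
        features.append(vec)
    return features
```

For two algorithms scoring a three-base sequence, `window_features([[1, 0, 0], [0.5, 0.2, 0.9]], 1)` yields three six-element vectors, one per base.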

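The class-imbalance step described in the abstract pairs random under-sampling of the majority (non-binding-site) class with SMOTE over-sampling of the minority (binding-site) class. The sketch below follows the SMOTE interpolation rule of Chawla et al. (a synthetic point is placed at a random position on the segment between a minority example and one of its k nearest minority neighbours). It assumes label 1 marks binding sites and label 0 non-sites, and it takes absolute counts (`n_majority_keep`, `n_synthetic`) rather than the percentages used in the SMOTE paper; these names and the counts are illustrative, not the authors' settings.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def random_undersample(majority: Sequence[Vector], n_keep: int,
                       rng: random.Random) -> List[Vector]:
    """Keep a uniformly random subset of n_keep majority-class vectors."""
    if n_keep >= len(majority):
        return [list(v) for v in majority]
    return [list(v) for v in rng.sample(list(majority), n_keep)]


def _k_nearest(i: int, points: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j)
                   for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_synthetic: int, k: int,
          rng: random.Random) -> List[Vector]:
    """Create n_synthetic minority vectors by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority examples")
    k = min(k, len(minority) - 1)
    neighbours = [_k_nearest(i, minority, k) for i in range(len(minority))]
    synthetic = []
    for s in range(n_synthetic):
        i = s % len(minority)            # cycle through seed examples evenly
        j = rng.choice(neighbours[i])    # one of its k nearest minority neighbours
        gap = rng.random()               # uniform in [0, 1)
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(minority[i], minority[j])])
    return synthetic


def rebalance(X: Sequence[Vector], y: Sequence[int], n_majority_keep: int,
              n_synthetic: int, k: int = 5,
              seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Under-sample class 0 (non-site) and SMOTE-augment class 1 (binding site)."""
    rng = random.Random(seed)
    majority = [list(x) for x, label in zip(X, y) if label == 0]
    minority = [list(x) for x, label in zip(X, y) if label == 1]
    kept = random_undersample(majority, n_majority_keep, rng)
    grown = minority + smote(minority, n_synthetic, k, rng)
    return kept + grown, [0] * len(kept) + [1] * len(grown)
```

Resampling of this kind is normally applied to the training data only; the test data should keep its natural class distribution so that false positive rates are measured realistically.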

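The abstract does not state which criterion of "biological plausibility" is used to remove predicted sites. As one possible rule, assumed here purely for illustration, real transcription factor binding sites span several consecutive bases, so very short runs of positive per-base predictions can be discarded. The function `remove_short_sites` and its `min_length` threshold are hypothetical, and any concrete threshold is an illustrative choice rather than the paper's.

```python
from typing import List, Sequence


def remove_short_sites(labels: Sequence[int], min_length: int) -> List[int]:
    """Set to 0 every run of consecutive 1s shorter than min_length.

    labels holds one binary prediction per base (1 = predicted binding site).
    The input is not modified; a filtered copy is returned.
    """
    out = list(labels)
    n = len(out)
    i = 0
    while i < n:
        if out[i] == 1:
            j = i
            while j < n and out[j] == 1:
                j += 1
            # out[i:j] is a maximal run of predicted binding-site bases
            if j - i < min_length:
                for t in range(i, j):
                    out[t] = 0
            i = j
        else:
            i += 1
    return out
```

For example, with `min_length=3` the prediction `[0, 1, 0, 1, 1, 1, 0, 1, 1]` keeps only the three-base run in the middle.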
Keywords: Support Vector Machine; Mouse Genome; Post Processing; Minority Class; False Positive Prediction
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yi Sun (1)
  • Mark Robinson (2)
  • Rod Adams (1)
  • Alistair Rust (3)
  • Neil Davey (1)

  1. Science and Technology Research School, University of Hertfordshire, United Kingdom
  2. Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, USA
  3. Institute for Systems Biology, Seattle, USA
