Rotational Prior Knowledge for SVMs

  • Arkady Epshteyn
  • Gerald DeJong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)


Incorporating prior knowledge into the learning process can significantly improve classification accuracy when few training samples are available. We show how to introduce prior knowledge into linear support vector machines in the form of constraints on the rotation of the normal to the separating hyperplane. Such knowledge frequently arises naturally, e.g., as inhibitory and excitatory influences of input variables. By analyzing the VC and fat-shattering dimensions of rotationally-constrained classifiers, we demonstrate that their generalization ability is improved. Interestingly, the analysis shows that the large-margin classification framework justifies the use of stronger prior knowledge than the traditional VC framework. Experiments with text categorization and political party affiliation prediction confirm the usefulness of rotational prior knowledge.
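As a rough illustration of the idea, the sketch below trains a linear SVM whose normal vector is restricted by sign constraints, one simple way to encode excitatory and inhibitory influences. It is not the authors' algorithm: the projected subgradient solver, the omission of a bias term, and all names (`train_sign_constrained_svm`, `predict`, `signs`, `lam`) and the toy data are assumptions made for this example.

```python
import random


def train_sign_constrained_svm(X, y, signs, lam=0.1, epochs=200, seed=0):
    """Projected subgradient descent on the L2-regularised hinge loss.

    Assumed encoding of prior knowledge (hypothetical, for illustration):
    signs[j] = +1 forces w[j] >= 0 (excitatory feature),
    signs[j] = -1 forces w[j] <= 0 (inhibitory feature),
    signs[j] =  0 leaves w[j] unconstrained.
    The hyperplane passes through the origin (no bias term).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Subgradient step: shrink for the regulariser ...
            w = [wj * (1.0 - eta * lam) for wj in w]
            # ... and move towards the example if it violates the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
            # Project back onto the cone of allowed normal directions.
            w = [
                max(wj, 0.0) if s > 0 else (min(wj, 0.0) if s < 0 else wj)
                for wj, s in zip(w, signs)
            ]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy data: feature 0 is assumed excitatory, feature 1 inhibitory,
# feature 2 carries no prior knowledge.
X = [[2.0, 0.0, 1.0], [1.0, 0.5, -1.0], [0.0, 1.0, 1.0], [0.5, 2.0, -1.0]]
y = [1, 1, -1, -1]
signs = [1, -1, 0]
w = train_sign_constrained_svm(X, y, signs)
```

Clamping each constrained coordinate is the exact Euclidean projection onto the set of allowed normals here, because the constraint set is a product of half-lines. With very few training examples, the constraint prevents a spurious correlation from flipping the sign of a weight whose direction is known in advance.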


Keywords (added by machine, not by the authors): Support Vector Machine; Prior Knowledge; Neural Information Processing System; Generalization Error; Hypothesis Space



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Arkady Epshteyn, University of Illinois at Urbana-Champaign, Urbana, USA
  • Gerald DeJong, University of Illinois at Urbana-Champaign, Urbana, USA
