Data Mining, pp. 147–158

Prediction with the SVM Using Test Point Margins

  • Süreyya Özöğür-Akyüz
  • Zakria Hussain
  • John Shawe-Taylor
Part of the Annals of Information Systems book series (AOIS, volume 8)


Support vector machines (SVMs) carry out binary classification by constructing a maximal margin hyperplane between the two classes of observed (training) examples and then classifying test points according to the half-spaces in which they reside, irrespective of how far each test example lies from the hyperplane. Cross-validation seeks the single SVM model, together with its optimal parameters, that minimizes training error and is expected to generalize well. In contrast, in this chapter we retain all of the models found in the model selection phase and make predictions according to the model whose hyperplane achieves the maximum separation from a test point. This corresponds directly to the L∞ norm for choosing SVM models at the testing stage. Furthermore, we investigate more general techniques corresponding to different Lp norms and show how these methods allow us to avoid the complex and time-consuming paradigm of cross-validation. Experimental results demonstrate this advantage, showing significant decreases in computational time as well as competitive generalization error.
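The maximum-separation rule described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it assumes each candidate model from the model selection phase is summarized by a linear hyperplane `(w, b)`, and it selects the model whose hyperplane is geometrically farthest from the test point before returning that model's predicted class.

```python
import numpy as np

def predict_max_margin(models, x):
    """Classify x using the candidate SVM whose hyperplane lies
    farthest from x (the L-infinity selection rule).

    models: list of (w, b) pairs, one per model-selection candidate,
            where the decision function is f(x) = w . x + b.
    x:      test point as a NumPy array.
    Returns +1 or -1.
    """
    # Geometric (signed) margin of x under each candidate model.
    margins = [(np.dot(w, x) + b) / np.linalg.norm(w) for w, b in models]
    # Pick the model achieving maximum separation from the test point.
    k = int(np.argmax(np.abs(margins)))
    return int(np.sign(margins[k]))
```

For example, with two hypothetical models separating along different axes, a test point far below the second model's hyperplane is classified by that model, even if the first model would place it (weakly) on the positive side.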


Keywords: Support Vector Machine · Test Point · Support Vector Machine Model · Generalization Error · Covariance Matrix Adaptation Evolution Strategy





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
  2. Department of Computer Science, Centre for Computational Statistics and Machine Learning, University College London, London, UK
