Computational Statistics, Volume 23, Issue 4, pp 623–641

Penalised spline support vector classifiers: computational issues

Original Paper


Abstract

We study computational issues for support vector classification with penalised spline kernels. We show that, compared with traditional kernels, computational times can be reduced drastically in large problems, making sample sizes as large as ~10^6 feasible. The optimisation technology known as interior point methods plays a central role. Penalised spline kernels are also shown to allow simple incorporation of low-dimensional structure, such as additivity, which can aid both interpretability and performance.


Keywords: Additive models · Interior point methods · Low-dimensional structure · Low-rank kernels · Semiparametric regression · Support vector machines





Copyright information

© Springer-Verlag 2008

Authors and Affiliations

School of Mathematics and Statistics, University of New South Wales, Sydney, Australia
