Reconstructing a Sparse Solution from a Compressed Support Vector Machine

  • Joachim Giesen
  • Sören Laue
  • Jens K. Mueller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9582)


A support vector machine is a means for computing a binary classifier from a set of observations. Here we assume that the observations are n feature vectors, each of length m, together with n binary labels, one for each observed feature vector. The feature vectors can be combined into an \(n\times m\) feature matrix. The classifier is computed via an optimization problem that depends on the feature matrix. The solution of this optimization problem is a vector of dimension m from which a classifier with good generalization properties can be computed directly. Here we show that the feature matrix can be replaced by a compressed feature matrix that comprises n feature vectors of length \(\ell < m\). The solution of the optimization problem for the compressed feature matrix has only dimension \(\ell\) and can be computed faster since the optimization problem is smaller. Still, the solution of the compressed problem needs to be related to the original solution. We present a simple scheme that reconstructs the original solution from a solution of the compressed problem up to a small error. For the reconstruction guarantees we assume that the solution of the original problem is sparse. We show that sparse solutions can be promoted by a feature selection approach.



This work has been carried out within the project CG Learning. The project CG Learning acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number: 255827.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Friedrich-Schiller-Universität Jena, Jena, Germany
