Reconstructing a Sparse Solution from a Compressed Support Vector Machine
A support vector machine computes a binary classifier from a set of observations. Here we assume that the observations are n feature vectors, each of length m, together with n binary labels, one for each feature vector. The feature vectors can be combined into an \(n\times m\) feature matrix. The classifier is computed via an optimization problem that depends on this feature matrix. The solution of the optimization problem is a vector of dimension m, from which a classifier with good generalization properties can be obtained directly. Here we show that the feature matrix can be replaced by a compressed feature matrix comprising n feature vectors of length \(\ell <m\). The solution of the optimization problem for the compressed feature matrix has dimension only \(\ell \) and can be computed faster, since the optimization problem is smaller. Still, the solution of the compressed problem needs to be related to the original solution. We present a simple scheme that reconstructs the original solution from a solution of the compressed problem up to a small error. The reconstruction guarantees assume that the solution of the original problem is sparse. We show that sparse solutions can be promoted by a feature selection approach.
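The compress–train–reconstruct pipeline described above can be illustrated with a small numerical sketch. The following is not the paper's scheme but a simplified, hypothetical analogue: the feature matrix is compressed with a Gaussian sketching matrix, a linear soft-margin SVM is trained in the compressed space by subgradient descent on the hinge loss (Pegasos-style), and an m-dimensional solution is reconstructed by mapping the \(\ell \)-dimensional solution back and hard-thresholding to the assumed sparsity level. All dimensions, the toy data model, and the thresholding step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, ell, k = 200, 500, 80, 5   # observations, features, compressed length, sparsity

# hypothetical sparse "true" weight vector with k nonzero entries
w_true = np.zeros(m)
support = rng.choice(m, size=k, replace=False)
w_true[support] = rng.choice([-1.0, 1.0], size=k)
u = w_true / np.linalg.norm(w_true)

# toy data: each feature vector sits margin-deep on its label's side of the
# hyperplane defined by w_true, plus small isotropic noise
y = rng.choice([-1.0, 1.0], size=n)
X = 2.0 * y[:, None] * u[None, :] + 0.1 * rng.standard_normal((n, m))

# compress the n x m feature matrix with an ell x m Gaussian sketch R
R = rng.standard_normal((ell, m)) / np.sqrt(ell)
Xc = X @ R.T                                  # n x ell compressed feature matrix

# train a linear soft-margin SVM on the compressed data
# (full-batch subgradient descent on the regularized hinge loss)
lam, w_ell = 0.01, np.zeros(ell)
for t in range(1, 2001):
    viol = y * (Xc @ w_ell) < 1.0             # margin violators
    grad = lam * w_ell - (y[viol, None] * Xc[viol]).sum(axis=0) / n
    w_ell -= grad / (lam * t)

# reconstruct an m-dimensional solution and exploit sparsity by keeping
# only the k largest-magnitude coordinates (hard thresholding)
w_hat = R.T @ w_ell
w_sparse = np.where(np.abs(w_hat) >= np.sort(np.abs(w_hat))[-k], w_hat, 0.0)

# accuracy of the reconstructed sparse classifier on the original data
acc = np.mean(np.sign(X @ w_sparse) == y)
```

Note that the optimization runs entirely in \(\ell \) dimensions; the original dimension m only reappears in the final reconstruction step, which is a single matrix-vector product followed by thresholding.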
This work was carried out within the project CG Learning, which acknowledges the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme for Research of the European Commission, under FET-Open grant number 255827.