Abstract
Support Vector Machines (SVMs) suffer from a widely recognized scalability problem in both memory use and computation time. To improve scalability, we have developed a parallel SVM algorithm (PSVM), which reduces memory use by performing a row-based, approximate matrix factorization, and which loads only essential data onto each machine to perform parallel computation. Let \(n\) denote the number of training instances, \(p\) the reduced matrix dimension after factorization (\(p\) is significantly smaller than \(n\)), and \(m\) the number of machines. PSVM reduces the memory requirement of the Interior Point Method from \(\mathcal{O}(n^2)\) to \(\mathcal{O}(np/m)\), and improves computation time to \(\mathcal{O}(np^2/m)\). Empirical studies show PSVM to be effective. This chapter\(^\dagger\) was first published in NIPS'07 [1] and the open-source code was made available at [2].
†© NIPS, 2007. This chapter is a minor revision of the author's work with Kaihua Zhu, Hongjie Bai, Hao Wang, Zhihuan Qiu, Jian Li, and Hang Cui published in NIPS'07 and then in Scaling Up Machine Learning by Cambridge University Press. Permission to publish this chapter is granted by copyright agreements.
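To make the abstract's memory claim concrete, below is a minimal single-machine sketch (in Python/NumPy, not the C++ of the released code [2]) of the kind of row-based, rank-\(p\) incomplete Cholesky factorization the chapter builds on: it produces an \(n \times p\) matrix \(H\) with \(HH^{T} \approx Q\), so only \(\mathcal{O}(np)\) values ever need to be kept. For readability the sketch materializes the full kernel matrix \(Q\), which a scalable implementation avoids by computing kernel entries on demand; the function name `icf` and its parameters are illustrative, not PSVM's actual API.

```python
import numpy as np

def icf(Q, p):
    """Greedy pivoted incomplete Cholesky factorization (a sketch).

    Returns H of shape (n, p) with H @ H.T approximating the n x n
    positive semi-definite kernel matrix Q, so only O(np) values
    (rather than O(n^2)) need to be stored.
    """
    n = Q.shape[0]
    H = np.zeros((n, p))
    d = np.diag(Q).astype(float).copy()   # residual diagonal of Q - H @ H.T
    pivoted = np.zeros(n, dtype=bool)
    for k in range(p):
        # Pick the unpivoted row with the largest residual diagonal.
        i = int(np.argmax(np.where(pivoted, -np.inf, d)))
        if d[i] <= 1e-12:                 # residual exhausted: rank(Q) < p
            break
        H[i, k] = np.sqrt(d[i])
        rest = ~pivoted
        rest[i] = False
        H[rest, k] = (Q[rest, i] - H[rest, :k] @ H[i, :k]) / H[i, k]
        d[rest] -= H[rest, k] ** 2
        pivoted[i] = True
    return H

# Toy check: rank-50 approximation of a 500 x 500 RBF kernel matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
Q = np.exp(-0.1 * sq)
H = icf(Q, p=50)
print(np.abs(Q - H @ H.T).max())          # approximation error; shrinks as p grows
```

Distributing the \(n\) rows of \(H\) evenly across \(m\) machines then leaves each machine holding roughly \(np/m\) values, which is where the \(\mathcal{O}(np/m)\) memory bound quoted in the abstract comes from.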
Notes
- 1. RCV is located at http://jmlr.csail.mit.edu/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.ht. The image set is a binary-class image dataset consisting of \(144\) perceptual features. The others are obtained from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets. We separated the datasets into training/testing sets (see Table 10.1 for the splits) and performed cross validation.
- 2. We observed super-linear speedup when 30 machines were used for training Image and when up to 50 machines were used for RCV. We believe that this super-linear speedup resulted from a performance gain in the memory-management system when the physical memory was not in contention with other processes running at the data center. This benefit was cancelled by other overheads (explained in Sect. 10.4.3) when more machines were employed.
References
1. E.Y. Chang, K. Zhu, H. Wang, H. Bai, J. Li, Z. Qiu, H. Cui, Parallelizing support vector machines on distributed computers, in Proceedings of NIPS, 2007
2. E.Y. Chang, H. Bai, K. Zhu, H. Wang, J. Li, Z. Qiu, Google PSVM open source. http://code.google.com/p/psvm/
3. V. Vapnik, The Nature of Statistical Learning Theory (Springer, New York, 1995)
4. S. Mehrotra, On the implementation of a primal-dual interior point method. SIAM J. Optim. 2, 575–601 (1992)
5. T. Joachims, Making large-scale SVM learning practical, in Advances in Kernel Methods—Support Vector Learning, ed. by B. Schölkopf, C. Burges, A. Smola (MIT Press, Cambridge, 1999)
6. C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, 2001
7. J. Platt, Sequential minimal optimization: a fast algorithm for training support vector machines. Technical Report MSR-TR-98-14, Microsoft Research, 1998
8. S. Vishwanathan, A.J. Smola, M.N. Murty, Simple SVM, in Proceedings of ICML, 2003, pp. 760–767
9. T. Joachims, Training linear SVMs in linear time, in Proceedings of ACM KDD, 2006, pp. 217–226
10. C.T. Chu, S.K. Kim, Y.A. Lin, Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun, Map-reduce for machine learning on multicore, in Proceedings of NIPS, 2006, pp. 281–288
11. Y.J. Lee, S.Y. Huang, Reduced support vector machines: a statistical theory. IEEE Trans. Neural Networks 18(1), 1–13 (2007)
12. I.W. Tsang, J.T. Kwok, P.M. Cheung, Core vector machines: fast SVM training on very large data sets. J. Mach. Learn. Res. 6, 363–392 (2005)
13. H.P. Graf, E. Cosatto, L. Bottou, I. Durdanovic, V. Vapnik, Parallel support vector machines: the cascade SVM, in Proceedings of NIPS, 2005, pp. 521–528
14. S. Boyd, L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, 2004)
15. G.H. Golub, C.F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, 1996)
16. S. Fine, K. Scheinberg, Efficient SVM training using low-rank kernel representations. J. Mach. Learn. Res. 2, 243–264 (2001)
17. F.R. Bach, M.I. Jordan, Predictive low-rank decomposition for kernel methods, in Proceedings of International Conference on Machine Learning (ICML), 2005, pp. 33–40
18. J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters, in Proceedings of OSDI: Symposium on Operating System Design and Implementation, 2004, pp. 137–150
19. S. Ghemawat, H. Gobioff, S.T. Leung, The Google file system, in Proceedings of 19th ACM Symposium on Operating Systems Principles, 2003, pp. 29–43
20. G. Loosli, S. Canu, Comments on the core vector machines: fast SVM training on very large data sets. J. Mach. Learn. Res. 8, 291–301 (2007)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press
Cite this chapter
Chang, E.Y. (2011). PSVM: Parallelizing Support Vector Machines on Distributed Computers. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_10
DOI: https://doi.org/10.1007/978-3-642-20429-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20428-9
Online ISBN: 978-3-642-20429-6