Abstract
We show how to optimize a Support Vector Machine and a predictor for Collaborative Filtering with Stochastic Gradient Descent on the GPU, achieving speedups of 1.66× to 6× over a CPU-based implementation. The reference implementations are the Support Vector Machine by Bottou and the BRISMF predictor from the Netflix Prize winning team. Our main idea is to compute a hash function of the input data and use it to schedule parallel threads so that they write to different elements of the parameter vector. We also compare the iterative optimization with batch gradient descent and alternating least squares. The predictor is tested on more than a hundred million data points, which demonstrates the growing memory management capabilities of modern GPUs. We use matrix compression as well as float compression to alleviate the memory bottleneck.
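The hashing idea can be illustrated with a small sketch (in Python rather than CUDA, and with names of our own choosing, not the paper's): for the matrix-factorization predictor, bucketing the rating triples by user id guarantees that two different buckets never update the same row of the user-factor matrix, so the buckets could be processed by concurrent GPU threads without locks. Here the item factors are held fixed during the pass; in a full implementation they would be refreshed in a second pass partitioned by item id.

```python
import numpy as np

def hash_partition(ratings, n_buckets):
    # Bucket (user, item, rating) triples by user id modulo n_buckets.
    # Every triple in a bucket updates only that bucket's user rows of P,
    # so distinct buckets never write to the same parameter element.
    buckets = [[] for _ in range(n_buckets)]
    for u, i, r in ratings:
        buckets[u % n_buckets].append((u, i, r))
    return buckets

def sgd_epoch(buckets, P, Q, lr=0.05, reg=0.02):
    # One SGD pass over all buckets, updating user factors only.
    # The outer loop is conceptually parallel: one GPU thread per bucket.
    for bucket in buckets:
        for u, i, r in bucket:
            err = r - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])

# Tiny synthetic demo: recover user factors for known item factors.
rng = np.random.default_rng(0)
n_users, n_items, k = 40, 30, 4
P_true = rng.normal(size=(n_users, k))
Q = rng.normal(size=(n_items, k))
ratings = [(u, i, float(P_true[u] @ Q[i]))
           for u in range(n_users) for i in range(n_items)]

P = np.zeros((n_users, k))
buckets = hash_partition(ratings, n_buckets=8)

def rmse():
    return float(np.sqrt(np.mean([(r - P[u] @ Q[i]) ** 2
                                  for u, i, r in ratings])))

before = rmse()
for _ in range(20):
    sgd_epoch(buckets, P, Q)
after = rmse()
```

This is only a serial simulation of the scheduling argument; the paper's contribution concerns the GPU execution itself, including memory management and compression, which the sketch does not model.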
References
Bottou, L.: Stochastic gradient SVM (2010), http://leon.bottou.org/projects/sgd#stochastic_gradient_svm
Carpenter, A.: cuSVM: a CUDA implementation of SVM (2009), http://patternsonascreen.net/cuSVMDesc.pdf
Catanzaro, B., Sundaram, N., Keutzer, K.: Fast support vector machine training and classification on graphics processors. In: ICML, pp. 104–111 (2008)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press (2000)
Kato, K., Hosino, T.: Singular value decomposition for collaborative filtering on a GPU. IOP Conference Series: Materials Science and Engineering 10(1), 012017 (2010)
Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42, 30–37 (2009)
Applegate, D.L., Bixby, R.E., Chvátal, V., Cook, W.J.: The Traveling Salesman Problem: A Computational Study. Princeton University Press (2006)
Platt, J.C.: Sequential minimal optimization: A fast algorithm for training support vector machines (1998), http://research.microsoft.com/pubs/69644/tr-98-14.pdf
Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press (2001)
Summa, M.G., Bottou, L., Goldfarb, B., Murtagh, F., Pardoux, C., Touati, M. (eds.): Statistical Learning and Data Science. Chapman & Hall (2011)
Takács, G., Pilászy, I., Németh, B., Tikk, D.: Matrix factorization and neighbor based algorithms for the Netflix Prize problem. In: ACM Conference on Recommender Systems, pp. 267–274 (2008)
Töscher, A., Jahrer, M., Bell, R.M.: The BigChaos solution to the Netflix Grand Prize (2009)
Vapnik, V.N., Chervonenkis, A.Y.: Theory of Pattern Recognition. Nauka, USSR (1974) (in Russian)
Zastrau, D.: Beschleunigte Maschinelle Lernverfahren auf der GPU (2011), http://anonstorage.net/PStorage/74.diplomarbeit-david-zastrau.pdf
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Zastrau, D., Edelkamp, S. (2012). Stochastic Gradient Descent with GPGPU. In: Glimm, B., Krüger, A. (eds.) KI 2012: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol. 7526. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33347-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33346-0
Online ISBN: 978-3-642-33347-7
eBook Packages: Computer Science, Computer Science (R0)