Abstract
We consider convex quadratic problems with a single linear constraint and bounded variables, whose Hessian matrix is huge and dense, as arises in many applications such as the training problem of biased support vector machines. We propose a decomposition algorithmic scheme suitable for parallel implementation and prove its global convergence under suitable conditions. Focusing on support vector machine training, we outline how these assumptions can be satisfied in practice and suggest various specific implementations. Extensions of the theoretical results to general linearly constrained problems are provided. We include numerical results on support vector machines to show the viability and effectiveness of the proposed scheme.
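To make the problem class concrete: the dual of biased SVM training is min ½ αᵀQα − eᵀα subject to yᵀα = 0 and 0 ≤ α ≤ C, and decomposition methods optimize over small working sets while keeping the linear constraint satisfied. The following is a minimal, illustrative Python sketch of a single SMO-style pairwise step for this QP; it is not the parallel scheme proposed in the paper, and all names in it are hypothetical.

```python
import numpy as np

def smo_pair_step(Q, y, alpha, C, i, j):
    """One analytic step on the pair (alpha_i, alpha_j) for the SVM dual
       min 0.5*a@Q@a - a.sum()  s.t.  y@a = 0,  0 <= a <= C,
       with labels y in {+1, -1}. The step moves along the direction
       d = y_i*e_i - y_j*e_j, which leaves y@a unchanged."""
    g = Q @ alpha - 1.0                                    # gradient of the dual objective
    eta = Q[i, i] - 2.0 * y[i] * y[j] * Q[i, j] + Q[j, j]  # curvature along d
    if eta <= 0.0:
        return alpha                                       # skip degenerate pairs in this sketch
    t = -(g[i] * y[i] - g[j] * y[j]) / eta                 # unconstrained minimizer along d

    def bounds(a, s):
        # interval of t such that 0 <= a + t*s <= C, for s in {+1, -1}
        return (-a, C - a) if s > 0 else (a - C, a)

    lo_i, hi_i = bounds(alpha[i], y[i])
    lo_j, hi_j = bounds(alpha[j], -y[j])
    t = np.clip(t, max(lo_i, lo_j), min(hi_i, hi_j))       # project step onto the box

    new = alpha.copy()
    new[i] += t * y[i]
    new[j] -= t * y[j]
    return new
```

A convergent method would add a working-set selection rule (e.g., most-violating pair) and a stopping criterion on the KKT violation; the sketch only shows why the linear constraint is preserved and the objective does not increase at each step.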
Acknowledgements
The authors thank Prof. Marco Sciandrone (Dipartimento di Ingegneria dell'Informazione, Università di Firenze) for fruitful discussions and suggestions that significantly improved the paper.
Additional information
The work of Laura Palagi was partially supported by the Italian Project PLATINO (Grant Agreement No. PON01_01007); the work of Simone Sagratella was partially supported by the Grant: Avvio alla Ricerca 488, Sapienza University of Rome.
Cite this article
Manno, A., Palagi, L. & Sagratella, S. Parallel decomposition methods for linearly constrained problems subject to simple bound with application to the SVMs training. Comput Optim Appl 71, 115–145 (2018). https://doi.org/10.1007/s10589-018-9987-0