Engineering fast multilevel support vector machines
Training a nonlinear support vector machine (SVM) is computationally prohibitive on large-scale data. The problem becomes especially acute when the data presents additional difficulties such as highly imbalanced class sizes. Nonlinear kernels typically produce significantly higher classification quality than linear kernels, but they introduce extra kernel and model parameters that require computationally expensive fitting. This improves quality but dramatically reduces computational performance. We introduce a generalized fast multilevel framework for regular and weighted SVM and discuss several versions of its algorithmic components that lead to a good trade-off between quality and time. Our framework is implemented using PETSc, which allows easy integration with scientific computing tasks. The experimental results demonstrate a significant speedup compared to state-of-the-art nonlinear SVM libraries. Reproducibility: our source code, documentation, and parameters are available at https://github.com/esadr/mlsvm.
Keywords: Classification, Support vector machine, Parameter fitting, Imbalanced learning, Hierarchical method, Multilevel method, PETSc
We would like to thank three anonymous reviewers whose valuable comments helped to improve this paper significantly. This material is based upon work supported by the National Science Foundation under Grants Nos. 1638321 and 1522751.