
Machine Learning, Volume 108, Issue 11, pp. 1879–1917

Engineering fast multilevel support vector machines

  • Ehsan Sadrfaridpour
  • Talayeh Razzaghi
  • Ilya Safro
Article

Abstract

The computational complexity of solving nonlinear support vector machines (SVM) is prohibitive on large-scale data. The issue becomes particularly acute when the data exhibits additional difficulties such as highly imbalanced class sizes. Nonlinear kernels typically produce significantly higher classification quality than linear kernels, but they introduce extra kernel and model parameters whose fitting is computationally expensive; this improves quality but degrades performance dramatically. We introduce a generalized fast multilevel framework for regular and weighted SVM and discuss several versions of its algorithmic components that lead to a good trade-off between quality and time. Our framework is implemented using PETSc, which allows easy integration with scientific computing tasks. The experimental results demonstrate a significant speedup compared to state-of-the-art nonlinear SVM libraries. Reproducibility: our source code, documentation and parameters are available at https://github.com/esadr/mlsvm.
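To make the cost the abstract refers to concrete, the sketch below illustrates why fitting a nonlinear, class-weighted SVM is expensive: every (C, gamma) pair in a model-selection grid triggers a full set of cross-validated kernel-SVM training runs. This is our own illustration using scikit-learn on synthetic imbalanced data, not the authors' PETSc-based mlsvm implementation; all parameter values and dataset sizes are placeholders.

```python
# Minimal sketch (not the paper's mlsvm code): grid search over the RBF kernel
# parameter gamma and the penalty C for a class-weighted (cost-sensitive) SVM.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic, highly imbalanced two-class data stands in for the large
# real datasets considered in the paper.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# 'balanced' scales each class's penalty inversely to its size -- the standard
# weighted-SVM variant for imbalanced classes.
grid = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),
    param_grid={"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]},
    cv=5, scoring="f1", n_jobs=-1,
)
grid.fit(X, y)  # 4 x 4 x 5 = 80 kernel-SVM training runs for one small grid
print(grid.best_params_, grid.best_score_)
```

Each added grid point or fold multiplies the number of nonlinear training runs, which is exactly the parameter-fitting expense the multilevel framework is designed to reduce.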

Keywords

Classification · Support vector machine · Parameter fitting · Imbalanced learning · Hierarchical method · Multilevel method · PETSc

Acknowledgements

We would like to thank three anonymous reviewers whose valuable comments helped to improve this paper significantly. This material is based upon work supported by the National Science Foundation under Grants Nos. 1638321 and 1522751.


Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computing, Clemson University, Clemson, USA
  2. Department of Industrial Engineering, New Mexico State University, Las Cruces, USA
