Abstract
We propose a decision tree using local support vector regression models (tSVR) to handle regression tasks on large datasets. The tSVR learning algorithm trains the regression model in two main steps. The first step constructs a decision tree regressor that partitions the full training dataset into k terminal nodes (subsets); the second step learns an SVR model from each terminal node to predict the data locally, in parallel on multi-core computers. The tSVR algorithm trains non-linear regression models from large datasets faster than the standard SVR while maintaining high prediction accuracy. Numerical test results on datasets from the UCI repository show that the proposed tSVR is efficient compared with the standard SVR.
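The two-step scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `DecisionTreeRegressor` and `SVR` stand in for the paper's tree regressor and LibSVM-based local models, and the class name `TreeLocalSVR` and parameter `min_leaf` are hypothetical choices for this sketch (the paper parallelizes leaf training with OpenMP; here the leaves are simply fitted in a loop).

```python
# Sketch of the tSVR idea: partition the training set with a decision tree
# regressor, then fit one local SVR per terminal node; prediction routes
# each query point to the SVR of the leaf it falls into.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR

class TreeLocalSVR:
    def __init__(self, min_leaf=100, **svr_params):
        # min_leaf bounds the terminal-node size; svr_params (kernel, C, ...)
        # are forwarded to every local SVR model
        self.tree = DecisionTreeRegressor(min_samples_leaf=min_leaf)
        self.svr_params = svr_params
        self.models = {}

    def fit(self, X, y):
        # Step 1: the tree splits the full dataset into terminal nodes
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)  # terminal-node id for each sample
        # Step 2: one local SVR per terminal node (parallelizable per leaf)
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.models[leaf] = SVR(**self.svr_params).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        y_pred = np.empty(X.shape[0])
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            y_pred[mask] = self.models[leaf].predict(X[mask])
        return y_pred
```

Because each local SVR solves a quadratic program over only its leaf's samples rather than the whole dataset, training cost drops sharply, which is the source of the speed-up the abstract claims.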
Notes
- 1.
Note that the complexity analysis of tSVR excludes the tree regressor learnt to split the full dataset. However, training this tree regressor has a very low computational cost compared with the quadratic programming solution required by the SVR learning algorithm.
References
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995). https://doi.org/10.1007/978-1-4757-3264-1
Guyon, I.: Web page on SVM applications (1999). http://www.clopinet.com/isabelle/Projects/-SVM/app-list.html
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.: Classification and Regression Trees. Wadsworth International, Belmont (1984)
Lichman, M.: UCI machine learning repository (2013)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines: and Other Kernel-Based Learning Methods. Cambridge University Press, New York (2000)
Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods Support Vector Learning, pp. 185–208 (1999)
OpenMP Architecture Review Board: OpenMP application program interface V3.0 (2008)
Vapnik, V.: Principles of risk minimization for learning theory. In: Advances in Neural Information Processing Systems 4, NIPS Conference, Denver, Colorado, USA, 2–5 December 1991, pp. 831–838 (1991)
Bottou, L., Vapnik, V.: Local learning algorithms. Neural Comput. 4(6), 888–900 (1992)
Vapnik, V., Bottou, L.: Local algorithms for pattern recognition and dependencies estimation. Neural Comput. 5(6), 893–909 (1993)
Do, T., Poulet, F.: Parallel learning of local SVM algorithms for classifying large datasets. T. Large-Scale Data-Knowl.-Cent. Syst. 31, 67–93 (2016)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(27), 1–27 (2011)
Lin, C.: A practical guide to support vector classification (2003)
Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B 39(1), 1–38 (1977)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
Collobert, R., Bengio, S., Bengio, Y.: A parallel mixture of SVMs for very large scale problems. Neural Comput. 14(5), 1105–1114 (2002)
Do, T., Poulet, F.: Classifying very high-dimensional and large-scale multi-class image datasets with latent-LSVM. In: 2016 International IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, July 18–21, 2016, pp. 714–721 (2016)
Do, T., Poulet, F.: Latent-LSVM classification of very high-dimensional and large-scale multi-class datasets. Concurr. Comput.: Pract. Exp., e4224
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Gu, Q., Han, J.: Clustered support vector machines. In: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2013, Scottsdale, AZ, USA, 29 April–1 May 2013, vol. 31, pp. 307–315. JMLR Proceedings (2013)
Bui, L.-D., Tran-Nguyen, M.-T., Kim, Y.-G., Do, T.-N.: Parallel algorithm of local support vector regression for large datasets. In: Dang, T.K., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E.J. (eds.) FDSE 2017. LNCS, vol. 10646, pp. 139–153. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70004-5_10
Do, T.-N.: Non-linear classification of massive datasets with a parallel algorithm of local support vector machines. In: Le Thi, H.A., Nguyen, N.T., Do, T.V. (eds.) Advanced Computational Methods for Knowledge Engineering. AISC, vol. 358, pp. 231–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-17996-4_21
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley, January 1967
Do, T.-N., Poulet, F.: Random local SVMs for classifying large datasets. In: Dang, T.K., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E. (eds.) FDSE 2015. LNCS, vol. 9446, pp. 3–15. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26135-5_1
Chang, F., Guo, C.Y., Lin, X.R., Lu, C.J.: Tree decomposition for large-scale SVM problems. J. Mach. Learn. Res. 11, 2935–2972 (2010)
Chang, F., Liu, C.C.: Decision tree as an accelerator for support vector machines. In: Ding, X., (ed.) Advances in Character Recognition. InTech (2012)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Burlington (1993)
Vincent, P., Bengio, Y.: K-local hyperplane and convex distance nearest neighbor algorithms. In: Advances in Neural Information Processing Systems, pp. 985–992. The MIT Press (2001)
Zhang, H., Berg, A., Maire, M., Malik, J.: SVM-KNN: discriminative nearest neighbor classification for visual category recognition. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 2126–2136 (2006)
Yang, T., Kecman, V.: Adaptive local hyperplane classification. Neurocomputing 71(13–15), 3001–3004 (2008)
Segata, N., Blanzieri, E.: Fast and scalable local kernel machines. J. Mach. Learn. Res. 11, 1883–1926 (2010)
Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 97–104. ACM (2006)
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Tran-Nguyen, MT., Bui, LD., Kim, YG., Do, TN. (2018). Decision Tree Using Local Support Vector Regression for Large Datasets. In: Nguyen, N., Hoang, D., Hong, TP., Pham, H., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2018. Lecture Notes in Computer Science, vol 10751. Springer, Cham. https://doi.org/10.1007/978-3-319-75417-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75416-1
Online ISBN: 978-3-319-75417-8