Abstract
We propose using local labeling rules in random forests of decision trees to classify data effectively. Classical decision trees label each terminal node with the majority class of its samples, a rule that can degrade the classification performance of the random forest algorithm. Our investigation replaces these majority rules with local ones, namely support vector machines (SVMs), to improve the prediction accuracy of decision forests. Numerical results on 8 datasets from the UCI repository and 2 handwritten character recognition benchmarks show that our proposal is more accurate than the classical random forest algorithm.
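To make the idea concrete, the sketch below shows one way to build a single randomized tree whose terminal nodes are labeled by local SVMs rather than by majority vote: terminal nodes are kept reasonably large via a minimum leaf size, each impure leaf trains an SVM on its own samples, and pure leaves keep the majority label. This is a minimal illustration, not the paper's implementation; it assumes scikit-learn, and the class name `TreeWithLocalSVMs`, the `min_leaf_svm` threshold, and the RBF kernel are illustrative choices rather than the authors' settings.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

class TreeWithLocalSVMs:
    """One randomized tree; large impure leaves are labeled by local SVMs."""

    def __init__(self, min_leaf_svm=20, seed=None):
        # min_samples_leaf keeps terminal nodes large enough to fit a local model;
        # max_features="sqrt" gives the usual random-forest split randomization.
        self.tree = DecisionTreeClassifier(max_features="sqrt",
                                           min_samples_leaf=min_leaf_svm,
                                           random_state=seed)
        self.leaf_svm = {}       # leaf id -> fitted local SVM
        self.leaf_majority = {}  # leaf id -> majority class (fallback rule)

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)            # leaf reached by each training sample
        for leaf in np.unique(leaves):
            idx = np.where(leaves == leaf)[0]
            self.leaf_majority[leaf] = Counter(y[idx]).most_common(1)[0][0]
            if len(np.unique(y[idx])) > 1:     # impure leaf: learn a local SVM
                self.leaf_svm[leaf] = SVC(kernel="rbf").fit(X[idx], y[idx])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        preds = []
        for i, leaf in enumerate(leaves):
            svm = self.leaf_svm.get(leaf)
            preds.append(svm.predict(X[i:i + 1])[0] if svm is not None
                         else self.leaf_majority[leaf])
        return np.asarray(preds)
```

A forest is then obtained exactly as in classical random forests: fit each such tree on a bootstrap sample of the training set and take the majority vote of the per-tree predictions; only the rule used to label the terminal nodes changes.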
Notes
1. Two classifiers are diverse if they make different errors on new data points [16].
2. We varied the number of decision trees from 20 to 500 to find the best experimental results.
3. The time complexity of learning an SVM model is of the order of \(n^2\), where \(n\) is the number of training examples [38].
References
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.: Classification and Regression Trees. Wadsworth International, Belmont (1984)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.H., Steinbach, M., Hand, D.J., Steinberg, D.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2007)
Rokach, L., Maimon, O.Z.: Data Mining with Decision Trees: Theory and Applications, vol. 69. World Scientific Pub Co Inc, Singapore (2008)
Cutler, A., Cutler, D.R., Stevens, J.R.: Tree-based methods. In: Li, X., Xu, R. (eds.) High-Dimensional Data Analysis in Cancer Research. Applied Bioinformatics and Biostatistics in Cancer Research, pp. 1–19. Springer, New York (2009)
Berry, M.J., Linoff, G.: Data Mining Techniques: For Marketing, Sales, and Customer Support. Wiley, New York (2011)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Michie, D., Spiegelhalter, D.J., Taylor, C.C. (eds.): Machine Learning, Neural and Statistical Classification. Ellis Horwood, New York (1994)
Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)
Valiant, L.: A theory of the learnable. Commun. ACM 27(11), 1134–1142 (1984)
Kearns, M., Valiant, L.G.: Learning boolean formulae or finite automata is as hard as factoring. Technical report TR 14–88, Harvard University Aiken Computation Laboratory (1988)
Kearns, M., Valiant, L.: Cryptographic limitations on learning boolean formulae and finite automata. J. ACM 41(1), 67–95 (1994)
Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma. Neural Comput. 4(1), 1–58 (1992)
Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
Domingos, P.: A unified bias-variance decomposition. In: Proceedings of 17th International Conference on Machine Learning, pp. 231–238. Morgan Kaufmann, Stanford, CA (2000)
Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000)
Freund, Y., Schapire, R.: A decision-theoretic generalization of on-line learning and an application to boosting. In: Computational Learning Theory: Proceedings of the Second European Conference, pp. 23–37 (1995)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Asuncion, A., Newman, D.: UCI repository of machine learning databases (2007)
van der Maaten, L.: A new benchmark dataset for handwritten character recognition (2009). http://homepage.tudelft.nl/19j49/Publications_files/characters.zip
LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., Jackel, L.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, New York (2009)
Breiman, L.: Arcing classifiers. Ann. Stat. 26(3), 801–849 (1998)
Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Mach. Learn. 40(2), 139–157 (2000)
Ho, T.K.: Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition, pp. 278–282 (1995)
Amit, Y., Geman, D.: Shape quantization and recognition with randomized trees. Neural Comput. 9(7), 1545–1588 (1997)
Vapnik, V.: Principles of risk minimization for learning theory. In: Moody, J.E., Hanson, S.J., Lippmann, R.P. (eds.) Advances in Neural Information Processing Systems, vol. 4, pp. 831–838. Morgan Kaufmann, San Mateo (1991)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines: And Other Kernel-based Learning Methods. Cambridge University Press, New York (2000)
Weston, J., Watkins, C.: Support vector machines for multi-class pattern recognition. In: Proceedings of the Seventh European Symposium on Artificial Neural Networks, pp. 219–224 (1999)
Guermeur, Y.: SVM multiclasses, théorie et applications [Multi-class SVMs, theory and applications] (2007)
Kreßel, U.: Pairwise classification and support vector machines. In: Smola, A., et al. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 255–268. MIT Press, Cambridge (1999)
Platt, J., Cristianini, N., Shawe-Taylor, J.: Large margin DAGs for multiclass classification. In: Solla, S.A., Leen, T.K., Müller, K. (eds.) Advances in Neural Information Processing Systems, vol. 12, pp. 547–553. MIT Press, Cambridge (2000)
Vural, V., Dy, J.: A hierarchical method for multi-class support vector machines. In: Proceedings of the Twenty-First International Conference on Machine Learning, pp. 831–838 (2004)
Benabdeslem, K., Bennani, Y.: Dendogram-based SVM for multi-class classification. J. Comput. Inf. Technol. 14(4), 283–289 (2006)
Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(4), 1871–1874 (2008)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
Vapnik, V., Bottou, L.: Local algorithms for pattern recognition and dependencies estimation. Neural Comput. 5(6), 893–909 (1993)
Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)
Hsu, C.-W., Chang, C.-C., Lin, C.-J.: A practical guide to support vector classification. Technical report, National Taiwan University (2003)
Murthy, S., Kasif, S., Salzberg, S., Beigel, R.: OC1: randomized induction of oblique decision trees. In: Proceedings of the Eleventh National Conference on Artificial Intelligence, pp. 322–327 (1993)
Wu, W., Bennett, K., Cristianini, N., Shawe-Taylor, J.: Large margin trees for induction and transduction. In: Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999), pp. 474–483 (1999)
Rokach, L., Maimon, O.: Top-down induction of decision trees classifiers - a survey. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 35(4), 476–487 (2005)
Bennett, K.P., Mangasarian, O.L.: Multicategory discrimination via linear programming. Optim. Meth. Softw. 3, 27–39 (1994)
Loh, W.Y., Vanichsetakul, N.: Tree-structured classification via generalized discriminant analysis (with discussion). J. Am. Stat. Assoc. 83, 715–728 (1988)
Yildiz, O., Alpaydin, E.: Linear discriminant trees. Int. J. Pattern Recogn. Artif. Intell. 19(3), 323–353 (2005)
Cutler, A., Zhao, G.: PERT - perfect random tree ensembles. Comput. Sci. Stat. 33, 490–497 (2001)
Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
Do, T.-N., Lenca, P., Lallich, S., Pham, N.-K.: Classifying very-high-dimensional data with random forests of oblique decision trees. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds.) Advances in Knowledge Discovery and Management. SCI, vol. 292, pp. 39–55. Springer, Heidelberg (2010)
Robnik-Sikonja, M.: Improving random forests. In: Proceedings of the Fifteenth European Conference on Machine Learning (ECML 2004), pp. 359–370 (2004)
Friedman, J.H., Kohavi, R., Yun, Y.: Lazy decision trees. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence and Eighth Innovative Applications of Artificial Intelligence Conference, AAAI 1996, IAAI 1996, vol. 1, pp. 717–724, Portland, Oregon, 4–8 Aug 1996
Kohavi, R., Kunz, C.: Option decision trees with majority votes. In: Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), pp. 161–169, Nashville, Tennessee, USA, 8–12 Jul 1997
Marcellin, S., Zighed, D., Ritschard, G.: An asymmetric entropy measure for decision trees. In: IPMU 2006, Paris, France, pp. 1292–1299 (2006)
Lenca, P., Lallich, S., Do, T.-N., Pham, N.-K.: A comparison of different off-centered entropies to deal with class imbalance for decision trees. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 634–643. Springer, Heidelberg (2008)
Do, T., Lenca, P., Lallich, S.: Enhancing network intrusion classification through the Kolmogorov-Smirnov splitting criterion. In: ICTACS 2010, pp. 50–61, Vietnam (2010)
Kohavi, R.: Scaling up the accuracy of naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 202–207, Portland, Oregon, USA (1996)
Seewald, A.K., Petrak, J., Widmer, G.: Hybrid decision tree learners with alternative leaf classifiers: an empirical study. In: International Florida Artificial Intelligence Research Society Conference, pp. 407–411 (2000)
Pham, N.K., Do, T.N., Lenca, P., Lallich, S.: Using local node information in decision trees: coupling a local decision rule with an off-centered entropy. In: International Conference on Data Mining, pp. 117–123, Las Vegas, Nevada, USA, CSREA Press (2008)
Chang, F., Guo, C.Y., Lin, X.R., Lu, C.J.: Tree decomposition for large-scale SVM problems. J. Mach. Learn. Res. 11, 2935–2972 (2010)
Ritschard, G., Marcellin, S., Zighed, D.A.: Arbre de décision pour données déséquilibrées : sur la complémentarité de l'intensité d'implication et de l'entropie décentrée [Decision trees for imbalanced data: on the complementarity of implication intensity and off-centered entropy]. In: Analyse Statistique Implicative - Une méthode d'analyse de données pour la recherche de causalités, pp. 207–222 (2009)
Lerman, I., Gras, R., Rostam, H.: Elaboration et évaluation d'un indice d'implication pour données binaires [Development and evaluation of an implication index for binary data]. Math. Sci. Hum. 74, 5–35 (1981)
Geurts, P., Wehenkel, L., d’Alché Buc, F.: Kernelizing the output of tree-based methods. In: Cohen, W.W., Moore, A., (eds.) Proceedings of the Twenty-Third International Conference (ICML 2006), Pittsburgh, Pennsylvania, USA, 25–29 Jun 2006, ACM International Conference Proceeding Series, vol. 148, pp. 345–352, ACM (2006)
Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B 39(1), 1–38 (1977)
Bottou, L., Vapnik, V.: Local learning algorithms. Neural Comput. 4(6), 888–900 (1992)
Vincent, P., Bengio, Y.: K-local hyperplane and convex distance nearest neighbor algorithms. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems, vol. 14, pp. 985–992. The MIT Press, Cambridge (2001)
Boley, D., Cao, D.: Training support vector machines using adaptive clustering. In: Berry, M.W., Dayal, U., Kamath, C., Skillicorn, D.B. (eds.) Proceedings of the Fourth SIAM International Conference on Data Mining, pp. 126–137, Lake Buena Vista, Florida, USA, 22–24 Apr 2004, SIAM (2004)
Zhang, H., Berg, A., Maire, M., Malik, J.: SVM-KNN: discriminative nearest neighbor classification for visual category recognition. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2126–2136 (2006)
Yang, T., Kecman, V.: Adaptive local hyperplane classification. Neurocomputing 71(13–15), 3001–3004 (2008)
Segata, N., Blanzieri, E.: Fast and scalable local kernel machines. J. Mach. Learn. Res. 11, 1883–1926 (2010)
Cheng, H., Tan, P.N., Jin, R.: Efficient algorithm for localized support vector machine. IEEE Trans. Knowl. Data Eng. 22(4), 537–549 (2010)
Kecman, V., Brooks, J.: Locally linear support vector machines and other local models. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–6 (2010)
Ladicky, L., Torr, P.H.S.: Locally linear support vector machines. In: Getoor, L., Scheffer, T., (eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp. 985–992, Bellevue, Washington, USA, Jun 28–Jul 2 2011, Omnipress (2011)
Gu, Q., Han, J.: Clustered support vector machines. In: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2013, Scottsdale, AZ, USA, Apr 29–May 1 2013, JMLR Proceedings, vol. 31, pp. 307–315 (2013)
© 2015 Springer International Publishing Switzerland
Cite this paper
Do, T.-N.: Using Local Rules in Random Forests of Decision Trees. In: Dang, T., Wagner, R., Küng, J., Thoai, N., Takizawa, M., Neuhold, E. (eds.) Future Data and Security Engineering (FDSE 2015). Lecture Notes in Computer Science, vol. 9446. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26135-5_3
Print ISBN: 978-3-319-26134-8
Online ISBN: 978-3-319-26135-5