Abstract
We present a rule induction method based on decision trees for classification and prediction problems. Our approach to tree construction relies on a discrete variant of support vector machines, in which the error is expressed by the number of misclassified instances, in place of the misclassification distance considered by traditional SVMs, and an additional term is included to reduce the complexity of the generated rule. This leads to the formulation of a mixed integer programming problem, whose approximate solution is obtained via a sequential LP-based algorithm. The decision tree is then built by means of a multivariate split derived at each node from the approximate solution of the discrete SVM. Computational tests on well-known benchmark datasets indicate that our classifier achieves a superior trade-off between accuracy and complexity of the induced rules, outperforming competing approaches.
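The objective sketched in the abstract can be made concrete with a small illustration. The following is not the authors' formulation, but a hedged sketch of the kind of objective a discrete SVM evaluates: a count of misclassified instances (rather than a sum of margin distances) plus a term penalizing rule complexity, here taken, for illustration, as the number of features used by the separating hyperplane. All names and the toy data are hypothetical.

```python
# Illustrative sketch (not the chapter's exact model): the discrete-SVM
# objective counts misclassified instances and adds a complexity term,
# here the number of nonzero weights in the hyperplane w.x = b.

def discrete_svm_objective(w, b, X, y, C=1.0):
    """Objective = misclassification count + C * number of active features.

    w : weights defining the hyperplane w.x = b
    X : list of instances (lists of floats)
    y : labels in {-1, +1}
    """
    misclassified = sum(
        1 for xi, yi in zip(X, y)
        if yi * (sum(wj * xij for wj, xij in zip(w, xi)) - b) <= 0
    )
    active_features = sum(1 for wj in w if wj != 0)
    return misclassified + C * active_features

# Toy data: two points per class, separable on the first feature alone.
X = [[1.0, 5.0], [2.0, -3.0], [-1.0, 4.0], [-2.0, -4.0]]
y = [1, 1, -1, -1]

# A split on feature 0 alone classifies everything correctly and uses
# a single feature: 0 errors + C * 1 active feature = 1.0.
print(discrete_svm_objective([1.0, 0.0], 0.0, X, y))  # prints 1.0
```

Minimizing this discontinuous, integer-valued objective is what leads to the mixed integer programming formulation mentioned in the abstract; the chapter obtains approximate solutions via a sequential LP-based algorithm, and each tree node uses the resulting hyperplane as a multivariate split.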
Triantaphyllou, E. and G. Felici (Eds.), Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Massive Computing Series, Springer, Heidelberg, Germany, pp. 305–326, 2006.
© 2006 Springer Science+Business Media, LLC
Cite this chapter
Orsenigo, C., Vercellis, C. (2006). Rule Induction Through Discrete Support Vector Decision Trees. In: Triantaphyllou, E., Felici, G. (eds) Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques. Massive Computing, vol 6. Springer, Boston, MA. https://doi.org/10.1007/0-387-34296-6_9
Print ISBN: 978-0-387-34294-8
Online ISBN: 978-0-387-34296-2