Regression-based classification methods and their comparison with decision tree algorithms

  • Mikhail V. Kiselev
  • Sergei M. Ananyan
  • Sergei B. Arseniev
Parallel Session 3a
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1263)


Classification learning can be treated as a regression problem in which the dependent variable takes only the values 0 and 1. By reducing classification to the problem of finding numerical dependencies, we gain the opportunity to use the powerful regression methods implemented in the PolyAnalyst data mining system. The resulting regression functions can be interpreted as fuzzy membership indicators for the recognized class. To obtain classification rules, we find for each of these functions the threshold value that minimizes the number of misclassified cases. We show that this approach handles the overfitting problem satisfactorily and yields results that are no worse than those obtained by the most popular decision tree algorithms.
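The idea of regressing on a 0/1 target and then choosing a misclassification-minimizing threshold can be illustrated with a minimal sketch. It is not the PolyAnalyst procedure: ordinary least squares on a single feature stands in for the system's regression methods, and the function names `fit_line` and `best_threshold` as well as the toy data are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ a + b*x, where y holds 0/1 class labels.

    Stand-in for a real regression method; assumes x is not constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, int]:
    """Return (threshold, errors) for the rule 'score > threshold => class 1'
    that misclassifies the fewest cases on the given data."""
    values = sorted(set(scores))
    # Candidate cuts: one below all scores, the midpoints between
    # neighbouring distinct scores, and one above all scores.
    candidates = [values[0] - 1.0]
    candidates += [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    best_t, best_errors = candidates[0], len(labels) + 1
    for t in candidates:
        errors = sum((s > t) != bool(lab) for s, lab in zip(scores, labels))
        if errors < best_errors:  # ties keep the lowest threshold
            best_t, best_errors = t, errors
    return best_t, best_errors


# Hypothetical toy data: one numeric feature and a 0/1 class label.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_line(x, y)
scores = [a + b * xi for xi in x]      # fuzzy membership indicator
threshold, errors = best_threshold(scores, y)
predicted = [int(s > threshold) for s in scores]
```

Here the regression output plays the role of the membership indicator and the threshold turns it into a crisp rule. The threshold is tuned on the same data used for fitting, so this sketch says nothing about how overfitting is controlled.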


Keywords: classification learning, non-linear regression, decision trees



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Mikhail V. Kiselev (1)
  • Sergei M. Ananyan (2)
  • Sergei B. Arseniev (3)

  1. Megaputer Intelligence Ltd., Moscow, Russia
  2. Department of Physics, College of William and Mary, Williamsburg, USA
  3. Megaputer Intelligence Ltd., Moscow, Russia
