Machine Learning, Volume 55, Issue 1, pp 31–52

Classification Using Φ-Machines and Constructive Function Approximation

  • Doina Precup
  • Paul E. Utgoff

Abstract

This article presents a new classification algorithm, called CLEF, which induces a Φ-machine by constructing its own features based on the training data. The features can be viewed as defining subsets of the instance space, and they allow CLEF to create useful non-linear functions over the input variables. The algorithm is guaranteed to find a classifier that separates the training instances, if such a separation is possible. We compare CLEF empirically to several other classification algorithms, including a well-known decision tree inducer, an artificial neural network inducer, and a support vector machine inducer. Our results show that the Φ-machines induced by CLEF and the support vector machines have similar accuracy on the suite of tasks tested, and that both are significantly more accurate than the classifiers produced by the other methods. We argue that the classifiers produced by CLEF are easy to interpret, and hence may be preferred over support vector machines in certain circumstances.

Keywords: classification · constructive induction · linear machine · Φ-machine · non-linear discrimination · decision tree · support vector machine
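For concreteness, the following is a minimal Python sketch of the Φ-machine representation described in the abstract: a linear machine whose per-class weight vectors operate on a feature vector Φ(x) rather than on the raw inputs. The quadratic feature map and the simple error-correction update used here are illustrative assumptions only; CLEF constructs its features from the training data and has its own training procedure, neither of which is reproduced here.

    import numpy as np

    def phi(x):
        # Hypothetical feature map: bias, raw inputs, and all pairwise
        # products, standing in for the features CLEF would construct.
        x = np.asarray(x, dtype=float)
        pairs = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
        return np.concatenate(([1.0], x, pairs))

    def train_phi_machine(X, y, n_classes, epochs=1000):
        # Error-correction training of a linear machine over phi(x):
        # one weight vector per class, prediction is argmax_c of w_c . phi(x).
        d = phi(X[0]).shape[0]
        W = np.zeros((n_classes, d))
        for _ in range(epochs):
            mistakes = 0
            for x, c in zip(X, y):
                f = phi(x)
                pred = int(np.argmax(W @ f))
                if pred != c:
                    W[c] += f        # reinforce the correct class
                    W[pred] -= f     # penalise the wrongly predicted class
                    mistakes += 1
            if mistakes == 0:        # training set separated in phi-space
                break
        return W

    def predict(W, x):
        return int(np.argmax(W @ phi(x)))

    # Usage: XOR-style data, not linearly separable in the raw inputs
    # but separable once the product feature x1*x2 is available.
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0, 1, 1, 0])
    W = train_phi_machine(X, y, n_classes=2)
    print([predict(W, x) for x in X])  # expected: [0, 1, 1, 0]

On XOR-style data the raw inputs are not linearly separable, but the product feature x1*x2 makes the classes separable in Φ-space; this is the sense in which a Φ-machine yields a non-linear discriminant over the original variables while remaining a linear machine over its features.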


Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Doina Precup (1)
  • Paul E. Utgoff (2)
  1. School of Computer Science, McGill University, Montreal, Canada
  2. Department of Computer Science, University of Massachusetts, USA
