Solving Regression Problems Using Competitive Ensemble Models

  • Yakov Frayman
  • Bernard F. Rolfe
  • Geoffrey I. Webb
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2557)

Abstract

The use of ensemble models in many problem domains has increased significantly in the last few years. Ensemble modeling, in particular boosting, has shown great promise in improving the predictive performance of a model. Combining the ensemble members is normally done in a co-operative fashion, where each ensemble member performs the same task and their predictions are aggregated to obtain improved performance. However, it is also possible to combine the ensemble members in a competitive fashion, where the best prediction of the relevant ensemble member is selected for a particular input. This option has previously been somewhat overlooked. The aim of this article is to investigate and compare the competitive and co-operative approaches to combining the models in an ensemble. A comparison is made between a competitive ensemble model and MARS with bagging, a mixture of experts, a hierarchical mixture of experts, and a neural network ensemble over several public-domain regression problems that have a high degree of nonlinearity and noise. The empirical results show a substantial advantage of competitive learning over co-operative learning for all the regression problems investigated. The requirements for creating efficient ensembles and the available guidelines are also discussed.
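To make the distinction concrete, the short Python sketch below contrasts the two combination schemes on a toy regression task. It is a minimal illustration only, not the authors' implementation: the member models, the gating classifier, and the synthetic data are all assumptions made for the example.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor
    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(500, 1))
    y = np.sinc(X[:, 0]) + 0.1 * rng.normal(size=500)  # noisy, nonlinear target

    # Ensemble members, all trained on the same task.
    members = [
        LinearRegression().fit(X, y),
        DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y),
        MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y),
    ]
    preds = np.column_stack([m.predict(X) for m in members])  # (n_samples, n_members)

    # Co-operative combination: aggregate every member's prediction (simple average).
    cooperative = preds.mean(axis=1)

    # Competitive combination: a gate routes each input to the single member judged
    # most relevant (winner-take-all). Here the gate is a small classifier trained to
    # predict which member had the lowest error on each training point.
    best_member = np.abs(preds - y[:, None]).argmin(axis=1)
    gate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, best_member)
    competitive = preds[np.arange(len(X)), gate.predict(X)]

    mse = lambda p: float(np.mean((p - y) ** 2))
    # In-sample errors, purely to illustrate the two combination schemes.
    print("co-operative MSE:", mse(cooperative))
    print("competitive MSE: ", mse(competitive))

The gate here is a crude stand-in for the gating network of a mixture of experts run in winner-take-all mode; the point is simply that the competitive scheme selects one member per input rather than averaging over all of them.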


References

  1. Barnett, J. A. "Computational methods for a mathematical theory of evidence". Proceedings of IJCAI, pp. 868–875, 1981.
  2. Bates, J. M. and Granger, C. W. J. "The combination of forecasts". Operational Research Quarterly, 20:451–468, 1969.
  3. Bauer, E. and Kohavi, R. "An empirical comparison of voting classification algorithms: bagging, boosting, and variants". Machine Learning, 36(1–2):105–139, 1999.
  4. Baxt, W. G. "Improving the accuracy of an artificial neural network using multiple differently trained networks". Neural Computation, 4:772–780, 1992.
  5. Breiman, L. "Bagging predictors". Machine Learning, 24(2):123–140, 1996.
  6. Brooks, R. A. "A robust layered control system for a mobile robot". IEEE Journal of Robotics and Automation, 2:14–23, 1986.
  7. Catfolis, T. and Meert, K. "Hybridization and specialization of real-time recurrent learning-based neural networks". Connection Science, 9(1):51–70, 1997.
  8. Denker, J., Schwartz, D., Wittner, B., Solla, S., Howard, R., Jackel, L. and Hopfield, J. "Large automatic learning, rule extraction and generalisation". Complex Systems, 1:877–922, 1987.
  9. Drucker, H. "Improving regressors using boosting techniques". Proceedings of the 14th International Conference on Machine Learning, pp. 107–115, 1997.
  10. Frayman, Y., Rolfe, B. F., Hodgson, P. D. and Webb, G. I. "Predicting the rolling force in hot steel rolling mill using an ensemble model". Proceedings of the IASTED International Conference on Artificial Intelligence and Applications (AIA 2002), 2002 (in press).
  11. Freund, Y. and Schapire, R. "A decision-theoretic generalization of on-line learning and an application to boosting". Journal of Computer and System Sciences, 55(1):119–139, 1997.
  12. Friedman, J. "Multivariate adaptive regression splines (with discussion)". Annals of Statistics, 19(1):1–82, 1991.
  13. Friedman, J., Hastie, T. and Tibshirani, R. "Additive logistic regression: a statistical view of boosting (with discussion)". Annals of Statistics, 28(2):337–374, 2000.
  14. Friedman, J. "Greedy function approximation: a gradient boosting machine". Annals of Statistics, 29(4), 2001.
  15. Hashem, S. "Optimal linear combinations of neural networks". Neural Networks, 10(4):599–614, 1997.
  16. Jacobs, R. A., Jordan, M. I., Nowlan, S. J. and Hinton, G. E. "Adaptive mixtures of local experts". Neural Computation, 3:79–87, 1991.
  17. Jordan, M. I. and Jacobs, R. A. "Hierarchical mixtures of experts and the EM algorithm". Neural Computation, 6(2):181–214, 1994.
  18. Ridgeway, G. "The state of boosting". Computing Science and Statistics, 31:172–181, 1999.
  19. Rogova, G. "Combining the results of several neural network classifiers". Neural Networks, 7(5):777–781, 1994.
  20. Schapire, R. E. "The strength of weak learnability". Machine Learning, 5:197–227, 1990.
  21. Sharkey, A. J. C. (Ed.) Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. Springer-Verlag, 1999.
  22. Ting, K. M. "The characterisation of predictive accuracy and decision combination". Proceedings of the 13th International Conference on Machine Learning, pp. 498–506, 1996.
  23. Webb, G. "MultiBoosting: a technique for combining boosting and wagging". Machine Learning, 40(2):159–196, 2000.
  24. Wolpert, D. H. "Stacked generalization". Neural Networks, 5:241–259, 1992.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Yakov Frayman ¹
  • Bernard F. Rolfe ¹
  • Geoffrey I. Webb ¹
  1. School of Information Technology, Deakin University, Geelong, Australia
