Model combination in the multiple-data-batches scenario

  • Kai Ming Ting
  • Boon Toh Low
Part II: Regular Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1224)


The approach of combining models learned from multiple batches of data provides an alternative to the common practice of learning one model from all the available data (i.e., the data combination approach). This paper empirically examines the baseline behaviour of the model combination approach in this multiple-data-batches scenario. We find that model combination can lead to better performance even when the disjoint batches of data are drawn randomly from a larger sample, and we relate the relative performance of the two approaches to the learning curve of the classifier used.
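
As a hedged illustration of the two approaches being compared (a minimal sketch only; the dataset, number of batches and decision-tree learner are assumptions for illustration, not the paper's experimental setup), the following Python fragment trains one classifier on the pooled data (data combination) and one classifier per disjoint random batch, combining the batch models' predictions by simple majority voting (model combination):

    # Minimal sketch: data combination vs. model combination on disjoint batches.
    # Assumes scikit-learn and NumPy; all concrete choices below are illustrative.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Data combination: learn a single model from all the available training data.
    single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("data combination: ", single.score(X_test, y_test))

    # Model combination: learn one model per disjoint random batch, then vote.
    n_batches = 5
    models = [DecisionTreeClassifier(random_state=0).fit(Xb, yb)
              for Xb, yb in zip(np.array_split(X_train, n_batches),
                                np.array_split(y_train, n_batches))]
    votes = np.stack([m.predict(X_test) for m in models])   # shape: (n_batches, n_test)
    majority = (votes.mean(axis=0) >= 0.5).astype(int)      # binary majority vote
    print("model combination:", np.mean(majority == y_test))

Which approach wins depends on where each batch size falls on the classifier's learning curve, which is precisely the relationship the paper examines.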

The practical implication of our results is that one should consider using model combination rather than data combination, especially when multiple batches of data for the same task are readily available.

Another interesting result is the empirical demonstration that the near-asymptotic performance of a single model on some classification tasks can be significantly improved by combining multiple models (derived from the same algorithm), provided the constituent models are substantially different and there is some regularity in the models for the combination method to exploit. Comparisons with known theoretical results are also provided.
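
For context, one well-known theoretical benchmark for model combination (offered here as an assumption, since the abstract does not name the theoretical results it compares against) is the error of a majority vote over T (odd) classifiers that err independently, each with probability p:

    P(majority vote errs) = \sum_{k=\lceil T/2 \rceil}^{T} \binom{T}{k} p^{k} (1-p)^{T-k}

which lies below p and shrinks as T grows whenever p < 1/2. In practice the gain is smaller because models derived from the same algorithm make correlated errors, which is why substantially different constituent models matter.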


Keywords

model combination · data combination · empirical evaluation · learning curve · near-asymptotic performance



Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Kai Ming Ting (1)
  • Boon Toh Low (2)
  1. Department of Computer Science, University of Waikato, Hamilton, New Zealand
  2. Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong, Shatin, Hong Kong
