The Classification Game: Complexity Regularization through Interaction

  • Samarth Swarup
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6069)


We show that if a population of neural network agents is allowed to interact during learning, so as to arrive at a consensus solution to the learning problem, then they can implicitly achieve complexity regularization. We call this learning paradigm the classification game. We characterize the game-theoretic equilibria of this system and show how low-complexity equilibria get selected. The benefit of finding a low-complexity solution is better expected generalization, a benefit we demonstrate through experiments.
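The abstract does not spell out the interaction mechanism, so below is a minimal sketch of one plausible reading, based on the companion paper (Swarup and Gasser, "The classification game: Combining supervised learning and language evolution", Connection Science, 2010): in each round, two randomly paired agents play a speaker/listener game in which the speaker's hidden-layer activations act as a shared code, the listener decodes that code into a class label, and both agents descend the gradient of the shared classification error. The class names, network sizes, learning rate, and toy task are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of a classification-game round between two one-hidden-layer networks.
# Assumed mechanism (see lead-in): the speaker encodes the input, the listener
# decodes; both are trained on the listener's classification error.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Agent:
    """Encoder W1 (input -> hidden code) and decoder W2 (code -> label)."""
    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(scale=0.5, size=(n_hidden, n_in + 1))  # +1 for bias
        self.W2 = rng.normal(scale=0.5, size=(1, n_hidden + 1))     # +1 for bias

def play_round(speaker, listener, x, y, lr=0.5):
    """One round: speaker encodes x, listener classifies the code; both learn."""
    x1 = np.append(x, 1.0)                 # input with bias feature
    h = sigmoid(speaker.W1 @ x1)           # speaker's internal representation
    h1 = np.append(h, 1.0)                 # code with bias feature
    y_hat = sigmoid(listener.W2 @ h1)      # listener's class prediction
    err = y_hat - y                        # shared error signal

    # Backprop: the listener updates its decoder, and the speaker's encoder is
    # updated through the listener's weights, so the gradient crosses the pair.
    d_out = err * y_hat * (1.0 - y_hat)
    d_hid = (listener.W2[:, :-1].T @ d_out) * h * (1.0 - h)
    listener.W2 -= lr * np.outer(d_out, h1)
    speaker.W1 -= lr * np.outer(d_hid, x1)
    return (err ** 2).item()

# Population training loop: a fresh random speaker/listener pair each round.
agents = [Agent(n_in=2, n_hidden=4) for _ in range(5)]
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0], dtype=float)    # XOR as a stand-in task

for t in range(20000):
    s, l = rng.choice(len(agents), size=2, replace=False)
    i = rng.integers(len(X))
    play_round(agents[s], agents[l], X[i], Y[i])
```

Because every agent must emit codes that arbitrary partners can decode, and decode codes that arbitrary partners emit, the population is pushed toward a consensus internal representation; the paper's claim is that the equilibria selected this way are low-complexity ones, which is what yields the improved expected generalization.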


Keywords: Neural Network, Hidden Layer, Nash Equilibrium, Internal Representation, Input Point





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Samarth Swarup, Network Dynamics and Simulation Science Lab, Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
