Evolving Culture Versus Local Minima

Growing Adaptive Machines

Part of the book series: Studies in Computational Intelligence (SCI, volume 557)

Abstract

We propose a theory that relates difficulty of learning in deep architectures to culture and language. It is articulated around the following hypotheses: (1) learning in an individual human brain is hampered by the presence of effective local minima; (2) this optimization difficulty is particularly important when it comes to learning higher-level abstractions, i.e., concepts that cover a vast and highly-nonlinear span of sensory configurations; (3) such high-level abstractions are best represented in brains by the composition of many levels of representation, i.e., by deep architectures; (4) a human brain can learn such high-level abstractions if guided by the signals produced by other humans, which act as hints or indirect supervision for these high-level abstractions; and (5), language and the recombination and optimization of mental concepts provide an efficient evolutionary recombination operator, and this gives rise to rapid search in the space of communicable ideas that help humans build up better high-level internal representations of their world. These hypotheses put together imply that human culture and the evolution of ideas have been crucial to counter an optimization difficulty: this optimization difficulty would otherwise make it very difficult for human brains to capture high-level knowledge of the world. The theory is grounded in experimental observations of the difficulties of training deep artificial neural networks. Plausible consequences of this theory for the efficiency of cultural evolution are sketched.
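
Hypothesis (4) is closely related to curriculum learning [8], in which an external source of guidance orders or selects training examples so that the learner is steered past effective local minima. The following toy sketch is only an illustration of that idea under assumptions of our own (the synthetic task, the tiny network, and all names are hypothetical; it does not reproduce the chapter's experiments): the same small network is trained once with all examples available from the start and once with an easy-to-hard curriculum derived from a "hint" about example difficulty.

    # Toy, hypothetical illustration of hypothesis (4): an external "hint" about
    # example difficulty defines a curriculum (easy examples first, harder ones
    # added gradually), in the spirit of curriculum learning [8].
    import numpy as np

    rng = np.random.default_rng(0)

    def make_task(n=2000):
        # Toy binary task; examples close to the decision surface count as "hard".
        x = rng.uniform(-1.0, 1.0, size=(n, 2))
        score = np.sin(3 * x[:, 0]) * np.cos(3 * x[:, 1])
        y = (score > 0).astype(float)
        difficulty = -np.abs(score)        # larger = closer to the boundary = harder
        return x, y, difficulty

    def train(x, y, schedule, epochs=20, lr=0.5, hidden=16):
        # One-hidden-layer network trained by plain SGD; schedule(epoch) returns
        # the indices of the examples made available at that epoch.
        w1 = rng.normal(0, 0.5, size=(2, hidden)); b1 = np.zeros(hidden)
        w2 = rng.normal(0, 0.5, size=hidden);      b2 = 0.0
        for epoch in range(epochs):
            for i in rng.permutation(schedule(epoch)):
                h = np.tanh(x[i] @ w1 + b1)
                p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
                g = p - y[i]                      # dLoss/dlogit for the logistic loss
                gh = g * w2 * (1.0 - h ** 2)      # backprop through the tanh layer
                w2 -= lr * g * h;  b2 -= lr * g
                w1 -= lr * np.outer(x[i], gh);  b1 -= lr * gh
        p = 1.0 / (1.0 + np.exp(-(np.tanh(x @ w1 + b1) @ w2 + b2)))
        return np.mean((p > 0.5) == (y > 0.5))    # training-set accuracy

    x, y, difficulty = make_task()
    easy_first = np.argsort(difficulty)                  # indices sorted easy -> hard
    no_curriculum = lambda epoch: np.arange(len(x))      # all examples from the start
    curriculum = lambda epoch: easy_first[: len(x) * (epoch + 1) // 20]  # assumes 20 epochs
    print("without curriculum:", train(x, y, no_curriculum))
    print("with curriculum:   ", train(x, y, curriculum))

Whichever run ends up better on a given seed, the point of the sketch is only to make the mechanism of hypothesis (4) concrete: guidance changes the order in which the optimization problem is presented to the learner, not the model itself.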

Notes

  1. See [3] for a review of Deep Learning research, which saw a breakthrough in 2006 [6, 22, 36].

  2. Note that the rewards received by an agent depend on the tasks it faces, which may differ depending on the biological and social niche it occupies.

  3. The stationary i.i.d. case, where examples are drawn independently from the same stationary distribution \(P\).

  4. In many machine learning algorithms, one minimizes the training error plus a regularization penalty, which prevents the learner from simply learning the training examples by heart without generalizing well to new examples (a generic form of this criterion is sketched after these notes).

  5. Although it is always possible to trivially overfit the top two layers of a deep network by memorizing patterns, this may still happen even when the lower levels are very poorly trained, i.e., with poor representation learning.

  6. This ignores the interaction with the other levels, except for receiving input from the level below.

  7. Results got worse in terms of generalization error, while the training error could be kept small thanks to the capacity of the top few layers.

  8. I.e., communicate the concept as a function that associates an indicator of its presence with all compatible sensory configurations.

  9. The training criterion is here viewed as a function of the learned parameters: a sum of an error function over a training distribution of examples (see the sketch after these notes).

  10. "Better" in the sense of the survival value they provide and of how well they allow their owner to understand the world around them. Note that this depends on the context (ecological and social niche) and that there may be many good solutions.

  11. Remember that a meme is copied in a process of teaching by example that is highly stochastic, owing to the randomness of encounters (which particular percepts serve as examples of the meme) and to the small number of examples of the meme. This creates a highly variable, randomly distorted version of the meme in the learner's brain (a toy illustration is sketched after these notes).

  12. Selfish memes [12, 13] may also thrive in a population: they do not really help the population, but they nonetheless maintain themselves in it through some form of self-promotion or by exploiting human weaknesses.
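
For notes 4 and 9, the training criterion they refer to can be sketched in a generic form (the notation \(C\), \(f_\theta\), \(L\), \(\Omega\) and \(\lambda\) is illustrative, not taken from the chapter):

\[
C(\theta) = \frac{1}{n}\sum_{i=1}^{n} L\big(f_\theta(x_i), y_i\big) + \lambda\,\Omega(\theta)
\]

where \(\theta\) denotes the learned parameters, \(L\) is the per-example error, the \((x_i, y_i)\) are training examples drawn i.i.d. from the stationary distribution \(P\) of note 3, and \(\lambda\,\Omega(\theta)\) is the regularization penalty of note 4. The effective local minima discussed in the abstract are, roughly, regions of parameter space from which gradient-based minimization of \(C(\theta)\) does not escape in practice.

The stochastic copying of memes described in note 11 can likewise be made concrete with a toy simulation (an assumption-labeled sketch of our own, not a model from the chapter): a "meme" is represented as a vector, and each transmission re-estimates it from a handful of noisy percepts produced by its current holder.

    # Hypothetical sketch of note 11: a meme re-learned from a few noisy examples
    # drifts a little at every transmission, like a random walk.
    import numpy as np

    rng = np.random.default_rng(1)
    meme = np.ones(10)                 # the original concept, encoded as a vector
    copy = meme.copy()
    for generation in range(20):
        # the learner only sees a handful of noisy percepts of the current copy
        examples = copy + rng.normal(0.0, 0.3, size=(5, meme.size))
        copy = examples.mean(axis=0)   # the learner's estimate becomes the next copy
        print(generation, np.linalg.norm(copy - meme))   # distortion accumulates

Each generation acts like one step of a random walk, so without any selection among copies the meme drifts steadily away from the original.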

References

  1. D.H. Ackley, G.E. Hinton, T.J. Sejnowski, A learning algorithm for Boltzmann machines. Cogn. Sci. 9, 147–169 (1985)

  2. M.A. Arbib, The Handbook of Brain Theory and Neural Networks (MIT Press, Cambridge, 1995)

  3. Y. Bengio, Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009). Also published as a book (Now Publishers, 2009)

  4. Y. Bengio, O. Delalleau, On the expressive power of deep architectures, in Proceedings of the 22nd International Conference on Algorithmic Learning Theory, ed. by J. Kivinen, C. Szepesvári, E. Ukkonen, T. Zeugmann (2011)

  5. Y. Bengio, O. Delalleau, C. Simard, Decision trees do not generalize to new variations. Comput. Intell. 26(4), 449–467 (2010)

  6. Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, Greedy layer-wise training of deep networks, in Advances in Neural Information Processing Systems 19 (NIPS’06), ed. by B. Schölkopf, J. Platt, T. Hoffman (MIT Press, Cambridge, 2007), pp. 153–160

  7. Y. Bengio, Y. LeCun, Scaling learning algorithms towards AI, in Large Scale Kernel Machines, ed. by L. Bottou, O. Chapelle, D. DeCoste, J. Weston (MIT Press, Cambridge, 2007)

  8. Y. Bengio, J. Louradour, R. Collobert, J. Weston, Curriculum learning, in Proceedings of the Twenty-Sixth International Conference on Machine Learning (ICML’09), ed. by L. Bottou, M. Littman (ACM, 2009)

  9. L. Bottou, Stochastic learning, in Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence, vol. 3176, ed. by O. Bousquet, U. von Luxburg (Springer, Berlin, 2004), pp. 146–168

  10. M.A. Carreira-Perpiñán, G.E. Hinton, On contrastive divergence learning, in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS’05), ed. by R.G. Cowell, Z. Ghahramani (Society for Artificial Intelligence and Statistics, 2005), pp. 33–40

  11. R. Caruana, Multitask connectionist learning, in Proceedings of the 1993 Connectionist Models Summer School (1993), pp. 372–379

  12. R. Dawkins, The Selfish Gene (Oxford University Press, London, 1976)

  13. K. Distin, The Selfish Meme (Cambridge University Press, London, 2005)

  14. J.L. Elman, Learning and development in neural networks: the importance of starting small. Cognition 48, 781–799 (1993)

  15. D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, P. Vincent, S. Bengio, Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11, 625–660 (2010)

  16. J. Håstad, Almost optimal lower bounds for small depth circuits, in Proceedings of the 18th Annual ACM Symposium on Theory of Computing (ACM Press, Berkeley, 1986), pp. 6–20

  17. J. Håstad, M. Goldmann, On the power of small-depth threshold circuits. Comput. Complex. 1, 113–129 (1991)

  18. G.E. Hinton, T.J. Sejnowski, D.H. Ackley, Boltzmann machines: constraint satisfaction networks that learn. Technical Report TR-CMU-CS-84-119 (Dept. of Computer Science, Carnegie-Mellon University, 1984)

  19. G.E. Hinton, Learning distributed representations of concepts, in Proceedings of the Eighth Annual Conference of the Cognitive Science Society (Lawrence Erlbaum, Amherst, 1986), pp. 1–12

  20. G.E. Hinton, Connectionist learning procedures. Artif. Intell. 40, 185–234 (1989)

  21. G.E. Hinton, S.J. Nowlan, How learning can guide evolution. Complex Syst. 1, 495–502 (1989)

  22. G.E. Hinton, S. Osindero, Y.W. Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18, 1527–1554 (2006)

  23. G.E. Hinton, R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

  24. J.H. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, 1975)

  25. E. Hutchins, B. Hazlehurst, How to invent a lexicon: the development of shared symbols in interaction, in Artificial Societies: The Computer Simulation of Social Life, ed. by N. Gilbert, R. Conte (UCL Press, London, 1995), pp. 157–189

  26. E. Hutchins, B. Hazlehurst, Auto-organization and emergence of shared language structure, in Simulating the Evolution of Language, ed. by A. Cangelosi, D. Parisi (Springer, London, 2002), pp. 279–305

  27. K. Jarrett, K. Kavukcuoglu, M. Ranzato, Y. LeCun, What is the best multi-stage architecture for object recognition? in Proceedings of the IEEE International Conference on Computer Vision (ICCV’09) (2009), pp. 2146–2153

  28. F. Khan, X. Zhu, B. Mutlu, How do humans teach: on curriculum learning and teaching dimension, in Advances in Neural Information Processing Systems 24 (NIPS’11) (2011), pp. 1449–1457

  29. K.A. Krueger, P. Dayan, Flexible shaping: how learning in small steps helps. Cognition 110, 380–394 (2009)

  30. H. Larochelle, Y. Bengio, J. Louradour, P. Lamblin, Exploring strategies for training deep neural networks. J. Mach. Learn. Res. 10, 1–40 (2009)

  31. H. Lee, R. Grosse, R. Ranganath, A.Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in Proceedings of the Twenty-Sixth International Conference on Machine Learning (ICML’09), ed. by L. Bottou, M. Littman (ACM, Montreal, 2009)

  32. J. Martens, Deep learning via Hessian-free optimization, in Proceedings of the Twenty-Seventh International Conference on Machine Learning (ICML’10), ed. by L. Bottou, M. Littman (ACM, 2010), pp. 735–742

  33. E. Moritz, Memetic science: I. General introduction. J. Ideas 1, 1–23 (1990)

  34. G.B. Peterson, A day of great illumination: B.F. Skinner’s discovery of shaping. J. Exp. Anal. Behav. 82(3), 317–328 (2004)

  35. R. Raina, A. Battle, H. Lee, B. Packer, A.Y. Ng, Self-taught learning: transfer learning from unlabeled data, in Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML’07), ed. by Z. Ghahramani (ACM, 2007), pp. 759–766

  36. M. Ranzato, C. Poultney, S. Chopra, Y. LeCun, Efficient learning of sparse representations with an energy-based model, in Advances in Neural Information Processing Systems 19 (NIPS’06), ed. by B. Schölkopf, J. Platt, T. Hoffman (MIT Press, 2007), pp. 1137–1144

  37. D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT Press, Cambridge, 1986)

  38. R. Salakhutdinov, G.E. Hinton, Deep Boltzmann machines, in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS’09), vol. 8 (2009)

  39. R. Salakhutdinov, G.E. Hinton, Deep Boltzmann machines, in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS’09), vol. 5 (2009), pp. 448–455

  40. R. Salakhutdinov, H. Larochelle, Efficient learning of deep Boltzmann machines, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS’10), JMLR W&CP, vol. 9 (2010), pp. 693–700

  41. B.F. Skinner, Reinforcement today. Am. Psychol. 13, 94–99 (1958)

  42. F. Subiaul, J. Cantlon, R.L. Holloway, H.S. Terrace, Cognitive imitation in rhesus macaques. Science 305(5682), 407–410 (2004)

  43. R. Sutton, A. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 1998)

  44. J. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)

  45. L. van der Maaten, G.E. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)

  46. J. Weston, F. Ratle, R. Collobert, Deep learning via semi-supervised embedding, in Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML’08), ed. by W.W. Cohen, A. McCallum, S.T. Roweis (ACM, New York, 2008), pp. 1168–1175

  47. A. Yao, Separating the polynomial-time hierarchy by oracles, in Proceedings of the 26th Annual IEEE Symposium on Foundations of Computer Science (1985), pp. 1–10

  48. A.L. Yuille, The convergence of contrastive divergences, in Advances in Neural Information Processing Systems 17 (NIPS’04), ed. by L.K. Saul, Y. Weiss, L. Bottou (MIT Press, 2005), pp. 1593–1600

Acknowledgments

The author would like to thank Caglar Gulcehre, Aaron Courville, Myriam Côté, and Olivier Delalleau for useful feedback, as well as NSERC, CIFAR and the Canada Research Chairs for funding.

Author information

Correspondence to Yoshua Bengio.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bengio, Y. (2014). Evolving Culture Versus Local Minima. In: Kowaliw, T., Bredeche, N., Doursat, R. (eds) Growing Adaptive Machines. Studies in Computational Intelligence, vol 557. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55337-0_3

  • DOI: https://doi.org/10.1007/978-3-642-55337-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-55336-3

  • Online ISBN: 978-3-642-55337-0

  • eBook Packages: Engineering (R0)
