Abstract
This paper proposes a novel approach to neurocognitive modeling in which the central constraints are provided by the theory of reinforcement learning. In this formulation, learning (1) exploits the statistical properties of the system’s environment, (2) is constrained by biologically inspired Hebbian interactions, and (3) is based only on algorithms that are consistent and stable. The resulting model has to address some of the most enigmatic problems of artificial intelligence. In particular, considerations of combinatorial explosion lead to constraints on the concepts of state-action pairs: these concepts have the peculiar flavor of determinism in a partially observed and thus highly uncertain world. We argue that these concepts of factored reinforcement learning result in an intriguing learning task that we call the symbol learning problem. For this task we sketch an information-theoretic framework and point towards a possible resolution.
Copyright information
© 2008 János Bolyai Mathematical Society and Springer-Verlag
About this chapter
Cite this chapter
Lőrincz, A. (2008). Learning and Representation: From Compressive Sampling to the ‘Symbol Learning Problem’. In: Bollobás, B., Kozma, R., Miklós, D. (eds) Handbook of Large-Scale Random Networks. Bolyai Society Mathematical Studies, vol 18. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69395-6_11
DOI: https://doi.org/10.1007/978-3-540-69395-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69394-9
Online ISBN: 978-3-540-69395-6
eBook Packages: Mathematics and Statistics (R0)