Recognizing recurrent neural networks (rRNN): Bayesian inference for recurrent neural networks
Recurrent neural networks (RNNs) are widely used in computational neuroscience and machine learning applications. In an RNN, each neuron computes its output as a nonlinear function of its integrated input. While the importance of RNNs, especially as models of brain processing, is undisputed, it is also widely acknowledged that the computations in standard RNN models may be an over-simplification of what real neuronal networks compute. Here, we suggest that the RNN approach may be made computationally more powerful by its fusion with Bayesian inference techniques for nonlinear dynamical systems. In this scheme, we use an RNN as a generative model of dynamic input caused by the environment, e.g. of speech or kinematics. Given this generative RNN model, we derive Bayesian update equations that can decode its output. Critically, these updates define a ‘recognizing RNN’ (rRNN), in which neurons compute and exchange prediction and prediction error messages. The rRNN has several desirable features that a conventional RNN does not have, e.g. fast decoding of dynamic stimuli and robustness to initial conditions and noise. Furthermore, it implements a predictive coding scheme for dynamic inputs. We suggest that the Bayesian inversion of RNNs may be useful both as a model of brain function and as a machine learning tool. We illustrate the use of the rRNN by an application to the online decoding (i.e. recognition) of human kinematics.
Keywords: Recurrent neural networks · Bayesian inference · Nonlinear dynamics · Human motion
We thank both anonymous reviewers for their helpful and constructive comments on a previous version of this manuscript.
This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
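To make the recognition scheme in the abstract concrete, the following is a minimal sketch of a predictive-coding-style recognition loop: a generative RNN produces a dynamic stimulus, and a recognizing network decodes it online by alternating top-down prediction with a correction proportional to the bottom-up prediction error. The tanh nonlinearity, the scaled-rotation weights `W`, the identity observation map, and the fixed recognition gain `K` are all illustrative assumptions; the paper derives its update equations from Bayesian inversion of the generative model, which this simplified filter only approximates.

```python
import numpy as np

# Illustrative sketch, not the paper's derived update equations:
# a generative RNN produces a dynamic stimulus, and a "recognizing"
# network decodes it online by alternating top-down prediction with
# an error-weighted correction, as in predictive coding.

rng = np.random.default_rng(0)
theta, T = 0.4, 300

# Generative RNN: a scaled rotation through tanh yields a stable limit cycle.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W = 1.1 * R

def f(x):
    """One step of the generative RNN: integrate input, apply nonlinearity."""
    return np.tanh(W @ x)

# Simulate the environment: a hidden trajectory plus noisy observations y_t.
x_true = np.array([0.5, 0.0])
ys = []
for _ in range(T):
    x_true = f(x_true)
    ys.append(x_true + 0.05 * rng.normal(size=2))

# Recognizing loop: prediction, prediction error, error-weighted update.
K = 0.3 * np.eye(2)                 # fixed recognition gain (assumption)
x_hat = np.array([-0.8, 0.9])       # deliberately wrong initial state
errs = []
for y in ys:
    x_pred = f(x_hat)               # top-down prediction of the next state
    eps = y - x_pred                # bottom-up prediction error message
    x_hat = x_pred + K @ eps        # correct the prediction by the error
    errs.append(np.linalg.norm(eps))

print(f"mean |prediction error|, steps   1-20:  {np.mean(errs[:20]):.3f}")
print(f"mean |prediction error|, steps 281-300: {np.mean(errs[-20:]):.3f}")
```

Despite starting from a wrong initial state, the error-driven updates pull the estimate onto the stimulus trajectory within a few steps, a toy analogue of the fast decoding and robustness to initial conditions attributed to the rRNN above.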