Bounded Rational Decision-Making with Adaptive Neural Network Priors
Abstract
Bounded rationality investigates utility-optimizing decision-makers with limited information-processing power. In particular, information-theoretic bounded rationality models formalize resource constraints abstractly in terms of relative Shannon information, namely the Kullback-Leibler divergence between the agents' prior and posterior policy. Between prior and posterior lies an anytime deliberation process that can be instantiated by sample-based evaluations of the utility function through Markov Chain Monte Carlo (MCMC) optimization. The simplest model assumes a fixed prior and can relate abstract information-theoretic processing costs to the number of sample evaluations. More advanced models, however, should also address the question of learning, that is, how the prior is adapted over time such that the generated prior proposals become more efficient. In this work we investigate generative neural networks as priors that are optimized concurrently with anytime sample-based decision-making processes such as MCMC. We evaluate this approach on toy examples.
Keywords
Bounded rationality · Variational Autoencoder · Adaptive priors · Markov Chain Monte Carlo

1 Introduction
Intelligent agents are usually faced with the task of optimizing some utility function \(\mathbf {U}\) that is a priori unknown and can only be evaluated sample-wise. We do not restrict ourselves to the form of this function; in principle it could be a classification or regression loss, a reward function in a reinforcement learning environment, or any other utility function. The framework of information-theoretic bounded rationality [16, 17] and related information-theoretic models [3, 14, 20, 21, 23] provide a formal basis for modeling agents that behave in a computationally restricted manner, expressing resource limitations through information-theoretic constraints. Such limitations also lead to the emergence of hierarchies and abstractions [5], which can be exploited to reduce computational and search effort. Recently, the main principles have been successfully applied to spiking and artificial neural networks, in particular feedforward neural network learning problems, where the information-theoretic constraint was mainly employed as a kind of regularization [7, 11, 12, 18]. In this work we introduce bounded rational decision-making with adaptive generative neural network priors. We investigate the interaction between anytime sample-based decision-making processes and the concurrent improvement of prior policies through learning, where the prior policies are parameterized as Variational Autoencoders [10], a recently proposed generative neural network model.
The paper is structured as follows. In Sect. 2 we discuss the basic concepts of information-theoretic bounded rationality, sample-based interpretations of bounded rationality in the context of Markov Chain Monte Carlo (MCMC), and the basic concepts of Variational Autoencoders. In Sect. 3 we present the proposed decision-making model, combining sample-based decision-making with concurrent learning of priors parameterized by Variational Autoencoders. In Sect. 4 we evaluate the model on toy examples. In Sect. 5 we discuss our results.
2 Methods
2.1 Bounded Rational Decision Making
In the case of multiple world states \(w\) distributed according to \(\rho(w)\), the bounded rational posterior policy and the corresponding optimal prior are given by the self-consistent equations

\( \begin{aligned} p(a|w) &= \frac{1}{Z(w)}\, p(a) \exp (\beta _1 \mathbf {U}(w,a)) \\ p(a) &= \sum _w \rho (w)\, p(a|w), \end{aligned} \)
where \(Z(w) = \sum _a p(a) \exp (\beta _1 \mathbf {U}(w,a)) \) is the normalization factor. Computing this normalization factor is usually computationally expensive, as it involves summing over spaces with high cardinality. We avoid this cost by resorting to Monte Carlo approximation.
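In the discrete case, the self-consistent equations above can also be solved exactly by alternating the two updates, which yields the kind of theoretical baseline used later in Sect. 4. The following is a minimal sketch of such a fixed-point iteration; the function and parameter names are illustrative and not taken from the authors' implementation.

```python
import numpy as np

def bounded_rational_equilibrium(U, rho, beta, iters=200):
    """Fixed-point iteration for the self-consistent equations above.

    U    : utility matrix of shape (num_world_states, num_actions)
    rho  : distribution over world states, shape (num_world_states,)
    beta : resource parameter beta_1
    Returns the posterior p(a|w) and the marginal prior p(a).
    """
    num_w, num_a = U.shape
    p_a = np.full(num_a, 1.0 / num_a)           # uniform initial prior over actions
    for _ in range(iters):
        # p(a|w) = p(a) exp(beta * U(w, a)) / Z(w)
        post = p_a[None, :] * np.exp(beta * U)
        post /= post.sum(axis=1, keepdims=True)
        # p(a) = sum_w rho(w) p(a|w)
        p_a = rho @ post
    return post, p_a
```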
2.2 MCMC as Sample-Based Bounded Rational Decision-Making
Note that the decision of the chain will in general follow a non-equilibrium distribution, but we can use the bounded rational optimum as a normative baseline to quantify how efficiently resources are used, by analyzing how closely the bounded rational equilibrium is approximated.
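As an illustration, one simple way to instantiate such an anytime decision process is a Metropolis chain over the action space that is seeded with a prior sample and granted only a fixed budget of utility evaluations. The sketch below makes this concrete for a one-dimensional action space on [0, 1]; the random-walk proposal and step size are our assumptions, not details taken from the paper.

```python
import numpy as np

def anytime_decision(utility, w, a_init, beta, num_steps, step_size=0.05, rng=None):
    """Metropolis chain over actions in [0, 1], seeded with a prior sample and
    limited to `num_steps` utility evaluations (the resource constraint)."""
    rng = np.random.default_rng() if rng is None else rng
    a, u = a_init, utility(w, a_init)            # first utility evaluation
    for _ in range(num_steps - 1):
        a_new = np.clip(a + rng.normal(scale=step_size), 0.0, 1.0)
        u_new = utility(w, a_new)                # one utility evaluation per step
        # Accept with probability min(1, exp(beta * (U_new - U_old)))
        if rng.random() < np.exp(beta * (u_new - u)):
            a, u = a_new, u_new
    return a
```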
2.3 Representing Prior Strategies with Variational Autoencoders
While an anytime optimization process such as MCMC can be regarded as a transformation from prior to posterior, the question remains how to choose the prior. While the prior may be assumed to be fixed, it would be far more efficient if the prior itself were subjected to an optimization process that minimizes the overall information-processing costs. Since in the case of multiple world states w the optimal prior is given by the marginal \(p(a)=\sum _w \rho (w)p(a|w)\), we can use the outputs a of the anytime decision-making process to train a generative model of the prior p(a). If the generative model was chosen from a parametric family such as a Gaussian distribution, then training would consist in updating mean and variance of the Gaussian. Choosing such a parametric family imposes restrictions on the shape of the prior, in particular in the continuous domain. Therefore, we investigate non-parametric generative models of the prior, in particular neural network models such as Variational Autoencoders (VAEs).
For each incoming world state w our model samples a prior indexed by \(x_i \thicksim p(x\vert w)\). Each prior \(p(a\vert x)\) is represented by a VAE. To arrive at the posterior policy \(p(a \vert w,x)\), an anytime MCMC optimization is seeded with \(a_0 \thicksim p(a\vert x)\) to generate a sample from \(p(a \vert w,x)\). The prior selection policy is also implemented by an MCMC chain and selects agents that have achieved high utility on a particular w.
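The following is a hypothetical sketch of such a selection chain: a Metropolis walk over prior indices that favours priors whose past decisions achieved high utility on the given world state. The concrete acceptance rule and the running utility estimate are illustrative assumptions and not taken from the paper.

```python
import numpy as np

def select_prior(avg_utility, w, beta_select, num_steps, rng=None):
    """Hypothetical prior-selection chain: a Metropolis walk over prior indices x.

    avg_utility[x, w] is assumed to be a running estimate of the utility achieved
    by prior x on world state w, maintained during training."""
    rng = np.random.default_rng() if rng is None else rng
    num_priors = avg_utility.shape[0]
    x = rng.integers(num_priors)                        # start from a random prior
    for _ in range(num_steps):
        x_new = rng.integers(num_priors)                # propose another prior
        gain = avg_utility[x_new, w] - avg_utility[x, w]
        if rng.random() < np.exp(beta_select * gain):   # prefer higher-utility priors
            x = x_new
    return x
```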
3 Modeling Bounded Rationality with Adaptive Neural Network Priors
In this section we combine MCMC anytime decision-processes with adaptive autoencoder priors. In the case of a single world state, the combination is straightforward: each decision selected by the MCMC process is fed as an observable input to an autoencoder, and the updated autoencoder is then used as an improved prior to initialize the next MCMC decision. In the case of multiple world states, there are two straightforward scenarios. In the first scenario there are as many priors as world states and each of them is updated independently; for each world state we obtain exactly the same solution as in the single world state case. In the second scenario there is only a single prior over actions for all world states. In this case the autoencoder is trained with the decisions of all MCMC chains, such that the autoencoder should converge to the optimal rate-distortion prior. A third, more interesting scenario occurs when we allow multiple priors, but fewer priors than world states (compare Fig. 1). This is especially plausible when dealing with continuous world states, but also in the case of large discrete spaces.
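To make the single-world-state loop concrete, the sketch below alternates between seeding an anytime MCMC decision from the current prior and re-fitting the prior on the decisions collected so far. For brevity it uses the parametric Gaussian prior mentioned in Sect. 2.3 as a stand-in for the Variational Autoencoder, and it reuses the `anytime_decision` sketch from Sect. 2.2 together with an assumed task-specific `utility(w, a)` function (for example, the toy utilities sketched in Sect. 4); all hyperparameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
w = 0                                    # the single world state
mu, sigma = 0.5, 0.3                     # initial Gaussian prior over actions in [0, 1]
decisions = []

for episode in range(200):
    # 1. Seed the anytime MCMC chain with a proposal from the current prior.
    a0 = float(np.clip(rng.normal(mu, sigma), 0.0, 1.0))
    # 2. Run the bounded decision process (Sect. 2.2) with a fixed step budget.
    a = anytime_decision(utility, w, a0, beta=50.0, num_steps=20, rng=rng)
    decisions.append(a)
    # 3. Re-fit the prior on the decisions observed so far (VAE training in the paper).
    mu, sigma = float(np.mean(decisions)), float(np.std(decisions)) + 1e-3
```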
3.1 Decision Making with Multiple Priors
3.2 Model Architecture
Equation (10) describes abstractly how a two-step decision process with bounded rational decision-makers should be optimally partitioned. In this section we propose a sample-based model of a bounded rational decision process that approximately corresponds to Eq. (10), such that the performance of the decision process can be compared against its normative baseline. To translate Eq. (10) into a stochastic process we proceed in three steps. First, we implement the priors p(a|x) as Variational Autoencoders. Second, we formulate an MCMC chain that is initialized with a sample from the prior and generates a decision \(a\sim p(a|x,w)\). Third, we design an MCMC chain that functions as a selector between the different priors.
Autoencoder Priors. Each prior p(a|x) in Eq. (10) is represented by a VAE that learns to generate action samples that mimic the samples given by the MCMC chains (compare Fig. 2). The functions \(\mu _\theta (z)\), \(\mu _\eta (a)\) and \(\varSigma _\eta (a)\) are implemented as feed-forward neural networks with one hidden layer. The units in the hidden layer all use a sigmoid activation function; the output units of the \(\mu \)-functions are also sigmoids, and those of the \(\varSigma \)-function are ReLUs. During training the weights \(\eta \) and \(\theta \) are adapted to optimize the expected log-likelihood of the action samples given by the decisions made by the MCMC chains for all world states that have been assigned to the prior p(a|x). Due to the Gaussian shape of the decoder distribution, optimizing the log-likelihood corresponds to minimizing the quadratic reconstruction error. After training, the network can generate action samples itself by feeding the decoder network with samples from \(\mathcal {N}(z|0,\mathbb {I})\).
Fig. 2. The encoder translates the observed action into a latent variable z, whereas the decoder translates the latent variable z into a proposed action a. During training the weights \(\eta \) and \(\theta \) are adapted to optimize the expected log-likelihood of the observed action samples. After training, the network can generate actions by feeding the decoder network with samples from \(\mathcal {N}(z|0,\mathbb {I})\).
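A minimal Keras sketch of such an autoencoder prior might look as follows. The latent dimension, optimizer, and the placeholder training data (which stand in for the decisions produced by the MCMC chains) are assumptions; the hidden-layer size follows Sect. 4 and the activation choices follow the description above.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

action_dim, latent_dim, hidden_dim = 1, 2, 16    # latent size is an assumption

class Sampling(layers.Layer):
    """Reparameterization: z = mu_eta(a) + Sigma_eta(a) * eps, plus the KL penalty."""
    def call(self, inputs):
        z_mean, z_sigma = inputs
        eps = tf.random.normal(tf.shape(z_mean))
        kl = -0.5 * tf.reduce_mean(tf.reduce_sum(
            1.0 + 2.0 * tf.math.log(z_sigma + 1e-6)
            - tf.square(z_mean) - tf.square(z_sigma), axis=-1))
        self.add_loss(kl)
        return z_mean + z_sigma * eps

# Encoder: observed action -> parameters of the latent Gaussian
a_in = keras.Input(shape=(action_dim,))
h_enc = layers.Dense(hidden_dim, activation="sigmoid")(a_in)
z_mean = layers.Dense(latent_dim, activation="sigmoid")(h_enc)   # mu_eta(a)
z_sigma = layers.Dense(latent_dim, activation="relu")(h_enc)     # Sigma_eta(a)
z = Sampling()([z_mean, z_sigma])

# Decoder: latent sample -> reconstructed action mu_theta(z)
dec_hidden = layers.Dense(hidden_dim, activation="sigmoid")
dec_out = layers.Dense(action_dim, activation="sigmoid")
a_rec = dec_out(dec_hidden(z))

vae = keras.Model(a_in, a_rec)
vae.compile(optimizer="adam", loss="mse")        # quadratic reconstruction error

# Train on action decisions produced by the MCMC chains (placeholder data here).
actions = np.random.uniform(size=(512, action_dim)).astype("float32")
vae.fit(actions, actions, epochs=30, batch_size=32, verbose=0)

# After training, the prior generates proposals by decoding z ~ N(0, I).
z_in = keras.Input(shape=(latent_dim,))
decoder = keras.Model(z_in, dec_out(dec_hidden(z_in)))
proposals = decoder.predict(np.random.normal(size=(10, latent_dim)), verbose=0)
```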
Fig. 3. Top: the line is the rate-distortion curve that forms a theoretical efficiency frontier, characterized by the trade-off between mutual information and expected utility. Crosses represent single-prior agents and dots multi-prior systems. The labels indicate how many steps were assigned to the second MCMC chain out of a total of 100 steps. Bottom: information processing and expected utility increase with the number of utility evaluations, as expected.
4 Empirical Results
To demonstrate our approach we evaluate two scenarios. First, a simple agent equipped with a single prior policy \(p_\eta (a)\), as introduced in Sect. 2; in the case of a single agent there is no need for a prior selection stage. Second, a multi-prior decision-making system, which we compare to the single-prior agent. For the multi-prior agent, we split a fixed number of MCMC steps between the prior selection and the action selection. The task we designed consists of six world states, where each world state has a Gaussian utility function over the interval [0, 1] with a unique optimum. In both settings, we equipped the Variational Autoencoders with one hidden layer consisting of 16 units with ReLU activations. We implemented the experiments using Keras [2]. We show the results in Fig. 3.
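For reference, a task of this kind can be set up in a few lines; the specific optima and widths of the Gaussian utilities below are illustrative assumptions, since the paper does not list the exact values.

```python
import numpy as np

NUM_WORLD_STATES = 6
optima = np.linspace(0.1, 0.9, NUM_WORLD_STATES)   # one unique optimum per world state
width = 0.1                                         # assumed width of each Gaussian utility

def utility(w, a):
    """U(w, a): Gaussian utility over actions a in [0, 1] for world state w."""
    return float(np.exp(-0.5 * ((a - optima[w]) / width) ** 2))

rho = np.full(NUM_WORLD_STATES, 1.0 / NUM_WORLD_STATES)   # uniform world-state distribution
```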
Our results indicate that MCMC with a limited number of utility evaluations can be interpreted as bounded rational decision-making, with the number of evaluation steps serving as a surrogate for information-processing costs. In Fig. 3 we show the efficiency of several agents with different processing constraints. To compare our results to the theoretical baseline, we discretized the action space into 100 equidistant slices and solved the problem with the algorithm proposed in [5], which implements Eq. (10). Furthermore, our results indicate that the multi-prior system generally outperforms the single-prior system in terms of utility.
Our results also indicate that having multiple priors is more beneficial when more steps are available in total. Note that the stochasticity of our method decreases with the number of allowed steps, as shown by the uncertainty bands (transparent regions).
5 Discussion
In this study we implemented bounded rational decision makers with adaptive priors. We achieved this with Variational Autoencoder priors. The bounded rational decision-making process was implemented by MCMC optimization to find the optimal posterior strategy, thus giving a computationally simple way of generating samples. As the number of steps in the optimization process was constrained, we could quantify the information processing capabilities of the resulting decision-makers using relative Shannon entropy. Our analysis may have interesting implications, as it provides a normative framework for this kind of combined optimization of adaptive priors and decision-making processes. Prior to our work there have been several attempts to apply the framework of information-theoretic bounded rationality to machine learning tasks [7, 11, 12, 18]. The novelty of our approach is that we design adaptive priors for both the single-step case and the multi-agent case and we demonstrate how to transform information-theoretic constraints into computational constraints in the form of MCMC steps.
Recently, the combination of Monte Carlo optimization and neural networks has gained increasing popularity. These approaches include both using MCMC processes to find optimal weights in ANNs [1, 4] and using ANNs as parametrized proposal distributions in MCMC processes [8, 13]. While our approach is more similar to the latter, the important difference is that in such adaptive MCMC approaches there is only a single MCMC chain with a single (adaptive) proposal to optimize a single task, whereas in our case there are multiple adaptive priors to initialize multiple chains with an otherwise fixed proposal, which can be used to learn multiple tasks simultaneously. In that sense our work is more related to mixture-of-experts methods and divide-and-conquer paradigms [6, 9, 24], where we employ a selection policy rather than a blending policy, as we design our model specifically to encourage specialization. In mixture-of-experts models, there are multiple decision-makers that correspond to multiple priors in our case, but experts are typically not modeled as anytime optimization processes. Possibly the most popular combination of neural network learning with Monte Carlo methods was achieved by AlphaGo [19], which beat the leading Go champion by optimizing the strategies provided by value networks and policy networks with Monte Carlo Tree Search, leading to a major breakthrough in reinforcement learning. An important difference here is that the neural network is used to directly approximate the posterior and MCMC is used to improve performance by concentrating on the most promising moves during learning, whereas in our case ANNs are used to represent the prior. Moreover, in our work we assumed the utility function (i.e. the value network) to be given. For future work it would be interesting to investigate how to incorporate learning the utility function into our model to investigate more complex scenarios such as in reinforcement learning.
Acknowledgement
This work was supported by the European Research Council Starting Grant BRISC, ERC-STG-2015, Project ID 678082.
References
- 1. Andrieu, C., De Freitas, N., Doucet, A.: Reversible jump MCMC simulated annealing for neural networks. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 11–18. Morgan Kaufmann Publishers Inc. (2000)
- 2. Chollet, F., et al.: Keras (2015). https://keras.io
- 3. Vul, E., Goodman, N., Griffiths, T.L., Tenenbaum, J.B.: One and done? Optimal decisions from very few samples. Cogn. Sci. 38(4), 599–637 (2014)
- 4. Freitas, J., Niranjan, M., Gee, A.H., Doucet, A.: Sequential Monte Carlo methods to train neural network models. Neural Comput. 12(4), 955–993 (2000)
- 5. Genewein, T., Leibfried, F., Grau-Moya, J., Braun, D.A.: Bounded rationality, abstraction, and hierarchical decision-making: an information-theoretic optimality principle. Front. Robot. AI 2, 27 (2015)
- 6. Ghosh, D., Singh, A., Rajeswaran, A., Kumar, V., Levine, S.: Divide-and-conquer reinforcement learning. arXiv preprint arXiv:1711.09874 (2017)
- 7. Grau-Moya, J., Leibfried, F., Genewein, T., Braun, D.A.: Planning with information-processing constraints and model uncertainty in Markov decision processes. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) ECML PKDD 2016. LNCS (LNAI), vol. 9852, pp. 475–491. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46227-1_30
- 8. Gu, S., Ghahramani, Z., Turner, R.E.: Neural adaptive sequential Monte Carlo. In: Advances in Neural Information Processing Systems, pp. 2629–2637 (2015)
- 9. Haruno, M., Wolpert, D.M., Kawato, M.: MOSAIC model for sensorimotor learning and control. Neural Comput. 13(10), 2201–2220 (2001)
- 10. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)
- 11. Leibfried, F., Braun, D.A.: A reward-maximizing spiking neuron as a bounded rational decision maker. Neural Comput. 27(8), 1686–1720 (2015)
- 12. Leibfried, F., Grau-Moya, J., Ammar, H.B.: An information-theoretic optimality principle for deep reinforcement learning. arXiv preprint arXiv:1708.01867 (2017)
- 13. Levy, D., Hoffman, M.D., Sohl-Dickstein, J.: Generalizing Hamiltonian Monte Carlo with neural networks. In: International Conference on Learning Representations (2018)
- 14. Lewis, R.L., Howes, A., Singh, S.: Computational rationality: linking mechanism and behavior through bounded utility maximization. Top. Cogn. Sci. 6(2), 279–311 (2014)
- 15. MacKay, D.J.C.: Introduction to Monte Carlo methods. In: Jordan, M.I. (ed.) Learning in Graphical Models. ASID, vol. 89, pp. 175–204. Springer, Dordrecht (1998). https://doi.org/10.1007/978-94-011-5014-9_7
- 16. Ortega, P.A., Braun, D.A.: Thermodynamics as a theory of decision-making with information-processing costs. Proc. R. Soc. Lond. A: Math. Phys. Eng. Sci. 469(2153) (2013)
- 17. Ortega, P.A., Braun, D.A., Dyer, J., Kim, K.E., Tishby, N.: Information-theoretic bounded rationality. arXiv preprint arXiv:1512.06789 (2015)
- 18. Peng, Z., Genewein, T., Leibfried, F., Braun, D.A.: An information-theoretic on-line update principle for perception-action coupling. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 789–796. IEEE (2017)
- 19. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
- 20. Tishby, N., Polani, D.: Information theory of decisions and actions. In: Cutsuridis, V., Hussain, A., Taylor, J. (eds.) Perception-Action Cycle: Models, Architectures, and Hardware. SSCNS, pp. 601–636. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-1452-1_19
- 21. Todorov, E.: Efficient computation of optimal actions. Proc. Natl. Acad. Sci. 106(28), 11478–11483 (2009)
- 22. Von Neumann, J., Morgenstern, O.: Theory of Games and Economic Behavior, Commemorative edn. Princeton University Press, Princeton (2007)
- 23. Wolpert, D.H.: Information theory - the bridge connecting bounded rational game theory and statistical physics. In: Braha, D., Minai, A., Bar-Yam, Y. (eds.) Complex Engineered Systems: Science Meets Technology. UCS, pp. 262–290. Springer, Heidelberg (2006). https://doi.org/10.1007/3-540-32834-3_12
- 24. Yuksel, S.E., Wilson, J.N., Gader, P.D.: Twenty years of mixture of experts. IEEE Trans. Neural Netw. Learn. Syst. 23(8), 1177–1193 (2012)
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.