Abstract
How do we ensure that future generally intelligent AI share our values? This is the value-alignment problem. It is a weighty matter. After all, if AI are neutral with respect to our wellbeing, or worse, actively hostile toward us, then they pose an existential threat to humanity. Some philosophers have argued that one important way in which we can mitigate this threat is to develop only AI that shares our values or that has values that ‘align with’ ours. However, there is nothing to guarantee that this policy will be universally implemented—in particular, ‘bad actors’ are likely to flout it. In this paper, I show how the predictive processing model of the mind, currently ascendant in cognitive science, may ameliorate the value-alignment problem. In essence, I argue that there is a plurality of reasons why any future generally intelligent AI will possess a predictive processing cognitive architecture (e.g. because we decide to build them that way, because it is the only possible cognitive architecture that can underpin general intelligence, or because it is the easiest way to create AI). I also argue that if future generally intelligent AI possess a predictive processing cognitive architecture, then they will come to share our pro-moral motivations (valuing humanity as an end, avoiding maleficent actions, and so on), regardless of their initial motivation set. Consequently, these AI will pose a minimal threat to humanity. In this way then, I conclude, the value-alignment problem is significantly ameliorated under the assumption that future generally intelligent AI will possess a predictive processing cognitive architecture.
Notes
Of course, this threat will only be a ‘live’ or ‘pressing’ one if the AI has a significant chance of realizing its ambitions. See Chalmers (2010) for a detailed discussion of why super-intelligent AI (AI whose intellect dwarfs our own) are both highly likely to arise, if generally intelligent AI is possible at all, and highly likely to possess the means to pose a real threat to humanity.
Those familiar with the predictive processing literature should note that I am, following Hohwy (2013) and Clark (2015), assuming here a cognitivist and/or representationalist interpretation of the predictive processing model. This cognitivist/representationalist reading is questioned, and alternative non-cognitivist or non-representationalist interpretations of predictive processing are discussed, in—for example—Kirchhoff & Robertson (2018) and Downey (2018). The interested reader should consult these references for further discussion. I cannot defend the cognitivist/representationalist reading here. Rather, I shall simply be assuming it.
By ‘predictive processors’ I mean proponents of the predictive processing model of the mind.
The reader may remain skeptical (justifiably, by my lights) about the prospects for an adequate predictive processing theory of desire and motivation. The reader can consult, for example, Klein (2020) for a sustained argument that the predictive processing model cannot, even in principle, adequately account for the phenomenon of desire.
Such desires to behave in the (de re) moral ways include, for example, the desire to care for conspecifics and the desire to avoid harming others without excuse.
It might be thought that (anti-Humean) Realism presents an attractive solution to the value-alignment problem. After all, many such Realists hold that an agent’s moral beliefs give her overriding motivation to act as they indicate she is morally required to act (at least, when she fully comprehends the contents of these moral beliefs). Consequently, if Realism is true, and if generally intelligent AI are capable of having moral intuitions, in light of which they form the same moral beliefs as we do, then we should expect such AI to share our pro-moral motivations. However, under these assumptions, there is, on the face of it, nothing to stop ‘bad actors’ from creating generally intelligent AI that lack the capacity to have moral intuitions or moral beliefs—either by omitting to program something like a faculty of moral sense that produces such moral intuitions, or by damaging or removing it after creation. For this reason then, the assumption of Realism does not constitute an amelioration of the value-alignment problem relative to the standard solution. I will therefore abstain from any further discussion of Realism in this paper.
The reader might ask: ‘what if act consequentialism is true?’. Of course, if act consequentialism is true, then the majority of actions ever performed will have been wrong, since they were not the optimific actions among those available. However, act consequentialism is highly revisionary with respect to commonsense morality, and I will assume here that it is therefore false. Rather, I am assuming that commonsense morality (or something near enough) is true: morality as it is conceived by the proverbial ‘man on the Clapham Omnibus’, and as it is theorized by philosophical deontologists (rights to life and non-interference, and so on).
If the first generation of generally intelligent AI can create new AI themselves, then the second generation of generally intelligent AI may be the product, not of humans, but of this first generation.
My reasoning here mirrors David Chalmers’s (2010) discussion of how the value-alignment problem is ameliorated when assuming Kantian psychology and moral philosophy. In brief, Kantian moral philosophy has it that morality is rationally required for any agent capable of grasping and reflecting on their reasons for action. This account therefore entails that any perfectly rational agent will be perfectly moral. Granting that intelligence correlates with rationality, it therefore follows, for the Kantian, that super-intelligent AI will be (close to) perfectly moral.
Here I use the locution ‘rational agent’, not to mean an agent that is appropriately responsive to her reasons, but rather to mean an agent that is a person—namely, a thinker capable of self-conscious reflection on her own attitudes (such as a normal adult human in contrast to, say, a chicken).
References
Adams, R., Shipp, S., & Friston, K. (2012). Predictions not commands: Active inference in the motor system. Brain Structure and Function, 218(3), 611–643.
Baraglia, J., Nagai, Y., & Asada, M. (2014). Prediction error minimization for emergence of altruistic behavior. In 4th International Conference on Development and Learning and on Epigenetic Robotics.
Blackburn, S. (1998). Ruling passions: A theory of practical reasoning. Oxford University Press.
Bostrom, N. (2012). The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22(2), 71–85.
Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.
Botvinick, M., & Toussaint, M. (2012). Planning as inference. Trends in Cognitive Sciences, 16(10), 485–488.
Chalmers, D. (2010). The singularity: A philosophical analysis. Journal of Consciousness Studies, 17(9–10), 7–65.
Clark, A. (2013a). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
Clark, A. (2013b). Expecting the world: Perception, prediction, and the origins of human knowledge. The Journal of Philosophy, 110(9), 469–496.
Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford University Press.
Clark, A. (2019). Beyond desire? Agency, choice, and the predictive mind. Australasian Journal of Philosophy, 98, 1–15.
Cullen, M., Davey, B., Friston, K. J., & Moran, R. J. (2018). Active inference in OpenAI gym: A paradigm for computational investigations into psychiatric illness. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 3(9), 809–818.
Davidson, D. (1985). Essays on actions and events. Oxford University Press.
Dennett, D. (1987). The intentional stance. MIT Press.
Downey, A. (2018). Predictive processing and the representation wars: A victory for the eliminativist (via Fictionalism). Synthese, 195, 5115–5139.
Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B, 360(1456), 815–836.
Friston, K. (2012). Free-energy minimization and the dark-room problem. Frontiers in Psychology, 3, 130.
Friston, K. (2013). Active inference and free energy: commentary on Andy Clark’s ‘predictive brains, situated agents, and the future of cognitive science.’ Behavioral and Brain Sciences, 36(3), 212–213.
Friston, K., & Stephan, K. (2007). Free energy and the brain. Synthese, 159, 417–458.
Friston, K., Kilner, J., & Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100(1–3), 70–87.
Friston, K., Mattout, J., & Kilner, J. (2011). Action understanding and active inference. Biological Cybernetics, 104, 137–160.
Friston, K., Adams, R., & Montague, R. (2012). What is value—accumulated reward or evidence? Frontiers in Neurorobotics. https://doi.org/10.3389/fnbot.2012.00011
Hohwy, J. (2013). The predictive mind. Oxford University Press.
Kirchhoff, M., & Robertson, I. (2018). Enactivism and predictive processing: A non-representational view. Philosophical Explorations, 21(2), 264–281.
Klein, C. (2018). What do predictive coders want? Synthese, 195(6), 2541–2557.
Klein, C. (2020). A Humean challenge to predictive coding. In S. Gouveia, D. Mendonca, & M. Curado (Eds.), The philosophy and science of predictive processing. Bloomsbury Press.
Korsgaard, C. (2009). Self-constitution: Agency, identity, and integrity. Oxford University Press.
McDowell, J. (1978). Are moral requirements hypothetical imperatives? Proceedings of the Aristotelian Society, Supplementary Volume 52, 13–29.
McDowell, J. (1979). Virtue and reason. The Monist, 62(3), 331–350.
Nagel, T. (1970). The possibility of altruism. Clarendon Press.
Shafer-Landau, R. (2003). Moral realism: A defense. Oxford University Press.
Smith, M. (1987). The Humean theory of motivation. Mind, 96, 36–61.
Smith, M. (1994). The moral problem. Blackwell Publishers.
Solway, A., & Botvinick, M. (2012). Goal-directed decision making as probabilistic inference: A computational framework and potential neural correlates. Psychological Review, 119(1), 120–154.
Sun, Z., & Firestone, C. (2020). The dark room problem. Trends in Cognitive Sciences, 24, 346–348.
Tomasello, M. (2016). A natural history of human morality. Harvard University Press.
Van de Cruys, S., Friston, K., & Clark, A. (2020). Controlled optimism: Reply to Sun and Firestone on the dark room problem. Trends in Cognitive Sciences, 24(9), 680–681.
Wedgwood, R. (2004). The metaethicists’ mistake. Philosophical Perspectives, 18, 405–426.
Wedgwood, R. (2007). The nature of normativity. Clarendon Press.