Abstract
In this chapter, we discuss a host of technical problems that we think AI scientists could work on to ensure that the creation of smarter-than-human machine intelligence has a positive impact. Although such systems may be decades away, it is prudent to begin research early: the technical challenges involved in safety and reliability work appear formidable, and uniquely consequential. Our technical agenda discusses three broad categories of research where we think foundational research today could make it easier in the future to develop superintelligent systems that are reliably aligned with human interests:
1. Highly reliable agent designs: how to ensure that we build the right system.
2. Error tolerance: how to ensure that the inevitable flaws are manageable and correctable.
3. Value specification: how to ensure that the system is pursuing the right sorts of objectives.
Since little is known about the design or implementation details of such systems, the research described in this chapter focuses on formal agent foundations for AI alignment research—that is, on developing the basic conceptual tools and theory that are most likely to be useful for engineering robustly beneficial systems in the future.
Notes
1. A more careful wording might be “aligned with the interests of sentient beings.” We would not want to benefit humans at the expense of sentient non-human animals—or (if we build them) at the expense of sentient machines.
2. Since the Dartmouth Proposal (McCarthy et al. 1955), it has been a standard idea in AI that a sufficiently smart machine intelligence could be intelligent enough to improve itself. In 1965, I.J. Good observed that this might create a positive feedback loop leading to an “intelligence explosion” (Good 1965). Sotala and Yampolskiy (2015, Sect. 2.3, this volume) and Bostrom (2014, Chap. 14) have observed that an intelligence explosion is especially likely if the agent has the ability to acquire more hardware, improve its software, or design new hardware.
3. Legg and Hutter (2007) provide a preliminary answer to this question by defining a “universal measure of intelligence” which scores how well an agent can learn the features of an external environment and maximize a reward function. This is the type of formalization we are looking for: a scoring metric which describes how well an agent would achieve some set of goals. However, while the Legg-Hutter metric is insightful, it makes a number of simplifying assumptions, and many difficult open questions remain (Soares 2015).
4. As this is a multi-agent scenario, the problem of counterfactuals can also be thought of as game-theoretic. The goal is to define a procedure which reliably identifies the best available action; the label of “decision theory” is secondary. This goal subsumes both game theory and decision theory: the desired procedure must identify the best action in all settings, even when there is no clear demarcation between “agent” and “environment.” Game theory informs, but does not define, this area of research.
5. Of course, if an agent reasons perfectly under logical uncertainty, it would also reason well about the construction of successor agents. However, given the fallibility of human reasoning and the fact that this path is critically important, it seems prudent to verify the agent’s reasoning methods in this scenario specifically.
6. Or of all humans, or of all sapient creatures, etc. There are many philosophical concerns surrounding what sort of goals are ethical when aligning a superintelligent system, but a solution to the value learning problem will be a practical necessity regardless of which philosophical view is the correct one.
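The scoring metric described in note 3 can be given a rough shape in code. This is only an illustrative sketch, not the actual Legg-Hutter formalism (which ranges over all computable environments and weights them by Kolmogorov complexity): the environments, the complexity values, and both agents below are invented for illustration.

```python
# Toy sketch of a Legg-Hutter-style intelligence score: an agent is scored
# by the reward it earns in each of several toy environments, weighted by
# 2^(-complexity), so that simpler environments dominate the score. The
# complexity values here are hand-assigned stand-ins, not Kolmogorov
# complexities, and each "environment" is just a one-shot action->reward map.

def intelligence_score(agent, environments):
    """Weighted sum of the agent's reward across environments.

    `environments` is a list of (complexity, env) pairs, where `env` maps
    each available action to a reward in [0, 1], and `agent` is a function
    from an environment to a chosen action.
    """
    return sum(2 ** -k * env[agent(env)] for k, env in environments)

# A toy "informed" agent: picks the action with the highest reward.
greedy = lambda env: max(env, key=env.get)
# A toy agent that always takes action "a", regardless of the environment.
constant = lambda env: "a"

# Three toy environments of increasing assigned complexity.
envs = [
    (1, {"a": 1.0, "b": 0.0}),
    (2, {"a": 0.0, "b": 1.0}),
    (3, {"a": 0.5, "b": 0.5}),
]

print(intelligence_score(greedy, envs))    # greedy does well in every environment
print(intelligence_score(constant, envs))  # constant earns nothing in the second
```

The point of the sketch is the shape of the definition: a single number summarizing performance across a weighted family of environments, which is exactly the kind of scoring metric the note says a formalization should provide.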
References
Armstrong S (2015) AI motivated value selection, accepted to the 1st International Workshop on AI and Ethics, held within the 29th AAAI Conference on Artificial Intelligence (AAAI-2015), Austin, TX
Armstrong S, Sandberg A, Bostrom N (2012) Thinking inside the box: Controlling and using an oracle AI. Minds and Machines 22(4):299–324
Bárász M, Christiano P, Fallenstein B, Herreshoff M, LaVictoire P, Yudkowsky E (2014) Robust cooperation in the Prisoner’s Dilemma: Program equilibrium via provability logic, unpublished manuscript. Available via arXiv. http://arxiv.org/abs/1401.5577
Ben-Porath E (1997) Rationality, Nash equilibrium, and backwards induction in perfect-information games. Review of Economic Studies 64(1):23–46
Bensinger R (2013) Building phenomenological bridges. Less Wrong Blog http://lesswrong.com/lw/jd9/building_phenomenological_bridges/
Bird J, Layzell P (2002) The evolved radio and its implications for modelling the evolution of novel sensors. In: Proceedings of the 2002 Congress on Evolutionary Computation. Vol. 2, IEEE, Honolulu, HI, pp 1836–1841
Bostrom N (2014) Superintelligence: Paths, Dangers, Strategies. Oxford University Press, New York
Christiano P (2014a) Non-omniscience, probabilistic inference, and metamathematics. Tech. Rep. 2014–3, Machine Intelligence Research Institute, Berkeley, CA, http://intelligence.org/files/Non-Omniscience.pdf
Christiano P (2014b) Specifying “enlightened judgment” precisely (reprise). Ordinary Ideas Blog http://ordinaryideas.wordpress.com/2014/08/27/specifying-enlightened-judgment-precisely-reprise/
de Blanc P (2011) Ontological crises in artificial agents’ value systems. Tech. rep., The Singularity Institute, San Francisco, CA, http://arxiv.org/abs/1105.3821
Demski A (2012) Logical prior probability. In: Bach J, Goertzel B, Iklé M (eds) Artificial General Intelligence, Springer, New York, 7716, pp 50–59, 5th International Conference, AGI 2012, Oxford, UK, December 8–11, 2012. Proceedings
Fallenstein B (2014) Procrastination in probabilistic logic. Working paper, Machine Intelligence Research Institute, Berkeley, CA, http://intelligence.org/files/ProbabilisticLogicProcrastinates.pdf
Fallenstein B, Soares N (2014) Problems of self-reference in self-improving space-time embedded intelligence. In: Goertzel B, Orseau L, Snaider J (eds) Artificial General Intelligence, Springer, New York, 8598, pp 21–32, 7th International Conference, AGI 2014, Quebec City, QC, Canada, August 1–4, 2014. Proceedings
Fallenstein B, Soares N (2015) Vingean reflection: Reliable reasoning for self-improving agents. Tech. Rep. 2015–2, Machine Intelligence Research Institute, Berkeley, CA, https://intelligence.org/files/VingeanReflection.pdf
Gaifman H (1964) Concerning measures in first order calculi. Israel Journal of Mathematics 2(1):1–18
Gaifman H (2004) Reasoning with limited resources and assigning probabilities to arithmetical statements. Synthese 140(1–2):97–119
Gödel K, Kleene SC, Rosser JB (1934) On Undecidable Propositions of Formal Mathematical Systems. Institute for Advanced Study, Princeton, NJ
Good IJ (1965) Speculations concerning the first ultraintelligent machine. In: Alt FL, Rubinoff M (eds) Advances in Computers, vol 6, Academic Press, New York, pp 31–88
Halpern JY (2003) Reasoning about Uncertainty. MIT Press, Cambridge, MA
Hintze D (2014) Problem class dominance in predictive dilemmas. Tech. rep., Machine Intelligence Research Institute, Berkeley, CA, http://intelligence.org/files/ProblemClassDominance.pdf
Hutter M (2000) A theory of universal artificial intelligence based on algorithmic complexity, unpublished manuscript. Available via arXiv. http://arxiv.org/abs/cs/0004001
Hutter M, Lloyd JW, Ng KS, Uther WTB (2013) Probabilities on sentences in an expressive logic. Journal of Applied Logic 11(4):386–420
Jeffrey RC (1983) The Logic of Decision, 2nd edn. Chicago University Press, Chicago, IL
Joyce JM (1999) The Foundations of Causal Decision Theory. Cambridge Studies in Probability, Induction and Decision Theory, Cambridge University Press, New York, NY
Legg S, Hutter M (2007) Universal intelligence: A definition of machine intelligence. Minds and Machines 17(4):391–444
Lehmann EL (1950) Some principles of the theory of testing hypotheses. Annals of Mathematical Statistics 21(1):1–26
Lewis D (1979) Prisoners’ dilemma is a Newcomb problem. Philosophy & Public Affairs 8(3):235–240, http://www.jstor.org/stable/2265034
Lewis D (1981) Causal decision theory. Australasian Journal of Philosophy 59(1):5–30
Łoś J (1955) On the axiomatic treatment of probability. Colloquium Mathematicae 3(2):125–137, http://eudml.org/doc/209996
MacAskill W (2014) Normative uncertainty. PhD thesis, St Anne’s College, University of Oxford, http://ora.ox.ac.uk/objects/uuid:8a8b60af-47cd-4abc-9d29-400136c89c0f
McCarthy J, Minsky M, Rochester N, Shannon C (1955) A proposal for the Dartmouth summer research project on artificial intelligence. Proposal, Formal Reasoning Group, Stanford University, Stanford, CA
Muehlhauser L, Salamon A (2012) Intelligence explosion: Evidence and import. In: Eden A, Søraker J, Moor JH, Steinhart E (eds) Singularity Hypotheses: A Scientific and Philosophical Assessment, Springer, Berlin, the Frontiers Collection
Ng AY, Russell SJ (2000) Algorithms for inverse reinforcement learning. In: Langley P (ed) Proceedings of the Seventeenth International Conference on Machine Learning (ICML-’00), Morgan Kaufmann, San Francisco, pp 663–670
Omohundro SM (2008) The basic AI drives. In: Wang P, Goertzel B, Franklin S (eds) Artificial General Intelligence 2008, IOS, Amsterdam, no. 171 in Frontiers in Artificial Intelligence and Applications, pp 483–492, proceedings of the First AGI Conference
Pearl J (2000) Causality: Models, Reasoning, and Inference, 1st edn. Cambridge University Press, New York, NY
Poe EA (1836) Maelzel’s chess-player. Southern Literary Messenger 2(5):318–326
Rapoport A, Chammah AM (1965) Prisoner’s Dilemma: A Study in Conflict and Cooperation, Ann Arbor Paperbacks, vol 165. University of Michigan Press, Ann Arbor, MI
Russell S (2014) Unifying logic and probability: A new dawn for AI? In: Information Processing and Management of Uncertainty in Knowledge-Based Systems: 15th International Conference, IPMU 2014, Montpellier, France, July 15–19, 2014, Proceedings, Part I, Springer, no. 442 in Communications in Computer and Information Science, pp 10–14
Sawin W, Demski A (2013) Computable probability distributions which converge on \(\pi _1\) will disbelieve true \(\pi _2\) sentences. Tech. rep., Machine Intelligence Research Institute, Berkeley, CA, http://intelligence.org/files/Pi1Pi2Problem.pdf
Shannon CE (1950) XXII. Programming a computer for playing chess. Philosophical Magazine 41(314):256–275
Soares N (2014) Tiling agents in causal graphs. Tech. Rep. 2014–5, Machine Intelligence Research Institute, Berkeley, CA, http://intelligence.org/files/TilingAgentsCausalGraphs.pdf
Soares N (2015) Formalizing two problems of realistic world-models. Tech. Rep. 2015–3, Machine Intelligence Research Institute, Berkeley, CA, https://intelligence.org/files/RealisticWorldModels.pdf
Soares N (2016) The value learning problem. In: Ethics for Artificial Intelligence Workshop at the 25th International Joint Conference on Artificial Intelligence (IJCAI-16), New York, NY, July 9–15
Soares N, Fallenstein B (2014) Toward idealized decision theory. Tech. Rep. 2014–7, Machine Intelligence Research Institute, Berkeley, CA, https://intelligence.org/files/TowardIdealizedDecisionTheory.pdf
Soares N, Fallenstein B (2015) Questions of reasoning under logical uncertainty. Tech. Rep. 2015–1, Machine Intelligence Research Institute, Berkeley, CA, https://intelligence.org/files/QuestionsLogicalUncertainty.pdf
Solomonoff RJ (1964) A formal theory of inductive inference. Part I. Information and Control 7(1):1–22
United Kingdom Ministry of Defence (1991) Requirements for the procurement of safety critical software in defence equipment. Interim Defence Standard 00-55, United Kingdom Ministry of Defence
United States Department of Defense (1985) Department of Defense trusted computer system evaluation criteria. Department of Defense Standard DOD 5200.28-STD, United States Department of Defense, http://csrc.nist.gov/publications/history/dod85.pdf
Vinge V (1993) The coming technological singularity: How to survive in the post-human era. In: Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace, NASA Lewis Research Center, no. 10129 in NASA Conference Publication, pp 11–22, http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19940022856.pdf
Wald A (1939) Contributions to the theory of statistical estimation and testing hypotheses. Annals of Mathematical Statistics 10(4):299–326
Weld D, Etzioni O (1994) The first law of robotics (a call to arms). In: Hayes-Roth B, Korf RE (eds) Proceedings of the Twelfth National Conference on Artificial Intelligence, AAAI Press, Menlo Park, CA, pp 1042–1047, http://www.aaai.org/Papers/AAAI/1994/AAAI94-160.pdf
Yudkowsky E (2008) Artificial intelligence as a positive and negative factor in global risk. In: Bostrom N, Ćirković MM (eds) Global Catastrophic Risks, Oxford University Press, New York, pp 308–345
Yudkowsky E (2011) Complex value systems in Friendly AI. In: Schmidhuber J, Thórisson KR, Looks M (eds) Artificial General Intelligence, Springer, Berlin, no. 6830 in Lecture Notes in Computer Science, pp 388–393, 4th International Conference, AGI 2011, Mountain View, CA, USA, August 3–6, 2011. Proceedings
Yudkowsky E (2013) The procrastination paradox. Brief technical note, Machine Intelligence Research Institute, Berkeley, CA, http://intelligence.org/files/ProcrastinationParadox.pdf
Yudkowsky E (2014) Distributions allowing tiling of staged subjective EU maximizers. Tech. rep., Machine Intelligence Research Institute, Berkeley, CA, http://intelligence.org/files/DistributionsAllowingTiling.pdf
Yudkowsky E, Herreshoff M (2013) Tiling agents for self-modifying AI, and the Löbian obstacle. Early draft, Machine Intelligence Research Institute, Berkeley, CA, http://intelligence.org/files/TilingAgents.pdf
© 2017 Springer-Verlag GmbH Germany
Cite this chapter
Soares, N., Fallenstein, B. (2017). Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda. In: Callaghan, V., Miller, J., Yampolskiy, R., Armstrong, S. (eds) The Technological Singularity. The Frontiers Collection. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-54033-6_5
Print ISBN: 978-3-662-54031-2
Online ISBN: 978-3-662-54033-6