The value alignment problem: a geometric approach

Peterson, Martin

doi:10.1007/s10676-018-9486-0

Original Paper
Published: 03 November 2018

Volume 21, pages 19–28, (2019)

Ethics and Information Technology

Martin Peterson¹

Abstract

Stuart Russell defines the value alignment problem as follows: How can we build autonomous systems with values that “are aligned with those of the human race”? In this article I outline some distinctions that are useful for understanding the value alignment problem and then propose a solution: I argue that the methods currently applied by computer scientists for embedding moral values in autonomous systems can be improved by representing moral principles as conceptual spaces, i.e. as Voronoi tessellations of morally similar choice situations located in a multidimensional geometric space. The advantage of my preferred geometric approach is that it can be implemented without specifying any utility function ex ante.
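
The idea of a Voronoi tessellation can be illustrated with a minimal sketch: each principle is represented by a paradigm case, and a new choice situation falls under the principle whose paradigm case is nearest. The principle abbreviations are taken from the notes; the two-dimensional space, the coordinates, the Euclidean metric and the function name `applicable_principle` are assumptions made purely for illustration, not the author's implementation.

```python
import math

# Hypothetical paradigm cases, one per principle, placed in an invented
# two-dimensional similarity space. The coordinates are illustrative only.
PARADIGM_CASES = {
    "CBA": (0.0, 0.0),
    "PP": (4.0, 0.0),
    "ST": (0.0, 4.0),
    "AUT": (4.0, 4.0),
    "FP": (2.0, 7.0),
}


def applicable_principle(case, paradigms=PARADIGM_CASES):
    """Return the principle whose paradigm case lies nearest to `case`.

    Assigning every point to its nearest paradigm case is what carves the
    space into Voronoi regions. No utility is assigned to any outcome.
    Ties are broken by dictionary order, which is an arbitrary choice.
    """
    return min(paradigms, key=lambda name: math.dist(case, paradigms[name]))


if __name__ == "__main__":
    # A case close to the assumed CBA paradigm falls in the CBA region.
    print(applicable_principle((0.5, 0.5)))
```

Note that the sketch compares distances only; at no point does it compute or rank utilities, which is the feature the abstract highlights.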

Notes

Tesla’s autopilot mode is marketed as a semi-autonomous system, not as a fully autonomous one.
For an overview of all three accidents, see The Guardian (March 31, 2018).
I leave it open whether autonomous systems make decisions, or if all decisions are ultimately made by the engineers who design these systems. For the purposes of this paper there is no need to ascribe moral agency to autonomous systems.
Christopher von Hugo, manager of driver assistance and active safety at Mercedes-Benz, announced at the Paris auto show in 2016 that autonomous vehicles should always prioritize occupant safety over pedestrians. See Taylor (2016). I leave it to the reader to determine whether Mr. Hugo was speaking on behalf of his employer or merely expressing his personal opinion.
See e.g. Goodall (2016) and Crawford and Calo (2016), but note that Nyholm and Smids (2016) question the analogy.
See Bostrom (2014) for an extensive discussion of this topic. See also Dafoe and Russell (2016).
The quote is from a talk Dr. Russell gave at the World Economic Forum in Davos, Switzerland in January 2015. The talk is available on YouTube (https://www.youtube.com/watch?v=WvmeTaFc_Qw). Russell has also expressed the same idea in the papers listed in the references.
Russell (2016, p. 59).
See e.g. Bostrom (2014) and Milli et al. (2017).
If an ethical theory ranks some options as infinitely better than others, or entails cyclical orderings, then no real-valued utility function could mimic the prescriptions of such an ethical theory. It is also an open question whether the “theory” I sketch in this article could be represented by some real-valued utility function. (This depends on how we understand the ranking of domain-specific principles.) Brown (2011) also points out that no real-valued utility function can account for the existence of moral dilemmas. See Peterson (2013, Chap. 8) for a discussion of how hyper-real utility functions could help us overcome this problem.
For reasons explained in the previous footnote, a problem with this suggestion might be that no real-valued utility function can account for Aristotle’s notion of supererogation. See Peterson (2013, Chap. 8).
IEEE (2017a).
IEEE (2017a, pp. 23, 36).
IEEE (2017a, p. 20).
For an overview, see Attfield (2014).
Hadfield-Menell et al. (2016, p. 2).
Milli et al. (2017, p. 1).
IEEE (2017b, p. 1).
This is a fundamental assumption in Bostrom (2014) and, for instance, Milli et al. (2017), but it has, as far as I am aware, never been extensively discussed.
For reasons explained in the previous footnote, a problem with this suggestion might be that no real-valued utility function can account for Aristotle’s notion of supererogation. See Peterson (2013, Chap. 8).
Whether my proposal can be mimicked by some real-valued utility function is an open question (as noted in footnote 10), and also irrelevant. What matters is that my proposal can be implemented in a machine without explicitly ascribing utilities to outcomes or alternatives. From an epistemic point of view, this is a clear advantage over the utility-based approach.
The section draws on Chapter 1 in ET.
See ET, pp. 14–15.
See Nicomachean Ethics 1131a10–b15; Politics, III.9.1280a8–15, III.12.1282b18–23.
See Jonsen and Toulmin (1988) for a defense of casuistry.
CBA: An option is morally right only if the net surplus of benefits over costs for all those affected is at least as large as that of every alternative.
PP: An option is morally right only if reasonable precautionary measures are taken to safeguard against uncertain but non-negligible threats.
ST: An option is morally right only if it does not lead to any significant long-term depletion of natural, social or economic resources.
AUT: An option is morally right only if it does not reduce the independence, self-governance or freedom of the people affected by it.
FP: An option is morally right only if it does not lead to unfair inequalities among the people affected by it.
Note that I am not claiming that all ethical theories are false. I am merely suggesting that it is not necessary to take a stance on which theory is correct in order to align the values of autonomous systems with ours in the manner specified in the moderate value alignment thesis.
It is of course possible that the majority is wrong. I am not trying to derive an “ought” from an “is”; see Chap. 3 of ET for a discussion of Hume’s Is-Ought principle.
A reviewer has suggested that it would be helpful to clarify how the geometric method differs from Rawls’ method of reflective equilibrium. The most important difference is that, unlike Rawls’ method, the geometric method is compatible with coherentistic as well as foundationalist principles. The ex ante mechanism for selecting paradigm cases outlined in Chapter 2 of ET assigns a privileged, foundational role to paradigm cases. The ex post mechanism discussed in the same chapter is coherentistic in the sense that the location of the paradigm cases depends on what cases the principle has been applied to in the past.
See Chapter 8 of ET.
See Chapters 1 and 2. See also the experimental evidence reported in Chapters 3 and 5.
See e.g. Peterson (2013) for a defense of this view.
I would like to thank Rob Reed for suggesting this helpful point to me.
See, for instance, Gavagai.se.
Shrader-Frechette (2017).
Peterson (2017, pp. 37–38).
Stewart et al. (1973, pp. 415–417), my italics.
Kruskal and Wish (1978, pp. 30–31), my italics.
Lokhorst (2018, p. 1).
ET, p. 17.
Peterson (2017, p. 17).
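
The ex post mechanism mentioned in the note on reflective equilibrium can be sketched as follows. The note says only that the location of a paradigm case depends on the cases to which the principle has already been applied; taking the arithmetic mean (centroid) of those cases is an assumption made here for illustration, and the function name `ex_post_paradigm` and the sample coordinates are hypothetical.

```python
def ex_post_paradigm(applied_cases):
    """Locate a paradigm case from the cases a principle was applied to.

    ASSUMPTION: the paradigm case is taken to be the centroid (coordinate-wise
    mean) of the past cases. The source states only that its location depends
    on those cases, not how it is computed.
    """
    if not applied_cases:
        raise ValueError("at least one past case is required")
    n = len(applied_cases)
    dims = len(applied_cases[0])
    # Average each coordinate over all past cases.
    return tuple(sum(case[i] for case in applied_cases) / n for i in range(dims))


if __name__ == "__main__":
    # Two invented past cases; the paradigm lands midway between them.
    print(ex_post_paradigm([(0.0, 0.0), (2.0, 4.0)]))
```

On this reading the paradigm case shifts whenever the principle is applied to a new case, which is what makes the mechanism coherentistic rather than foundational.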

References

Anderson, M., & Anderson, S. L. (2014). GenEth: A general ethical dilemma analyzer. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 253–261.
Attfield, R. (2014). Environmental ethics: An overview for the twenty-first century. New York: Wiley.
Bostrom, N. (2014). Superintelligence. Oxford: Oxford University Press.
Brown, C. (2011). Consequentialize this. Ethics, 121(4), 749–771.
Crawford, K., & Calo, R. (2016). There is a blind spot in AI research. Nature, 538(7625).
Dafoe, A., & Russell, S. (2016). Yes, we are worried about the existential risk of artificial intelligence. MIT Technology Review.
Gärdenfors, P. (2000). Conceptual spaces: The geometry of thought. Cambridge: MIT Press.
Gärdenfors, P. (2014). The geometry of meaning: Semantics based on conceptual spaces. Cambridge: MIT Press.
Goodall, N. J. (2016). Can you program ethics into a self-driving car? IEEE Spectrum, 53(6), 28–58.
Guardian Staff and Agencies. (2018). Tesla car that crashed and killed driver was running on Autopilot, firm says. The Guardian, March 31, 2018.
Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2016). The off-switch game. arXiv preprint arXiv:1611.08219.
IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. (2017a). Ethically Aligned Design (EAD) - Version 2. Retrieved January 26, 2018, from http://standards.ieee.org/develop/indconn/ec/autonomous_systems.html.
IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. (2017b). Classical Ethics in A/IS. Retrieved January 26, 2018, from https://standards.ieee.org/develop/indconn/ec/ead_classical_ethics_ais_v2.pdf.
Jonsen, A. R., & Toulmin, S. E. (1988). The abuse of casuistry: A history of moral reasoning. University of California Press.
Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. New York: Sage Publications.
Lokhorst, G. J. C. (2018). Science and Engineering Ethics, 415–417. https://doi.org/10.1007/s11948-017-0014-0.
Milli, S., Hadfield-Menell, D., Dragan, A., & Russell, S. (2017). Should robots be obedient? arXiv preprint arXiv:1705.09990.
Nyholm, S., & Smids, J. (2016). The ethics of accident-algorithms for self-driving cars: An applied trolley problem? Ethical Theory and Moral Practice, 19(5), 1275–1289.
Paulo, N. (2015). Casuistry as common law morality. Theoretical Medicine and Bioethics, 36(6), 373–389.
Peterson, M. (2013). The dimensions of consequentialism: Ethics, equality and risk. Cambridge University Press.
Peterson, M. (2017). The ethics of technology: A geometric analysis of five moral principles. Oxford: Oxford University Press.
Peterson, M. (2018). The ethics of technology: Response to critics. Science and Engineering Ethics. https://doi.org/10.1007/s119.
Rosch, E. (1975). Cognitive reference points. Cognitive Psychology, 7, 532–547.
Rosch, E. H. (1973). Natural categories. Cognitive Psychology, 4, 328–350.
Russell, S. (2016). Should we fear supersmart robots? Scientific American, 314(6), 58–59.
Shrader-Frechette, K. (2017). Review of The ethics of technology: A geometric analysis of five moral principles. Notre Dame Philosophical Reviews. University of Notre Dame. Retrieved November 11, 2017, from http://ndpr.nd.edu/news/the-ethics-of-technology-a-geometric-analysis-of-five-moral-principles/.
Stewart, A., Prandy, K., & Blackburn, R. M. (1973). Measuring the class structure. Nature, 245, 415.
Taylor, M. (2016). Self-driving Mercedes-Benzes will prioritize occupant safety over pedestrians. Retrieved January 26, 2018, from https://blog.caranddriver.com/self-driving-mercedes-will-prioritize-occupant-safety-over-pedestrians.

Author information

Authors and Affiliations

Department of Philosophy, Texas A&M University, College Station, TX, USA
Martin Peterson

Corresponding author

Correspondence to Martin Peterson.

About this article

Cite this article

Peterson, M. The value alignment problem: a geometric approach. Ethics Inf Technol 21, 19–28 (2019). https://doi.org/10.1007/s10676-018-9486-0

Published: 03 November 2018
Issue Date: 01 March 2019
DOI: https://doi.org/10.1007/s10676-018-9486-0
