The value alignment problem: a geometric approach

Abstract

Stuart Russell defines the value alignment problem as follows: How can we build autonomous systems with values that “are aligned with those of the human race”? In this article I outline some distinctions that are useful for understanding the value alignment problem and then propose a solution: I argue that the methods currently applied by computer scientists for embedding moral values in autonomous systems can be improved by representing moral principles as conceptual spaces, i.e. as Voronoi tessellations of morally similar choice situations located in a multidimensional geometric space. The advantage of my preferred geometric approach is that it can be implemented without specifying any utility function ex ante.
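The core idea of the geometric approach — that a new choice situation is governed by the principle whose paradigm case it lies closest to in a multidimensional space, i.e. by the Voronoi cell it falls into — can be sketched in a few lines of code. This is an illustrative reconstruction, not the article's implementation: the principle names, the three feature dimensions, and all coordinates are invented for the example.

```python
import math

# Hypothetical paradigm cases: each moral principle is anchored by one or
# more points in a space of morally relevant features. Here the (invented)
# dimensions might be (expected benefit, risk, resource depletion).
paradigm_cases = {
    "cost-benefit": [(0.9, 0.2, 0.1)],
    "precaution": [(0.3, 0.9, 0.2)],
    "sustainability": [(0.4, 0.3, 0.9)],
}

def nearest_principle(case, paradigms):
    """Return the principle whose closest paradigm case minimizes Euclidean
    distance to the given case. This nearest-neighbor rule is equivalent to
    locating the Voronoi cell the case falls into."""
    best, best_dist = None, float("inf")
    for principle, points in paradigms.items():
        for point in points:
            d = math.dist(case, point)
            if d < best_dist:
                best, best_dist = principle, d
    return best

# A case near the precautionary paradigm is classified accordingly:
print(nearest_principle((0.35, 0.85, 0.25), paradigm_cases))  # precaution
```

Note that no utility function is specified anywhere: the classification relies only on distances between cases, which is the epistemic advantage the abstract claims for the geometric approach.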

Fig. 1
Fig. 2

Notes

  1. Tesla’s autopilot mode is marketed as a semi-autonomous system, not as a fully autonomous one.

  2. For an overview of all three accidents, see The Guardian (March 31, 2018).

  3. I leave it open whether autonomous systems make decisions, or whether all decisions are ultimately made by the engineers who design these systems. For the purposes of this paper there is no need to ascribe moral agency to autonomous systems.

  4. Christopher von Hugo, manager of driver assistance and active safety at Mercedes-Benz, announced at the 2016 Paris auto show that autonomous vehicles should always prioritize occupant safety over pedestrians. See Taylor (2016). I leave it to the reader to determine whether Mr. von Hugo was speaking on behalf of his employer or merely expressing his personal opinion.

  5. See e.g. Goodall (2016) and Crawford and Calo (2016), but note that Nyholm and Smids (2016) question the analogy.

  6. See Bostrom (2014) for an extensive discussion of this topic. See also Dafoe and Russell (2016).

  7. The quote is from a talk Dr. Russell gave at the World Economic Forum in Davos, Switzerland, in January 2015. The talk is available on YouTube (https://www.youtube.com/watch?v=WvmeTaFc_Qw). Russell has also expressed the same idea in the papers listed in the references.

  8. Russell (2016, p. 59).

  9. See e.g. Bostrom (2014) and Milli et al. (2017).

  10. If an ethical theory ranks some options as infinitely better than others, or entails cyclical orderings, then no real-valued utility function could mimic the prescriptions of such a theory. It is also an open question whether the “theory” I sketch in this article could be represented by some real-valued utility function. (This depends on how we understand the ranking of domain-specific principles.) Brown (2011) also points out that no real-valued utility function can account for the existence of moral dilemmas. See Peterson (2013, Chap. 8) for a discussion of how hyper-real utility functions could help us overcome this problem.

  11. For reasons explained in the previous footnote, a problem with this suggestion might be that no real-valued utility function can account for Aristotle’s notion of supererogation. See Peterson (2013, Chap. 8).

  12. IEEE (2017a).

  13. IEEE (2017a, pp. 23, 36).

  14. IEEE (2017a, p. 20).

  15. For an overview, see Attfield (2014).

  16. Hadfield-Menell et al. (2016, p. 2).

  17. Milli et al. (2017, p. 1).

  18. IEEE (2017b, p. 1).

  19. This is a fundamental assumption in Bostrom (2014) and, for instance, Milli et al. (2017), but as far as I am aware it has never been extensively discussed.

  20. For reasons explained in footnote 10, a problem with this suggestion might be that no real-valued utility function can account for Aristotle’s notion of supererogation. See Peterson (2013, Chap. 8).

  21. Whether my proposal can be mimicked by some real-valued utility function is an open question (as noted in footnote 10), but it is also irrelevant. What matters is that my proposal can be implemented in a machine without explicitly ascribing utilities to outcomes or alternatives. From an epistemic point of view, this is a clear advantage over the utility-based approach.

  22. This section draws on Chapter 1 of ET.

  23. See ET, pp. 14–15.

  24. See Nicomachean Ethics 1131a10–b15; Politics III.9.1280a8–15, III.12.1282b18–23.

  25. See Jonsen and Toulmin (1988) for a defense of casuistry.

  26. CBA: An option is morally right only if the net surplus of benefits over costs for all those affected is at least as large as that of every alternative.

  27. PP: An option is morally right only if reasonable precautionary measures are taken to safeguard against uncertain but non-negligible threats.

  28. ST: An option is morally right only if it does not lead to any significant long-term depletion of natural, social or economic resources.

  29. AUT: An option is morally right only if it does not reduce the independence, self-governance or freedom of the people affected by it.

  30. FP: An option is morally right only if it does not lead to unfair inequalities among the people affected by it.

  31. Note that I am not claiming that all ethical theories are false. I am merely suggesting that it is not necessary to take a stance on which theory is correct in order to align the values of autonomous systems with ours in the manner specified in the moderate value alignment thesis.

  32. It is of course possible that the majority is wrong. I am not trying to derive an “ought” from an “is”; see Chap. 3 of ET for a discussion of Hume’s Is-Ought principle.

  33. A reviewer has suggested that it would be helpful to clarify how the geometric method differs from Rawls’ method of reflective equilibrium. The most important difference is that, unlike Rawls’ method, the geometric method is compatible with coherentist as well as foundationalist principles. The ex ante mechanism for selecting paradigm cases outlined in Chapter 2 of ET assigns a privileged, foundational role to paradigm cases. The ex post mechanism discussed in the same chapter is coherentist in the sense that the location of the paradigm cases depends on what cases the principle has been applied to in the past.

  34. See Chapter 8 of ET.

  35. See Chapters 1 and 2. See also the experimental evidence reported in Chapters 3 and 5.

  36. See e.g. Peterson (2013) for a defense of this view.

  37. I would like to thank Rob Reed for suggesting this helpful point to me.

  38. See, for instance, Gavagai.se.

  39. Shrader-Frechette (2017).

  40. Peterson (2017, pp. 37–38).

  41. Stewart et al. (1973, pp. 415–417), my italics.

  42. Kruskal and Wish (1978, pp. 30–31), my italics.

  43. Lokhorst (2018, p. 1).

  44. ET, p. 17.

  45. Peterson (2017, p. 17).

References

  1. Anderson, M., & Anderson, S. L. (2014). GenEth: A general ethical dilemma analyzer. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 253–261.

  2. Attfield, R. (2014). Environmental ethics: An overview for the twenty-first century. New York: Wiley.

  3. Bostrom, N. (2014). Superintelligence. Oxford: Oxford University Press.

  4. Brown, C. (2011). Consequentialize this. Ethics, 121(4), 749–771.

  5. Crawford, K., & Calo, R. (2016). There is a blind spot in AI research. Nature, 538(7625).

  6. Dafoe, A., & Russell, S. (2016). Yes, we are worried about the existential risk of artificial intelligence. MIT Technology Review.

  7. Gärdenfors, P. (2000). Conceptual spaces: The geometry of thought. Cambridge: MIT Press.

  8. Gärdenfors, P. (2014). The geometry of meaning: Semantics based on conceptual spaces. Cambridge: MIT Press.

  9. Goodall, N. J. (2016). Can you program ethics into a self-driving car? IEEE Spectrum, 53(6), 28–58.

  10. Guardian Staff and Agencies. (2018). Tesla car that crashed and killed driver was running on Autopilot, firm says. The Guardian, March 31, 2018.

  11. Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2016). The off-switch game. arXiv preprint arXiv:1611.08219.

  12. IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. (2017a). Ethically Aligned Design (EAD), Version 2. Retrieved January 26, 2018, from http://standards.ieee.org/develop/indconn/ec/autonomous_systems.html.

  13. IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. (2017b). Classical Ethics in A/IS. Retrieved January 26, 2018, from https://standards.ieee.org/develop/indconn/ec/ead_classical_ethics_ais_v2.pdf.

  14. Jonsen, A. R., & Toulmin, S. E. (1988). The abuse of casuistry: A history of moral reasoning. University of California Press.

  15. Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. New York: Sage Publications.

  16. Lokhorst, G. J. C. (2018). Science and Engineering Ethics, 415–417. https://doi.org/10.1007/s11948-017-0014-0.

  17. Milli, S., Hadfield-Menell, D., Dragan, A., & Russell, S. (2017). Should robots be obedient? arXiv preprint arXiv:1705.09990.

  18. Nyholm, S., & Smids, J. (2016). The ethics of accident-algorithms for self-driving cars: An applied trolley problem? Ethical Theory and Moral Practice, 19(5), 1275–1289.

  19. Paulo, N. (2015). Casuistry as common law morality. Theoretical Medicine and Bioethics, 36(6), 373–389.

  20. Peterson, M. (2013). The dimensions of consequentialism: Ethics, equality and risk. Cambridge University Press.

  21. Peterson, M. (2017). The ethics of technology: A geometric analysis of five moral principles. Oxford: Oxford University Press.

  22. Peterson, M. (2018). The ethics of technology: Response to critics. Science and Engineering Ethics. https://doi.org/10.1007/s119.

  23. Rosch, E. (1975). Cognitive reference points. Cognitive Psychology, 7, 532–547.

  24. Rosch, E. H. (1973). Natural categories. Cognitive Psychology, 4, 328–350.

  25. Russell, S. (2016). Should we fear supersmart robots? Scientific American, 314(6), 58–59.

  26. Shrader-Frechette, K. (2017). Review of the ethics of technology: A geometric analysis of five moral principles. Notre Dame Philosophical Reviews. University of Notre Dame. Retrieved November 11 2017 from. http://ndpr.nd.edu/news/the-ethics-of-technology-a-geometric-analysis-of-five-moral-principles/.

  27. Stewart, A., Prandy, K., & Blackburn, R. M. (1973). Measuring the class structure. Nature, 245, 415–417.

  28. Taylor, M. (2016). Self-driving Mercedes-Benzes will prioritize occupant safety over pedestrians. Retrieved January 26, 2018, from https://blog.caranddriver.com/self-driving-mercedes-will-prioritize-occupant-safety-over-pedestrians.

Author information

Corresponding author

Correspondence to Martin Peterson.


About this article

Cite this article

Peterson, M. The value alignment problem: a geometric approach. Ethics Inf Technol 21, 19–28 (2019). https://doi.org/10.1007/s10676-018-9486-0

Keywords

  • Value alignment problem
  • Autonomous systems
  • Conceptual spaces
  • Self-driving cars
  • Stuart Russell
  • IEEE