Inferring Human Values for Safe AGI Design

  • Can Eren Sezener
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9205)


Aligning the goals of superintelligent machines with human values is one way to pursue safety in AGI systems. To achieve this, it is first necessary to learn what human values are. However, human values are incredibly complex and cannot easily be formalized by hand. In this work, we propose a general framework to estimate the values of a human given their behavior.
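The abstract does not spell out how the framework works. As a rough illustration of the general idea of estimating values from observed behavior (in the spirit of inverse reinforcement learning, one of the keywords), the toy sketch below does Bayesian inference over a small, hand-picked set of candidate value systems. It assumes a one-step choice setting and a Boltzmann-rational model of the human, in which an action is chosen with probability proportional to exp(beta · reward). The hypothesis names, reward numbers, and the `beta` parameter are hypothetical. This is not the paper's method, only a sketch of the kind of inference such a framework performs.

```python
import math
from typing import Dict, List

# Hypothetical toy setting: each candidate "value system" maps actions to rewards.
Hypothesis = Dict[str, float]


def action_likelihood(reward: Hypothesis, action: str, beta: float) -> float:
    """Boltzmann-rational choice model (an assumption):
    P(action | reward) is proportional to exp(beta * reward[action])."""
    # Subtract the max reward before exponentiating for numerical stability.
    m = max(reward.values())
    z = sum(math.exp(beta * (r - m)) for r in reward.values())
    return math.exp(beta * (reward[action] - m)) / z


def infer_values(
    hypotheses: Dict[str, Hypothesis],
    prior: Dict[str, float],
    observed_actions: List[str],
    beta: float = 2.0,
) -> Dict[str, float]:
    """Posterior over candidate value systems given observed actions (Bayes' rule)."""
    log_post = {}
    for name, reward in hypotheses.items():
        lp = math.log(prior[name])
        # Observed actions are treated as independent choices.
        for a in observed_actions:
            lp += math.log(action_likelihood(reward, a, beta))
        log_post[name] = lp
    # Normalize in log space to avoid underflow.
    m = max(log_post.values())
    unnorm = {k: math.exp(v - m) for k, v in log_post.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


if __name__ == "__main__":
    # Hypothetical candidate value systems over three actions.
    hypotheses = {
        "values_health": {"exercise": 1.0, "watch_tv": 0.0, "eat_cake": -0.5},
        "values_comfort": {"exercise": -0.5, "watch_tv": 1.0, "eat_cake": 0.5},
        "values_pleasure": {"exercise": 0.0, "watch_tv": 0.5, "eat_cake": 1.0},
    }
    prior = {name: 1.0 / len(hypotheses) for name in hypotheses}  # uniform prior
    observed = ["exercise", "exercise", "watch_tv", "exercise"]
    posterior = infer_values(hypotheses, prior, observed)
    for name, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

With mostly "exercise" observed, nearly all posterior mass goes to the health-valuing hypothesis. A uniform prior is used here for simplicity; a prior that favors simpler value systems could be passed in instead.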


Keywords: Value learning, Inverse reinforcement learning, Friendly AI, Safe AGI




Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science, Ozyegin University, Istanbul, Turkey
