Inferring Human Values for Safe AGI Design
Aligning goals of superintelligent machines with human values is one of the ways to pursue safety in AGI systems. To achieve this, it is first necessary to learn what human values are. However, human values are incredibly complex and cannot easily be formalized by hand. In this work, we propose a general framework to estimate the values of a human given its behavior.
KeywordsValue learning Inverse reinforcement learning Friendly AI Safe AGI
Unable to display preview. Download preview PDF.
- 3.Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin (2005)Google Scholar
- 5.Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, ICML 2000, pp. 663–670. Morgan Kaufmann Publishers Inc., San Francisco (2000)Google Scholar
- 7.Soares, N.: The value learning problem. Tech. rep., Machine Intelligence ResearchInstitute, Berkeley, CA (2015)Google Scholar