Abstract
Autonomous agents (AAs) will increasingly interact with us in our daily lives. While we want the benefits that AAs bring, it is essential that their behavior is aligned with our values and norms. Hence, an AA needs to estimate the values and norms of the humans it interacts with, which is not straightforward when it can only observe their behavior. This paper analyses to what extent an AA is able to estimate the values and norms of a simulated human agent (SHA) based on its actions in the ultimatum game. We present two methods to reduce ambiguity in profiling SHAs: one based on search-space exploration and one based on counterfactual analysis. We found that both methods increase the confidence in the estimated human values and norms, but differ in their applicability: the latter is more efficient when the number of interactions with the agent must be minimized. These insights are useful for improving the alignment of AAs with human values and norms.
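The profiling idea in the abstract, narrowing down which hidden preferences are consistent with observed behavior by choosing informative interactions, can be sketched in miniature. The snippet below is an illustrative toy, not the authors' implementation: a deterministic responder accepts any offer at or above a hidden threshold, and a bisection strategy stands in for targeted, counterfactual-style queries that shrink the set of thresholds consistent with the responses.

```python
# Toy sketch (illustrative names and numbers, not the paper's model):
# estimate a simulated responder's hidden acceptance threshold in the
# ultimatum game by picking the most informative offer each round.

def make_responder(threshold):
    """Deterministic responder: accepts any offer >= its hidden threshold."""
    return lambda offer: offer >= threshold

def estimate_threshold(responder, lo=0, hi=100):
    """Shrink the interval of thresholds consistent with the observed replies."""
    while hi - lo > 1:
        offer = (lo + hi) // 2       # most informative next query
        if responder(offer):
            hi = offer               # threshold is at most this offer
        else:
            lo = offer               # threshold is above this offer
    return hi                        # smallest accepted offer

responder = make_responder(threshold=37)
print(estimate_threshold(responder))  # -> 37, after ~log2(100) queries
```

Each chosen offer halves the remaining ambiguity, which is why targeted queries need far fewer interactions than passively observing arbitrary rounds.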
Notes
1. For ease of presentation, we chose to present P with no monetary unit. Empirical work [21] shows that the effect of the pie size is relatively small.
2. The norm drawn from the normal distribution is not used as input for the norm in subsequent rounds (i.e., the agent does not memorize it).
3. To be exact, the proposed demand should be read as what the proposer considers a ‘normal’ threshold: had it considered a higher threshold normal, it would have demanded less; had it considered a lower threshold normal, it would have demanded more.
4. The standard deviation in the demand based solely on values, \(\sigma _{vd}\), was added to ensure agents vary in which values they find important (\(di_a\)). The \(\sigma _{vd}\) for humans is postulated rather than extracted from empirical data.
5. Given the deterministic model of the SHA, one might expect the RMSE to tend to zero. This is not the case because \([d, valueDemand, normDemand] \in \mathbb {Z}\), and therefore a rounding operator is used.
6.
7. In our case \(OR_a\): given that the preferences for values and norms are constant, the demand is defined according to (4).
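Note 5's point, that a deterministic SHA still produces a nonzero RMSE once real-valued demands are rounded to integers, can be illustrated with a toy computation. The demand values below are made up for illustration and are not taken from the paper's experiments.

```python
import math

# Illustrative only: a deterministic model predicts real-valued demands,
# but the SHA's demands are integers, so predictions are rounded before
# comparison. The residual rounding error keeps the RMSE above zero.
true_demands = [41.3, 55.8, 47.5, 62.1]       # made-up real-valued predictions
rounded = [round(d) for d in true_demands]    # integer demands after rounding

rmse = math.sqrt(
    sum((t - r) ** 2 for t, r in zip(true_demands, rounded)) / len(true_demands)
)
print(rmse)  # small but strictly positive, despite a deterministic model
```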
References
Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the Twenty-First International Conference on Machine Learning (ICML). ACM (2004)
Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv preprint arXiv:1606.06565 (2016)
Cooper, D.J., Dutcher, E.G.: The dynamics of responder behavior in ultimatum games: a meta-study. Exp. Econ. 14(4), 519–546 (2011)
Cranefield, S., Winikoff, M., Dignum, V., Dignum, F.: No pizza for you: value-based plan selection in BDI agents. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI), pp. 178–184 (2017)
Crawford, S.E.S., Ostrom, E.: A grammar of institutions. Am. Polit. Sci. Rev. 89(3), 582–600 (1995)
Dechesne, F., Di Tosto, G., Dignum, V., Dignum, F.: No smoking here: values, norms and culture in multi-agent systems. Artif. Intell. Law 21(1), 79–107 (2013)
Del Missier, F., Mäntylä, T., Hansson, P., Bruine de Bruin, W., Parker, A.M., Nilsson, L.G.: The multifold relationship between memory and decision making: an individual-differences study. J. Exp. Psychol.: Learn. Mem. Cogn. 39(5), 1344 (2013)
Dignum, V.: Responsible Artificial Intelligence: How to Develop and Use AI in a Responsible Way. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30371-6
Fehr, E., Fischbacher, U.: The nature of human altruism. Nature 425(6960), 785–791 (2003)
Fishbein, M., Ajzen, I.: Predicting and Changing Behavior: The Reasoned Action Approach. Taylor & Francis Ltd, Milton Park (2011)
Güth, W., Schmittberger, R., Schwarze, B.: An experimental analysis of ultimatum bargaining. J. Econ. Behav. Organ. 3(4), 367–388 (1982)
Hadfield-Menell, D., Milli, S., Abbeel, P., Russell, S.J., Dragan, A.: Inverse reward design. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), pp. 6765–6774 (2017)
Irving, G., Askell, A.: AI safety needs social scientists. Distill 4(2), e14 (2019)
Levine, S., Popovic, Z., Koltun, V.: Nonlinear inverse reinforcement learning with Gaussian processes. In: Advances in Neural Information Processing Systems (NIPS), pp. 19–27 (2011)
Malle, B.F.: How the Mind Explains Behavior: Folk Explanations, Meaning, and Social Interaction. MIT Press, Cambridge (2006)
Mercuur, R., Dignum, V., Jonker, C.M., et al.: The value of values and norms in social simulation. J. Artif. Soc. Soc. Simul. 22(1), 1–9 (2019)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2018)
Mindermann, S., Armstrong, S.: Occam’s razor is insufficient to infer the preferences of irrational agents. In: Conference on Neural Information Processing Systems (NIPS), pp. 5598–5609 (2018)
Nielsen, T.D., Jensen, F.V.: Learning a decision maker’s utility function from (possibly) inconsistent behavior. Artif. Intell. 160(1–2), 53–78 (2004)
Nouri, E., Georgila, K., Traum, D.: Culture-specific models of negotiation for virtual characters: multi-attribute decision-making based on culture-specific values. AI Soc. 32(1), 51–63 (2014). https://doi.org/10.1007/s00146-014-0570-7
Oosterbeek, H., Sloof, R., Van De Kuilen, G.: Cultural differences in ultimatum game experiments: evidence from a meta-analysis. SSRN Electron. J. 8(1), 171–188 (2001)
Pearl, J.: The seven tools of causal inference, with reflections on machine learning. Commun. ACM 62(3), 54–60 (2019)
Van de Poel, I., et al.: Ethics, Technology, and Engineering: An Introduction. Wiley, Hoboken (2011)
Roese, N.J.: Counterfactual thinking. Psychol. Bull. 121(1), 133 (1997)
Roth, A.E., Erev, I.: Learning in extensive-form games: experimental data and simple dynamic models in the intermediate term. Games Econ. Behav. 8(1), 164–212 (1995)
Schwartz, S.H.: An overview of the Schwartz theory of basic values. Online Read. Psychol. Culture 2, 1–20 (2012)
Soares, N., Fallenstein, B.: Agent foundations for aligning machine intelligence with human interests: a technical research agenda. In: Callaghan, V., Miller, J., Yampolskiy, R., Armstrong, S. (eds.) The Technological Singularity. TFC, pp. 103–125. Springer, Heidelberg (2017). https://doi.org/10.1007/978-3-662-54033-6_5
Acknowledgements
This work was supported by the AiTech initiative of the Delft University of Technology.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Siebert, L.C., Mercuur, R., Dignum, V., van den Hoven, J., Jonker, C. (2021). Improving Confidence in the Estimation of Values and Norms. In: Aler Tubella, A., Cranefield, S., Frantz, C., Meneguzzi, F., Vasconcelos, W. (eds) Coordination, Organizations, Institutions, Norms, and Ethics for Governance of Multi-Agent Systems XIII (COIN 2017, COINE 2020). Lecture Notes in Computer Science, vol. 12298. Springer, Cham. https://doi.org/10.1007/978-3-030-72376-7_6
Print ISBN: 978-3-030-72375-0
Online ISBN: 978-3-030-72376-7