Abstract
While several tools exist for training and evaluating narrow machine learning (ML) algorithms, their design generally does not follow a particular or explicit evaluation methodology or theory. The reverse holds for more general learners: many evaluation methodologies and frameworks have been suggested, but few specific tools exist. In this paper we introduce a new framework for broad evaluation of artificial intelligence (AI) learners, and a new tool that builds on this methodology. The platform, called SAGE (Simulator for Autonomy & Generality Evaluation), supports training and evaluation of a broad range of systems and allows detailed comparison between narrow and general ML and AI. It provides a variety of tuning and task construction options, allowing isolation of single parameters across complexity dimensions. SAGE is aimed at helping AI researchers map out and compare the strengths and weaknesses of divergent approaches. Our hope is that it can help deepen understanding of the tasks we want AI systems to do and of the relationship between their composition, complexity, and difficulty for different AI systems, as well as contribute to building a clearer research road map for the field. This paper provides an overview of the framework and presents results of an early use case.
Notes
- 1. https://index.ros.org/doc/ros2/ – accessed Feb. 26th, 2020.
- 2. http://gazebosim.org/ – accessed Feb. 26th, 2020.
- 3. https://github.com/opennars/OpenNARS-for-Applications – accessed May 10th, 2020.
Acknowledgements
The authors would like to thank Hjörleifur Henriksson for help with computer setup and data collection, and Patrick Hammer for help with ONA. This work was in part supported by grants from Reykjavik University, the Icelandic Institute for Intelligent Machines and Cisco Systems, Inc.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Eberding, L.M., Thórisson, K.R., Sheikhlar, A., Andrason, S.P. (2020). SAGE: Task-Environment Platform for Evaluating a Broad Range of AI Learners. In: Goertzel, B., Panov, A., Potapov, A., Yampolskiy, R. (eds) Artificial General Intelligence. AGI 2020. Lecture Notes in Computer Science, vol 12177. Springer, Cham. https://doi.org/10.1007/978-3-030-52152-3_8
DOI: https://doi.org/10.1007/978-3-030-52152-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-52151-6
Online ISBN: 978-3-030-52152-3
eBook Packages: Computer Science, Computer Science (R0)