SAGE: Task-Environment Platform for Evaluating a Broad Range of AI Learners

Eberding, Leonard M.; Thórisson, Kristinn R.; Sheikhlar, Arash; Andrason, Sindri P.

doi:10.1007/978-3-030-52152-3_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12177)

Included in the following conference series:

International Conference on Artificial General Intelligence

Abstract

While several tools exist for training and evaluating narrow machine learning (ML) algorithms, their design generally does not follow any particular or explicit evaluation methodology or theory. The reverse holds for more general learners: many evaluation methodologies and frameworks have been suggested, but few specific tools exist. In this paper we introduce a new framework for broad evaluation of artificial intelligence (AI) learners, and a new tool that builds on this methodology. The platform, called SAGE (Simulator for Autonomy & Generality Evaluation), supports training and evaluation of a broad range of systems and allows detailed comparison between narrow and general ML and AI. It provides a variety of tuning and task construction options, allowing isolation of single parameters across complexity dimensions. SAGE is aimed at helping AI researchers map out and compare the strengths and weaknesses of divergent approaches. Our hope is that it can help deepen understanding of the various tasks we want AI systems to do and of the relationship between their composition, complexity, and difficulty for various AI systems, as well as contribute to building a clearer research road map for the field. This paper provides an overview of the framework and presents results of an early use case.

Notes

1. https://index.ros.org/doc/ros2/ – accessed Feb. 26th, 2020.
2. http://gazebosim.org/ – accessed Feb. 26th, 2020.
3. https://github.com/opennars/OpenNARS-for-Applications – accessed May 10th, 2020.

Acknowledgements

The authors would like to thank Hjörleifur Henriksson for help with computer setup and data collection, and Patrick Hammer for help with ONA. This work was in part supported by grants from Reykjavik University, the Icelandic Institute for Intelligent Machines and Cisco Systems, Inc.

Author information

Authors and Affiliations

Center for Analysis and Design of Intelligent Agents, Reykjavik University, Reykjavik, Iceland
Leonard M. Eberding, Kristinn R. Thórisson, Arash Sheikhlar & Sindri P. Andrason
Institute of Photogrammetry and GeoInformation, Leibniz U., Hannover, Germany
Leonard M. Eberding
Icelandic Institute for Intelligent Machines, Reykjavik, Iceland
Kristinn R. Thórisson

Corresponding author

Correspondence to Arash Sheikhlar.

Editor information

Editors and Affiliations

SingularityNET Foundation, Amsterdam, The Netherlands
Ben Goertzel
Moscow Institute of Physics and Technology, Dolgoprudny, Russia
Aleksandr I. Panov
SingularityNET Foundation, Amsterdam, The Netherlands
Alexey Potapov
University of Louisville, Louisville, KY, USA
Roman Yampolskiy

About this paper

Cite this paper

Eberding, L.M., Thórisson, K.R., Sheikhlar, A., Andrason, S.P. (2020). SAGE: Task-Environment Platform for Evaluating a Broad Range of AI Learners. In: Goertzel, B., Panov, A., Potapov, A., Yampolskiy, R. (eds) Artificial General Intelligence. AGI 2020. Lecture Notes in Computer Science, vol 12177. Springer, Cham. https://doi.org/10.1007/978-3-030-52152-3_8

DOI: https://doi.org/10.1007/978-3-030-52152-3_8
Published: 06 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-52151-6
Online ISBN: 978-3-030-52152-3
eBook Packages: Computer Science, Computer Science (R0)
