Abstract
Evaluation of artificial intelligence (AI) systems is a prerequisite for comparing them along the many dimensions on which they are intended to perform. Task-environments for this purpose are often designed ad hoc and focus on only a few aspects of the systems under evaluation. Testing on a wide range of tasks and environments would make it easier to compare systems and to understand their performance, but this requires that manipulating the relevant dimensions causes predictable changes in the structure, behavior, and nature of the task-environments. What is needed is a framework that enables easy composition, decomposition, scaling, and configuration of task-environments. Such a framework would not only facilitate evaluation of the performance of current and future AI systems but would also go beyond it, allowing evaluation of knowledge acquisition, cognitive growth, lifelong learning, and transfer learning. In this paper we list the requirements that we think such a framework should meet to facilitate the evaluation of intelligence, and present preliminary ideas on how it could be realized.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Thórisson, K.R., Bieger, J., Schiffel, S., Garrett, D. (2015). Towards Flexible Task Environments for Comprehensive Evaluation of Artificial Intelligent Systems and Automatic Learners. In: Bieger, J., Goertzel, B., Potapov, A. (eds) Artificial General Intelligence. AGI 2015. Lecture Notes in Computer Science, vol 9205. Springer, Cham. https://doi.org/10.1007/978-3-319-21365-1_20
DOI: https://doi.org/10.1007/978-3-319-21365-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21364-4
Online ISBN: 978-3-319-21365-1
eBook Packages: Computer Science, Computer Science (R0)