Autonomous Agents and Multi-Agent Systems

, Volume 29, Issue 3, pp 402–454 | Cite as

On environment difficulty and discriminating power

  • José Hernández-OralloEmail author


This paper presents a way to estimate the difficulty and discriminating power of any task instance. We focus on a very general setting for tasks: interactive (possibly multi-agent) environments where an agent acts upon observations and rewards. Instead of analysing the complexity of the environment, the state space or the actions that are performed by the agent, we analyse the performance of a population of agent policies against the task, leading to a distribution that is examined in terms of policy complexity. This distribution is then sliced by the algorithmic complexity of the policy and analysed through several diagrams and indicators. The notion of environment response curve is also introduced, by inverting the performance results into an ability scale. We apply all these concepts, diagrams and indicators to two illustrative problems: a class of agent-populated elementary cellular automata, showing how the difficulty and discriminating power may vary for several environments, and a multi-agent system, where agents can become predators or preys, and may need to coordinate. Finally, we discuss how these tools can be applied to characterise (interactive) tasks and (multi-agent) environments. These characterisations can then be used to get more insight about agent performance and to facilitate the development of adaptive tests for the evaluation of agent abilities.


Environment difficulty Agent evaluation Discriminating power Agent policy Algorithmic information theory Universal psychometrics Reinforcement learning Elementary cellular automata 



I thank the reviewers for their comments, especially those aiming at a clearer connection with the field of multi-agent systems and the suggestion of better approximations for the calculation of the response curves. The implementation of the elementary cellular automata used in the environments is based on the library ‘CellularAutomaton’ by John Hughes for R [58]. I am grateful to Fernando Soler-Toscano for letting me know about their work [65] on the complexity of 2D objects generated by elementary cellular automata. I would also like to thank David L. Dowe for his comments on a previous version of this paper. This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, GVA project PROMETEO/2008/051, the COST - European Cooperation in the field of Scientific and Technical Research IC0801 AT, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA), and funded by the Ministerio de Economía y Competitividad in Spain (PCIN-2013-037).


  1. 1.
    Anderson, J., Baltes, J., & Cheng, C. T. (2011). Robotics competitions as benchmarks for ai research. The Knowledge Engineering Review, 26(01), 11–17.CrossRefGoogle Scholar
  2. 2.
    Andre, D., & Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. In Proceedings of the National Conference on Artificial Intelligence (pp. 119–125). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.Google Scholar
  3. 3.
    Antunes, L., Fortnow, L., van Melkebeek, D., & Vinodchandran, N. V. (2006). Computational depth: Concept and applications. Theoretical Computer Science, 354(3), 391–404. Foundations of Computation Theory (FCT 2003), 14th Symposium on Fundamentals of Computation Theory 2003.zbMATHMathSciNetCrossRefGoogle Scholar
  4. 4.
    Arai, K., Kaminka, G. A., Frank, I., & Tanaka-Ishii, K. (2003). Performance competitions as research infrastructure: Large scale comparative studies of multi-agent teams. Autonomous Agents and Multi-Agent Systems, 7(1–2), 121–144.Google Scholar
  5. 5.
    Ashcraft, M. H., Donley, R. D., Halas, M. A., & Vakali, M. (1992). Chapter 8 working memory, automaticity, and problem difficulty. In Jamie I.D. Campbell (Ed.), The nature and origins of mathematical skills, volume 91 of advances in psychology (pp. 301–329). North-Holland.Google Scholar
  6. 6.
    Ay, N., Müller, M., & Szkola, A. (2010). Effective complexity and its relation to logical depth. IEEE Transactions on Information Theory, 56(9), 4593–4607.Google Scholar
  7. 7.
    Barch, D. M., Braver, T. S., Nystrom, L. E., Forman, S. D., Noll, D. C., & Cohen, J. D. (1997). Dissociating working memory from task difficulty in human prefrontal cortex. Neuropsychologia, 35(10), 1373–1380.CrossRefGoogle Scholar
  8. 8.
    Bordini, R. H., Hübner, J. F., & Wooldridge, M. (2007). Programming multi-agent systems in AgentSpeak using Jason. London: Wiley. com.zbMATHCrossRefGoogle Scholar
  9. 9.
    Boutilier, C., Reiter, R., Soutchanski, M., Thrun, S. et al. (2000). Decision-theoretic, high-level agent programming in the situation calculus. In Proceedings of the National Conference on Artificial Intelligence (pp. 355–362). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.Google Scholar
  10. 10.
    Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172.CrossRefGoogle Scholar
  11. 11.
    Chaitin, G. J. (1977). Algorithmic information theory. IBM Journal of Research and Development, 21, 350–359.zbMATHMathSciNetCrossRefGoogle Scholar
  12. 12.
    Chedid, F. B. (2010). Sophistication and logical depth revisited. In 2010 IEEE/ACS International Conference on Computer Systems and Applications (AICCSA) (pp. 1–4). IEEE.Google Scholar
  13. 13.
    Cheeseman, P., Kanefsky, B. & Taylor, W. M. (1991). Where the really hard problems are. In Proceedings of IJCAI-1991 (pp. 331–337).Google Scholar
  14. 14.
    Dastani, M. (2008). 2APL: A practical agent programming language. Autonomous Agents and Multi-agent Systems, 16(3), 214–248.CrossRefGoogle Scholar
  15. 15.
    Delahaye, J. P. & Zenil, H. (2011). Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness. Applied Mathematics and Computation, 219(1), 63–77Google Scholar
  16. 16.
    Dowe, D. L. (2008). Foreword re C. S. Wallace. Computer Journal, 51(5), 523–560. Christopher Stewart WALLACE (1933–2004) memorial special issue.CrossRefGoogle Scholar
  17. 17.
    Dowe, D. L., & Hernández-Orallo, J. (2012). IQ tests are not for machines, yet. Intelligence, 40(2), 77–81.CrossRefGoogle Scholar
  18. 18.
    Du, D. Z., & Ko, K. I. (2011). Theory of computational complexity (Vol. 58). London: Wiley-Interscience.Google Scholar
  19. 19.
    Elo, A. E. (1978). The rating of chessplayers, past and present (Vol. 3). London: Batsford.Google Scholar
  20. 20.
    Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. London: Lawrence Erlbaum.Google Scholar
  21. 21.
    Fatès, N. & Chevrier, V. (2010). How important are updating schemes in multi-agent systems? an illustration on a multi-turmite model. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1-Volume 1 (pp. 533–540). International Foundation for Autonomous Agents and Multiagent Systems.Google Scholar
  22. 22.
    Ferber, J. & Müller, J. P. (1996). Influences and reaction: A model of situated multiagent systems. In Proceedings of Second International Conference on Multi-Agent Systems (ICMAS-96) (pp. 72–79).Google Scholar
  23. 23.
    Ferrando, P. J. (2009). Difficulty, discrimination, and information indices in the linear factor analysis model for continuous item responses. Applied Psychological Measurement, 33(1), 9–24.MathSciNetCrossRefGoogle Scholar
  24. 24.
    Ferrando, P. J. (2012). Assessing the discriminating power of item and test scores in the linear factor-analysis model. Psicológica, 33, 111–139.Google Scholar
  25. 25.
    Gent, I. P., & Walsh, T. (1994). Easy problems are sometimes hard. Artificial Intelligence, 70(1), 335–345.zbMATHMathSciNetCrossRefGoogle Scholar
  26. 26.
    Gershenson, C. & Fernandez, N. (2012). Complexity and information: Measuring emergence, self-organization, and homeostasis at multiple scales. Complexity, 18(2), 29–44.Google Scholar
  27. 27.
    Gruner, S. (2010). Mobile agent systems and cellular automata. Autonomous Agents and Multi-agent Systems, 20(2), 198–233.CrossRefGoogle Scholar
  28. 28.
    Hardman, D. K., & Payne, S. J. (1995). Problem difficulty and response format in syllogistic reasoning. The Quarterly Journal of Experimental Psychology, 48(4), 945–975.CrossRefGoogle Scholar
  29. 29.
    He, J., Reeves, C., Witt, C., & Yao, X. (2007). A note on problem difficulty measures in black-box optimization: Classification, realizations and predictability. Evolutionary Computation, 15(4), 435–443.CrossRefGoogle Scholar
  30. 30.
    Hernández-Orallo, J. (2000). Beyond the turing test. Journal of Logic Language & Information, 9(4), 447–466.zbMATHCrossRefGoogle Scholar
  31. 31.
    Hernández-Orallo, J. (2000). On the computational measurement of intelligence factors. In A. Meystel (Ed.), Performance metrics for intelligent systems workshop (pp. 1–8). Gaithersburg, MD: National Institute of Standards and Technology.Google Scholar
  32. 32.
    Hernández-Orallo, J. (2000). Thesis: Computational measures of information gain and reinforcement in inference processes. AI Communications, 13(1), 49–50.Google Scholar
  33. 33.
    Hernández-Orallo, J. (2010). A (hopefully) non-biased universal environment class for measuring intelligence of biological and artificial systems. In M. Hutter et al. (Ed.), 3rd International Conference on Artificial General Intelligence (pp. 182–183). Atlantis Press Extended report at
  34. 34.
    Hernández-Orallo, J., & Dowe, D. L. (2010). Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence, 174(18), 1508–1539.MathSciNetCrossRefGoogle Scholar
  35. 35.
    Hernández-Orallo, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Insa-Cabrera, J. (2011). On more realistic environment distributions for defining, evaluating and developing intelligence. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), LNAI series on artificial general intelligence 2011 (Vol. 6830, pp. 82–91). Berlin: Springer.Google Scholar
  36. 36.
    Hernández-Orallo, J., Dowe, D. L., & Hernández-Lloreda, M. V. (2014). Universal psychometrics: Measuring cognitive abilities in the machine kingdom. Cognitive Systems Research, 27, 50–74.CrossRefGoogle Scholar
  37. 37.
    Hernández-Orallo, J., Insa, J., Dowe, D. L. & Hibbard, B. (2012). Turing tests with turing machines. In A. Voronkov (Ed.), The Alan Turing Centenary Conference, Turing-100, Manchester, 2012, volume 10 of EPiC Series (pp. 140–156).Google Scholar
  38. 38.
    Hernández-Orallo, J. & Minaya-Collado, N. (1998). A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. In Proceedings of International Symposium of Engineering of Intelligent Systems (EIS’98) (pp. 146–163). ICSC Press.Google Scholar
  39. 39.
    Hibbard, B. (2009). Bias and no free lunch in formal measures of intelligence. Journal of Artificial General Intelligence, 1(1), 54–61.CrossRefGoogle Scholar
  40. 40.
    Hoos, H. H. (1999). Sat-encodings, search space structure, and local search performance. In 1999 International Joint Conference on Artificial Intelligence (Vol. 16, pp. 296–303).Google Scholar
  41. 41.
    Insa-Cabrera, J., Benacloch-Ayuso, J. L., & Hernández-Orallo, J. (2012). On measuring social intelligence: Experiments on competition and cooperation. In J. Bach, B. Goertzel, & M. Iklé (Eds.), AGI, volume 7716 of lecture notes in computer science (pp. 126–135). Berlin: Springer.Google Scholar
  42. 42.
    Insa-Cabrera, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Hernández-Orallo, J. (2011). Comparing humans and AI agents. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), LNAI series on artificial general intelligence 2011 (Vol. 6830, pp. 122–132). Berlin: Springer.Google Scholar
  43. 43.
    Knuth, D. E. (1973). Sorting and searching, volume 3 of the art of computer programming. Reading, MA: Addison-Wesley.Google Scholar
  44. 44.
    Kotovsky, K., & Simon, H. A. (1990). What makes some problems really hard: Explorations in the problem space of difficulty. Cognitive Psychology, 22(2), 143–183.CrossRefGoogle Scholar
  45. 45.
    Legg, S. (2008). Machine super intelligence. PhD thesis, Department of Informatics, University of Lugano, June 2008.Google Scholar
  46. 46.
    Legg, S., & Hutter, M. (2007). Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4), 391–444.CrossRefGoogle Scholar
  47. 47.
    Leonetti, M. & Iocchi, L. (2010). Improving the performance of complex agent plans through reinforcement learning. In Proceedings of the 2010 International Conference on Autonomous Agents and Multiagent Systems (Vol. 1, pp. 723–730). International Foundation for Autonomous Agents and Multiagent Systems.Google Scholar
  48. 48.
    Levin, L. A. (1973). Universal sequential search problems. Problems of Information Transmission, 9(3), 265–266.Google Scholar
  49. 49.
    Levin, L. A. (1986). Average case complete problems. SIAM Journal on Computing, 15, 285.zbMATHMathSciNetCrossRefGoogle Scholar
  50. 50.
    Li, M., & Vitányi, P. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). Berlin: Springer.zbMATHCrossRefGoogle Scholar
  51. 51.
    Low, C. K., Chen, T. Y., & Rónnquist, R. (1999). Automated test case generation for bdi agents. Autonomous Agents and Multi-agent Systems, 2(4), 311–332.CrossRefGoogle Scholar
  52. 52.
    Madden, M. G., & Howley, T. (2004). Transfer of experience between reinforcement learning environments with progressive difficulty. Artificial Intelligence Review, 21(3), 375–398.zbMATHCrossRefGoogle Scholar
  53. 53.
    Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115(2), 300.CrossRefGoogle Scholar
  54. 54.
    Michel, F. (2004). Formalisme, outils et éléments méthodologiques pour la modélisation et la simulation multi-agents. PhD thesis, Université des sciences et techniques du Languedoc, Montpellier.Google Scholar
  55. 55.
    Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81.CrossRefGoogle Scholar
  56. 56.
    Orponen, P., Ko, K. I., Schöning, U., & Watanabe, O. (1994). Instance complexity. Journal of the ACM (JACM), 41(1), 96–121.zbMATHCrossRefGoogle Scholar
  57. 57.
    Simon, H. A., & Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70(6), 534.CrossRefGoogle Scholar
  58. 58.
    Team, R., et al. (2013). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.Google Scholar
  59. 59.
    Whiteson, S., Tanner, B., & White, A. (2010). The reinforcement learning competitions. The AI Magazine, 31(2), 81–94.Google Scholar
  60. 60.
    Wiering, M., & van Otterlo, M. (Eds.). (2012). Reinforcement learning: State-of-the-art. Berlin: Springer.Google Scholar
  61. 61.
    Wolfram, S. (2002). A new kind of science. Champaign, IL: Wolfram Media.zbMATHGoogle Scholar
  62. 62.
    Zatuchna, Z., & Bagnall, A. (2009). Learning mazes with aliasing states: An LCS algorithm with associative perception. Adaptive Behavior, 17(1), 28–57.MathSciNetCrossRefGoogle Scholar
  63. 63.
    Zenil, H. (2010). Compression-based investigation of the dynamical properties of cellular automata and other systems. Complex Systems, 19(1), 1–28.zbMATHMathSciNetGoogle Scholar
  64. 64.
    Zenil, H. (2011). Une approche expérimentale à la théorie algorithmique de la complexité. PhD thesis, Dissertation in fulfilment of the degree of Doctor in Computer Science, Université de Lille.Google Scholar
  65. 65.
    Zenil, H., Soler-Toscano, F., Delahaye, J. P. & Gauvrit, N. (2012). Two-dimensional kolmogorov complexity and validation of the coding theorem method by compressibility. arXiv, preprint arXiv:1212.6745.

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. 1.DSICUniversitat Politècnica de ValènciaValenciaSpain

Personalised recommendations