Knowledge and Information Systems, Volume 37, Issue 3, pp 665–692

Adaptive goal selection for agents in dynamic environments

  • Huiliang Zhang
  • Xudong Luo
  • Chunyan Miao
  • Zhiqi Shen
  • Jin You
Regular Paper

Abstract

In psychology, goal-setting theory, which has been studied for over 35 years, reveals that goals play significant roles in human incentive, action and performance. Based on this theory, a goal net model has been proposed for designing intelligent agents, which can be viewed, in some sense, as soft copies of human beings. The goal net model has been successfully applied in many agents, especially non-player-character agents in computer games. Such an agent selects the optimal solution from all possible solutions, which it finds using a recursive algorithm. However, if a goal net is very complex, selection can take too long for the agent to respond quickly when it must re-select a solution after the world changes. Moreover, in some dynamic environments, it is impossible to know in advance the exact outcome of choosing a solution, so the possible solutions cannot be evaluated precisely. To address these problems, this paper applies a learning algorithm to goal selection in dynamic environments. More specifically, we first develop a reorganization algorithm that converts a goal net into an equivalent counterpart on which a Q-learning algorithm can operate; then, we define the key component of Q-learning, the reward function, according to the features of goal nets; and finally, we conduct extensive experiments showing that, in dynamic environments, the agent with the learning algorithm significantly outperforms the one with the recursive search algorithm. Our work therefore suggests an agent model that can be applied effectively in dynamic, time-sensitive domains such as computer games and P2P systems for online movie watching.
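To make the idea of learning a goal selection concrete, the following is a minimal sketch of tabular Q-learning on a tiny goal graph. It is not the authors' reorganization algorithm or reward function: the graph `GOAL_GRAPH`, its reward values, the uniform reward noise, and the helper names `q_learn` and `greedy_path` are all hypothetical, chosen only to show how an agent could learn which branch to prefer when outcomes are noisy.

```python
import random

# Hypothetical goal graph (an assumption, not taken from the paper):
# each state maps to its successor states and the mean reward of moving there.
GOAL_GRAPH = {
    "start": {"gather": 1.0, "trade": 2.0},
    "gather": {"end": 5.0},
    "trade": {"end": 1.0},
    "end": {},
}


def q_learn(graph, episodes=500, alpha=0.1, gamma=0.9,
            epsilon=0.2, noise=0.5, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    An action is identified with the successor state it leads to.
    Rewards are perturbed by uniform noise to mimic an uncertain outcome.
    """
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nxt} for s, nxt in graph.items()}
    for _ in range(episodes):
        state = "start"
        while graph[state]:  # stop at a state with no successors
            actions = list(graph[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[state][a])  # exploit
            reward = graph[state][action] + rng.uniform(-noise, noise)
            next_state = action
            best_next = max(q[next_state].values(), default=0.0)
            # Standard Q-learning update rule.
            q[state][action] += alpha * (
                reward + gamma * best_next - q[state][action]
            )
            state = next_state
    return q


def greedy_path(graph, q, state="start"):
    """Follow the highest-valued action from each state until a terminal state."""
    path = [state]
    while graph[state]:
        state = max(graph[state], key=lambda a: q[state][a])
        path.append(state)
    return path
```

In this toy graph the immediately attractive branch ("trade") leads to a lower total return than "gather", so the learned values should favour the "gather" branch once the discounted future reward has propagated back to "start".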

Keywords

Agent, Goal net, Planning, Machine learning, Q-learning, Aggregation operators

Notes

Acknowledgments

The authors would like to thank the anonymous reviewers for their very helpful comments. This paper is partially supported by the National Natural Science Foundation of China (No. 61173019), the Bairen plan of Sun Yat-sen University, and major projects of the Ministry of Education of China (No. 10JZD0006).

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Huiliang Zhang (1)
  • Xudong Luo (2)
  • Chunyan Miao (1)
  • Zhiqi Shen (3)
  • Jin You (4)

  1. School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
  2. Institute of Logic and Cognition, Sun Yat-sen University, Guangzhou, China
  3. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
  4. Department of Psychology, University of Houston, Houston, USA
