Autonomous Agents and Multi-Agent Systems, Volume 31, Issue 1, pp 151–177

Evaluating fault tolerance approaches in multi-agent systems



A multi-agent system (MAS) is a distributed system that consists of multiple agents working together to solve common problems. Even though MASs are well suited for the development of complex distributed systems, the number of real-world deployments is still small. One of the main reasons for this is that MASs are very fragile. In a typical, large-scale MAS, the rate of failure grows with the number of hosts, the number of deployed agents, and the duration of the agents’ task execution. For this reason, numerous approaches have been introduced to deal with aspects of failure handling. However, the absence of centralized control and the large number of individual intelligent components make it difficult to detect and treat errors. The risk of uncontrollable fault propagation is high and can seriously degrade system performance. Two important factors limit the usage of MASs: (1) existing fault tolerance (FT) approaches are not generic, as each focuses on and improves specific issues of FT; and (2) despite the plethora of available FT approaches and theories, there is a remarkable lack of general metrics, tools, benchmarks, and experimental methods for formal validation and comparison of existing or newly developed FT approaches. As FT in MASs becomes a well-established field, the need for generalized, standardized evaluation of FT approaches becomes imperative. In this paper, we first present a detailed overview of existing FT solutions, approaches, and techniques in MASs hosted on agent platforms. From that overview, we derive the commonalities in existing research. Next, we present the main contribution of our paper: an evaluation methodology, with a set of metrics, for comparing FT approaches in MASs. We adopt an engineering perspective on the problem, defining a methodology and metrics that are both implementation- and domain-independent. The metrics are formalized with a directed acyclic graph. By using our methodology, evaluators can select an appropriate FT approach for a targeted MAS application, thus improving MAS usability, stability, and development speed. To show the viability of our approach, we present a case study that compares two FT approaches for a targeted MAS. The results show that our methodology can be used to select an appropriate FT approach for the targeted MAS.
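To make the idea of metrics organized in a directed acyclic graph more concrete, the following is a minimal sketch, not the formalization used in the paper. It assumes that leaf metrics are measured directly as normalized scores in [0, 1] (higher is better) and that each composite metric is a weighted sum of its children. The metric names, weights, measurements, and the functions `evaluate` and `best_approach` are all hypothetical, chosen only for illustration. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical metric graph: each composite metric maps to {child: weight}.
# Metrics without an entry here are leaves and must be measured directly.
# Names and weights are illustrative assumptions, not the paper's metrics.
METRIC_GRAPH = {
    "ft_score": {"recovery": 0.6, "overhead": 0.4},
    "recovery": {"detection_speed": 0.5, "restart_speed": 0.5},
    "overhead": {"cpu_efficiency": 0.7, "msg_efficiency": 0.3},
}


def evaluate(graph, measurements):
    """Compute every metric in dependency order (children before parents)."""
    deps = {node: set(children) for node, children in graph.items()}
    values = dict(measurements)
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for node in TopologicalSorter(deps).static_order():
        if node in graph:  # composite metric: weighted sum of its children
            values[node] = sum(w * values[c] for c, w in graph[node].items())
    return values


def best_approach(graph, candidates, root="ft_score"):
    """Return the name of the candidate with the highest root metric."""
    return max(candidates, key=lambda name: evaluate(graph, candidates[name])[root])


# Made-up leaf measurements for two hypothetical FT approaches.
CANDIDATES = {
    "approach_a": {"detection_speed": 0.8, "restart_speed": 0.6,
                   "cpu_efficiency": 0.5, "msg_efficiency": 0.9},
    "approach_b": {"detection_speed": 0.6, "restart_speed": 0.6,
                   "cpu_efficiency": 0.9, "msg_efficiency": 0.7},
}

if __name__ == "__main__":
    for name, leaves in CANDIDATES.items():
        print(name, round(evaluate(METRIC_GRAPH, leaves)["ft_score"], 3))
    print("selected:", best_approach(METRIC_GRAPH, CANDIDATES))
```

Evaluating in topological order guarantees that every child metric is available before its parent is computed, and a cyclic definition is rejected rather than silently producing a value. A weighted sum is only one possible aggregation; other combination rules could be substituted per node.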


Keywords: Fault tolerance, Multi-agent systems, Metrics, Methodology, Fault tolerance approach evaluation



Copyright information

© The Author(s) 2015

Authors and Affiliations

  1. Siemens CVC, Split, Croatia
  2. Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture (FESB), University of Split, Split, Croatia
