Model-driven design space exploration for multi-robot systems in simulation

Multi-robot systems are increasingly deployed to provide services and accomplish missions whose complexity or cost is too high for a single robot to achieve on its own. Although multi-robot systems offer increased reliability via redundancy and enable the execution of more challenging missions, engineering these systems is very complex. This complexity affects not only the architecture modelling of the robotic team but also the modelling and analysis of the collaborative intelligence enabling the team to complete its mission. Existing approaches for the development of multi-robot applications do not provide a systematic mechanism for capturing these aspects and assessing the robustness of multi-robot systems. We address this gap by introducing ATLAS, a novel model-driven approach supporting the systematic design space exploration and robustness analysis of multi-robot systems in simulation. The ATLAS domain-specific language enables modelling the architecture of the robotic team and its mission and facilitates the specification of the team’s intelligence. We evaluate ATLAS and demonstrate its effectiveness in three simulated case studies: a healthcare Turtlebot-based mission and two unmanned underwater vehicle missions developed using the Gazebo/ROS and MOOS-IvP robotic platforms, respectively.


Introduction
Multi-Robot Systems (MRS) are a class of systems in which distributed and interconnected robots are orchestrated to perform missions whose complexity and cost are too high for individual robots to accomplish on their own [1]. MRS play a strategic role in safety-critical and business-critical missions ranging from precision agriculture and fast delivery of medical samples to real-time road traffic monitoring and infrastructure inspection [2]. The intrinsic characteristics of these missions, i.e. distributed sensing and action, uncertain operating environment, and the need for endurance and robust behaviour, necessitate the use of MRS instead of single-robot solutions. MRS bring additional benefits including improved scalability and performance (missions can be performed more efficiently through parallelism if they are decomposable), mission enablement (use of collective intelligence to execute missions beyond the capabilities of individuals) and increased robustness and reliability through redundancy.
MRS can execute their missions through collective intelligence (CI) algorithms that encapsulate communication policies within the closed-loop control of individual robots (e.g., MAPE-K [3]) and adaptation strategies for delegating responsibilities within the team. The use of CI enables capitalising on the unique benefits offered by MRS in these business-critical and safety-critical application domains [4]. Consider, for instance, a team of unmanned underwater vehicles (UUVs), each equipped with sonar sensors, deployed to discover hazardous objects by scanning a large marine area. A CI instance may partition this area based on the UUV sensor capabilities (e.g., reliability, energy consumption) while also specifying how the team will respond and redistribute pending tasks when a team member's battery is exhausted, or when a team member fails or experiences difficulties (e.g., when a UUV enters an area where the water salinity or temperature is outside the operating envelope of its sensors).
Selecting a suitable CI is a very important problem that directly affects the MRS performance and resilience [1]. An effective CI would enable MRS to cope with uncertain environments, evolving mission objectives and unpredictable degradation of robotic components (e.g., sensor failure of multiple robots) [5]. Clearly, this is a non-trivial problem. Engineers have a multitude of options both in terms of adaptation strategies and communication policies that complicate the process of designing, implementing, debugging and assessing candidate CI algorithms. The large design space that comprises robotics teams of different sizes and individual robots with a wide range of performance and functionality characteristics (e.g., sensor width, energy capacity) only exacerbates the task of choosing the most suitable robotic team and desired CI instantiation.
Despite recent advances in the specification and analysis of MRS [6,7], existing approaches either focus on providing specialised robotic functionality (e.g., perception, control) [8][9][10] or software for specific robotic platforms (e.g., ROS [11] or MOOS-IvP [12]). This limits their applicability to MRS missions characterised by simplistic CI behaviour resulting in reduced MRS resilience or bespoke algorithms that intertwine the CI logic with low-level platform-specific code resulting in significantly higher than expected maintenance cost [13]. These important limitations of existing approaches increase significantly the effort to explore the tradeoff between candidate MRS designs [14] and adaptation strategies of different CI algorithms [15].
In this paper we introduce ATLAS, a model-driven, tool-supported framework for the systematic engineering of MRS that facilitates the exploration and tradeoff analysis of candidate MRS designs and CI algorithms. Driven by insights derived from recent robotics surveys [14,16], ATLAS is underpinned by the following principles. First, the ATLAS domain-specific language (DSL) enables the specification of (i) the MRS mission, including both functional and non-functional requirements; and (ii) the characteristics of individual robots comprising the MRS, including architecture, internal behaviours, and capabilities (e.g., use of energy efficient or reliable sensors). Second, the ATLAS code generation engine consumes the MRS mission and system specifications, and produces the necessary infrastructure (i.e., ATLAS middleware, CI templates, target simulator logical interface) that enables the communication of the ATLAS components with the target robotic simulator (e.g., MOOS-IvP [12], ROS [11]). Finally, the low coupling between the ATLAS components supported by the CI templates improves system extensibility while also supporting tradeoff analysis between different CI algorithms.
The main contributions of this paper are:
• The ATLAS tool-supported framework for the systematic engineering of MRS enabling tradeoff analysis of different MRS architectures and CI algorithms from the early stages of the MRS development process;
• An extensive ATLAS evaluation of three MRS case studies built with ROS/Gazebo [11] and MOOS-IvP [12], widely used platforms for ground-based and UUV simulations, respectively;
• A prototype open-source ATLAS tool and case study repository, both available on our project webpage at https://www.github.com/jrharbin-york/ATLAS.
The design space exploration provided by ATLAS can be used to iteratively refine the system design as part of its systematic process. For example, ATLAS can show through tradeoff analysis that certain CI algorithms are not viable for the target robotic missions, or that some software components could lead to system configurations that do not satisfy the mission requirements. Therefore, ATLAS can support a cost-benefit analysis between CI choice and mission requirements. Although we do not attempt to trace the underlying cause of violations, visual inspection via the user interface provided by the simulator offers a mechanism for error tracing and performance analysis.
This paper significantly extends our previous work presented in [17] in the following ways: (1) the ATLAS framework is extended to fully support the ROS robotic platform and the Gazebo simulator, including considerable adaptation of the ATLAS DSL to facilitate the definition of MRS missions for both the MOOS-IvP and ROS/Gazebo robotic simulators; (2) new ATLAS components have been implemented enabling the automatic search and selection of robotic system configurations based on mission goals and model similarity, leveraging robust model hashing [18]; (3) we adapted a ROS-based case study (developed by the project team as part of a related research project) that involves a team of mobile robots servicing patients within a healthcare facility and extended it with ATLAS-specific constructs and two complementary CI algorithms; (4) we evaluate the ATLAS framework on this new case study using three inter-  (5) we perform a more thorough analysis of related work.
The paper is structured as follows. Section 2 presents the new case study from the healthcare domain that we use to illustrate ATLAS, which is detailed in Sect. 3. Sections 4 and 5 present our ATLAS implementation and evaluation, respectively. Finally, Sect. 6 discusses related work, and Sect. 7 summarises our results and suggests directions for future research.

Healthcare robotic team
We will illustrate our ATLAS approach using a team of multi-purpose mobile healthcare robots that collaborate to execute service tasks within a hospital facility. Healthcare robots are increasingly deployed in the clinical domain to carry out routine activities, supporting the often overwhelmed healthcare professionals (i.e., nurses, doctors and other medical staff) in their demanding and fast-paced working environment [19,20]. The robots can typically perform simple tasks including the delivery of medical materials to care units, transport of medical samples to the lab for analysis, and disinfection of patient rooms and operating suites [21]. For instance, the use of robots capable of disinfecting hospital environments by emitting concentrated Ultraviolet-C (UV-C) light is being investigated within the context of the European Research Council funded project SESAME (https://www.sesame-project.org).
Figure 1 shows three Turtlebot 3 Burger robots operating within a simulated healthcare facility comprising a total of 20 rooms (with each room located between the vertical walls on either side of the long horizontal corridor). Each room may host a single patient or may be empty. When visiting an occupied room, a robot can check whether the patient needs assistance, measure their vital parameters or simply interact with the patient. To support localisation and navigation, each robot is equipped with laser distance and inertial measurement unit sensors (e.g., gyroscope, accelerometer), and a transceiver communicating with a centralised command-and-control computer centre which runs the CI and coordinates the activities of the robotic team. Figure 1a depicts the simulation environment in Gazebo, whereas Fig. 1b shows the rviz visualisation, which also captures the localisation area and navigation path per robot (i.e., showing that two robots are moving towards the rooms at the bottom left, with the last robot moving to its home location at the left of the hospital corridor).
The CI employed by the command-and-control centre organises the robotic team by allocating the rooms that must be visited per robot so that the requirements in Table 1 are satisfied. To this end, an effective CI should partition the responsibilities among the robotic team so that all assignable rooms are serviced successfully (R1), enabling robots to return to their starting points when their assigned work is completed. Since the robots have limited energy capacity, the priority is to service as many rooms as possible before their batteries are exhausted (R2) in the shortest possible time, and before the maximum mission execution time $T_{end}$ = 335 s (R3). The energy consumption model uses information computed from the Turtlebot3 specification and comprises the following causes of energy loss: (i) energy required by robotic electronic components like processor and sensors per time unit during mission execution (11 J/s); (ii) energy consumed by vehicle motors for propelling the robot (40 J/m); and (iii) average energy per room to carry out the service task (18 kJ/room). Once the allocation is made, each robot visits its allocated rooms in sequence before returning to its starting location. Unallocated rooms will be serviced by healthcare professionals with higher overheads than those incurred when serviced by the robotic team.

Table 1
ID   Requirement
R1   The robots must service all their assigned rooms.
R2   Each robot must return to its starting location for recharging before its battery is depleted and it becomes stranded in the hospital environment.
R3   The robots must fulfil requirements R1 and R2 in the shortest possible time and before $T_{end}$ (the maximum mission end time).
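The three-term energy model above can be written down as a small function. The following is a minimal sketch, not the ATLAS implementation: the function name and the example workload (3 rooms, 50 m, 200 s) are hypothetical, and only the three rate constants come from the text.

```python
# Rate constants taken from the energy model described in the text.
ELECTRONICS_J_PER_S = 11.0      # processor and sensors, per second of mission time
MOTION_J_PER_M = 40.0           # motors, per metre travelled
SERVICE_J_PER_ROOM = 18_000.0   # average service task, per room


def mission_energy_joules(duration_s: float, distance_m: float, rooms: int) -> float:
    """Total energy one robot consumes, as the sum of the three loss terms."""
    return (ELECTRONICS_J_PER_S * duration_s
            + MOTION_J_PER_M * distance_m
            + SERVICE_J_PER_ROOM * rooms)


# Hypothetical workload: 3 rooms serviced, 50 m travelled, 200 s of operation.
energy = mission_energy_joules(duration_s=200, distance_m=50, rooms=3)
print(f"{energy / 1000:.1f} kJ")  # 58.2 kJ
```

One consequence of these constants is that servicing four rooms costs 72 kJ in service energy alone, which already exceeds the 71 kJ standard battery capacity quoted later in this section, so under this model a standard-battery robot can service at most three rooms per charge.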
Engineering this robotic application entails considering the possible variability options that could influence the robots' behaviour, producing different system configurations. Accordingly, engineers may be interested in exploring whether the mission can be executed with a variable number of robots (e.g., two or three robots in this instance). While using all available robots can, in principle, help in completing the mission faster, it unavoidably increases the hardware maintenance costs and reduces the system's resilience, e.g., if two robots break down or deplete their battery at the same time they will compete for the repair and charging resources, respectively. Using fewer robots could also require a less demanding CI algorithm for controlling the room allocation and coordination between the robots. Furthermore, each robot can be equipped with a standard (71 kJ) or high capacity (220 kJ) battery, with the more powerful battery being significantly more expensive and requiring extra charging time.
Thus, system configurations that can complete the mission with at least one robot using the standard battery are preferred. Finally, selecting the subset of occupied rooms that the robotic team can service best is another variability dimension, with the number of possible room combinations given by $\binom{or}{i} = \frac{or!}{(or-i)! \times i!}$, where $or$ and $i$ are the number of occupied rooms and the number of rooms to be allocated to robots, respectively. As we explain later (Sect. 3.2) and to preserve the tractability of simulation analysis, we arrange the rooms into three groups of six rooms each. Collectively, these variability options produce 60 possible mission configurations (i.e., designs) for the robotic team.
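To make the size of the design space concrete, the sketch below first evaluates the binomial count for an illustrative case and then enumerates configurations. The text does not break down how the 60 configurations arise; the enumeration assumes (an assumption of this sketch, not a statement from the source) that each configuration picks a team of two or three robots, one battery type per selected robot, and exactly one of the three room groups. The robot and group names are hypothetical.

```python
from itertools import combinations, product
from math import comb

# Unrestricted room subsets grow quickly, which motivates grouping the rooms:
# choosing 6 of 18 rooms (three groups of six) already gives 18564 subsets.
print(comb(18, 6))  # 18564

ROBOTS = ["r1", "r2", "r3"]        # hypothetical robot names
BATTERIES = ["standard", "high"]   # the 71 kJ and 220 kJ options
ROOM_GROUPS = ["g1", "g2", "g3"]   # hypothetical names for the three room groups


def enumerate_configurations():
    """List (team, battery-per-robot, room-group) triples under the stated assumptions."""
    configs = []
    for team_size in (2, 3):
        for team in combinations(ROBOTS, team_size):
            for batteries in product(BATTERIES, repeat=team_size):
                for group in ROOM_GROUPS:
                    configs.append((team, batteries, group))
    return configs


print(len(enumerate_configurations()))  # 60
```

Under this reading the total decomposes as (3 x 4 + 1 x 8) x 3 = 60: three two-robot teams with four battery assignments each, one three-robot team with eight, and three room groups.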
In addition to those robotic team configurations, the command-and-control centre could employ different CI algorithms to coordinate the robotic team. For this mission, robotic engineers are interested in evaluating the following standard and advanced CI algorithms.
- Standard CI: This CI assigns rooms to robots in a round-robin manner, also considering the distance between the room and the robot. Thus, this CI is easy to implement and requires minimal system knowledge, i.e., knowing only the location of each robot and the rooms. Despite its simplicity, this CI ignores the battery capacity per robot and may underperform when the allocated workload exceeds what robots with a standard battery can handle.
- Energy-aware CI: Room allocation to robots using this CI is proportional to their energy levels. Thus, a robot with double the battery capacity will be allocated twice as many rooms to service (although this may not be exact in cases with small numbers of rooms). The energy-aware CI also dynamically reallocates work from a robot with a battery close to depletion to other robots, proportional to their remaining capacity. This feature can improve performance, but involves additional implementation complexity, including extra development and debugging effort, together with communication overheads to reassign work dynamically during the mission.
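The two allocation policies can be sketched as follows. This is an illustrative reading of the descriptions above rather than the ATLAS CI code: the function names, the "nearest unassigned room on each turn" interpretation of distance-aware round-robin, and the largest-remainder rounding used for the proportional split are all assumptions. Dynamic reallocation during the mission is not covered.

```python
from math import dist


def round_robin_nearest(robots, rooms):
    """Standard CI sketch: robots take turns, each picking its nearest unassigned room.

    robots, rooms: dicts mapping a name to an (x, y) position.
    """
    allocation = {name: [] for name in robots}
    remaining = dict(rooms)
    order = list(robots)
    turn = 0
    while remaining:
        robot = order[turn % len(order)]
        nearest = min(remaining, key=lambda r: dist(robots[robot], remaining[r]))
        allocation[robot].append(nearest)
        del remaining[nearest]
        turn += 1
    return allocation


def energy_proportional_counts(energy_kj, n_rooms):
    """Energy-aware CI sketch: split n_rooms in proportion to each robot's energy.

    Fractional shares are rounded with the largest-remainder method (an assumption).
    """
    total = sum(energy_kj.values())
    shares = {r: n_rooms * e / total for r, e in energy_kj.items()}
    counts = {r: int(s) for r, s in shares.items()}
    leftover = n_rooms - sum(counts.values())
    by_remainder = sorted(shares, key=lambda r: shares[r] - counts[r], reverse=True)
    for r in by_remainder[:leftover]:
        counts[r] += 1
    return counts


# A robot with double the capacity receives twice as many rooms.
print(energy_proportional_counts({"a": 100, "b": 200}, 6))  # {'a': 2, 'b': 4}
# With the 71 kJ and 220 kJ batteries the split is only approximately proportional.
print(energy_proportional_counts({"r1": 71, "r2": 220, "r3": 71}, 6))  # {'r1': 1, 'r2': 4, 'r3': 1}
```

The second call illustrates the caveat in the text: with few rooms, the integer allocation cannot match the energy ratio exactly.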

Overview
The high-level ATLAS architecture is depicted in Fig. 2. At its core lies the ATLAS DSL (Sect. 3.2), which enables the specification of the robotic team's structure and mission objectives, complemented by a model-driven code generation engine for the specialisation of a middleware (Sect. 3.4), a simulator-specific logical interface, and CI templates that facilitate tradeoff analysis of CI algorithms (Sect. 3.5).
The middleware facilitates the communication between MRS team members and between the collective intelligence algorithms. These algorithms implement the high-level command and control which is responsible for the completion and adequate performance of the mission. For instance, for our healthcare use case from Sect. 2, the CI is responsible for issuing the initial tasks to be performed by each robot or for reassigning tasks when a team member has a low battery level.
The low-level logic of robotic team members is implemented using MRS implementation-specific functionality. Low-level behaviours, including station keeping, inter-robot avoidance and path navigation, are mediated by state held in the target robotic platform (e.g., in publish-subscribe databases). Therefore, although Fig. 2 illustrates the high-level communication, the yellow MRS nodes on the right contain additional state and communication flows which are specific to the target robotic platform.
This key ATLAS feature of using a middleware to coordinate the high-level CI command and control exploits the modular architecture of modern robotic systems and leverages low-level support from the publish-subscribe communication protocol underpinning widely used robotic platforms including ROS [11] and MOOS-IvP [12]. Furthermore, the middleware enhances separation of concerns between the implementation of the robotic team and the devised CI algorithms which are responsible for steering the team to achieve its mission. This is another key feature of ATLAS that supports the investigation of multiple, and likely complementary, CI algorithms [15] (e.g., decentralised collective learning, leader-election algorithms) for carrying out the specified mission and comparing their non-functional attributes such as performance, reliability, and scalability.
The simulator-specific logical interface facilitates the communication between the middleware and the target robotic simulator, enabling the middleware not only to receive and transmit data but also to monitor the robotic team state during mission execution. Accordingly, this interface reduces the coupling between framework components and enhances extensibility. Furthermore, the interface provides a reusable extension point whose specialisation makes it possible to connect to and experiment with different robotic simulators with modest effort.
These key ATLAS features contribute to alleviating some of the key challenges identified in recent robotics surveys for developing, debugging and maintaining robotic applications [14,16]. The following sections provide details of the main components underpinning ATLAS, delineating also the fully fledged ATLAS instances for both the ROS and MOOS-IvP robotic simulators, which provide evidence of the generality of ATLAS.
Figure 3 shows the high-level workflow of the ATLAS methodology. In Step 1, domain experts can use the robotic-mission-informed ATLAS DSL to produce models for their target MRS mission including requirements, mission goals, safety invariants, and the composition of the team. We note that, in principle, domain experts and system testers could employ formalisms other than the ATLAS DSL to represent these dimensions of their robotic mission and architecture. However, the following steps, including code generation and middleware interfacing, are closely integrated with this specification. Consequently, using another formalism would require a corresponding set of model-to-model transformation activities to produce the required artefacts. Since this activity is outside the scope of this work, we leave it for future research.
The invocation of the ATLAS code generation engine in Step 2 uses the full information in the supplied models to automatically generate (i) a middleware configuration that enables the communication of the various system components; (ii) CI templates that will be used by the middleware to orchestrate the robotic team; (iii) interface code that enables the middleware to communicate directly with the target robotic simulator and extract the required information from it; and (iv) the necessary configuration files for the target robotic simulator. In our prototype implementation, these transformations are performed using the Epsilon Generation Language (EGL) [22].
Next, in Step 3, engineers implement the logic and coordination of the robotic team. To achieve this, they populate the CI templates generated in Step 2 with suitable code that realises their selected CI algorithms. This separation of concerns between the implementation of the high-level behaviour (realised by the CI algorithm) and the low-level functionality of a robot has two key benefits. First, it reduces the effort to analyse multiple CI algorithms without changing the underlying low-level system behaviour. Second, it enables reuse of existing low-level code that has been developed independently for individual robotic team members. Our current ATLAS implementation supports the generation of CI templates in Java. Further details about the CI implementation, alternative implementation strategies and choices for development language are presented in Sect. 3.5.
Once the necessary artefacts have been instantiated, the MRS simulation is automatically executed in Step 4. ATLAS records simulation results related to the MRS mission requirements in the form of logs comprising both messages exchanged between system components and events relevant to system goals and requirements that occurred during the simulation. The analysis of these results permits the computation of metrics that support the selection of the most effective MRS designs and the identification of the most suitable CI algorithm for the target mission.

DSL core concepts
The primary concepts of the ATLAS DSL consist of the Mission (Fig. 4) and the VariationProgram (Fig. 5). The Mission part includes the Goals that should be accomplished. Each goal is connected to the Region where it takes place and includes the specific GoalAction that should be executed for the fulfilment of the goal. Example actions include Patrolling an area, Avoiding an obstacle, etc. Although the DSL covers the vast majority of actions available in the current ROS and MOOS robotic environments, this is an extensibility point of the DSL. Accordingly, interested users can extend the GoalAction type to introduce new actions and the appropriate model-to-text (M2T) transformations to generate the corresponding robotic simulator specific implementations, which may be required for new use cases. For example, when testing a robotic monitoring scenario to discover potential collisions between robot arms and humans, the user could extend the DSL by implementing a new GoalAction subclass, e.g., ArmCollisionTracker. The logic of this goal action would register potential collisions either directly supplied by the robotic simulator (if this information is available and ready to use) or 3D geometry information to be analysed post hoc after the simulation completes its execution.
Since goals may have their own constraints, (e.g., starting after or ending before a specific time), each goal is linked to a GoalTemporalConstraint defining the earliest starting and latest finish time of the goal. Each GoalAction is responsible for tracking the fulfilled/violated status of a particular high-level mission goal. As these are scenario-specific, the implementation of the action per goal is delegated to the user in Step 3 of the ATLAS methodology (cf. Fig. 3). For example, the GoalAction named TrackDistances will track the relative distances of robots to each other and check for possible intersection with environmental objects. This information is recorded by the middleware allowing the generation of related metrics for further analysis in Step 5.
Robotic simulators provide bespoke implementations of various behaviours such as navigating to specific coordinates (e.g., using the MoveBaseGoal and WayPoint behaviours in ROS and MOOS-IvP, respectively), and avoiding other robots and obstacles. These behaviours can be employed by low-level simulator code to facilitate the specification and execution of a robotic mission. This information is also important for the CI, as it enables the CI to dynamically alter the behaviour of a robot. In our healthcare robotic team, for instance, a conservative energy-aware CI can exploit this ROS construct to dynamically instruct the robot to return to its base for recharging when its battery level drops below a critical threshold, thus reducing the risk that the robot will get stuck within the healthcare facility and obstruct other robots or healthcare professionals.
In addition to the specification of high-level behaviours, the DSL supports the specification of SimulatorVariables that are relevant for the mission investigated and, therefore, need custom handling by the middleware and the CI algorithm. In ROS, for instance, a simulator variable is represented by the topic in a ROS computational graph instance that delivers this information to the middleware. More specifically, a SimulatorVariable is defined by its name, its type (e.g., for ROS, this is a string giving the ROS type name), a tag indicating the high-level nature of the variable (velocity, position, time, generic), and two boolean fields defining whether the variable is specific to a particular robot and whether it should be propagated to the CI instance. The purpose of including these simulator variables in the DSL is to ensure the middleware subscribes to the variables and to enable the propagation of any updates to the CI. If the propagateToCI field is set to true, when an update occurs, the CI will be notified about the updated information and can respond appropriately to a particular scenario-specific MRS event, e.g., terminate the mission when notified about the successful traversal of a set of waypoints or raise an alarm when a robot is blocked within the healthcare facility because its battery has been exhausted. Multiple simulator variables can be linked within a behaviour and connected to high-level mission goals through the behaviours reference. In other words, the Behaviour class also enables specifying multiple simulator variables (topics in ROS) from the MRS simulator that capture the status of a robot.
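The fields of a SimulatorVariable listed above map naturally onto a small record type. The sketch below is a hypothetical Python rendering for illustration only (the ATLAS DSL is a metamodel, not this class): the attribute spellings, the `subscription_topics` helper, the per-robot `/<robot>/<topic>` namespacing, and the robot names are assumptions. The `/clock` and `/amcl_pose` topics are those used in the healthcare model, and the type strings shown are the standard ROS message types for those topics, included here as illustrative values.

```python
from dataclasses import dataclass
from enum import Enum


class VariableTag(Enum):
    """High-level nature of the variable, as enumerated in the text."""
    VELOCITY = "velocity"
    POSITION = "position"
    TIME = "time"
    GENERIC = "generic"


@dataclass(frozen=True)
class SimulatorVariable:
    name: str                # e.g. the ROS topic name
    type_name: str           # for ROS, a string giving the ROS type name
    tag: VariableTag
    is_robot_specific: bool  # one subscription per robot if True
    propagate_to_ci: bool    # forward updates to the CI if True


def subscription_topics(var, robot_names):
    """Topics the middleware would subscribe to (the namespacing scheme is assumed)."""
    if var.is_robot_specific:
        return [f"/{robot}{var.name}" for robot in robot_names]
    return [var.name]


clock = SimulatorVariable("/clock", "rosgraph_msgs/Clock", VariableTag.TIME, False, True)
pose = SimulatorVariable("/amcl_pose", "geometry_msgs/PoseWithCovarianceStamped",
                         VariableTag.POSITION, True, True)

robots = ["tb3_0", "tb3_1", "tb3_2"]  # hypothetical robot names
print(subscription_topics(clock, robots))  # ['/clock']
print(subscription_topics(pose, robots))   # ['/tb3_0/amcl_pose', '/tb3_1/amcl_pose', '/tb3_2/amcl_pose']
```

Keeping the robot-specific flag on the variable, rather than listing three topics explicitly, means the same declaration remains valid when the team size changes.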
Another important element of the ATLAS DSL is the set of Robots employed to satisfy a specific goal. Conforming to the hierarchical representation of many robotic systems [10], ATLAS enables the specification of the MRS architecture in a compositional manner (Fig. 5). A Component represents the top level element of this hierarchical representation. Components are divided into SystemComponent and EnvironmentalComponent. A SystemComponent can either be a Robot (e.g., the mobile robot in the healthcare robotic team from Sect. 2) or a Computer (e.g., the command-and-control centre that is responsible for the execution of the CI algorithm and the coordination of the robots). Robots and Computers comprise several subcomponents (e.g., Sensors, Actuators, MotionSources, etc.). Also, each component contains ComponentProperties of different datatypes that enable the specification of different characteristics of the associated component (e.g., nominal operating rate of a sensor or expected energy consumption). An EnvironmentalComponent permits modelling elements in the environment via the component concept. For example, there could be numerous objects in the environment, including rooms to be serviced in the healthcare case study, and target objects to be located in a marine detection case study (cf. Sect. 5). These environmental objects can be represented and grouped in clusters through the EnviromentalObjectGroup class which is a subclass of the EnvironmentalComponent. Figure 6 shows the model instance for the healthcare mission from Sect. 2. The model comprises the specifications of the three Turtlebots that are available to carry out the patient inspection tasks within the healthcare facility. The model includes also the goals that should be executed to assess the compliance with the case study requirements, i.e., track the distances travelled by the robots, check the completion of tasks in the allocated rooms, and compute the total energy consumption by the robotic team. 
The two goals checkRoomsCompleted and trackEnergyHealthcare are custom implementations for the healthcare case study. Several simulator variables are also defined which allow ATLAS to monitor the completion status of rooms and the assigned work of robots. The /clock topic defines the simulator operating time, while the /amcl_pose topic defines the pose for each robot, with the isRobotSpecific field set to true indicating that a different subscription should be made for each of the three robots. Any changes to these variables will be monitored by the middleware and communicated to the CI. The configuration also includes the rooms as environmental

Searching for optimal robot configurations
Another key feature of the ATLAS DSL is the support for analysing candidate system configurations through design space exploration. To achieve this, ATLAS leverages concepts from the domain of software product line engineering [23][24][25], enabling the specification of alternative candidate designs as a VariationGroup with a defined cardinality (see Fig. 5). The members of a variation group are components, at the same level of the system specification hierarchy, that can participate in the mission. Accordingly, variation groups can be formed by SystemComponent subclasses such as robots and their subcomponents. For example, if there are different types of Sensor or Battery that a robot can use for a mission, those subcomponents can form another variation group. Environmental features that are subclasses of EnvironmentalComponents can form a variation group too. The set of all variation groups forms a VariationProgram. Given such a program, ATLAS uses model-to-model (M2M) transformation to automatically generate informative mission models that conform to the mission metamodel in Fig. 4 and meet the min/maxRequired properties of each group. For example, consider the healthcare example from Sect. 2 and assume we want to assess if the mission can be fulfilled using a subset of the available robots. The robots are added in the Mission model and the VariationGroup referring to these robots is created. The minRequired and maxRequired properties are set to two and three, respectively. ATLAS consumes the variation groups and through M2M transformation automatically generates the possible mission model configurations (designs). Since a VariationGroup directly references components from the model specification of the target robotic team, metamodel enhancements, e.g., adding other system components or environmental elements, can be handled in the same manner and require no adaptation to the logic underpinning the search for optimal robot configurations.
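The expansion of a VariationProgram into candidate configurations can be illustrated with plain combinatorics. The sketch below is a stand-in for the M2M transformation, not a reproduction of it: the class layout, attribute spellings and `expand` function are hypothetical, and members are represented by simple strings instead of model elements.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class VariationGroup:
    name: str
    members: tuple      # components at the same level of the hierarchy
    min_required: int
    max_required: int

    def selections(self):
        """Yield every member subset whose size respects the cardinality bounds."""
        for size in range(self.min_required, self.max_required + 1):
            yield from combinations(self.members, size)


def expand(program):
    """Yield one configuration per combination of per-group selections."""
    names = [group.name for group in program]
    for combo in product(*(group.selections() for group in program)):
        yield dict(zip(names, combo))


# Healthcare example from the text: use two or three of the three available robots.
robots = VariationGroup("robots", ("tb1", "tb2", "tb3"), min_required=2, max_required=3)
print(len(list(robots.selections())))  # 4

# Adding a second, hypothetical group (pick exactly one room group) multiplies the space.
rooms = VariationGroup("rooms", ("g1", "g2", "g3"), min_required=1, max_required=1)
print(len(list(expand([robots, rooms]))))  # 12
```

Because each group is expanded independently and the results are combined by a Cartesian product, adding a new kind of component only requires declaring another group, which mirrors the extensibility argument made above.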
The large number of generated mission model configurations and the typically considerable time needed to execute a single configuration incur substantial computational overheads that make the exhaustive analysis of all possible configurations prohibitive [26]. ATLAS aims at alleviating this issue and mitigating the impact of running simulations for different configurations through informed selection of the most promising configurations for analysis. To this end, ATLAS leverages recent MDE advances in model similarity estimation through robust hashing [18] and uses simulated annealing [27] to guide the selection of the most promising and informative model configurations. Robust hashing of MDE models [18] builds on techniques developed for the comparison of large datasets and enables the estimation of model similarity considering both the contents of model elements (e.g., attributes, operations) and their relative position with regard to other model elements. Given the set of models to compare, we use the robust hashing technique to produce a similarity matrix between these models. The similarity score between a pair of models ranges from 0 (completely dissimilar models) to 100 (identical models). For further details, we refer interested readers to [18].
The ATLAS search process of the configuration space is underpinned by the premise that candidate models which are more similar to a model that has been analysed (and yielded good results) are more likely to produce similar results than other candidate models which are less similar to the analysed model [28][29][30]. Accordingly, simulated annealing, as a probabilistic global search optimisation technique [27], offers a good tradeoff between searching the local area of the configuration space (until the optimal configuration is identified) and escaping local optimum configurations (by temporarily accepting configurations that yield worse results than the currently best configuration). This optimisation algorithm has shown promising results in supporting design space exploration in related research [31]. Nevertheless, we note that other heuristics and meta-heuristics such as evolutionary algorithms and swarm intelligence [32] could also be applied to solve this optimisation problem [33]. Investigating the applicability and tradeoffs between different global optimisation algorithms presents an interesting idea for future work.
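The trade-off between local search and escaping local optima can be illustrated with the standard simulated annealing acceptance rule. The sketch below is a generic textbook formulation, not the ATLAS implementation: it assumes a cost that is minimised (e.g., mission completion time), the Metropolis acceptance criterion, and a geometric cooling schedule; all class and method names are hypothetical.

```java
import java.util.Random;

// Generic sketch of simulated annealing acceptance (Metropolis criterion).
// Assumes a cost to be minimised and a geometric cooling schedule; neither
// detail is confirmed by the paper for ATLAS.
class AnnealingAcceptanceSketch {

    // Probability of moving to the candidate: 1 if it is no worse,
    // otherwise exp(-(increase in cost) / temperature).
    static double acceptanceProbability(double currentCost, double candidateCost, double temperature) {
        if (candidateCost <= currentCost) {
            return 1.0;
        }
        return Math.exp((currentCost - candidateCost) / temperature);
    }

    // Draws a random number to decide whether the candidate is accepted.
    static boolean accept(double currentCost, double candidateCost, double temperature, Random rng) {
        return rng.nextDouble() < acceptanceProbability(currentCost, candidateCost, temperature);
    }

    // Geometric cooling: the temperature shrinks by a fixed fraction per iteration.
    static double cool(double temperature, double coolingRate) {
        return temperature * (1.0 - coolingRate);
    }

    public static void main(String[] args) {
        double temperature = 100.0;
        for (int i = 0; i < 5; i++) {
            System.out.printf("T=%.2f  P(accept candidate worse by 20)=%.3f%n",
                    temperature, acceptanceProbability(300.0, 320.0, temperature));
            temperature = cool(temperature, 0.5);
        }
    }
}
```

As the temperature falls, the probability of accepting a worse configuration decreases, so the search gradually shifts from exploration to refinement around the best configurations found.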
Algorithm 1 ATLAS intelligent search process
1: function SearchModels(M, I, G, N, T, C)
   M: candidate models set; I: max iterations; G: mission goals; N: neighbourhood factor; T: initial temperature; C: cooling rate
2: ...
15: end for
16: return m, V_m
17: end function

Algorithm 1 shows the high-level ATLAS search process to intelligently find good configurations. The algorithm requires as inputs the set of candidate models M, the maximum number of iterations I, the mission goals G, the neighbourhood factor N, and the simulated annealing specific parameters initial temperature T and cooling rate C. Initially, the SearchModels function randomly selects a reference candidate model m_r (line 2) and evaluates its goodness based on the mission goals G (line 3), saving the derived model and evaluation information as the current best (line 4). Then, through the robust hashing approach described in [18], we construct a similarity matrix SM_M of size |M|^2 for the model set M, in which the models in the columns of each row are ordered by decreasing similarity to the source model of that row. The loop in lines 6-15 is executed for I iterations during which more promising models are selected for evaluation. To this end, the GetSimilar function (line 7) retrieves the most similar models to the reference model m_r considering the current search status. Thus, the current iteration i combined with the maximum number of iterations I and the neighbourhood factor N control the trade-off between exploration and exploitation, enabling at the beginning of the search to consider all models as candidates, irrespective of their similarity to m_r (favouring exploration), and progressively shrinking the neighbourhood to models that are more similar to m_r (favouring exploitation). From the resulting set of similar models, a model is randomly selected and evaluated (lines 8-9), and it is assigned as the best model m if it yields better results than the current best (lines 10-13).
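The shrinking neighbourhood described above can be sketched as follows. The exact formula used by GetSimilar is not given in the text, so the schedule below is an assumption made purely for illustration: the fraction of models kept is (1 - i/I) raised to the power N, with at least one model always retained. The class and method names are hypothetical, and the input array is assumed to hold the similarity scores (0 to 100) of the other candidate models to the reference model.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch of a shrinking similarity neighbourhood. The shrink
// schedule is an assumption; the paper does not state the GetSimilar formula.
class NeighbourhoodSketch {

    // Number of candidates considered at a given iteration: all models at the
    // start (exploration), narrowing towards one model at the end (exploitation).
    static int neighbourhoodSize(int modelCount, int iteration, int maxIterations, double neighbourhoodFactor) {
        double remaining = 1.0 - (double) iteration / maxIterations;
        double fraction = Math.pow(remaining, neighbourhoodFactor);
        return Math.max(1, (int) Math.ceil(fraction * modelCount));
    }

    // Indices of the `size` models most similar to the reference model,
    // ordered by decreasing similarity score.
    static List<Integer> getSimilar(double[] similarityToReference, int size) {
        return IntStream.range(0, similarityToReference.length)
                .boxed()
                .sorted((a, b) -> Double.compare(similarityToReference[b], similarityToReference[a]))
                .limit(size)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[] similarity = {10.0, 90.0, 50.0, 75.0}; // illustrative scores only
        for (int i = 0; i <= 4; i++) {
            int size = neighbourhoodSize(similarity.length, i, 4, 1.0);
            System.out.println("iteration " + i + ": " + getSimilar(similarity, size));
        }
    }
}
```

A larger neighbourhood factor makes the neighbourhood contract faster under this assumed schedule, which favours exploitation earlier in the search.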
Next, the algorithm executes the standard simulated annealing process where the reference model m_r is updated if the currently evaluated model m_i is better, i.e., the IsBetter(V_{m_i}, V_{m_r}) function holds, or accepts a seemingly worse solution to avoid potential local optima. The latter case is a key feature of simulated annealing and the probability of accepting a worse solution is controlled by the simulated annealing hyperparameters temperature T and cooling rate C [27]. Once the algorithm terminates, the best model found m and its result V_m are returned (line 16). For the healthcare case study, for instance, the best model corresponds to a system design under which the robotic team satisfies requirements R1 (the robots service all their assigned rooms) and R2 (the robots return to their base to recharge), and optimises requirement R3 (the team completes its mission in the shortest possible time).
Figure 7 shows the variation model (irrelevant details are omitted for reasons of brevity) for the variation scenario of the healthcare case study from Sect. 2. Executing the M2M transformation with this variation model as input generates 60 possible configurations that satisfy the constraints defined in the variation groups, i.e., each robot should use either the standard or high capacity battery and the mission should use at least two out of three available Turtlebots. In addition, two out of the three environmental object groups will always be chosen, giving at least 12 rooms to be serviced, but with three possible room selection choices, i.e., rooms (1-12), rooms (1-6, 13-18), or rooms (7-18).
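The figure of 60 configurations can be checked with a short calculation: each team of k robots (k = 2 or 3, chosen from three Turtlebots) admits 2^k battery assignments, and each such team is combined with one of the three ways of choosing two out of three room groups. The sketch below reproduces this arithmetic; the class and method names are hypothetical and the code is independent of the ATLAS tooling.

```java
// Worked arithmetic for the healthcare variation model; names are hypothetical.
class HealthcareConfigurationCount {

    // Binomial coefficient C(n, k), computed iteratively with exact integer steps.
    static long binomial(int n, int k) {
        long result = 1;
        for (int i = 0; i < k; i++) {
            result = result * (n - i) / (i + 1);
        }
        return result;
    }

    static long countConfigurations() {
        int robots = 3;          // available Turtlebots
        int minRobots = 2;       // at least two robots take part
        int batteryOptions = 2;  // standard or high capacity battery per robot
        int roomGroups = 3;      // environmental object groups
        int roomGroupsChosen = 2;

        long teams = 0;
        for (int k = minRobots; k <= robots; k++) {
            long batteryAssignments = 1;
            for (int i = 0; i < k; i++) {
                batteryAssignments *= batteryOptions;
            }
            teams += binomial(robots, k) * batteryAssignments;
        }
        // (3 * 4 + 1 * 8) team designs, each combined with 3 room selections.
        return teams * binomial(roomGroups, roomGroupsChosen);
    }

    public static void main(String[] args) {
        System.out.println(countConfigurations()); // prints 60
    }
}
```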

ATLAS middleware
The middleware is a key component of ATLAS that enhances separation of concerns between the CI instances and the target robotic simulator. The CI receives information about the MRS status via the middleware, executes its logic, and relays back its decisions to the MRS via the middleware. Also, the middleware uses runtime monitoring to assess the status of the goals defined in the DSL. This internal state, including goal events, is stored in log files that allow the computation of relevant mission metrics and post hoc analysis. The middleware, CI components and the underlying MRS are all implemented as separate processes, which decouples them from each other, thus increasing debugging flexibility and facilitating their independent testing. Furthermore, the low coupling between the CI and the MRS simulator, mediated by the middleware, not only makes it easy to experiment with several candidate CI algorithms but also reinforces maintainability and extensibility (e.g., the components can be computing platform and programming language independent).
Irrespective of the target robotic simulator, the ATLAS middleware comprises (i) a simulator-specific interface that enables ATLAS to connect to the simulator, and subscribe and publish messages/topics; (ii) a highly efficient message broker (e.g., ActiveMQ [34]) to facilitate fast inter-robot communication and interaction with the CI algorithm; and (iii) a CI mapping software module that converts the high-level CI commands into message/topic changes for the target simulator. ATLAS uses the mission model (e.g., Fig. 6) and automatically generates these components through a sequence of M2T transformations. The current ATLAS implementation fully supports the Gazebo/ROS and MOOS-IvP simulators. Clearly, for each target simulator the M2T transformation and required components are developed only once and can be reused thereafter. We discuss below the concrete instantiations for these two simulators. The full details of the M2T transformations are available on our project webpage.
ATLAS middleware for MOOS-IvP: The M2T transformation uses the mission model and configures appropriately the ATLAS middleware to enable monitoring the MRS status for the specified goals. Also, MOOS-specific configuration files are produced to represent the robot configurations, properties and behaviours necessary to run the simulation. Within MOOS-IvP, each robot is represented as a community comprising a set of C++ software modules that provides the functionality of robot components. Each community is served by an individual publish-subscribe database (MOOSDB) which contains key-value pairs. When robots communicate, these key-value pairs form a communication channel through which one robot will publish a message and subscribed robots will receive the update and act accordingly. Example messages that can be sent/received include sensor detection events, speed values and location information.
The communication between the ATLAS middleware and the simulator occurs through MOOSDBInterface, a generic MOOS software component that enables interfacing directly with robot communities. The simulator did not support this functionality. Thus, we developed this reusable software component which can now be instrumented through the middleware to publish/subscribe to messages within MOOSDB databases, e.g., receiving updated robot coordinates, activating the return home behaviour upon mission completion.
Messages received from the MOOS-IvP simulator are kept in the message broker (ActiveMQ) until they are automatically processed and translated into update nodes mapped to the initial mission goals. These updates are transmitted to the CI algorithm, enabling revision of its internal state and informing subsequent decision-making. The middleware also supports translation of generic requests to low-level MRS messages (e.g., return to home or go to this location commands).
ATLAS middleware for Gazebo/ROS: This simulator also realises the publish-subscribe architecture with a topic subscription graph being the primary communication mechanism between robots and their components. Accordingly, the M2T transformation produces the necessary ROS launch scripts, which set out the simulation component task graphs, and the necessary ROS configuration to launch the ROS processes and implement the simulation functionality.
The middleware communicates with the ROS simulator using rosbridge (http://wiki.ros.org/rosbridge_suite), a stable and widely used ROS component that represents the logical interface module between ATLAS and Gazebo. Through rosbridge, ATLAS can subscribe to ROS topic updates, triggering notifications when these subscribed topics are updated. Dynamic topic subscriptions can be added/removed during simulation, thus enabling the interface to be reconfigured during mission execution.
To connect to the ROS simulation, the Java-based middleware uses jrosbridge to initiate the topic subscriptions using the simulator variables defined in the DSL; see, for instance, the model elements /amcl_pose and /roomCompleted in Fig. 6. Updates on these variables update the middleware's internal state and generate events that are forwarded to the CI activating CI-specific decisions. For the reverse communication pathway, i.e., from the CI to Gazebo, CI actuation commands (e.g., go home/robot stop) are translated into specific low-level ROS topic changes, instantiating the behaviour changes specified by these commands. For example, a generic action for "robot stop" (due to a depleted battery) is translated into a command that sets the robot's current goal to its present point, and prevents it from moving further away from its current location.
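For readers unfamiliar with rosbridge, the sketch below shows the JSON messages that a client sends over the rosbridge connection to add or remove a topic subscription. It deliberately shows the wire-level protocol (the "op" and "topic" fields) rather than the jrosbridge API that ATLAS uses, and it omits optional fields such as the message type. The class and method names are hypothetical, and topic names are assumed not to need JSON escaping.

```java
// Wire-level illustration of rosbridge subscribe/unsubscribe requests.
// This is not the jrosbridge API; class and method names are hypothetical.
class RosbridgeMessageSketch {

    // Request that asks rosbridge to start forwarding updates of a topic.
    static String subscribe(String topic) {
        return "{\"op\":\"subscribe\",\"topic\":\"" + topic + "\"}";
    }

    // Request that cancels a subscription, e.g., when the interface is
    // reconfigured during mission execution.
    static String unsubscribe(String topic) {
        return "{\"op\":\"unsubscribe\",\"topic\":\"" + topic + "\"}";
    }

    public static void main(String[] args) {
        // Simulator variables taken from the mission model elements mentioned in the text.
        System.out.println(subscribe("/amcl_pose"));
        System.out.println(subscribe("/roomCompleted"));
        System.out.println(unsubscribe("/amcl_pose"));
    }
}
```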

Collective intelligence
Collective Intelligence (CI) is an important MRS mechanism that facilitates the encoding of the high-level logic to coordinate robot decisions and manage the behaviour of the overall MRS. In addition to supporting the analysis of different robotic team configurations using variation groups (cf. Sect. 3.3), ATLAS enables the automated evaluation of alternative CI algorithms. Users can inspect the mission metrics produced by each CI instance and select the best for hardening the MRS. This is another key feature of ATLAS that reduces further the coupling between the target robotic simulator and the high-level decision-making process.
To facilitate the development of CI algorithms, ATLAS performs an M2T transformation using the mission model to generate CI templates. These templates have empty placeholder methods that reflect the mission goals, behaviours and events mapped to robot components defined within the mission model (Fig. 6). Engineers are expected to specialise those templates with the appropriate logic to develop the target CI algorithm (cf. Step 3 in Fig. 3).
Although this CI generation facility is provided for convenient development, the decoupling of the CI into a separate process provides considerable flexibility. If system testers wish to exploit CI algorithms already implemented in other languages, they can use a custom CI instead of the ATLAS-specific generated stubs. To use such an external CI, users would need to provide parsing and communication mechanisms, i.e., an API for their custom CI that interfaces it with the middleware. Further discussion of this functionality is presented in Sect. 4.1.
The communication of fully fledged CI instances with the ATLAS middleware is underpinned by the inversion of control programming paradigm [35]. To achieve this, the middleware becomes aware of the methods within the CI template and the messages associated with each method during the M2T transformation. When the middleware receives an update to a subscribed message/topic from the robotic simulator, it maps the message to the appropriate CI method and proceeds with its invocation. The CI executes its logic and informs the middleware of its decision (e.g., instruct a robot to take over a failed peer) so that the latter can send the appropriate message updates to the simulator via the logical interface. Since the CI instance runs as an independent Java process, both the CI and the middleware run independently and operate in a non-blocking mode using JSON messages.
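The inversion of control described above can be sketched as a dispatch table in which CI callbacks are registered per simulator variable and invoked when an update arrives. This is a simplified, in-process illustration under stated assumptions: in ATLAS the CI and the middleware are separate processes exchanging JSON messages, whereas here both sides live in one class, and all names (CallbackDispatchSketch, VariableHandler, register, dispatch) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified in-process illustration of inversion of control between a
// middleware and CI callbacks; all names are hypothetical.
class CallbackDispatchSketch {

    // Callback signature: which robot produced the update and its new value.
    interface VariableHandler {
        void handle(String robot, String value);
    }

    private final Map<String, VariableHandler> handlers = new HashMap<>();

    // Called once per CI method, associating it with a simulator variable.
    void register(String variable, VariableHandler handler) {
        handlers.put(variable, handler);
    }

    // Called when an update arrives; returns false if no CI method is mapped.
    boolean dispatch(String robot, String variable, String value) {
        VariableHandler handler = handlers.get(variable);
        if (handler == null) {
            return false;
        }
        handler.handle(robot, value);
        return true;
    }

    public static void main(String[] args) {
        List<String> completed = new ArrayList<>();
        CallbackDispatchSketch middleware = new CallbackDispatchSketch();
        middleware.register("/roomCompleted", (robot, value) -> completed.add(robot + " finished room " + value));
        middleware.dispatch("tb1", "/roomCompleted", "7");
        System.out.println(completed);
    }
}
```

The CI code never polls the simulator; it only supplies callbacks, and the middleware decides when they run.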
The current ATLAS version supports the delegation of CI control to a single MRS component. This component can be a centralised command-and-control station or a single robot that acts as the leader with full knowledge and total control over its peers. Supporting other CI variants like hierarchical or decentralised [4,36] is out of scope and is left for future work.
Example 3 Listing 1 shows an example implementation of the energy-aware CI for the healthcare robotic team use case. The global variables rbts, roomAssignments and roomLocations hold information about the location of robots, the rooms assigned to each robot and the room locations, respectively. The init() method parses the defined model and loads information necessary for the CI; for example, the room locations and the initial energy per robot. Then, the invocation of the assignRoomsByEnergy method enables the allocation of pending rooms to the available robots proportionally to their energy capacity. Once room allocation is completed, the sendRoomAssignments method facilitates the transmission of the assigned rooms to the simulator by utilising the middleware-generic function sendSimulatorVariable, which transmits a comma-separated string of room numbers to the /rooms simulator value of the target robot. The energy updates produced by the middleware's energy model are received by the CI and translated into calls to the eUpdate method, which compares the energy level to a fixed energy threshold for the robots. When a robot's energy is below this threshold, the CI considers the robot as being at risk of getting stuck in the healthcare facility. Thus, the CI instructs the robot to return home (using the corresponding middleware function), renders the robot idle, and reassigns the rooms not serviced yet to other robots (through the reassignRooms method). This method retrieves the outstanding rooms of the idle robot, and if there are any, it reassigns them to the other robots by leveraging the assignRoomsByEnergy method.

Listing 1 Energy-aware CI for the healthcare robotic team (recoverable fragments)
    ...
        String roomStr = roomAssignments.get(rbt).stream().
            map(String::valueOf).collect(Collectors.joining(","));
        CILog.logCI("Sending to "+ rbt +" rooms" + roomStr);
        API.sendSimulatorVariable(rbt.getName(), "/rooms", roomStr, true);
      }
    }
    //update the energy level for the given robot
    ...
      if (key.equals("/roomCompleted")) {
        Integer room = Integer.valueOf(val);
        CILog.logCI("Room "+ room +"done by "+ rbt);
        roomAssignments.get(rbt).remove(room);
      } else if (...) {..}
    }
The simVarUpdate method provides a facility to update the local CI knowledge using information from the simulator. This method is invoked automatically by the middleware upon receiving an update to a variable specified in the mission model and to which the CI is subscribed (e.g., Fig. 6). In this case, if the update concerns the ROS topic /roomCompleted, the method removes the room number from the given robot's outstanding work.
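The energy-proportional allocation performed by assignRoomsByEnergy is not shown in the recovered listing, so the following is only one plausible way to realise it, offered as a sketch under stated assumptions. It greedily gives each room to the robot whose load per unit of energy would be lowest after receiving it; the energy values, robot names and the greedy rule are illustrative assumptions, not the authors' code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of energy-proportional room allocation; the greedy rule
// and all values are assumptions, not the ATLAS implementation.
class EnergyProportionalAssignmentSketch {

    // Assigns each room to the robot minimising (rooms assigned + 1) / energy.
    // Energies must be positive; ties go to the robot listed first.
    static Map<String, List<Integer>> assignRoomsByEnergy(Map<String, Integer> energyByRobot, List<Integer> rooms) {
        if (energyByRobot.isEmpty()) {
            throw new IllegalArgumentException("at least one robot is required");
        }
        Map<String, List<Integer>> assignments = new LinkedHashMap<>();
        for (String robot : energyByRobot.keySet()) {
            assignments.put(robot, new ArrayList<>());
        }
        for (Integer room : rooms) {
            String best = null;
            for (String robot : energyByRobot.keySet()) {
                if (best == null) {
                    best = robot;
                    continue;
                }
                // Compare (c_r + 1) / e_r < (c_b + 1) / e_b by cross-multiplication
                // to stay in exact integer arithmetic.
                long candidate = (long) (assignments.get(robot).size() + 1) * energyByRobot.get(best);
                long incumbent = (long) (assignments.get(best).size() + 1) * energyByRobot.get(robot);
                if (candidate < incumbent) {
                    best = robot;
                }
            }
            assignments.get(best).add(room);
        }
        return assignments;
    }

    public static void main(String[] args) {
        Map<String, Integer> energy = new LinkedHashMap<>();
        energy.put("tb1", 100); // illustrative energy units
        energy.put("tb2", 200);
        List<Integer> rooms = new ArrayList<>();
        for (int r = 1; r <= 12; r++) {
            rooms.add(r);
        }
        System.out.println(assignRoomsByEnergy(energy, rooms));
    }
}
```

With these illustrative values, the robot with twice the energy receives eight of the twelve rooms and the other robot receives four.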

ATLAS prototype tool
The prototype ATLAS model-driven tool uses the Epsilon family of languages [22] to perform the MDE tasks. In particular, the DSL is implemented with EMF [37]. We use Epsilon's EMF Model Java API to check the variation groups and produce the candidate MRS designs (Sect. 3.3) and the Epsilon Generation Language to generate the ATLAS middleware (Sect. 3.4), the configuration files needed by the target simulator and the CI template (Sect. 3.5). We also use ActiveMQ [34] to link the MRS simulator, CI algorithm and the middleware. The open-source ATLAS source code and the full experimental results summarised next are available at https://www.github.com/jrharbin-york/ATLAS. Finally, a video showing the execution of ATLAS in a marine case study that involves a team of unmanned underwater vehicles (UUVs) (see Section 5.2) is available at https://tinyurl.com/ATLAS-ExampleVideo.

Code generation
We provide below further information about the code generation functionality provided in ATLAS.
Simulator Interface Generation: The code for the simulator interface is automatically generated using EGL, which consumes the model and generates a loader interface to configure the middleware for the target use case. This loader instantiates objects from the model, representing the configuration of robots, sensors, etc. to be loaded by the middleware. A Mission object is returned from the loader class, which the middleware uses internally to maintain the state and support further processing.
Collective Intelligence Generation: Templates are also generated for the collective intelligence algorithms using EGL, based upon the information provided in the model. ATLAS users should populate these templates with suitable code that delivers the expected functionality (e.g., see Listing 1). Currently, we assume that in the ATLAS architecture, the CI runs centrally upon a single computer which receives messages from the robotic team and coordinates the CI (i.e., the CI is not distributed). Users interested in exploiting CI algorithms already available in other programming languages (e.g., C++, Python) could achieve this by using their custom CI instead of the generated templates. However, parsing and communication mechanisms (e.g., a client-server architecture) must be provided to enable the interaction between their custom CI and the middleware.

Research questions
We performed extensive experiments to assess the benefits and performance of ATLAS and answer the following research questions.

RQ1 (Configuration Analysis). Can ATLAS help with finding the optimal configuration for a robotic team?
We use this research question to analyse the extent to which ATLAS can support the specification and analysis of different variation options in an MRS and enable the selection of the optimal configuration.

RQ2 (Collective Intelligence Analysis). Can ATLAS support tradeoff analysis between different collective intelligence algorithms?
We use this research question to analyse if ATLAS can assess the situations in which different CI algorithms perform well on specific metrics. Assessing these results will allow robotic developers to select optimal CIs for a given mission.

RQ3 (Reproducibility). How does the non-determinism of robot simulations affect the reproducibility of the ATLAS results?
We analysed if ATLAS can enable the identification of non-reproducible behaviours of MRS configurations and the discovery of outliers, thus providing evidence that the MRS, CI or the overall system may produce sub-optimal behaviours.

Case studies
We followed the standard practice in empirical software engineering [38,39] and evaluated ATLAS using three distinct case studies: one from the domain of healthcare robotics using ROS/Gazebo [11] and two others from the domain of UUVs using the MOOS-IvP robotic simulator [12]. More specifically, the three case studies selected are: (1) the healthcare robotic mission presented in Sect. 2; (2) an object detection UUV mission developed by the ATLAS team and described in Sect. 5.2.1; and (3) a variant of the Bo-Alpha mission [40] developed by the MOOS-IvP community and included within the standard package of the simulator, which we describe in Sect. 5.2.2.

UUV object detection mission
The first MOOS-IvP case study involves a UUV team deployed on an object detection mission within a large marine area containing both benign and hazardous objects. Each UUV is equipped with a sonar sensor that can detect environmental objects when they are in close proximity, localisation hardware and a radio transceiver to interface with a centralised command-and-control computer (shoreside) which runs the CI and coordinates the activities of the UUV team. Figure 8 shows the three UUVs executing lawnmower-style sweeps (horizontally back and forth followed by vertical steps) over subdivided regions of the area, and three objects to be located (where green and red triangles indicate benign and malign objects, respectively). In this case study, an environmental object represents a generic target for detection in the marine area, whose precise nature would depend on the case study. Here, we assume that environmental objects can be either "benign" or "malign" (hazardous), with the "malign" category requiring a more intensive response from the system and further investigation. The terminology used has been derived from similar use cases available on the MOOS-IvP simulator repository. In a mine detection mission, for instance, the "malign" object would represent an active mine, and the "benign" object would represent a coral reef or a sea creature. Similarly, in a search and rescue mission, the environmental objects could include either humans needing rescue or debris. Since in this paper we are investigating the CI response and not sensor accuracy, we assume that the status of the object is always correctly detected by the UUV sensors.
The shoreside uses a CI algorithm to coordinate the UUVs' behaviour and fulfil the requirements shown in Table 2. Given the safety-critical nature of this mission, all objects should be detected (R1). Depending on the object's type, one or two verifications by peer UUVs should be performed (R2), thus reducing the risk that an incorrect type has been assigned to the detected object. Since UUVs have limited battery capacity, it is important to partition the area effectively and complete the mission in the least possible time (R3) and before the maximum mission execution time given by T_max = 2400 s.
When designing the UUV team, engineers want to investigate how effectively teams comprising three or four UUVs would accomplish the mission. Furthermore, each UUV can be equipped with a narrow or a wide sensor whose scanning area is 10 m or 20 m wide, respectively. Consequently, this results in 48 possible configurations (i.e., designs) for the UUV team.
In addition to the UUV team configurations, several CI algorithms could be used to support the execution of the object detection mission. We assume that robotic engineers are interested in assessing the following standard and advanced CI algorithms.
-Standard CI: This CI partitions the area between the UUVs equally, based on a lawnmower pattern with a constant vertical separation, independent of each UUV's sonar sensor range. When performing verifications of detected objects (R2), the CI selects the UUV that is the closest to the object. Verifications are performed for a constant fixed time of 600 s, scanning the area around the detection. When completing the verification, the UUV is commanded to resume its original sweep region from the beginning.
-Advanced CI: This CI partitions the area between the UUVs proportionally to the strength of their sonar sensors, i.e., a UUV equipped with a more capable sensor is assigned a larger area than a UUV equipped with a narrow sensor. The CI also monitors the status of the verification, returning the dispatched UUV back to its originally assigned area as soon as the final waypoint of the verification task is reached. When this happens, the CI instructs the UUV to resume its task from the point at which it was interrupted.
Evidently, the advanced CI is more efficient and aims at reducing the overall mission execution time but incurs significant communication costs due to the frequent communication between the dispatched UUV and the shoreside. On the contrary, the standard CI, albeit slower, is more meticulous and may help the UUV to discover previously missed objects.

Bo-Alpha mission
This case study is based on the standard Bo-Alpha case study supplied with the MOOS-IvP simulator. During this mission, UUVs monitor an area and make measurements (e.g., salinity, temperature) on the left and right sides of a topology, occasionally alternating sides. To avoid collisions, the UUVs must not come too close to each other and must also avoid obstacle zones. The UUVs monitor their residual energy levels and communicate this information and their positions to a control station running the CI. Figure 9 shows two robots crossing the central region containing five obstacles depicted as white octagons. For this UUV mission, there are 16 possible configurations: two robots, each available in a "slow" or a "fast" version travelling at 1.5 m/s and 3.0 m/s, respectively. Each UUV also supports a standard (2800 mAh) and a high capacity (5000 mAh) battery.
The mission is executed for T_max = 1200 s and the CI coordinates the team's behaviour to fulfil the requirements shown in Table 3. The metrics used to assess the satisfaction of the requirements are the residual energy of the UUVs after mission completion and the total distance of the UUVs from their return locations. A competent CI would steer the robots back to their base with sufficient residual energy. Engineers are interested in analysing the following CI instances.
-Standard CI: The UUVs alternate sides every 150 s. This CI recalls the robots to return home 150 s before T_max.
-Energy-based CI: This CI algorithm tracks the positions of the UUVs and alternates them when both have finished their assigned area. Also, this CI monitors the energy remaining on the UUV battery and sends the recall command to return to base when the critical energy threshold of 750 mAh is reached.
Table 3 (excerpt): Vehicles must return to their starting points before their battery is depleted
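The difference between the two recall policies can be summarised in a few lines. The sketch below encodes only the recall decisions stated above (a fixed time margin of 150 s before T_max = 1200 s versus a 750 mAh energy threshold); it assumes that the threshold is inclusive, and the class, constant and method names are hypothetical rather than taken from the ATLAS code base.

```java
// Hypothetical sketch of the two Bo-Alpha recall decisions; the inclusive
// comparisons are an assumption.
class RecallPolicySketch {

    static final int T_MAX_S = 1200;            // mission duration in seconds
    static final int RECALL_MARGIN_S = 150;     // standard CI recalls this long before T_max
    static final int CRITICAL_ENERGY_MAH = 750; // energy-based CI recall threshold

    // Standard CI: recall purely on elapsed mission time.
    static boolean standardRecall(int elapsedSeconds) {
        return elapsedSeconds >= T_MAX_S - RECALL_MARGIN_S;
    }

    // Energy-based CI: recall once the remaining battery energy is critical.
    static boolean energyBasedRecall(int remainingMah) {
        return remainingMah <= CRITICAL_ENERGY_MAH;
    }

    public static void main(String[] args) {
        System.out.println(standardRecall(1000));   // false: recall starts at 1050 s
        System.out.println(standardRecall(1050));   // true
        System.out.println(energyBasedRecall(900)); // false
        System.out.println(energyBasedRecall(750)); // true
    }
}
```

The standard policy ignores the battery state, which is why a UUV with a standard battery may be depleted before the time-based recall is issued, whereas the energy-based policy reacts to the battery level regardless of elapsed time.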

Experimental setup
We performed a wide range of experiments for the healthcare-based use case described in Sect. 2 using the ROS simulator and the two UUV-based use cases (Sects. 5

Results and discussion
We assess ATLAS on the use cases for each robotic simulator independently. To this end, Sects. 5.4.1 and 5.4.2 focus on answering the research questions from Sect. 5.1 for the ROS- and MOOS-IvP-based case studies, respectively. Figure 10a shows the number of completed rooms for the possible MRS configurations of the healthcare case study with the robots using the standard CI. In this scenario, we set the threshold for requirement R1 to twelve, i.e., the robotic team must successfully service twelve rooms in order to fulfil requirement R1. As shown in the figure, twelve configurations fulfilled this requirement, servicing all assigned rooms, and returning the robots back to their base for recharging. All these configurations included the high-capacity battery option on all robots, with three configurations using three robots and the remaining nine featuring two robots.

RQ1 (Configuration Analysis).
The worst performing configuration serviced only four rooms during the evaluation. This configuration comprised one robot with a high capacity battery and another robot with a standard battery. Further investigation revealed that the behaviour occurred due to a rare event involving a failure either in the ROS simulation process or in the path planning algorithm of one of the robots. This is an interesting behaviour of the simulation and we revisit this configuration in the analysis of RQ3 later in this section. Figure 10b depicts the number of completed rooms versus the total cumulative energy of the robotic team at the end of simulation with the command-and-control centre using again the standard CI. Note that some points in the scatter plot are superimposed upon each other because the variance of the energy values is small. Looking at the figure, we can observe that some configurations violated requirement R1 (i.e., serviced fewer than twelve rooms) and also violated requirement R2 (as indicated by the almost zero residual energy at the end of the mission). Subsequent analysis indicated that these configurations involved robots equipped with standard capacity batteries, which proved inadequate for enabling the robotic team to complete its mission using the standard CI. Configurations that completed the mission successfully with the standard CI and whose robots had a good amount of residual energy involved at least two robots with high capacity batteries, while a few successful configurations employed all three robots with high capacity batteries. Given this information, it is clear that the standard CI requires robots with relatively large energy resources in order to fulfil the mission requirements.
We complement the presentation for the configuration analysis by showing in Fig. 10c the collected results for

RQ2 (Collective Intelligence Analysis).
To answer this research question, we analysed the differences in performance between the CI instances across all MRS configurations. To this end, Fig. 11a compares the number of completed rooms between the standard and energy-aware CI instances. The results show that 45 configurations successfully serviced all twelve rooms using the energy-aware CI, compared to only twelve configurations under the standard CI. This result is not surprising given the design advantages of the energy-aware CI, which includes a more informed initial workload allocation considering each robot's available energy coupled with the ability to redistribute work from robot(s) whose battery has been exhausted to other robots with adequate residual energy. We further assess the differences in fulfilling requirements R1 and R2 between both CIs by comparing the number of completed rooms and the total energy at mission end in Fig. 11b. As before (cf. Fig. 10b), some configuration results may be superimposed upon each other, as the variation between energy values for similar configurations is often small. In general, the total remaining energy when the mission ends is lower for the energy-tracking CI than the standard CI in most configurations that successfully serviced eleven and twelve rooms. This is a useful insight and evidences the capability of the energy-tracking CI to redistribute workload at the expense of greater energy consumption for robots with a larger battery capacity. An interesting observation concerns the configurations that successfully serviced only six and nine rooms, thus failing to satisfy requirement R1 under both CI instances. When using the standard CI, however, these configurations also failed to satisfy requirement R2 because they exhausted their batteries, whereas when using the energy-aware CI they were able to return to their base for recharging without getting stuck within the healthcare facility.
Figure 11c presents for both CI instances all requirements simultaneously, i.e., the completed rooms, the total energy at the end of the simulation, and the mission completion time. The results show that for the energy-aware CI, the configurations that fail requirement R1, servicing fewer than the expected twelve rooms, also violated requirement R3, receiving T_end as their assigned mission completion time. Considering requirement R3, for configurations that meet requirement R1 and service all twelve rooms, the energy-aware CI demonstrates its capability to complete the mission before T_end but with an increased time compared to the standard CI, and with a lower cumulative residual energy level across the team members. This is expected, since the energy-aware CI either gives more work to robots with a larger battery or redistributes work from robots with exhausted batteries. Consequently, robots with a large battery capacity require more time to complete their work.
These results demonstrate that ATLAS can support the automatic investigation of performance differences between functionally equivalent CI algorithms. This process enables tradeoff and cost-benefit analysis of the optimal CI algorithm for a given set of mission requirements and possible MRS configurations. More specifically, we identified that 12 configurations can meet requirement R1 and complete all rooms successfully with the standard CI, but these require a high capacity battery on all robots. If the cost of these resources is acceptable, then it would be possible to deploy the robotic team with one of these configurations. However, if only standard capacity batteries are viable, then the energy-tracking CI needs to be used with one of the 45 viable configurations, incurring additional development work to add energy tracking to the CI. If neither of these options is viable, then finding a viable solution entails system engineers making further manual iterative redesigns of the system or mission, perhaps changing the location of robots' starting points or altering the tasks performed by the robotic team.

RQ3 (Reproducibility).
Non-determinism can be due to robot behaviour (e.g., path planning), components of the simulation engine, or the operating system [41]. Non-deterministic simulations can produce behaviour that is not representative of real systems. Regardless of the source of non-determinism, we can assess the behaviour of a system via repeated executions of a fixed system configuration. This provides a distribution of metrics over multiple runs, enabling the identification of outliers in which non-determinism may produce sub-optimal behaviour. Therefore, we investigated potential non-determinism in the simulations of the healthcare case study by selecting two of the 60 possible MRS configurations and evaluating each of them ten times using both CI instances. First, we used one of the best configurations from Fig. 10a that completed the mission successfully with all twelve rooms serviced. For both the standard and energy-aware CI, all runs of this configuration successfully serviced all twelve rooms. Figure 12a presents the total residual energy on mission completion and the mission completion time for the ten runs. Across the repeated evaluations, there is very low variation in the total residual energy (it may appear large due to the horizontal scale, but in fact the percentage difference is low). The completion time does show some variation, which we hypothesise is due to differences in the navigation path chosen by the robots, as occasionally the robots have to alter their path to avoid each other, or back up and choose an alternate path to prevent potential collisions with the walls.
Second, we used one of the worst configurations from Fig. 10a, which completed only four rooms with the standard CI (cf. the RQ1 results for the healthcare case study). Figure 12b shows the completed rooms and total energy metrics across ten runs of this configuration. When the standard CI is employed, this configuration successfully services nine rooms. In contrast, with the energy-aware CI, one run produced an output of ten completed rooms, and all the others achieved twelve. These results illustrate that there is some instability in the simulation for this particular configuration regarding robot selections, room choices, and the overall robotic behaviour, and also demonstrate that the original value of four completed rooms with the standard CI was an outlier. We believe that the behaviour occurred because the path planning algorithm failed intermittently when one of the robots (with the high capacity battery) approached too close to a wall and got stuck. This shows that ATLAS can isolate potentially unstable or non-deterministic simulation configurations. However, robotic engineers should exploit their domain knowledge to delve into the details of this behaviour and establish the actual root cause of the instability or non-determinism in the simulation.

RQ1 (Configuration Analysis).
Figure 13a shows the number of missed detections for the 48 possible MRS configurations of the object detection mission using the standard (top) and advanced CI (bottom). Under the standard CI, ATLAS identified eight optimal configurations in which the MRS completed its mission successfully with zero missed detections. Other configurations produce a wide range of possible detection failures, up to the worst case of seven missed detections.
We analysed the produced MRS configurations to discover factors that can affect the system performance and reliability. Our analysis showed that the optimal MRS designs for a team of four UUVs always equipped the UUV gilda with a wide sensor, while for a three-UUV team all UUVs should be equipped with a wide sensor. Engineers can factor in the cost of robots and sensors and decide whether a three- or four-robot team is preferred for this mission. Further analysis revealed a particularly sensitive MRS design when UUV gilda uses a narrow sensor. These designs tend to produce three missed detections, due to the UUV gilda missing the initial detection of the rightmost object in its assigned area and two subsequent verifications. Accordingly, such designs fail to meet the mission requirements and should not be preferred.
Figure 13b shows the relation between the number of missed detections and mission completion time for the object detection mission. Some MRS configurations using the standard CI exhausted the available time, signifying that at least one UUV did not complete its task on time. We identified a general pattern indicating a tradeoff in which sweep completion time declines at the cost of partial mission completion, i.e., the mission can complete earlier when fewer detections are performed. MRS configurations that detected no object completed in around 750-1600 s, where 750 s corresponds to the shortest time needed to complete a sweep. This is expected, since no verifications were performed. Subsequent analysis showed that most of these configurations comprise three UUVs all equipped with a narrow sensor. This configuration produces missed detections due to the width of the sweep patterns used in the standard CI, especially when objects are in the middle of the assigned UUV regions.
Considering the Bo-Alpha mission, we found one MRS configuration comprising two fast robots that completed two full sweeps using the standard CI (Fig. 14a). However, this configuration did not meet the other mission requirement as it failed to steer the UUV team back to its base with sufficient residual energy. Further analysis of the configurations that completed a single sweep yielded a configuration that reported some residual energy and a mean vehicle distance of less than 50 m. This configuration consists of a slow vehicle (henry) with a large battery, which enables the UUV to return to base with some residual energy, and a fast vehicle (gilda) with a smaller battery, which can complete its sweep sooner.
These findings clearly demonstrate that ATLAS can support the analysis of different MRS configurations via its variation groups. The outcome of this analysis enables the selection of an optimal configuration for a given mission and requirements.
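The kind of configuration analysis described above can be illustrated with a minimal sketch that enumerates a design space as the Cartesian product of variation options and keeps the configurations whose metrics satisfy every requirement. This is not the ATLAS implementation: the group names, the requirement names (`no_missed`, `on_time`), the `fake_simulate` stand-in, and all numbers inside it are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical variation groups: each design dimension maps to its options.
variation_groups = {
    "team_size": [3, 4],
    "sensor": ["narrow", "wide"],
    "battery": ["standard", "large"],
}

def enumerate_configurations(groups):
    """Yield one dict per point in the Cartesian product of all options."""
    names = list(groups)
    for values in product(*(groups[name] for name in names)):
        yield dict(zip(names, values))

def viable(metrics, requirements):
    """Return True when every requirement predicate holds for the metrics."""
    return all(check(metrics) for check in requirements.values())

def fake_simulate(config):
    """Made-up stand-in for a simulation run; a real study would parse logs."""
    missed = 0 if config["sensor"] == "wide" else 3
    return {"missed_detections": missed, "completion_time_s": 1200}

# Hypothetical requirements expressed as predicates over the metrics.
requirements = {
    "no_missed": lambda m: m["missed_detections"] == 0,
    "on_time": lambda m: m["completion_time_s"] <= 1600,
}

configs = list(enumerate_configurations(variation_groups))
selected = [c for c in configs if viable(fake_simulate(c), requirements)]
print(len(configs), "configurations,", len(selected), "viable")
```

Keeping requirements as named predicates makes it straightforward to report which requirement a rejected configuration violates, which mirrors the per-requirement discussion in the text.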

RQ2 (Collective Intelligence Analysis).
To answer this research question, we compared the standard and advanced CI algorithms in both UUV case studies. For the object detection mission, Fig. 13a shows that the advanced CI produces a better worst-case scenario than the standard CI, with two missed detections at most. In fact, over 40 MRS designs succeed without any missed detection. This behaviour is mainly caused by the CI adapting to the sensor range and using a smaller vertical sweep step size for narrow sensors. Also, many MRS configurations using the advanced CI completed the mission faster and had fewer missed detections than the corresponding configurations using the standard CI (Fig. 13b). This behaviour occurs because the advanced CI instructs the UUV, after completing its verification task, to resume its original task from the point at which it was interrupted.
The analysis of the Bo-Alpha mission showed no optimal MRS configurations with the standard CI, since every configuration fails to complete more than one sweep of the left and right areas (R1 violation) or fails to return to base (R3 violation). The low number of completed sweeps using the standard CI is due to the improperly short timing that alternates the UUVs between different areas before they have completed their sweeps (Fig. 14a). In contrast, the energy-based CI (with the low energy return threshold of 750 mAh) produces a larger number of completed sweeps with a modal value of six. Most of the energy-based MRS configurations ... The standard CI experiences many cases with very low residual energy (Fig. 14c). The overall better performance of the energy-based CI occurs because this CI uses waypoint completion feedback to alternate the sweep sides between the vehicles, rather than a time limit as in the standard CI. The better performance is also due to using energy feedback to send the return command sufficiently early, rather than waiting until close to the simulation end time.
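The feedback-driven behaviour described above can be sketched as a small decision function. This is an illustrative sketch only, not the ATLAS or MOOS-IvP implementation: the `VehicleStatus` fields, the command names, and `next_command` are hypothetical, and only the 750 mAh return threshold is taken from the text.

```python
from dataclasses import dataclass

# Low-energy return threshold quoted in the text (mAh).
RETURN_THRESHOLD_MAH = 750.0

@dataclass
class VehicleStatus:
    """Hypothetical status report sent by a UUV to the CI."""
    energy_mah: float           # residual battery energy
    sweep_waypoints_done: bool  # True once the current sweep is finished
    side: str                   # area being swept: "left" or "right"

def next_command(status):
    """Choose the next command from energy and waypoint feedback."""
    # Energy feedback takes priority: send the vehicle home early enough.
    if status.energy_mah <= RETURN_THRESHOLD_MAH:
        return ("RETURN_TO_BASE", status.side)
    # Waypoint feedback: switch sides only after a sweep has completed,
    # instead of switching when a fixed timer expires.
    if status.sweep_waypoints_done:
        other = "right" if status.side == "left" else "left"
        return ("SWEEP", other)
    # Otherwise keep sweeping the current side.
    return ("SWEEP", status.side)

print(next_command(VehicleStatus(2000.0, True, "left")))
```

Checking the energy condition first reflects the point made in the text that the return command should not be delayed by ongoing sweep work.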
These results provide sufficient empirical evidence that ATLAS can support the comparison of different CI algorithms given a set of mission-specific metrics. Engineers can use these results to select optimal CIs for specific combinations of missions and robots. We note that the manual crafting and analysis of designs and CI algorithms for robotic teams was the status quo before ATLAS. Our experience with developing robotic missions using a non-ATLAS-based solution was the primary motivation for devising ATLAS.

RQ3 (Reproducibility).
As before, we assessed the impact of non-determinism on the reproducibility of simulation-based evaluation of MRS components by analysing the best MRS configuration and CI algorithm pair for each UUV case study over 30 independent runs.
In the object detection mission, we used a configuration that produced the lowest completion time and zero missed detections in research question RQ2. Figure 15 shows the relationship between missed detections and completion time for both CIs over 30 independent runs of this configuration. In all runs, the completion time is longer under the standard CI than under the advanced CI. This is not surprising, since with the standard CI the UUV restarts its sweep pattern from the start after each verification. Moreover, the completion time for the advanced CI with no missed detections is generally clustered around the 1100-1300 s range, with some advanced CI executions producing a single missed detection and correspondingly a slightly shorter sweep time. This behaviour occurs because when there is a missed detection, at least one robot will not be interrupted for verifications, thus allowing the robot to complete its task slightly faster.
Fig. 15 Results over 30 independent runs for the object detection mission using the best configuration and both CIs
In the Bo-Alpha mission, we used a configuration that obtained the maximum number of sweeps in research question RQ2, with large batteries on both vehicles. The results over 30 independent runs show that there is little variability (Fig. 16). Generally, the energy-tracking CI performs well and better than the standard CI. The final distance from the base in Fig. 16b shows that the metric is nearly constant across runs, with a large mean distance under the standard CI and a small value under the energy-tracking CI. Given that the robots are equipped with large batteries, the results show a considerable amount of final energy remaining on the slower vehicle. Interestingly, there are a couple of outlier cases, which report a higher total final distance for the energy-tracking CI. One possible explanation is that, given the variations in position, the faster vehicle was too far away from base when the return command was sent and was therefore unable to return home in time. Another explanation is that in rare cases, when returning home, robot collision avoidance strategies may incorrectly navigate the robots a considerable distance away from their intended home points. These results show the ability of ATLAS to identify potential instability of configurations in rare cases. However, it is up to the user to analyse the logs and determine precisely what happened in each case.
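The repeated-run analysis used for RQ3 can be sketched as a summary of one metric over several executions of a fixed configuration, with outliers flagged for manual inspection. This is a minimal sketch under stated assumptions: it uses Tukey's 1.5 x IQR fences, which is one common outlier rule and not necessarily the one used in the study, and the `summarise_runs` helper is hypothetical. The sample values mirror the ten-run pattern reported for the healthcare study (one run with ten completed rooms, the rest with twelve); the position of the outlying run in the list is arbitrary.

```python
import statistics

def summarise_runs(values):
    """Summarise one metric over repeated runs and flag outliers.

    Outliers are values outside Tukey's fences (1.5 x IQR beyond the
    quartiles), an assumed rule chosen for illustration.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }

# Completed rooms over ten runs of one configuration (illustrative order).
completed_rooms = [12, 12, 12, 10, 12, 12, 12, 12, 12, 12]
summary = summarise_runs(completed_rooms)
print(summary)
```

When almost all runs agree, the interquartile range collapses to zero and any deviating run is flagged, which suits the goal of surfacing rare unstable executions whose logs then need to be examined by hand.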

Threats to validity
Construct validity threats could arise from incorrect assumptions and simplifications during the specification of the experimental methodology, and from not using representative mission case studies. To mitigate this type of threat, we used three case studies: two developed by us and one provided by the MOOS-IvP community, which is part of the standard version of the simulator. The criteria for selecting the case study from the MOOS-IvP mission library were the following: (i) the mission had to involve multiple robots; (ii) the team of robots had to coordinate via a collective intelligence mechanism to achieve the mission goals; and (iii) the mission had to be adequately documented.
Internal validity threats could produce incorrect analysis data and lead to incorrect insights. To mitigate this type of threat, the research questions are independent of each other, so errors in one research question do not propagate to the others. Moreover, results were derived by automatically analysing the simulator-produced logs, thereby reducing potential user bias. We also support reproducibility of our findings by making our analysis scripts available in the project's repository.
External validity threats could reduce the generalisation of ATLAS. We reduce this threat class by using established MDE practices and tools including EMF [37], Epsilon [22] and ActiveMQ [34]. The experimental evaluation involved three case studies (one provided by the MOOS-IvP community), while the use of two distinct robotic simulators further reduces this threat. Finally, the evaluation of ATLAS in both Gazebo/ROS and MOOS-IvP demonstrates the generality of our framework and its capability to support multiple robotic simulators. However, further experiments are needed to establish the applicability, feasibility and scalability of ATLAS in domains and applications with characteristics different from those used in our evaluation.

Related work
The work presented in this paper lies at the intersection of model-driven engineering, robotics, and design space exploration.

Model-driven engineering for robotics
Developing model-driven solutions for the robotics domain is an established area, which has produced several results over the years [42][43][44]. The majority of the proposed domain-specific modelling languages deal only with specific robot functions such as perception or control, while some model-driven toolchains like RobotML [8], BRICS [10], SmartSoft [7], and RoboChart [9] provide multiple modelling notations to be used together when developing a robotic system. For a detailed description of different approaches to model-driven engineering of robots, the reader is referred to [16] and [45].
Despite the available literature on the application of MDE to robotics, the engineering of MRS is still inadequately investigated. Cattivera and Casalaro [46] conducted a systematic mapping study on the application of MDE to the engineering of mobile robots. Out of all the studies reviewed, the authors found that only 24% (i.e. 19 studies out of 80) deal with MRS compared to single robots. This is however an improvement from an earlier survey by the same authors in 2015 [47], in which only 19% of surveyed papers considered multiple robots.
The most common formalisms used for modelling multi-robot behaviour are finite state machines and statecharts (e.g. [48][49][50]). Other approaches include Ciccozzi et al. [43], who propose the FLYAQ family of graphical domain-specific languages to model the structure and behaviour of multi-robot aerial systems, and Pinciroli and Beltrame [51], who propose a textual DSL for specifying the behaviour of robot swarms. Instead of developing a language for specifying the behaviour of multi-robot systems, Dragule et al. [52] extend FLYAQ with a specification language, which enables engineers to specify domain-specific constraints for robotic missions in a declarative manner. Finally, very few approaches, with the exception of [53], propose solutions for explicitly modelling communication, task allocation, and coordination between robots.
Notwithstanding their merits, the aforementioned languages and tools focus on the specification of the behaviour and structure of multi-robot systems. In contrast, ATLAS focuses on the design space exploration of robotic systems facilitating tradeoff analysis.

Design space exploration
Different techniques have been proposed in the literature for solving the 'Design Space Exploration problem', i.e., automatically exploring large numbers of system design alternatives and identifying designs that optimise properties of interest.
Learning-based techniques identify the optimal designs by predicting the quality of a design before the actual synthesis and thus sampling only a portion of interest from the entire design space. Mahapatra and Schafer [31] proposed a more efficient simulated annealing algorithm called Fast Simulated Annealer (FSA) which uses a decision tree algorithm to identify parameters that have a high impact on the final result. These parameters are maintained in the next iteration of the algorithm until the optimal solution is found. Similarly, Chen et al. [54] presented Co-Training Model Tree (COMT), which is a semi-supervised learning model for predicting the qualities of design configurations and does not depend on labelled design configurations. Finally, Liu and Carloni [55] presented a learning-based method, which is based on the Random Forest learning model, transductive experimental design (i.e., a method for sampling the design space), and randomized selection, for finding an approximate Pareto front of designs.
Generic population-based meta-heuristic optimisation techniques, such as evolutionary algorithms [32], consider distinct designs as individuals in a population, and a fitness function is used to qualify the solutions. Subsequent evolutions of the population are obtained by applying appropriate operators such as reproduction, selection, and gene mutation. The Non-dominated Sorting Genetic Algorithm (NSGA-II), which is a specific type of evolutionary algorithm for multi-objective optimisation, has been used for searching the space of possible solutions in the context of design space exploration [28,30,56]. In these works, a number of good designs are grouped into candidate populations according to the optimisation criteria. Similarly, Palesi and Givargis [57] use genetic algorithms to discover Pareto-optimal configurations of a system-on-a-chip architecture.
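The notion of a non-dominated (Pareto-optimal) design that underlies these multi-objective approaches can be illustrated with a short sketch. It shows only a brute-force dominance filter for objectives that are all minimised, not NSGA-II or any of the cited algorithms; the objective pairs (missed detections, completion time in seconds) are made-up values.

```python
def dominates(a, b):
    """Return True if design a dominates design b under minimisation.

    a dominates b when it is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates (O(n^2) filter)."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]

# Made-up (missed detections, completion time in s) pairs for five designs.
designs = [(0, 1300), (1, 1100), (2, 1500), (0, 1250), (3, 750)]
front = pareto_front(designs)
print(front)
```

Here `(0, 1300)` is dominated by `(0, 1250)` and `(2, 1500)` by several designs, so three designs remain as tradeoff candidates between detection quality and completion time.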
Design space exploration has also been formulated as an optimisation problem and explored using swarm intelligence algorithms like particle swarm optimisation (PSO) [58], bacterial foraging optimisation (BFO) [59], and ant colony optimisation [60]. Shathanaa and Ramasubramanian [61] used learning-based methods and achieved better performance in terms of the quality of the Pareto front created, whereas swarm intelligence-based methods outperform others in terms of execution speed.
Model-driven engineering has also been used in the context of design space exploration. Model-driven approaches specify the design problem as a set of models and model transformations and generate input for other tools that perform the space exploration. The Gaspard Framework [62] focuses on the design of massively parallel embedded systems. It uses automatic model refinement and model transformations to allow for design space exploration to evaluate performance characteristics through simulations. Neema et al. [63] propose the DESERT tool suite, which supports constraint-based design space exploration and model synthesis for embedded systems. Moreover, the OCTOPUS toolset [29] supports the design of software-intensive embedded systems and provides an intermediate representation for the specification of design problems. The tool integrates with analysis tools such as Uppaal [64] for the exploration of the design space. Finally, Hegedüs et al. [65] propose a model-driven framework for guided design space exploration, where the system states are graphs, operations are defined as graph transformation rules, and goals and constraints are defined as graph patterns. While most of the other model-driven approaches to design space exploration depend on external tools for searching the design space, Hegedüs et al. [65] use model transformations to perform the exploration itself, and they propose using information from analysis tools to guide this model-transformation-based exploration.
The majority of the aforementioned approaches focus on the exploration of optimal structural designs. In addition to these aspects, ATLAS focuses also on the exploration of optimal behaviours and particularly the exploration and evaluation of different algorithms for the collective intelligence of a robotic team.

Simulation-based analysis for robotics
Simulations are used extensively in the engineering of robotic systems. One of the main purposes of simulation for robotics is to provide a safe and fully controlled virtual testing and verification environment. Afzal et al. [26] propose a framework that can facilitate automated testing of robotic systems using software-in-the-loop (low-fidelity) simulations and anomaly detection. Similarly, Huck et al. [66] focus on testing industrial human-robot collaborative systems by using a human model and an optimization algorithm to generate high-risk human behaviour in simulation, thereby exposing potential hazards.
Simulation is also used to accelerate the engineering design cycle for robotic systems and reduce its costs. Serban et al. [67] propose Chrono, a multi-physics simulation package aimed at modelling, simulation, and visualisation of the mechanical parts of ground vehicles. Zhao et al. [68] introduce a simulation-based system for optimizing the physical structure and controllers of robots. The goal of the system is to take a set of user-specified primitive components and generate an optimal robot structure and controller for traversing a given terrain.
Lastly, simulations are used to generate, at low cost, large amounts of training data for the machine learning components of robots. Tobin et al. [69] train models for object localisation on simulated images that transfer to real images by randomising rendering in the simulator. Similarly, Chebotar et al. [70] enable policy transfer to new real-world scenarios by training on a distribution of simulated scenarios. Finally, Andrychowicz et al. [71] use reinforcement learning to teach a robotic arm dexterous in-hand manipulation policies that can perform vision-based object reorientation in a simulated environment.
Compared to the above approaches, ATLAS focuses on the exploration and evaluation of different algorithms for the collective intelligence of the robotic team. Moreover, our approach is simulator-agnostic, since its flexible, message-based architecture allows it to be easily extended to accommodate experimentation with different robotic platforms.

Conclusions and future work
This paper presents the ATLAS framework for design space exploration and tradeoff analysis of MRS architectures and CI algorithms. We described the framework's architecture, the core concepts of the ATLAS domain-specific language, and the ATLAS middleware. To this end, ATLAS is underpinned by: (i) a domain-specific modelling language, which enables the specification of MRS architectures, capabilities, and missions (including functional and non-functional requirements); and (ii) a code generation engine, which consumes models specified in the ATLAS language and produces the necessary infrastructure for experimenting on robotic simulators and platforms.
The evaluation of the framework is based on an MRS healthcare case study built with ROS/Gazebo [11] and two MRS case studies built with MOOS-IvP [12]. Our experimental evaluation indicated that ATLAS is capable of modelling MRS and their missions, and it enables the exploration and tradeoff analysis of different MRS configurations and CI algorithms.
In the future, we plan to evaluate the expressive power of the ATLAS DSL by applying it to more case studies, and to extend its applicability to different robotic platforms [72]. We would also like to improve the variability modelling capabilities of ATLAS, enabling the analysis of more elaborate design alternatives [24], and to investigate the incorporation of advanced intelligent techniques to search for optimal MRS configurations [56,73,74]. Finally, it would be useful to apply a more formal cost-benefit analysis of mission configurations. Using feedback from system users, ATLAS can be employed to analyse tradeoffs between CI algorithm development and requirement costs, and the resources required for implementing a mission.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.