Introduction

SwarmItFIX [1] (Fig. 1a) is a novel robotic setup that has been developed as a robot fixtureless assembly (RFA) platform under the European Union project mainly for sheet metal manufacturing applications. However, with certain modifications, this RFA platform can contribute to other applications, including material handling and material transfer. SwarmItFIX is a multiagent platform consisting of three different agents: head, parallel kinematic machine (PKM), and Base (Fig. 2). For a single SwarmItFIX robot, all the three agents have to work in a coordinated manner to accomplish any particular task. These robots perform a novel Swing and Dock (SaD) locomotion on a fixed modular platform, called as workbench. The workspace of these SwarmItFIX platform can be modified by varying the size of the workbenches or by increasing the number of docking pins. The details of the SwarmItFIX setup and the agent coordination have been described in detail in the previous works of the authors [2,3,4]. A software framework called MRROC++ has been utilized for the path planning of robot systems including ASEA IRB 6 and SwarmItFIX [5].

Fig. 1
figure 1

Robot setup

Fig. 2
figure 2

CAD model of the SwarmItFIX setup integrated with the serial manipulator

Serial manipulators (Fig. 1b) assist CNC machines during machining which would enhance throughput. In robot-assisted machining, for maintaining the proper balance between machining efficiency and quality, constant feed rate is generally preferred. Different feed rate segments in a single machining trajectory increase the joints’ jerk, which ultimately leads to deviation in the machining trajectory [6]. The repeatability error present in the trajectory of a serial manipulator is directly proportional to its reach. This error is in the inner areas of the work and high in the areas when the robot attains its maximum reach [7, 8]. The stiffness of joints will be heterogeneous throughout the work envelope of the serial manipulator hence, the machining has to be performed within its possible range of best stiffness [9,10,11]. In this work, a serial robot possessing higher work volume is deployed in coordination with SwarmItFIX robots.

Given the coordinates of the target position and direction of tool, the end-effector target pose of the serial manipulator can be determined. The inverse kinematics model computes the angles of all joints for a corresponding end-effector target pose (Fig. 3). As multiple solutions are possible, optimization-based inverse kinematics model is generally preferred for the computation of joint angles. In addition, several constraints can be imposed for achieving joint angles within the required range, so that an abrupt change in joint angles between two adjacent target points could be avoided.

Fig. 3
figure 3

Forward vs inverse kinematics

The Markov decision process (MDP) addresses the sequential decision problems in fully observable stochastic environments with transition and reward functions. Earlier researchers have also made the MDP to address the deterministic environments [12]. The problems modeled as MDPs are mostly solved using model-based or model-free reinforcement learning (RL) algorithms. In general, both the techniques are offline or static in their working nature, wherein an optimal or sub-optimal policy (π*) can be obtained from the computational environment which will be utilized for the real-time problems. In both the RL algorithms, the computational agents take actions in their computational environment and collect the reward, where the objective is to maximize the cumulative reward. The major difference between the model-based and model-free RL algorithms is that, the former requires a transition model; whereas, it is not required for the other. As the SwarmItFIX robots’ workspace is modular, the model-free RL techniques are preferred for the path planning of SwarmItFIX robots. In this method, a random policy (π) is fed instead of calculating a transition model. The computational agent performs action (a) at any particular state (s) suggested by π, and attains a new state (s′) and receives the reward R(s,a). According the utility matrix Q(s,a) will be updated automatically. Based on this updated Q(s,a), the existing policy π will be refined further. The process will be repeated for a predefined number of iterations or until the optimal or sub-optimal policy is obtained [13].

The multi-robot systems can either be homogenous (robots of the same kind) or heterogeneous (robots of a different kind) depends on their capabilities [14]. The current work deals with a novel hybrid multi-robot system, consisting of one serial manipulator robot and two SwarmItFIX robots. The serial manipulator’s role is to perform the machining operation with end effector as the tool, whereas, the SwarmItFIX robots provide support from the bottom of the sheet metal. The geometrical and FEM constraints (unary and binary) help to obtain a safe distance to be maintained between two consecutive heads (hhDn) and between head and contour (chDm) [15, 16]. In this work, for the multi-robot path planning and decision-making problems of SwarmItFIX robots, a hierarchical decentralized static or offline strategy has been proposed. The computational agents communicate with each other without a centralized agent, traverse in the computational environment, and identify the collision- free optimal path. When the number of agents increases, the complexity of the coordination problem will also increase. In those situations, this decentralized approach provides more flexibility and adaptability for the system than that of the centralized approach. Also, the offline strategy become simpler in those situations compared with the online strategy. Moreover, it is cumbersome to use the dynamic or online method in the presence of a heterogeneous multi-robot system [14].

Generally, planning in multi-robot systems will be classified into task planning and motion or path planning. Task planning decomposes the complete problem into multiple sub-tasks and designates them to various robots deployed. Simultaneously, the motion planning generates a trajectory or path for each robot to accomplish the designated task. In this work, initially, the task planning for SwarmItFIX robots and the serial manipulator robot has been done individually. Finally, the motion planning of serial manipulator robot and SwarmItFIX robots are performed using inverse kinematics and prioritized reinforcement learning, respectively.

The contribution of this paper includes the following:

  • Task planning of SwarmItFIX robots.

  • Modeling the multi-robot coordination problem of SwarmItFIX as MDP.

  • Development and implementation of a novel prioritized SARSA algorithm to solve the proposed MDP.

  • Development of optimization-based inverse kinematics model of the ceiling-mounted serial manipulator robot.

  • Development of multi-robot planner for the coordinated locomotion of serial manipulator robot with two SwarmItFIX robots for sheet metal milling operation.

Related works

The work [17] addresses the position, velocity, and acceleration analysis of a ceiling-mounted 6-axis serial manipulators. The work [18] shows, by integrating deep RL and inverse kinematics models, an efficient path planning model can be obtained for the serial manipulators. However, the method is under implementation. The optimization-based inverse kinematics model successfully provides joint angles while avoiding singularities. The same optimization-based model has also been effectively implemented for the robotic milling tasks [9,10,11]. The literature also identifies that, more than the analytical methods like IKFast [19], optimization based methods are more suitable for solving inverse kinematics problem of serial manipulators that performs milling operations [9,10,11]. The gradient-based solvers are more efficient in solving non-linear optimization-based inverse kinematics problems as they allow additional constraints. This restrict sudden changes in joint variables between two consecutive way points unlike the evolutionary algorithm-based solvers such as Bio-IK [19]. Hence, in this work the objective function and the constraints are newly framed, however, the existing BFGS Gradient Projection solver has been utilized for calculating the joint angles of the ceiling mounted serial manipulator.

For multi-agent systems, it would be more appropriate that the agents compute the plan/policies individually (distributed approach) rather than relying on the centralized units. The distributed approach consists of two phases; (i) deciding the priority of the agents (task planning), and (ii) computation of plans of individual agents by considering the information of other agents (motion planning). Moreover, the agents are fed with limited information on other agents which ultimately reduces the computational complexity. Based on these aspects, the distributed approach is most preferred for the multi-robot path-planning problems [20,21,22]. In Ref. [23], the authors investigated two distributed algorithms, for solving three different cases of multi-agent path planning problems. The results showed that the priority-based algorithm outperforms the other distributed algorithm in all three cases regarding success rate. This priority-based algorithm also outperforms many existing centralized algorithms includes MA-A*, EPEA*, and CBS, in terms of runtime. Authors in [24] have modeled the automated valet parking problem as a multi-agent path planning problem, and solved it using two prioritized methods, constraint-based method (CBS-Pri) and prioritized heuristic method (CA*-Pri). Based on the results, it was found that the CBS-Pri gives the optimal solution by taking more time comparatively. Whereas, CA*-Pri takes lesser time and does not guarantees the optimal solution. The authors [25, 26] have proved that the prioritized approach for the multi-agent path planning problem yields optimal results with reduced makespan with less computation time and does not impose any constraints related to communication, external coordination, and synchronization among robots. Unlike coordinated approaches, at the cost of a small increase in path length, the prioritized approaches scale well with more number of robots. By choosing a proper prioritization methodology, the problems accommodating as many as 24 robots in the confined environments can be solved in a simplified manner compared with other coordination methods [27]. The work by [28] proposes a novel dynamic priority strategy for the multi-mobile robot path planning problem. In terms of the number of deadlocks, the dynamic priority-based path planning strategy proposed efficiently solved the coordination problem when compared with the static approach. However, in the present SwarmItFIX coordination problem, the task planning module completely avoids the possibility of deadlocks. Also, at any given point of time, one robot alone advances to the new location; whereas, the other robots hold the position while supporting the sheet metal. Based on these reasons, a static priority-based approach has been selected for the multi-robot path planning of SwarmItFIX robots.

Problem definition

The task is to perform the machining (milling) on the giant sheet metal, where the robots perform the machining and fixturing operations. A ceiling-mounted 6-axis serial manipulator with the milling tool as an end effector performs the machining operation, and the novel SwarmItFIX robots are utilized as robotic (RFA) fixtures. For achieving high accuracy in the machining process, the SwarmItFIX robots have to provide support at the appropriate places where the serial manipulator is currently performing the machining operation. To achieve the task mentioned above, two different coordination problems need to be addressed. The first one is the heterogeneous coordination between the ceiling-mounted serial manipulator robot and the group of SwarmItFIX robots. Second is the homogenous coordination among the multiple SwarmItFIX robots. These two coordination problems are addressed in this work by further dividing them into three different planning problems and presented in the next section.

Problem formulation

The coordination between COMAU NS16 serial manipulator robot and SwarmItFIX robots is achieved by solving three subproblems as given in the planner (Fig. 5). The three subproblems include:

  • Trajectory planning of COMAU NS16.

  • Multi-robot task planning of SwarmItFIX robots.

  • Multi-robot path planning of SwarmItFIX robots.

Trajectory planning of COMAU NS16

In this work, the COMAU NS16 serial manipulator (Fig. 1b) is deployed to perform the sheet milling operation with milling tool as its end effector. The technical specifications of COMAU NS16 and its kinematic DH parameters are presented in Tables 1 [29] and 2, respectively. The corner of the workbench is considered the origin position (x0, y0, z0) for all robots to reduce complexity. The total transformation between the base of serial manipulator and the TCP can be represented as homogenous transformation matrix given in Eq. (1). While developing the DH based inverse kinematics model of the ceiling-mounted serial manipulator robot, the base (smxb, smyb, smzb) of the robot is considered as its home position. The frame assignment of COMAU NS16 for the DH convention is shown in Fig. 4. The serial manipulator’s trajectory is planned by considering its TCP velocity as constant, as the machining is considered to be performed at a constant feed rate. As the tool path is pre-defined, the trajectory planning is performed in the cartesian space. Initially, the vertex coordinates of the machining trajectory are fed into the trajectory planning model, then the whole trajectory is split into various line segments and each segment is again divided into numerous waypoints. Finally, required angles (smθk,w) of all the manipulator joints for all the waypoints are determined with the help of an optimization-based inverse kinematics model given in Eqs. (25). Where, Eq. (2) is the objective function, constraints (3) and (4) limits the angles of all the joints of serial manipulator. Equation (3) limits all joints angles smθk, except smθ1. Where, Eq. (4) is a collision avoidance constraint that has been imposed exclusively for smθ1. Finally, the joint angles between two-way points (smθk,w1 and smθk,w) are interpolated in the joint space:

$$ \begin{aligned} {}^{0}T_{7} & = {}^{0}T_{1} \times {}^{1}T_{2} \times {}^{2}T_{3} \times {}^{3}T_{4} \times {}^{4}T_{5} \times {}^{5}T_{6} \times {}^{6}T_{7} \hfill \\ \, & = \left[ {\begin{array}{*{20}c} {n_{x} } & {o_{x} } & {a_{x} } & {{}^{{{\text{sm}}}}x_{b} + p_{x} } \\ {n_{y} } & {o_{y} } & {a_{y} } & {{}^{{{\text{sm}}}}y_{b} + p_{y} } \\ {n_{z} } & {o_{z} } & {a_{z} } & {{}^{{{\text{sm}}}}z_{b} + p_{z} } \\ 0 & 0 & 0 & 1 \\ \end{array} } \right]. \hfill \\ \end{aligned} $$
(1)
Table 1 Joint limits of COMAU NS16
Table 2 DH table
Fig. 4
figure 4

Frame assignment and work volume of COMAU NS16

Objective function:

$$ {\text{Min}}\left[ {{}^{{{\text{sm}}}}\theta_{k,w} - f\left( {\begin{array}{*{20}c} {{}^{{{\text{sm}}}}x_{w} } \\ {{}^{{{\text{sm}}}}y_{w} } \\ {{}^{{{\text{sm}}}}z_{w} } \\ \end{array} } \right)} \right]. $$
(2)

Constraints:

$$ {}^{{{\text{sm}}}}\theta_{k}^{ - } \le {}^{{{\text{sm}}}}\theta_{k,w} \le {}^{{{\text{sm}}}}\theta_{k}^{ + } \, ;{\text{ where}}, \, k = 0,2,3,4,5, $$
(3)
$$ \, 0 \le {}^{{{\text{sm}}}}\theta_{1,w} \le {}^{{{\text{sm}}}}\theta_{1}^{ + } , $$
(4)
$$ {}^{{{\text{sm}}}}\theta_{k,w - 1} - {}^{{{\text{sm}}}}\theta_{k,w} \le \eta . $$
(5)

Multi-robot task planning of SwarmItFIX robots

This module takes care of heterogeneous and homogenous coordination among the robots by computing the support locations and appropriate support duration and allocating them to respective SwarmItFIX robot. Initially, using ϕn, the corner support locations will be identified. Then, the required number of support locations is calculated based on the respective segment length. Finally, the support locations are identified using the Monte Carlo-based head task assignment by considering the geometrical and FEM constraints. Once (headxm,n, headym,n, headzm,n) are identified, the time plan module computes tm,n for all the support locations as under:

$$ t_{{m,n}} = \frac{{\sqrt{ \begin{gathered} \left\{ {{}^{{{\text{head}}}}x_{{m,n}} + {}^{{{\text{ch}}}}D_{m} [\cos (90 - \phi _{n} )] - {}^{{{\text{vert}}}}x_{n} } \right\}^{2} \hfill \\ + \left\{ {{}^{{{\text{head}}}}y_{{m,n}} + {}^{{{\text{ch}}}}D_{m} [\sin (90 - \phi _{n} )] - {}^{{{\text{vert}}}}y_{n} } \right\}^{2} \hfill \\ \end{gathered} }}}{{{}^{{{\text{tool}}}}v}}. $$
(6)

As already mentioned, the milling process is assumed to be performed at a constant feed rate for better machining quality and hence the time plan computation for the serial manipulator becomes straightforward.

Task allocation

The total number of robots deployed is decided based on the size of the machining trajectory and workbench. Once the support locations and orientations (headxm,n, headym,n, headzm,n, ϕn) are calculated, Taskm as presented in Eq. (7) will be allocated to a swarmitfix robot Rq:

$$ {\text{Task}}_{m} = \left[ {^{{{\text{head}}}} x_{m,n} ,^{{{\text{head}}}} y_{m,n} ,^{{{\text{head}}}} z_{m,n} ,\phi_{n} ,^{{{\text{base}}}} {\text{Gpos}}_{m} ,t_{m,n} } \right]. $$
(7)

Initially, the Taskm is categorized into Todd and Teven. If the total number of robots deployed is 2, then tasks Todd and Teven are allocated to robot 1 (R1) and robot 2 (R2), respectively. If the total robots deployed is 3, then Teven is allocated to R2 and Todd is shared among R1 and R3. If the mutual docking pins are present between two consecutive baseGposm, then task will be allocated to the current robot itself. Otherwise, the new robot will be deployed. If the total robots deployed is 4, Todd is shared among R1 and R3, and Teven among R2 and R4 (see Algorithm 1).

figure a

Multi robot path planning of SwarmItFIX robots

The SARSA TD presented in this work is a model-free reinforcement learning algorithm that computes collision-free action sequence of the base agent alone. In addition, the five step sequence ensures collision free locomotion of the complete SwarmItFIX robot based on the data received from head, PKM, and base planning submodules (Fig. 5). The five-step sequence for the locomotion of SwarmItFIX robot is explained below.

  • Step 1: Retraction of head from previous support location (headxm1,n, headym1,n, headzm1,n).

  • Step 2: PKM agent movement to keep the head in pkmθj,park.

  • Step 3: Locomotion sequence of the base agent (baseGposm1 → baseGposm) and the respective counter-rotation of PKM.

  • Step 4: The PKM agent movement keeps the head at a normal approach away from pkmθj,m.

  • Step 5: Head agent attains (headxm,n, headym,n, headzm,n) and headθm,n.

Fig. 5
figure 5

Hybrid coordination planner

The head agent’s positioning in the target coordinate (headxm,n, headym,n, headzm,n) is achieved by the inverse position kinematics model of the PKM agent [30]; whereas, the headθm,n is wholly related to the ϕn. When (headxm,n, headym,n, headzm,n) is located outside the work envelope of PKM agent, the robot traverses to the new location baseGposm with the help of the base agent’s SaD locomotion. As more than one robot is being deployed and the work environment becomes dynamic, the robots currently providing the support to the sheet metal become obstacles (obsTaskm) to the traversing robot. Hence, in order to compute the collision-free locomotion sequence of the SwarmItFIX robots, a novel decentralized type prioritized offline multi-agent motion planning technique (see Algorithm 2) is utilized in this work with SARSA TD as the base algorithm.

MDP modeling

When the SwarmItFIX robot is assigned with Taskm by the task allocation module, the computational environment (Fig. 6) which replicates the real-time environment is created accordingly. Then the prioritized SARSA algorithm is utilized to compute the collision-free path of base agent.

Fig. 6
figure 6

MDP environment—SwarmItFIX robot

Various RL parameters of the SwarmItFIX base agent are given below.

State-space (S): The states (s ∈ S) replicates the different positions of the workbench. The number of states in S is calculated using Euler’s formula (8). Current positions of other robots will also be noted, and the algorithm considers those positions as obsTaskm for the RqBseqm of the traversing robot Rq. The states which share mutual pins with obstacle states are also considered as the obsTaskm (Fig. 6):

$$ N_{{{\text{state}}}} = 1 - \, N_{{{\text{pins}}}} + N_{{{\text{edge}}}} . $$
(8)

The workbench of the SwarmItFIX consist of 129 edges (Nedge) and 52 docking pins (Npins). Using Eq. (8), the states (Nstates) is calculated as 78 as shown in Fig. 6.

Action space (A): The actions (a ∈ A) of the computational agent replicates the physical locomotion of the base agent of the SwarmItFIX. The action space consists of nine different actions, the details of which are given in appendix A.

Reward matrix R(s,a): The reward value for any particular state s is calculated based on the Euclidean distance between baseCposq and baseGposm (9). Negative rewards will be given to the obsTaskm:

$$ R\left( {s,a} \right) = \left\{ \begin{gathered} \left( {1 - \left\| {^{{{\text{base}}}} {\text{Cpos}}_{q} -^{{{\text{base}}}} {\text{Gpos}}_{m} } \right\|} \right),{\text{ if}}\; \, ^{{{\text{base}}}} {\text{Cpos}}_{q} =^{{{\text{base}}}} {\text{Gpos}}_{m} \hfill \\ - 100,{\text{ elif }}^{{{\text{base}}}} {\text{Cpos}}_{q} =^{{{\text{obs}}}} {\text{Task}}_{m} \hfill \\ - \left\| {^{{{\text{base}}}} {\text{Cpos}}_{q} -^{{{\text{base}}}} {\text{Gpos}}_{m} } \right\|,{\text{ otherwise}} \hfill \\ \end{gathered} \right\}. $$
(9)

State-action matrix (Q(s,a)): The utility values for all the actions at all states are calculated and updated in the Q-table Q(s,a). The size of Q(s,a) is 78 states × 9 actions.

Implementation

The trajectory details (smTraj) computed by trajectory planning module of the serial manipulator module and locomotion sequence (Rqseq) obtained by the multi-robot path planning module of SwarmItFIX is finally communicated to the implementation module. In the implementation module, smTraj is converted into PDL2 program and fed to the COMAU NS16 controller. Similarly, the Rqseq is fed to the SwarmItFIX controller using MRROC++ interface.

figure b

Results and discussions

Results of serial manipulator

The configuration of system utilized to perform this multi-robot coordination planning computation includes Intel Xeon CPUE3-1225v6 Processor 3.30 GHz, Windows 10, 64-bit OS with 16.0 GB RAM. The numerical inverse kinematics-based trajectory planning of the COMAU NS16 serial manipulator is performed using the Robotics system toolbox of Matlab. Task planning and the homogenous prioritized multi-robot planning of the SwarmItFIX (RFA) robots are performed using python 3.7 idle interface, and the results are presented in this section.

Trajectory planning of serial manipulator

The following parameters are assumed for the purpose of trajectory planning of the serial manipulator. The joint angles of the manipulator at the smθk,home are kept as (180°, 0°, − 90°, 0°, 0°, 0°, 0°), so that the first vertex coordinate (vertx1, verty1, vertz1) is attained with the shorter transformation and toolv is considered as 5 mm/s [16]. The front left corner of the workbench is considered the origin (x0, y0, z0), and the rest of the coordinates are assumed by keeping (x0, y0, z0) as reference. Table 3 shows various coordinate frame values.

Table 3 Coordinates of vertices of polygonal machining trajectory

The plot in Fig. 7a show the details of the position, velocity, and acceleration of TCP for the complete machining trajectory. Angles of all joints of the COMAU NS16 at all the waypoints computed by the numerical inverse kinematics model are detailed in Fig. 7b. The positional error (Fig. 7c) of the TCP frame obtained during simulation is observed to be very low throughout the trajectory, which portrays the efficiency of the inverse kinematics-based trajectory planning model. Figure 7d show the iterations consumed to compute the joint angles at each waypoint. The data conveys the computational efficiency of the numerical inverse kinematics technique deployed. The peak noticed at the beginning denotes a higher number of iterations consumed for calculating the angles of joints 2, 5, and 6. This peak is due to higher variation between joint angles of neighboring waypoints (see Fig. 7b). The bumps in the joint angles happen because the reach at those waypoints is minimal. It can also be observed that no sudden variations are present throughout the trajectory. Hence, the trajectory obtained is very smooth. Figure 7e represent the home position (smθk,home) of the serial manipulator. Figure 7f–h are related to joint variables when time t = 2, 350, and 700 s of the machining process, respectively.

Fig. 7
figure 7

Results of trajectory planning of serial manipulator

Multi-robot task planning of SwarmItFIX robots

Results obtained from the SwarmItFIX task planning module are given in Table 4. The Monte Carlo-based MDP algorithm identifies a total of 30 different support locations (headxm,n, headym,n, headzm,n) throughout the trajectory, and the time duration also estimated using Eq. (5). Finally, the tasks obtained are allocated for both the SwarmItFIX robots based on the procedure established in algorithm 1. Figure 10 proves that the data computed by the task planning module makes both the robots extend the support at the appropriate coordinates at the right time in-line with the machining tool trajectory.

Table 4 Results of task planning module

Homogenous coordination among SwarmItFIX

The positions C66 and C1 are considered as the home positions baseHposq of base agent of SwarmItFIX robot 1 (R1) and robot 2 (R2), respectively. The base agent abides by the sub-optimal policies (π*) computed by the prioritized multi-robot path planning module and obtains the locomotion sequence for attaining the goal positions. Finally, both the robots follow the five-step sequence and attain the goal positions consecutively. Figure 8a represents the sub-optimal policy (R1π3*) of the base agent of R1. Position C7 is the goal state, and positions C32–C34, C45–C47, and C58–C60 have to be avoided as these positions were occupied by R2. Sub-optimal policy R1π3* guides R1 to attain the goal position without colliding with R2 by providing the optimal sequence of actions. In all states, the action with the highest utility value is considered the optimal action. It can be observed from Fig. 8b that action (a1) completes all iterations with the highest utility value among other actions at C20. Thus, R1π3* suggests action (a1) as the optimal action for position C20. Similarly, Fig. 8c represent the sub-optimal policy for the base agent of R2 (R2π1*) for attaining the goal position C7. The optimal action convergence for the state C78 is shown in Fig. 8d. One can also note that the utility values are null for all actions at goal positions as well as obstacle positions. If looked closely, the optimal action in Fig. 8b converges within few iterations, which is very much faster compared with the convergence of optimal action of the second case (Fig. 8d). This is because the distance between baseCposq and baseGposm is comparatively less in the first case. Figure 9 depicts the transformation of π into π* shown in Fig. 8a and c. It can be observed that during the initial phase of iterations, the computational agent performs actions suggested by random policy (π) and requires more than 500 steps for attaining the goal position. When the iteration progresses, the π is slowly becomes optimistic and then the computational agent performs a lesser number of actions to reach the goal position. The sequence of locomotion of the SwarmItFIX robot and the time plan of all the three robots are detailed in Figs. 10 and 11, respectively. The complete details of the sequence of agents of SwarmItFIX robot are presented in appendix B. The horizontal bars (Fig. 11) show the duration of support extended by the SwarmItFIX robot at 30 different support locations, and the line of interpolation shows the movement of the serial manipulator robot. This result confirms on the optimal coordination and time synchronization amongst all the three robots.

Fig. 8
figure 8

Sub-optimal policies and utility values of actions space of the state

Fig. 9
figure 9

Sub-optimal policy convergence

Fig. 10
figure 10

SwarmItFIX agents placement

Fig. 11
figure 11

Coordination and time synchronization of COMAU NS16 and SwarmItFIX

The SwarmItFIX robots extend the support in appropriate locations at the right time when the machining is being performed. In all the planned locations, the supports are provided in line with the movement of the TCP. This synchronization ultimately portrays the efficiency of the proposed planner. In the hexagonal tool trajectory, whenever the robot switches between two segments, a sudden jerk in the tool path is observed (near positions 11, 16, 21 in Fig. 11). However, at the 16th position, the robot achieves the comparatively longer jerk in the tool path (between 350 and 400 s). This jerk is due to the fact that, the robot reaches the farther points in the x-axis of the tool trajectory (higher reach) during the specified duration. The same can be seen in Fig. 7a also. As mentioned in the introductory section, if the reach is maximum the accuracy and repeatability will be of minimum.

Conclusion and future work

The proposed hierarchical-based hybrid type decentralized offline planner successfully coordinate all three robots in sheet metal milling process.

The numerical optimization-based inverse kinematics has been utilized for computing the joint angles for the COMAU NS16 manipulator by considering various constraints. To ensure the collision-free trajectory of the serial manipulator robot with the workpiece, a constraint (4) on smθ1,w, which restricts the negative set of angles, has been imposed. Also, in order to generate a smoother trajectory (without abrupt changes in joint angles between two-way points), a constraint (5) that ensures the shorter translation is also imposed. The same can be verified from Fig. 6b. The position, velocity, and acceleration of TCP for complete trajectory have been computed without any swift changes in joint angles and collision.

The proposed novel prioritized multi-robot path planning approach successfully computes the sub-optimal policies for both the robots for all goal positions. The SARSA algorithm consumes 616.2 s for computation of sub-optimal policies of base agent of the SwarmItFIX robot 1. Whereas, for base agent of the SwarmItFIX robot 2, it took only 196.4 s. The reason being, first, the R2 utilizes only five base positions repeatedly for the total of 15 different tasks. But the robot 1 uses nine base positions for the same number of tasks (Fig. 10). Second, R1 attains all 15 different base goal positions by performing 22 number of SaD steps. But the R2 only performs 10 SaD steps. From the above data, it can be clearly understood the R2 traverses within the small area of the workbench, whereas, the R1 covers the larger area.

The algorithm 1 is able to perform task allocation up to four number of SwarmItFIX robots. But this work only considers only two SwarmItFIX robots. However, this number can be increased depends on the availability of the more robots in the experimental setup. For the computation of sub-optimal policies of the base agent, a constant ceiling number of iterations are performed. This constant number is because the utility value convergence duration is not the same for all states. It varies depends on its distance from the goal state. However, the computation can be further optimized by developing a mathematical relation that returns the optimal number of iterations using this distance between the current position and goal position and as a factor.

For ceiling-mounted COMAU NS16 manipulator is concerned, this work only portrays its trajectory planning only. Investigation on analysis of machining efficiency, machining quality, and stiffness analysis of all joints of the COMAU NS16 manipulator in the ceiling-mounted (inverted) position.