1 Introduction

Presently, the whole world is in panic due to the outbreak of a disease that has resulted in millions of deaths. In the past, diseases such as plague, Ebola, Middle East respiratory syndrome (MERS), and severe acute respiratory syndrome (SARS) have caused global epidemics with high fatality (Hosseiny 2020). In December 2019, a new coronavirus disease, named COVID-19 (Singhal 2020; Lai 2020; Rabi et al. 2020), was identified in Wuhan, China, spreading at a high rate. Coronaviruses are positive-sense, single-stranded RNA viruses. They form a group of viruses that cause respiratory problems in the human body and are classified into four genera: alpha (α), beta (β), gamma (γ), and delta (δ) coronaviruses (Shereen 2020). COVID-19 belongs to the beta family of coronaviruses. The symptoms start with common cold, cough, fever, and headache, lead to severe respiratory problems, and may cause death. The mortality rate of COVID-19 is approximately 3.3%. The virus is transmitted through droplets from the cough and sneeze of an infected person and through contaminated surfaces. On average, one infected person can spread the infection to 3–10 individuals. Medical personnel are at particularly high risk, as they have frequent interaction with the patients. To reduce the rate of infection among health workers, many organizations are using robots as attendants; during the 2015 Ebola outbreak (Ashour 2020), robots were used for the same purpose. As epidemics escalate, the potential roles of robotics are becoming increasingly clear. The task of providing services using robots (Yang et al. 2020) to combat COVID-19 has already been started by ITI, a government institute in Odisha, India, in coordination with SakRobotix (http://timesofindia.indiatimes.com/articleshow/75435055.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst). These robots are assigned the task of taking care of patients by providing them food, medicine, and water. However, the cooperation among the working robots is overlooked. Robots are also assigned to perform surgery (Kimmig et al. 2020) on suspected patients to avoid close contact with medical personnel; other task assignments are beyond the scope of this paper. The authors in Yang and Liu (2020) and Javaid (2020) have proposed the concept of robot police. Medical robotics is an emerging field that uses computer-assisted and human-assisted robots for different medical tasks (Troccaz et al. 2019). Robots are deployed randomly in different areas, and ideas about the environment and different events are shared among the robots (Alsamhi et al. 2020). However, health workers do not directly benefit from this. The authors in Zemmar et al. (2020) describe surgical robots during this pandemic: a separate setup is established for the robot, which reduces the contact of health workers with the patient, but the new setup is costly and the work does not address attending COVID-19 patients.

The main objectives are to reduce the rate of exposure of health workers to patients and contaminated surfaces, to reduce the number of health workers required, and to control the rate of spread of the virus. To combat this pandemic disease, we propose a new approach of providing medical assistance through robots with proper cooperation. The robots use Q-learning, a variant of the reinforcement learning approach (Kober et al. 2013), to achieve deterministic moves in a controlled environment. It is a simple and efficient approach that enables a robot to discover optimal behavior. The performance of each step of the robot is measured with an objective function (Pandey et al. 2017; Low et al. 2019) based on the algorithm and the parameters used (Konar 2013; Low et al. 2019; Van et al. 2016).

In this paper, we design a theoretical model for robot cooperation. Multiple robots are available in the hospital to attend the patients. The robots work as follows:

  1. Patient carrier: moves the patient from the ambulance arrival point to the empty target bed.

  2. Medicine and diet provider: provides medicine and diet to the patient on time.

  3. On-call service provider: provides services such as water distribution to a patient who requests it.

  4. Emergency controller: calls the doctor when a patient is in an emergency condition.

Our main contributions in this paper are: (1) reducing the exposure of medical personnel; (2) reducing the number of health workers required by deploying medic robots; (3) categorizing the medic robots into different groups based on their assigned work; (4) applying a reinforcement learning approach to find the path of each medic robot; and (5) computing collision-free paths and achieving cooperation between the co-working robots.

The rest of the paper is organized into three sections. Section 2 describes the materials and methods used to execute the proposed approach, including the problem formulation, the theoretical environment, the working procedure, and the algorithm. Section 3 presents the result analysis, followed by Sect. 4 with the conclusion and future scope.

2 Materials and methods

2.1 Problem formulation

The problem is formulated with multiple homogeneous robots working in a COVID-19 medical environment (Bagoji and Bharatha 2020), as shown in Fig. 1, which represents the environment where the robots attend the COVID-19 patients. These robots are divided into two groups based on their assigned function, as follows.

Fig. 1: COVID-19 hall

Group 1 (Patient-Carrying Robots) Responsible for transferring a new patient from the entry point to the empty target bed in the COVID-19 hall. Two robots are assigned to the task. They are fitted with image sensors and stand near the entry point. Once they observe the arrival of a patient, they execute the hold, put, move, and return functions to reach the target with the patient on a stretcher. The two robots are assumed to satisfy a distance constraint equal to the length of the stretcher; if the distance between them exceeds it, they face problems transferring the patient from one position to another. The basic Q-learning approach is used to compute the next position at each step towards the target with collision avoidance.

Group 2 (Service-Provider Robots) Responsible for providing services such as medicines, food, and water to the patients on time and for monitoring each patient's health condition. These robots are assumed to stay inside the room and are equipped with time and temperature sensors. The patients' medicine and food times are set in the robot. At a particular instant of time t, robot R provides medicine, food, or water to patient i; the functions describing these tasks are represented as PRO-MED(R, i, t) and PRO-SER(R, i, t) respectively. For each robot of this group, the information is stored in an array whose entries represent the timings for each patient. The temperature sensor senses the temperature of the patient and sends the data to the concerned health worker. If an emergency is detected in any patient, a video call can be made to the medic so that necessary steps can be taken. In addition, at regular intervals the robots offer water to the patients. Some robots are equipped with image and temperature sensors to serve requests and monitor health.
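The timing array and the PRO-MED/PRO-SER dispatch described above could be organized as in the following minimal C sketch. The struct fields, the fever threshold, and the per-minute tick loop are illustrative assumptions, not the paper's implementation.

```c
#include <stdio.h>

#define FEVER_THRESHOLD 38.0  /* assumed alert temperature in degrees C */

/* Per-patient schedule entry held by each Group 2 robot. */
typedef struct {
    int patient_id;
    int med_time;   /* scheduled medicine time (minutes since midnight) */
    int meal_time;  /* scheduled food/water time */
} ScheduleEntry;

/* PRO-MED(R, i, t): deliver medicine to patient i at time t. */
void pro_med(int robot_id, int patient_id, int t) {
    printf("Robot %d: medicine to patient %d at t=%d\n", robot_id, patient_id, t);
}

/* PRO-SER(R, i, t): deliver food/water to patient i at time t. */
void pro_ser(int robot_id, int patient_id, int t) {
    printf("Robot %d: food/water to patient %d at t=%d\n", robot_id, patient_id, t);
}

/* Called once per minute: serve any patient whose slot has come due,
 * and raise an emergency video call if the sensed temperature deviates
 * from the threshold. */
void service_tick(int robot_id, ScheduleEntry table[], int n,
                  int now, const double temp[]) {
    for (int i = 0; i < n; i++) {
        if (table[i].med_time == now)
            pro_med(robot_id, table[i].patient_id, now);
        if (table[i].meal_time == now)
            pro_ser(robot_id, table[i].patient_id, now);
        if (temp[i] > FEVER_THRESHOLD)
            printf("Robot %d: emergency video call for patient %d\n",
                   robot_id, table[i].patient_id);
    }
}
```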

To avoid collisions among themselves, the robots need to execute a path-planning algorithm. From a robot's assigned position, several paths exist to reach the target patient. The difference between the desired shortest path \(DA\) and the actual path obtained \(DC\) is computed. The paths are computed as in Eqs. (1) and (2).

$$DA = \sqrt {\left( {x_{init} - x_{goal} } \right)^{2} + \left( {y_{init} - y_{goal} } \right)^{2} }$$
(1)

Equation (1) computes the Euclidean distance between the initial position and the target of the robot. However, to avoid collisions with obstacles and other robots in the environment, the travelled distance varies from this Euclidean distance. The distance computed in such an environment is represented as follows.

$$DC = \mathop \sum \limits_{i = 1}^{k} \sqrt {\left( {x_{i} - x_{goal} } \right)^{2} + \left( {y_{i} - y_{goal} } \right)^{2} }$$
(2)

Here each i represents an intermediate state encountered along the computed path, with i ranging from 1 to k. The objective function that minimizes the distance error is described as follows.

$$F_{1} = c_{1} *\left| {DA - DC} \right|$$
(3)

Similarly, another objective function is computed for the robot to minimize the delay cost incurred by the turns or rotations made by the robot. \(TA\) and \(TC\) denote the total number of turns made in the actual path and in the computed path, respectively.

$$F_{2} = c_{2} *\left| {TA - TC} \right| + c_{3} *L$$
(4)

Here, \(c_{1}, c_{2}, c_{3}\) are weight factors with the values 1, 0.7, and 0.3 respectively. The average rotational penalty L is computed from the total number of turns made along the path as

$$L = \mathop \sum \limits_{i = 1}^{l} L_{i}$$
(5)

The number of rotations or turns made by the robots is computed with respect to the next pixel computed. Each time a next position is found, \(L_{i}\) is updated using the following equation, under the assumption that the robots are facing towards the positive y-axis.

$$L_{i} = \begin{cases} 1 & \text{if } x_{i} = x_{i - 1} + v\cos \theta \text{ and } y_{i} = y_{i - 1} + v\sin \theta \\ 1 & \text{if } x_{i} = x_{i - 1} + 1 \text{ and } y_{i} = y_{i - 1} \\ 0 & \text{otherwise} \end{cases}$$
(6)

The objective function now minimizes the rotational cost and path cost using the following equation.

$$F = \beta_{1} * F_{2} + \beta_{2} *F_{1}$$
(7)

where \(\beta_{1}\) and \(\beta_{2}\) are the weights of the rotational cost and the distance error respectively. The values were adjusted during simulation, and \(\beta_{1} = 0.2\) and \(\beta_{2} = 0.4\) were found to be the best values. Hence, the optimized cost is achieved by minimizing the fitness function in Eq. (7) with the assigned weight for each criterion.
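For concreteness, the objective in Eqs. (1)–(4) and (7) can be evaluated as in the following C sketch, using the weights reported above. The Point type and the function names are ours, not the paper's.

```c
#include <math.h>

typedef struct { double x, y; } Point;

/* Eq. (1): straight-line (Euclidean) distance from start to goal. */
double desired_path(Point init, Point goal) {
    return sqrt((init.x - goal.x) * (init.x - goal.x) +
                (init.y - goal.y) * (init.y - goal.y));
}

/* Eq. (2), as stated: sum, over the k intermediate states produced
 * while avoiding beds and other robots, of each state's distance to
 * the goal. */
double computed_path(const Point path[], int k, Point goal) {
    double dc = 0.0;
    for (int i = 0; i < k; i++)
        dc += sqrt((path[i].x - goal.x) * (path[i].x - goal.x) +
                   (path[i].y - goal.y) * (path[i].y - goal.y));
    return dc;
}

/* Eqs. (3), (4), (7): combined fitness with the reported weights
 * (c1 = 1, c2 = 0.7, c3 = 0.3, beta1 = 0.2, beta2 = 0.4). */
double fitness(double DA, double DC, int TA, int TC, double L) {
    double F1 = 1.0 * fabs(DA - DC);            /* path-length error   */
    double F2 = 0.7 * fabs(TA - TC) + 0.3 * L;  /* turn/rotation error */
    return 0.2 * F2 + 0.4 * F1;
}
```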

2.2 Q-learning approach

In the Q-learning approach, the next state of the robot is computed from the state-action pair. The environment is considered a 2-dimensional matrix. Each field in the matrix is called a state, and the states are represented by the set \(S = \left\{ {s_{1} ,s_{2} , s_{3} , \ldots s_{N} } \right\}\). Each state is associated with M actions, represented as \(A = \left\{ {a_{1} , a_{2} , \ldots a_{M} } \right\}\). \(Q\) is an N × M matrix that holds the Q-value of each state-action pair \(\left\{ {s_{i} ,a_{j} } \right\}\), i.e., the ith state with the jth action. The next state is selected from the current position using the transition function \(\sigma \left( {s_{i} ,a_{j} } \right) = s_{i}^{\prime}\). The best next state is selected by maximizing the total expected reward \(r_{i}\). The Q-value is updated using the following equation.

$$Q\left( {s_{i} ,a_{j} } \right) = \left( {1 - \alpha } \right)*Q\left( {s_{i} ,a_{j} } \right) + \alpha *\left[ {r_{i} + \gamma *max_{a} Q\left( {s_{i}^{^{\prime}} , a} \right)} \right]$$
(8)

where \(\alpha\), \(r_{i}\), and \(\gamma\) \((0 < \gamma < 1)\) represent the learning rate, the reward received by the robot on execution of action \(a_{j}\) at state \(s_{i}\), and the discount factor, respectively. The Q-learning algorithm generates a Q-table over all states and possible actions to hold the Q-values. The optimal state is generated by recursively applying the Q-learning update with the following greedy strategy, which leads the Q-values to converge over time.

$$Q_{i + 1} \left( {s_{i} ,a_{j} } \right) = Q_{i} \left( {s_{i} ,a_{j} } \right) + \alpha *\Delta Q_{i} \left( {s_{i} , a_{j} } \right)$$
(9)

The learning rate is defined with respect to the time spent at a state \(s_{i}\), and the change in Q-value due to the action taken is defined for the selected action a, as represented in the following equations.

$$\alpha = \frac{1}{{1 + total\,time\,spent\,in\,visiting\,state\,s_{i} }}$$
(10)
$$\Delta Q_{i} \left( {s_{i} , a_{j} } \right) = r_{i} + \gamma \,max_{a} Q_{i} \left( {s_{i}^{\prime} ,a} \right) - Q_{i} \left( {s_{i} , a_{j} } \right)$$
(11)

where \(\sigma\) is the transition function that computes the next state for the selected action, as follows.

$$Q_{i} \left( {s_{i}^{\prime} , a_{j} } \right) = Q_{i} \left( {\sigma \left( {s_{i} , a_{j} } \right), a_{j} } \right)$$
(12)

In the multi-robot environment, the reward is computed with respect to collision-avoiding state selection and, for group 1 robots, the distance constraint.
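Equations (8)–(12) translate almost directly into C; a minimal sketch follows, with the table dimensions and the max-over-actions helper as placeholder assumptions.

```c
#define N_STATES  220  /* grid cells (states); illustrative size */
#define N_ACTIONS 8    /* E, W, N, S, NE, NW, SW, SE */

double Q[N_STATES][N_ACTIONS];

/* Largest Q-value reachable from state s (the max_a term in Eq. (8)). */
double max_q(int s) {
    double best = Q[s][0];
    for (int a = 1; a < N_ACTIONS; a++)
        if (Q[s][a] > best) best = Q[s][a];
    return best;
}

/* Eq. (8): one Q-learning update for taking action a in state s,
 * receiving reward r and landing in next state s_next. */
void q_update(int s, int a, double r, int s_next,
              double alpha, double gamma) {
    Q[s][a] = (1.0 - alpha) * Q[s][a]
            + alpha * (r + gamma * max_q(s_next));
}
```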

2.3 Working model

To design the theoretical model for the proposed algorithm, the COVID-19 patient hall is modelled as an area of approximately 400 × 500 pixels, with the beds and robots at grid positions as shown in Fig. 1. The robots are equipped with wheels giving 6 degrees of freedom. The angular movements of a robot are 0°, 180°, 90° (L, R), 45° (L, R), 135° (L, R), and no move. The direction of movement and the next step depend on the angle of rotation and on whether the next state is free of obstacles. The next-state computation is done as illustrated in Figs. 2 and 3. We divide the working procedure into two cases based on the robots' group.

Fig. 2: Next state consideration, group 1 robot

Fig. 3: Next state consideration, group 2 and 3 robots

Case 1 This describes the working procedure of the group 1 robots. The two robots carry the COVID-19 patient to the empty bed, marked as TARGET in Fig. 2. Both robots take their next step in such a way that the distance between them remains equal to the stretcher length at each move. The next-state computation for each robot is done using the following equation.

$$x_{p}^{^{\prime}} = x_{p} + v cos\theta ,\quad y_{p}^{^{\prime}} = y_{p} + v sin\theta$$
(13)

where p is the current robot's position. The velocity is assumed to be 50 cm/s, and the value of \(\theta\) depends on the selected action. The action set A is represented as \(\left\{ {E, W, N, S, NE, NW, SW, SE} \right\}\) with the angular movements 0°, 180°, 90° (L), 90° (R), 45° (L), 135° (L), 135° (R), and 45° (R) respectively. On applying Q-learning, the next state is computed with respect to the Q-value of that state, and the corresponding action is taken. For this group of robots, the action that maximizes the Q-value for both robots is selected, under the assumption that both of them execute the same action. The Q-value is computed as follows, with \(s_{p} = \left( {x_{p} ,y_{p} } \right)\) and \(s_{p}^{\prime} = \left( {x_{p}^{\prime} , y_{p}^{\prime} } \right)\), under the assumption \(s_{p}^{\prime} = \sigma \left( {s_{p} ,a_{j} } \right)\).

$$Q_{R,i + 1} \left( {s_{p} ,a_{j} } \right) = Q_{R,i} \left( {s_{p} ,a_{j} } \right) + \alpha *\Delta Q_{R,i} \left( {s_{p} , a_{j} } \right)$$
(14)

where R represents the robot number, with value 1 or 2 for the group 1 category. If one of the patient-carrying robots finds an obstacle in its path for a given action while the other has the maximal Q-value for that action, both of them instead take the next action with the maximum Q-value, as sketched below.
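One way to realize this rule is sketched in C below. Taking the smaller of the two robots' Q-values as the joint score is our interpretation of "maximum Q-value for both", and the blocked[] flags stand in for the obstacle sensing.

```c
/* Pick one common action for the two patient-carrying robots: the
 * free action maximizing the smaller of the two Q-values, so that the
 * move is good for both (Eq. (14) setting). q1/q2 are the rows of each
 * robot's Q-table for its current state; blocked[a] is set by the
 * sensors. Returns -1 if every action is blocked. */
int joint_action(const double q1[8], const double q2[8],
                 const int blocked[8]) {
    int best = -1;
    double best_val = -1e18;
    for (int a = 0; a < 8; a++) {
        if (blocked[a]) continue;
        double v = (q1[a] < q2[a]) ? q1[a] : q2[a]; /* both must do well */
        if (v > best_val) { best_val = v; best = a; }
    }
    return best;
}
```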

Case 2 This describes the working procedure of the group 2 robots. As shown in Fig. 3, each is a single robot capable of moving in all the directions available to the group 1 robots. Based on the distance and direction of the target, the robot selects its next step, computed using Eq. (13); a minimal C reading of this step is sketched below.
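The sketch below transcribes Eq. (13) together with the action set into C. Mapping each action to an absolute compass heading is our assumption, since the paper lists turn angles relative to the robot's facing.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define V 50.0  /* assumed speed, 50 cm/s, i.e. one step per second */
#define DEG2RAD(d) ((d) * M_PI / 180.0)

typedef enum { E, W, N, S, NE, NW, SW, SE } Action;

/* Absolute heading assumed for each action (robots initially face +y). */
static const double heading_deg[8] = {
    0.0,   /* E  */
    180.0, /* W  */
    90.0,  /* N  */
    270.0, /* S  */
    45.0,  /* NE */
    135.0, /* NW */
    225.0, /* SW */
    315.0  /* SE */
};

/* Eq. (13): next position from the current one for the chosen action. */
void next_state(double *x, double *y, Action a) {
    double theta = DEG2RAD(heading_deg[a]);
    *x += V * cos(theta);
    *y += V * sin(theta);
}
```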

Fig. 4: Flowchart of QCOV-R algorithm

2.4 Proposed algorithm

Based on the working procedure mentioned in the previous section, the execution steps are formed. The algorithm covers both cases for all the robot groups. Case 1 includes the working steps of the group 1 robots carrying a patient into the COVID-19 hall; case 2 describes the execution steps of the service-providing robots, that is, group 2. The algorithm starts by initializing the robots' positions and their groups. The work varies with respect to the initialized group. If a robot is identified as group 1, it finds its cooperating robot in the group and senses the arrival of a patient. Subsequently, it computes its steps towards the target with the patient. If any obstacle is found on its path, it updates the previously stored Q-table and finds the next best step. If it is identified as a group 2 robot, it identifies the target patient and calls the FIND() function, which returns the next position, with collision avoidance, for providing the service. In a similar way, the robot provides services to the patients by sensing their requests. These group 2 robots are also responsible for monitoring the health condition of the patients: if the sensed temperature deviates from the threshold temperature, the robot calls the doctor for emergency service.

Algorithm (figure a): QCOV-R execution steps

The pictorial representation of the algorithm is shown in Fig. 4. The position of each medic robot is initialized in the environment, and each robot is identified by its group number. If it is group 1, a cooperating robot is identified for execution of the patient-carrying task, and the FIND() procedure is called to compute the path to the target position. If it is not a group 1 robot, it simply identifies its target. Based on the alarm, the robots perform their task as a medicine provider or an emergency handler. The tasks are executed accordingly, and the medic robots return to their original positions.
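The dispatch logic of the flowchart can be summarized in the following C skeleton. FIND() and the sensing calls are stubs standing in for the procedures described above, and the target-bed coordinates are example values.

```c
typedef enum { GROUP1, GROUP2 } Group;

typedef struct {
    int id;
    Group group;
    double x, y;            /* current position          */
    double home_x, home_y;  /* initial (return) position */
} Robot;

/* Stubs standing in for the paper's procedures. */
static void find_path(Robot *r, double tx, double ty) {   /* FIND() */
    r->x = tx; r->y = ty;   /* real version: Q-learning steps, Eq. (13) */
}
static int patient_arrived(void) { return 1; }            /* image sensor */
static int service_request(int *p, double *tx, double *ty) {
    *p = 0; *tx = 100.0; *ty = 200.0; return 1;           /* alarm sensor */
}

/* One pass of the QCOV-R dispatch described above. */
void qcov_r(Robot robots[], int n) {
    for (int i = 0; i < n; i++) {
        Robot *r = &robots[i];
        if (r->group == GROUP1) {
            if (patient_arrived())          /* carry stretcher to bed */
                find_path(r, 160.0, 300.0); /* target-bed coords (example) */
        } else {
            int p; double tx, ty;
            if (service_request(&p, &tx, &ty))
                find_path(r, tx, ty);       /* medicine/food/emergency */
        }
        find_path(r, r->home_x, r->home_y); /* return to start */
    }
}
```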

3 Result analysis

The proposed algorithm is evaluated with respect to several parameters, such as infection ratio, mortality rate, asymptotic complexity, collision avoidance, and path deviation.

3.1 Risk analysis

The data set (www.covid19india.org; Ing et al. 2020; Estadilla 2020) for COVID-19 is taken from the dashboards of Johns Hopkins University, the WHO, Worldometers, the Indian Council of Medical Research, and the Kaggle site. The pandemic potential of a virus is measured by its reproduction number, also termed the R-naught value (Chen 2020; Delamater 2019), which indicates the rate of spread from one person to others. We have computed the values from the available data and represent them in Fig. 5. In India, the value ranges from 1.7 to 1.81 (Rao et al. 2016). In reality, a COVID-19 patient is typically exposed to 47 medical personnel and infects 2–3 individuals, depending on the spreading rate. The calculation is based on the number of public health-care services and the medical personnel, including doctors, nurses, midwives, and pharmacists, exposed to a COVID-19 patient per 10,000 population. Compared to the world, the ratio of medical personnel in India is lower, as shown in Fig. 6. The total number of health workers per 10,000 population is 28 in the world and 16–20 in India, all of whom are at risk without the use of medic robots. However, the risk can be reduced with the assistance of medic robots.

Fig. 5: R0 value comparison

Fig. 6: Health workers at risk and COVID patients per 10,000 population

The medical personnel attending the COVID-19 patients are at high risk of infection; the spread varies as shown in Fig. 7. The novel coronavirus was declared a global pandemic by the WHO based on its R-naught value. The total number of health workers affected by this virus in the world and in India is shown in Fig. 8. If medic robots are used, the rate of infection among medical personnel will be reduced.

Fig. 7: Risk of medic people being infected without the help of medic robots

Fig. 8: Number of health workers affected by the virus

3.2 Experimental analysis

The proposed approach is coded in the C language, and the obtained results are plotted using the SCILAB software. A grid of 440 × 480 pixels is considered as the COVID-19 hall where the medic robots are present. The robots are plotted as circles of radius 5 pixels. The initial positions of the robots are fixed: group 1 robots stand near the hall gate, and the other service-provider robots stand at predefined points. The hall is plotted with 20 rectangular beds of size 80 × 20 pixels. The beds and the other robots are treated as static and dynamic obstacles respectively. The COVID-19 hall is thus a multi-robot environment with multiple medic robots performing their assigned tasks. We executed the medic-robot operation and cooperation in 5 different environments; for reasons of space, only 2 environments are shown in this paper. Figure 9 shows the initial environment, the initial positions of the group 1 medic robots, the predefined beds, and the empty bed as the target. The medic robots identify the target position and move towards it while avoiding collision (Şenbaşlar et al. 2019) with obstacles, whether a bed or another robot in the environment. An intermediate step is shown in Fig. 10, which describes the movement of the group 1 medic robots from their position to the target while holding a stretcher carrying the patient. For simplicity, we show the movement from the hall gate instead of the ambulance point. In environment 1, the target bed is assumed to be the 3rd bed of the 4th column. The coordinates are identified by the medic robots to move the patient to that bed. The stretcher that carries the patient is represented in our program by a rectangle.

Fig. 9: Initial environment for group 1 robots

Fig. 10: Intermediate step of group 1 robots

The medic robots of this group need a high degree of cooperation and coordination. If one robot moves in one direction while the other takes a different action, the assigned task cannot be completed, so the action selected by both robots is kept uniform throughout their moves. However, if the Q-value (Jiang et al. 2018; Rahman et al. 2018) obtained for one robot is maximal while the other finds an obstacle that returns a minimal Q-value, the second robot stays at its position and the first one makes turns to align itself with its co-working robot and the target. Figure 11 shows the complete path of the group 1 robots in environment 1. Once the medic robots finish their assigned task, they return to their initial positions, as shown in Fig. 12. The medic robots may face dynamic changes in their target while in transit; in such cases, they sense the changed target position and update their path as well as the Q-table for the new target, as shown in Fig. 13. The complete path under the dynamic target change and the return path are shown in Figs. 14 and 15 respectively. The initial instance for the group 1 medic robots in another environment is shown in Fig. 16. In this environment, we executed the program with a different target, the 1st bed of the 2nd row. The intermediate step of the transition from the initial position to the target is represented in Fig. 17. Figure 18 shows the final path to the sensed target, with collision avoidance with other robots as well as other equipment in the hall. The return path is shown in Fig. 19.

Fig. 11: Final step of group 1 robots

Fig. 12: Group 1 robot return path

Fig. 13: Target changed during movement

Fig. 14: Path travelled to reach new target

Fig. 15: Return path of group 1 medic robots for changed target

Fig. 16: Initial step of group 1 robot in env 2

Fig. 17: Intermediate step of group 1 robots in env 2

Fig. 18: Final step of group 1 medic robot in env 2

Fig. 19: Return path of medic robot in env 2

In this environment too, a dynamic change in the target is assumed; the medic robots sense the change and update their movements according to the new target. The changed target, the path travelled to reach it, and the return path to the initial positions are represented in Figs. 20, 21, and 22 respectively.

Fig. 20: Dynamic change in target in env 2

Fig. 21: Path to new target in env 2

Fig. 22: Return path in env 2

Fig. 23: Initial step of service provider robot

Fig. 24: Intermediate step of service provider robot

The service-provider robots provide services to the target patients. The target is identified and served with the requirements by such robots. The initial condition, the intermediate and final steps towards the target, and the return from the target to the predefined position are shown in Figs. 23, 24, 25, and 26 respectively. The complete path is obtained using the Q-matrix computed in the multi-robot environment; instances of the Q-matrix are shown in Figs. 27 and 28. Dynamic target changes for the service-provider robots are not shown here, because other robots are available to provide the services and the service-provider robots are triggered by alarms. The coordinate positions obtained by the group 1 medic robots in the two environments are represented in Figs. 29 and 30 respectively. The paths travelled are computed and shown in Fig. 31. The service-provider medic robot's path is computed from its initial position and shown in Fig. 32.

Fig. 25: Final step of service provider robot

Fig. 26: Return path of service provider robot

Fig. 27: Instance 1 of Q-matrix

Fig. 28: Instance 2 of Q-matrix

Fig. 29: Path travelled by group 1 robot in env 1

Fig. 30: Path travelled by group 1 robot in env 2

Fig. 31: Distance covered by g1 in env 1 and env 2

Fig. 32: Path travelled by group 2 medic robot

The Euclidean distance covered by the medic robots along the travelled trajectory is computed in both environments using the Q-learning and the Improved Q-learning (Low et al. 2019; Singh and Singh 2018) approaches. The paths obtained by the two approaches are approximately the same. Both robot groups are moved in both environments using the above-mentioned approaches, and the travelled paths are shown in Figs. 31 and 32. The average travelled-trajectory path deviation (ATTPD) (Das et al. 2016; Panda et al. 2018; Bakdi 2017) for each movement is computed with respect to the actual path (\(Path_{actual}\)) and the computed path (\(Path_{computed}\)) using the following equation.

$$ATTPD = \left| {\frac{{Path_{actual} - Path_{computed} }}{{Path_{actual} }}} \right|$$
(15)
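Computed in C, the ATTPD measure is a one-liner; the function name is ours.

```c
#include <math.h>

/* Eq. (15): average travelled-trajectory path deviation, the relative
 * gap between the actual (desired) path and the path actually executed. */
double attpd(double path_actual, double path_computed) {
    return fabs((path_actual - path_computed) / path_actual);
}
```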

Figure 33 shows the average path deviation of the different medic robots. The medic robots are executed with both the Q-learning and the Improved Q-learning algorithms, and the results are shown in Fig. 34, which reveals that the execution times are approximately the same for each robot irrespective of the algorithm. The path deviation of the medic robots, in terms of steps and percentage of deviation for each environment, is shown in Fig. 35 and also listed in Table 1.

Fig. 33: Average path deviation

Fig. 34: Execution time in seconds

Fig. 35: Deviation percentage

The objective function of the QCOV approach is evaluated in terms of the number of steps, turns made, execution time, and path travelled by the medic robots. The weights are updated, and the performance of the robots is analyzed as shown in Table 2; the values highlighted in the table yield minimum deviation. Values are computed for both fixed and dynamic targets for the group 1 robots. Both the patient-carrying and the service-provider medic robots are executed in five arrangements, as listed in Table 3. These values are computed for both the Q-learning approach and the Improved Q-learning approach. The results reveal that the objective function returns approximately the same value for both algorithms: both are of reinforcement type and are executed in an almost fully known environment, where the only dynamicity is due to the movements of the service-provider robots, which are limited in number. Hence, the execution time as well as the overall performance of these approaches differ minimally, as shown in Fig. 36.

Table 1 Path deviation in terms of steps
Table 2 Parameters’ influence on Robots’ performance
Table 3 Performance evaluation in different environments
Fig. 36: Performance of medic robots

3.3 Comparative analysis

A comparative analysis of the proposed approach is made against other approaches mentioned in the literature. Both ability and complexity (Sadhu and Konar 2017; Gadaleta 2017) are analyzed with respect to the time and space utilized and the effectiveness. The time required to complete the execution process is referred to as time complexity, while the amount of space required is called space complexity.

3.3.1 Ability comparison

A comparative analysis based on several parameters, such as the type of robots used, the objective achieved, the complexity of the approach, and the cost effectiveness, is presented in Table 4. The authors in Kimmig et al. (2020), Troccaz et al. (2019), and Alsamhi et al. (2020) (http://timesofindia.indiatimes.com/articleshow/75435055.cms?utm_source=contentofinterest&utm_medium=text&utm_campaign=cppst) used autonomous robots for execution of the assigned task, the main objective being to use the robots as service providers. Those approaches are complex in terms of space and time, and the need for a separate setup for the robots makes them costly.

Table 4 Comparison of proposed method and other references

3.3.2 Space complexity

The space required by the algorithm is the space needed to store the Q-table. Since the COVID-19 hall is represented as a matrix with N rows (the number of states) and M columns (the number of actions associated with each state), an N × M matrix is required to store the Q-values irrespective of the robot's group. Thereby the space complexity is represented in asymptotic notation as O(MN). For next-state identification, the largest Q-value needs to be determined. In the Improved Q-learning approach, 2 storage locations are required for each state, to store the Q-value and a lock variable; this requires 2 × N storage cells, and the asymptotic space complexity is O(N). This matches the classical Q-learning approach if M is kept constant. The following sketch makes the two layouts concrete.
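The storage argument can be seen directly in C; the struct layout for the lock variable is illustrative.

```c
#include <stdlib.h>

/* Classical Q-learning: one value per state-action pair, i.e. O(MN). */
double *alloc_q_classic(int n_states, int m_actions) {
    return calloc((size_t)n_states * m_actions, sizeof(double));
}

/* Improved Q-learning (as described above): one Q-value plus one
 * lock flag per state, i.e. 2N cells and O(N) space. */
typedef struct { double q; int lock; } StateCell;

StateCell *alloc_q_improved(int n_states) {
    return calloc((size_t)n_states, sizeof(StateCell));
}
```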

3.3.3 Time complexity

In the classical Q-learning approach, for each action the robot accesses the Q-table and requires M − 1 comparisons. The total number of accesses made by N robots is N × (M − 1); thereby the resulting time complexity is asymptotically bounded as O(MN). The action set is restricted to M = 8 actions, namely \(A = \left\{ {E, W, N, S, NE, NW, SE, SW} \right\}\); hence both complexities can be rewritten as O(N) by treating the constant term M as fixed. In the Improved Q-learning approach, however, there is no need to compare Q-values; instead, the lock variable is checked, which requires N comparisons and again results in a time complexity of O(N).

The complexity analysis, in terms of both space and time, reveals that the Q-learning and Improved Q-learning approaches exhibit the same bounds. Hence, for simplicity, Q-learning is preferable to more complex algorithms for obtaining the expected performance of the medic robots.

4 Conclusion

The outbreak of COVID-19 has put the whole world under extensive pressure due to its doubling rate of spread. Medical personnel, working as front-line warriors, are at high risk because of their frequent exposure to infected patients and contaminated surfaces. Robots have the potential to work as medics and reduce the life risk of physicians of all specialties. This paper describes the deployment of medic robots for carrying patients, delivering food and medications, and handling emergency health services. The medic robots work cooperatively, without collision, in the COVID-19 chamber using a simple reinforcement learning approach. Using this concept, the mortality rate among health workers all over the world could be reduced to approximately 2%. In future, we will extend this work with more complex learning approaches.