End-to-end human inspired learning based system for dynamic obstacle avoidance

As a first, the paper proposes modelling and learning of specific behaviors for dynamic obstacle avoidance in end-to-end motion planning. In the literature many end-to-end methods have been used in simulators to drive a car and to apply the learnt strategies to avoid the obstacles using the lane changing, following the vehicle as per the traffic rules, driving in-between the lane boundaries, and many more behaviors. The proposed method is designed to avoid obstacles in the scenarios where a dynamic obstacle is headed directly towards the robot from different directions. To avoid the critical encounter of the dynamic obstacles, we trained a novel deep neural network (DNN) with two specific behavioral obstacle avoidance strategies, namely “head-on collision avoidance” and “stop and move”. These two strategies of obstacle avoidance come from the human behavior of obstacle avoidance. Looking at the current frame only, for a very similar visual display of the scenario, the two strategies have contrasting outputs and overall outcomes that makes learning very difficult. A random data recording over general simulations is unlikely to record the corner cases of both behaviors that rarely occur, and a behavior-specific training used in this paper intensifies the same cases for a better learning of the robot in such corner cases. We calculate the intention of the obstacle, whether it will move or not. This proposed method is compared with three state-of-the-art methods of motion planning, namely Timed-Elastic Band, Dynamic Window Approach and Nonlinear Probabilistic Velocity Obstacle. The proposed method beats all the state-of-the-art methods used for comparisons.


Introduction
While humans move in the environment or drive a car or bicycle, they observe the environment and identify the dynamic obstacles and static obstacles and infer the intent of the dynamic obstacles without having any communication. This information is extracted by humans with a glimpse of the environment using which the humans plan their behavior or control the car. This is a challenging problem in robotics. To imitate the human behavior, we closely observe the obstacle avoidance strategy of humans. This paper addresses the problem of collision avoidance of dynamic and static obstacles closely inspired from the B S. M. Haider Jafri sayedhaiderjafri6@gmail.com Rahul Kala rkala001@gmail.com 1 Centre of Intelligent Robotics, Indian Institute of Information Technology, Allahabad, India behavior of humans. The dynamic collision avoidance of the humans is a combination of two most common behaviors, head-on collision avoidance and "stop and move". The head-on collision situation comes when a person moving towards a goal encounters an obstacle coming from the opposite direction directly towards the person. In this situation, the person both takes a left or right turn to avoid a collision and thereafter instantly aligns to the goal. In [1], the headon collision avoidance behavior was solved by proposing a reciprocal velocity obstacle method, where the agents take right on coming closer to each other. Since the authors primarily aim to minimize the path length, the agents minimally deviate their paths while maintaining smaller clearances. In contrast, the proposed method is anticipatory where clarity of intentions is the main aim and therefore the agents balance between minimizing path length, maximizing clearance, and demonstrating a clear strategy to avoid each other without communication.
In this study, we believe that the person generally prefers taking a left turn instead of a right turn shown in Fig. 1 as Fig. 1 Humans avoiding obstacles, Head-on collision [3]: (a-d) Person taking left to avoid the obstacle coming in the direction of motion marked by a red colored line, and again aligning in the direction to goal after avoiding, Stop and move: (e-h) Person (green) stops to let the obstacle (red) pass by, and starts moving after the obstacle has passed from the front. Schematic representation: (i) head-on collision (j) stop and move per the social conventions in the authors' country. As per the social conventions in some other places, a right turn may be preferred. Figure 1a-d shows a head-on collision avoidance behavior, where both persons marked with the red circle try to avoid each other. The situation corresponding to the "stop and move" behavior comes when an obstacle crosses the path of a person, and the person slows (or stops in an extreme case) to give a pass to the obstacle as shown in Fig. 1e-h.
The figures show the behavior where the person in green circle stops to let the person in red circle pass. The stop and move behavior is a natural behavior of humans while driving a car and while crossing the roads. The autonomous vehicles follow the same pattern of behavior while detecting traffic light or detecting intersections. In [2] the authors build a deep learning-based solution to drive a vehicle in different scenarios, while trying to learn the affordance of the traffic state.
The dynamic obstacles are capable of decision making without communication. In a decentralized system of navigation, it is hard for the participating agents to get into a consensus on how to best avoid each other, as each agent's trajectory depends upon the others. How the humans can come to a consensus without any communication in a decentralized manner is still a mystery for the researchers. However, it is important to estimate the intent of the dynamic obstacle to devise a strategy to avoid it.
The imitation of obstacle avoidance strategy is not an easy task. The nature of a human's obstacle avoidance at a local level is reactive, while the deliberative aspects are largely limited to deciding the route around the static obstacles. The obstacle avoidance strategy is learnt from the experiences by the humans. The humans learn many tasks through their life experiences. First, they start walking, then they start running, and last but not the least they start driving. A similar concept applied in this paper. First, we make a navigation function to learn the static obstacle avoidance. Secondly, we learn the head-on collision behavior. At last, a "stop and move" behavior is learnt. The success of end-to-end learning-based [4] approaches in the literature suggest that obstacle avoidance behaviors can be learnt and successfully transferred to the robot. In this paper, the raw information of the environment, the temporal information of the nearest dynamic obstacle and the goal information makes the input to the end-to-end DNN to predict the angular and linear velocity and makes the agent capable to instantly avoid the obstacles.
The data set is collected by the deployment of Pioneer 3AT mobile robot in the Gazebo simulator. We have created a situation similar to the obstacle avoidance strategy of the humans in the simulator. We manually move the agent (Pioneer 3AT) using a joystick and avoid the dynamic obstacle using a head-on collision behavior and "stop and wait" behavior. The static obstacle is also avoided with a preference for the left turn. The data set is collected for 12 h and 0.25 million time-steps. The training of the end-to-end model from this dataset was difficult. We created a DNN model using a 1D CNN and LSTM with three inputs at the input layer. The model was successfully trained for the obstacle avoidance strategies. The major achievement of this model is that the behavior is a combination of three behaviors and while deployment of the agent in different scenarios (other than the trained), the agent behaves differently (other than the trained behavior). It can explain that the model can handle the uncertainty (situation other than training) by reacting differently.
In the area of learning-based motion planning, significant contributions are coming in multi-agent reinforcement learning (RL) frameworks that eliminate the hard work for the data collection. Our work contributes only towards the offline learning and not online learning like RL. In multi-agent RL, the different agents learn to mutually avoid each other in different settings while maximizing a common reward function. We argue that the humans do not follow a policy that maximizes a pre-known reward function. The robots learnt under multi-agent RL settings can often expect the humans to act in certain ways and a contradictory avoidance by the humans can come as an unforeseen situation for the robots. Further, the robot acting contradictory to what the humans anticipate can be uncomforting for the humans, significantly limiting the adoption of robots amidst human environments.
Similarly, the use of deliberative planning algorithms for the generation of a dataset for supervised learning is a commonly used technique. The trained algorithm in such a context carries the same limitations as that of the deliberative algorithm used for making a dataset. Again, the limitation of such approaches is that the humans may act contradictory to the simulated obstacles, a situation that the robots are not trained to handle. The robots are also not guaranteed to display the social conventions anticipated by the humans.
A typical implementation for learning the robot navigation in a supervised manner teleoperates the robots in general scenarios, sometimes using intelligent algorithms for the navigation of the obstacles. A significant portion of the human navigation consists of free-walking or following someone at front at a distant. The head-on collision avoidance and stopand-wait are rare behaviors. A large dataset recording is likely to feature very small instances of head-on collision avoidance and stop-and-wait behaviors, that many models may even filter out as noise. The two behaviors are heuristically very important. Therefore, the proposed approach creates a dataset focusing on such rare behaviors in order to train the robot well on such corner cases that are often neglected in the literature.
The main contributions of this paper are as follows: 1. This paper proposes an end-to-end reactive motion planning algorithm to avoid the dynamic and static obstacles in the environment. 2. The end-to-end DNN architecture is trained using three different behavior-specific datasets to incorporate multiple behaviors in a single function of DNN. 3. We introduced intention detection in this method, which makes the agent aware of the intention of the obstacles close to the agent, including whether the obstacle is static or dynamic. The intension is from a raw sensor feed that removes the necessity of object level detection of the obstacles. 4. The method outperforms 3 state-of-the-art methods for the navigation of robots.
The structure of the paper is as follows: Sect. 1 gives the introduction of the proposed method and the collection of the dataset in the simulator. Section 2 discusses the three aspects of motion planning (i) classical methods, (ii) hybrid methods, and iii) end-to-end supervised and reinforcement learning based approaches. In Sect. 3 we discuss the inputs and the architecture of the proposed method. In Sect. 4, the method is evaluated in static and dynamic environments and comparisons with the state-of-the-art methods are presented. The strengths and weaknesses of the method are discussed in Sect. 4.

Related works
Many algorithms have been proposed for dynamic and static environments. The deliberative algorithms work for static and partially dynamic environments. However, they consume a lot of energy and time to improvise the path for the static environment and the dynamic environment for situations where the trajectories of the other agents can be anticipated. The reactive approaches save the energy and time in decision making for moving the robot to the goal. The major drawback of this approach that there is no guarantee to reach the destination. The learning-based approaches are used for reactive decision making, enabling the agent to learn from the experiences. This section of the paper covers the different algorithms of motion planning from the classical approaches to the learning-based motion planning approaches to make the readers aware of the gaps of research that exist so far and how the present approach contributes to fill these gaps.
The most naïve algorithms of motion planning are Bug1, Bug2 [5] and Tangent bug [6] algorithms. These algorithms make the agent follow the boundary of the obstacle until they find an obstacle free orientation towards the goal. Another common class of algorithms is vector-based. The basic idea of these algorithms is negative and positive forces. The first assumption is that the obstacle in the environment generates a negative force, and the goal generates a positive force. Based on this assumption, the two algorithms implemented are VFF (Vector Field Force) [7] and PFs (Potential Fields) [8]. These algorithms are easy to implement and not much complex. They are highly parametric and complex to configure and make the robot stuck in a local minimum, not guaranteed to find the solution.
To navigate the robot in a structured environment, the Cartesian coordinate space is used. On the contrary, the histogram approaches represent the environment in the angular form. They find the angular orientation of the goal, obstacle and free space, and present in the form of a cost function. The VFH (Vector Field Histogram) [9] and its updated version VFH + [10] are the histogram-based approaches. These algorithms incorporate the uncertainty of the environment and also adhere to the kinematic constraints of the agent (nonholonomic robot). They also suffer from the local minima problem in concave (say, "U"-shaped) obstacles and do not allow the agent to move further towards the goal. In the velocity space approaches, the whole environment is represented in the form of velocity space instead of Cartesian space. The space includes the velocities which makes the agent move towards the destination without any collision, eliminating the velocities which are occluded by obstacles. The algorithm selects the best velocity at each iteration from the velocity space, which quickly moves the agent to destination. Some prevalent velocity-based methods in the motion planning of mobile robot are DWA (Dynamic Window Approach) [11] and CVM (Curvature velocity Method) [12]. The problem of local minima in these approaches cannot completely be eradicated and persists.
To overcome the problem of local minima, the ND [13] (Nearness Diagram) and SND [14] (Smooth Nearness Diagram) algorithms were proposed. These algorithms classify the environment for different scenarios and the strategy of obstacle avoidance and goal seeking used is different for different scenarios. The major issues of these kinds of algorithms are a highly complexity and a good perception of the environment is necessary to increase the accuracy of classification.
Another area of motion planning to tackle the problem of local minima is the combination of more than one technique. One technique finds the global goal of the agent and another technique finds the next step to reach the sub-goal. In [15], the authors used the optimal control theory to find the optimized path based on travelling time between every pair of targets. The authors used different clustering algorithms to solve the task assignment problem, assigning each vehicle to a cluster of targets to be optimally visited. Similarly, in [16], the optimal control theory was used to find the optimized cost between every target, while a co-evolutionary and multi-population Genetic algorithm was used for the task assignment problem. Both approaches used navigation in a drift field. The methods like LCM (Lane Curvature Method) [17] and BCM (Beam Curvature Method) [18] divide the environment to collect the goal and apply CVM to reach this goal. The other method DCVM, modified version of CVM is used to handle the dynamic obstacles. This method predicts the Cartesian coordinates of the dynamic obstacle and merges in the perception of the environment. This way the agent can plan their next movement and avoid the dynamic obstacle and anticipate the collision free manoeuvre. In this kind of algorithm, the risk of collision depends upon how accurately the position of the dynamic obstacles is predicted. The tracking of dynamic obstacles either uses the Kalman filter [19] or LSTM neural network [20]. NPVO (Non-Linear Probabilistic Velocity Obstacle) [21] uses the LSTM network to predict the motion of the dynamic obstacles and the probabilistic approach finds the velocity available to move the agent without any collision towards the goal. The DNN LSTM has the power to predict the temporal information for a larger time period and it is only restricted to the availability of a large dataset. On the contrary, the Kalman filter calculates the time steps but does not need a large training dataset. These approaches assume that the complete information about the environment, including a knowledge of the positions and velocities of the dynamic agents is available. This can be time consuming, and the approaches are affected by the false positives and the false negatives of the detection and tracking algorithms, heavily limiting their use.
The reviewed methods are either deterministic or probabilistic approaches or with combination of deep learning techniques. There is another family of methods based on DNN. These methods are individually capable of navigating the agent intelligently in the presence of dynamic and static obstacles. Pomerleau [22] proposed an end-to-end model of shallow neural network named ALVINN (Autonomous Land Vehicle in a Neural Network) to steer the mobile robot in the public roads. In recent works, neural networks have gained a lot of attention with the introduction of deep learning by Goodfellow [23]. The resurgence of DNN makes the researchers come up with an end-to-end DNN model to steer the mobile robot (car) in public roads. CNN is the best pattern recognition technique developed so far. CNN extracts the best features from the images directly without any feature extraction technique used in the previous classifiers. Chen et al. [2] proposed a CNN to evaluate the safe distance from the preceding obstacles present in the environment and used the same as inputs to control the agent. Bojarski et al. [24] of NVIDIA also used CNN to steer the car by using a camera and no goal information was provided. Similar to our work, Xu et al. [25] used the CNN LSTM neural network to predict the discrete action to move the car with a single camera. However, in our work the predicted action is inspired from the human behaviors. Heckar et al. [26] used the 360degree view of the camera as one input and planned routes as another input to the DNN and predicted the steering and speed, instead of giving a complete route. In our work we are only giving the coordinates of the goal. Müller et al. [27] model of learning was divided in three sections. The first received the segmented data. The second section generated the trajectory based on the driving policy. The last section was the PID controller to control the actuators. This method segmented the data of CARLA simulator to evaluate the model in real world. This method preprocessed the raw image before giving as an input to the model. The methods discussed drive a simulated robot around a facility while avoiding static and dynamic obstacles. When two humans avoid each other, they have some lesser understood conventions that must be honored. The humans already expect the other entity to act as per the conventions. While adhering to the conventions, there is a possibility of having many corner cases that may be barely observable in a general recording of the data. The aim of this paper is thus not to record a huge volume of a general navigation data, but to record behavior-specific data. A machine trained on a behavior specific data is more likely to handle the corner cases well. Furthermore, since a behavior has already been fixed, the data requirement and thus the complexity of the network is limited.
The major drawback of the supervised learning-based motion planning is the need of a large collection of datasets and training the difficult interactive task. Reinforcement learning provides relief of dataset and also trial and error makes the agent learn the difficult task itself. Pan et al. [28] used a simulator to learn the agent behaviors and the authors deployed the agent in the real-world scenarios. Similarly, Shalev-Shwartz et al. [29] applied transfer learning from virtual world to real world based on reinforcement learning. The learning in the reinforcement learning depends upon the policy and it is not necessary the selected policy will converge the training in the right direction. To overcome this problem, Fan et al. [30] deployed the agent in different scenarios with multiple policies that were hybridized with the traditional approach to handle the simple scenarios. On contrary the manual collection of data in supervised learning guarantees to converge the training. We further argue that reinforcement learning attempts to maximize a reward function, while the learnt system is only optimal for a chosen reward function. The humans in everyday life display conventions that are not formally known and thus designing the correct reward function is a very big challenge. Furthermore, the humans react differently in different situations, and thus replacing a human by a simulated agent is not possible, since the simulated agent is not guaranteed to behave like the humans. Behavior specific training can only be done by a supervised training on a data recorded with the help of the humans.
We have reviewed many methods of motion planning, from classical approaches to end-to-end learning. This literature helps the present study of motion planning. This study is inspired from the human's behavior of obstacle avoidance in real life scenarios, including the display of social etiquettes during navigation. The behavior transfers the human-navigation learning to the agent with the belief that the transfer represents the socially accepted strategy of obstacle avoidance in the dynamic environment.

Human inspired reactive motion planning
This section represents the complete flow of the proposed approach, from perception to the obstacle avoidance behaviors.

Perception of the environment
In the area of motion planning an important part is how the agent perceives the environment. The first percept in this proposed method is the raw local information of the environment R(t) at time t to make the agent independent of the environment, while dealing with the dynamic environment. It was observed in the preliminary experiments that the algorithm got confused between the static and dynamic obstacles and the agent seldom stopped in front of the static obstacle with the belief that the static obstacle will cross the agent path. This problem causes the agent to get stuck for an infinitesimal time period. To eradicate this confusion in the algorithm, we have added the intention detection u(t) which identifies the intention of the obstacles, whether they intent to move or stay in the same position. The intention detection module calculates the position vector of the obstacles in the Cartesian space and the angular space frame in every step of time from the raw data of LiDAR.
The last input in the algorithm is the goal information D(t) and it makes the agent capable of aligning in the direction of the goal in the absence of obstacles. The goal information D(t) contains the Euclidean distance of the goal w.r.t the agent frame, d g (t); and the orientation towards the goal w.r.t the agent frame, θ ga (t). Suppose the agent is at position (X a (t),Y a (t)) with an orientation θ a (t) as reported by the onboard localization system used by the robot. The goal information calculation is shown in Eqs. (1)(2)(3)(4)(5) and the notations are shown in Fig. 2. Equations (2-3) measure the distance to goal d g (t) with the known agent position (X a (t), Y a (t)) and goal position (X g , Y g ). The goal location is assumed to be constant. Equation (2) changes the origin to the location of the agent giving the robot position in the new coordinate axis as (x g (t),y g (t)), while Eq. (3) calculates the distance to goal. Equation (5) calculates the orientation towards the goal w.r.t. the agent θ ga (t) as the difference between the angle to goal θ g (t) calculated in Eq. (4) and the agent's orientation θ a (t). Equation (5) specifically uses the atan2 function to bound the angular difference within the range (− π,π].

Intention detection
The intention detection solves the major problem of motion planning algorithm associated with the difference between the dynamic and static obstacle avoidance. Typically, literatures use object level classifications to track and predict the motion of the moving object that can be problematic for false positives and false negatives that may arise for different reasons. Therefore, the proposed work uses raw LiDAR readings to locate a point of interest on the dynamic object that is tracked to guess the intent of the dynamic object. Unlike many other literatures, the dynamic objects are heuristically detected, and their dynamics information is extracted, rather than letting the machine learning approach learn the same computations. The training of the network with multiple objects without intention detection failed many times on scenarios that were not present in training and thus such a network restricted the agent in a similar type of environment as used in training. Intention detection gives additional inputs related to the obstacles that enable the agent to avoid all types of obstacles. In this detection, the temporal information u(t) of only the nearest dynamic or static obstacle is explicitly calculated while the other obstacles would still be given to the network as the LiDAR raw data of the environment. In   only and the other obstacle information is still available from the raw LiDAR data R(t). The intention detection algorithm uses the raw LiDAR sensor readings to locate a characteristic point on the obstacle surface used for tracking. The process is shown in Algorithm 1. Line 1 checks if at least one obstacle is within a distance of d min , in which case it can affect the motion of the robot. Let R α (t) be the laser reading at an angle of α at time t. α obs is the angular range of the nearest obstacle detected by the LiDAR scan angles, shown in Fig. 3. A laser ray strikes at a general angle α and once it hits an obstacle, the laser scanner reports the distance to the obstacle at the angle α. The laser scanner scans at all angles between an initial and a final angle, in steps that are specific to the LiDAR sensor.
The closest obstacle will correspond to the smallest laser ray reading with a value of min(R(t)), say that corresponds to the laser ray at an angle α. Since the robot is a smooth circular body, laser rays just before α (and similarly after α) will also hit the robot body at a distance of just greater than min(R(t)).
Assume that the change in the distance reading at the exterior surface of the obstacle body facing the robot is ε, meaning that the obstacle exterior body facing the robot has a distance range of [min(R(t)), min(R(t)) + ε]. ε roughly corresponds to the robot radius which is the variation in distance that the laser ray would experience as it traverses the robot's body. All angles that give readings within this range are found from the LiDAR sensor. The continuous angular range that bounds these LiDAR readings is the angular occupancy of the nearest obstacle, or α obs = [α min obs , α max obs ]. The calculation is shown in lines 2, 4 and 5. If the obstacle is to the left of the robot (with a negative angle), the rightmost corner corresponding to α max obs is taken in Lines 6-11; while if the obstacle is to the right of the robot (with a positive angle), the leftmost corner corresponding to α min obs is taken in Lines 13-18. Sometimes the obstacle may be directly ahead of the robot, in which a portion of the obstacle has a positive angular coverage while another portion of the obstacle has a negative angular coverage. In such cases, the leftmost corner corresponding to α min obs is chosen in Lines 20-24.
The network stores the obstacle distance (d o (t)) and obstacle position information (O x (t), O y (t)), along with the first derivative (or speed, v x (t), v y (t)) and the second discrete derivative (or acceleration, a x (t), a y (t)) that is given as an input to the network, shown in Eqs. (6)(7)(8)(9)(10)(11)(12). Here h is the time constant between two consecutive sensor readings. The robot is controlled at a constant control cycle (by sleeping for excess time) to maintain a constant time interval between every two readings.
. Algorithm 1: Intention_detection Input: R(t) Output: θo, do, Ox, Oy 1 : If min(R(t)) ≤ dmin // (threshold LiDAR range for obstacle detection) 2 : : min 3 : else: return (0,0,0,0) 4 : max 5 : Oy(t)= do(t)sin(θo) 11 : return(θo(t), do(t), Ox(t), Oy(t)) 12 : Else 13 : If Oy(t)= do(t) sin(θo) 18 : return(θo(t), do(t), Ox(t), Oy(t)) 19 : Else 20 : The novelty of this manuscript is the different reactive behaviors of obstacle avoidance incorporated in a single DNN (deep neural network) function. The single trained function collects the raw data of LiDAR and initiates the action accordingly. The percept of the environment is input to the trained function and the function predicts the desired linear and angular speed to make the agent reach the desired location without colliding with any obstacle. Figure 4 shows the two most common behaviors of humans, and we advocate that these behaviors are robust in nature for the avoidance of dynamic obstacles that can be transferred to mobile robots. We have used the same condition in training. When we deploy the agent in different situations other than those used for training, the agent demonstrated a mixed behavior. The agent did not show the two behaviors explicitly. The trained function was capable of handling the uncertainty. In the experiment section the agent was deployed in 80 novel conditions to check the robustness of the proposed method and the agent successfully navigated in all of them.

Proposed architecture:
The architecture of DNN based navigation function has three vital inputs. The first is the environment percept input represented by the raw LiDAR readings R(t) that is passed through a 1D CNN to extract the relevant feature from the raw data. The robot sensor percept is sequential in nature. Therefore, the extracted features from the CNN are passed into a LSTM network. The second input u(t) contains the running status of obstacle near to the agent and is directly given to the LSTM network. These data are the sequential information of obstacle motion. The second input plays a significant role in determining the behavior to exhibit. Based on this information, the network can enable or disable the specific behaviors. The extracted feature from both the inputs is concatenated and passed to the fully connected (FC) layer. The third input is the goal location relative to the current robot pose and is directly given to the FC layer. This two-dimension data is directly concatenated with the extracted raw data features and obstacle temporal information features. The FC layer of the model predicts the angular and linear velocity that is given to the actuators. The network architecture is shown in Fig. 5. The inputs and outputs are also shown as Eq. (13). The overall algorithm is given as Algorithm 2.

Results
The algorithm does not have many parameters that significantly affect the algorithm's performance. There are a few parameters only to mention. The value of d min was taken as 10 m in the paper. From the experimental point of view this parameter does not have any effect and can be set to an arbitrarily large value. It is believed that obstacles beyond a distance should have no effect on the decision making of the robot, a theoretical guarantee that enables simulators restrict proximity queries to within the specified threshold. Practically an obstacle too far off will have a negligible effect in the motion of the robot. However, to ascertain those obstacles very far off do not affect the working of the robot, a hard limit of d min (10 m) is set. ε roughly corresponds to the robot radius which is the variation in distance that the laser ray would experience as it traverses the robot's body. The algorithm appends N p inputs as a sequence to extract latent temporal parameters. Since it significantly affects the dimensionality of the input, the value is taken to be a small value as N p = 3. A higher value would significantly increase the input dimensions, while a smaller value would not get enough temporal samples to extract the latent temporal parameters. h, the time constant between two sensor readings, was used to denote the measurement of velocity and acceleration however the parameter is subsumed by the weight of the model. The other parameters correspond to the architecture of the network. Like many deep learning approaches, we argue that the architectural parameters do not significantly affect the performance of the network, however, the quality of data does affect the performance. The architectural parameters shown in Fig. 5. The LiDAR data have a dimension of 1 × 1020, which after appending N p = 3 data samples make the input of 3 × 1020 dimensions that is passed through the 1D CNN with filter size 32 and ReLU activation function to extract the relevant features. The LSTM has an output dimension of 16 and ReLU activation function. The second input of obstacle information has a dimension of 3 × 8 that is directly given to a LSTM network with 16 output dimension and ReLU activation function. A concatenation of the two LSTM outputs is given to a Fully Connected (FC) layer of 128 neurons and ReLU activation function. The third input of goal information with a dimension of 1 × 2 is directly given to a FC layer of 16 neurons and ReLU activation function. The outputs of the two FC layers are concatenated to represent the latent input features. These features are converted into linear velocity and angular velocity using two separate FC layers of size 128 each and a ReLU activation function.
The agent with approximately rd = 0.25 m radius is deployed in a lane of 4 m and the agent's initial position is in the 1-m radius with center (− 8.48, 0.08) and the goal position is taken as (8.48, 0.08). The agent must reach to the 0.5-m radius of the goal position. The agent starts tracking the obstacles when they come in the radius of d min = 10 m. The agent is controlled at a frequency of 10 Hz.  The robot was steered using a joystick from the third view to record the dataset. In general, with a third view, the human operator can look at the obstacles which may not be within the sensor's field of view and make maneuvers to avoid such obstacles in anticipation, something that cannot be learnt. However, the results reported here are behavior specific and only contain the obstacles that participate in the exhibition of the behavior. The robot's LiDAR sensor used in this approach can look at all the other dynamic obstacles and there is no occlusion. On the contrary, if the first view was used using a front mounted camera, the human operator would not have known about the dynamic obstacles at the side (or back) that are beyond the camera's field of view. A LiDAR sensor however scans 360 degrees leaving only a few laser rays that strike the robot's body and the LiDAR has a complete view of the map in this case. The complete dataset generated by the approach is available at [31].
The proposed algorithm's performance evaluation was done in static and dynamic environments with different scenarios. The proposed algorithm's performance is quantitatively compared with the following three state-of-the-art algorithms to evaluate the robustness. In the end statistical test was applied to find the relevance of quantitative analysis.
• The Dynamic Window Approach (DWA) [11]: The ROS DWA planner used. The agent was equipped with a DWA planner deployed in the environment with 80 scenarios. • The Timed elastic band (Teb) [32]: The ROS Teb planner was used. It was also deployed in 80 different scenarios.

Evaluation metric
We first evaluate all the methods based on computational time. In Table 1, it is shown that while all the approaches take competing computation time, LSTM Based Reactive Controller (LMBRC) takes slightly less time compared to the other state-of-art methods. The table shows that a singlefunction based proposed algorithm takes slightly less time as compared to map-based methods and the other methods like NPVO. The computation time is the time elapsed between two consecutive publishing velocity by the methods.
To further investigate the robustness of the proposed method, we calculate four metrics named clearance, travelled distance, success (to reach the destination) and travelling time. We also considered the velocity pattern generated by the controller in all three methods. Clearance calculates the minimum distance between the agent Q and the nearest dynamic or static obstacle O shown in Eq. (14). It is assumed that the agent and the obstacle are both circular in nature. With this assumption, the clearance at any point of time is given by the distance between the centres minus the sum of radii of the agent and the obstacle. The calculation of clearance is shown in Fig. 6, where d() is the Euclidean distance function and rd is the radius of agent and the obstacle. Clearance is a spatial metric that will change as the robot and obstacles move along with time. The minimum clearance in the entire run is computed as the clearance value of the scenario that measures the least distance between the agent and the obstacle when the two entities are closest to each other, shown in Eq. (15). The travelled distance is the distance travelled by the agent while moving towards the destination shown in Eq. (16). The travelling time is the time taken by the agent to travel from the source to the destination, which is an additional evaluation metric.

Static environment
In this experiment, the environment was set up with a static obstacle in 20 different places and the algorithm asks the agent to avoid the obstacle to reach its destination. In the proposed approach avoiding static becomes challenging because "stop and move" behavior confuses between the dynamic and static obstacle. In [33], it was argued that to know the intention of an obstacle either the motion of the obstacle or static object level classification is necessary. On the contrary, intention detection of obstacles removes this necessity. In Table 2, the clearance mean of the proposed method LSTM Based Reactive Controller (LMBRC) is better than all the three methods, the robot travelled lesser distance while moving towards the destination and however, it takes more time than DWA and TEB. It was observed LMBRC immediately aligns towards the destination while the rest of the methods take time to change their alignment and waste time, also increasing the travelled distance. That is why even having smaller clearance mean, they travel more distance as compared to LMBRC, shown in Fig. 6. The Figures show the travelling time using the number of time steps. A time step is 0.1 s in all the results. The only drawback of LMBRC while avoiding static obstacles is that sometimes it takes time to decide either to avoid the obstacle or to wait to let the obstacle pass that increases the travel time in a completely static environment. The same problem is faced by NVPO. It takes more time than LMBRC shown in Fig. 7c, d, where NPVO takes double time steps to reach the destination all three methods. The specific scenario used for testing is however deceptive in nature. Static obstacles generally comprise of walls, furnishing, etc. that are typically avoided by a deliberative planner that breaks goal-seeking as a sequence of sub-goal seeking behaviours. The reactive navigation thus does not need to account for the static obstacles and such modelling is absent in reactive approaches like the Velocity Obstacle method [34] and all their recent variants. To create a static obstacle, the approach therefore uses another robot that is typically supposed to move, however the robot is not allowed to move creating a static environment. The robot looks at another robot and from the training data assumes that the robot would move and adjusts its speed accordingly. The robot may prefer to wait for the other robot due to similarity with the training cases. The evaluation of LMBRC in static environment shows that the intention detection eventually eradicates the confusion state of the agent. This increases the travel time. However, such cases are practically hard to come. Driving a vehicle following another vehicle is a common behavior and correspondingly slowing down once the vehicle ahead slows down is a common behavior. It is a rarity to have the vehicle ahead broken-down requiring surpassing the vehicle. Even though the time taken is larger, the algorithm can overcome the static obstacle. The NPVO method was evaluated in the Stage simulator due to the dependency of the method, however, we scaled the environment in terms of performance like the Gazebo simulator.

Dynamic environment
We quantitatively analyze the behavior of agents with dynamic obstacles with the same evaluation metrics. Due to the similarity of velocity of the agent and the obstacles, we do not specifically analyze the agent velocity with the dynamic obstacles. The only one significant change came in linear velocity of NPVO and LMBRC, the waiting time to decide the intention of the obstacle significantly decreased. LMBRC easily decided that the obstacle is dynamic using intention detection and NPVO anticipated the behavior of the dynamic obstacle easily. The agent was equipped with all four methods of study one by one and deployed in the environment shown in Fig. 8. Figure 8a, b show the front view and top view of the environment in the Gazebo simulator. In Fig. 8c, d, the obstacle and the agent are shown in the Rviz simulator.
The static obstacle will not be detected when it goes outside the FOV (Field of view) of the agent. In Fig. 8c, the dynamic and static obstacle both come in the FOV of the agent. The green rectangle shows the static obstacle, while the dynamic obstacles are shown by black boxes with arrows showing the direction of motion. In Fig. 8d, the agent reached the destination, and both the static and dynamic obstacles are out of the FOV of the agent. The spot where the agent waits is marked with a red circle. The main challenge of the proposed work is how to avoid the dynamic obstacle. We deployed the agent in 20 different scenarios of obstacles. The algorithm was tested on not more than 2 dynamic obstacles and in few cases combination of dynamic obstacles and static obstacles. Figure 8 shows only one scenario. The rest of the scenarios contains agent and obstacles in different positions. It was observed in the experiment that the agent shows a mix behavior of both learned behaviors. The other challenge in this approach is the maximum linear velocity of obstacles that the agent can avoid. We have performed 20 scenarios with 4 different velocities of obstacles. The performance of all four methods showed significant changes in their performance. In Table 3, the first experiment performed with the maximum linear velocity of obstacle as 0.25 m/s. DWA performed worst and got the only 40% success, even the maximum velocity of the agent was 0.7 m/s. The other methods performed to an acceptable level. It was observed in the DWA, it shows some sign of "stop and move" but immediately starts looking for the free space to avoid the obstacle that makes it the worst performer even with very low velocity of obstacle. The TEB and NPVO seems good in avoiding the dynamic obstacle and with low velocity both got 100% success in test cases. Still LMBRC looks efficient because of maintaining a higher clearance, lesser travelled distance and a lesser travelled time.
In Table 4, the metric values corresponding to obstacle velocity of 0.5 m/s shown. In this case, both map-based approaches DWA and Teb performed very poorly and only achieved 50% success in test cases. The increased velocity of obstacle does not show a high impact on the learning-based approaches LMBRC and NPVO.
In Tables 5 and 6, the DWA and TEB performance improves slightly, but still maintains a lesser success than NPVO and LMBRC. From Tables 3, 4, 5 and 6, it is clearly observable that the method involving a combination of different algorithms increases the latency and does not make the agent react immediately. In the map-based approach the latency is high due to local map update and local planner and global planner update. In the NPVO, the trajectory of the obstacle is predicted using DNN to update the local map also has a higher latency of reaction than LMBRC. NPVO failed to avoid the obstacle with velocity of 1 m/s. In LMBRC the latency is very less due to dependency on a single function. The tables also show that LMBRC takes less travel time in all the scenarios as compared to all other algorithms. However, collision in the higher velocity not only depends upon the latency but also on how the agent tries to avoid the obstacle. It is shown in Tables 5 and 6 that the DWA and TEB performance improves as compared to the previous cases.

Statistical analysis
We have performed statistical analysis of the behavior of the agent in the dynamic environment. It is observed in Table 7, that LMBRC maintains a larger clearance mean than the other methods. It was expected from the proposed work, while dealing with dynamic obstacles the extra precaution is necessary to assure the collision free navigation. This proposed work maintains a smaller travelled distance even with larger clearance mean. Path length and clearance are opposing metrics. A high clearance (preferred) typically increases the path length (not-preferred) and vice versa. The algorithms therefore try to maintain a tradeoff between the two opposing factors. An unusual observation from the experiments is that the LMBRC algorithm maintains a larger clearance which however does not affect the path length. This is because the LMBRC algorithm can wait for the obstacles to clear rather than being driven by the obstacles in the case of contradictory navigation goals. Figure 9 shows the specific conditions where the algorithms exhibit the worst performance except the LMBRC algorithm. In Fig. 9a, c, e the robot starts following an obstacle for the sake of open space and this makes the robot maintain a smaller clearance while taking a large path length and time. On contrary, in Fig. 9g, the robot stops, does not move and waits for the obstacle to pass. This behavior makes the robot maintain a large clearance with a small path length. Another reason behind this observation is that LMBRC starts avoiding obstacles earlier and aligns back to the goal slowly in contrast to the other algorithms that saves on the path length metric. In Fig. 9b, d and f the algorithms start avoiding the obstacle only when they are considerably near and aim to steer back quickly, thus forming a steep curve around the obstacle. However, LMBRC starts proactively avoiding the obstacle, and after avoidance slowly drifts towards the goal that makes the traced path close to the geometrically shortest path possible shown in Fig. 9h. We have performed the One-Tailed test shown in Table 8 to find the difference between LMBRC with the other methods and the calculated p-value of clearance, travelled distance and travelled time between LMBRC and other methods obtained lower than 3%. The lower p-value implies that the difference between both algorithms is significantly different.

Experiments under noisy settings
In this section, we demonstrate our experiment in an environment like real time. The real time environments are ubiquitous of noises in the sensor data. These noises make the robot inconsistent in its behavior. To understand the robustness of the proposed algorithm in a real time environment, we have introduced a Gaussian Noise in the LiDAR sensor and deployed the agent. Figure 10 shows four different levels of Gaussian noises in the LiDAR sensor.
From Table 9a it is seen that the noise with mean 0 and standard deviation 0.5 does not much affect the functionality of the algorithms and the agent manages to reach to the destination. In Table 9b-d, the map-based navigation approaches   and NPVO do not make the agent to reach and collide with the obstacle and make the robot stuck. On the contrary LMBRC managed to reach the destination but the increase of noise decreases its efficiency.

Conclusions
Reactive navigation or the direct mapping of input to output solves a lot of challenges of map-based navigation enduring since decades like software complexity and hard to reconfigure for the desired behavior. In general, for the reactive navigation algorithm the transient time between input to output action should be less, then only the collision with the dynamic obstacles can be avoided. Simultaneously the algorithm should also be concerned about the goal. For example, the human avoids the obstacle while he/she is also concerned that the deviation should not move him/her significantly away from the goal. In the proposed network the raw data of the environment and the intention detection input of dynamic obstacles near to the agent makes the algorithm take extra precautions than the prevalent end-to-end algorithms. It was observed in the behavior of the agent that the intention detection input made the agent aware of the motion information of the obstacle, whether the obstacle is coming from the left or right without requiring an object-level classification. We have shown that the training with two behaviors does not only handle the two situations but also avoids the obstacle in 80 novel situations. The training with three different datasets to a single network without intention detection was difficult and confused the agent between static and dynamic obstacles. The intention detection input helped the agent to identify in between them.
The inspiration for the architecture was to provide safety in end-to-end learning, which is the major work in this area of research and a single function leads towards the destination. This neural network has three inputs. The first input passes through CNN and extracts the relevant information from the raw data of LiDAR. The second input extracts motion features of the nearest obstacle and directly passes to the LSTM. The third input is the goal information sent to the FC layer of the network. The concatenation of different layers was a difficult problem in the architecture, and it was possible only with multiple trial and errors. This is a novel architecture of the neural network and experiments show that it can handle the dynamic and static obstacles simultaneously and also lead to the destination.
The experiment procedure is the essence of this prototype developed for reactive navigation. In most literatures, the dataset is collected from sophisticated roads and also tested on roads used for training, where the agents move in different lanes. Outdoor scenarios have limited well-established behaviors, while the behaviors are less formally studied and categorized for the indoor scenarios. The approach thus focusses on such behaviors that have neither got a lot of interest in the literature, nor are likely to be recorded from a continuous stream of generic data. Even if such behaviors are recorded, they are likely to be filtered out as noise by a generic machine learning system. In the experiment, we intentionally make the obstacle try to hit the agent, also crossing the agent path from a different angle and with a different linear velocity to robustly train the agent. We have reviewed many literatures and no literature explicitly discusses about the maximum obstacle velocity. In the preliminary experiments, we observed that it is an important parameter to be discussed and also to evaluate the performance of the proposed algorithm. The experiments concern the indoor environments like hotels, hospitals, and airports and outdoor environments like pedestrians. The scenarios witness different obstacles with different preferred speeds.
The proposed DNN based architecture of reactive navigation inherits serval the strong points. (i) It can avoid dynamic obstacles with a velocity twice of the agent velocity. (ii) Tracking of the nearest dynamic obstacle without object-level classification provides extra safeguard from the collision. (iii) A single trained function makes the agent capable of choosing from multiple obstacle avoidance behaviors while leading the agent to the destination. (iv) First time in end-to-end motion planning, the obstacle avoidance behavior is inspired from specific obstacle avoidance behaviors of humans that are unlikely to be captured in a generic data.
The proposed work restricted the agent deployment in the environment of two dynamic obstacles only. This work is a testbed to incorporate the human intention in motion planning. We found that using a joystick for controlling the agent is the immediate way of involving the human intention. The future plan for this research is to collect the data of human trajectory in different scenarios like pedestrians, mall and railway station where the dataset can incorporate human etiquettes and to train a DNN to find the efficient policy of obstacle avoidance.
In [30], the authors argue that the supervised learning requires a lot of data and reinforcement learning (RL) reduces the dependency of manual collection of a dataset, but at the same time the policy of RL does not make the agent to learn the social etiquettes of motion planning of the human and instead optimizes against a parametrized reward function. On the contrary, the problem in hand is to learn primitive behaviors that neither require a lot of data, nor are the scenarios too complex. While the number of behaviors may be large, navigation is largely a function of the obstacles in the immediate vicinity, against which the motion is predominantly a single behavior. In the future, we intent to incorporate the inverse reinforcement learning (IRL) to derive a reward function for Reinforcement Learning using the trajectory data. In the future we also intent to fuse the ability of Reinforcement Learning to handle complex obstacles with the ability of supervised learning to better capture the human social etiquette into one navigation network. Further, the testing needs to be done for real robots rather than relying on simulators.
Author contributions SMHJ carried out the complete experiment. SMHJ and Rk contributed to the design and implementation of the research, to the analysis of the results and to the writing of the manuscript.
Funding The work is sponsored by the Indian Institute of Information Technology, Allahabad.

Conflict of interest
The authors declare that they have no conflict of interest. The authors declare that the work described has not been published previously, that is not under the consideration for publication elsewhere, that its publication is approved by all the authors and tacitly or explicitly by the responsible authorities where the work has been carried out, and that, if accepted, it will not be published elsewhere in the same form, in English or in any other language, including electronically without written consent of copyright-holder. The authors don't have any competing interest.

Ethics approval Not applicable.
Consent to participate Not applicable.

Consent for publication Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.