Autonomous robotic exploration with region-biased sampling and consistent decision making

In this paper, we propose a scheme for autonomous exploration in unknown environments using a mobile robot. To reduce the storage consumption and speed up the search of frontiers, we propose a wave-features-based rapidly exploring random tree method, which can inhibit or promote the growth of sampling trees regionally. Then, we prune the frontiers with mean shift algorithm and use the pruned frontiers for decision-making. To avoid the repeated exploration, we develop a decision making method with consistency assessment, in which the status of the robot and frontiers are explicitly encoded and modeled as a fixed start open traveling salesman problem (FSOTSP). Furthermore, a re-decision mechanism is build to reduce the extra computing cost. Simulations and real-world experiments show the significant improvement of the proposed scheme.


Introduction
Autonomous exploration technology of mobile robots has been widely applied in various fields. Amongst the main application is active mapping [1]. Its main task is to determine the next expected movement position of the robot, and finally achieve the shortest collision-free path in the global scope to obtain the most unknown environmental information. The robotics autonomous three-dimensional reconstruction [2] has similar goals. In addition, autonomous exploration has The Hong Kong Polytechnic University, Hong Kong, China also been applied to disaster relief [3], terrain exploration [4] and other fields. Because the environment of these tasks is unknown and changeable, the implementation process requires closely integrated excellent navigation and decision making [5][6][7].
This paper mainly focuses on the autonomous exploration of active mapping tasks which aims to improve the exploration efficiency under ensuring that the robot completes environment mapping. The important measures are the length of the exploration path, the exploration time or even computing consumption. In the process of exploration, the determination of navigation targets is the most critical step, which is a complex task, including searching for frontiers and decision-making of exploration order. Afterwards, the robot performs path planning and moves. For searching, the existing methods can be divided into two categories, which are sampling-based and boundary-based. Both algorithms update entire map constantly in real time. The samplingbased method is widely used because of effectively avoiding falling into search dead areas. However, the existing methods usually global sampling and lack regional tendency due to excessive pursuit of probability completeness. This leads to a waste of computing resources and storage, which is unacceptable on low-cost hardware. For decision making, in the existing approaches, the robot preferentially navigates to the point with the smallest value of the utility function which usually considering path length, information gain, position-ing accuracy and so on. However, the real-time updating of these utility function values causes the robot to frequently change navigation goals, leading to repeated exploration. In this paper, we propose a consistent decision making method to avoid the decrease of exploration efficiency caused by the indecision. In addition, we emphasize the importance of the pruning of searched frontiers, and present an efficient technical framework for autonomous exploration in unknown environments. Three main contributions are summarized as following.
(1) Inspired by the natural phenomenon of sea wave, we propose a regional biased sampling search method. It considers the peculiarities of exploration and actively exploits map information to suppress unnecessary expansion of nodes in the explored area. (2) We propose a decision making updating method with consistency, including designing a utility function to measure consistency, extending it to the solution of global optimal FSOTSP, and develop a re-decision mechanism. (3) All modules are effectively integrated into the autonomous exploration system for mobile robots. Experiments show the effectiveness and robustness of the system. We also verify the necessity of pruning frontiers through experiments.
The structure of this paper is as follows. In "Related work", the review of related research progress in autonomous exploration is given. To facilitate the research, "Problem formulation" carries on the mathematical modeling of the autonomous exploration problem. Then "Proposed method" proposes the detailed method of autonomous exploration. In "Simulation" and "Real-world deployment", the simulation results and practical experiment are discussed. Finally, "Conclusion" states the conclusions and discusses the future development research direction.

Related work
At present, there have been some autonomous exploration schemes which use mobile robots to map unknown environments. Some of the works focus on exploring the space quickly [8,9], as this paper does. Meanwhile, other methods place more emphasis on accurate reconstruction [10,11]. The functional modules of autonomous exploration are often subdivided into simultaneous localization and mapping (SLAM), frontier points searching (Search), exploration decision making (Decision) and motion planning (Planning). The goal of Search is to generate frontier points. the algorithms for generating frontier points mainly include boundary-based and sampling-based methods. The boundary-based search algorithm generates points to be explored by detecting the boundary, represented by Frontier Detector (WFD) [12], Fast Frontier Detector (FFD) [12], and edge detection algorithms for graphics and images. WFD is based on Breadth-First Search (BFS) and only scans known instead of unknown regions. The work in FFD uses lidar information to find the edge of the map to obtain possible forward detection points without processing the map information. Senarathne et al. [13] give proof and experimental support to these two algorithms, and further optimized them. Senarathne and Wang [14] apply the digital image processing technology of edge detection for boundary search. In each step of map update, the modified part of the map is processed, and the boundary information is managed and extracted in an incremental manner. Nevertheless, the computational complexity of image processing is still large. The sampling-based exploration algorithm is represented by the rapidly exploring random tree (RRT) and its derivative algorithms. Umari and Mukhopadhyay [15] apply RRT to search for frontier points. The main idea was to randomly sample on the map and use independently grown trees to speed up the search for frontier points. If the edge of the tree falls on the unknown area and the free area at the same time, the unknown points adjacent to the free area are the frontier points. Qiao et al. [16] change the way of storing trees to improve the exploration efficiency of narrow environments. All the above methods search globally and iteratively, without effective mechanisms to reduce storage and computing consumption. The method in [17] utilizes the semantic information of the environment as the heuristics to guide the robot exploration. However, it is premised on collecting enough data sets for training, and its generalization is poor.
After the robot obtains the frontier points, it needs Decision module to decide which point to go first. Robot usually makes decisions based on the value of the utility function. Makarenko et al. [18] pioneer the use of information entropy to represent maps, and proposed to use navigation cost, information gain, and positioning accuracy to construct utility functions. The following researchers continued to conduct in-depth research based on this. Isler et al. [2] propose a new probabilistic volume map for 3D reconstruction to quantify the visibility of voxels. The work in [5] combines the fast online re-planning function of planning based on trajectory optimization with the idea of maximizing the exploration gain in the sensor field of view to improve the decision-making speed of real-time exploration. Meng et al. [19] samples the viewpoints and used the heuristic TSP solver to calculate the visiting order of the frontier points, and visited the frontier points in turn. However, none of the above-mentioned exploration decision algorithms consider the problem of repeated exploration in real-time decision-making. The method [20] proposes a temporal memory-based RRT (TM-RRT) to make robot deci-sions more firm. But decision updates become slow because robots are assigned a fixed amount of time.
Some works focuses on active SLAM during exploration, which aims to improve the location accuracy and mapping quality [21,22]. Other works improve the Planning method [23,24] to make the planned trajectory satisfy the dynamic feasibility of robots or meet some trajectory optimization requirements. The method [25] considers the predicted actuation of a vehicle and uses nonlinear model predictive control (NMPC) to follow a generated path. As an extension of single robotic exploration, some works focus on multi-robot cooperative exploration system [26][27][28], which cast their eyes on solving task allocation or communication problem. There are also some robots use different sensors [24,29] from the using in this paper. All the above methods have contributed to the development of the field of autonomous robotic exploration, but their focus is different from that of this article. In summary, an excellent exploration often requires shorter time, low energy consumption (short paths), and low computational and memory consumption. SLAM and Planning have been widely studied, while there are few researches on the Search and Decision.

Problem formulation
The environment is modeled using occupancy grid map M which is a metric map. The basic idea of occupancy grid Map is to represent the environment map as uniformly distributed cells. According to the state of the cells, the Occupancy Grid Map can be divided into known cells S known and an unknown cells S unknown . Known cells are further divided into free cells S free and occupied cells S occupied which indicate obstacle. We represent the grid set of M as S = s(1), s(2), s(3) . . . s(N ), and every grid stores the value of occupied probability. The occupancy probability of each grid is expressed as information entropy: where C = 1, 2, 3 . . . , and P(s(c)) is the occupancy probability of grid s(c) in the occupancy grid map. Then the information entropy of the area to be explored W = w(1), w(2), w (3) . . . w(N ) can be expressed as: The higher the average entropy of the area w(i), the higher the uncertainty of the area's information which reflects the potential increase of map information. It represents the attractiveness of the mobile robot.
Usually, The robot decides which area to explore first according to the utility function value of the frontier points. The utility function needs to consider the navigation cost U N , the information gain U I [30] and decision-consistent cost U C (i). The navigation cost is the expected distance for the robot to reach the frontier points. To simplify the calculation, the navigation cost can be modeled as a Euclidean distance between the current position of the robot and the position of frontiers. The information gain is the average entropy of the area W to be explored. Consistency is evaluated using the methods presented in "Decision making with consistency". Each time the robot selects the frontier with the smallest utility value as the next target: arg min In the exploration process, the goal is to improve the exploration efficiency of the robot as much as possible when the map entropy is reduced to the expectations. Moreover, we hope to reduce the energy consumption of the robot, which means that the total path of the robot's walking is as small as possible: where T is the total exploration time. C M is the map entropy budget for completing the exploration and mapping.

Proposed method
We propose a technical framework for autonomous exploration and mapping in 2D environment. As shown in Fig. 1, the Search module randomly samples on the map constructed by the SLAM module. The searched frontier points are inputted to the Prune module which is a pre-processing before decision. The frontier points outputted by the Prune module will be used for the next Decision module. After making a decision, the robot will select the best forward frontier with the smallest value of the utility function as the target and send it to the Planning module.

Wave-features-based RRT search
The Search module aims to generate frontier points. In this process, we optimize RRT exploration algorithm [15]. RRT randomly samples in the environment and connects the sampling points with the surrounding nodes. The nodes and edges will form a tree-like structure. When the tree branch grows across the explored and unknown areas and does not pass the obstacle area, the node of the branch is marked as a frontier. Since the growth of the tree is random, and all branches in the sampling area grow with a uniform probability. As time goes by, more and more branches will grow in the explored area (free area), which will cause unnecessary memory consumption. To reduce unnecessary memory consumption, we are inspired by wave. For ease of description, the following concepts are defined and visualized in Fig. 2.

Definition 1 (Exploration area)
The area close to obstacles or borders, as shown in the green part. The tree that we expand when sampling should be mainly concentrated in this area.

Definition 2 (Safety area)
The explored area far away from obstacles and frontiers, as shown in the yellow part. We propose a suppression method in the safe area to inhibit the growth of RRT.

Definition 3 (Potential area)
The area between the exploration area and the safety area, as shown in the blue part. The trees in the potential area grow slower than the trees in the exploration area, but the inhibitory effect is weaker than that in the safe area.
It can be observed that the safe area is surrounded by the exploration area. The nodes in the safe area are parent of the nodes in the exploration area. A mathematical model is expected to describe the state of the nodes in the safe zone. The obstacles and frontiers are judged by whether the tree branches cross the known and unknown areas as they grow. The work in this paper is inspired by the interference phenomenon of waves: the nodes in the exploration area will produce an invisible wave-like energy, which is transmitted to the nodes in the safe area and producing an inhibitory effect. This is very similar to the phenomenon of wave interference: each wave propagates independently, and interference occurs only when interference conditions are met, and then the energy increases after interference. The transmission process of the water wave is shown in Fig. 3a.
For ease of description, we represent the RRT search process as a tree structure, as shown in Fig. 3b and define the following concepts for wave-features-based rapidly exploring random tree (WFRRT) algorithm: Definition 4 (Inhibition potential) The potential energy that inhibits the expansion of the node. The unified unit of measurement is E.

Definition 5 (Inhibition energy)
The energy that inhibits the expansion of the node, which is generated when the inhibitory potential is activated. The wave source node has inhibition potential E(O), which means that the growth of its root node may be inhibited.
Definition 7 (Interference node) The common parent node o i (g) for the suppression potential of multiple wave source nodes for successful interference(trigger activation mechanism), where g is the generation of the parent node. Interference nodes will produce inhibition energy, which suppresses the expansion.
Algorithm 1 lists the details of WFRRT exploration search algorithm. Compared with the RRT exploration algorithm, in addition to the node set V and the branch set E, it also maintains the wave source node set V source and the interference node set V int . The WFRRT exploration search algorithm proposed in this paper has the following properties: (1) Independent propagation of the inhibition potential We learn from the independent propagation principle of waves which is that waves excited by each wave source can propagate independently in the same medium. After they meet and separate, they are the same as before, and do not II. Wind energy increases because the force of the wind on the wave is superimposed. III. Swells will weaken linearly due to wind resistance and so on. IV. Wave interference will make the wave higher and higher. b I. Source node is the node whose child node hits an obstacle II. Potential inhibitory energy can be superimposed like wind energy. III. Inhibition energy is linear like swells. IV. Interference node will produce inhibitory energy when activated interfere with each other. The inhibition potential is regarded as a wave, nodes and branches as the propagation medium. The suppression potential has the characteristics of independent propagation: when a node and its child nodes touch an obstacle, the node will produce the inhibition potential E(o i ) = 1E and spread towards the parent node as the wave source node o i is marked as a wave source. In Fig. 3a, o 1 , o 2 , and o 3 are all wave source nodes whose inhibition potential end if 26: end while 27: return F spreads independently. But o is not a source node because it does not extend the child nodes that touch the obstacle.
(2) Superposition of the inhibition potential Inspired by the superposition principle of wave, in the area where waves overlap, if the particle of the medium participates in the vibration caused by these waves at the same time, the displacement of the particle is equal to the vector sum of the displacements caused when these waves propagate alone. As shown in Fig. 3a, the wind generates new waves, and the waves will be superimposed on each other, eventually making the waves higher and higher. We regard the inhibition potential as the waves caused by wind energy. When the same wave source node o i expands the child node, if the connection with the child node passes through an obstacle, it will produce new inhibitory potential that will be superimposed with the previously generated inhibitory potential (Line 9): (3) Interference of the inhibition potential We are inspired by the interference principle of wave which is that the superposition of two waves of the same frequency makes the vibration in some areas strengthened under certain conditions (Fig. 4). As shown in Fig. 3a, interference occurs between waves, making the waves higher. Similarly, the suppression potential also has interference phenomena. When the node triggers the activation mechanism (receives two wave source nodes separated by the same generation to suppress the potential), the node is marked as an interference node, and the inhibition energy will inhibit the expansion of the node and its child nodes is generated. As shown in Fig. 3b, o 3 (4) is the node where o 3 and o 2 interfere in the 4 generations backtracking, which is an interference node. It is worth noting that o 1 (2) is the point where o 1 goes back to the second generation, which is the same point where o 2 goes back to the first generation. Because of its different generation, this node does not cause interference. The user can adjust the generation g according to different situations. We recommend that the larger the exploration environment, the larger the g setting. The interference energy will affect the expansion probability of the interfering node and its child nodes (Line 10).
where P(o i (g)) is the expansion probability of the node, f int is the interference function, and k is a constant. When more than two interference potentials are transmitted to the interference node through the same generation, the interference function of the interference node is given by: Among them, E(O(g)) = E(o 1 (g)), E(o 2 (g)) . . ., and n is the number of the same backtracking generation from interference nodes. The inhibition function represents the degree to which the growth of the node is inhibited.

(4) Linear diffusion of inhibition energy
Waves spread to the shore to form swells. As the swells receive resistance from the shore and wind resistance during its propagation to the shore, they will become lower and lower. We regard the propagation of inhibition energy as the propagation of swells. When interference nodes have inhibition energy, the inhibition energy will spread to the child nodes of the interference node and get smaller and smaller (Line 11). As shown in Fig. 3b, o 3 (4) (also o 4 (4)) the interference node diffuses the interference energy in the direction of the child node respectively (i.e., the direction of o 3 and o 4 ). The inhibition energy is: where f int (0) is the inhibition function value of the interference node, E int (m) is the inhibition energy of the child node m generation from the interfering node. E is the debilitating energy, which represents the inhibition energy lost by each node that diffuses, and is a fixed value.

Prune based on mean shift
After search module, we will search out many frontier points to form a set F = f 1 , f 2 , f 3 . . . f n . With the time going, there are more and more frontier points, which will increase the computational cost of the forward optimal exploration point decision and cause unnecessary consumption of computing resources. To save decision-making costs, we propose a prune method based on mean shift (MSP) which is considered as the post-processing process of RRT search. The goal of pruning is to retain only one or a small number of frontier points on the same frontier (the junction of the explored area and the unknown area). The process is shown in Fig. 4. We firstly select the best point that has most exploratory potential among the frontier points at similar distances through clustering, and then discard the remaining points. In clustering, we use the Mean Shift algorithm [31] is used, which has a wide range of applications in target tracking, image segmentation, etc. We extract the frontier points from the map and express them in the form of points in a twodimensional image. For the selected points f i of the frontier point set F in the 2D grid map, the general form of the Mean Shift vector is: where S h refers to the circular clustering area with radius h. f j is the frontier point adjacent to the selected f i . K is the number of frontier points in the area. Because the frontier points are randomly sampled, the distance between the sample and the offset point is different, and the contribution of the offset to the mean offset vector is also different. Therefore, we add a kernel function to the general form of the mean shift vector: Among them, G( f i − f h i ) is the kernel function, and we use Epanechnikov, which is the Euclidean distance from the frontier point to the center of the region of interest (ROI). Through iteration, the mean shift will cause the S h clustering area to move continuously to the denser area until it converges. We keep the frontier points at the center of the S h area after convergence, and discard the other frontier points in the area. Finally we obtain the pruned frontier point set F new , and input this point set into the Decision module.

Decision making with consistency
The frontier points from the Prune module will be used for the Decision module. When making the decision, the frontier point with the smallest utility value will be selected as the next exploration point which is called the best forward point (BFP). The utility function usually includes navigation cost and information gain, but lacks overall plan. This kind of utility function will make the robot's exploration time local optimal rather than global optimal, which leads to repeated exploration. Furthermore, we hope that the robot has a global view to make the overall exploration more efficient which is similar to obtain a shorter global path. In this paper, we propose a new utility function based on FSOTSP is proposed to achieve the expectations.
The task of autonomous exploration and mapping requires us to explore the unknown area from the current position, and there is no need to return to the starting point after the task ends. Therefore, unlike the closed-loop TSP problem, we model the decision-making process of autonomous exploration and mapping is modeled as Fixed Start Open TSP (FSOTSP) problem [19]. The Lin-Kernighan-Helsgaun Heuristic (LKH) TSP solver [32] is used to solve the problem. The difference from the closed-loop TSP solution process is that the result of FSOTSP only contains the open-loop route of all exploration points. In this process, the current position of the robot is the starting point, and the frontier point is the node to be visited. We first calculate the cost matrix of the robot's navigation cost N for different frontier points in 2D space, and input this matrix into the solver of FSOTSP. We take the total cost of exploration is taken as the objective function and the genetic algorithm (GA) [33] is used to solve it. The cost of navigation is inversely proportional to the utility function evolved from Eq. (3): Among them, w is the weight coefficient. U I is the information gain, which is equal to H (w(i)) in Eq. (2). U N is the path cost. To simplify the calculation, we take the Euclidean distance between the location of the robot and the point to be explored. U FSOTSP is a penalty item that deviates from the FSOTSP decision sequence. This item will make the robot more inclined to choose the frontier point that conforms to the last global optimal decision sequence as the FBP. To achieve this goal, we need to quantify the degree of departure from the FSOTSP decision sequence is quantified, so we propose a decision difference matrix A to measure the degree of difference before and after the decision update: where F t i and F t j respectively represent the set of frontier points before and after a decision is made. To facilitate calculation, we express the 2D position information of each point in the set of frontier points in matrix [x, y] and [k, m]. Because the number of frontier points before and after the decision may change, we use 0 to fill in the matrix with the number of elements misplaced. As the distance from the robot's current position to the BFP continues to decrease, we hope the robot not to change its destination, because this may lead to longer repeated paths. Then an attenuation coefficient exp ω * Dist is introduced, where Dist is the Euclidean distance from the robot's current position to BFP. This coefficient will decrease as the destination is approached, and the penalty for deviating from the original decision sequence will keep increasing, making the robot more determined to explore this direction to the end. Finally, we refine the utility function (11) is refined into: 5 Every time the robot advances by S step length, the frontier points set F is updated and the Re-decision mechanism is triggered, which may cause the robot to change the BFP where f i is the BFP's position, and p is robot's current position. When heading to the BFP, the robot needs to make overall decisions based on new updated map information. However, real-time update of the decision will consume unnecessary computing resources. Therefore, a Re-decision Mechanism is proposed. As shown in Fig. 5, the decision is updated every time the robot advances by a step length S which is compared to odometer integration over time.

Experimental setup
We use Robot Operating System (ROS) as the running platform of algorithms, and build simulation environments using Gazebo for autonomous robotics exploration. The Turtlebot robot is used as the simulation experiment platform, and is equipped with a 180 • viewing angle Hokuyo laser, which has a visual range of 50 m. The localization of the mobile robot is done using Gmapping [34], while the navigation stack uses TEB planner [35]. We construct two simulation environments, which are home and office scene. The total size of the home simulation environment is 196 m 2 , including 5 rooms, corridors, and doors. The smallest room is only 7 m 2 while the largest room is 70 m 2 . The rooms present different shapes, and each room has only one door. The office simulation environment is 225 m 2 in total, including one lobby and 5 offices. When the robot is in the lobby, it often faces multiple objectives, which greatly tests the robustness of decision making method. All the sampling area is set to 20 m * 20 m. Note that the simulation is visualized using RVIZ.

Benchmark
In this subsection, we compare the performance of our proposed WFRRT against RRT exploration [15] and TMRRT [20] which is state-of-the-art. The focus of TMRRT is to improve two modules, which are map-merging module and temporal memory-based goal assigner module. The mapmerging module are used for multi-robotic exploration, therefore it is not considered in this article. Temporal memory-based goal assigner module allocates an adaptive time period for the robot to explore the BFP based on the robot's desired movement velocity. During this period, the robot does not change its target, which makes robot's decision-making more firm.
In addition, to verify the effectiveness of the main contributing Search and Decision modules, we design an ablation experiment. A simplified version of WFRRT without consistent decision making called WFRRT-Lite is also used for autonomous exploration. In this way, we can see the effect gain when different modules are stacked together.
For home and office scene, 10 groups of experiments are respectively done using RRT [15], TMRRT [20], WFRRT-Lite and WFRRT. Due to the randomness of sampling search, the experiment may have extreme results. We remove the best and the worst experiment results and remain 8 groups of experimental results are finally averaged, shown in Table 1.

Comparison and analysis
To comprehensively measure the methods proposed in this article, we compare the following indicators of RRT, TMRRT, WFRRT and WFRRT-Lite: (1) nodes number which reflects the consumption of memory and computing resource, (2) exploration time which directly represents the exploration efficiency, (3) path length which is usually used to measure energy consumption. Then we discuss in detail the reasons for the results in Table 1.
For the number of nodes, taking the home scene as an example, WFRRT-Lite's nodes are 22.32% less than that of RRT. The number of nodes in WFRRT has decreased by 28.20%. This means that WFRRT has only increased by 5.88% compared to the Lite version. It can be concluded that the region-biased search based on wave-features-based sampling has a greater contribution to reducing the number of nodes, while the consistent decision has a minor contribution. Compared with TMRRT, our method also exhibits excellent performance.
In the comparison of exploration time, WFRRT-Lite reduced the time by 10.53% in home scene, indicating that it has a contribution to saving time and improving exploration efficiency. It can be noted that TMRRT improves more than WFRRT-Lite because it has an advantage in making decisions about the BFP. Nevertheless, our complete version of WFRRT has stronger performance. The reason is that TMRRT requires the robot to reach the target point within a time period calculated from the ideal speed and distance to the target, without external interference. Therefore, robots For the path length of the robotic autonomous exploration, WFRRT-Lite reduces by 7.76% compared to RRT in the home scene. The improvement is due to WFRRT Lite affecting the update of frontiers. Compared to WFRRT, the reduction is not significant. This is because WFRRT has a greater impact on robot navigation decisions. It can be noticed that the trajectory of TMRRT is the longest, although the exploration time is not. This is because only one goal is set during the memory time, while other algorithms update frontiers in real-time. The shorter exploration time is due to reduced steering, which is not included in the path length.
By comparing the two scenarios, it can also be seen in Table 1 that our method achieves better results in the office scenario than in the home scenario. This is because each room in the home has only one door, while the office scene has multiple doors, so frontiers are distributed in various directions, which further tests the decision-making module. Moreover, it is more spacious and easy to grow more nodes during sampling search.
To display the results more intuitively, a few representative experimental visualization graphs are shown in Fig. 6. It can be seen that the node density of the improved random search tree is reduced, and the growth is obviously inhibited, which promotes the exploration efficiency. It also can be seen intuitively that the improved algorithm makes the robot more determined and less change to the BFP. Therefore, there are fewer repeated paths due to the change of the target.

Prune's necessity verification
We design experiments to evaluate the impact of the Prune module on exploration. 20 groups of experiments are done to compare "search-decision" (SD) framework with "searchprune-decision" (SPD) framework. One of the explorations is visualized as Fig. 7.
It can be seen from Table 2 that the number of nodes, exploration time, and path length have increased significantly after removing the Prune module, indicating that the Prune module is very important to our technical framework (Fig. 8). We can find that in the office scene, the disadvantages of the SD framework are more obvious than that in the home. This is because that there are more boundaries of explored and  unexplored areas in the office. The distribution of frontiers are more scattered. This brings difficulties to the decisionmaking. More swings and repeated paths will arise as shown in Fig. 7, leading to longer exploration time and more generated nodes. The Prune module can improve the efficiency of exploration and save energy losses caused by long-distance driving.

Real-world deployment
After testing WFRRT exploration effectiveness in the simulation, we deploy it in a real robotic exploration system using Kobuki mobile robot. For perception, RPLidar A1 laser range scanner is used which provides range measurements of up to 6 m with a coverage area of 360 • . We use Gmapping [34] for SLAM module. A star [36] and TEB [35] are used as the global and local planner. Intel NUC i7 is used as Kobuki's computing unit for processing all the modules. A personal laptop is used to monitor robot status and visualize the map by RVIZ.
We construct real-world experiments in two scenarios. One is the laboratory scene, as shown in Fig. 9. It is about 52 m 2 and is separated to different fields of view with cardboard boxes. The sampling area is set to 8 m * 10 m. The whole exploration process take around 153.38 s and the Kobuki robot travels 24.49 m. The final number of nodes is 4218. Another scene is an area enclosed by fences and cube boxes in a robot experimental venue. Another scene is an area enclosed byfences and cube boxes in a robot experimental venue, as shown in Fig. 8. The area is approximately Table 2 Experimental comparison between SPD (are the same as WFRRT whose results are shown in Table 1  To describe the experiment process in more detail, we record the video during the experiment and capture some pictures of first person view (FPV) and third person view (TPV) at some key positions, as shown in Fig. 8. Due to the larger and more spacious venue, we relax the dynamic restrictions. In this experiment, we set the maximum linear velocity of the mobile robot to 0.5 m/s, the maximum linear acceleration to 1.0 m/s 2 , the maximum angular velocity to 1.0 rad/s, and the maximum angular acceleration to 1.0 rad/s 2 . To adapt bigger environment, the map resolution is adjusted to 0.15. The robot move 19.22 m to complete the mapping. During this period, a total of 3594 nodes grow. The implementations of the WFRRT exploration show successful attempts in exploring unknown regions.

Conclusion
In this paper, we propose an autonomous robotic exploration system with region-biased sampling and consistent decision making, which is called WFRRT. The core is (1) rapidly random tree search method based on wave characteristics, which can suppress the growth of random trees in areas that have already been explored (called the safety zone in this article) to save memory and computational consumption, and (2) decision-making with allocation consistency. Consistency means that every time the robot makes a new decision on the order of exploration frontiers, it takes into account historical decision results and global optimality. It has two benefits. First, it can help the robot avoid frequent changes in direction leading to an increase in exploration time. Second, by encoding the states of robots and frontiers into the FSOTSP, the global optimality of access order can be maintained. In addition, we verify the necessity of the Prune module via experiments.
The proposed exploration method has been tested in simulation and real-world deployment. Results have shown it performs better than conventional RRT exploration and state-of-the-art TMRRT algorithms in terms of memory consumption, exploration duration and travel distance. In the future, we will conduct further research on the mathematical interpretability of the framework of autonomous exploration and motion planning. Another noteworthy aspect is that WFRRT can be extended to multi-robotic collaborative exploration systems. long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.