A Multisession SLAM Approach for RatSLAM

To successfully perform autonomous navigation, mobile agents must solve the Simultaneous Localization and Mapping (SLAM) problem. However, acquiring the map in a single SLAM session may not be possible, so the map may have to be built incrementally over multiple sessions. Two solutions can be considered for the multisession SLAM problem: (i) the robot localizes itself in the previously stored map before the new session starts; or (ii) it starts a new map and merges it with the map from the previous sessions. To date, only scenario (i) has been addressed by RatSLAM, an algorithm inspired by the navigation system in rodent brains. Therefore, this work proposes a multisession solution that handles both scenarios. A new mechanism merges the data from the RatSLAM structures of the current mapping session with those previously stored whenever a connection between the two paths is found. This approach was tested in four different scenarios, from controlled virtual environments to real-world environments, with two, three, and five sessions. The robot started each mapping session in an unfamiliar location, which corresponds to scenario (ii), but the approach also works if the agent starts in a known place, scenario (i). In all experiments, the entire map was consistently obtained. Furthermore, the proposed approach updates and enhances the previous session's map in real-world environments. Therefore, the proposed approach may serve as a multisession SLAM solution for the RatSLAM algorithm.


Introduction
Autonomous navigation is one of the fundamental problems in mobile robotics [1], in which autonomous robots travel in unknown static or dynamic environments trying to reach a given objective [2]. While objects in static environments keep their positions over time, dynamic environments may change their spatial features as objects such as vehicles, people, or other agents move around. In both cases, a relevant and central issue for mobile robot navigation is to acquire a spatial map of the environment while simultaneously localizing the robot on the map, which is known as the Simultaneous Localization and Mapping (SLAM) problem [3,4].
✉ Alexandre Oliveira, amunizdeoliveira@ucc.ie. Extended author information available on the last page of the article.

SLAM algorithms have been applied to several problems in real-world scenarios, such as search and rescue [7][8][9], autonomous exploration of hazardous areas [10], and area surveillance [11], among others. These are common problems involving activities that can be hostile or dangerous for human-based interventions.
This study is mainly interested in SLAM approaches inspired by the localization and navigation system of the mammalian brain [12][13][14][15], which includes specialized neurons [16]. For instance, place cells in the hippocampus activate when the animal is located in a circumscribed region of space [17]. Grid cells in the entorhinal cortex activate when the animal's location coincides with a vertex of a hexagonal grid overlaid on the environment [18]. Head direction cells in several brain regions activate when the animal's head is rotated in a distinct direction [19].
Computational models of the neural processes underlying these spatial representation cells have inspired RatSLAM [20][21][22]. In RatSLAM, visual sensing, robot movement, and pose cells estimate the robot's position and orientation while it explores the environment, similar to conjunctive grid and head direction cells. The visual representation and odometry input activate specific regions of a neural network that implements a three-dimensional Continuous Attractor Network (CAN). The combination of sensing and CAN activity generates an experience.
RatSLAM has been successfully used to solve indoor and outdoor SLAM tasks [13,[21][22][23][24][25][26]. However, RatSLAM requires improvements to deal with multisession SLAM problems, i.e., problems in which a robot maps the environment across several sessions. Multisession mapping is required when the robot cannot map the entire environment in one session, e.g., in large environments, or when the robot is shut down and later restarts mapping. It can also be applied when tracking fails due to occlusion in visual SLAM. In multisession SLAM approaches, the robot can start at a random position in the environment when a new session begins. This new position may not be stored on the previously created map. This problem has been defined as the kidnapped robot problem [2,27,28].
According to [28], two solutions for this problem can be considered: (i) the robot localizes itself on the previously created map before the new session starts, or (ii) the robot starts mapping the new location with its own reference coordinates; it may later merge this new map with the previous ones to generate a unified map. This work focuses mainly on the latter.
One of the first works in which RatSLAM performs long-term, large-scale autonomous mapping was developed by Milford and Wyeth [25]. In that work, the robot performed delivery tasks for two weeks in two physically separate environments. The agent could be shut down and placed in either environment without notification. When switching from one environment to the other for the first time, the agent attempts to locate itself on the previous map, and when unsuccessful, RatSLAM creates a new experience map for the new location. The new map shares the same RatSLAM structure but is topologically separate from the first map, i.e., RatSLAM keeps multiple local maps in the same structure. However, in the case of a single connection between them, odometry error propagation will deform the map due to a false metric representation. Therefore, strictly speaking, this does not solve the multisession scenario stated in (ii).
In recent work, a new navigation system has been developed based on the relationship between the hippocampus and episodic memory for mobile agents to perform multistep tasks [13]. The approach adopted RatSLAM as the hippocampus's spatial navigation mechanism and episodic memory to recover the steps of a given task. Moreover, the work proposed an improvement that enables the robot to localize itself globally in previously familiar areas. In their global localization module, the robot can be shut down and restarted in a random position of the environment, i.e., the kidnapped robot problem. Once the robot restarts and the global localization module is activated, RatSLAM uses the robot's visual input to check whether the new location has been saved in the stored map. As long as the robot does not localize itself, RatSLAM does not generate new experiences, as it usually does when fed a new visual input. Instead, it compares the visual information with the ones from the stored map. This solution is similar to scenario (i) mentioned above. Therefore, the approach proposed in [13] only works when the robot restarts inside the previously mapped area. Thus, the robot cannot explore distinct regions in multiple sessions in the global localization mode since no new experiences will be created in this new environment, and so it does not solve the multisession scenario (ii).
Although not fully encompassed by RatSLAM, multisession mapping has been addressed in many non-biological SLAM frameworks [28][29][30][31][32][33][34][35][36]. In particular, the multisession scenario (ii) is solved by visual/visual-inertial SLAM algorithms such as ORB-SLAM 3 [30] and SLAMM [31]. Both solutions propose starting a new map when the tracking is lost. This new map grows with new inputs, and additional operations continuously compare the current input of the "active" map with those from different stored maps. If new input information belongs to the "active" map, the algorithm performs loop closures; if it belongs to a different map, then both maps will be merged into a single one, which becomes the currently "active" map. This merging operation applies an alignment transformation between the two matched inputs and converts the information from the current map to the stored one.
The previous solutions either solve multisession mapping in an engineered fashion for non-biological SLAM [28][29][30][31][32][33][34][35][36], or only partially solve it for a neuro-inspired SLAM, as mentioned above [13,25]. However, to fully handle multisession RatSLAM, a solution must consider both the RatSLAM structures and their intrinsic relations. For example, to correctly reactivate a RatSLAM experience on the map, both the visual input and the CAN activity related to that experience must be correctly activated. This issue makes building multisession solutions for RatSLAM challenging, specifically for its network. This paper proposes a new solution that merges the RatSLAM Pose Cell Network, keeping the experience map consistent and performing loop closures correctly after the merge of RatSLAM structures, which has not been explored in other works found in the literature. By merging multisession maps for a bioinspired SLAM, RatSLAM, the approach improves mapping flexibility and allows robots to build maps incrementally. The method generates a separate partial map of the environment, built from the new experiences of a new session. Once an equivalence between a new experience and a previously stored one is found in a loaded map, a merge mechanism combines these maps (partial and loaded), adjusting the internal structures and yielding a new partial map, ready to continue the mapping in a new session.
If the new session starts at a known location on the loaded map, scenario (i), the agent activates the equivalent experience on the loaded map, and the mapping proceeds from this point. Similarly to ORB-SLAM 3 and SLAMM, this work establishes the matching of RatSLAM's Local View Cells as the basis for finding an alignment transformation among RatSLAM structures. In addition to the solution for (i), the proposed approach also solves (ii), since it merges the RatSLAM structures and ensures both the correct metric representation and the inner relations after the merge operation, which was not possible in other works found in the literature. This paper is organized as follows: Section 2 presents the theoretical basis of the RatSLAM algorithm. Section 3 describes the methodology of this work. Next, Section 4 presents the experimental setup, that is, the scenarios, robots, parameters, and other experimental details. In Section 5, the results are presented and discussed. Finally, the conclusions of this work and suggestions for future work are presented in Section 6.

RatSLAM foundations
RatSLAM was developed in 2004 by Milford, Wyeth, and Prasser [20] for general real-world localization and mapping on mobile robots using a vision system as the main input sensor. Figure 1 shows the updated RatSLAM architecture [21], composed of the Pose Cells Network, Local View Cells, and Experience Map modules, fed by the Robot Vision System and Self Motion Cues. The following subsections describe the details of the Pose Cells Network, Local View Cells, and Experience Map, and the role of the RatSLAM parameters in its function.

Pose Cells Network
The Pose Cells Network (PCN), denoted by $P$, is a CAN arranged in a three-dimensional structure (Fig. 2) that represents the position $(x, y)$ and orientation $\theta$ of the robot. Excitatory and inhibitory connections link the CAN units. In addition, the connections among units wrap across all six faces of the PCN (e.g., red arrows in Fig. 2). With these wrapped connections, the network can work beyond its fixed size, i.e., it can represent environments with an area larger than the area encoded by the PCN.
Furthermore, local excitatory and global inhibitory connectivity among cells provides inputs to these units and changes their activity. Over time, this dynamics allows the CAN to form a cluster of activated cells, known as energy packet or activity packet [21] (Fig. 2, blue cubes). In addition, the center of the energy packet (Fig. 2, darker blue cubes) is the best estimate of the robot's pose in the environment.
Local excitatory connections increase the activity of neighboring cells, whereas global inhibitory connections eliminate the activity of small clusters in the PCN. The excitatory and inhibitory connectivity is described by the distribution $\varepsilon$ [37]:

$$\varepsilon_{a,b,c} = e^{-(a^2+b^2)/k_p}\, e^{-c^2/k_d},$$

where $k_p$ and $k_d$ are the variance constants for place and direction, respectively [21]. The parameters $a$, $b$, and $c$ represent the distances between two cells' coordinates, considering the network's periodic boundary conditions. The distance between two cells with coordinates $(x, y, \theta)$ and $(i, j, k)$, respectively, is given by:

$$a = (x - i) \bmod n_x,\quad b = (y - j) \bmod n_y,\quad c = (\theta - k) \bmod n_\theta,$$

where mod is the modulo operator. The parameters $n_x$, $n_y$, and $n_\theta$ indicate the network size in terms of the number of cells along the $X$, $Y$, and $\theta$ dimensions. The change of activity in a cell is given by [37]:

$$\Delta P_{x,y,\theta} = \sum_{i=1}^{n_x}\sum_{j=1}^{n_y}\sum_{k=1}^{n_\theta} P_{i,j,k}\,\varepsilon_{a,b,c} - \varphi,$$

where $\varphi$ is the global inhibition. The final step in the network update limits the activation levels in $P$ to non-negative values and normalizes the total activation to one [37]. In addition, the direction in which the energy packet moves is driven by odometry information, which represents the robot's movement, and by energy injection from the templates stored in the LVC, which might move the activity packet to a different location in the PCN [21].
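As a concrete (simplified) sketch of these dynamics, the snippet below updates a toy pose cell network: the wrapped excitatory kernel $\varepsilon$ is applied by circular convolution, a global inhibition $\varphi$ is subtracted, activity is clamped to non-negative values, and the total is normalized to one. The constants and the FFT-based convolution are illustrative choices, not the reference implementation.

```python
import numpy as np

def pcn_update(P, k_p=7.0, k_d=7.0, phi=0.00002):
    """One attractor-dynamics step on a pose cell network P of shape
    (n_x, n_y, n_theta), with connections wrapping across all faces."""
    n_x, n_y, n_t = P.shape
    # Shortest wrapped offsets along each axis (periodic boundary conditions)
    a = np.arange(n_x); a = np.minimum(a, n_x - a)
    b = np.arange(n_y); b = np.minimum(b, n_y - b)
    c = np.arange(n_t); c = np.minimum(c, n_t - c)
    # Excitatory weight profile epsilon over the wrapped offsets
    eps = (np.exp(-(a[:, None, None]**2 + b[None, :, None]**2) / k_p)
           * np.exp(-(c[None, None, :]**2) / k_d))
    # Circular convolution of activity with the kernel (FFT realises the wrap)
    delta = np.real(np.fft.ifftn(np.fft.fftn(P) * np.fft.fftn(eps)))
    P = P + delta - phi          # local excitation plus global inhibition
    P = np.maximum(P, 0.0)       # clamp activation to non-negative values
    return P / P.sum()           # normalise total activation to one
```

After one step, an isolated activity spike stays dominant while spreading to its neighbours, which is the packet-forming behaviour described above.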

Local View Cells -LVC
The Local View Cells, $V$, form an array of templates, denoted by $V_i$. Each template represents a distinct visual scene of the environment captured by the Robot Vision System. When a template is created, a short learning excitatory link $\beta$ is established between it and the center of the dominant activity packet in the PCN [21]. This $\beta$ link associates the distinct view from that location with the robot's estimated pose encoded by the PCN. The link is given by [37]:

$$\beta_{i,x,y,\theta}^{t+1} = \max\!\left(\beta_{i,x,y,\theta}^{t},\; \lambda V_i P_{x,y,\theta}\right),$$

where $\lambda$ is the learning rate, and the index $i$ denotes which template $V_i$ is activated. The $x$, $y$, and $\theta$ are the coordinates of the center of the dominant energy packet in the PCN. Note that $\beta_{i,x,y,\theta}^{t+1}$ only creates a new link with value $\lambda V_i P_{x,y,\theta}$ if a previous association was not already learnt, i.e., if $\beta_{i,x,y,\theta}^{t}$ does not exist yet. Moreover, when a loop closure occurs, i.e., the robot returns to a location it has already mapped, a consecutive sequence of previously stored templates is activated in the correct order. When the template $V_i$ is activated, it injects energy into the PCN at the $(x, y, \theta)$ coordinates via its learnt link (4) [21]:

$$\Delta P_{x,y,\theta} = \delta \sum_{i} V_i\, \beta_{i,x,y,\theta},$$

where $\delta$ is a constant determining the influence of visual features on the estimated robot's pose [21]. When the PCN receives a constant injection of activity, this changes the dominant energy packet and, consequently, causes the re-localization of the robot.
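The $\beta$-link learning and the energy injection can be sketched as follows. The sparse dictionary for $\beta$, the max-style update, and the constants $\lambda$ and $\delta$ are illustrative assumptions, not OpenRatSLAM's actual data structures.

```python
import numpy as np

def learn_link(beta, i, center, V_act, P_act, lam=0.36):
    """Strengthen the excitatory link between template i and the PCN cell at
    `center` = (x, y, theta), keeping the strongest association learnt so far."""
    key = (i,) + center
    beta[key] = max(beta.get(key, 0.0), lam * V_act * P_act)
    return beta

def inject_energy(P, beta, active_templates, delta=0.1):
    """Inject activity into the PCN through the learnt links of the currently
    active templates, shifting the dominant packet on loop closure."""
    for (i, x, y, t), w in beta.items():
        if i in active_templates:
            P[x, y, t] += delta * w
    return P
```

A weaker re-activation of the same view does not overwrite an existing, stronger link, which mirrors the max in the learning rule above.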

Experience Map -EM
The Experience Map (EM) is a graph combining pose cell and template information to estimate robot poses in a two-dimensional map. A node in the EM is defined as a 3-tuple [21]:

$$e_i = \{P_i, V_i, p_i\},$$

where $P_i$ and $V_i$ are the activity states in the PCN and LVC, respectively, at the time the experience node is created, and $p_i$ is the robot pose in the experience map space. A new experience node is created when the states $P_i$ and $V_i$ do not closely match the state of any existing experience. A link $l_{i,j}$ is created and saved when the robot moves from a previous experience $e_i$ to the new experience $e_j$ [21]:

$$l_{i,j} = \{p_{i,j}, t_{i,j}\},$$

where $p_{i,j}$ is the relative odometry pose between the two experience nodes, and $t_{i,j}$ is the time taken by the robot to move between them. The temporal information can be used to perform path planning from a specific experience to a desired goal. As exemplified in [21], Dijkstra's algorithm can be used to find the shortest path between two nodes. As long as there is no loop closure, the EM is based on the robot's odometry. A loop closure triggers the robot's relocalization in the map and distributes the odometry error throughout the graph using a relaxation algorithm, changing experience poses. Changes in experience locations are obtained as follows [21]:

$$\Delta p_i = \alpha \left[ \sum_{j=1}^{N_f} \left(p_j - p_i - p_{i,j}\right) + \sum_{k=1}^{N_t} \left(p_k - p_i + p_{k,i}\right) \right],$$

where $\alpha$ is a correction rate constant set to 0.5, $N_f$ is the number of links from the experience node $e_i$ to other experiences, and $N_t$ is the number of links from other experiences to the experience $e_i$.
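One pass of this graph relaxation can be sketched in a few lines. Poses are reduced to 2-D positions for brevity (the heading term is handled analogously), and the graph representation is an illustrative choice:

```python
def relax(poses, links, alpha=0.5):
    """One pass of experience-map graph relaxation.

    poses: {node: (x, y)}; links: list of (i, j, (dx, dy)), where (dx, dy) is
    the stored odometry offset from experience i to experience j."""
    change = {n: [0.0, 0.0] for n in poses}
    for i, j, (dx, dy) in links:
        # Residual between where the link says j should be and where j is
        ex = poses[j][0] - poses[i][0] - dx
        ey = poses[j][1] - poses[i][1] - dy
        change[i][0] += alpha * ex; change[i][1] += alpha * ey
        change[j][0] -= alpha * ex; change[j][1] -= alpha * ey
    return {n: (poses[n][0] + c[0], poses[n][1] + c[1])
            for n, c in change.items()}
```

Each pass pulls linked nodes toward agreement with their stored odometry offsets, distributing loop-closure error across the graph.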

Role of RatSLAM Parameters
RatSLAM has parameters responsible for the algorithm's proper mapping performance. The Local View parameters govern template operations, such as comparison, template size, and when new templates need to be created. The Pose Cells parameters influence the dynamics and dimensions of the network, e.g., the values for local excitation, global inhibition, and energy injection when a scene is revisited. Finally, the Experience Map parameters define the number of executions of the graph relaxation algorithm (8).
In addition, the Self Motion Cues module (Fig. 1) provides odometry information computed from the Robot Vision System. In this case, Visual Odometry parameters are set to correctly determine the robot's translational and rotational velocities. However, the Visual Odometry parameters can be ignored if this information comes from a separate source, such as wheel encoders.

Methodology
At the end of each session of a multisession mapping, the robot generates a map of the environment using RatSLAM and stores it. In the next mapping session(s), this map is loaded by the robot (loaded map). The robot is assumed to receive no information about its position on the loaded map. In the new session, the robot builds a new map called partial map. If a common path that links the partial and the loaded maps is found, they are merged into a single RatSLAM structure, i.e., the merging results in a single LVC, PCN, and EM to represent the environment. This process is described in detail below.
It is assumed that the partial map has more than one experience before the merge procedure, i.e., the agent starts the new session in a nonmapped area of the environment. If the robot starts at a known location of the loaded map, the algorithm could activate the equivalent experience on the loaded map, and the mapping proceeds from this experience.
The merge process is triggered when the agent creates a template in the partial map that matches a previously stored template in the loaded map (Fig. 3), i.e., the agent has entered a location that is represented by both maps. To compare templates, the same process that compares templates in RatSLAM is employed, i.e., a new template of the partial map is compared to all templates of the loaded map. When a match occurs, both the loaded map and the partial map consist of RatSLAM's structures as shown in Fig. 3, where $V$, $P$, $E$, $lm$, and $pm$ stand for Local View Cells, Pose Cell Network, Experience Map, loaded map, and partial map, respectively. The new template of the partial map matches a template from the loaded map, $V^{pm}_{n_{vp}} = V^{lm}_{u}$ (Fig. 3), where $n_{vp}$ is the number of templates in $V^{pm}$. In addition, the views $V^{pm}_{n_{vp}}$ and $V^{lm}_{u}$ are linked to the centers of the activity packets, displayed as the green cubes, $P^{pm}_{z}$ and $P^{lm}_{u}$, respectively. However, even though the templates represent the same place, their activity packets may activate different coordinates $(x, y, \theta)$ in the respective PCNs, $P^{pm}$ and $P^{lm}$. Similarly, in the Experience Map, the templates and activity packets are allocated to experiences $e^{pm}_{z}$ and $e^{lm}_{u}$ (Fig. 3, green nodes), where these experiences may have different pose coordinates.
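The matching step can be illustrated with a minimal sketch. RatSLAM compares intensity profiles with a shifted sum-of-absolute-differences; here a plain mean absolute difference stands in, and the `threshold` value is illustrative, not a parameter from the paper:

```python
import numpy as np

def find_match(new_template, loaded_templates, threshold=0.04):
    """Compare a new partial-map template against every loaded-map template.

    Returns the index of the best match below `threshold`, or None when the
    view is novel (no merge is triggered)."""
    best_i, best_err = None, threshold
    for i, stored in enumerate(loaded_templates):
        err = float(np.mean(np.abs(new_template - stored)))
        if err < best_err:
            best_i, best_err = i, err
    return best_i
```

A returned index corresponds to the matched template $V^{lm}_{u}$ that triggers the merge; `None` means the agent keeps extending the partial map.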
A relevant piece of information for merging is the robot's position just before encountering the matching view in the partial map. At merging, the structures of the partial RatSLAM map and their relations are inserted into the loaded map in four operations (Fig. 4): (i) the LVC are merged (Fig. 4a); (ii) the matching view in the partial map, $V^{pm}_{n_{vp}}$, is linked to a new PCN activity packet location, which corresponds to a shift (Fig. 4b); (iii) the associations between all LVC and PCN are shifted by the same amount (Fig. 4c); and (iv) the EMs are merged (Fig. 4d). In the following subsections, these operations are described in more detail. At the end of the merge procedure, a single RatSLAM structure is obtained (Fig. 4e) and used in the remainder of the current mapping session.

Merging Local View Cells
The merge of the LVC aims to join all templates from the loaded and partial maps into a single LVC structure. The templates of the partial map are concatenated into the LVC of the loaded map, except for the last acquired template, because it is already present in the loaded structure:

$$V^{lm} = \{V^{lm}_{1}, \ldots, V^{lm}_{n_{vl}}, V^{pm}_{1}, \ldots, V^{pm}_{n_{vp}-1}\},$$

where $n_{vl}$ is the number of templates in $V^{lm}$.
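This concatenation, dropping the matching (last) partial-map template, is a one-liner in practice; the list representation of the LVC array is an illustrative assumption:

```python
def merge_local_view_cells(V_lm, V_pm):
    """Append the partial map's templates to the loaded map's LVC, dropping
    the last partial template (it duplicates the matched loaded template)."""
    return V_lm + V_pm[:-1]
```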
In the last stage of the LVC merge, the template that represents the robot's current view, $V^{lm}_{u}$, is activated in $V^{lm}$ (Fig. 4, green sphere).

Pose Cells Network Activation
After merging the LVC, the activated template is $V^{lm}_{u}$, which needs to be associated with the correct units in the PCN. This injection of activity should occur at the coordinates of $P^{lm}_{u}$ because it is associated with $V^{lm}_{u}$. Before the merge procedure, the last activated packet in the partial map was $P^{pm}_{z}$ (Fig. 4b, light green cube). Thus, a change of activity from $P^{pm}_{z}$ to $P^{lm}_{u}$ is required and can be seen as a shift of activity towards $P^{lm}_{u}$ (Fig. 4b, green arrow). The difference of coordinates between $P^{pm}_{z}$ and $P^{lm}_{u}$ is defined as:

$$\Delta x = x_l - x_p,\quad \Delta y = y_l - y_p,\quad \Delta\theta = \theta_l - \theta_p,$$

where $(x_p, y_p, \theta_p)$ and $(x_l, y_l, \theta_l)$ are the coordinates of the activity packets $P^{pm}_{z}$ and $P^{lm}_{u}$, respectively.

Shifting Association between LVC and PCN
All associations between the LVC and PCN in the partial map must be updated to be consistent with the associations in the loaded map. As an example, consider the penultimate view in the partial map: $P^{pm}_{k}$ must be shifted to keep its previous distance $d$ to $P^{lm}_{u}$ (Fig. 4c, red arrow). The transformation function $f()$ that shifts the activity packet $P^{pm}_{k}$ to $P^{lm}_{k}$ is defined as:

$$f(x_u, y_u, \theta_u) = \big((x_u + \Delta x) \bmod n_x,\; (y_u + \Delta y) \bmod n_y,\; (\theta_u + \Delta\theta) \bmod n_\theta\big) = (x_s, y_s, \theta_s),$$

where $(x_u, y_u, \theta_u)$ are the coordinates of $P^{pm}_{k}$, and $(x_s, y_s, \theta_s)$ are the shifted coordinates of $P^{lm}_{k}$ in the loaded map. Once this shift operation is carried out, the excitatory links $\beta^{lm}$ must be updated as follows:

$$\beta^{lm}_{i+n_{vl},\, x_s, y_s, \theta_s} = \beta^{pm}_{i,\, x, y, \theta},$$

where $(x, y, \theta)$ are the coordinates of the energy packets from $P^{pm}$.

[Fig. 4 caption: The four merge operations. a) All templates of $V^{pm}$ (Fig. 3) are inserted into the LVC structure $V^{lm}$, except the last template of the partial map (green circle of $V^{pm}$ in Fig. 3), because this template is equivalent to $V^{lm}_{u}$, which becomes the current active template at the end of step (i). b) The activity packet is associated with the template active at the end of (i), $P^{lm}_{u}$ (loaded map in Fig. 3). Before the merge procedure, the activity packet encoding the robot's pose in the partial map was $P^{pm}_{z}$; after the merge, the energy packet encoding the robot's pose in the PCN is $P^{lm}_{u}$. This change of activation can be seen as a shift of coordinates in the Pose Cells $P^{lm}$, represented by the green arrow from $P^{pm}_{z}$ to $P^{lm}_{u}$. c) The coordinates of the associations between LVC and PCN of the partial map are changed when they are inserted into the loaded map. As illustrated, the energy packet of the partial map, $P^{pm}_{k}$, is shifted (red arrow) to keep the same distance $d$ (dashed line) to the active energy packet $P^{lm}_{u}$ as it had to $P^{pm}_{z}$ in the partial map (Fig. 3). Note that the shift applied to $P^{pm}_{k}$ is the same computed in (ii). d) The experiences from the partial map's EM are inserted into the EM of the loaded map, as shown by the dotted red circles. A function that transforms the coordinates of $e^{pm}_{z}$ (green node) to the coordinates of its equivalent experience $e^{lm}_{u}$ (green node) in $E^{lm}$ is computed, and the shift is applied to all experiences of $E^{pm}$, except $e^{pm}_{z}$. A final link between $e^{pm}_{k}$ and $e^{lm}_{u}$ connects the inserted experiences from $E^{pm}$ to $E^{lm}$. e) The final RatSLAM structure after the merge.]
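The shift of the template-to-PCN associations can be sketched as a re-indexing pass: each associated packet coordinate is offset by the deltas computed from the matched packets, wrapping at the network boundaries, and template indices are offset by the loaded map's template count. The dictionary representation of $\beta$ is an illustrative assumption:

```python
def shift_pose_links(beta_pm, dx, dy, dt, n_x, n_y, n_t, n_vl):
    """Re-index the partial map's template-to-PCN links into the loaded map.

    beta_pm: {(i, x, y, theta): weight}. Coordinates are shifted by the
    (dx, dy, dt) deltas, wrapped at the PCN boundaries; template indices are
    offset by n_vl, the number of loaded-map templates."""
    beta_lm = {}
    for (i, x, y, th), w in beta_pm.items():
        key = (i + n_vl,
               (x + dx) % n_x,
               (y + dy) % n_y,
               (th + dt) % n_t)
        beta_lm[key] = w
    return beta_lm
```

Because every link is shifted by the same deltas, the relative distances among the partial map's packets, such as the distance $d$ above, are preserved.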

Merging Experience Maps
Before the experiences of $E^{pm}$ can be merged into $E^{lm}$, the experience poses in the partial map have to be made consistent with those in the loaded map. A transformation $t$ maps the pose of $e^{pm}_{z}$ onto the pose of its equivalent experience $e^{lm}_{u}$, i.e., $t(x_p, y_p, \theta_p) = (x_l, y_l, \theta_l)$, and is defined as follows:

$$t(x, y, \theta) = \big(H(x, y) + (\Delta x, \Delta y),\; \theta + \Delta\theta\big),$$

where $\Delta\theta = \theta_l - \theta_p$ and $H(x, y)$ is defined by:

$$H(x, y) = \begin{bmatrix} \cos\Delta\theta & -\sin\Delta\theta \\ \sin\Delta\theta & \cos\Delta\theta \end{bmatrix} \begin{bmatrix} x - x_p \\ y - y_p \end{bmatrix} + \begin{bmatrix} x_p \\ y_p \end{bmatrix},$$

where $\Delta x = x_l - x_p$ and $\Delta y = y_l - y_p$. Once the function $t$ is defined, it is applied to the poses of the $E^{pm}$ experiences so that they can be inserted into the merged $E^{lm}$, which is carried out through the operation $T$, defined below. In addition, the operation $T$ also rebinds the new experiences in $E^{lm}$ to their corresponding templates and energy packets in the loaded map's structures $V^{lm}$ and $P^{lm}$, according to operations (i) and (iii).
$$e^{lm}_{n_{el}+i} = T(e^{pm}_{i}) = \{P^{lm}, V^{lm}, t(p^{pm}_{i})\}, \quad i = 1, \ldots, n_{ep} - 1,$$

where $e^{pm}_{i}$ is an experience from $E^{pm}$, and $n_{el}$ and $n_{ep}$ are the numbers of experiences in the $E^{lm}$ and $E^{pm}$ maps, respectively. It is important to mention that $e^{pm}_{z}$ is not added to $E^{lm}$ because it is already equivalent to $e^{lm}_{u}$. The new experiences are then inserted into the merged $E^{lm}$.
The last step, (iv), is to connect the inserted nodes in $E^{lm}$ through links, similar to (7), with indices $i = 1, \ldots, n_{ep} - 2$ and $j = 2, \ldots, n_{ep} - 1$. Since $e^{pm}_{z}$ is not inserted, its equivalent, $e^{lm}_{u}$, has to be connected to $e^{pm}_{k}$ through (17), with $i + n_{el}$ taken as $k$ and $j + n_{el}$ as $u$.
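The pose transformation $t$ amounts to a 2-D rigid alignment built from the matched pair of experiences: rotate by the heading difference about the matched partial-map pose, then translate onto the loaded-map pose. A minimal sketch, assuming poses are plain $(x, y, \theta)$ tuples:

```python
import math

def make_transform(p_pm, p_lm):
    """Build t(), carrying partial-map experience poses into loaded-map
    coordinates, from the matched pair p_pm = (x_p, y_p, th_p) and
    p_lm = (x_l, y_l, th_l)."""
    x_p, y_p, th_p = p_pm
    x_l, y_l, th_l = p_lm
    dth = th_l - th_p
    c, s = math.cos(dth), math.sin(dth)
    def t(x, y, th):
        # Rotate about the matched partial-map pose, then translate onto
        # the matched loaded-map pose
        rx = c * (x - x_p) - s * (y - y_p)
        ry = s * (x - x_p) + c * (y - y_p)
        return (rx + x_l, ry + y_l, th + dth)
    return t
```

By construction, the matched pose itself maps exactly onto its loaded-map equivalent, and all other partial-map poses keep their relative geometry.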

Merge Algorithm
The merge algorithm is summarized in Algorithm 1, which highlights the equations that make up each step.

Algorithm 1 Merge
1: procedure MERGE($V^{lm}$, $V^{pm}$, $P^{lm}$, $P^{pm}$, $E^{lm}$, $E^{pm}$)
2:     $V^{lm} \leftarrow$ MergingLocalViewCells($V^{lm}$, $V^{pm}$)    ▷ Eq. 9
3:     $\Delta x, \Delta y, \Delta\theta \leftarrow$ ComputeDeltasPCN($P^{lm}_{u}$, $P^{pm}_{z}$)    ▷ Eq. 10
4:     $\beta^{lm} \leftarrow$ ShiftingAssociations($\beta^{pm}$, $\Delta x$, $\Delta y$, $\Delta\theta$)    ▷ Eqs. 11-12
5:     $E^{lm} \leftarrow$ MergingExperienceMaps($E^{lm}$, $E^{pm}$)    ▷ Eqs. 13-17
6: end procedure

The complexity analysis can consider the match and the merge routines separately. The merge process is triggered by a match between a new template of the partial map, whose LVC holds $n_{vp}$ templates, and a template of the loaded map, whose LVC holds $n_{vl}$ templates accumulated over the previous sessions $\kappa$. In summary, the complexity of the merge algorithm does not exceed linear behavior with respect to the Local View Cells (in both the partial and loaded maps) and Experience Map sizes.

Experimental Setup
Four different environments are explored in this study. For each environment, datasets of video streams or image frames were captured during tours performed by real and virtual robots. Briefly, the four environment datasets are: i) videos generated by a virtual robot on an ellipse-shaped tour, named the Virtual Tour dataset; ii) videos generated by a real robotic platform during a tour inside a research lab, called the Lab Tour dataset; iii) frames extracted from the "iRat" Australian dataset; and iv) frames extracted from the New College Vision and Laser dataset. The latter two were used to validate the OpenRatSLAM implementation [21]. Each environment is detailed in the following subsections.
To evaluate the proposed multisession approach, single-session and multisession maps are compared using the Iterative Closest Point (ICP) algorithm [38]. ICP solves the registration problem by finding a transformation matrix that brings the two maps as close together as possible over iterated executions. As a criterion for evaluating the transformation matrix, ICP computes the root mean square error (RMSE) over the corresponding node distances in both maps. ICP stops iterating when either the RMSE falls below a defined threshold or the algorithm reaches the maximum number of iterations. Therefore, by returning the RMSE over the distances of the corresponding nodes, ICP provides a single value that evaluates the overall trajectories of the multisession and single-session maps, with an RMSE of 0 being a perfect match between them. This article uses the Libicp implementation [39].
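The metric can be illustrated with a single rigid-alignment step (the inner loop of an ICP iteration, here with node correspondences assumed known) followed by the node-wise RMSE. This is a simplified stand-in for Libicp, not its API:

```python
import numpy as np

def rmse_after_alignment(A, B):
    """Align point set A to B with a closed-form 2-D rigid fit (the
    Kabsch/Procrustes solution used inside each ICP iteration) and return
    the node-wise RMSE. A, B: (N, 2) arrays with known correspondence."""
    ca, cb = A.mean(0), B.mean(0)
    H = (A - ca).T @ (B - cb)          # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T       # optimal rotation
    A_aligned = (A - ca) @ R.T + cb
    return float(np.sqrt(np.mean(np.sum((A_aligned - B) ** 2, axis=1))))
```

Two trajectories that differ only by a rigid motion score an RMSE near zero, matching the "perfect match" criterion described above.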
The RatSLAM algorithm requires a specific set of parameter values for each environment [21,24]. These parameters are required in all the main structures, i.e., LVC, PCN, and EM. The parameter values for each environment are displayed in Table 1. In both the "iRat" and New College datasets, odometry information is obtained from the robot's wheel encoders, so the Visual Odometry parameters were not used.

Virtual Tour Experiment
In the Virtual Tour experiment, the robot takes an ellipse-shaped tour in a virtual environment (Fig. 5). Both the environment and the robot were modeled in a simulation framework developed to study biomimetic models of rodent behavior in spatial navigation learning tasks [40].

[Fig. 5 caption: Ellipse path travelled by the agent. In the first session, the agent maps the 3/4-turn path represented by the blue line, starting and ending at the blue cross and blue diamond, respectively; the map generated in the first session is loaded in the second session. The robot travels almost three laps (yellow lines) in the video stream used for the second session, starting and ending at the yellow cross and diamond symbols, respectively. The partial map of the second session is the path that starts at the yellow cross and ends just before the blue cross. The merge between the partial map of the second session and the loaded map occurs when the robot reaches the start point of the first session (blue cross) in the second session (red circle). The merge procedure results in a single RatSLAM structure that holds both the loaded and partial maps. After the merge, there is a non-mapped section of the path (dashed yellow line), which the robot maps when it moves from the blue diamond to the yellow cross.]

This experiment is divided into two mapping sessions. First, the robot performs a 3/4 lap through the environment (Fig. 5b, blue line). When the robot reaches the end of the first session path (blue diamond), the experience map is saved. In the second session, the virtual agent loads this map. Since it is positioned at a novel location (Fig. 5b, yellow cross), the agent starts mapping with a new experience map (partial map). The robot performs almost three complete laps (yellow line) and eventually travels the same path as it did in the first session.
The merge between the partial map in the second session and the loaded map of the first session occurs when the agent first encounters a view that was stored in the loaded map (Fig. 5b, red circle). After the merge, the virtual robot continues the mapping, using the merged RatSLAM structure, until the end of the second session.
Note that the video frames that generated the map in the first session are part of the video frames used in the second session. Specifically, the video of the first session is embedded in the video used in the second session. Thus, when the agent passes the merge point in the second session, the following frames are the same ones collected in the first session. Therefore, no new experiences should be created by RatSLAM when the agent moves through the path that was covered by the agent in the first session (blue path).
Moreover, after merging, there is a section of the path between the end of the first session and the start of the second session (dashed yellow line between the blue diamond and the yellow cross). This section has not been mapped in either session. The agent maps it in the second session to fully close the loop of the ellipse path.

Lab Tour Experiment
In the Lab Tour experiment, a research laboratory (Fig. 6a) is mapped by a real robot platform, called RoboDeck (Fig. 6b). The robot platform is equipped with a monocular camera with 640×480 resolution. Since only the robot's camera was used to create the video streams used in this experiment, the odometry information was extracted by visual odometry. The RoboDeck obtained the video streams of this experiment in a manually driven tour of the room. The robot trajectory is a rectangular path, resembling a figure eight (Fig. 6c).
In the first session, the robot performs two counterclockwise laps along the small rectangle (Fig. 6c, blue line). The robot starts and ends at the blue cross and diamond locations, respectively. In the second session, the robot performs almost two counterclockwise laps around the large rectangle (Fig. 6c, yellow line), starting and ending at the locations marked by the yellow cross and diamond, respectively. The merge between the maps is expected where the two paths first meet (Fig. 6c, red circle). After the merge, a final loop closure is expected when the robot returns to the start point of the second session (yellow cross).

iRat Experiment
The iRat 2011 Australia dataset was used to validate the multisession approach with more than two mapping sessions, focusing on consistency across more than two sessions. The dataset was obtained while a small mobile robot called iRat, similar in size and shape to a large rodent [21], explored an outdoor road tour (Fig. 7a). As its main sensors, the iRat was equipped with an overhead camera and dead reckoning sensors to provide images and odometry data.
The complete dataset comprises a video approximately 16 minutes long [21], during which the robot explores the environment, moving between the roads without any fixed pattern. However, only part of the dataset's frames were used in the experiment, corresponding to five mapping sessions. The first four sessions are internal laps of the environment. The last session corresponds to the external lap (Fig. 7b) and therefore merges the four internal maps into a single one as it travels through their common areas.
As mentioned, the map from one mapping session is transferred to the next as the loaded map. Once the partial map of the current session overlaps with the loaded map, the two can be merged. This process is then repeated in the next mapping session. Figure 7c illustrates the merge points between the fifth and first sessions and between the fifth and second sessions (red circles).
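The session-to-session handover described above can be sketched as a simple loop: map, check the merge condition against the loaded map, merge on overlap, and carry the result into the next session. All names below (`Map`, `overlaps`, `merge`, `run_sessions`) are illustrative stand-ins, not the paper's structures, and the overlap test stands in for the real merge condition (a visual template shared by both maps).

```python
# Hypothetical sketch of the multisession pipeline: the map saved at the
# end of one session becomes the "loaded map" of the next, and the partial
# map under construction is merged into it once they overlap.

class Map:
    def __init__(self, nodes=None):
        self.nodes = list(nodes or [])

    def overlaps(self, other):
        # Stand-in for the real merge condition: a template of the
        # partial map matching a template stored in the loaded map.
        return bool(set(self.nodes) & set(other.nodes))

    def merge(self, partial):
        # Append only experiences the loaded map does not already contain.
        self.nodes += [n for n in partial.nodes if n not in self.nodes]
        return self

def run_sessions(sessions):
    loaded = None
    for nodes in sessions:          # each session produces a partial map
        partial = Map(nodes)
        if loaded is None:
            loaded = partial        # first session: nothing to merge with
        elif loaded.overlaps(partial):
            loaded = loaded.merge(partial)
        # else: keep mapping until an overlap (merge condition) appears
    return loaded

final = run_sessions([["a", "b"], ["b", "c"], ["c", "d", "a"]])
```

In this toy run, the third session overlaps the loaded map at both "c" and "a", mirroring how the fifth iRat session connects the earlier maps through shared external-path regions.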

New College Experiment
The New College Vision and Laser dataset is a full-scale, long-term dataset collected by a robot completing several outdoor loops around the New College campus in Oxford [41]. The data include 360° images (Fig. 8a), odometry, and laser scan information. Similarly to the iRat dataset, New College has been used to validate the openRatSLAM implementation [21]. Due to its complexity, this experiment aims to validate the multisession approach with full-scale long-term data. The experiment is divided into three mapping sessions. The first two sessions covered the entire environment in two different areas (Fig. 8b, blue and green paths). The third session aims to merge the two loaded maps into a single one (yellow path). In addition, in this session, considerable amounts of new visual information are added to the mapping. These new inputs test the approach's capability to maintain consistent mapping for a long period after the merge operations.
In the first session, the robot completes three laps inside the ellipse shape in the clockwise direction (upper blue arrow). Subsequently, the agent leaves towards the intermediate area between the first and second sessions and returns to the ellipse area, covering its fourth lap, but in a counterclockwise direction (bottom blue arrow). In the second session, the agent travels through the different regions of the environment in a total of two clockwise turns.
The third mapping session starts at a known position within the map of the first session (Fig. 8, yellow cross). The robot travels through the entire environment in counterclockwise laps. In particular, the robot's last counterclockwise lap within the area corresponding to the second session adds novel information for the mapping.

Results
In this section, the results of the Virtual Tour, Lab Tour, iRat, and New College experiments are presented. To compare the multisession solution with the standard RatSLAM process, the EMs for both multiple and single mapping sessions are shown. Note that after the merge procedure, the resulting RatSLAM structure (LVC, PCN, and EM) is similar to the standard RatSLAM one.

Virtual Tour Experience Map
For comparison, the EM for the single RatSLAM mapping is displayed in Fig. 9a. The link between the start and endpoint of the EM shows that the loop is closed in the map. The result of the multisession mapping is presented in different stages to show the evolution of the EM, starting at the end of the first session (Fig. 9b). Figure 9c depicts the second session with the partial map (yellow), alongside the loaded map (blue), at the moment when the agent found an experience on its EM that matched with an experience in the loaded map, thus triggering the merge process. After the merge procedure, the experiences of the partial map were transformed (translation and rotation) and joined into the loaded map (Fig. 9d).
Once the agent completed a full lap in the second session, the EMs of the first and second sessions formed a closed loop (Fig. 9e). As expected, the number of EM nodes in the single and multiple sessions is the same, meaning that no new experiences were created after merging the maps; the merge operations made the creation of new experiences unnecessary, and neither new templates nor new PCN activity packets were added. Figure 9f displays the final merged EM of both the first and second mapping sessions. As displayed, a path correction is performed by RatSLAM over the EM. This correction on the merged EM shows that, after operation (iv), the nodes are linked in such a manner that they influence each other when the relaxation algorithm distributes the odometry error throughout the graph. Therefore, this path correction demonstrates that RatSLAM works as expected after the merge procedure.
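The way the relaxation algorithm distributes odometry error through the linked graph can be illustrated with a toy relaxation: each link stores the odometric displacement measured between two nodes, and repeated local corrections spread a loop-closure disagreement over all nodes. This is a simplified sketch, not the paper's exact update rule; the function name, learning rate `alpha`, and iteration count are assumptions.

```python
import numpy as np

# Toy graph relaxation: nudge each pair of linked nodes toward agreement
# with the displacement stored on their link, so that a loop-closure
# error is distributed through the whole graph rather than absorbed at
# one node.

def relax(positions, links, alpha=0.5, iters=200):
    p = np.array(positions, dtype=float)
    for _ in range(iters):
        corr = np.zeros_like(p)
        for i, j, d in links:            # link i -> j with measured offset d
            err = (p[i] + d) - p[j]      # disagreement along the link
            corr[j] += alpha * err / 2.0   # pull j toward p[i] + d
            corr[i] -= alpha * err / 2.0   # pull i toward p[j] - d
        p += corr
    return p

# Square loop with drifted position estimates: the loop-closure link
# (3 -> 0) pulls the accumulated error back through all nodes.
links = [(0, 1, np.array([1.0, 0.0])),
         (1, 2, np.array([0.0, 1.0])),
         (2, 3, np.array([-1.0, 0.0])),
         (3, 0, np.array([0.0, -1.0]))]
start = [[0.0, 0.0], [1.0, 0.0], [1.2, 1.1], [0.3, 0.9]]  # drifted estimates
final = relax(start, links)
```

After relaxation, every link's stored displacement is (near-)consistent with the node positions, which is the behavior the merged EM must exhibit for the post-merge path correction to work.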
Finally, the ICP comparison of the single-session and multisession maps shows strong similarities between the paths after the merged map is transformed to fit the single-session map (Fig. 10, RMSE = 0.00773327).
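Once point correspondences are fixed, the map-to-map comparison used here reduces to a rigid (Kabsch/Procrustes) alignment followed by an RMSE computation. The sketch below assumes known correspondences; a full ICP would additionally re-estimate correspondences (e.g., by nearest neighbor) on each iteration. The function name and test data are illustrative.

```python
import numpy as np

# Rigid 2-D alignment (Kabsch) of src onto dst with known correspondences,
# returning the root-mean-square error after alignment.

def align_rmse(src, dst):
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(2)
    D[1, 1] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ D @ U.T                            # optimal rotation
    aligned = (src - cs) @ R.T + cd               # rotate about centroid, shift
    return np.sqrt(np.mean(np.sum((aligned - dst) ** 2, axis=1)))

# A rotated and translated copy of a path should align with ~zero error.
path = np.array([[0, 0], [1, 0], [2, 1], [3, 3]], float)
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = path @ rot.T + np.array([5.0, -2.0])
err = align_rmse(moved, path)
```

A nonzero RMSE after such an alignment, as reported for the experiments, then reflects genuine shape differences between the two maps rather than a global pose offset.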

Fig. 9 Experience Maps (EM) of the Virtual Tour experiment. a) The path generated in a single mapping session after the agent performs two and a half turns. b) EM of the first mapping session. c) EM in the second session when the agent finds a corresponding experience in the loaded map and the merge condition is met. d) EM after merging the loaded and partial maps. The red circle represents the merge point. e) EM after the loop closure triggered when the agent completed a full lap. Note that new experiences are not generated when the agent travels a previous path (blue dots and lines), i.e., no yellow experiences are plotted over the blue map. f) Path correction performed by RatSLAM after the loop closure in e)

Virtual Tour Pose Cells Network
In this part, the behavior of the PCN is analyzed during the merge process. The Virtual Tour PCN was chosen because it exhibits simplified behavior due to the path mapped by the agent, i.e., the PCN is fully activated along the orientation axis because of the ellipse-shaped laps. This behavior can be seen at the end of a single mapping session (Fig. 11a). It is worth mentioning that the second and third laps performed by the virtual agent have the same frames and speed information as the first one. Hence, the PCN units from the first lap are reactivated in the subsequent laps. Therefore, we define start and end as the first and last energy packets created in the single mapping session.
The PCN for the multisession is partially stored in the loaded and partial map structures. As the loaded map has not completed a full lap, its PCN is only partially filled along the orientation axis (Fig. 11b). Likewise, the partial map also has only part of its PCN activated. Nevertheless, its last activation (blue) corresponds to the start LM activation of the loaded map (Fig. 11c). Through operation (iii) of the merge, the activities of the partial map are shifted to match the blue activity packet with the start LM in the loaded map. Note that part of the partial map's PCN activity packets is shifted to the network's top face after reaching the boundary of the bottom face. Similar to the single SLAM session, the final merged map of the Virtual Tour is a closed-loop ellipse. Consistently, the final PCN is activated along the complete orientation axis after the merge (Fig. 11c). These results show that our method changes both the EM and the PCN coherently, and their final results resemble those from a single mapping session.
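The shift of operation (iii), including the wrap-around from one face of the network to the opposite face, amounts to a circular shift of the activity volume. The sketch below illustrates this with a toroidal roll; the array shape, cell indices, and function name are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

# Circularly shift a PCN-like activity volume so the cell `src` moves to
# `dst`. Because the pose cell network is toroidal, activity pushed past
# one face of the volume wraps around to the opposite face.

def shift_pcn(activity, src, dst):
    offset = tuple((d - s) % n
                   for s, d, n in zip(src, dst, activity.shape))
    return np.roll(activity, offset, axis=(0, 1, 2))

pcn = np.zeros((10, 10, 18))       # illustrative (x', y', theta') grid
pcn[9, 4, 0] = 1.0                 # partial map's current activity packet
# Align the partial map's packet with the loaded map's "start LM" cell:
merged = shift_pcn(pcn, src=(9, 4, 0), dst=(2, 4, 0))
```

Here the packet near one boundary (index 9) ends up at index 2 by wrapping through the face of the volume, mirroring how part of the partial map's activity appears on the network's top face after the shift.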
Finally, to demonstrate the influence and impact of the PCN operation on the merge process, multisession mapping was carried out without shifting the PCN activities of the partial map (operation iii). As expected, new experiences were created due to the incorrect association between templates and PCN activity (Fig. 12). Therefore, as demonstrated, the merge operations on the PCN are necessary for RatSLAM to perform correctly.

Lab Tour Results
Compared to the Virtual Tour, the Lab Tour experiment is more complex in two regards: it was performed in a physical environment using a real robot, and the environment's topology is more complex. Due to the robustness of RatSLAM, the EM for the single mapping session nevertheless correctly represents the environment's topology (Fig. 13a). So does the multisession mapping, which we discuss step by step. After the first session, the EM only reflects one of the loops in the environment (Fig. 13b), since the agent only had experiences in the smaller loop. Until the merge conditions are met (Fig. 13c), the loaded and partial maps are not aligned, and their relative positions are arbitrary. This changes when the two maps are merged (Fig. 13d). Note that the nodes of the partial map have been concatenated at the merge point (red circle). Figure 13e displays a path correction of the start and end nodes (blue cross and diamond symbols) of the first session. Besides that, this result shows that the merge approach can improve the mapping across multiple sessions.
The final EM at the end of the second session exhibits the results of the second loop closure, triggered when the robot completes a full lap along the second session path (Fig. 13f). This final map is similar to the EM generated in the single session (Fig. 14, RMSE = 0.00883269).

iRat Results
The iRat experiment is challenging since it requires merging maps in five sessions. Nevertheless, single-session RatSLAM performs very well, generating an accurate EM (Fig. 15a). Note that the robot continuously moves through the environment in a single mapping session. This continuous movement influences the behavior of RatSLAM's relaxation algorithm (8) since the path correction depends on the connection among the nodes.
The results of the multisession mapping show that the fifth session seamlessly connected the loaded EMs of the previous sessions (Fig. 15b-f). Unlike single-session mapping, multisession maps are not generated through continuous robot movements; that is, each session generates its map independently. In addition, the mapping of the fifth session covers only the external area of the environment. Therefore, the only links between the loaded and partial maps lie in the regions corresponding to this external path. Without links between the internal and external nodes, no path corrections can be made on the final map. This may explain the larger deviations shown in the internal maps compared to the single-session EM (Fig. 16). Nevertheless, the ICP comparison between the final multisession EM and the single-session EM shows clear visual similarities between the internal paths and the overall map (Fig. 16, RMSE = 0.0126932).

New College Results
The New College experiment is the biggest challenge since it involves mapping an even more complex environment, and the multisession version requires the merging of large maps. The single-session EM closely matches the physical structure of the environment (Fig. 17a). Similarly, the first and second sessions show similar maps for their respective areas (Fig. 17b-c). The partial map of the third session is quickly merged since it starts in a known place inside the loaded map of the first session (Fig. 17d, red circle). After completing its trajectory in the first session's area, the resulting map is merged into the loaded map of the second session at its expected location (Fig. 17e, red circle).
As can be observed, while continuing the mapping, the RatSLAM algorithm applied an inconsistent path correction at the second merge point, which affected the final map (red squares, Fig. 17e-f). This inconsistency led to a clear difference between the single-session and multisession maps when compared by ICP (Fig. 18, RMSE = 85.4687). However, despite this inconsistent correction, the final multisession map sufficiently preserves the structures that compose the environment's paths (see Fig. 8). Notably, the multisession map correctly closed loops where the single-session mapping failed to do so (green circles, Fig. 17a, f).

Conclusion
This work presents a novel solution to the multisession SLAM problem for the RatSLAM algorithm. Specifically, the proposed solution considers the multisession mapping scenario where the robot is not localized within the previously generated map, and newly mapped areas have to be merged with previously mapped ones into a single coherent map of the environment.
The merging of maps is obtained from complete structures (LVC, PCN, and EM) acquired and stored in previous mapping sessions. The proposed approach merges the map under construction with previously stored maps when both maps share a template, adjusting all structures spatially and producing a new partial map. After merging, the mapping session continues, following the standard algorithm, and can perform new merging procedures.
Four different environments were explored in this work: the Virtual Tour experiment, in which a virtual robot performed an ellipse path; the Lab Tour experiment, where a robotic platform called RoboDeck mapped a research laboratory; the iRat dataset used to validate the openRatSLAM implementation [21]; and the challenging New College dataset [41]. The Virtual Tour and Lab Tour experiments were divided into two mapping sessions, while the New College and iRat experiments were separated into three and five mapping sessions, respectively.
The multisession maps obtained were comparable to single-session maps in several experiments. For all experiments, the final EMs were similar to those of the single mapping session. On the other hand, path deviation increased with the number of mapping sessions and in long-term, large-scale mapping scenarios, as shown by the EMs of the iRat and New College experiments, respectively. In the former, the difference may be due to the absence of connections between the nodes of the internal and external laps. In the latter, the path deviation occurred after an inconsistent path correction at the merge point. Regardless of path distortions, the final maps of both resemble the single-session map, which shows the potential of the merge approach for more than two sessions. Furthermore, the experimental results showed: a) path corrections performed in the EM; b) no new experiences created in the Virtual Tour experiment; c) PCN activity packets of the Virtual Tour single mapping session similar to those of the multisession PCN; d) EM matching in the real-world Lab Tour and realistic iRat experiments; and e) loop closures correctly performed, compared to single-session mapping, in the New College dataset. Therefore, the present work shows that the proposed merge mechanism addresses both multisession scenarios (i) and (ii) and can be used as a solution for the RatSLAM algorithm.
In future work, a deeper investigation of the results presented for the New College experiment should be conducted. With this, improvements in the merge mechanism can be expected for large-scale, long-term environments. Moreover, the overall merge mechanism could be adapted to compare the current under-construction map with multiple previously stored maps.

Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Code Availability
The code that supports the findings of this study is available at https://zenodo.org/badge/latestdoi/568248424.

Conflicts of interest The authors have no conflicts of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.