1 Introduction

The simultaneous localisation and mapping (SLAM) algorithm was developed to meet the challenge of building and updating a map of the surroundings of a moving mobile robot while simultaneously estimating the robot's pose, without prior knowledge of the environment. Its advantage is that it provides the robot with knowledge of both its own pose and a representation of the environment, at the cost of the extensive computational resources required to map the surroundings and perform localisation at the same time [4, 5]. To navigate autonomously through an environment, a robot needs an estimate of its position within a reference system or map, which can be obtained from sensors such as dead-reckoning sensors, laser sensors, radar, 3D camera sensors [6] or GPS (Global Positioning System) [1, 3, 10, 19, 30]. However, these sensors are either expensive or unsuitable for small robots because of their size, weight or power requirements. A GPS receiver, for example, can provide a global localisation solution, but it cannot be used in places where no signal is available, such as caves or underwater environments [1].

Researchers have therefore moved towards more affordable alternatives, such as ultrasonic or infrared sensors. However, these sensors do not provide accurate odometry information because of the relatively low signal-to-noise ratio of low-cost devices compared with their expensive counterparts [17]. Furthermore, most low-cost sensors used to estimate the current position of a mobile robot accumulate errors over time (their readings are statistically dependent) owing to the noise generated intrinsically in the sensors. Recently, the SLAM community has moved towards image sensors (referred to as visual SLAM), as they are affordable and provide a large amount of information compared with other sensors such as laser range finders. In addition, they can be fitted to mobile robots of smaller size, lower weight and lower power consumption [8, 24] than industrial robots. At the same time, visual SLAM presents great challenges because of the computational resources needed to identify landmarks in variable environments. Hence, implementing SLAM on low-cost embedded systems, the most common platform for pervasive computing products, remains an emerging challenge.

In this work, we combine visual SLAM and a visual odometry algorithm on a low-cost embedded platform to demonstrate how a mobile robot can navigate using artificial landmarks in an indoor environment. Our proposed method achieves fast landmark detection with reduced computational complexity by using low-variance resampling, which allows the sensor measurements to be kept outside the main processing loop while the landmark locations are estimated.

2 Related work

In order to implement a cheap and small SLAM system, monocular visual SLAM solutions were developed that use only a single image sensor [24]. However, the algorithms needed for monocular SLAM are considerably more complex, since depth information cannot be inferred directly from a single frame captured by a single camera. Normally, every frame needs to be processed, even when new frames provide no relevant information. Furthermore, these solutions also accumulate error over time because of the nonlinear models underlying the Kalman filter.

The best-known implementation of monocular SLAM using a keyframe-based approach is PTAM [14]. The keyframe-based technique selects only frames containing predefined features in order to save computational power. The PTAM system relies on parallel processing on a high-performance computing system to obtain the robot's pose and the map of locations, which makes it unsuitable for low-cost and low-power embedded systems because of its power consumption.

Other implementations of monocular SLAM are based on extracting abstract regions of the image in each frame, which are more useful for reconstructing a location than for robot navigation [7, 21, 27]. Algorithms for detecting such regions include SIFT, BRIEF, FREAK, SURF and ORB [11]. However, all these feature extraction approaches suffer from the data association problem in SLAM [21].

The data association problem is that of deciding which noisy measurement corresponds to which feature of the map. Noise and partial observability can make the relationship between measurements and the model highly ambiguous. The problem is further complicated by the possible existence of previously unknown features in the map and the possibility of spurious measurements. For example, when a sensor provides multiple measurements over time, each measurement must be associated with a specific landmark. Methods for solving data association are based on statistical procedures such as Nearest Neighbour (NN) [2] or Joint Compatibility Branch and Bound (JCBB) [22, 25]. One way to associate measurements with low computational complexity is to use artificial landmarks, which consists of modifying features of the environment [26]. Several other studies tackle the problem by minimising the ambiguity of the measurements in a mapped environment [18, 25, 29, 32]. However, in these solutions the uncertainty in the measurements is taken into account before a landmark is classified, which ties the solution to the specific characteristics of the sensor used.

Other studies place physical artificial landmarks in the environment, such as QR codes or fiducial markers, which reduces the uncertainty of both the measurements and the data association [15, 23, 31, 32]. Llofriu et al. demonstrated an embedded SLAM solution using artificial landmarks such as boxes of different colours [16]. However, because the image sensor delivers measurements slowly, the particles may not be updated often enough, adding uncertainty to the location estimate. Another drawback is that the identification of the artificial landmarks used in that study is based on a measurement likelihood, which adds the possibility of introducing wrong data associations.

3 Methods and materials

Based on the FastSLAM2.0 method, we developed a low-variance resampling method with a multimodal design style targeted at embedded system platforms [20]. The multimodal FastSLAM framework has been implemented in Python, and its source code is released for public use in an online repository (see details in Appendix 3).

3.1 Adapted FastSLAM algorithm

FastSLAM2.0 uses a particle filter to model the uncertainty of the robot pose [20]. It factorises the SLAM posterior into the product of the posterior over the robot pose and the posterior over the landmarks conditioned on the robot path, as presented in Fig. 1. The posterior is determined by a Dynamic Bayesian Network (Eq. 1) and a Rao-Blackwellised particle filter [9, 12, 13]. The posterior of the robot path is obtained from the control vector (u_t), the data association (n_t) and the landmark measurements (z_t) [20].

Fig. 1 Dynamic Bayesian Network for SLAM

$$ p\left(x_t, \theta_t \mid n_t, u_t, z_t\right) = p\left(x_t \mid n_t, u_t, z_t\right)\, p\left(\theta_t \mid n_t, u_t, z_t, x_t\right) $$
(1)

As discussed in the previous section regarding data association issues in SLAM, we adapted a simple data association approach within the three main SLAM steps: the prediction step, which calculates the current state of the particles using the motion model; the partial diversity updating step, which recalculates the weights of the particles; and the resampling step, which discards improbable trajectories.

In the prediction step, we replace the probability of a particle with a given ID; the associations between particles and IDs are made during landmark detection. Each particle then stores its own belief about the landmarks, represented by Gaussian distributions, and each distribution is updated using a Kalman filter on every particle independently. As illustrated in Fig. 2, the information held by each particle about each landmark is represented by \( \Theta_n \) with a distribution \( (\mu, \Sigma) \), characterised by a mean (\( \mu_n \)) and covariance (\( \Sigma_n \)).

Fig. 2 Information about the landmarks contained in the particles

The new sample of the robot path is drawn from a proposal distribution given the visual measurements and the control vector. Each particle obtains a new sample based on a prediction using the motion model described in Eq. (2). The control vector is composed of the linear velocity \( v \) and the angular velocity \( \omega \). The pose of the robot is represented by its position components \( x, y \) and its orientation \( \theta \) after a movement.

$$ \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\theta} \end{bmatrix} = \begin{bmatrix} v \sin\theta \\ v \cos\theta \\ \omega \end{bmatrix} $$
(2)
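
The prediction step of Eq. (2) can be sketched as follows. This is a minimal illustration assuming Euler integration over a time step dt and Gaussian noise on the control inputs, with default noise values borrowed from Section 4.2; the function and parameter names are ours, not taken from the released code:

```python
import numpy as np

def predict_pose(pose, v, omega, dt, noise_v=0.03, noise_w=np.deg2rad(2.0)):
    """Sample a new pose (x, y, theta) from the motion model of Eq. (2).

    pose     -- current particle pose as (x, y, theta)
    v, omega -- linear and angular velocity from the odometry node
    dt       -- time elapsed since the last control update, in seconds
    """
    x, y, theta = pose
    # Perturb the controls so that each particle samples a slightly different
    # trajectory, which keeps the particle set diverse.
    v_n = v + np.random.normal(0.0, noise_v)
    w_n = omega + np.random.normal(0.0, noise_w)
    # Euler integration of Eq. (2), using the sin/cos convention of the paper.
    x += v_n * np.sin(theta) * dt
    y += v_n * np.cos(theta) * dt
    theta += w_n * dt
    return np.array([x, y, theta])
```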

After obtaining a new sample of the position and a measurement, each particle must update its beliefs about the landmarks. This is performed using the observation model defined in Eq. (3). In addition, the control vector is delivered at a higher frequency than the measurements in order to increase the diversity of the particles.

$$ \Theta = \begin{pmatrix} r \cos\varphi \\ r \sin\varphi \end{pmatrix} $$
(3)
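
As a brief illustration of Eq. (3), a range/bearing measurement (r, φ) can be converted into a landmark position estimate as below; expressing the result in the global frame by rotating with the particle heading is our assumption about the frame convention and is not stated explicitly above:

```python
import numpy as np

def measurement_to_landmark(pose, r, phi):
    """Project a range/bearing measurement (r, phi) into a landmark position."""
    x, y, theta = pose
    # Landmark position relative to the robot, Eq. (3).
    local = np.array([r * np.cos(phi), r * np.sin(phi)])
    # Assumed: rotate by the robot heading and translate into the global frame.
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return np.array([x, y]) + rot @ local
```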

In this work, we use an inverse observation model to derive the Jacobian used in the Kalman filter of each particle. The inverse model and its Jacobian are presented in Eq. (4). The variable \( \boldsymbol{V} \) represents the vector from the robot to a landmark, and the variable \( r \) the measured distance.

$$ \boldsymbol{G} = \begin{pmatrix} \dfrac{V_x}{r} & \dfrac{V_y}{r} \\[6pt] \dfrac{-V_y}{r^2} & \dfrac{V_x}{r^2} \end{pmatrix}, \qquad \boldsymbol{V} = \begin{pmatrix} \Theta_x - x \\ \Theta_y - y \end{pmatrix} $$
(4)
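
A per-particle Kalman update built around the Jacobian G of Eq. (4) might look roughly like the sketch below. The measurement noise Q (defaulting to the 70 mm and 0.027 rad values reported in Section 4.2, expressed in metres and radians) and the range/bearing form of the predicted measurement are assumptions made to keep the example self-contained:

```python
import numpy as np

def update_landmark(mu, sigma, pose, z,
                    Q=np.diag([0.07**2, 0.027**2])):
    """EKF update of one landmark belief (mu, sigma) inside one particle.

    mu, sigma -- landmark mean (2,) and covariance (2, 2)
    pose      -- particle pose (x, y, theta)
    z         -- measurement (r, phi)
    """
    vx, vy = mu[0] - pose[0], mu[1] - pose[1]            # V = Theta - (x, y)
    r = np.hypot(vx, vy)
    # Jacobian of the inverse observation model, Eq. (4).
    G = np.array([[ vx / r,      vy / r],
                  [-vy / r**2,   vx / r**2]])
    # Predicted measurement (assumed range/bearing form).
    z_hat = np.array([r, np.arctan2(vy, vx) - pose[2]])
    innovation = np.asarray(z) - z_hat
    innovation[1] = (innovation[1] + np.pi) % (2 * np.pi) - np.pi  # wrap angle
    S = G @ sigma @ G.T + Q                               # innovation covariance
    K = sigma @ G.T @ np.linalg.inv(S)                    # Kalman gain
    return mu + K @ innovation, (np.eye(2) - K @ G) @ sigma
```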

In the resampling step, because the original FastSLAM2.0 is based on a Rao-Blackwell factorisation, the particles must be resampled in order to keep alive the particles with higher probabilities. To reduce computational costs, we adopted a low-variance resampling method [28]: the most likely particles are selected by minimising the sum of the particle weights with respect to a standard uniform distribution. The adapted SLAM process using low-variance resampling is shown in Fig. 3.

Fig. 3 Pseudocode for our adapted SLAM algorithm with low-variance resampling
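
A compact sketch of the low-variance (systematic) resampling scheme of [28] relied upon in Fig. 3 is shown below; the variable names are ours:

```python
import numpy as np

def low_variance_resample(particles, weights):
    """Systematic resampling: one uniform draw, then M evenly spaced pointers
    over the cumulative particle weights."""
    M = len(particles)
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    # One random offset, then M pointers spaced 1/M apart.
    pointers = np.random.uniform(0.0, 1.0 / M) + np.arange(M) / M
    cumulative = np.cumsum(weights)
    indices = np.minimum(np.searchsorted(cumulative, pointers), M - 1)
    # Copy the selected particles and reset the weights to uniform.
    return [particles[i] for i in indices], np.full(M, 1.0 / M)
```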

3.2 Landmark detection

To represent the landmarks on a coordinate plane in the observation model, we measure only the distance and the angle from the robot to the landmark, represented by the variables r (distance) and φ (angle). In the monocular visual sensor scenario, a relative distance from the camera to an object is obtained from the known dimensions of the object, computing the distance using epipolar geometry.

To calculate the distance, we formulate a relationship between the focal length (f) and the distance to a landmark (r). As illustrated in Fig. 4, the real height of the landmark is denoted h_r and its height in the camera frame h_s. The relationship between these heights is given in Eq. (5): the ratio between the height of the object and the height captured on the sensor, expressed through the total height of the image frame (I_h) and the height of the sensor (S_h), is proportional to the ratio between the focal length and the desired distance.

Fig. 4 Landmark detection. a Relation of distance towards a landmark. b Relation of change in position of a camera

$$ r= f\frac{h_r}{S_h}\frac{I_h}{h_s} $$
(5)
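
Equation (5) can be evaluated directly once the camera constants are known. In the sketch below the focal length and physical sensor height are placeholder values, not the calibration of the pixy camera used in this work; the landmark height and image height follow Sections 4.1 and 3.3:

```python
def distance_to_landmark(h_s_px, f_mm=2.8, h_r_mm=80.0, S_h_mm=2.43, I_h_px=200):
    """Distance r (in mm) to a landmark of known height, Eq. (5).

    h_s_px -- height of the landmark in the image, in pixels
    f_mm   -- focal length (placeholder value)
    h_r_mm -- real landmark height (80 mm cylinders, Section 4.1)
    S_h_mm -- physical sensor height (placeholder value)
    I_h_px -- image height in pixels (320 x 200 frames, Section 3.3)
    """
    return f_mm * (h_r_mm / S_h_mm) * (I_h_px / h_s_px)
```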

For the angle calculation, we assume that the landmark is in front of the camera, projecting a horizontal line that forms an angle of π/2 with the object surface, as illustrated in Fig. 4b. The camera is represented by a three-dimensional frame \( (x_c, y_c, z_c) \), while the image plane is represented by the two-dimensional coordinate system \( (x_l, y_l) \). Similarly, when the landmark is outside the field of view of the camera, an angle between the distance and a displacement Δx in the image plane is estimated. Because the image plane is two-dimensional, changes in \( z_c \) of the camera frame are not relevant: a point along the Z axis is represented by its projection onto the plane \( (x_c, y_c) \). Hence, regardless of the height Z at which the object is detected, the projected point preserves the angle described.

Moreover, the change in angle can be calculated from the geometric relationship between the displacement Δx and the projection of the distance. Equation (6) gives the desired angle towards a landmark; this angle could also be obtained by computing the angle between the corresponding vectors. The variable \( w_s \) represents the displacement Δx, while \( I_w \) is the total width of the image frame. \( S_w \) corresponds to the width in millimetres of the CMOS sensor in the pixy camera. The upper bound of the ratio is \( \frac{h_s}{w_s}=2.2 \); the lower bound is \( \frac{h_s}{w_s}=2.0 \).

$$ \varphi' = \operatorname{atan}\!\left(\frac{w_s}{f}\frac{S_w}{I_w}\right), \qquad w_s = \Delta x_l $$
(6)
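
The bearing of Eq. (6) follows the same pattern; again, the focal length and sensor width in this sketch are placeholders rather than measured camera parameters:

```python
import math

def angle_to_landmark(delta_x_px, f_mm=2.8, S_w_mm=3.89, I_w_px=320):
    """Bearing phi' (in rad) towards a landmark, Eq. (6).

    delta_x_px -- horizontal offset w_s of the landmark in the image, in pixels
    f_mm       -- focal length (placeholder value)
    S_w_mm     -- physical sensor width (placeholder value)
    I_w_px     -- image width in pixels (320 x 200 frames, Section 3.3)
    """
    return math.atan((delta_x_px / f_mm) * (S_w_mm / I_w_px))
```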

3.3 Hardware implementation

Our proposed multi-module SLAM implementation for mobile robotic systems on embedded hardware includes three main modules: a central computing node, an odometry node and an image processing node for landmark detection, as shown in Fig. 5 (Appendices 2 and 4). Our prototype is fitted with two wheel encoders for the detection of movement, a pixy camera to detect the landmarks and a central unit, a Raspberry Pi, for the execution of the SLAM solution. The detailed schematic design can be found in Appendix 1. The central computing node calculates the current state of the particle filter and estimates the locations of the landmarks. The landmark node generates a message containing the measured distance, the direction of the measurement and an ID associated with each landmark; its role is therefore to detect landmarks in the environment. The odometry node sends a message to the central node every 100 ms to update the position of the robot. This message contains the linear velocity (v) and the angular velocity (ω), at 10 samples per second; both are calculated from the wheel encoders installed on the prototype. In addition, the central node can use this information to produce a graphical representation of the locations.

Fig. 5 (a) The project prototype using a pixy camera and a Raspberry Pi 2 (connections can be found in the Appendix). (b) The connections for the modular design of the prototype

The prototype uses a robot kit module with an incremental encoder attached to each wheel. The robot kit is controlled by an Arduino Uno, which has a 5 V output, can be powered directly by the robot kit and is shared with the odometry node. The DC motors have a separate battery bank powering the L298N driver controlled by the Arduino. Meanwhile, the pixy camera, an open-source and low-cost image sensor, is used for colour detection. The pixy camera acts as the landmark node of the system, as it has its own memory and processor, which use the raw image data to detect coloured blocks. The resolution of the images in the pixy camera is 320 × 200 pixels (\( I_w, I_h \)). The artificial landmarks are poles with different colour combinations, detected as in the work presented by [16].
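
To make the modular split concrete, the sketch below shows one way the central node could consume the two message types described above. The message fields follow the text (v and ω from the odometry node every 100 ms; distance, bearing and an ID from the landmark node), but the class names and the slam object are illustrative, not part of the released implementation:

```python
from dataclasses import dataclass

@dataclass
class OdometryMsg:
    v: float            # linear velocity, sent every 100 ms (10 Hz)
    omega: float        # angular velocity

@dataclass
class LandmarkMsg:
    r: float            # measured distance to the landmark
    phi: float          # bearing in the camera frame
    landmark_id: int    # ID assigned by the pixy-based landmark node

def central_node_step(slam, msg, dt=0.1):
    """Dispatch one incoming message to the SLAM filter (illustrative only)."""
    if isinstance(msg, OdometryMsg):
        slam.predict(msg.v, msg.omega, dt)               # motion update at 10 Hz
    elif isinstance(msg, LandmarkMsg):
        slam.update((msg.r, msg.phi), msg.landmark_id)   # measurement update
        slam.resample()                                  # low-variance resampling
```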

4 Evaluation methodology

In order to obtain empirical data from the prototype, corner detection and corridor detection experiments were performed in a real-life scenario. Each experiment was repeated three times with the objective of assessing the detection of landmarks and obtaining the location of the prototype while building a map with our intelligible SLAM solution.

Because our prototype lacks a rotating camera, we were unable to carry out loop-closing experiments with it. Instead, a loop-closing experiment using our FastSLAM simulator can be viewed online (https://youtu.be/cOghoHYNVuA). The corridor and corner detection experiments are used to demonstrate the uncertainty of the robot's pose and its ability to explore unknown terrain.

4.1 Corner detection

The first experiment involved the detection of a corner using six landmarks, as shown in Fig. 6. Figure 6a shows the corner detection setup; the line in the figure represents the path followed by the mobile robot. Figure 6b shows the average trajectory of the particles (blue curve) and the covariance of the detected landmarks (red curve). A pattern recognition scheme was implemented to detect a combination of colours on a cylinder as a landmark. We chose a multi-coloured floor in our experiments to produce noisy readings for the pixy camera, with the purpose of evaluating how well our data association scheme performs. The landmarks form a semi-circular path with no loop closure, which increases the uncertainty of the landmarks. As a result of the noise produced by the floor and the growing uncertainty along the trajectory, we can assess how well the SLAM solution discriminates the landmarks and associates the measurements. The average processing time for a single frame was 0.49 s. The landmarks are cylinders with a diameter of 40 ± 0.05 mm and a height of 80 ± 0.05 mm. In this first execution of experiment I, only 66.6% of the landmarks placed in the test area were detected.

Fig. 6 Corner detection experiment results. a Semi-circular path implemented to map a corner. b Simulation results

The performance of our prototype is limited by the resolution of the pixy camera and the low precision of the L298 motor driver. Two landmarks were not detected because the noise of the carpet had a greater influence on colour combinations containing blue. Figure 7 highlights, in a red rectangle, an undetected landmark whose blue colour was confounded with the floor noise. The detected landmarks are shown as white rectangles with an associated ID, indicated by the variable 's', and an inclination angle represented by the variable '\( \Phi \)'.

Fig. 7 Landmark detection. a Single detection of a landmark. b Parallel detection of landmarks

4.2 Corridor detection

In the corridor detection experiment, the landmarks were placed side by side along the trajectory of the prototype, simulating a corridor; this setup revisits the landmarks at every step and provides multiple loops for the SLAM solution. In addition, it places the landmarks close to each other as seen from the camera frame. Because detections may be obstructed, this experiment evaluates how strongly our data association scheme is affected. As illustrated in Fig. 8a, this experiment uses the same setup as before, except for the relocation of the landmarks. The average processing time of each frame was 0.56 s. The separation between the three landmarks on the right side and the three landmarks on the left side is 48 ± 0.05 cm, as shown in Fig. 8b.

Fig. 8 Corridor detection experiment results. a Prototype results. b Simulation results

The noise values used in the motion model of the SLAM solution for these experiments were 3 cm for the forward motion (\( v \)) and 2° for a turn (\( \omega \)) of the prototype. Similarly, for the observation model, the noise values were 70 mm for the measured distance and 0.027 rad for the angle in the camera frame. The distance measurements from the camera are only an approximation of the detected object; consequently, these measurements were not incorporated into the proposal distribution. For the same reason, the measurement uncertainty is overestimated in order to prevent the particle filter from diverging. Likewise, the number of particles used also affects the calculation of the weights assigned to the particles. The overestimation of the distance was 1.2 m, with a standard deviation of 2° for the measured angle. These values tell the filter how far away the landmarks could be. In all the experiments, 100 particles were used, as the tests were performed over relatively short distances.

4.3 System performance evaluation

Following the experiments conducted in Sections 4.1 and 4.2, the artificial landmarks were placed randomly without prior knowledge. The ground truth 3D coordinates of the pixy camera were recorded and compared with the values estimated by our SLAM solution, as shown in Table 1. These figures show that, in these experiments, our SLAM solution gives localisation results accurate to within 5 cm. In the corner detection experiments, the central axis is shifted because the recorded path was the path described by the left wheel of the mobile robot, resulting in a skewed trajectory. Figures 9 and 10 show the error and variance of the landmark measurements with respect to the ground truth in both experiments.

Table 1 The ground truth coordinates from the pixy camera vs. estimated values from our SLAM solution
Fig. 9 System performance evaluation in corner detection. a The experiment using the prototype for corner detection. b Distribution of vector magnitudes of the landmarks in the robot's frame for the corner detection experiment

Fig. 10 System performance evaluation in corridor detection. a The experiment using the prototype for corridor detection. b Distribution of vector magnitudes of the landmarks in the robot's frame for the corridor detection experiment

Figures 9b and 10b present the distribution of the landmark estimates for the three repetitions of the corner detection and corridor detection experiments using our SLAM solution. The graphs show the magnitude of the vectors to the landmarks. The greatest variation for a landmark is 10 cm, showing the precision of the mapping. Overall, 83% of the landmarks were detected, generating a map of the exposed landmarks; this percentage is sufficient to locate the prototype and distinguish the described environment. During this experiment, the average CPU usage was 26.72%. The second experiment, consisting of the parallel detection of landmarks, was executed three times. The first run is presented in Figs. 9a and 10a. The prototype detected 83.33% of the landmarks in the first run and a minimum of 66.6% of the landmarks in the other runs.

4.4 Energy profile

The power consumed by the prototype was measured during the above experiments using a Tektronix oscilloscope, giving a total power usage of 1.14 W on the Raspberry Pi V2.0. Figure 11 shows that our SLAM approach used, on average, 45% of the ARM Cortex-A7 processing time. A native implementation of FastSLAM2.0 requires O(M log K), where M is the number of particles in the particle filter and K is the number of landmarks. We developed an integrated low-variance resampling method that selects the most likely particles by minimising the sum of the particle weights with respect to a standard uniform distribution. This makes our approach significantly faster than the existing FastSLAM2.0, reducing the running time of our intelligible SLAM approach to O(M log(K/N)), where N is the number of sampling intervals.

Fig. 11 History of CPU usage during execution of SLAM. a Experiment for corner detection. b Experiment for corridor detection

5 Conclusions

Monocular visual SLAM on low-power devices presents the challenge of obtaining location information using only a single camera. Combining motion (visual odometry) with the pinhole model or epipolar geometry offers one possible implementation; however, researchers have not yet carried a simplified data association solution from algorithm development through to low-cost, low-power embedded system design. This work demonstrates an intelligible implementation of the FastSLAM algorithm using low-cost, low-power off-the-shelf devices. Our prototype is able to detect landmarks with a 75% success rate and a processing time of 0.53 s on a Raspberry Pi V2.0. Our experimental results demonstrate that our intelligible solution, which couples low-cost image sensors with an adequate architecture and a simplified algorithm, is suitable for designing embedded SLAM systems that operate under real-time conditions.

In the future, by selecting smart sensors for obtaining the position of the robot, such as higher-resolution encoders or multi-sensor fusion approaches in the prototype, we can improve the landmark detection accuracy and explore outdoor environments. We also plan to increase the accuracy of landmark detection using deep learning on embedded systems, where multiple objects in the environment can be used to train and evolve the neural network over time.