AFE-ORB-SLAM: Robust Monocular VSLAM Based on Adaptive FAST Threshold and Image Enhancement for Complex Lighting Environments

Monocular Visual Simultaneous Localisation and Mapping (VSLAM) systems are widely utilised for intelligent mobile robots to work in unknown environments. However, complex and varying illuminations challenge the accuracy and robustness of VSLAM systems significantly. Existing feature-based VSLAM methods often fail due to the insufficient feature points that can be extracted in those challenging illumination environments. Therefore, this paper proposes an improved ORB-SLAM algorithm based on adaptive FAST threshold and image enhancement (AFE-ORB-SLAM), which works in the environments with complex lighting conditions. An improved truncated Adaptive Gamma Correction (AGC) is combined with unsharp masking to reduce the effect caused by different illuminations. What is more, an improved ORB feature extraction method with the adaptive FAST threshold is proposed and adopted to obtain more reliable feature points. To verify the performance of the AFE-ORB-SLAM, three public datasets (the extended Imperial College London and National University of Ireland Maynooth (ICL-NUIM) dataset with different lighting conditions, Onboard Illumination Visual-Inertial Odometry (OIVIO) dataset and the European Robotics Challenge (EuRoC) dataset) are utilised. The results are compared with other state-of-the-art monocular VSLAM methods. The experimental results demonstrate that the AFE-ORB-SLAM could achieve the highest average localisation accuracy with robust performance in the environments with complex lighting conditions while keeping similar performance in the normal lighting scenarios.


Introduction
With the rapid development and wide application of autonomous robotic systems, Simultaneous Localization and Mapping (SLAM) has attracted a lot of attention in the robotics research field [1,2]. SLAM, which estimates the position of the robot while simultaneously reconstructing the surrounding environment in an unknown environment, has vital theoretical significance and application value. Moreover, it is the core technology of autonomous robots in unknown environments [3].
Erfu Yang (erfu.yang@strath.ac.uk)
Many sensor modalities, such as sonar, Light Detection and Ranging (LiDAR) and cameras, can be used for different SLAM systems [4][5][6]. Among them, Visual SLAM (VSLAM) research is flourishing because of its convenience and relatively low sensor requirements. VSLAM relies only on the camera, which captures rich texture information and has been widely deployed on robotic platforms [7]. The accuracy and robustness of VSLAM are vital for autonomous navigation, especially in complex lighting environments. For this reason, stereo cameras, depth cameras or Inertial Measurement Units (IMUs) have been added to VSLAM systems to improve their performance [8]. However, these additions bring inconvenience and extra cost to the system. Monocular VSLAM systems rely only on a lightweight camera, and their simple calibration makes them particularly attractive for many robotic applications [9].
Depending on the image matching method, VSLAM systems can be divided into feature-based methods and direct methods [10]. The former extract feature points from images and find their correspondences based on geometric constraints, while the latter find the correspondences between different frames directly from their pixel intensities. VSLAM has been investigated from different perspectives, and many cutting-edge VSLAM methods such as the ORB-SLAM3 [11] and Direct Sparse Mapping (DSM) [12] have been developed. However, most advanced VSLAM systems are evaluated in well-lit environments without considering challenging lighting conditions, such as dark, over-bright or dynamic illumination. Visual blur and feature changes caused by different illumination conditions occur in these complex lighting environments. Consequently, the feature matching or frame-to-frame matching process is significantly affected by illumination changes, and VSLAM systems may fail in these environments. Thus, developing a robust monocular VSLAM system for challenging scenarios with complex lighting has significant research and application value.
Towards this end, this paper presents a robust monocular VSLAM system (AFE-ORB-SLAM) that adopts the proposed adaptive FAST threshold and image enhancement techniques. In the proposed AFE-ORB-SLAM, the ORB-SLAM3 is chosen as the framework due to its excellent performance in well-lit environments. To handle the poor performance of the ORB-SLAM3 in challenging lighting environments, the truncated Adaptive Gamma Correction (AGC) is enhanced and combined with the unsharp masking method in the AFE-ORB-SLAM. Meanwhile, the system is further improved by the proposed efficient and adaptive FAST threshold method. The main contributions of this work are summarised as follows:
The rest of the paper is structured as follows: Section 2 reviews related works. Section 3 provides the framework of the AFE-ORB-SLAM. Section 4 presents the details of the improved image enhancement and feature extraction. Experimental results and analysis are given in Section 5. Section 6 concludes the whole work.

Feature-Based VSLAM Methods
For feature-based VSLAM methods, image enhancement has been widely utilised to handle challenging lighting environments. Histogram Equalisation (HE) was adopted in the HE-SLAM, which improved the contrast of captured low-contrast images [13]. Compared to the ORB-SLAM2 [14], the HE-SLAM was more robust in a harsh environment. However, the HE is affected significantly by background noise. To improve the robustness of the HE-SLAM, Yang et al. [15] adopted the Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm in the ORB-SLAM2 framework. The trajectory generated by this method was closer to the ground truth than those calculated by the HE-SLAM and ORB-SLAM2. The Dim-light Robust Monocular SLAM (DRMS) was proposed in [16], which utilised the linear transformation and CLAHE algorithms as image pre-processing to enhance the brightness and contrast of input images. As a result, the performance of the proposed VSLAM in dim-light conditions was improved. However, the CLAHE algorithm calculates the neighbourhood histogram for each pixel and performs histogram equalisation for sub-regions of the image, which is computationally expensive [17]. Meanwhile, the aforementioned methods mainly focus on either global or local enhancement for all kinds of images, and they may not perform well for different types of images [18]. Moreover, the decreased sharpness of images owing to image transformations should also be taken into consideration [19]. Recently, a deep neural network was adopted to improve the representations of captured images for the VSLAM algorithm, which improved the robustness of the VSLAM system in High Dynamic Range (HDR) environments [20]. However, deep-learning based methods need even more computing resources than traditional methods. What is more, even with the enhanced images, sufficient feature points still could not be extracted in some challenging environments.
Thus, VSLAM combined with multiple feature types has gained research interest. Pumarola et al. [21] proposed the PL-SLAM, which relies on line features as well as ORB features. As a result, a more robust performance in environments with challenging illumination conditions was achieved. Huang et al. [22] processed ORB and BRISK feature points at the same time in a low-lighting environment to improve the robustness of the VSLAM system. However, extracting multiple features needs extra computing resources. Therefore, their applications on mobile robots are restricted by the robots' limited onboard computing capabilities.

Direct VSLAM Methods
In terms of direct VSLAM methods, several works have been done towards complex lighting environments. Extensive experiments have been conducted to verify direct VSLAM systems towards changing illumination environments in [23]. Experiments showed that most direct VSLAM systems failed due to abrupt illumination changes while the brightness constancy assumption was adopted. Sun et al. [24] combined the RGB channel linearly to compensate for the lighting changes. In [25], illumination changes were modelled for affine lighting correction. Thereby, the illumination invariance was handled for the direct VSLAM system. As these methods still rely on the brightness constancy assumption, direct VSLAM systems still cannot handle complex lighting environments.

Structure of the AFE-ORB-SLAM
This work proposes a novel monocular VSLAM system (AFE-ORB-SLAM) based on the ORB-SLAM3 framework for complex lighting environments. The overall schematic architecture is presented in Fig. 1. The two blocks with words in red are the main novel works proposed in this work. Three parallel threads (tracking, local mapping and loop and map merging) are utilised by the ORB-SLAM3. Besides, all the generated maps are managed by the Atlas, which is a novel multiple-map system described below.

Atlas
The Atlas is a multi-map technology that manages all the sub-maps generated by the ORB-SLAM3. The sub-map utilised by the tracking thread to locate incoming frames is called the active map. All other sub-maps are called non-active maps. Both active and non-active maps consist of map points, keyframes, covisibility graphs and spanning trees.
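The relationship between the active map and the non-active maps can be illustrated with a small data-structure sketch. The class and field names below (`SubMap`, `Atlas`, `create_map`) are hypothetical and chosen only for illustration; they are not the ORB-SLAM3 API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class SubMap:
    """One sub-map: the four components listed in the text (names are illustrative)."""
    map_points: list = field(default_factory=list)
    keyframes: list = field(default_factory=list)
    covisibility_graph: dict = field(default_factory=dict)
    spanning_tree: dict = field(default_factory=dict)


@dataclass
class Atlas:
    """Holds every sub-map and remembers which one the tracking thread uses."""
    maps: list = field(default_factory=list)
    active_index: int = -1

    def create_map(self):
        # A newly created sub-map becomes the active map.
        self.maps.append(SubMap())
        self.active_index = len(self.maps) - 1
        return self.maps[-1]

    @property
    def active_map(self):
        return self.maps[self.active_index]

    @property
    def non_active_maps(self):
        return [m for i, m in enumerate(self.maps) if i != self.active_index]
```

In this sketch, creating a second sub-map turns the first one into a non-active map while keeping it available for later merging.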

Tracking Thread
The tracking thread is responsible for computing the camera pose and deciding on new keyframes. To improve the robustness to illumination variation, the incoming frame is pre-processed by the improved image enhancement method. Next, FAST keypoints are detected with the adaptive threshold and then described with the rotated BRIEF descriptor. The details of the improved image enhancement and keypoint extraction methods are given in Section 4. Afterwards, feature points are matched to initialise, track or re-localise the current camera pose. Finally, the tracking thread decides whether the current frame is a keyframe.

Local Mapping Thread
After the tracking thread selects a new keyframe, the local mapping thread is invoked. It eliminates bad map points and adds the keyframe and its associated map points to the active map. The poses of the keyframes and the map points observed by the newly added keyframe are then optimised by a local bundle adjustment. To maintain a scalable and reliable map, redundant keyframes are deleted.

Loop and Map Merging Thread
In the loop and map merging thread, the newly added keyframe is compared with all frames stored in the Atlas. If two similar frames exist in different maps, both maps are merged into a new active map. Otherwise, if the similar frames belong to the active map, a pose-graph optimisation is carried out to reduce the accumulated drift error within the active map. After that, the global bundle adjustment is launched in an independent thread to refine the map further.
Fig. 1 Structure of the AFE-ORB-SLAM
Although the ORB-SLAM3 with the monocular sensor achieves impressive performance in well-lit environments, its accuracy and robustness still suffer a lot in complex lighting environments. In these environments, the performances of feature extraction and matching drop significantly. When there are not enough matched ORB feature points obtained from the surrounding environment, the pose estimation process cannot be implemented, even leading to initialisation and tracking failures [19]. For the reason mentioned above, it is crucial to incorporate the algorithm that can handle the variations of illumination in the ORB-SLAM3. In this paper, the image enhancement technology and ORB feature extraction are improved and deployed in the tracking thread to solve this problem. After the image enhancement, more distinguished texture information is revealed. Besides, more stable ORB feature points could be obtained with the adaptive threshold FAST feature extraction in complex lighting environments. Eventually, the accuracy and robustness of the ORB-SLAM3 are enhanced in complex lighting environments.

Image Enhancement
Texture information is reduced in dimmed or over-bright images, so the captured images suffer from poor contrast. Contrast enhancement algorithms improve the visibility of objects in dimmed or bright areas by directly modifying pixel values according to a suitable rule [26]. Gamma correction [27] has gained a lot of interest due to its easy adjustment and efficient implementation. The Adaptive Gamma Correction algorithm with Weighting Distribution (AGCWD) [28] performs well when enhancing images captured in low-lighting environments. As the AGCWD focuses on improving the contrast of dimmed images, detail loss occurs in bright areas. Inspired by Cao et al.'s work [29], the truncated Cumulative Distribution Function (CDF) based AGCWD (IAGCWD) is improved and adopted to process both dimmed and over-bright images. Thereby, local over-enhancement can be reduced. To compensate for the sharpness lost during image transmission and transformation, the details and contours of the image are enhanced through the unsharp masking technique. Eventually, by combining image contrast enhancement with image sharpening adjustment, the texture information, especially the contours contained in the image, becomes more prominent.
The overall structure of the proposed image enhancement method is shown in Fig. 2. It consists of the contrast adjustment module and sharpening adjustment module.

Image Contrast Enhancement
The standard deviation of the image intensity denotes the average contrast of the image [30], and it can be used to classify an image as either a low contrast image or a high contrast image. The standard deviation of the image intensity is represented by λ.
The Probability Density Function (PDF) of the image is calculated by

$$P(l) = \frac{n_l}{N}$$

where $l$ is the pixel intensity at position $(x, y)$, $n_l$ represents the number of pixels with intensity $l$, and $N$ indicates the total number of pixels contained in the image. After the histogram distribution is smoothed by the weighting distribution function [31], the weighted PDF $P_{wd}(l)$ is obtained, in which the parameter $i$ adjusts the smoothing level. The sum of $P_{wd}(l)$ is calculated over all intensity levels and is then used to formulate the CDF. Eventually, the parameter $\gamma_{wd}$ is obtained from this CDF.

To improve the performance of the image enhancement, dimmed and bright images should be processed differently. Thus, based on the average pixel intensity $m_I$, $t$ is calculated to represent the overall brightness of the image:

$$t = \frac{m_I - 128}{128} \tag{7}$$

Finally, the image is assigned to the bright or the dark subclass based on the value of $t$.
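The two image statistics used above, the contrast measure λ and the brightness index t of Eq. (7), can be computed in a few lines. The following is a minimal sketch for an 8-bit greyscale image stored as rows of integers. The decision boundary `t_split` is a hypothetical parameter, since the text only states that the classification is based on the value of t.

```python
from math import sqrt


def intensity_stats(image):
    """Return (mean intensity m_I, standard deviation lambda) of a greyscale image."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return mean, sqrt(variance)


def brightness_index(mean_intensity, midpoint=128):
    """Eq. (7): negative values indicate a dark image, positive values a bright one."""
    return (mean_intensity - midpoint) / midpoint


def classify_brightness(image, t_split=0.0):
    # t_split is an assumed boundary for illustration, not a value from the paper.
    mean, _ = intensity_stats(image)
    return "bright" if brightness_index(mean) > t_split else "dark"
```

For example, an image whose pixels are all 0 or 64 in equal numbers has m_I = 32, λ = 32 and t = -0.75, so it falls into the dark subclass.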
Following the image classification, the contrast of dimmed and bright images is restored separately. In a dimmed image, the bright regions would be degraded by an overly low gamma value. To avoid this, a truncated CDF is utilised, in which τ is the threshold used for CDF truncation. It ensures that bright regions are not adjusted by a low gamma value. Based on extensive experimental observations, τ is set to 0.3 in this work, so that the detailed contour information in bright areas is preserved. With CDF truncation, dimmed pixels are processed by a small gamma value, while only a restricted adjustment is applied to bright pixels. The pixel intensities are then transformed by gamma correction with the resulting gamma values. The process to enhance the contrast of a dimmed image is summarised in Algorithm 1.

Algorithm 1 Contrast enhancement for the dimmed image
Step 1: Calculate the P(l) of the input image I.
Step 2: Compute P_wd(l) with the weighting distribution function.
Step 5: Output the contrast enhanced image I_ce.
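The contrast enhancement for a dimmed image can be sketched as follows. This is not the authors' implementation: it assumes the standard AGCWD formulation of [28] for the weighted PDF and the CDF, and it assumes that the truncation takes the form γ = max(τ, 1 − CDF(l)), which matches the description of τ in the text. The parameter name `smooth` stands for the smoothing parameter of the weighting distribution, and its default value of 0.5 is a placeholder.

```python
def enhance_dimmed(image, smooth=0.5, tau=0.3, l_max=255):
    """Gamma-correct a dimmed 8-bit greyscale image (rows of ints) per intensity level."""
    pixels = [p for row in image for p in row]
    n = len(pixels)

    # PDF from the histogram: P(l) = n_l / N.
    hist = [0] * (l_max + 1)
    for p in pixels:
        hist[p] += 1
    pdf = [count / n for count in hist]

    # Weighting distribution (assumed AGCWD form): smooths the histogram.
    p_min, p_max = min(pdf), max(pdf)
    if p_max == p_min:
        pdf_wd = pdf[:]  # flat histogram, nothing to reweight
    else:
        pdf_wd = [p_max * ((p - p_min) / (p_max - p_min)) ** smooth for p in pdf]
    total = sum(pdf_wd)

    # Normalised CDF and truncated gamma (assumed form: never below tau).
    lut = []
    running = 0.0
    for level, weight in enumerate(pdf_wd):
        running += weight
        gamma = max(tau, 1.0 - running / total)
        lut.append(round(l_max * (level / l_max) ** gamma))

    # Apply the per-level lookup table to every pixel.
    return [[lut[p] for p in row] for row in image]
```

Because every gamma value lies between τ and 1, dark pixels are brightened, while the truncation keeps the brightest pixels of the image from being pushed to saturation.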
A large proportion of the pixels in dimmed or over-bright images have similar intensities. Over-bright images have high pixel intensities, so their negative images contain an enormous number of pixels with low intensity values. Thus, the negative image of an over-bright image can be treated as a dimmed image. It is formed by subtracting every pixel intensity from the maximum intensity (255 for 8-bit images). Algorithm 1 can then be applied directly to the negative image. After that, the final contrast enhanced image I_ce is obtained by inverting the enhanced negative image. Finally, the contrast enhancement mask T_mask(x, y) is computed.
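The negative-image procedure for over-bright images can be sketched as a thin wrapper around any routine that enhances dimmed images. In the sketch below, `enhance_over_bright` takes that routine as an argument, and `demo_gamma` is only a hypothetical stand-in (a fixed gamma of 0.5) so that the example is self-contained. It is not Algorithm 1 itself.

```python
def enhance_over_bright(image, enhance_dimmed, l_max=255):
    """Enhance an over-bright image via its negative, as described in the text."""
    # Negative image: bright pixels become dark ones.
    negative = [[l_max - p for p in row] for row in image]
    # Treat the negative as a dimmed image and enhance it.
    enhanced_negative = enhance_dimmed(negative)
    # Invert again to return to the original polarity.
    return [[l_max - p for p in row] for row in enhanced_negative]


def demo_gamma(image, gamma=0.5, l_max=255):
    # Stand-in enhancer for illustration only: plain gamma correction.
    return [[round(l_max * (p / l_max) ** gamma) for p in row] for row in image]
```

With this stand-in, a saturated pixel stays at 255 while a slightly darker neighbour is pulled further down, which increases the contrast in the bright region.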

Sharpening Adjustment
Image sharpening enhancement highlights contours and makes the textures of the image clearer. Unsharp masking is a typical image sharpening technique. It utilises a low-pass filter to obtain a blurred image. Based on that, a mask is created and combined with the original image to make the texture of the image clear. Specifically, the process of unsharp masking can be realised through the following steps. First, the input image is processed by a low-pass filter, giving the blurred image $I(x, y) * h(m, n)$, where $*$ denotes the convolution operator and $h(m, n)$ is a low-pass filter. Second, the unsharp mask $g_{mask}(x, y)$ is calculated as the difference between the input image and its blurred version:

$$g_{mask}(x, y) = I(x, y) - I(x, y) * h(m, n)$$

Third, the sharpened image is obtained through

$$g_{sa}(x, y) = I(x, y) + k \cdot g_{mask}(x, y)$$

where $k$ represents the sharpening level. For the unsharp masking technique, $k$ is set to 1. In this work, a Gaussian low-pass filter is used, which can be represented by

$$h(m, n) = \frac{1}{2\pi\sigma^2} e^{-\frac{m^2 + n^2}{2\sigma^2}}$$

in which $\sigma$ is the standard deviation of the normal distribution.

Finally, the enhanced image is given by Eq. (17), in which α and β are two adjustable parameters that control the level of image contrast enhancement and image sharpening adjustment, respectively. As the unsharp masking technique is utilised in this work, α should be set to 1.0. β is obtained by carrying out extensive experiments under different scenarios to achieve relatively good results, and it is set to 0.3. Users can adjust both parameters to obtain a better result in a specific environment.
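The three unsharp masking steps can be sketched in plain Python for a small 8-bit greyscale image. The kernel radius, the value of σ and the border handling (edge replication) are assumptions made for illustration. The discrete kernel is normalised by the sum of its weights rather than by 1/(2πσ²), so that flat regions keep their intensity.

```python
from math import exp


def gaussian_kernel(radius, sigma):
    """Discrete Gaussian low-pass kernel h(m, n), normalised to sum to 1."""
    weights = [[exp(-(m * m + n * n) / (2 * sigma * sigma))
                for m in range(-radius, radius + 1)]
               for n in range(-radius, radius + 1)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def blur(image, kernel):
    """Convolve the image with a symmetric kernel, replicating border pixels."""
    height, width = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for j in range(-r, r + 1):
                for i in range(-r, r + 1):
                    yy = min(max(y + j, 0), height - 1)
                    xx = min(max(x + i, 0), width - 1)
                    acc += kernel[j + r][i + r] * image[yy][xx]
            out[y][x] = acc
    return out


def unsharp_mask(image, k=1.0, radius=1, sigma=1.0, l_max=255):
    """g_sa = I + k * (I - I * h), clipped to the valid intensity range."""
    blurred = blur(image, gaussian_kernel(radius, sigma))
    return [[min(l_max, max(0, round(p + k * (p - b))))
             for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(image, blurred)]
```

On a step edge the pixel on the dark side becomes darker and the pixel on the bright side becomes brighter, while flat regions are left unchanged. This overshoot is what makes contours more prominent.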

Adaptive FAST Threshold for ORB Extraction
The ORB feature consists of the FAST keypoint and the BRIEF descriptor. A pixel is treated as a keypoint if its intensity differs significantly from that of the surrounding pixels. To detect whether a pixel p is a FAST keypoint, its intensity $l_p$ is compared with that of 16 pixels on a circle with a radius of 3 pixels (as shown in Fig. 3). A threshold θ is set manually to distinguish the current pixel from the surrounding 16 pixels. If at least 12 contiguous pixels on the circle are brighter than $l_p + θ$ or darker than $l_p − θ$, the current pixel is considered a FAST keypoint. To improve the detection efficiency, the pixels numbered 1, 5, 9 and 13 on the circle are examined first. Only if at least 3 of these 4 pixels are brighter than $l_p + θ$ or darker than $l_p − θ$ are the remaining 12 pixels on the circle examined. Otherwise, the pixel p is discarded. Then, scale invariance, orientation invariance and the BRIEF descriptor are calculated by following the approach in [32].

As the analysis above shows, the threshold θ is vital for the feature extraction process, and a proper θ value improves the performance of the whole VSLAM system. However, a fixed θ cannot adapt to different illumination conditions, so feature extraction degrades when the illumination changes. To overcome this problem, an adaptive FAST threshold calculation method is proposed and adopted in the AFE-ORB-SLAM. For computing efficiency, the λ already calculated for the image enhancement is reused to control the value of θ. Following the feature extraction process of the ORB-SLAM3, two adaptive threshold values are set. The ORB-SLAM3 sets θ to 20 and 7. Similarly, if enough feature points can be extracted from the environment, a relatively large θ is used to obtain more reliable feature points. If not enough feature points can be extracted from a very low contrast image, a relatively small θ is set.

ω is a parameter that controls the threshold for ORB feature point extraction. In our work, as the texture information is enriched, we set ω to 128, which is the middle value of the pixel intensity range. For a specific scenario, users can adjust ω accordingly to obtain the best result.
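The FAST segment test described at the start of this subsection, together with the two-threshold fallback of the ORB-SLAM3, can be sketched as follows. The sketch uses the fixed ORB-SLAM3 values 20 and 7 in its example. It does not implement the adaptive threshold proposed here, whose formula is not reproduced in this text. The function names are hypothetical, and the fallback is applied to the whole image for brevity, whereas the ORB-SLAM3 applies it per grid cell.

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3, clockwise from the top.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_keypoint(img, x, y, theta, n=12):
    """Segment test: n contiguous circle pixels all brighter or all darker by theta."""
    lp = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]

    # Quick rejection using pixels 1, 5, 9 and 13 (indices 0, 4, 8, 12).
    compass = [ring[i] for i in (0, 4, 8, 12)]
    brighter = sum(v > lp + theta for v in compass)
    darker = sum(v < lp - theta for v in compass)
    if brighter < 3 and darker < 3:
        return False

    # Full test: look for a run of n flagged pixels, wrapping around the circle.
    for sign in (1, -1):
        flags = [(v - lp) * sign > theta for v in ring]
        run = 0
        for flagged in flags + flags[:n - 1]:
            run = run + 1 if flagged else 0
            if run >= n:
                return True
    return False


def detect_with_fallback(img, theta_high=20, theta_low=7):
    """Try the large threshold first and fall back to the small one if nothing is found."""
    height, width = len(img), len(img[0])

    def scan(theta):
        return [(x, y) for y in range(3, height - 3) for x in range(3, width - 3)
                if is_fast_keypoint(img, x, y, theta)]

    keypoints = scan(theta_high)
    return keypoints if keypoints else scan(theta_low)
```

In a low contrast patch where the centre differs from its surroundings by only 10 grey levels, the threshold of 20 finds nothing and the fallback threshold of 7 recovers the keypoint, which is the behaviour that motivates lowering θ for low contrast images.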

Experimental Environment
To verify the performance of the proposed AFE-ORB-SLAM, a laptop running Ubuntu 16.04 is used. The processor is an Intel(R) Core(TM) i7-8750H, the laptop is equipped with 12 GB of RAM, and the program is compiled as C++17. The Imperial College London and National University of Ireland Maynooth (ICL-NUIM) dataset with simulated lighting changes [23], the Onboard Illumination Visual-Inertial Odometry (OIVIO) dataset [33] and the European Robotics Challenge (EuRoC) dataset [34] are utilised to verify the localisation accuracy and illumination robustness of the proposed AFE-ORB-SLAM. The ICL-NUIM dataset with simulated lighting changes is a synthetic dataset, and the camera position is available as the ground truth. It contains image sequences under different illumination conditions. Thus, it is suitable for testing the performance of VSLAM systems under different lighting conditions. We use the office room sequences with static, local variation, global variation, and local and global variation lighting conditions (see Fig. 4).
The OIVIO dataset contains 9 image sequences captured by the Clearpath Husky UGV in weakly lit environments, such as mines, tunnels and other dark environments. Three scenarios, named "MINE GROUND-VEHICLE 1", "MINE GROUND-VEHICLE 2" and "TUNNEL GROUND-VEHICLE 1", have ground truth generated by the Leica TCRP1203 R300, and these sequences are utilised in our work to verify the performance of VSLAM systems. What is more, an onboard light of approximately 1350, 4500, or 9000 lumens is utilised to illuminate each scene.
The EuRoC dataset contains 11 sequences collected by the AscTec "Firefly" hex-rotor helicopter. Among them, 5 sequences are recorded in a large machine hall with ground truth provided by a Leica Multistation. The other 6 sequences are recorded in a small Vicon room with ground truth provided by the motion capture system. To complete the V103 sequence, the ORB-SLAM3 relies on the multi-map system significantly, and the ORB-SLAM3 cannot complete the V203 sequence. Tracking losses will lead to unpredictable threats to robot platforms. Thus, in this work, the other 9 sequences are chosen to validate VSLAM methods to simulate their performances on a robot platform.
If the trajectory has a loop, the motion trajectory generated by ORB-SLAM3 and other ORB-SLAM based VSLAM algorithms will be optimised by g2o [35].

Verification of Image Enhancement
To compare the performance of different image contrast enhancement methods, the HE and CLAHE, which are utilised in VSLAM systems, and the original IAGCWD are chosen. Figure 5 shows the results of the different image enhancement algorithms. Figure 5(a) shows the original images selected from different scenarios. As shown in Fig. 5(b) and (c), some high contrast images are achieved. However, if the images contain noise, the noise is also amplified significantly. Figure 5(d) shows the results achieved by the IAGCWD, which incurs over-enhancement in some bright regions. Figure 5(e) shows that the contrast and visibility of the texture information contained in the images are enhanced by the proposed method.

Evaluation on the ICL-NUIM Dataset with Simulated Lighting Changes
The Absolute Trajectory Error (ATE) [36] is a commonly used metric for evaluating a VSLAM system. The ATE represents the difference between the ground truth and the path estimated by the VSLAM system. Figure 6 shows the trajectories of the ORB-SLAM3 and AFE-ORB-SLAM in the office scenario with local and global variation lighting conditions. The results show that the trajectory generated by the AFE-ORB-SLAM is closer to the ground truth than that of the ORB-SLAM3. A large offset occurs in the initialisation stage of the ORB-SLAM3, while the AFE-ORB-SLAM has a relatively small error with respect to the ground truth, and our method outperforms the ORB-SLAM3 in all coordinate directions.
To further verify the pose estimation performance of the AFE-ORB-SLAM, the PL-SLAM, DSM and ORB-SLAM3 with default parameters are selected for comparison. For each method, the median of the localisation results over 10 runs is presented. Table 1 lists the mean ATE and Root Mean Square (RMS) ATE of the keyframe trajectories. If a VSLAM system cannot complete all the sequences, its results are marked by *. The AFE-ORB-SLAM outperforms the original ORB-SLAM3 in all video sequences. Although the DSM achieves the best localisation accuracy in several sequences of the Syn1 scenario, it is still vulnerable to different illumination conditions. Moreover, our method shows the smallest error when the performance on the same sequence is averaged over different illumination conditions. The overall results show that the AFE-ORB-SLAM achieves accurate localisation with robustness to illumination variations.
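The statistics reported here (mean ATE, RMS ATE and the median over repeated runs) can be sketched as below. The sketch assumes that each estimated trajectory has already been time-associated with the ground truth and aligned to it, which for a monocular system includes a scale correction. The alignment step itself is omitted, and the function names are hypothetical.

```python
from math import sqrt
from statistics import median


def translational_errors(ground_truth, estimate):
    """Euclidean distance between corresponding ground-truth and estimated positions."""
    return [sqrt(sum((g - e) ** 2 for g, e in zip(gt_pos, est_pos)))
            for gt_pos, est_pos in zip(ground_truth, estimate)]


def mean_ate(errors):
    return sum(errors) / len(errors)


def rms_ate(errors):
    return sqrt(sum(e * e for e in errors) / len(errors))


def median_rms_over_runs(ground_truth, runs):
    """Median of the RMS ATE over several runs of the same sequence."""
    return median(rms_ate(translational_errors(ground_truth, run)) for run in runs)
```

Reporting the median over several runs reduces the influence of a single unusually good or bad execution, which matters because multi-threaded VSLAM systems are not deterministic.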

Evaluation on the OIVIO Dataset
To further evaluate the performance of the AFE-ORB-SLAM, apart from the VSLAM methods utilised in Section 5.3, VSLAM systems improved by image contrast enhancement methods are also utilised for comparison. The HE-SLAM and CLAHE-SLAM represent the monocular versions of [13] and [15], respectively. The IAGC-SLAM denotes the ORB-SLAM3 with the IAGCWD as the pre-processing technique. Meanwhile, the effects of the proposed image contrast enhancement method and of the adaptive FAST threshold for ORB feature extraction are analysed separately. If only the proposed image contrast enhancement method is added to the ORB-SLAM3, the VSLAM system is named the IE-SLAM. The TH-SLAM represents the ORB-SLAM3 improved only by the adaptive FAST threshold for ORB feature extraction. In total, the DSM, PL-SLAM, ORB-SLAM3, HE-SLAM, CLAHE-SLAM, IAGC-SLAM, IE-SLAM and TH-SLAM are compared with the proposed AFE-ORB-SLAM. To simulate the performance of different VSLAM systems on robot platforms, the full trajectories generated by the VSLAM systems are used to calculate the RMS ATE. As the DSM and PL-SLAM only output the keyframe trajectory, the keyframe trajectory is still utilised for these two methods in this section. The median RMS ATE of 10 executions is provided in Table 2. If a VSLAM system cannot complete all sequences, its results are marked by *. Compared with the ICL-NUIM dataset with simulated lighting changes, the sequences in the OIVIO dataset have long trajectories, and there is no loop closure during the whole process. What is more, the texture information is not as rich as that of the ICL-NUIM dataset with simulated lighting changes, especially for the TUNNEL scenario. As a result, the RMS ATE values obtained on this dataset are larger than those obtained on the ICL-NUIM dataset with simulated lighting changes.
The performance of the DSM is influenced significantly by the illumination conditions, and it achieves the worst performance in almost all sequences. The PL-SLAM fails in the TN_015_GV_01 sequence due to the weak visual connectivity. The HE-SLAM, CLAHE-SLAM and IAGC-SLAM achieve higher localisation accuracy than the ORB-SLAM3 in several sequences. However, the noise contained in the images is also amplified, and over-enhancement exists in some regions. The average accuracy of the ORB-SLAM3 is better than that of the HE-SLAM and IAGC-SLAM, while the CLAHE-SLAM and ORB-SLAM3 have similar average accuracy. The IE-SLAM and TH-SLAM obtain better results than the ORB-SLAM3 in most sequences. However, because the TUNNEL scenario contains less texture information, the TH-SLAM performs worse than the ORB-SLAM3 there. Overall, the AFE-ORB-SLAM obtains the best localisation performance in all sequences.
The visualised localisation results are exhibited in Fig. 7 for the TUNNEL scenario. It shows that the low visibility of the environment has a great impact on the DSM. The PL-SLAM and ORB-SLAM3 rely on feature matching against neighbouring frames. When not enough reliable matched feature pairs are obtained, significant performance degradation can be observed. Considering the HE and CLAHE algorithms cannot handle different images properly, their performances are also influenced by different illumination conditions. The proposed AFE-ORB-SLAM could localise the robot accurately under different illumination conditions.
The time usage of the DSM, PL-SLAM, ORB-SLAM3, HE-SLAM, CLAHE-SLAM and AFE-ORB-SLAM is averaged over the scenarios under different illumination conditions and shown in Fig. 8. The results further confirm that the AFE-ORB-SLAM is able to deal effectively and efficiently with challenging scenes that provide less visual information. Compared with the ORB-SLAM3, the average accuracy is improved by 34.65%, whereas the average processing time is increased by only 3.38%.

Evaluation on the EuRoC Dataset
The AFE-ORB-SLAM is further validated on the EuRoC dataset. Comparisons of the AFE-ORB-SLAM against the DSO [37], SVO [38], DSM, ORB-SLAM3, HE-SLAM and CLAHE-SLAM are presented in this section. The results published in [37] for the DSO, in [38] for the SVO and in [12] for the DSM are utilised. For the other VSLAM systems, the median RMS ATE of 10 executions over the full trajectories is obtained. The results are shown in Table 3. Most images in the EuRoC dataset contain rich texture information with regular lighting. The proposed AFE-ORB-SLAM still achieves the best average accuracy on the selected scenarios. To verify the robustness of the different VSLAM systems, the results of all 10 executions are presented in Fig. 9. Squares of different colours represent the RMS ATE obtained in each of the 10 executions. The results demonstrate that the precision of the ORB-SLAM3 can be improved by adopting a proper image contrast enhancement method, and the CLAHE-SLAM achieves the best results. The proposed AFE-ORB-SLAM outperforms the ORB-SLAM3 in terms of not only accuracy but also precision.
Fig. 9 Precision comparison of different VSLAM methods

Conclusion
In this paper, the AFE-ORB-SLAM based on adaptive FAST threshold and image enhancement for challenging lighting environments has been proposed. In contrast to other VSLAM methods, we aimed to extract more reliable feature points from the images captured in challenging lighting conditions. To realise this goal, the image was enhanced from both the contrast adjustment and sharpening adjustment. Moreover, the ORB feature extraction was improved by adopting the adaptive FAST threshold. Experiments on publicly available datasets demonstrated that the AFE-ORB-SLAM was capable of achieving accurate and robust localisation performance in the environments where the images were captured under different illumination conditions, even with less visual information. In addition, the AFE-ORB-SLAM outperformed other state-of-the-art monocular VSLAM methods in most sequences in terms of localisation accuracy.

Conflict of Interests
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.