1 Introduction

With the emergence of the Internet of Things (IoT), localization within indoor environments such as supermarkets, airports, train stations, and hospitals has become essential [1, 2]. In supermarkets, a customer can select a cart equipped with a personal digital assistant (PDA) screen and radio-frequency identification (RFID) tags. The cart's location is identified through a hybrid Wi-Fi and RFID system; if the customer wants to find a product, he can search through the PDA screen, and directions towards the target are given [3]. In museums and galleries, rather than being given brochures, nomadic tourists can be handed a device that operates on Bluetooth and Wi-Fi. The device can give directions toward a specific part of the gallery and supply information about a piece of art. It can also tell tourists whether a particular place is congested so that they can save time and explore something else [4]. Similarly, exhibitions have many visitors daily; for example, in 2019, the DreamWorks exhibition was held in Rio de Janeiro from 6 February to 15 April with 11,380 visitors daily [5].

Students can be guided to books' locations in libraries using Bluetooth beacons. The student downloads a mobile application that reports his coordinates to the network and returns recommendations. The localization accuracy is within several meters, which guarantees that the student will be close to the required bookshelf [6]. In hospitals, it is vital to monitor the movements of some patients, especially those who suffer from conditions such as Alzheimer's disease; this can be accomplished with RFID technology [7]. Additionally, RFID tags can be placed on different parts of the body to tell whether patients who require home healthcare are sitting, sleeping, walking, standing, or collapsing, the last of which requires close attention and a quick response [8]. Firefighters may need to enter a burning building, where one or more of them may become trapped or fall unconscious; their location should therefore be known at all times. One possible approach is to mount dual-shoe integrated inertial sensors and inter-agent ranging sensors, so that each firefighter's position within the building and his distance from the other team members can be determined [9, 10]. Additionally, fire detection systems can be combined with ZigBee-based sensor networks to localize the fire's source [11].

For military applications, friendly and hostile entities can be detected, and camera or acoustic systems can be used to detect humans within highly secure places [12]. Also, robots are used to accomplish missions that may lead to human casualties or that cannot be achieved by soldiers [13].

The article is structured as follows: a review of localization system technologies is presented in Sect. 2, a study of localization detection techniques is conducted in Sect. 3, and Sect. 4 covers localization methods and algorithms. Finally, conclusions are drawn.

2 Localization Systems Technologies

2.1 Satellite-Based Navigation

The global positioning system (GPS) is the most popular system for outdoor localization. However, it requires line-of-sight (LOS) between the satellites and the handset. Because external building walls block this path, GPS becomes inefficient for indoor location-based services [14]. GPS can still be utilized indoors by using a steerable, high-gain directional antenna as the front end of a GPS receiver [14]. In places that GPS signals cannot reach, pseudolites (i.e., pseudo-satellites) are used as independent localization systems; these systems are composed of pseudolites, transmission and receiver antennas, target receivers, and a reference [15]. The central concept is to receive the GPS signal and repeat it through indoor transmitters [16].

2.2 Inertial Navigation System

An inertial navigation system (INS) employs inertial measurement units (IMUs), such as accelerometers and gyroscopes, to determine the location and direction of movement of an object relative to an initial location, velocity, and angle [17]. INS is distinguished by its accuracy and energy efficiency [18], provided that the inertial sensor is attached to the object's surface. However, INS is prone to errors, which calls for sophisticated filtering approaches such as the Kalman filter [19]. Another drawback of using INS is the cost and effort needed to deploy the network infrastructure of the location sensor [20]. In [21], a novel initial location estimation scheme is presented that combines existing Wi-Fi routers and iBeacons.

iBILL was proposed in [22]; it combines iBeacons and inertial sensors, and the system has two modes: an iBeacon localization mode and a particle filter localization (PFL) mode. iBeacons are used to let PFL cope with magnetic field fluctuations, so that errors do not accumulate with walking, and to reduce the computational overhead of PFL. iBILL was compared to Magicol (a system that fuses geomagnetism and inertial sensors) and dead reckoning (DR). It was observed that iBILL suppresses the growth of errors with walking distance and performs better.

In [23], a hybrid localization scheme combines inertial sensor-based dead reckoning with acoustic localization; the two are fused with a Kalman filter. It was found that the proposed system outperforms the stand-alone systems and overcomes their drawbacks. In [24], a hybrid scheme is presented that combines Wi-Fi fingerprinting with inertial sensors; the proposed method performs better than the individual techniques.

2.3 Magnetic Based Navigation

Magnetic-based technology is utilized for localization at low frequencies. At least three reference magnetic stations radiate magnetic fields, a magnetic sensor receives the radiated fields, and the sensor's location is estimated by trilateration. This technology is accurate at low frequencies; however, it is sensitive to conductive and ferromagnetic materials [25]. Usually, magnetic-based navigation systems depend on the disturbances of the Earth's magnetic field within indoor environments; these disturbances occur due to the ferromagnetic nature of metal structures within buildings [26].

Therefore, by recording the magnetic field at known locations, magnetic maps are constructed; these maps can be used to infer an unknown target's location from its magnetic field measurement. This approach is called magnetic fingerprinting [27, 28]. However, magnetic interference could be significant and cause localization errors [29]. Different handsets used over the same routes may also record different magnetic measurements, which makes handset-based localization questionable [26]. In [30], the authors examined localization using deep learning; in their work, accuracy was found to be 0.8 m in corridors and 2.3 m in the atrium.

Magnetic-only localization systems deployed over large areas may encounter large magnetic field fluctuations [22]. Hybrid techniques have been examined to improve accuracy; for example, in [31], a hybrid technique that utilizes a magnetic sensor and an inertial sensor allows the user to use the smartphone without restriction during localization, with an accuracy of around 1–2.8 m.

Cameras and magnetic fields are merged with neural networks to improve localization, provided that the cell phone is held in an upright position; the proposed method achieved an accuracy of 1.34 m in more than 91% of cases [32].

2.4 Sound-Based Technologies

Sound waves have a lower velocity than electromagnetic waves; this makes time synchronization easier, which is a significant concern. Humans can hear within 20 Hz to 20 kHz range; localization using sound can be classified as ultrasonic and acoustic-based navigation systems.

2.4.1 Ultrasonic Based Navigation Systems

The acoustic method appears to have been the first localization approach, in use since the Stone Age; nowadays, ultrasound is utilized to incorporate existing mobile devices for localization [33]. Ultrasonic localization is prominent within short ranges due to its low-power penetrating losses through indoor walls, cheap components, and compatibility with handheld devices. On the other hand, localization errors are introduced by many factors, including multiple reflections from surfaces and synchronization problems between communicating nodes [34, 35]. Ultrasonic localization systems also require complex signal processing algorithms [20]. Compared to Ultra-Wideband (UWB) systems, ultrasonic localization systems are more tolerant of loose time synchronization [36].

In [37], the authors proposed a localization method based on time of flight using a single transceiver; the median localization error was 15 cm. The locations of the ultrasonic nodes should be determined before localization to obtain accurate results [38].

2.4.2 Acoustic Based Navigation Systems

This technology utilizes the built-in microphones in smartphones. The microphones capture the source's sound, which is used to identify the location with respect to a reference [39, 40]. All smartphones have microphones, while malls, airports, and hospitals have their own speakers and microphones, making this technique cost-effective [40, 41]. The transmitted modulated acoustic signal contains information such as the time stamp; this is used to estimate the time of flight (TOF) and hence find the target's location using multilateration. To avoid annoying people, the transmitted power should be low; this requires sophisticated signal processing so that detectors can pick up the low-power signal.
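As a minimal sketch of the ranging step described above, the time stamp carried by the acoustic signal can be converted into a distance once the reception time is known. The function name `tof_range` and the speed-of-sound constant (343 m/s, dry air at about 20 °C) are assumptions made for illustration and are not taken from the cited works; the sketch also assumes that transmitter and receiver clocks are synchronized.

```python
# Assumed propagation speed of sound in dry air at about 20 degrees Celsius.
SPEED_OF_SOUND_M_S = 343.0


def tof_range(t_sent_s, t_received_s, speed_m_s=SPEED_OF_SOUND_M_S):
    """Convert a transmit time stamp and a reception time into a range in metres."""
    tof = t_received_s - t_sent_s
    if tof < 0:
        # A negative time of flight means the two clocks are not synchronized.
        raise ValueError("reception precedes transmission")
    return speed_m_s * tof


# A pulse stamped at t = 0 s and detected 10 ms later is roughly 3.4 m away.
print(tof_range(0.0, 0.010))
```

Ranges obtained this way from several speakers at known positions are the inputs to the multilateration step.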

2.5 Optical Based Navigation

Although optical signals are part of the EM spectrum, the techniques and challenges are quite different, so a separate section is dedicated to reviewing localization systems based on optical signals. In this section, we introduce two technologies: infrared technology and visible light technology.

2.5.1 Infrared Technology

Infrared (IR) systems are applied in Line of Sight (LOS) scenarios where handsets already have sensors like photodiodes. IR is featured for its simplicity, low weight, compact size, and immunity to interference (unlike RF systems) [42, 43].

However, it suffers from fluorescent light and sunlight interference; also, the hardware of these systems is expensive and costly to maintain [42]. IR systems are composed of IR emitting devices (e.g., LEDs) and IR sensors (e.g., photodiodes). The target wears the IR emitting device; this device emits a signal with a unique identifier; the target's location is identified once the sensors detect the IR signal. The Active Badge is an example of a commercial IR-based technology. In [44], pyroelectric infrared (PIR) sensors are used to detect changes in the heat radiation emitted by people and animals; accuracy is found to be on the order of centimeters; however, the sensors are vulnerable to environmental changes [45].

2.5.2 Visible Light Communications

Visible Light Communications (VLC) can be used in RF sensitive places like hospitals [46]; it has been shown that VLC provides a higher accuracy compared to Wi-Fi systems [47].

With the emergence of light-emitting diodes (LEDs), VLC is widely employed in localization; LEDs have many advantages, including long life expectancy, immunity to humidity, low power consumption, and low cost [46]. Also, LEDs can modulate lightwave signals at high speed [20].

Generally, light bulbs transmit coded light; at the receiver, the sensor compares the detected light with an existing database, which links coded light to position, and the sensor's location is inferred.

Localization using VLC can be performed using photodiode-based systems, which capture the light intensity, and image sensor-based systems, which can capture light pulses [48]. While photodiode-based systems are inexpensive, mobile phones are equipped with a camera but not a photodiode; this makes image sensor-based systems preferable.

However, both the LED and the sensor should be in line of sight for accurate localization. AOA is used in visible light systems to ensure precise localization [49]. Light detection and ranging (LIDAR) localization provides contour information about surrounding obstacles; combined with inertial sensors, these systems can provide accurate localization [50].

2.6 Radio Frequency (RF) Based Navigation

Radio Frequency (RF) based systems are the most widely adopted systems for localization. They are favored because they cover a wider area with low-cost hardware [51]. This is because RF waves can penetrate materials like walls and human bodies. Compared with other localization systems, such as IR and ultrasonic-based navigation, RF-based systems tend to show better results. However, these systems should be avoided in hospitals and planes, as they may interfere with existing RF systems.

The wireless technologies used for indoor localization can be categorized by the radio frequency at which they operate, which lies below 300 GHz in the radio spectrum [42]. The operating frequency of a wireless technology influences its capabilities, such as coverage, wall penetration, and resistance to obstacles. Thus, there are three categories of wireless technologies used for location-based applications: long-distance, middle-distance, and short-distance wireless technology [51]. Factors including complexity, accuracy, and environment play an essential role in determining the type of distance measurement system applied for a particular use [52,53,54].

Nodes' position information in a Wireless Sensor Network (WSN) has become vital for many features like routing, clustering, and context-based applications. A WSN is defined as a network of devices called nodes; these nodes sense environmental quantities (like temperature, humidity, and luminosity) and communicate the collected information wirelessly [55]. This information is forwarded to a sink node deployed for data collection. Examples include indoor fire control, smart homes, and rescue tasks [56]. WSNs have been developed based on IEEE 802.15.4 for wireless personal area networks (WPANs). WSN localization is the process of localizing the target using a network of wireless sensors [57, 58]. Knowledge of the nodes' locations in a WSN is important, since measurements without location information become useless. An example of localization in a WSN is the use of RSSI based on the ZigBee standard [59]. WSN localization can use both range-based (dependent on inter-node measurements) and range-free approaches [60, 61]. It can be further divided into exact (lateration, trilateration) and approximate types (proximity and scene analysis).

In [62], firefighters wear suits equipped with five sensors to measure blood pressure, core temperature, heart rate, O2 oximeter, heat flux and wind speed; these data are sent constantly to the team leader to check the members' status. A similar project was proposed in [63], where, in case of fire, victims can be tracked and safe paths to exits can be found. By using ultrasonic waves, which are not affected by smoke, ash, or flames, a firefighter's status within a building can be inferred. If the location is estimated by one computer, the localization system is centralized; if the nodes themselves are able to estimate the target location, the localization system is distributed [64].

Examples of RF-based navigation systems include Wi-Fi [65], Bluetooth [66], ZigBee [67], Ultra-Wideband (UWB) [68], and Radio Frequency Identification (RFID) [42]. This section provides short descriptions of these technologies.

2.6.1 Frequency Modulation Technology

Frequency modulation (FM) broadcasting can be used for localization in outdoor environments and, more recently, in indoor environments [69]. FM operates at 88–108 MHz, which is lower than cellular networks (which operate at 0.9 and 1.8 GHz) and Wi-Fi (2.45 GHz). Therefore, FM radio signals are less affected by weather conditions, less sensitive to terrain conditions, and can penetrate walls more easily [70]. As FM has a longer wavelength (about 3 m), its interaction with indoor objects and furniture differs from that at Wi-Fi frequencies. Operating at FM frequencies does not interfere with other RF components that operate at 2.4 GHz [71]. Also, FM receivers consume less power.
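As a short worked check of the wavelength figure quoted above, the free-space relation \(\lambda = c/f\) can be evaluated at the edges of the FM band and at the Wi-Fi frequency mentioned in the text. The rounded value \(c \approx 3\times 10^{8}\) m/s is an assumption made for this illustration.

$$ \lambda = \frac{c}{f}, \qquad \lambda_{88\,\mathrm{MHz}} = \frac{3\times 10^{8}}{88\times 10^{6}} \approx 3.41\ \mathrm{m}, \qquad \lambda_{108\,\mathrm{MHz}} = \frac{3\times 10^{8}}{108\times 10^{6}} \approx 2.78\ \mathrm{m}, \qquad \lambda_{2.45\,\mathrm{GHz}} = \frac{3\times 10^{8}}{2.45\times 10^{9}} \approx 0.12\ \mathrm{m} $$

The FM wavelength is therefore roughly 25 times longer than the Wi-Fi wavelength, which is comparable to or larger than most furniture and helps explain the different interaction with indoor objects.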

RSS is used for indoor localization through the fingerprinting technique; in [72], fingerprinting performance was compared using different machine learning methods: k-nearest neighbor (kNN), support vector machine (SVM) classifiers, and Gaussian process (GP) regression. It was found that the kNN approach achieved the best results. Also, for better accuracy, it was recommended to choose stations with stronger signals. FM performance was compared to Wi-Fi. It was found that Wi-Fi has the best localization performance over large spaces like floors, while FM tends to have the best performance over small places like rooms.

2.6.2 Cellular Based Technology

Cellular networks operate on many frequency bands, including the 0.9, 1.8, and 2.8 GHz bands. These networks provide better coverage than Wi-Fi networks and require no additional infrastructure. Initially, proximity was used for localization, where the mobile's location is identified within the cell coverage range; this approach provides extremely low accuracy [73]. Localization is usually performed using RSS, with fingerprinting as the adopted technique. Other researchers consider using TOA, with trilateration as the adopted localization technique. When RSS fingerprinting is used, cell stations are regarded as the APs; the accuracy was found to be in the range of 2.5–5.4 m [74].

In [75], localization was performed using RSS fingerprinting of received Global System for Mobile Communications (GSM) signals. In their work, fingerprints were collected from 29 GSM channels and 6 cells. The localization error was less than 5 m; similar outcomes were achieved by [76]. Universal Mobile Telecommunications Service (UMTS) cells for indoor coverage have also been used for localization by taking fingerprints. In [68], measurements were taken in an office environment, and it was found that localization using UMTS small cells is comparable to WLAN in terms of accuracy. Long-Term Evolution (LTE) has also been utilized in indoor environments; in [77], the authors conducted localization using TOA, and the error was less than 8 m in 50% of cases. LTE can be fused with an inertial measurement unit as in [78], where the root mean square error (RMSE) was around 3.52 m.

In [79], synthetic aperture navigation (SAN) framework was used to mitigate the effect of multipath signals; a synthetic aperture antenna will capture signals at different time instants. This would be similar to catching the signals from an array. SAN will use the Estimation of Signal Parameters via Rotational Invariance Technique (ESPRIT) algorithm to find DOA to determine the position; in their work, the RMSE of localization was 3.9 m for LTE-SAN compared to 7.19 m for standalone LTE.

In [80], a study of localization using LTE-only and LTE-WLAN fingerprinting was conducted; it was found that LTE-only fingerprinting gives poor results, while LTE-WLAN fingerprinting enhanced the performance by a factor of 3.5. Cellular systems can be used to support other existing RF localization systems like RFID [81] and Wi-Fi [82,83,84].

2.6.3 Wi-Fi Technology

Wi-Fi is a popular wireless networking technology. It operates in the 2.4 GHz RF band for IEEE 802.11b, IEEE 802.11g, and IEEE 802.11n, and in the 5 GHz band for IEEE 802.11a.

Most large indoor environments, such as universities or office buildings, already have distributed Wi-Fi hotspots that serve as network access points and provide whole-building coverage. Devices that employ Wi-Fi technology include personal computers, video-game consoles, smartphones, digital cameras, tablet computers, and digital audio players [85, 86].

The infrastructure cost and user device cost for Wi-Fi can be very low, and the Wi-Fi reception range, which was about 100 m, has now increased to about 1 km. Additionally, Wi-Fi localization is commonly based on Received Signal Strength (RSS) fingerprinting [87, 88]. Wi-Fi can be used with other RF localization techniques, like RFID [89]. Wi-Fi covers wider areas than Bluetooth and provides higher throughput; this makes the utilization of Wi-Fi more practical [90]. Examples of commercial Wi-Fi-based positioning systems include RADAR, HORUS, COMPASS, HERECAST, and PlaceLab [88, 91].

2.6.4 ZigBee

ZigBee is a specification based on the IEEE 802.15.4 standard. It uses the 868 MHz band in Europe, the 915 MHz band in the USA and Australia, and the 2.4 GHz band in other regions. ZigBee is used for long-distance transmission between devices in a wireless mesh network. Compared to Wi-Fi standards, it has low cost, a low data transfer rate, and short latency. The technology uses the RSS method to estimate the distance between two or more ZigBee sensor devices [92, 93]. Access point (AP) scanning via the Wi-Fi interface leads to high power consumption. To reduce this effect, the authors in [94] introduced an energy-efficient indoor localization scheme using ZigBee, termed ZIL, in which the ZigBee interface is used to collect Wi-Fi signals. In [95], a ZigBee localization algorithm based on proximity learning was introduced; the proposed method differs from traditional triangulation-based techniques in that it reduces computational time while maintaining accurate positioning.
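The RSS-based distance estimate mentioned above can be illustrated with the log-distance path-loss model, a commonly used propagation model that the cited works do not necessarily adopt. In this minimal sketch the function name `rss_to_distance`, the reference power of -40 dBm at 1 m, and the path-loss exponent of 2.5 are all assumed values; in practice they would be calibrated for the environment.

```python
def rss_to_distance(rss_dbm, rss_at_d0_dbm=-40.0, path_loss_exponent=2.5, d0_m=1.0):
    """Invert the log-distance model RSS(d) = RSS(d0) - 10 * n * log10(d / d0).

    All default parameters are illustrative assumptions, not measured values.
    """
    exponent = (rss_at_d0_dbm - rss_dbm) / (10.0 * path_loss_exponent)
    return d0_m * 10.0 ** exponent


# With the assumed parameters, a reading 25 dB below the reference
# corresponds to a tenfold increase in distance.
print(rss_to_distance(-65.0))
```

Because the path-loss exponent changes with walls, furniture, and people, the same RSS reading can map to quite different distances, which is one reason RSS ranging is usually combined with fingerprinting or filtering.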

2.6.5 Bluetooth

Bluetooth (IEEE 802.15.1) is intended to enable short-range wireless communication between devices. Bluetooth communicates using radio waves with frequencies between 2.402 GHz and 2.480 GHz, in the same band as Wi-Fi. It features cost-effectiveness, low transmission power, long battery life, secure and efficient communications, and accessible solutions [96, 97]. The newer Bluetooth version, termed Bluetooth Low Energy (BLE), can cover a range of 70–100 m and provide 24 Mbps with higher power efficiency [98]. Nevertheless, Bluetooth is not suitable for localization over large areas [66]. In [97], neural networks (NN) are trained in the training phase using received signal strength values and their corresponding coordinates; once trained, the NN can be used to detect the user's location from online RSS measurements. Recently, BLE-based localization has been utilized in smartphones through iBeacon (Apple) and Eddystone (Google); the smartphone can be used for localization within airports, train stations, big markets, malls, and restaurants, where the area map is sent to the smartphone and localization is then performed using BLE [98].

2.6.6 Ultra-Wide Band

According to the U.S. Federal Communications Commission (FCC), a UWB signal has an absolute bandwidth of more than 500 MHz and a carrier frequency higher than 2.5 GHz [99]. With low power consumption, UWB achieves a large bandwidth, high-speed communication, high time resolution, a high data rate, and a short wavelength, making it robust against multipath interference and fading. Another useful property of UWB is that it is permitted to occupy low carrier frequencies, where signals can more easily pass through obstacles; UWB is also immune to interference since it has a significantly different spectrum [100]. All these characteristics make UWB a good candidate for indoor wireless positioning. TOA and time difference of arrival (TDOA) achieve higher accuracy than other localization algorithms because of the high time resolution of UWB signals, which minimizes the multipath effect. UWB can reduce the error to centimetres [68, 101]. In [102], the authors proposed a hybrid localization scheme using Wi-Fi and UWB, accomplished by adding UWB beacons to existing Wi-Fi infrastructure. The hybrid method combines the availability of Wi-Fi infrastructure, which reduces cost, with the accuracy of UWB; when their algorithm was deployed, the localization error was limited to 20 cm. Typical UWB systems can localize a limited number of tags; in [103], a UWB localization system called SnapLoc can localize an unlimited number of tags.

In [104], UWB localization system performance was analyzed for LOS and NLOS scenarios. The position was estimated using linearized least square estimation (LLSE), fingerprint estimation (FPE), and weighted centroid estimation (WCE); in their work, it was found that FPE shows the best performance, while LLSE shows the worst.

2.6.7 Radio Frequency Identification (RFID)

Radio Frequency Identification (RFID) systems rely on the backscattering communication of RFID tags, together with RFID readers and middleware that process the signals exchanged between the tags and the readers [105].

RFID tags are either active, passive, or semi-active. Active tags are equipped with an inbuilt battery embedded in their circuitry. Active RFIDs operate in the ultra-high frequency (UHF) and super high frequency (SHF) range with a detection range of up to 100 m. Thus, active RFID is useful for long-range localization and object tracking [106,107,108].

However, active RFID technology is not reliable for sub-meter localization accuracy, and it is not readily available on most portable user devices. Passive tags do not have inbuilt batteries but backscatter the signal received from the base station. Passive RFID is widely used for various applications due to its numerous benefits, including low cost, miniaturized size, and ease of manufacturing compared to active RFID, since it only requires a tag chip and an antenna. Passive RFID is useful for sub-meter detection and can detect targets at ranges of up to 10 m [108].

Radio Frequency Identification (RFID) technologies are widespread due to their low cost. In such technologies, many reference tags are deployed in advance. Each tag acts as a transmitter, and its Received Signal Strength Indicator (RSSI) is measured by the surrounding readers. The target tag's position is then estimated from the reference tags whose RSSI information is closest to that of the target tag [109, 110].

In [111], the authors conducted localization using three types of sensor pairs: an infrared sensor pair, an RFID reader and tags, and a light-emitting diode (LED) and light resistor. It was found that localization using RFID outperforms the other sensors in terms of accuracy and stability. Table 1 summarizes the characteristics of the localization technologies used nowadays [46, 98, 112,113,114,115,116].

Table 1 Localization technologies performance overview

3 Localization Detection Techniques

3.1 Proximity Based Technique

Proximity-based techniques (also called relative positioning or connectivity) have been suggested as a cheap and straightforward means to estimate the range between the mobile and the AP location. For the proximity approach, it does not matter whether the mobile and the AP are exposed to the same fading channel, as long as they are within communication range [117]. The mobile's location is estimated using the coordinates of the AP. The proximity approach is simple and widely used; however, its accuracy is limited by the AP's radio coverage [118]. Generally, there are three proximity methods. The first senses physical contact, using sensors such as pressure sensors, touch sensors, and capacitive field detectors. The second observes the wireless signal of the mobile within the access point's range. The third observes automatic ID systems, like credit card payment terminals [119].
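A minimal sketch of the wireless proximity method is given below: the mobile is assigned the coordinates of the strongest AP that it can hear. The function name `proximity_locate`, the dictionary layout, and the -85 dBm detection threshold are assumptions made for illustration only.

```python
def proximity_locate(ap_positions, rss_readings, threshold_dbm=-85.0):
    """Return the (x, y) of the strongest AP in range, or None if none is heard.

    ap_positions: {ap_id: (x, y)} with known AP coordinates.
    rss_readings: {ap_id: rss_dbm} observed by the mobile.
    threshold_dbm: assumed sensitivity below which an AP counts as out of range.
    """
    in_range = {
        ap: rss
        for ap, rss in rss_readings.items()
        if ap in ap_positions and rss >= threshold_dbm
    }
    if not in_range:
        return None
    strongest = max(in_range, key=in_range.get)
    return ap_positions[strongest]


aps = {"ap1": (0.0, 0.0), "ap2": (20.0, 0.0)}
print(proximity_locate(aps, {"ap1": -70.0, "ap2": -55.0}))
```

The estimate can never be finer than the coverage radius of the selected AP, which is the accuracy limit noted in the text.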

3.2 Scene Analysis

In scene analysis, videos, visual images, or electromagnetic characteristics of the target are recorded and compared to an existing dataset, so that the features received from the target can be mapped to a location [120, 121]. For example, wearable cameras are used to capture visual images and link them with the target's location. Wireless signal characteristics at defined locations can be used to create a radiomap, which can infer the mobile's location by mapping the mobile's signal data to the map. This is known as fingerprinting. This type of localization is known for its simplicity; however, it requires collecting a sufficient amount of data [122], and changes in the environment may alter the feature characteristics, which necessitates updating the dataset [119].
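The radiomap lookup can be sketched with a k-nearest-neighbor match in signal space, one of several matching rules used for fingerprinting. In this hedged example the function name `knn_locate`, the radiomap layout, and the toy RSS values are hypothetical; the position estimate is the average of the k reference points whose stored fingerprints are closest to the observed one.

```python
import math


def knn_locate(radiomap, observed_rss, k=3):
    """Estimate (x, y) by averaging the k closest fingerprints in signal space.

    radiomap: list of ((x, y), [rss per AP]) surveyed offline.
    observed_rss: [rss per AP] measured online, in the same AP order.
    """
    ranked = sorted(radiomap, key=lambda entry: math.dist(entry[1], observed_rss))
    nearest = ranked[:k]
    x = sum(position[0] for position, _ in nearest) / len(nearest)
    y = sum(position[1] for position, _ in nearest) / len(nearest)
    return (x, y)


# Toy radiomap with three APs and four surveyed points (values are made up).
radiomap = [
    ((0.0, 0.0), [-40.0, -70.0, -70.0]),
    ((5.0, 0.0), [-70.0, -40.0, -70.0]),
    ((0.0, 5.0), [-70.0, -70.0, -40.0]),
    ((5.0, 5.0), [-60.0, -60.0, -60.0]),
]
print(knn_locate(radiomap, [-41.0, -69.0, -71.0], k=1))
```

When the environment changes, the stored fingerprints no longer match the online measurements, which is why the dataset has to be updated.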

3.3 Triangulation

In triangulation, the target location can be determined by forming triangles from known points to the target. It has two categories: lateration and angulation. Lateration is a distance-based technique, which is used in Time of Arrival (TOA), and Received Signal Strength (RSS), while angulation is a direction-based technique that is used in Angle of Arrival (AOA) approach.

3.3.1 Lateration

The distance between the AP and the mobile is related to the received power or the time of travel; this relationship can be represented by a mathematical expression. For 2D measurements, two equations yield two possible solutions. To obtain a unique solution, three equations are required; the intersection of these equations determines the location of the mobile, as shown in Fig. 1. If the problem is in 3D \((x, y, z)\), then at least four APs are needed for a unique solution [123].

Fig. 1 Trilateration localization
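The intersection of three range circles can be sketched as follows. Subtracting the first circle equation from the other two removes the quadratic terms and leaves a 2×2 linear system, which is solved here with Cramer's rule. The function name `trilaterate_2d` and the example coordinates are assumptions for illustration, and the sketch assumes noise-free distances; with noisy ranges a least-squares fit over more APs would normally be used.

```python
import math


def trilaterate_2d(aps, distances):
    """Locate a point from three AP positions and the three measured ranges."""
    (x1, y1), (x2, y2), (x3, y3) = aps
    d1, d2, d3 = distances
    # Subtract circle 1 from circles 2 and 3 to obtain two linear equations.
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        # Collinear APs give no unique intersection.
        raise ValueError("APs are collinear")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return (x, y)


aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target = (3.0, 4.0)
ranges = [math.dist(ap, target) for ap in aps]
print(trilaterate_2d(aps, ranges))
```

The collinearity check reflects the geometric requirement that the three APs must not lie on one line; otherwise the two mirror-image solutions cannot be told apart.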

Lateration is also used for estimating location through differential measurements (received signal strength or time of arrival). Differential measurements are used to reduce the effects of environmental changes. They apply when the transmitted power is unknown (differential RSS, DRSS) or when the access point's time of transmission is unknown (TDOA) [124,125,126].

Generally, if \(m\) APs collaborate in localization, \(\frac{m(m-1)}{2}\) differential equations can be formulated. Among these, \(m-1\) are basic equations, while the rest are redundant. The solution of each basic equation lies on a hyperbola, and the intersection of these hyperbolas gives the mobile's coordinates [127]. In a 3-AP system, there are two basic equations, while the third equation is a linear combination of the basic ones. Three basic equations are required for a unique solution, which is achieved by adding another AP, as shown in Fig. 2. Therefore, for 2D localization, 4 APs are required.

Fig. 2 Hyperbolic localization using DRSS
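The counting argument for differential measurements can be checked numerically. The sketch below uses range differences as a stand-in for any differential quantity (for TDOA they are the time differences multiplied by the propagation speed); the function name `range_differences`, the choice of the first AP as the reference, and the example coordinates are assumptions for illustration. It forms all \(m(m-1)/2\) pairwise differences for \(m = 4\) APs and verifies that every difference not involving the reference AP is a linear combination of the \(m-1\) basic ones.

```python
import math
from itertools import combinations


def range_differences(aps, target):
    """Return {(i, j): d_i - d_j} for every AP pair with i < j."""
    d = [math.dist(ap, target) for ap in aps]
    return {(i, j): d[i] - d[j] for i, j in combinations(range(len(aps)), 2)}


aps = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
diffs = range_differences(aps, target=(3.0, 4.0))

# Basic equations: differences taken against the reference AP (index 0).
basic = {pair: value for pair, value in diffs.items() if pair[0] == 0}
# Redundant equations: every remaining pair.
redundant = {pair: value for pair, value in diffs.items() if pair[0] != 0}

for (i, j), value in redundant.items():
    # d_i - d_j = (d_0 - d_j) - (d_0 - d_i)
    rebuilt = basic[(0, j)] - basic[(0, i)]
    assert math.isclose(value, rebuilt, abs_tol=1e-9)

print(len(diffs), len(basic), len(redundant))  # 6 3 3
```

Only the basic differences carry independent information, so a solver needs to use just those \(m-1\) hyperbolas.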

3.3.2 Angulation

For 2D measurements, two angle measurements and single range measurement are required to localize an object as seen in Fig. 3; the range measurement could be the distance between the two arrays. For 3D measurements, two angle measurements, single azimuth measurement, and single range measurement are required [119].

Fig. 3 Angulation using two known angles and known distance
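The 2D case can be sketched with the law of sines. In this minimal example, array A is placed at the origin and array B on the x-axis at the known baseline distance; both angles are assumed to be interior angles measured from the baseline towards the target. The function name `angulate_2d` and this coordinate convention are assumptions made for illustration.

```python
import math


def angulate_2d(baseline_m, angle_a_rad, angle_b_rad):
    """Locate a target from two bearing angles and the distance between arrays.

    Array A sits at (0, 0) and array B at (baseline_m, 0); each angle is the
    interior angle between the baseline and the line of sight to the target.
    """
    # The third angle of the triangle formed by A, B, and the target.
    gamma = math.pi - angle_a_rad - angle_b_rad
    if gamma <= 0.0:
        raise ValueError("bearing lines do not intersect")
    # Law of sines: range_a / sin(angle_b) = baseline / sin(gamma).
    range_a = baseline_m * math.sin(angle_b_rad) / math.sin(gamma)
    return (range_a * math.cos(angle_a_rad), range_a * math.sin(angle_a_rad))


# Angles that a target at (3, 4) would produce with a 10 m baseline.
angle_a = math.atan2(4.0, 3.0)
angle_b = math.atan2(4.0, 10.0 - 3.0)
print(angulate_2d(10.0, angle_a, angle_b))
```

As the target approaches the baseline, the third angle tends to π and its sine to zero, so small angle errors are strongly amplified; this is one reason AOA accuracy depends on geometry.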

3.4 Dead Reckoning (DR)

The DR technique relies on inertial measurement unit sensors, which track target movement using the equipped accelerometers, gyroscopes, and magnetometers [128]. Knowing the target's velocity at a known location, the position is updated by adding the estimated displacement to the previously estimated location [129]. This technique is distinguished by its simplicity. However, it requires an accurate initial position to avoid errors; even then, since no external reference signals are used for correction, errors accumulate over time [130]. Hybrid techniques are used to obtain more accurate results [129,130,131]. In [132], localization was performed using pedestrian DR (PDR) and Wi-Fi fingerprinting; it was found that PDR has the worst performance, while a hybrid of PDR and Wi-Fi gives the best performance. Table 2 summarizes the localization detection techniques; the summary includes a comparison between these techniques in terms of accuracy, measurement type, and cost [42, 114, 133, 115].
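The dead-reckoning position update of Sect. 3.4 can be sketched as a running sum of displacements. In this minimal example each step is described by a speed, a heading, and a duration, which are assumed to have already been derived from the inertial sensors; the function name `dead_reckon` and the step format are hypothetical.

```python
import math


def dead_reckon(start, steps):
    """Propagate a 2D position from a known start through a list of steps.

    start: (x, y) initial position in metres.
    steps: iterable of (speed_m_s, heading_rad, duration_s) tuples.
    Returns the list of positions, beginning with the start position.
    """
    x, y = start
    track = [(x, y)]
    for speed, heading, duration in steps:
        # Add the displacement estimated for this interval to the last position.
        x += speed * duration * math.cos(heading)
        y += speed * duration * math.sin(heading)
        track.append((x, y))
    return track


# Walk 2 m along the x-axis, then 3 m along the y-axis.
track = dead_reckon((0.0, 0.0), [(1.0, 0.0, 2.0), (1.0, math.pi / 2, 3.0)])
print(track[-1])
```

Because every new position is built on the previous estimate, any bias in speed or heading is carried forward into all later positions, which is the error accumulation noted above.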

Table 2 Localization detection techniques comparison

4 Localization Methods and Algorithms

4.1 Angle of Arrival Measurements

Antenna arrays are used to detect the angle of arrival; the direction of arrival (DOA) is used for many applications, including beamforming and localization [133]. DOA requires the use of antenna arrays, which makes the technique more expensive and more power-hungry than TOA and RSS [134]; however, it requires less equipment, as only two APs are needed to infer the mobile's location [135].

In the following discussion, it worth explaining the meanings for the used symbols, as shown in Table 3.

Table 3 Notations used for AOA techniques discussion

4.1.1 Classic Beamformer (Bartlett Beamformer)

The Bartlett method is one of the earliest AOA estimation techniques. The covariance matrix of the array data is formed, the output power of the beamformer is evaluated over the array manifold, and the maximum peaks of \(P(\theta )\) correspond to the AOAs [136, 137].

$$ P\left( \theta \right) = E\left\{ {\left| {y\left[ n \right]} \right|^{2} } \right\} = \max E\left\{ {{\varvec{w}}^{H} {\varvec{x}}\left[ n \right]{\varvec{x}}\left[ n \right]^{H} {\varvec{w}}} \right\} = \max E\left\{ {{\varvec{w}}^{H} {\varvec{R}}{\varvec{w}}} \right\} = \frac{{{\varvec{w}}^{H} {\varvec{R}}{\varvec{w}}}}{{{\varvec{w}}^{H} {\varvec{w}}}} $$
(1)

where \(y\left[n\right]\) is the array output, \({\varvec{x}}\left[n\right]\) is the array input data vector, \({\varvec{R}}\) is the covariance matrix, and \({\varvec{w}}\) is the weighting vector. For the uniform linear array (ULA) [138], where K is the number of antenna elements:

$$ P_{BAR} \left( \theta \right) = \frac{{{\varvec{w}}^{H} {\varvec{R}}{\varvec{w}}}}{{K^{2} }} $$
(2)

The drawback of the Bartlett beamformer is its low resolution, which becomes apparent when the impinging signals are closely spaced [138]. The separation between the electrical angles \(\phi \) (equal to \(kd\cos (\theta )\)) should be more than \(2\pi /K\). Adding more elements with a suitable design compensates for this impairment; however, it increases the cost, size, and calibration data storage [138].
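As a minimal sketch of Eq. 2, the spectrum can be scanned over a grid of candidate angles with the weighting vector set to the steering vector. The example assumes a half-wavelength ULA (\(kd=\pi \)), the electrical angle \(kd\cos (\theta )\) used above, and an idealized covariance matrix for a single source; the helper names are illustrative only.

```python
import numpy as np

def steering_vector(theta_deg, num_elements, kd=np.pi):
    """ULA steering vector with electrical angle phi = kd*cos(theta)."""
    n = np.arange(num_elements)
    return np.exp(1j * kd * np.cos(np.deg2rad(theta_deg)) * n)

def bartlett_spectrum(R, grid_deg):
    """Evaluate P_BAR(theta) = a^H R a / K^2 on a grid of candidate angles."""
    K = R.shape[0]
    power = []
    for theta in grid_deg:
        a = steering_vector(theta, K)
        power.append(np.real(a.conj() @ R @ a) / K**2)
    return np.array(power)

K = 8                                            # number of antenna elements
a0 = steering_vector(60.0, K)                    # one unit-power source at 60 degrees
R = np.outer(a0, a0.conj()) + 0.01 * np.eye(K)   # ideal covariance, noise power 0.01
grid = np.linspace(0.0, 180.0, 361)              # 0.5 degree scan step
spectrum = bartlett_spectrum(R, grid)
aoa_estimate = grid[np.argmax(spectrum)]
```

With a single source the highest peak falls on the true angle; two sources whose electrical angles differ by less than \(2\pi /K\) would merge into one lobe.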

4.1.2 Capon Minimum Variance Method

The Capon method uses some of the degrees of freedom to look at the desired signal direction \({\theta }_{0}\) and uses the remaining ones to suppress the interfering signals [136]. In other words, the algorithm has two constraints: minimizing the power from noise and interfering signals, \({{\varvec{w}}}^{H}.n[n]\), and keeping the gain of the desired signal constant, \({{\varvec{w}}}^{H}. {\varvec{a}}({\widehat{\theta }}_{0})\cong 1\) [136], where \({\varvec{a}}\left({\widehat{\theta }}_{0}\right)\) is the steering vector of the estimated angle of arrival \({\widehat{\theta }}_{0}\). Mathematically, the array weighting is estimated by [138]:

$$ \mathop {\min }\limits_{w} \left( {{\varvec{w}}^{H} . {\varvec{R}}.{\varvec{w}}} \right){\text{subject to }}{\varvec{w}}^{H} . {\varvec{a}}\left( {\hat{\theta }_{0} } \right) = 1 $$
(3)

The weighting vector can be estimated using the Lagrange multiplier method:

$$ {\varvec{w}} = \frac{{{\varvec{R}}^{ - 1} {\varvec{a}}\left( {\hat{\theta }_{0} } \right)}}{{{\varvec{a}}^{{\text{H}}} \left( {\hat{\theta }_{0} } \right){\varvec{R}}^{ - 1} {\varvec{a}}\left( {\hat{\theta }_{0} } \right)}} $$
(4)

Substituting Eq. 4 into the output power expression, the total power received is:

$$ P_{CAP} \left( \theta \right) = \frac{1}{{{\varvec{a}}^{{\text{H}}} \left( {\hat{\theta }_{0} } \right){\varvec{R}}^{ - 1} {\varvec{a}}\left( {\hat{\theta }_{0} } \right)}} $$
(5)

This method performs better than the Bartlett beamformer, as it can distinguish between angles with small separation and also suppresses the side lobe levels. One drawback of the Capon method is the expensive computation of the inverse matrix for large antenna arrays [139]. The Capon beamformer shows acceptable resolution and interference rejection provided that the presumed steering vector of the desired signal is identical to the actual one. In practice, however, it is difficult to make the presumed steering vector match the actual one; thus, the performance degrades dramatically [140].
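Equations 4 and 5 can be sketched as follows. The example assumes a half-wavelength ULA (\(kd=\pi \)), electrical angle \(kd\cos (\theta )\), and an idealized covariance matrix for two uncorrelated unit-power sources; the helper names `capon_weights`, `capon_spectrum`, and `top_peaks` are illustrative and not taken from [136, 138].

```python
import numpy as np

def steering_vector(theta_deg, num_elements, kd=np.pi):
    """ULA steering vector with electrical angle phi = kd*cos(theta)."""
    n = np.arange(num_elements)
    return np.exp(1j * kd * np.cos(np.deg2rad(theta_deg)) * n)

def capon_weights(R_inv, a):
    """Eq. 4: w = R^-1 a / (a^H R^-1 a)."""
    return (R_inv @ a) / (a.conj() @ R_inv @ a)

def capon_spectrum(R, grid_deg):
    """Eq. 5: P_CAP(theta) = 1 / (a^H R^-1 a) over a grid of angles."""
    K = R.shape[0]
    R_inv = np.linalg.inv(R)          # the costly step for large arrays
    power = []
    for theta in grid_deg:
        a = steering_vector(theta, K)
        power.append(1.0 / np.real(a.conj() @ R_inv @ a))
    return np.array(power)

def top_peaks(spectrum, grid_deg, count):
    """Angles of the `count` highest local maxima, in ascending order."""
    inner = spectrum[1:-1]
    idx = np.flatnonzero((inner > spectrum[:-2]) & (inner > spectrum[2:])) + 1
    best = idx[np.argsort(spectrum[idx])[-count:]]
    return np.sort(grid_deg[best])

K = 8
A = np.column_stack([steering_vector(60.0, K), steering_vector(100.0, K)])
R = A @ A.conj().T + 0.01 * np.eye(K)     # ideal covariance, noise power 0.01
grid = np.linspace(0.0, 180.0, 361)
estimates = top_peaks(capon_spectrum(R, grid), grid, count=2)
w = capon_weights(np.linalg.inv(R), steering_vector(60.0, K))
```

The weights satisfy the unit-gain constraint of Eq. 3 in the look direction, while the matrix inversion in `capon_spectrum` is the step whose cost grows with the array size.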

4.1.3 The MUSIC Algorithm

The MUSIC algorithm is based on the eigendecomposition of the covariance matrix [141]. The array scans all possible angles; each candidate steering vector is projected onto the signal subspace and onto the noise subspace. If the angle corresponding to the steering vector is close to the actual arrival angle, the projection of the steering vector onto the noise subspace is almost zero [141]. Dividing by zero would give infinity; however, since the scanned angle will not be precisely the actual arrival angle, the result of the division is a very large value rather than infinity.

$$ P_{MU} = \frac{1}{{{\varvec{a}}^{H} \left( \theta \right){\varvec{U}}_{n} {\varvec{U}}_{n}^{H} {\varvec{a}}\left( \theta \right)}} $$
(6)

where \({{\varvec{U}}}_{n}\) is the noise subspace.

MUSIC is preferred for several reasons, including its ability to measure more than one signal simultaneously [138] and its accurate and precise estimation of closely spaced signals; however, the algorithm requires knowledge of the number of incoming signals [138]. MUSIC is sensitive to coherent signals (the same signal produced from different locations, or the multipath of a signal transmitted from a single location); in such a case, \({U}_{n}^{H} a\left(\theta \right)\ne 0\), and hence it becomes very difficult to detect peaks. This obstacle can be overcome by using spatial smoothing.
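A minimal sketch of Eq. 6 is given below. It assumes a half-wavelength ULA (\(kd=\pi \)), electrical angle \(kd\cos (\theta )\), a known number of sources, and an idealized covariance matrix; a small constant is added to the denominator purely as a numerical guard against division by zero. Helper names are illustrative.

```python
import numpy as np

def steering_vector(theta_deg, num_elements, kd=np.pi):
    """ULA steering vector with electrical angle phi = kd*cos(theta)."""
    n = np.arange(num_elements)
    return np.exp(1j * kd * np.cos(np.deg2rad(theta_deg)) * n)

def music_spectrum(R, num_sources, grid_deg):
    """Eq. 6: P_MU(theta) = 1 / (a^H U_n U_n^H a)."""
    K = R.shape[0]
    _, eigvecs = np.linalg.eigh(R)            # eigenvalues in ascending order
    U_n = eigvecs[:, :K - num_sources]        # noise subspace (smallest K-M)
    spectrum = []
    for theta in grid_deg:
        a = steering_vector(theta, K)
        proj = np.linalg.norm(U_n.conj().T @ a) ** 2
        spectrum.append(1.0 / (proj + np.finfo(float).eps))
    return np.array(spectrum)

def top_peaks(spectrum, grid_deg, count):
    """Angles of the `count` highest local maxima, in ascending order."""
    inner = spectrum[1:-1]
    idx = np.flatnonzero((inner > spectrum[:-2]) & (inner > spectrum[2:])) + 1
    best = idx[np.argsort(spectrum[idx])[-count:]]
    return np.sort(grid_deg[best])

K, M = 8, 2
A = np.column_stack([steering_vector(60.0, K), steering_vector(100.0, K)])
R = A @ A.conj().T + 0.01 * np.eye(K)         # two uncorrelated unit-power sources
grid = np.linspace(0.0, 180.0, 361)
estimates = top_peaks(music_spectrum(R, M, grid), grid, count=M)
```

The sources here are uncorrelated; for coherent sources the signal covariance loses rank and the peaks degrade, as noted above.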

4.1.4 Weighted MUSIC Algorithm

A modification of the MUSIC algorithm was proposed to overcome its shortcomings:

$$ P_{WM} = \frac{1}{{{\varvec{a}}^{H} \left( \theta \right){\varvec{U}}_{n} {\varvec{U}}_{n}^{H} {\varvec{WU}}_{n} {\varvec{U}}_{n}^{H} {\varvec{a}}\left( \theta \right)}} $$
(7)

where \({\varvec{W}}\) is a weighting matrix; for \({\varvec{W}}=I\), the original MUSIC is recovered.

4.1.5 Min-Norm Algorithm

If \({\varvec{W}}\) is chosen as:

$$ {\varvec{W}} = e_{1} e_{1}^{T} $$
(8)

where \({e}_{1}\) is the first column of the \(K\times K\) identity matrix, the resulting weight vector has minimum norm, has its first element equal to unity, and is contained in the noise subspace. This modification is known as the Min-Norm algorithm. Min-Norm shows less bias and, therefore, provides better results than MUSIC. However, it applies to ULAs only.

4.1.6 Root-MUSIC Algorithm

Root-MUSIC is a polynomial-rooting version of MUSIC, which applies only to ULAs. The algorithm enhances performance and reduces computational time. It is based on finding the roots of a polynomial; the M zeros (\({\widehat{z}}_{m}\)) inside the unit circle with the largest magnitudes, i.e., those closest to the unit circle, yield the AOA estimates.

$$ \hat{\theta }_{m} = \cos^{ - 1} \left( {\frac{1}{kd}arg\left\{ {\hat{z}_{m} } \right\}} \right) $$
(9)

where M is the number of arrival signals, and m ranges from 1:M.
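The rooting step and Eq. 9 can be sketched as below. The sketch assumes a half-wavelength ULA (\(kd=\pi \)), a sample covariance matrix estimated from simulated snapshots at roughly 20 dB SNR, and the common convention of keeping the M roots inside the unit circle that lie closest to it; the function name `root_music` and all simulation parameters are illustrative.

```python
import numpy as np

def root_music(R, num_sources, kd=np.pi):
    """Root-MUSIC for a ULA, returning angles in degrees (ascending)."""
    K = R.shape[0]
    _, eigvecs = np.linalg.eigh(R)                 # ascending eigenvalues
    U_n = eigvecs[:, :K - num_sources]             # noise subspace
    C = U_n @ U_n.conj().T
    # a^H C a = sum_l c_l z^l, where c_l is the sum of the l-th diagonal of C.
    # List the coefficients from the highest power (l = K-1) to the lowest.
    coeffs = np.array([np.trace(C, offset=l) for l in range(K - 1, -K, -1)])
    roots = np.roots(coeffs)
    inside = roots[np.abs(roots) < 1.0]
    closest = inside[np.argsort(1.0 - np.abs(inside))[:num_sources]]
    return np.sort(np.rad2deg(np.arccos(np.angle(closest) / kd)))   # Eq. 9

# Simulated snapshots: two sources at 60 and 100 degrees.
rng = np.random.default_rng(0)
K, M, N = 8, 2, 500
angles = np.array([60.0, 100.0])
n = np.arange(K)[:, None]
A = np.exp(1j * np.pi * np.cos(np.deg2rad(angles)) * n)             # K x M
S = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
X = A @ S + noise
R = X @ X.conj().T / N                                              # sample covariance
estimates = root_music(R, M)
```

No angular grid is scanned here, which is where the reduction in computational time comes from.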

4.1.7 The ESPRIT Algorithm

The Estimation of Signal Parameters via Rotational Invariance Technique (ESPRIT) is characterized by being simple, fast, robust, and power- and memory-efficient [142]. Similar to MUSIC, the method can detect the directions of multiple signals incident on the antenna array; ESPRIT uses the eigenvalues to determine the DOAs and their corresponding delays [143]. The algorithm does not search over all possible direction vectors to estimate the DOA. Therefore, it reduces computational time and memory usage [144]. The steering matrix can be viewed as [138]:

$$ {\varvec{A}} = \left[ {\begin{array}{*{20}c} {{\varvec{A}}_{1} } \\ {last row} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {First row} \\ {{\varvec{A}}_{2} } \\ \end{array} } \right] $$
(10)

\({{\varvec{A}}}_{2}\) is related to \({{\varvec{A}}}_{1}\) by \(\Phi ,\) which is a diagonal matrix:

$$ {\varvec{A}}_{2} = {\varvec{A}}_{1} {\Phi } $$
(11)

The method relies on properties of Eigen decomposition:

$$ {\varvec{U}}_{s} = \left[ {\begin{array}{*{20}c} {{\varvec{U}}_{1} } \\ {{\varvec{U}}_{2} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {{\varvec{A}}_{1} {\varvec{T}}} \\ {{\varvec{A}}_{1} {\Phi }{\varvec{T}}} \\ \end{array} } \right] $$
(12)

where \({\varvec{T}}={\varvec{P}}{{\varvec{A}}}^{H}{{\varvec{U}}}_{s}{\left({\Lambda }_{s}-{\sigma }^{2}{\varvec{I}}\right)}^{-1}\). \({{\varvec{U}}}_{2}\) is related to \({{\varvec{U}}}_{1}\) by a matrix \(\Psi \):

$$ {\varvec{U}}_{2} = {\varvec{U}}_{1} {\Psi } $$
(13)

Rearranging the equations:

$$ {\Psi } = {\varvec{T}}^{ - 1} {\Phi }{\varvec{T}} $$
(14)

The above equation can be solved in the Least-Squares sense (LS-ESPRIT) or by the Total-Least-Squares method (TLS-ESPRIT) [138]. Both \(\Psi \) and \(\Phi \) share the same eigenvalues. The DOA is estimated from the M eigenvalues (\(\xi \)) of \(\Psi \) [145]:

$$ \hat{\theta }_{m} = \sin^{ - 1} \left( {\frac{1}{kd}arg\left\{ {\hat{\xi }_{m} } \right\}} \right) $$
(15)

where \(m=1:M\).
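The LS-ESPRIT steps of Eqs. 10 to 14 can be sketched as follows. The sketch assumes a half-wavelength ULA (\(kd=\pi \)) and the electrical angle \(kd\cos (\theta )\) introduced for the Bartlett beamformer, so the final mapping uses an inverse cosine; the inverse sine in Eq. 15 corresponds to measuring the angle from broadside instead. An idealized covariance matrix is used, and the function name `esprit` is illustrative.

```python
import numpy as np

def esprit(R, num_sources, kd=np.pi):
    """LS-ESPRIT for a ULA, returning angles in degrees (ascending)."""
    _, eigvecs = np.linalg.eigh(R)                  # ascending eigenvalues
    U_s = eigvecs[:, -num_sources:]                 # signal subspace (largest M)
    U_1, U_2 = U_s[:-1, :], U_s[1:, :]              # drop last row / first row
    # Least-squares solution of U_2 = U_1 Psi (Eq. 13).
    Psi = np.linalg.lstsq(U_1, U_2, rcond=None)[0]
    xi = np.linalg.eigvals(Psi)                     # shares eigenvalues with Phi
    return np.sort(np.rad2deg(np.arccos(np.angle(xi) / kd)))

K, M = 8, 2
n = np.arange(K)[:, None]
true_angles = np.array([60.0, 100.0])
A = np.exp(1j * np.pi * np.cos(np.deg2rad(true_angles)) * n)   # K x M steering matrix
R = A @ A.conj().T + 0.01 * np.eye(K)               # ideal covariance
estimates = esprit(R, M)
```

Only one eigendecomposition and one small least-squares problem are solved, with no scan over candidate directions.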

4.1.8 Weighted Subspace Fitting (WSF)

WSF is an asymptotically efficient parametric method that applies to any arbitrary array structure [146]; the weighting provides minimum variance estimates [147]. The algorithm is well known for its accuracy; its criterion is a multimodal, nonlinear, multivariate optimization problem, which leads to high computational complexity [148]. Genetic algorithms are used to optimize the WSF and reduce the complexity [149]. WSF also requires knowledge of the number of directional sources [150].

Recall the covariance matrix \({\varvec{R}}:\)

$$ {\varvec{R}} = {\varvec{APA}}^{{\text{H}}} + \sigma^{2} {\varvec{I}} = {\varvec{U}}_{s} {\Lambda }_{s} {\varvec{U}}_{s}^{H} + \sigma^{2} {\varvec{U}}_{n} {\varvec{U}}_{n}^{H} $$
(16)

where \({\varvec{A}}\left(\theta \right),{\varvec{P}}\) and \(\Lambda \) are the steering matrix, the statistical expectation of transmitted signal, and diagonal matrix of eigenvalues. By setting \({\varvec{I}}={{\varvec{U}}}_{s}{{\varvec{U}}}_{s}^{H}+{{\varvec{U}}}_{n}{{\varvec{U}}}_{n}^{H}\) and removing the \({\sigma }^{2}{{\varvec{U}}}_{n}{{\varvec{U}}}_{n}^{H}\) from both sides and multiplying the equation by \({{\varvec{U}}}_{s}\) from the right:

$$ {\varvec{APA}}^{{\text{H}}} {\varvec{U}}_{s} + \sigma^{2} {\varvec{U}}_{s} = {\varvec{U}}_{s} {\Lambda }_{s} $$
(17)

Rearranging the equation:

$$ {\varvec{U}}_{s} = {\varvec{AT}} $$
(18)

The angles are estimated by solving the following nonlinear equation:

$$ \hat{\theta } = \arg \mathop {\min }\limits_{\theta } Tr\left\{ {\left( {{\varvec{I}} - {\varvec{A}}\left( {{\varvec{A}}^{H} {\varvec{A}}} \right)^{ - 1} {\varvec{A}}^{H} } \right)\hat{{\varvec{U}}}_{s} {\varvec{W}}\hat{{\varvec{U}}}_{s}^{H} } \right\} $$
(19)

where \({\varvec{W}}={\left({\widehat{\Lambda }}_{{\varvec{s}}}-{\widehat{{\varvec{\sigma}}}}^{2}{\varvec{I}}\right)}^{2}{\widehat{\Lambda }}_{s}^{-1}\)

WSF shows better performance when coherent signals exist. A rooting version of Weighted Subspace Fitting was introduced for ULAs; it is also referred to as the MODE technique.

In [151], Min-norm and root-MUSIC were compared; the estimation variance achieved by root-MUSIC is less than or equal to that of the root-min-norm method. In [152], a comparison study between root-WSF, MUSIC, root-MUSIC, ESPRIT, and Capon was conducted using a ULA over many scenarios; Root-WSF was found to have the best and most stable performance. In [153], the performance of MUSIC, root-MUSIC, and ESPRIT was compared; MUSIC was found to perform well with a low number of array elements and high SNR. While Root-MUSIC needs more elements, ESPRIT is less dependent on the number of array elements and performs better in low-SNR scenarios. In [148], GA-WSF was compared to MUSIC; the former outperformed the latter in terms of accuracy over different SNR values.

In [143], the performance of ESPRIT and MUSIC was compared using different numbers of antenna elements, different numbers of snapshots, and different SNR levels; both techniques were found to provide accurate results, although MUSIC showed more accurate and stable results. In [154], the MUSIC, MVDR, Min-norm, and ESPRIT DOA algorithms were compared in a MIMO-OFDM radar system. For a single target, MUSIC showed the best resolution, followed by Min-norm and then ESPRIT. However, if two sources are closely spaced, Min-norm tends to have the best performance, followed by ESPRIT and then MUSIC. MUSIC tends to give better results than Capon, Bartlett, and Min-norm, as observed by [145].

4.2 Time of Arrival Measurements

In TOA measurements, the flight time between the AP and the mobile is multiplied by the wave velocity to estimate the distance between the two sensors [155]. Waves used for localization include RF signals and acoustic signals [155]. The velocity of radio waves is \(3\times 10^{8}\) m/s, while the speed of acoustic waves is 343.59 m/s [148]; therefore, RF measurements are more prone to errors. A measurement error of \(1 \mu s\) leads to a 300 m error using RF waves, but only a 0.00034359 m error using acoustic waves [156]. When the receiver has 1 GHz bandwidth, the receiver time resolution is about \(1\times 10^{-9}\) s, so the maximum error is 0.3 m; with 10 MHz bandwidth, the receiver time resolution is about \(1\times 10^{-7}\) s and the maximum error is around 30 m [90].
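The figures quoted above follow from multiplying the timing error by the wave speed and from taking the time resolution as the reciprocal of the bandwidth. A short sketch of this arithmetic, with illustrative function names:

```python
SPEED_OF_LIGHT = 3.0e8       # m/s, radio waves
SPEED_OF_SOUND = 343.59      # m/s, acoustic waves (value quoted in the text)

def range_error(timing_error_s, wave_speed):
    """Distance error caused by a given timing error."""
    return wave_speed * timing_error_s

def time_resolution(bandwidth_hz):
    """Approximate receiver time resolution, taken as 1/bandwidth."""
    return 1.0 / bandwidth_hz

rf_error = range_error(1e-6, SPEED_OF_LIGHT)            # about 300 m
acoustic_error = range_error(1e-6, SPEED_OF_SOUND)      # about 0.00034359 m
error_1ghz = range_error(time_resolution(1e9), SPEED_OF_LIGHT)    # about 0.3 m
error_10mhz = range_error(time_resolution(10e6), SPEED_OF_LIGHT)  # about 30 m
```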

TOA localization uses the concept of lateration [39]. TOA measurements are transformed into circular equations, and solving these equations yields the coordinates of the mobile [157]. In 2D, three equations are required (i.e., measurements from three APs must be used); in a 3D situation \((x, y, z)\), at least four APs must be used to have a unique solution. TOA circular equations are solved using NLS and LLS, similar to RSS range-based positioning, as shown in Fig. 1. Nonlinear Least Squares (NLS) is more complicated and more accurate, while LLS is very sensitive to the presence of noise and NLOS [158].

Another related time measurement is the time difference of arrival (TDOA), where the time difference between two TOA measurements is used to formulate one equation. Three TOA measurements yield two TDOA measurements; a third TDOA equation would depend on the other two and hence provides no new information. To have a unique solution, measurements from four APs are used [157]. Possible locations for the mobile lie on a hyperbola [157]. The intersection of two hyperbolae gives the location [157], as shown in Fig. 2.

In TOA, all sensors, including the mobile, must be synchronized, yet the mobile's clock is not as accurate as the base station's clock [159, 160]. As a result, there will be an error in estimating the flight time and, thus, localization errors; in TDOA, however, only the APs are required to be synchronized [159].

On the other hand, TOA makes better use of the available information. One TOA measurement confines the possible locations of the mobile to a circle, and two measurements narrow them to two points, whereas with TDOA two measurements only place the mobile on a hyperbola. With three measurements, TOA can estimate a unique solution, while TDOA will have one or possibly two solutions [161].

Another drawback of TDOA is its sensitivity to the existence of an LOS path [162]; due to the hyperbolic geometry, a small error leads to a massive change in the curve, and the result is less accurate [163]. Suppose the LOS path suffers from attenuation and falls below the threshold chosen to reject the noise. In that case, the next path with power above the noise level is considered the first arrival path; this leads to inaccurate TOA estimation and, thus, erroneous localization [134]. When the wave encounters walls during propagation, the propagation time is not only distance-dependent but material-dependent as well [164]. The excess delay is given by [164], where \({\epsilon }_{r}\) is the material's relative permittivity, w is the thickness of the material, and c is the speed of light:

$$ \tau_{ex} = \left( {\sqrt {\epsilon_{r} } - 1} \right)\frac{w}{c} $$
(20)
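As a worked instance of Eq. 20, the sketch below evaluates the excess delay for a hypothetical wall with relative permittivity 4 and thickness 0.3 m; these values are assumptions for illustration and are not taken from [164].

```python
import math

def excess_delay(rel_permittivity, thickness_m, c=3.0e8):
    """Eq. 20: extra delay (seconds) added by a wall of the given thickness."""
    return (math.sqrt(rel_permittivity) - 1.0) * thickness_m / c

tau_ex = excess_delay(rel_permittivity=4.0, thickness_m=0.3)   # hypothetical wall
range_bias = tau_ex * 3.0e8        # equivalent ranging error in metres
```

Under these assumptions a single wall adds about 1 ns of delay, which a TOA ranging system would read as roughly 0.3 m of extra distance.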

The receiver's ability to separate closely spaced multipath components depends on its bandwidth: a larger receiver bandwidth (BW) gives a higher temporal resolution and therefore a better separation of closely spaced multipath components. For example, a receiver with BW = 30 MHz has a time resolution of around 33.3 ns; if the incoming signals are separated by less than 33.3 ns, the receiver considers them one signal. As a result, delay estimation will be inaccurate.

4.2.1 Correlation Based Techniques

Cross-correlation is one of the most comprehensive methods used to estimate TOA [134]. Fig. 4 illustrates TOA estimation: once the signal arrives, it is matched (correlated) to a known template \(p(t)\) by a matched filter (MF). The output of the filter is forwarded to a square-law device, where the sign of the correlated signal is removed, and the time instant with the maximum peak value represents the time at which the signal first arrived [164].

Fig. 4 TOA estimation using cross-correlation [164]

In multipath propagation, adjacent arrival peaks will have amplitudes comparable to the correct one; therefore, selecting the right peak becomes ambiguous, leading to large errors [164]. This method is preferred for its low complexity; however, it is prone to multipath and noise.
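The matched-filter and square-law steps can be sketched as follows for a single-path channel. The template (a Barker-7 sequence), the sampling rate, the delay, and the noise level are assumptions chosen for illustration.

```python
import numpy as np

fs = 1.0e9                                    # hypothetical sampling rate (1 GS/s)
template = np.array([1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0])   # Barker-7 as p(t)
true_delay = 40                               # arrival time in samples

rng = np.random.default_rng(1)
received = 0.05 * rng.standard_normal(200)    # background noise
received[true_delay:true_delay + template.size] += template

# Matched filter: correlate the received signal with the known template.
mf_output = np.correlate(received, template, mode="valid")
energy = mf_output ** 2                       # square-law device removes the sign
toa_samples = int(np.argmax(energy))          # instant of the maximum peak
toa_seconds = toa_samples / fs
```

With several paths of comparable amplitude, `energy` would contain several peaks of similar height, which is the ambiguity described above.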

4.2.2 Inverse Fourier Transform Method (Deconvolution)

This method is based on the fact that the received signal is a convolution of the transmitted signal and the channel; knowing the channel response gives the signals' arrival times. In the frequency domain, the convolution becomes a multiplication. Thus, the channel response can be estimated by dividing the received signal by the transmitted signal [133]. The deconvolution problem is defined as follows: given the output of a convolution process \(v(t)\) and one input \(s(t)\), find the other input \(h(t)\) [165]. The Fourier Transform of the channel response \(h\left(t\right)\) is expressed as [133]:

$$ H\left( f \right) = \frac{V\left( f \right)}{{S\left( f \right)}} - \frac{N\left( f \right)}{{S\left( f \right)}} $$
(21)

where \(V\left(f\right)\), \(S\left(f\right)\), and \(N\left(f\right)\) are the Fourier Transforms of the received signal \(v\left(t\right)\), the transmitted signal \(s\left(t\right)\), and additive white Gaussian noise (AWGN) \(n(t),\) respectively.

By taking the inverse Fourier transform of \(H(f)\), the channel response is estimated and hence the TOA. Practically, this is done by sweeping the frequency from the lowest frequency (\({f}_{c}-\frac{B}{2}\)) to the highest frequency (\({f}_{c}+\frac{B}{2}\)) in steps of \(\Delta f\) [166], where \(B\) is the bandwidth, \({f}_{c}\) is the center frequency, and \(\Delta f\) is the frequency segment. The range of frequencies is shown in Eq. 22.

$$ f\left( n \right) = f_{0} + \left( {n - 1} \right)\Delta f $$
(22)

where \({f}_{0}\) is the lowest frequency and \(n\) is ranged from 1 to \(K\). For every frequency step, different values of \(V\left(f\right), S\left(f\right), H\left(f\right)\) and \(N(f)\) will be observed. A matrix can be generated to represent channel behavior over the bandwidth:

$$ {\varvec{V}} = {\varvec{S}}\cdot{\varvec{H}} + {\varvec{N}} $$
(23)

This method performs better than the correlation-based method at resolving closely separated multipath components, but it suffers from the noise enhancement term \(\frac{N(f)}{S\left(f\right)}\). Another drawback of this method is the amount of memory required for the matrices [133].
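The frequency sweep, the division \(V(f)/S(f)\), and the inverse transform can be sketched as below. The two-path channel, the transmitted spectrum, the noise level, and the threshold used to pick the first arrival are assumptions for illustration; the path delays are placed on the delay bins of the inverse FFT to keep the example exact.

```python
import numpy as np

K = 256                          # number of frequency steps
delta_f = 1.0e6                  # frequency step in Hz
f0 = 2.4e9                       # lowest swept frequency (hypothetical)
freqs = f0 + np.arange(K) * delta_f            # Eq. 22 for n = 1..K

bin_width = 1.0 / (K * delta_f)                # delay spacing of the inverse FFT
delays = np.array([10, 25]) * bin_width        # two paths placed on delay bins
gains = np.array([1.0, 0.6])

# Channel frequency response: sum of delayed, scaled paths.
H_true = (gains * np.exp(-2j * np.pi * freqs[:, None] * delays)).sum(axis=1)
S = 1.0 + 0.5 * np.cos(2 * np.pi * np.arange(K) / K)   # known transmitted spectrum
rng = np.random.default_rng(2)
N = 0.01 * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
V = S * H_true + N                             # received spectrum

H_est = V / S                                  # deconvolution; equals H + N/S
h_est = np.fft.ifft(H_est)                     # estimated channel impulse response
magnitude = np.abs(h_est)
first_bin = int(np.argmax(magnitude > 0.3 * magnitude.max()))   # first arrival
toa = first_bin * bin_width
```

The term `N / S` is where the noise enhancement appears: wherever the transmitted spectrum is weak, the division amplifies the noise.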

4.2.3 Maximum Likelihood Techniques

In many works in literature, TOA is considered as a special case of channel estimation [164]; in such a case, channel coefficients and time delays are unknowns, the log-likelihood function for the unknown parameters is [133]:

$$ \mho = \left[ {\alpha_{1} ,\alpha_{2} , \ldots ,\alpha_{M} ,\tau_{1} ,\tau_{2} , \ldots ,\tau_{M} } \right] $$
(24)
$$ {\mathcal{L}}\left( \mho \right) = - \mathop \smallint \limits_{0}^{{T_{0} }} \left| {x\left( t \right) - \mathop \sum \limits_{m = 1}^{M} \alpha_{m} s\left( {t - \tau_{m} } \right)} \right|^{2} dt $$
(25)

where \({T}_{0}\) is the symbol duration. The maximum likelihood estimate is given by [133]:

$$ \hat{\mho }_{ML} = \arg \max_{\mho } \left\{ {{\mathcal{L}}\left( \mho \right)} \right\} $$
(26)

The algorithm is complicated, requires a huge computational effort, and is not effective for closely spaced multipath [133].

4.2.4 Subspace Techniques

Considering Eq. 23, if \(L\) observations are collected at each frequency step, and \(M\) signal delays are received in each observation, the channel \({\varvec{H}}\) will include information about the frequency segments and time delays. It is possible to view the time delays and the channel behavior on each delay as [166]:

$$ {\varvec{H}}_{K \times 1} = {\varvec{U}}_{K \times M} {\varvec{A}}_{M \times 1} $$
(27)

where

$$ {\varvec{U}} = \left[ {\begin{array}{*{20}c} 1 & \ldots & 1 \\ {e^{{ - j2\pi \Delta f\tau_{1} }} } & \ldots & {e^{{ - j2\pi \Delta f\tau_{M} }} } \\ \vdots & {} & \vdots \\ {e^{{ - j2\pi \left( {K - 1} \right)\Delta f\tau_{1} }} } & \ldots & {e^{{ - j2\pi \left( {K - 1} \right)\Delta f\tau_{M} }} } \\ \end{array} } \right] $$
$$ {\varvec{A}} = \left[ {\begin{array}{*{20}c} {\alpha_{1} e^{{ - j2\pi f_{0} \tau_{1} }} } \\ \vdots \\ {\alpha_{M} e^{{ - j2\pi f_{0} \tau_{M} }} } \\ \end{array} } \right] $$

The nth row of \(U\) times \(A\) represents the channel response at the nth frequency [167].

Substituting the value of H from Eq. 27 into Eq. 23 and multiplying both sides by \({{\varvec{S}}}^{-1}\) [166]:

$$ {\varvec{Y}} = {\varvec{UA}} + {\varvec{W}} $$
(28)

where (\({\varvec{Y}}={{\varvec{S}}}^{-1}{\varvec{V}}\)) and (\({\varvec{W}}={{\varvec{S}}}^{-1}{\varvec{N}}\)). The covariance matrix can be taken for Eq. 28 by taking \(L\) observations as in [166]:

$$ {\varvec{R}} = {\varvec{UAA}}^{H} {\varvec{U}}^{H} + {\varvec{WW}}^{H} = {\varvec{UR}}_{{{\varvec{AA}}}} {\varvec{U}}^{H} + \sigma^{2} {\varvec{I}} $$
(29)

where \({\sigma }^{2}\) is the noise variance, assuming that the signal delays and the noise are orthogonal.

Using the concept of eigenvalues and eigenvectors, \({\varvec{R}}\) has \(K\) dimensional space which can be decomposed into \(M\) signal delays sub-spaces \({{\varvec{Q}}}_{S}\) and (\(K-M\)) noise sub-spaces \({{\varvec{Q}}}_{N}\) [168], time delays are estimated as:

$$ S = \frac{1}{{{\varvec{u}}\left( \tau \right)^{H} {\varvec{Q}}_{N}^{H} {\varvec{Q}}_{N} {\varvec{u}}\left( \tau \right) }} = \frac{1}{{\left| {{\varvec{Q}}_{N} {\varvec{u}}\left( \tau \right)} \right|^{2} }} $$
(30)

In Fig. 5, a comparison between the MUSIC and IFT algorithms is presented; the MUSIC algorithm outperforms the IFT algorithm owing to its ability to distinguish closely spaced multipath components. Despite these advantages, the MUSIC algorithm requires the number of multipath components as a priori information [133].

Fig. 5 TOA estimation comparison between MUSIC and IFT algorithms
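A minimal sketch of the delay pseudo-spectrum of Eq. 30 follows. It assumes an idealized covariance matrix built from the model of Eq. 29, uncorrelated path gains, a known number of paths, and the convention that the columns of \({{\varvec{Q}}}_{N}\) are the noise eigenvectors, so the projection is computed as \({{\varvec{Q}}}_{N}^{H}{\varvec{u}}(\tau )\). The two delays are 10 ns apart, closer than the \(1/B\approx 15.6\) ns resolution of the 64 MHz sweep; all parameter values are illustrative.

```python
import numpy as np

K, delta_f = 64, 1.0e6                     # frequency steps and step size
delays_true = np.array([50e-9, 60e-9])     # two closely spaced paths
k = np.arange(K)[:, None]

def delay_vector(tau):
    """Column(s) of U: exp(-j*2*pi*k*delta_f*tau) for k = 0..K-1."""
    return np.exp(-2j * np.pi * k * delta_f * np.atleast_1d(tau))

U = delay_vector(delays_true)                       # K x M
R_AA = np.diag([1.0, 0.5])                          # uncorrelated path gains (assumed)
R = U @ R_AA @ U.conj().T + 0.01 * np.eye(K)        # Eq. 29 with sigma^2 = 0.01

_, eigvecs = np.linalg.eigh(R)                      # ascending eigenvalues
Q_N = eigvecs[:, :K - delays_true.size]             # noise subspace
grid = np.arange(401) * 0.5e-9                      # candidate delays, 0.5 ns step
proj = np.linalg.norm(Q_N.conj().T @ delay_vector(grid), axis=0) ** 2
pseudo = 1.0 / (proj + np.finfo(float).eps)         # eps guards division by zero

# Keep the two highest local maxima of the pseudo-spectrum.
inner = pseudo[1:-1]
peak_idx = np.flatnonzero((inner > pseudo[:-2]) & (inner > pseudo[2:])) + 1
best = peak_idx[np.argsort(pseudo[peak_idx])[-2:]]
delay_estimates = np.sort(grid[best])
```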

The measurements are assumed to be stationary; therefore, the covariance matrix is Hermitian and Toeplitz. However, in real-life scenarios, the number of samples is finite and small, so the estimated matrix is not Toeplitz; the covariance matrix can be further improved by using the forward–backward covariance matrix (FBCM) [166]:

$$ {\varvec{R}}_{{{\varvec{FB}}}} = \frac{1}{2}\left( {{\varvec{R}} + {\varvec{JR}}^{\user2{*}} {\varvec{J}}} \right) $$
(31)

where \({\varvec{J}}\) is an exchange matrix; elements of the matrix are zero except the anti-diagonal where elements are ones.
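Equation 31 can be sketched directly. The snapshot matrix below is random data used only to produce a small-sample covariance estimate; the function name `forward_backward` is illustrative.

```python
import numpy as np

def forward_backward(R):
    """Eq. 31: R_FB = (R + J R* J) / 2, with J the exchange matrix."""
    J = np.fliplr(np.eye(R.shape[0]))      # ones on the anti-diagonal
    return 0.5 * (R + J @ R.conj() @ J)

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))  # 6 frequencies, 4 snapshots
R = X @ X.conj().T / X.shape[1]            # small-sample covariance estimate
R_fb = forward_backward(R)
```

The averaged matrix stays Hermitian and becomes persymmetric (unchanged by the operation \({\varvec{J}}{{\varvec{R}}}^{*}{\varvec{J}}\)), a property that any Hermitian Toeplitz matrix has.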

Hybrid TOA and DOA algorithms are used in [169], where a two-dimensional MUSIC algorithm performs localization in an IR-UWB system. Similarly, in [170], a two-dimensional MUSIC algorithm is used for TOA and DOA estimation with a uniform circular array.

4.3 Received Signal Strength

The idea behind localization using RSS is to establish a one-to-one relationship between the target location and its RSS [171]. The RSS level decreases with the distance between transmitter and receiver (known as the RSS decay law with distance). However, the RSS-distance relationship is not necessarily linear, especially in indoor environments, due to multipath [172]. Moreover, since 50% of the human body is water, people's movement causes fluctuations of RSS with time, which reduces localization accuracy [65, 173, 174].

RSS measurement requires only power detectors, which are available in WLAN, UWB, Zigbee, Bluetooth, and infrared devices. Utilizing WLAN for localization purposes is advantageous due to its continuous monitoring, low cost, and capability of working unattended for years [175, 176]; however, this may introduce interference problems with other devices that work in the same frequency bands, such as microwave ovens and Bluetooth devices. Thus, the probability of error may increase, although the correlation becomes trivial when different channels are used [172].

RSS systems do not rely on timing information; this makes them more robust to multipath. Moreover, synchronization between devices is not required [177]. RSS localization systems excel at short range; however, they provide lower accuracy at long range than TOA systems, which are favorable for outdoor applications [178].

On the other hand, training and complex matching algorithms are needed to perform localization [179]. Moreover, RSS is sensitive to shadowing, low signal to noise ratio (SNR), and NLOS propagation.

Many RSS-based localization algorithms are presented in the literature, including range-based position, radio-frequency fingerprinting technique, proximity-based position, and probabilistic estimation [133]. A brief discussion of these algorithms is introduced in the following subsections.

Range-based position

Localization using a range-based technique includes two steps: ranging and lateration [180]. In the first step, a distance-power relationship is formulated depending on the observed RSS values; in the latter step, the mobile's location is inferred based on the distances obtained using least square techniques. This localization type is preferred due to its ease; however, it suffers from varying RSS measurements [181].

RSS values vary randomly within indoor environments. As the Tx-Rx distance increases, the SS level does not follow a monotonic decrease. This nonlinearity arises due to the fading effect. At any receiver point, the received power is the transmitted power from the transmitter minus losses; those losses are due to distance (the area mean propagation loss), shadowing (local mean propagation loss), and multipath (fast fading). Access points (AP) locations tend to be known in indoor environments, while the mobile's location is unknown. For AP located at a known location (\({x}_{i},{y}_{i}\)) and a mobile located at an unknown location (\(x,y\)), the distance \({d}_{i}\) between AP and the mobile is given by [133, 182]:

$$ d_{i} = d_{0} \left[ {10^{{\frac{{P\left( {d_{i} } \right) - P_{t} - \overline{PL} \left( {d_{0} } \right) + \chi_{\sigma } }}{10}}} } \right]^{{\frac{ - 1}{n}}} $$
(32)

where \({d}_{0}\), \({P}_{t}\), \(\overline{PL}\left({d}_{0}\right)\), \({\chi }_{\sigma }\), and n are the reference distance, the transmitted power, the average path loss at \({d}_{0}\), a zero-mean Gaussian random variable representing shadow fading, and the path loss exponent, respectively.

For an omnidirectional antenna, possible mobile locations may lie on a circle; mobile coordinates are the solution of the circle equation shown below:

$$ d_{i}^{2} = \left( {x - x_{i} } \right)^{2} + \left( {y - y_{i} } \right)^{2} $$
(33)

Here, the value of \({d}_{i}\) is obtained by applying Eq. 32. Since \({d}_{i}\) and (\({x}_{i},{y}_{i}\)) are known, the remaining unknowns are (\(x,y\)), which need at least one more equation to be solved.

Estimation of environmental parameters \({\chi }_{\sigma }\) and \(n\) is accomplished by taking training data (SS collected from known locations), by fitting these data into a model using linear regression, the unknown parameters are estimated [126].
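The regression step can be sketched with synthetic training data. The sketch assumes a log-distance model written as \(P(d)=P({d}_{0})-10n{\mathrm{log}}_{10}(d/{d}_{0})+{\chi }_{\sigma }\), where \(P({d}_{0})\) is the received power at the reference distance; this sign convention, the ground-truth values, and the helper name `rss_to_distance` are assumptions for illustration.

```python
import numpy as np

d0 = 1.0                                      # reference distance in metres
rng = np.random.default_rng(3)
train_d = rng.uniform(1.0, 30.0, size=50)     # surveyed AP-to-point distances
true_n, true_p0, true_sigma = 3.0, -40.0, 2.0 # synthetic ground truth
train_rss = (true_p0 - 10 * true_n * np.log10(train_d / d0)
             + true_sigma * rng.standard_normal(50))

# Linear regression of RSS (dBm) on log10(d/d0): slope = -10 n, intercept = P(d0).
x = np.log10(train_d / d0)
slope, p0_hat = np.polyfit(x, train_rss, 1)
n_hat = -slope / 10.0
residuals = train_rss - (p0_hat + slope * x)
sigma_hat = residuals.std(ddof=2)             # spread attributed to shadowing

def rss_to_distance(rss_dbm):
    """Invert the fitted model to turn an RSS reading into a range estimate."""
    return d0 * 10 ** ((p0_hat - rss_dbm) / (10 * n_hat))
```

The fitted `n_hat` and `sigma_hat` play the roles of the path loss exponent and the shadowing spread, and `rss_to_distance` performs the ranging step that precedes lateration.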

Due to the effect of noise and NLOS, an exact solution for the mobile's location may not exist. In this case, least-squares methods are applied. These methods are categorized into nonlinear least squares (NLS) and linear least squares (LLS) [133]. The principle is as follows: the available information includes the known parameters (\({x}_{i},{y}_{i}\)) and the measured parameter \({d}_{i}\). The target is to estimate the unknown location of the mobile (\(x,y\)). This is accomplished by searching over all possible locations (\(\widehat{x},\widehat{y}\)) such that the distance between this point and (\({x}_{i},{y}_{i}\)) is as close as possible to \({d}_{i}\) for all \(N\) APs, as shown in Eq. 34 [133]:

$$ \left( {\hat{x},\hat{y}} \right) = \arg \mathop {\min }\limits_{x,y} \mathop \sum \limits_{i = 1}^{N} \left[ {\left( {x - x_{i} } \right)^{2} + \left( {y - y_{i} } \right)^{2} - d_{i}^{2} } \right]^{2} $$
(34)

The above approach is the NLS method, which depends on its initial guess; therefore, several iterations are required to get better results, at the price of a heavy computational load. For a lower computational cost, the LLS approach is used; nevertheless, it gives less accurate results [133].

A possible way to perform the linearization is to take the mean of all AP measurements and subtract it from each observation [133].

Lateration is prone to outliers (when the estimated location is too far away from the actual one). To give robustness to the system, outlier measurements are excluded by taking the median value of the sum [183].
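Both approaches can be sketched as follows. The LLS routine subtracts the mean of the circle equations (Eq. 33) from each equation, which cancels the quadratic terms in \((x,y)\). The NLS routine uses Gauss-Newton iterations on the range residuals \(\Vert p-{a}_{i}\Vert -{d}_{i}\), a common variant of the cost rather than the exact form of Eq. 34. The AP layout, target, and distance errors are hypothetical.

```python
import numpy as np

def lls_position(aps, dists):
    """Linear least squares after subtracting the mean circle equation."""
    aps = np.asarray(aps, dtype=float)
    d2 = np.asarray(dists, dtype=float) ** 2
    k = np.sum(aps ** 2, axis=1)                  # x_i^2 + y_i^2
    A = -2.0 * (aps - aps.mean(axis=0))
    b = (d2 - d2.mean()) - (k - k.mean())
    return np.linalg.lstsq(A, b, rcond=None)[0]

def nls_position(aps, dists, initial, iterations=10):
    """Gauss-Newton minimization of sum_i (||p - a_i|| - d_i)^2."""
    aps = np.asarray(aps, dtype=float)
    dists = np.asarray(dists, dtype=float)
    p = np.asarray(initial, dtype=float).copy()
    for _ in range(iterations):
        diff = p - aps
        ranges = np.linalg.norm(diff, axis=1)
        J = diff / ranges[:, None]                # Jacobian of the ranges
        p = p - np.linalg.lstsq(J, ranges - dists, rcond=None)[0]
    return p

aps = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
target = np.array([3.0, 4.0])
true_d = np.linalg.norm(aps - target, axis=1)
noisy_d = true_d + np.array([0.1, -0.1, 0.05, -0.05])   # hypothetical range errors

lls_exact = lls_position(aps, true_d)             # recovers the target exactly
lls_noisy = lls_position(aps, noisy_d)
nls_noisy = nls_position(aps, noisy_d, initial=lls_noisy)
```

Using the LLS result as the initial guess for NLS is one simple way to address the dependence of NLS on its starting point.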

Measurements accuracy depends on many parameters, including the unknown \({P}_{t}\). Another problem is the fluctuation of RSS values with time. To remove the need for having a priori knowledge of \({P}_{t}\) and to reduce the effects of environmental changes, Differential RSS (DRSS) is adopted [124, 125]. Although this method reduces dependence on knowing the value of \({P}_{t}\), it has poor performance in indoor environments compared to RSS [125].

4.4 Radio-Frequency Fingerprinting

Constructing a signal propagation model can be a very challenging task due to the complexities of indoor environments; rather than modeling RSS behavior, another approach can be used, known as the Radio Frequency (RF) fingerprinting technique [176, 184, 185]. Several fingerprinting approaches are used; the most straightforward and most popular is the Pattern Recognition Technique [91].

RF fingerprinting consists of two phases: the offline phase and the online phase. In the offline phase (training phase), the area of interest is divided into grids. In each grid, many RSS values are collected from surrounding APs and averaged to remove the fast fading effect; the averaged RSS values with their corresponding locations (also called reference points, RPs) are stored in a database known as the radio map [186].

In the Online phase (Real-time phase), RSS measurements are collected from unknown locations called test points (TP); these measurements are then compared with the offline phase database. One possible method is to estimate the smallest Euclidean distance between the test point measurements and the radio map subspace [187]. The RP with the corresponding smallest Euclidean distance represents TP's closest location [186], as shown in Eq. 35.

$$ \arg \mathop {\min }\limits_{RP\left( k \right)} \sqrt {\mathop \sum \limits_{l = 1}^{L} \left( {TP_{l} - RP\left( k \right)_{l} } \right)^{2} } \quad \forall {\text{ k}} = 1:{\text{K}} $$
(35)

where \(k\) indexes the kth RP. Other methods return the k-nearest locations, which have the lowest values of Eq. 35 [188].
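The online matching step of Eq. 35 can be sketched as below. The radio map values, RP coordinates, and test-point reading are made-up numbers for illustration, and averaging the coordinates of the k nearest RPs is one simple way to use the k-nearest variant.

```python
import numpy as np

# Offline phase: averaged RSS (dBm) from L = 3 APs at K = 4 reference points.
radio_map_rss = np.array([[-40.0, -70.0, -65.0],
                          [-55.0, -50.0, -72.0],
                          [-68.0, -62.0, -45.0],
                          [-60.0, -58.0, -57.0]])
rp_locations = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 5.0], [2.5, 2.5]])

def locate(tp_rss, k=1):
    """Return the mean location of the k RPs closest to the test point in RSS space."""
    dist = np.linalg.norm(radio_map_rss - np.asarray(tp_rss, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]            # indices of the k smallest distances
    return rp_locations[nearest].mean(axis=0)

tp = [-42.0, -69.0, -66.0]                    # online RSS reading at a test point
nearest_rp = locate(tp, k=1)
knn_estimate = locate(tp, k=2)
```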

The level of achieved accuracy depends heavily on how many APs and RPs are used. Adding more APs reduces the possibility of ambiguous results and tends to enhance the localization process. Adding more RPs enhances resolution; however, it requires more labor. Another disadvantage of this approach is the need for regular updates of the radio map whenever the building layout or the number of operating APs changes [133, 189]. In [190], it was found that adding more APs increases the robustness of the localization. It was also suggested that even systems with a low number of RPs could achieve good accuracy provided that there was an adequate number of APs in the system. Other fingerprinting approaches are based on machine learning methods, including K-Nearest Neighbor (KNN) [91], Support Vector Machines (SVMs) [191], Multi-layer Perceptron (MLP) [192], Decision Trees [193], and Random Forest [194]. In [192], the performance of KNN and MLP classifiers was compared; KNN was found to perform better in terms of accuracy and computational time.

4.4.1 Proximity-Based Position

A mobile can detect received signals only down to a certain power level, below which the signal is treated as noise; this level is called the sensitivity. When the RSS is below the sensitivity, the mobile and the AP are no longer connected, and the mobile is assumed to be outside the AP's coverage; when the RSS is above this threshold, the AP and the mobile are connected, and hence the mobile is within the AP's coverage [117, 118]. Proximity therefore indicates only whether the mobile is within the coverage area of an AP; if the coverage area is large, this information may not be useful. However, considering the intersection of the coverage areas of several APs enhances localization by shrinking the area of possible locations [195].
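The effect of intersecting coverage areas can be sketched as below. Everything specific in this example is an assumption for illustration: the sensitivity value, the AP positions, the idealized circular coverage areas with a fixed radius, and the candidate grid. Real coverage areas are rarely circular.

```python
import math

SENSITIVITY_DBM = -85.0  # hypothetical receiver sensitivity

# Hypothetical APs: id -> (x, y, coverage radius in metres).
ACCESS_POINTS = {
    "AP1": (0.0, 0.0, 10.0),
    "AP2": (12.0, 0.0, 10.0),
    "AP3": (6.0, 10.0, 10.0),
}

# Candidate positions on a 1 m grid around the APs.
GRID_X = [float(v) for v in range(-10, 23)]
GRID_Y = [float(v) for v in range(-10, 21)]


def connected_aps(rss_by_ap, sensitivity=SENSITIVITY_DBM):
    """APs whose RSS is at or above the sensitivity threshold."""
    return {ap for ap, rss in rss_by_ap.items() if rss >= sensitivity}


def covers(ap, x, y):
    """True if (x, y) lies inside the AP's idealized circular coverage."""
    ax, ay, radius = ap
    return math.hypot(x - ax, y - ay) <= radius


def candidate_cells(connected, aps, xs, ys):
    """Grid points inside every connected AP's area and outside all others."""
    return [
        (x, y)
        for x in xs
        for y in ys
        if all(covers(aps[ap], x, y) == (ap in connected) for ap in aps)
    ]


readings = {"AP1": -60.0, "AP2": -70.0, "AP3": -95.0}  # hypothetical RSS (dBm)
connected = connected_aps(readings)
cells = candidate_cells(connected, ACCESS_POINTS, GRID_X, GRID_Y)
```

In this sketch the mobile hears AP1 and AP2 but not AP3, so the candidate region is the overlap of the first two circles minus the third, which is smaller than the coverage area of either AP alone.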

4.4.2 Maximum Likelihood Estimation

In this method, RSS behavior is modeled as a random variable, and two stages are performed, similarly to the RF-fingerprinting approach. In the first stage, the study area is divided into J grids and RSS measurements are collected in each grid; these data are processed to give a probability distribution of the RSS behavior at each location. In the second stage, the mobile's RSS values from the surrounding APs are collected at an unknown location and stored in a vector. The mobile's location is then inferred by Maximum Likelihood Estimation (MLE), as shown in Eq. 36 [196]:

$$ \left( {\hat{x},\hat{y}} \right) = \arg \mathop {\max }\limits_{{L_{j} }} \left( {P\left( {L_{j} |{\varvec{ss}}} \right)} \right) $$
(36)

where \(P\left({L}_{j}|{\varvec{s}}{\varvec{s}}\right)\) is the probability that the mobile is located at \({L}_{j}\) given the RSS vector \({\varvec{s}}{\varvec{s}}\). Assuming M APs, this probability can be expressed using Bayes' rule as:

$$ P\left( {L_{j} |{\varvec{ss}}} \right) = \frac{{P\left( {{\varvec{ss}}|L_{j} } \right)P\left( {L_{j} } \right)}}{{P\left( {{\varvec{ss}}} \right)}} $$
(37)

where, assuming that the RSS values from different APs are independent,

$$ P\left( {{\varvec{ss}}|L_{j} } \right) = \mathop \prod \limits_{i = 1}^{M} P\left( {ss_{i} |L_{j} } \right) $$
(38)
$$ P\left( {{\varvec{ss}}} \right) = \mathop \prod \limits_{i = 1}^{M} P\left( {ss_{i} } \right) $$
(39)

\(P\left({L}_{j}\right)\) is the prior probability that the mobile is located in the jth grid, \(P\left({ss}_{i}|{L}_{j}\right)\) is the probability distribution of the RSS from the ith AP at the jth grid, and \(P\left({ss}_{i}\right)\) is the probability of the RSS from the ith AP over all grids. Since the mobile location is unknown, \(P\left({L}_{j}\right)\) is assumed to be equal for all grids.
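Eqs. 36 to 39 can be illustrated with a minimal sketch. It assumes, for illustration only, that the RSS from each AP in each grid follows a Gaussian distribution fitted from the training data; the source does not prescribe a particular distribution. Because \(P\left({L}_{j}\right)\) is equal for all grids and \(P\left({\varvec{ss}}\right)\) does not depend on j, maximizing Eq. 37 reduces to maximizing the product in Eq. 38, which is computed here as a sum of logarithms for numerical stability. The function names and training values are hypothetical.

```python
import math
from statistics import mean, stdev


def fit_grid_models(training):
    """Fit one Gaussian (mu, sigma) per AP for every grid location.

    training: {grid_location: [rss_vector, ...]}, one RSS value per AP
    in each vector, with at least two vectors per grid.
    """
    models = {}
    for loc, vectors in training.items():
        per_ap = list(zip(*vectors))  # regroup the readings by AP
        # A small floor on sigma avoids division by zero for constant data.
        models[loc] = [(mean(v), max(stdev(v), 1e-3)) for v in per_ap]
    return models


def log_likelihood(ss, model):
    """log P(ss | L_j): the product of Eq. 38 written as a sum of logs."""
    total = 0.0
    for x, (mu, sigma) in zip(ss, model):
        total += -0.5 * ((x - mu) / sigma) ** 2
        total -= math.log(sigma * math.sqrt(2.0 * math.pi))
    return total


def mle_location(ss, models):
    """Grid maximizing P(L_j | ss) under a uniform prior (Eq. 36)."""
    return max(models, key=lambda loc: log_likelihood(ss, models[loc]))


# Hypothetical training data: two grids, two APs, three readings each (dBm).
training = {
    (0, 0): [[-40.0, -70.0], [-44.0, -74.0], [-42.0, -72.0]],
    (5, 0): [[-60.0, -50.0], [-62.0, -54.0], [-64.0, -52.0]],
}
models = fit_grid_models(training)
location = mle_location([-59.0, -55.0], models)
```

With these illustrative values, the online vector is much more likely under the model of grid `(5, 0)`, so that grid is returned.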

This method provides a precise statistical analysis of the collected data; however, it is labor-intensive [133].

Figures 6 and 7 compare RSS-based algorithms, including the radar algorithm (RF-fingerprinting), GR gridded radar (RF-fingerprinting), ABP (MLE), H1 (MLE), and LLS and NLS (range-based), using a WLAN network [133, 197]. As seen in these figures, the performance of MLE and RF-fingerprinting is very similar, while the range-based algorithms perform relatively poorly. It can also be seen that NLS is more accurate than LLS.

Fig. 6 Performance comparison between RSS-based algorithms [197]

Fig. 7 Performance comparison between RSS-based algorithms [133]

Table 4 compares the AOA, TOA, and RSS localization methods in terms of accuracy, sensitivity to multipath, additional hardware requirements, advantages, and disadvantages [98, 133, 198].

Table 4 Localization methods techniques comparison

5 Conclusions

This paper presented a review of indoor localization techniques and wireless technologies. The review aims to provide a comprehensive understanding of the localization techniques and technologies used in indoor environments. The survey gives an overview of localization system technologies, including satellite-based navigation, inertial navigation systems, magnetic-based navigation, sound-based technologies, optical-based technologies, and RF-based technologies.

The review also investigates localization detection techniques, including proximity-based techniques, scene analysis, triangulation, and dead reckoning. Finally, the paper introduces the most common localization algorithms and methods, including angle of arrival (AOA), time of arrival (TOA), and received signal strength (RSS). Choosing a localization method depends on many factors, including cost, available resources, type of environment, and required accuracy; the most powerful technique is the one that gives high accuracy with less computation.