1 Introduction

Positioning is one of the core technologies of location-based services (LBS), and it plays a significant role in many applications of the Internet of Things (IoT) and artificial intelligence (AI). With extensive urban development in recent years, indoor positioning has become increasingly important. According to a report by the U.S. Environmental Protection Agency, people spend 70–90% of their time indoors (Weiser 2002). A wide range of applications has emerged, including indoor emergency rescue (Federal Communications Commission 2015), precision marketing in shopping malls, asset management and tracking in smart factories, mobile health services, virtual reality games, and location-based social media (Sakpere et al. 2017; Davidson and Piché 2016; Ali et al. 2019). By 2025, the global indoor LBS market is expected to reach USD 18.74 billion (Globe Newswire 2019).

Global navigation satellite systems (GNSS) have achieved great success in outdoor open areas, where positioning accuracy can reach the sub-meter level with various assistance technologies (Kaplan and Hegarty 2005). Indoors, however, GNSS signals are too weak to be received reliably, so they cannot provide continuous and dependable positioning; in many cases, especially deep inside buildings, they are blocked entirely. Although various technologies have been developed for indoor positioning, including WiFi, Bluetooth, ultra-wideband (UWB), pseudolites, magnetic fields, sound and ultrasound, and pedestrian dead reckoning (PDR), it remains challenging to achieve an accurate, effective, real-time positioning solution with full indoor coverage (Maghdid et al. 2016). The main reasons are the constraints of spatial layout and topology and the complex signal environment indoors (Zafari et al. 2019). More specifically, the reasons are summarized as follows.

The indoor environment is complex: radio waves are often reflected, refracted, or scattered by obstacles, which leads to non-line-of-sight (NLOS) propagation. NLOS propagation can introduce large positioning errors and seriously degrade localization accuracy.

Indoor layout and topology change frequently, and the number of people in a space varies, for example, between peak and off-peak hours. Signal propagation and the sound, light, electric, and magnetic fields change accordingly, which strongly affects positioning methods based on feature or field matching.

The unpredictability of indoor pedestrian motions, such as frequent changes in speed and direction (Morrison et al. 2012), and motion without any predefined paths (Saeedi 2013) also increases the difficulty of continuous estimation of pedestrian position.

With the development of information technology, smartphones have become ubiquitous. As shown in Fig. 26.1, a smartphone has many built-in sensors, such as accelerometers, gyroscopes, magnetometers, barometers, light sensors, microphones, speakers, and cameras, as well as Bluetooth and WiFi chips. These sensors were not originally designed for positioning. Nevertheless, for mass-market applications, using them with appropriate methods is a promising way to achieve low-cost, continuous, and highly usable indoor positioning (Davidson and Piché 2016).

Fig. 26.1 Multiple sensors embedded in the smartphone

In this chapter, we present a survey of indoor positioning with smartphone sensors. The state-of-the-art technologies will be reviewed. We will comprehensively compare the accuracy, complexity, robustness, scalability, and cost of different technologies, and comment on the pros and cons of the technologies in the context of different application scenarios. Moreover, from the perspective of developing the technology with high accuracy, high usability, high durability, and at low cost, we further discuss the directions of future development in this area.

The organization of the book chapter is as follows: in Sect. 26.2 we review the technologies of the smartphone for indoor positioning in detail. In Sect. 26.3 we summarize the difficulties in indoor positioning. In Sect. 26.4 potential future trends in smartphone indoor positioning are discussed. Conclusions are drawn in Sect. 26.5.

2 The State-of-the-Art Indoor Positioning with Smartphones

This section reviews the state of the art of indoor positioning with smartphone sensors. The technologies fall into two categories: positioning with radio-frequency (RF) signals and positioning with built-in sensors.

2.1 Positioning Technology of RF Signals

Currently, WiFi, Bluetooth, and wireless cellular communication signals are the main radio-frequency signals that smartphones support for the purpose of data transmission. The methods of indoor positioning vary due to differences in carrier frequency, signal strength, and the effective transmission distance of the signals.

2.1.1 WiFi Positioning Technology

WiFi is a wireless local area network (WLAN) technology based on the IEEE 802.11 family of standards (IEEE Standard for Information Technology 2013). With the advantages of flexibility, convenience, rapid deployment, and low cost, WiFi technologies have now been widely deployed indoors and have been used for indoor positioning. There are basically two methods used for positioning with WiFi signals: triangulation and fingerprinting.

In the triangulation method, the smartphone measures the received signal strength indicator (RSSI) of multiple WiFi access points (APs) and estimates its distance to each AP using a log-distance path loss model (Liu et al. 2007), a radio propagation model that predicts the path loss a signal encounters inside a building or densely populated area. However, because of strong reflection and scattering indoors, RSSI measurements are severely distorted by multipath and NLOS propagation. Given these fading effects, estimating position accurately from RSSI measurements and a path loss model is challenging. Alternatively, the distance between transceivers can be obtained by measuring the time of flight (TOF; Schauer et al. 2013). Tests have shown that indoor multipath and time-varying service interruptions in the WLAN strongly affect TOF accuracy; ranging accuracy can be improved by proper filter design and by smoothing the raw measurements.
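The two steps of RSSI-based triangulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Liu et al. (2007): the distance step inverts the log-distance model RSSI(d) = P0 − 10n·log10(d/d0) with an assumed reference power P0 = −40 dBm at d0 = 1 m and an assumed exponent n = 2.5 (both would have to be calibrated per site), and the position step uses linearized least squares, a common choice assumed here.

```python
import math


def rssi_to_distance(rssi_dbm, p0_dbm=-40.0, n=2.5, d0_m=1.0):
    """Invert the log-distance path loss model
    RSSI(d) = P0 - 10 * n * log10(d / d0).

    p0_dbm (RSSI at d0) and the exponent n are assumed, site-specific values.
    """
    return d0_m * 10.0 ** ((p0_dbm - rssi_dbm) / (10.0 * n))


def trilaterate(anchors, distances):
    """Linearized least-squares 2D position from >= 3 AP positions and ranges.

    Subtracting the first circle equation from the others cancels the
    quadratic terms, leaving A [x, y]^T = b, solved via the 2x2 normal
    equations (A^T A) p = A^T b with Cramer's rule.
    """
    (x1, y1), d1 = anchors[0], distances[0]
    ata = [[0.0, 0.0], [0.0, 0.0]]
    atb = [0.0, 0.0]
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        row = (2.0 * (xi - x1), 2.0 * (yi - y1))
        rhs = d1 ** 2 - di ** 2 + xi ** 2 - x1 ** 2 + yi ** 2 - y1 ** 2
        for r in range(2):
            atb[r] += row[r] * rhs
            for c in range(2):
                ata[r][c] += row[r] * row[c]
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    x = (atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det
    y = (ata[0][0] * atb[1] - atb[0] * ata[1][0]) / det
    return x, y
```

With APs at (0, 0), (10, 0), and (0, 10) m and exact ranges from the point (3, 4), `trilaterate` recovers (3, 4); with ranges derived from real RSSI readings, the estimate inherits the multipath and NLOS errors discussed above.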

In the fingerprinting method (Bahl and Padmanabhan 2000), the basic idea is to match the measured signal-strength pattern against fingerprints stored in a database for the area of interest. The method operates in two phases: the training phase and the online positioning phase. In the training phase, a radio map is built from reference points with known coordinates within the area of interest; through these training measurements, the radio map implicitly characterizes the relationship between RSSI and position. In the online positioning phase, the smartphone measures RSSI, and the positioning system uses the radio map to obtain a position estimate. The advantage of the method is that it requires neither an exact model of the channel attenuation between transceivers nor the coordinates of the WiFi APs. Its disadvantages are that the signal is easily altered by its surroundings, the mismatch rate is relatively high in open indoor spaces, and building and updating the fingerprint database is time-consuming. Fingerprinting has been widely investigated; recent surveys include Khalajmehrabadi et al. (2017), He and Chan (2015), and Davidson and Piché (2016). In general, the methods can be divided into three types: deterministic, probabilistic, and pattern-recognition methods. The main factors affecting WiFi positioning accuracy include inter-channel interference from different APs (Pei et al. 2012) and hardware differences between smartphones (Schmitt et al. 2014); the surveys above give a thorough summary of the factors that affect WiFi fingerprint positioning. Current WiFi positioning systems using RSSI fingerprints include RADAR (Bahl and Padmanabhan 2000), Ekahau (ekahau.com), and Horus (Youssef and Agrawala 2008), with a positioning accuracy of about 2–5 m.
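A deterministic variant of the online matching phase, weighted k-nearest neighbours (WKNN) in signal space, can be sketched as follows. The radio-map layout, the Euclidean RSSI distance, the inverse-distance weights, and the −100 dBm fill value for APs not heard are illustrative assumptions, not the design of RADAR, Ekahau, or Horus.

```python
import math


def wknn_locate(radio_map, observation, k=3, missing_dbm=-100.0):
    """Weighted k-nearest-neighbour fingerprint matching.

    radio_map: list of ((x, y), {ap_id: rssi_dbm}) reference points.
    observation: {ap_id: rssi_dbm} measured in the online phase.
    APs absent from either fingerprint are filled with missing_dbm.
    """
    scored = []
    for pos, fingerprint in radio_map:
        aps = set(fingerprint) | set(observation)
        dist = math.sqrt(sum(
            (fingerprint.get(ap, missing_dbm) - observation.get(ap, missing_dbm)) ** 2
            for ap in aps))
        scored.append((dist, pos))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # Inverse signal-distance weights; the epsilon avoids division by zero
    weights = [1.0 / (d + 1e-6) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * p[0] for w, (_, p) in zip(weights, nearest)) / total
    y = sum(w * p[1] for w, (_, p) in zip(weights, nearest)) / total
    return x, y
```

Averaging several neighbours rather than taking the single best match (k = 1) smooths the estimate between reference points, at the cost of some bias near the edges of the surveyed area.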

As WiFi receivers have improved, commercial WiFi modules are now able to provide channel state information (CSI; Wang et al. 2016). Whereas RSSI only measures the power of the received radio signal, CSI describes the channel response on each subcarrier and therefore captures much more detail about multipath propagation. Research shows that building the fingerprint database from CSI can effectively improve indoor positioning accuracy (Wang et al. 2015b; Wu et al. 2012).

With the ratification of IEEE 802.11n, multiple-antenna technology was introduced into WiFi transmission, making it possible to estimate the angle of arrival (AOA) in WiFi positioning. Vasisht et al. (2016) and Kotaru et al. (2015) jointly estimate the AOA and the time of arrival (TOA), achieving decimeter-level and centimeter-level accuracy, respectively. However, these methods run at the AP side and are not applicable to user-centric positioning with smartphones, in which only one antenna is embedded.

The main factor limiting WiFi fingerprint positioning in large-scale applications is the difficulty of efficiently constructing and adaptively updating the radio map, which is both time- and labor-consuming. Methods for reducing the cost of building and updating the radio map include crowdsourcing (Zhuang et al. 2015), LiDAR-based simultaneous localization and mapping (SLAM; Tang et al. 2015), and interpolation (Zhao et al. 2016). In addition, with increasing attention to information security and personal privacy (Chen et al. 2017), the WiFi scanning rate has been reduced to 1/30 Hz or even lower, which increases positioning latency.
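Interpolation reduces survey effort by predicting fingerprints at points that were never measured. The sketch below uses inverse distance weighting (IDW) for a single AP; IDW and its power parameter of 2 are assumptions chosen for brevity and are not necessarily the interpolation scheme of Zhao et al. (2016).

```python
def idw_interpolate(surveyed, query, power=2.0):
    """Estimate one AP's RSSI (dBm) at an unsurveyed point (x, y) by
    inverse distance weighting of surveyed reference points.

    surveyed: list of ((x, y), rssi_dbm) for the same AP.
    """
    num, den = 0.0, 0.0
    for (x, y), rssi in surveyed:
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if d2 == 0.0:
            return rssi  # the query coincides with a surveyed point
        w = 1.0 / d2 ** (power / 2.0)  # weight = 1 / distance**power
        num += w * rssi
        den += w
    return num / den
```

Running this for every AP at every grid point of interest yields a denser radio map; its quality degrades where walls between surveyed points break the smooth-field assumption behind IDW.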

2.1.2 Bluetooth Positioning Technology

Bluetooth is a short-range radio technology, originally standardized as IEEE 802.15.1, that was developed mainly for wireless personal area networks (WPAN). It operates at 2400–2483.5 MHz, in the same 2.4 GHz ISM band as IEEE 802.11b/g WiFi. Data are split into packets and exchanged over one of 79 designated Bluetooth channels, each 1 MHz wide. Positioning with Bluetooth Classic (specifications before 4.0) has used techniques ranging from proximity to trilateration and fingerprinting, with an accuracy of about 4 m (Chen et al. 2011a, 2013, 2015). However, the interval at which a handset scans for nearby Bluetooth beacons can exceed 10 s, during which an indoor pedestrian could travel 15 m or more. Because of this low scan rate, positioning with Bluetooth Classic has not proved popular (Faragher and Harle 2015).

Bluetooth Low Energy (BLE) was introduced with the Bluetooth 4.0 specification (originally marketed as Bluetooth Smart) and appeared in smartphones from 2011. Compared with Classic Bluetooth, BLE offers a coverage range of 70–100 m with much higher energy efficiency (Zafari et al. 2015), at a physical-layer data rate of 1 Mbps in the 4.x specifications. BLE also has a very short connection time (only a few milliseconds), after which it goes into sleep mode until a connection is reestablished, which keeps power consumption low. As a result, a BLE beacon can be powered by a single battery for up to five years. Whereas WiFi APs are typically installed near power outlets, battery-powered BLE beacons can be placed freely to provide good signal geometry and optimized coverage. In addition, because BLE can be scanned much more frequently than WiFi, occasional outliers caused by interference or multipath can be averaged out, improving tracking accuracy.

At present, the most popular BLE beacon ecosystems are Apple’s iBeacon, Google’s UriBeacon and Eddystone, and Radius Networks’ AltBeacon. Apple’s iBeacon system (Apple 2014), based on RSSI ranging, achieves a positioning accuracy of 2–3 m in a typical office environment. A Bluetooth antenna array system developed by Quuppa (2020) can achieve sub-meter accuracy. The Bluetooth 5.1 specification, released in January 2019, added a direction-finding feature (angle of arrival and angle of departure), with which Bluetooth devices may be able to pinpoint their location indoors to centimeter accuracy (How-To Geek 2019).
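Direction finding of the kind added in Bluetooth 5.1 rests on the phase difference that a plane wave produces across an antenna array. For two antennas spaced d apart, Δφ = 2πd·sin θ/λ, so θ = arcsin(λΔφ/(2πd)). The sketch below applies this relation under assumed conditions: a 2.44 GHz carrier, a single dominant propagation path, and a spacing of at most λ/2 so that the angle is unambiguous. Practical arrays use more elements and more robust estimators.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def aoa_from_phase(delta_phi_rad, spacing_m, freq_hz=2.44e9):
    """Angle of arrival (radians from broadside) for a two-element array,
    using the plane-wave relation delta_phi = 2*pi*d*sin(theta)/lambda."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    s = wavelength * delta_phi_rad / (2.0 * math.pi * spacing_m)
    if abs(s) > 1.0:
        raise ValueError("phase difference inconsistent with antenna spacing")
    return math.asin(s)
```

With half-wavelength spacing, a measured phase difference of π/2 corresponds to an arrival angle of 30°. Angles from two or more arrays at known positions can then be intersected to obtain a position.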

2.1.3 Cellular Positioning Technology

Cellular networks were originally designed for mobile communication. Nevertheless, the extensive cellular infrastructure can be reused for positioning, adding value to network management and services (Del Peral-Rosado et al. 2017). In 2G/3G/4G mobile communication systems, cellular positioning is performed by a localization module implemented in the base station, which is known as the RAN (radio access network) positioning method. The most significant advantage of cellular positioning is seamless indoor and outdoor positioning; its disadvantage is relatively low accuracy, generally tens to hundreds of meters (Zhao 2002; Lakmali and Dias 2008). Ericsson applied the OTDOA (observed time difference of arrival) method to long-term evolution (LTE) signals and reported a positioning accuracy of 50 m with 97% reliability (Ericsson Research Blog 2015). Such accuracy still falls short of the needs of most indoor positioning applications.
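OTDOA turns measured time differences into range differences relative to a reference base station, each of which constrains the handset to a hyperbola. A minimal 2D Gauss-Newton solver for these equations is sketched below; the noise-free setting, the base-station layout used in the example, and the initial guess are assumptions for illustration and do not reflect Ericsson's implementation.

```python
import math


def tdoa_solve(stations, range_diffs, guess, iterations=20):
    """Gauss-Newton solution of 2D TDOA positioning.

    stations: [(x, y), ...] in meters; stations[0] is the reference.
    range_diffs[i-1] = c * (t_i - t_0) = |p - s_i| - |p - s_0| for i >= 1.
    """
    x, y = guess
    x0, y0 = stations[0]
    for _ in range(iterations):
        r0 = math.hypot(x - x0, y - y0)
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for (xi, yi), z in zip(stations[1:], range_diffs):
            ri = math.hypot(x - xi, y - yi)
            # Gradient of h_i(p) = |p - s_i| - |p - s_0|
            row = ((x - xi) / ri - (x - x0) / r0,
                   (y - yi) / ri - (y - y0) / r0)
            resid = z - (ri - r0)
            for a in range(2):
                jtr[a] += row[a] * resid
                for b in range(2):
                    jtj[a][b] += row[a] * row[b]
        # Solve the 2x2 normal equations (J^T J) delta = J^T r
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        dx = (jtr[0] * jtj[1][1] - jtj[0][1] * jtr[1]) / det
        dy = (jtj[0][0] * jtr[1] - jtr[0] * jtj[1][0]) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < 1e-9:
            break
    return x, y
```

With four base stations at the corners of a 1 km square and a start at the center, the iteration converges to the true position in a few steps; with real measurements, NLOS bias in the time differences maps directly into position error.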

The upcoming fifth-generation (5G) mobile communication systems are expected to improve positioning accuracy in cellular networks thanks to key features of 5G such as small cells, device-to-device (D2D) communication, heterogeneous networks (HetNet), massive multiple-input multiple-output (MIMO), and millimeter-wave (mm-Wave) communication (Talvitie et al. 2017). In particular, through D2D communication, mobile stations or smartphones can determine their locations cooperatively, which would both increase localization accuracy and decrease time delay. Massive MIMO will enable more accurate directional measurements. Dense networks of small cells will provide many line-of-sight (LOS) links, and higher signal bandwidths will improve the accuracy of range measurements and the resolution of multipath components.
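The link between bandwidth and multipath resolution can be made concrete with a common rule of thumb: two propagation paths whose lengths differ by less than about c/B cannot be separated by a signal of bandwidth B. The sketch below evaluates this approximation for a few carrier bandwidths; treating c/B as the resolution limit is a simplification assumed here, not a figure from Talvitie et al. (2017).

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def range_resolution_m(bandwidth_hz):
    """Rule-of-thumb multipath (range) resolution of a one-way ranging
    signal: path-length differences below about c / B are not resolvable."""
    return SPEED_OF_LIGHT / bandwidth_hz


for label, bw in [("LTE 20 MHz", 20e6), ("5G FR1 100 MHz", 100e6),
                  ("5G mm-Wave 400 MHz", 400e6)]:
    print(f"{label}: {range_resolution_m(bw):.2f} m")
```

This prints roughly 15 m for a 20 MHz LTE carrier, 3 m for 100 MHz, and 0.75 m for a 400 MHz mm-Wave carrier, which illustrates why wider 5G bandwidths matter for separating indoor reflections.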

2.2 Positioning Technology Based on Embedded Sensors

Built-in smartphone sensors include accelerometers, gyroscopes, magnetometers, barometers, light-intensity sensors, cameras, and microphones. Although these sensors are not designed for positioning, their measurements can be used for indoor positioning with dedicated methods, including PDR, magnetic matching, visual positioning, visible light positioning, and ultrasonic (audio) positioning.

2.2.1 Pedestrian Dead Reckoning

With advances in micro-electro-mechanical system (MEMS) technology, more and more low-cost inertial measurement units (IMUs) are integrated into smartphones. Accelerometers, gyroscopes, and magnetometers are among the most common embedded sensors, but because of their low cost, their stability and measurement accuracy are relatively poor. Strap-down inertial navigation, which integrates acceleration twice, therefore accumulates errors too quickly to be practical with these sensors. As an alternative, PDR can be applied to indoor positioning using measurements from low-cost MEMS sensors (Robert 2013). Specifically, PDR detects steps with the accelerometer, estimates the step length (and hence the walking speed), determines the heading from the magnetometer and gyroscope, and then updates the pedestrian’s relative position from the step length and heading (Chen et al. 2011b; Deng et al. 2016).
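The step detection and position update described above can be sketched as follows. Detecting a step as a peak of the acceleration magnitude above a threshold, the 10.8 m/s² threshold, the 0.3 s minimum step interval, and the north-up coordinate convention are illustrative assumptions; a practical system would also low-pass filter the signal and estimate step length per user rather than fix it.

```python
import math


def detect_steps(acc_norm, fs_hz, threshold=10.8, min_interval_s=0.3):
    """Return sample indices of detected steps: local maxima of the
    acceleration magnitude (m/s^2) above `threshold`, at least
    `min_interval_s` apart."""
    steps = []
    min_gap = int(min_interval_s * fs_hz)
    for i in range(1, len(acc_norm) - 1):
        is_peak = acc_norm[i] > acc_norm[i - 1] and acc_norm[i] >= acc_norm[i + 1]
        if is_peak and acc_norm[i] > threshold:
            if not steps or i - steps[-1] >= min_gap:
                steps.append(i)
    return steps


def pdr_update(x, y, step_length_m, heading_rad):
    """Advance the position by one step. Heading is measured clockwise
    from north; x points east and y points north."""
    return (x + step_length_m * math.sin(heading_rad),
            y + step_length_m * math.cos(heading_rad))
```

For a synthetic 2 Hz walking signal sampled at 50 Hz for 5 s, `detect_steps` finds 10 steps; each detected step then advances the position through `pdr_update`, for example with an assumed fixed step length of 0.7 m.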

The PDR algorithm (Fig. 26.2) provides continuous positioning results. Because it avoids double integration of acceleration, it is a relatively simple yet effective way to use raw measurements from low-cost sensors. The main difficulty of PDR lies in heading estimation, which is affected by magnetic interference indoors. It is therefore necessary to integrate PDR with positioning methods that provide absolute positions, such as WiFi, BLE, or geomagnetic matching, to improve the heading estimate and to limit the accumulating error of PDR’s relative positioning (Deng et al. 2016).
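One common way to combine the two heading sources is a complementary filter: the gyroscope rate is integrated for short-term smoothness, and the magnetometer slowly corrects the resulting drift. The sketch below also rejects magnetometer readings that disagree with the prediction by more than a gate, a simple guard against indoor magnetic disturbances. The blending factor 0.98 and the 30° gate are assumed values, and this is not the method of Deng et al. (2016).

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def fuse_heading(prev_heading, gyro_rate, dt, mag_heading,
                 alpha=0.98, gate=math.radians(30.0)):
    """One complementary-filter step for heading (radians).

    The gyroscope rate (rad/s) is integrated first (smooth but drifting);
    the magnetometer heading (absolute but disturbed indoors) then pulls
    the estimate by a fraction (1 - alpha) of the disagreement, unless the
    disagreement exceeds `gate`, in which case the reading is treated as
    magnetic interference and ignored.
    """
    predicted = wrap_angle(prev_heading + gyro_rate * dt)
    innovation = wrap_angle(mag_heading - predicted)
    if abs(innovation) > gate:
        return predicted
    return wrap_angle(predicted + (1.0 - alpha) * innovation)
```

Wrapping the innovation keeps the correction on the short way around the circle, so a prediction of 179° and a magnetometer reading of −179° are treated as 2° apart rather than 358°.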

Fig. 26.2 PDR system block diagram

2.2.2 Magnetic Matching (MM) Positioning Technology

MM positioning technology uses the magnetic field as a fingerprint and performs indoor positioning by matching characteristics of the indoor magnetic field. Similar to WiFi fingerprinting, MM positioning has two steps: building a geomagnetic fingerprint database and matching geomagnetic features for positioning. Because the magnetic field is spatially correlated, profile (contour) matching methods such as dynamic time warping (DTW) can be used to obtain more robust matching results. Most smartphones integrate a magnetometer, and the magnetic field can be measured whenever the phone is on, so MM positioning is well suited to smartphones. However, indoor magnetic fields often change, which makes it difficult to build an accurate magnetic fingerprint database in practice. The University of Oulu in Finland proposed an indoor positioning system, IndoorAtlas, that combines magnetic fields with other built-in sensors (Thompson 2020) and achieves a positioning accuracy of 0.1–2 m.
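Profile matching with dynamic time warping (DTW) can be sketched in a few lines. Here a walked sequence of magnetic field magnitudes is compared with stored reference profiles, one per path segment; the profile dictionary, its identifiers, and the use of the field magnitude alone are illustrative assumptions, not the algorithm of any particular product.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance between
    two 1D sequences, with absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def match_profile(measured, reference_profiles):
    """Return the id of the stored magnetic profile (uT magnitudes)
    closest to the measured sequence under DTW."""
    return min(reference_profiles,
               key=lambda pid: dtw_distance(measured, reference_profiles[pid]))
```

DTW tolerates the same profile being walked at different speeds: a measured sequence that repeats a sample, as a slower walk would, still matches its reference profile with zero cost.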

2.2.3 Visual Positioning Technology

Visual positioning on smartphones is mainly based on monocular vision, since smartphones commonly use a monocular camera. One method is image matching, in which the position is computed by matching the current photo against photos stored in an image database; dense matching and structure from motion (SFM) can be used to match image features against the feature database. Another method is based on the visual gyroscope and visual odometry (Ruotsalainen 2012; Ruotsalainen et al. 2013). The visual gyroscope finds the vanishing point in each image from the monocular camera and derives the heading change rate from the shift of the vanishing point between two adjacent images. Visual odometry obtains the relative translation of the pedestrian by matching photos taken in sequence. The main challenge of using a monocular camera as a visual gyroscope and odometer arises during sharp turns, where fewer feature points are available for matching. Ruotsalainen et al. (2016) describe methods for fusing the visual gyroscope and visual odometry with IMU measurements.
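Under a pinhole camera model with pure yaw rotation, the horizontal image coordinate u of a corridor’s vanishing point gives the bearing of the corridor direction relative to the optical axis, atan((u − cx)/fx), so the change of that bearing between adjacent frames equals the camera’s heading change. The sketch below assumes known intrinsics (principal point cx and focal length fx, both in pixels) and an already detected vanishing point; it illustrates the geometry only and is not Ruotsalainen’s implementation.

```python
import math


def heading_rate_from_vanishing_points(u1_px, u2_px, cx_px, fx_px, dt_s):
    """Camera yaw rate (rad/s, positive = turning right) between two frames
    taken dt_s apart, from the horizontal pixel coordinate of the same
    vanishing point in each frame (image x axis pointing right).

    When the camera turns right, a fixed world direction moves left in the
    image, so the yaw change is the old bearing minus the new bearing.
    """
    bearing1 = math.atan((u1_px - cx_px) / fx_px)
    bearing2 = math.atan((u2_px - cx_px) / fx_px)
    return (bearing1 - bearing2) / dt_s
```

For example, with cx = 320 px and fx = 500 px, a vanishing point moving from the image center to 320 − 500·tan(0.1) px within 0.5 s corresponds to a right turn of 0.1 rad, that is, 0.2 rad/s.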

Visual positioning technology can achieve decimeter-level or even centimeter-level accuracy in scenarios with sufficient light and image features. When an optical camera is combined with depth cameras (such as Google’s Tango technology), the positioning accuracy can be further increased. But, in general, the algorithm of visual positioning is computationally complex and has high power consumption. With further improvement in the computation performance and storage capacity of smartphones, the method is promising in pedestrian navigation.

2.2.4 LED Visible Positioning Technology

Visible light positioning can be divided into two categories. The first modulates the light source so that it emits an identifiable optical signal: for example, an LED lamp emits a high-frequency flicker that is invisible to the naked eye, and the smartphone’s sensors receive this signal to calculate the pedestrian’s position. The ByteLight positioning system (Ganick and Ryan 2012) is based on this principle and achieves meter-level accuracy. The second category uses pattern matching: the time–frequency characteristics of ambient light are used to build an ambient light fingerprint database in advance, and in the real-time positioning phase the measured light intensity is matched against this database (Liu et al. 2014). Because the smartphone’s built-in camera can sense both light intensity and high-frequency light variation, these optical techniques can readily be applied to indoor positioning with smartphones.
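For the first category, the receiver must tell which lamp it sees. Assuming each lamp is assigned its own flicker frequency and that the intensity signal is sampled fast enough (for instance, via the rows of a rolling-shutter camera frame), the lamp can be identified by comparing signal power at the candidate frequencies with the Goertzel algorithm, as sketched below. The frequency-per-lamp scheme and the lamp table are assumptions; commercial systems may encode identifiers differently.

```python
import math


def goertzel_power(samples, fs_hz, target_hz):
    """Power of the DFT bin nearest target_hz, via the Goertzel recursion."""
    n = len(samples)
    k = round(n * target_hz / fs_hz)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def identify_lamp(samples, fs_hz, lamp_freqs):
    """lamp_freqs: {lamp_id: modulation_hz} (hypothetical lamp table).
    Returns the lamp whose modulation frequency carries the most power.
    The mean (ambient DC level) is removed first."""
    mean = sum(samples) / len(samples)
    centered = [x - mean for x in samples]
    return max(lamp_freqs,
               key=lambda lid: goertzel_power(centered, fs_hz, lamp_freqs[lid]))
```

The surveyed coordinates of the identified lamp then serve as a proximity position fix; combining several lamps visible at once can refine it.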

2.2.5 Ultrasonic Positioning Technology

Ultrasonic positioning computes ranges from the time of flight of ultrasonic pulses. The best-known ultrasonic positioning systems are the Active Bat system (Ward and Jones 1997) and the Cricket system (Priyantha et al. 2000). The Active Bat system achieves a positioning accuracy within 9 cm for 95% of measurements. Although ultrasonic positioning is highly accurate, current smartphones are not equipped with dedicated ultrasonic modules for transmitting or receiving ultrasound. However, smartphone microphones can capture near-ultrasonic signals in the 16–22 kHz range, and determining the user’s location with such signals has attracted much attention in smartphone positioning (Ijaz et al. 2013). The main effort in improving the accuracy of ultrasonic indoor positioning is to mitigate echoes, which severely affect TOA detection.
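TOA detection for an acoustic pulse is commonly done by cross-correlating the received recording with the known transmitted waveform and taking the lag of the correlation peak. The brute-force sketch below assumes the emission time is known (as when an RF signal synchronizes transmitter and receiver), a linear 18–20 kHz chirp as the pulse, a 48 kHz sampling rate, and a speed of sound of 343 m/s. It performs no echo mitigation, so a strong reflection could still win the peak.

```python
import math


def make_chirp(f0_hz, f1_hz, n_samples, fs_hz):
    """Linear chirp sweeping f0 -> f1 over n_samples (the known pulse)."""
    duration = n_samples / fs_hz
    k = (f1_hz - f0_hz) / duration  # sweep rate in Hz/s
    return [math.sin(2.0 * math.pi * (f0_hz * t + 0.5 * k * t * t))
            for t in (i / fs_hz for i in range(n_samples))]


def estimate_delay_samples(reference, received):
    """Lag (in samples) maximizing the cross-correlation between the known
    transmitted waveform and the received recording."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(received) - len(reference) + 1):
        score = sum(r * received[lag + i] for i, r in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def tof_range_m(delay_samples, fs_hz, speed_of_sound_m_s=343.0):
    """Convert a one-way time of flight (in samples) to range in meters."""
    return speed_of_sound_m_s * delay_samples / fs_hz
```

At 48 kHz, one sample corresponds to about 7 mm of range, so the sampling rate alone does not limit accuracy; echoes and NLOS paths do.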

2.3 Positioning Technology of Multi-source Fusion

As the discussion above shows, each positioning method has its pros and cons in different indoor scenarios. For example, RF signals may offer large coverage, but multipath interference, which is common indoors, causes large positioning errors. Pedestrian track estimation with built-in sensors does not depend on indoor infrastructure, but IMU errors accumulate over time. No method based on a single technology yet suits all indoor positioning scenarios. Table 26.1 compares the performance of various smartphone positioning technologies in terms of accuracy, complexity, robustness, scalability, and cost. Although many sources are available for indoor positioning, such as sound, light, electrical signals, and magnetic fields, each has its own limits, and its usability depends on the actual environment. For example, WiFi fingerprinting requires wide signal coverage from many APs with little radio interference, whereas magnetic field matching requires distinctive magnetic features in the area of interest, where local magnetic anomalies actually help positioning to some extent. Visual positioning works well in bright environments but cannot work effectively in dark places.

Table 26.1 Comparison of different positioning technologies of smartphone sensors

As the computing performance and storage capacity of smartphones improve, sensor fusion that integrates multiple positioning technologies has become a hot research topic in smartphone indoor positioning. Fusion methods are broadly divided into loosely coupled and tightly coupled approaches. The loosely coupled approach fuses the position solutions from different sensors to obtain the position estimate at each epoch. It is easy to implement, but because smartphone sensors are heterogeneous, it is difficult to compute analytically the weights that the fusion module should assign to the position estimates from different sensors. The tightly coupled approach instead fuses the different parameters estimated from different types of sensors to obtain the position estimate. At present, an effective way to implement tightly coupled fusion is Bayesian inference, including the Kalman filter (KF; Zhang et al. 2013), the unscented Kalman filter (UKF; Chen et al. 2011c), and the particle filter (PF; Quigley et al. 2010). In these methods, the state model and measurement equations are set up first, and the motion states of the pedestrian (position and velocity) are then inferred sequentially from parameters estimated by different sensors, such as position, velocity, heading angle, and step length. Examples of sensor-fusion research include a hybrid positioning system using WiFi, the magnetic field, and cellular signals (Kim et al. 2014); WiFi positioning fused with PDR results (Karlsson et al. 2015; Li et al. 2016); 3D indoor positioning with a Bluetooth module, accelerometers, and barometers (Jeon et al. 2015); and WiFi fingerprinting combined with PDR and magnetic field matching (Zhang et al. 2017). In addition, indoor maps are commonly used to assist indoor positioning.
By integrating map constraints with WiFi fingerprinting and PDR results, a positioning system can reliably achieve meter-level accuracy (Wang et al. 2015a). Ruotsalainen et al. (2016) provide an infrastructure-free indoor navigation solution that fuses observations from IMUs, cameras, ultrasonic sensors, and a barometer with the PF algorithm, achieving an average positioning accuracy of about 3 m. Various sensor-fusion positioning methods are compared in Table 26.2. Test results have shown that sensor-fusion systems are more accurate and stable than indoor positioning systems based on a single technology.
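Fusing PDR with WiFi, as in several of the studies cited above, can be illustrated with a minimal position-only Kalman filter in which each PDR step drives the prediction and each WiFi fix is a direct position measurement. The sketch keeps x and y independent, which is valid only because all covariances are assumed diagonal. The noise values (0.05 m² per step and 9 m² per WiFi fix, that is, about 3 m standard deviation) are assumptions, and this is not the filter of any cited work.

```python
class PdrWifiKalman:
    """Position-only Kalman filter run independently on x and y.

    predict(): a PDR step displacement is the motion input (F = I).
    update():  a WiFi position fix is a direct measurement (H = I).
    """

    def __init__(self, x, y, var=25.0):
        self.pos = [x, y]
        self.var = [var, var]  # per-axis variance in m^2

    def predict(self, dx, dy, step_var=0.05):
        for i, d in enumerate((dx, dy)):
            self.pos[i] += d
            self.var[i] += step_var  # PDR error grows with every step

    def update(self, wx, wy, wifi_var=9.0):
        for i, z in enumerate((wx, wy)):
            gain = self.var[i] / (self.var[i] + wifi_var)
            self.pos[i] += gain * (z - self.pos[i])
            self.var[i] *= (1.0 - gain)
```

Between WiFi scans, which may arrive only every 30 s, the PDR steps carry the estimate while its variance grows; each WiFi fix then pulls the position back in proportion to the accumulated PDR uncertainty.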

Table 26.2 Comparison of various available sensor-fusion positioning methods

3 Difficulties in Indoor Positioning

With sensor fusion, smartphone positioning accuracy can reach 2–5 m, and better than 1 m in some specific environments. In general, however, it is still challenging to develop a low-cost, fine-precision, highly usable technology for seamless indoor and outdoor positioning. The main difficulties of smartphone indoor positioning are summarized as follows.

3.1 Complex Channel Transmission and Spatial Topology in Indoor Environments

For positioning with RF signals, multipath interference and NLOS transmission are the main error sources in TOA-based measurements. Because of the complex topology of indoor environments, multipath and NLOS conditions are more common and more severe indoors, which introduces large positioning errors when traditional RF positioning technologies developed for outdoor use are applied. Moreover, relocating appliances and furniture, adding or removing goods on shelves, and changing the layout of a venue all affect signal transmission and the indoor magnetic field. Such changes are the main obstacle for indoor positioning systems to maintain high accuracy. It is challenging to automatically sense and recognize changes in the radio and magnetic fields caused by spatial and temporal changes in indoor topology, and thereby make the positioning system self-learning and self-adaptive by updating its positioning databases, including the WiFi fingerprint database, the geomagnetic fingerprint database, the image feature database, and the landmark information database. Automatic updating of such databases remains an unsolved problem in indoor positioning.

3.2 Heterogeneous Source of Positioning

As shown in Fig. 26.1, there are more than 12 types of sensors embedded in smartphones, including GNSS receiver modules, short-range RF modules such as WiFi and Bluetooth transceivers, and other embedded sensors, such as accelerometers, magnetometers, gyroscopes, barometers, light-intensity sensors, microphones, speakers, and cameras. Except for the GNSS receiver modules, however, these sensors and RF modules are not specifically designed for positioning. Although many methods have been developed to estimate positioning parameters from these sensors, the resulting measurements are inherently heterogeneous: they observe different positioning parameters (e.g., position, velocity, heading rate), at different sampling rates, and with different noise. As discussed in Sect. 26.2.3, the sensors embedded in a smartphone can be integrated for indoor positioning. However, to achieve an optimal sensor-fusion solution for indoor positioning, the following problems have to be tackled.

3.2.1 Synchronization of Signal Measurements

Different smartphone sensors work independently and may have different sampling rates. For example, the WiFi RSSI scanning rate ranges from 1/30 to 1/3 Hz, while the accelerometer sampling frequency can reach 180 Hz. Even at the same sampling rate, the sampling instants may differ. Therefore, before the sensor-fusion algorithm can compute a position, measurements taken by different sensors at different instants have to be aligned to a common time base. The time base can be the main clock of the smartphone in user-centric positioning or the network time of the cloud server in network-centric positioning. To meet the requirements of most indoor location services, the position update rate should be at least 1 Hz. Interpolation works well for aligning asynchronous measurements when the user moves slowly (less than 2 m/s), which suits pedestrian indoor navigation.
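Aligning asynchronous measurements by interpolation can be sketched as follows: each sensor’s time-stamped samples are linearly interpolated onto the fusion epochs, for example a 1 Hz timeline on the phone’s main clock. Linear interpolation and clamping at the ends of the sampled span are assumptions; they are reasonable only at pedestrian speeds, as noted above.

```python
from bisect import bisect_left


def align_to_timeline(timestamps, values, query_times):
    """Linearly interpolate an asynchronously sampled signal onto common
    query times. `timestamps` must be sorted ascending (seconds). Queries
    outside the sampled span are clamped to the nearest sample."""
    out = []
    for t in query_times:
        j = bisect_left(timestamps, t)
        if j == 0:
            out.append(values[0])
        elif j == len(timestamps):
            out.append(values[-1])
        else:
            t0, t1 = timestamps[j - 1], timestamps[j]
            v0, v1 = values[j - 1], values[j]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out
```

For example, samples taken at 0, 0.4, 1.3, and 2.1 s can be resampled onto the epochs 0, 1, 2, and 3 s, after which they can be combined epoch by epoch with other sensors aligned the same way.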

3.2.2 Different Accuracy of Sensor Measurements

Different sensors have different measurement noise and quantization errors. Moreover, different sensors measure positioning parameters in different ways, so their measurement accuracy varies accordingly. For example, the low-cost MEMS sensors embedded in smartphones have very poor measurement accuracy and cannot be used directly for strap-down inertial navigation, but they can be used for step detection and provide walking speed and step length with acceptable accuracy. The indoor environment also affects sensors differently. Some sensors or modules, such as a Bluetooth antenna array, visual positioning, or audio positioning, provide fine-precision measurements of distances and angles in small indoor spaces, but in large indoor areas their measurement errors may grow much larger, which can cause positioning to fail. It is therefore important to develop positioning algorithms flexible enough to intelligently integrate sensors with different observation accuracies.
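A basic way to let precise sensors dominate is inverse-variance weighting: independent estimates of the same quantity are weighted by the reciprocals of their variances. The sketch assumes that the estimates are independent and that each sensor’s variance is known; in practice the variance must itself be estimated and may change with the environment, as with a BLE array that is precise in a small room but not in a large hall.

```python
def inverse_variance_fusion(estimates):
    """Fuse independent scalar estimates given as [(value, variance), ...].

    Each estimate is weighted by 1 / variance, so precise sensors dominate;
    the fused variance is 1 / sum(1 / variance_i), never larger than the
    smallest input variance.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total
```

Applied per coordinate, fusing an x estimate of 0 m with variance 1 m² and one of 10 m with variance 9 m² gives 1 m with variance 0.9 m²: the less precise sensor still contributes, but only slightly.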

3.2.3 Inconsistency in Different Smartphone Terminals

Different smartphone manufacturers may use different chipsets or components for receiver modules and embedded sensors, so measurements from different smartphones may be biased by hardware differences. For example, different phones report different signal strengths for the same WiFi AP, and some of these deviations are quite large, which strongly affects the accuracy of fingerprinting-based positioning. Such inconsistencies also occur with the cameras and MEMS sensors of different smartphones. Self-calibration can improve the consistency of measurements across smartphones to some extent. However, these deviations remain critical for fine-precision indoor positioning with accuracy within 1 m.
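One common model for the RSSI offset between devices is linear, RSSI_ref ≈ a·RSSI_dev + b. The sketch below fits a and b by least squares from paired readings of the same APs at the same spots. It assumes such pairs from a reference device are available; a true self-calibration would have to estimate the mapping without them, for example from data collected during normal use.

```python
def fit_rssi_calibration(device_rssi, reference_rssi):
    """Least-squares fit of reference = a * device + b from paired RSSI
    readings (dBm) of the same APs at the same locations."""
    n = len(device_rssi)
    mx = sum(device_rssi) / n
    my = sum(reference_rssi) / n
    sxx = sum((x - mx) ** 2 for x in device_rssi)
    sxy = sum((x - mx) * (y - my) for x, y in zip(device_rssi, reference_rssi))
    a = sxy / sxx
    b = my - a * mx
    return a, b
```

A corrected reading is then `a * rssi + b`, which can be matched against a radio map surveyed with the reference device.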

3.3 Limited Computing Resources on Mobile Terminals

As a handset, a smartphone has limited computing capacity, storage capacity, and power supply. Although the computing performance of smartphones has been increasing in line with Moore’s Law, smartphones already perform many functions (phone calls, positioning, assistance with daily work, recreation, etc.), all of which demand a share of the computing and power resources. From an energy-saving point of view, it is therefore unsuitable for a smartphone to run complicated positioning algorithms continuously for long periods. Although some complex positioning algorithms, such as visual positioning and particle filters, are gradually being implemented on smartphones, more complicated algorithms based on deep learning and AI remain unsuitable for the handset platform and will require further growth in smartphone computing resources.

4 The Development Trends of Indoor Positioning Technology

Indoor positioning is a hot research topic in both academia and industry. Google, one of the leading IT companies, has promoted its visual positioning service (VPS) as a core technology, which underscores the importance of indoor positioning for future AI applications. Other internationally renowned IT companies, such as Apple, Baidu, Huawei, and Alibaba, have all listed indoor positioning among their strategic technologies. With the goals of high accuracy, high utility, and low cost, future directions for smartphone indoor positioning may include new positioning sources, effective fusion methods for heterogeneous positioning technologies, and cooperative positioning based on geographic information systems (GIS).

4.1 Explore New Positioning Sources for Fine-Precision, High-Utility Smartphone Indoor Positioning

More and more sensors are being integrated into smartphones, creating opportunities for new positioning technologies. Among them, audio positioning is a promising method for achieving high-accuracy indoor positioning with smartphones. The position is determined by measuring the TDOA of signals from multiple sound transmitters at the smartphone. The audio positioning frequency can be set between 16 and 21 kHz, which lies within the working range of smartphone microphones but near or above the upper limit of human hearing. A key advantage of sound positioning is that its time synchronization requirement is much less strict than that of RF positioning. Because the speed of sound in air is only about 340 m/s, a synchronization error of 0.1 ms between acoustic transmitters produces a ranging error of only about 340 m/s × 0.1 ms = 3.4 cm, whereas for RF signals traveling at the speed of light, the same 0.1 ms error would correspond to a range error of about 30 km.
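The synchronization arithmetic above and the TDOA principle itself can be sketched as follows. The solver is a generic Gauss-Newton least-squares fit of the range differences d_i - d_0 in two dimensions; the transmitter layout, the true position, and the initial guess are illustrative assumptions rather than parameters of any particular audio positioning system.

```python
import math

SPEED_OF_SOUND = 340.0          # m/s, approximate value in air used in the text
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def range_error(sync_error_s, speed):
    """Ranging error caused by a transmitter synchronization error."""
    return sync_error_s * speed


def solve_tdoa(anchors, range_diffs, guess, iters=20):
    """Gauss-Newton 2D TDOA solver.

    anchors[0] is the reference transmitter; range_diffs[i - 1] is
    d_i - d_0 = speed * (t_i - t_0) for anchors[i], i >= 1.
    """
    x, y = guess
    x0, y0 = anchors[0]
    for _ in range(iters):
        d0 = math.hypot(x - x0, y - y0)
        # Accumulate the 2x2 normal equations (J^T J) delta = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), rd in zip(anchors[1:], range_diffs):
            di = math.hypot(x - ax, y - ay)
            r = rd - (di - d0)                   # measured minus predicted
            jx = (x - ax) / di - (x - x0) / d0   # d(di - d0)/dx
            jy = (y - ay) / di - (y - y0) / d0   # d(di - d0)/dy
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 += jx * r
            b2 += jy * r
        det = a11 * a22 - a12 * a12
        dx = (a22 * b1 - a12 * b2) / det
        dy = (a11 * b2 - a12 * b1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < 1e-9:
            break
    return x, y


# Hypothetical 10 m x 8 m room with a speaker in each corner.
anchors = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
true_pos = (3.0, 5.0)
d = [math.hypot(true_pos[0] - ax, true_pos[1] - ay) for ax, ay in anchors]
range_diffs = [di - d[0] for di in d[1:]]  # noise-free TDOA times speed of sound
est = solve_tdoa(anchors, range_diffs, guess=(5.0, 4.0))
```

With real measurements the range differences would be noisy, and a weighted solver or a filter would normally replace the plain least-squares iteration.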

Light-source coding and positioning is another candidate method for high-accuracy positioning with smartphones. An LED light installed on the ceiling emits coded on/off signals as the positioning source. As the LED light rotates, the code has a unique pattern in each sector, which the smartphone's light sensor can detect and use for positioning (Fig. 26.3). By measuring the relative position of the mobile phone within the sector, a positioning accuracy of 5–10 cm can be achieved without modifying the phone's hardware.
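The sector-coding idea can be illustrated with a deliberately simplified, hypothetical scheme: each angular sector is assigned a distinct binary on/off codeword, the light-sensor readings are thresholded into bits, and the sector is decoded as the codeword with the smallest Hamming distance. The four 8-bit codewords (pairwise Hamming distance 4, so a single flipped bit is still decoded correctly), the lux threshold, and the assumption that the bit frames are already synchronized are illustrative choices, not details of the system in Fig. 26.3.

```python
# Hypothetical codebook: one 8-bit on/off codeword per angular sector.
SECTOR_CODES = {
    0: (1, 1, 1, 1, 0, 0, 0, 0),
    1: (1, 1, 0, 0, 1, 1, 0, 0),
    2: (1, 0, 1, 0, 1, 0, 1, 0),
    3: (1, 0, 0, 1, 1, 0, 0, 1),
}


def to_bits(lux_samples, level=400.0):
    """Threshold raw light-sensor readings (lux) into on/off bits."""
    return tuple(1 if s > level else 0 for s in lux_samples)


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode_sector(bits):
    """Return the sector whose codeword is closest in Hamming distance."""
    return min(SECTOR_CODES, key=lambda s: hamming(bits, SECTOR_CODES[s]))


def sector_center_deg(sector, n_sectors=len(SECTOR_CODES)):
    """Center bearing of a sector, measured from the sector-0 boundary."""
    return (sector + 0.5) * 360.0 / n_sectors


# Readings from sector 2 with one corrupted sample (index 6 should be "on").
lux = [820, 35, 790, 40, 805, 30, 60, 25]
sector = decode_sector(to_bits(lux))
bearing = sector_center_deg(sector)
```

Refining the position within the decoded sector, as described in the text, would require additional measurements beyond this coarse sector lookup.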

In terms of RF signals, Bluetooth 5.1 and 5G signals will play an important role in indoor positioning. Bluetooth technology consumes little power, and Bluetooth Low Energy (BLE) 5.1 enhances indoor positioning with a direction-finding capability (angle of arrival and angle of departure), which is expected to achieve sub-meter accuracy. 5G-based wireless positioning is likely to become one of the core technologies for future indoor positioning, since its stated target accuracy for both indoor and outdoor positioning is better than 1 m (Koivisto et al. 2017; Laoudias et al. 2018). UWB has recently been integrated into Apple's smartphones, and UWB positioning on smartphones is expected to attract growing interest in applications.
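The angle-finding capability of Bluetooth 5.1 relies on phase differences measured across an antenna array. For a two-element array with spacing d, a plane wave arriving at angle θ from broadside produces a phase difference Δφ = 2πd·sin θ/λ, so θ can be recovered from the measured phase. Two locators with known positions and bearings then fix a 2D position by ray intersection. The sketch below assumes ideal, noise-free phase measurements, a half-wavelength spacing at 2.44 GHz, bearings already converted to a common map frame, and a hypothetical locator layout.

```python
import math

C = 299_792_458.0          # speed of light (m/s)
WAVELENGTH = C / 2.44e9    # about 0.123 m in the 2.4 GHz band (assumed carrier)
SPACING = WAVELENGTH / 2   # assumed half-wavelength antenna spacing


def aoa_from_phase(delta_phi, wavelength=WAVELENGTH, spacing=SPACING):
    """Angle of arrival (rad, from array broadside) for a two-element array."""
    s = delta_phi * wavelength / (2 * math.pi * spacing)
    return math.asin(max(-1.0, min(1.0, s)))  # clamp against measurement noise


def intersect_bearings(p1, b1, p2, b2):
    """Intersect rays from p1 and p2 with map-frame bearings b1, b2 (rad)."""
    d1x, d1y = math.cos(b1), math.sin(b1)
    d2x, d2y = math.cos(b2), math.sin(b2)
    # Solve t1 * d1 - t2 * d2 = p2 - p1 for t1 with Cramer's rule.
    det = -d1x * d2y + d2x * d1y
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; position is not observable")
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (-rx * d2y + d2x * ry) / det
    return (p1[0] + t1 * d1x, p1[1] + t1 * d1y)


# Round trip: a 30-degree arrival gives delta_phi = pi/2 when d = lambda/2.
true_angle = math.radians(30.0)
delta_phi = 2 * math.pi * SPACING * math.sin(true_angle) / WAVELENGTH
angle = aoa_from_phase(delta_phi)

# Two hypothetical locators observing a tag at (4, 3).
loc1, loc2 = (0.0, 0.0), (10.0, 0.0)
tag = intersect_bearings(loc1, math.atan2(3.0, 4.0), loc2, math.atan2(3.0, -6.0))
```

Practical direction-finding receivers use larger arrays and estimate the angle from many phase samples, but the geometric relation is the same.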

Camera-based visual positioning remains a promising method for achieving decimeter-level or even centimeter-level accuracy, provided that ambient lighting and image features are sufficient. Integrating a depth camera can further improve visual positioning accuracy, as demonstrated by Google's Tango technology. However, the computational complexity is high, particularly for feature detection, image matching, and AI-related algorithms. As 5G wireless communication systems come into operation, their large bandwidth and low latency will allow smartphones to upload photos to a cloud server and receive positioning results from the server in real time. It is therefore possible that all complicated algorithms will be computed on high-performance cloud servers.
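A single camera pixel only constrains a viewing ray: without additional information, the distance to the observed feature along that ray is unknown. A depth measurement fixes that distance, which is one reason depth cameras help. This is illustrated below with the standard pinhole camera model; the intrinsic parameters (fx, fy, cx, cy) and the pixel and depth values are hypothetical, and lens distortion is ignored.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with metric depth Z -> 3D point (X, Y, Z) in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point back to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics for a 640 x 480 depth camera (no lens distortion).
K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

p = back_project(420.0, 290.0, 2.0, **K)  # feature observed at 2 m depth
uv = project(p, **K)                       # projects back onto the same pixel
# The same pixel at 4 m depth lies on the same ray, twice as far away.
p_far = back_project(420.0, 290.0, 4.0, **K)
```

Both `p` and `p_far` project to the same pixel, which is exactly the ambiguity that the depth reading resolves.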

Fig. 26.3 Positioning with light coding

Table 26.3 briefly analyzes the promising indoor positioning technologies mentioned above. Because of the complexity of indoor environments, each positioning method has its own advantages and disadvantages in terms of positioning accuracy, reliability, availability, etc. To achieve continuous position estimates, fine-precision positioning technologies should be intelligently fused with one another.

Table 26.3 Characteristics and function of future technologies for indoor positioning

4.2 Fusion of Heterogeneous Positioning Sources

At present, the technical trend in indoor positioning is to use a reliable estimation method to effectively integrate two or more positioning sources, improving the accuracy and availability of smartphone positioning systems. For sensor fusion in indoor positioning, a complete solution needs to be developed that integrates heterogeneous hardware calibration, high-accuracy position estimation from individual technologies, and intelligent fusion of the heterogeneous smartphone sensors. One possible approach is to use control points in a tightly coupled fusion method, where the control points are estimated by the high-accuracy positioning techniques mentioned in Sect. 26.2. To achieve a stable and reliable hybrid positioning solution, it is also important to design appropriate filtering and cross-validation methods that identify errors in the heterogeneous measurements when sufficient positioning sources are available.
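As a minimal illustration of fusing relative and absolute sources with an error check, the sketch below runs a one-dimensional Kalman filter along a corridor. PDR step lengths drive the prediction, high-accuracy fixes act as control points in the update, and a normalized-innovation gate rejects fixes that are inconsistent with the prediction. For brevity this is a loosely coupled filter on position fixes rather than the tightly coupled scheme discussed above; the class name, noise variances, and gate value are assumptions.

```python
class CorridorKF:
    """Hypothetical 1D Kalman filter along a corridor (position in meters)."""

    def __init__(self, x0, p0, step_var):
        self.x = x0              # position estimate
        self.p = p0              # estimate variance (m^2)
        self.step_var = step_var  # process noise added per PDR step (m^2)

    def predict(self, step_length):
        """Propagate the state with one PDR step."""
        self.x += step_length
        self.p += self.step_var

    def update(self, z, r, gate=9.0):
        """Fuse an absolute fix z with variance r.

        The fix is rejected if the normalized innovation squared exceeds
        `gate`; 9.0 corresponds to roughly a 3-sigma test for one dimension.
        Returns True if the fix was accepted.
        """
        y = z - self.x           # innovation
        s = self.p + r           # innovation variance
        if y * y / s > gate:
            return False
        k = self.p / s           # Kalman gain
        self.x += k * y
        self.p *= 1.0 - k
        return True


kf = CorridorKF(x0=0.0, p0=0.01, step_var=0.01)
for _ in range(10):
    kf.predict(0.7)                       # PDR underestimates the true 0.75 m steps
accepted = kf.update(z=7.5, r=0.01)       # accurate control point at the true position
x_after_fix = kf.x
rejected = not kf.update(z=12.0, r=0.01)  # outlier, e.g. an NLOS-corrupted fix
```

The accepted control point pulls the drifting PDR estimate back toward the true position, while the gated update leaves the state unchanged when an inconsistent fix arrives.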

4.3 GIS-Based Semantic Constraint Location and Semantic Cognitive Collaboration Positioning

Currently, GIS research has gradually shifted from outdoor to indoor environments. On the one hand, indoor GIS can enhance position estimates with indoor maps and features; on the other hand, it can fully exploit the potential value of indoor landmarks, providing semantic positioning capabilities with spatial constraints. However, this support remains limited because current indoor GIS lacks high-accuracy coordinates. Therefore, to establish a basic indoor GIS for a fine-precision intelligent indoor positioning system, the following key technologies need to be properly addressed: (1) an indoor GIS model with a unified space–time reference system; (2) a simultaneous indoor modeling and positioning method with high-accuracy real-time coordinate computation; (3) an automatic update and instantaneous modeling method for maps using crowdsourcing; and (4) real-time visual positioning and 3D modeling with indoor semantics. Emerging directions in indoor GIS research include GIS-based semantic constraint positioning and semantic cognitive positioning.
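A simple way to apply map constraints, used here only as an illustration, is to snap a raw position estimate onto the nearest walkable corridor segment of an indoor map. The corridor geometry and function names below are hypothetical, and semantic constraint positioning can draw on much richer map information (rooms, doors, landmarks) than the corridor centerlines used here.

```python
import math


def project_to_segment(p, a, b):
    """Closest point to p on segment a-b (all 2D tuples in meters)."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment's endpoints
    return (ax + t * dx, ay + t * dy)


def snap_to_map(p, segments):
    """Snap p to the nearest point on any corridor segment."""
    best, best_dist = None, math.inf
    for a, b in segments:
        q = project_to_segment(p, a, b)
        dist = math.hypot(q[0] - p[0], q[1] - p[1])
        if dist < best_dist:
            best, best_dist = q, dist
    return best


# Hypothetical L-shaped corridor: 20 m east, then 15 m north.
corridors = [((0.0, 0.0), (20.0, 0.0)), ((20.0, 0.0), (20.0, 15.0))]
snapped_a = snap_to_map((7.3, 0.8), corridors)
snapped_b = snap_to_map((19.2, 9.0), corridors)
```

Nearest-segment snapping is the crudest form of map matching; in a filter, the same geometry is more often used to down-weight or discard hypotheses that cross walls.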

5 Conclusions

Indoor positioning is one of the core technologies in the era of IoT, AI, and future super-AI (robots + humans). Current smartphone-based indoor positioning technologies include RF positioning and sensor-based positioning, and many different methods have been developed. However, every technology developed so far has its own shortcomings: each is affected by the complexity of spatial topologies, heterogeneous data, and the limited computing capability of mobile terminals, and is therefore insufficient on its own as a ubiquitous positioning solution. To meet the requirements of low cost, high accuracy, high usability, and high durability for mainstream applications, it is necessary to develop precise positioning solutions capable of adaptively fusing accurate observables, including visual images, light signals, acoustic signals, and RF signals. These precise locations can serve as control points that prevent the propagation of positioning errors. To achieve full coverage, solutions such as pedestrian dead reckoning and magnetic matching need to be integrated into the system.