Using the unscented transform to reduce the number of measurements in drive tests

In a drive test, it is common to measure the electric field strength (or another related quantity) at fixed intervals to calibrate propagation models or to optimize wireless network coverage. We propose to select the measurement locations based on the unscented transform. Using just a few points (tens rather than thousands), we show that the statistics of the measurements tend to the correct values, which can save time and reduce the cost of drive tests.


Introduction
When a cellular network is deployed, it can have spots without network coverage or with poor signal quality, which requires some network optimization [1]. Usually, the very first step in optimizing a cellular network is getting field measurements, which can be done through drive tests. In this case, a measurement setup is mounted in a vehicle that drives through a series of locations measuring one or more aspects of the network (field intensity, path loss, etc.). Ideally, when the network needs to be optimized, the whole area would be measured. Since this is impossible, the current approach is to measure along one or more routes.
Routes cannot be random and should be properly designed. They should be radial and circular around the base station [2]. If this is not the case, it is possible that the measurements used to estimate the network coverage are not optimal, which implies that the statistics of the network coverage might be incorrect.
After the measurement campaign, the data are processed, analyzed and some parameters of the network are tuned. These steps can be repeated until the optimization is completed [1]. The collected data can be used to tune empirical path loss models used in network planning [2,3]. If the measurement campaign is properly performed, the data can also provide direct information about the network coverage and the statistics of the actual field intensity.
When conducting drive tests, we deal with two main issues. The first is route planning: if the route is randomly designed (which is very common in papers reporting electromagnetic field measurements), we have no guarantee that the measurements can be used to directly estimate the network coverage. The second issue is that a drive test can generate a huge amount of data that must be processed and analyzed, which takes time [4].
We can visualize this problem as follows: there is an area to be measured and we need to choose locations in it so that the measurements properly represent the whole area. This could be done using the Monte Carlo algorithm, in the same way as it is done in studies of interference between telecommunication systems [5]. However, this technique generates thousands of locations to be measured, which is impractical in this case. An alternative to the Monte Carlo algorithm is the unscented transform, which has been applied in a variety of problems in the past few years (e.g. [6][7][8]).
Our hypothesis is that the unscented transform (UT) can be used to choose the measurement locations instead of using traditional routes in drive tests. This method allows us to select just a few points to measure, which saves time and reduces the cost of drive tests.
This paper is organized in five sections. The next section describes the UT. Section 3 applies the UT to two problems related to electromagnetic propagation. Section 4 discusses the results of the paper. Finally, a conclusion is presented. In addition, two appendices explain some calculations of Sect. 3.

The unscented transform (UT)
Consider some hypothetical electromagnetic propagation environment. We can measure some quantity y (e.g. path loss, electric field strength, interference, etc.) at each coordinate (x_1, x_2) of the environment, where x_1 and x_2 represent the latitude and longitude of the probe. In other words, the environment acts as a nonlinear function f(.) that maps each coordinate to a measurement: y = f(x_1, x_2). We can rewrite this equation as Y = f(X) to consider an arbitrary number N of measurements. In this case, X = [(x_1, x_2)_1, (x_1, x_2)_2, …, (x_1, x_2)_N] is an N × 2 matrix representing the pairs of coordinates and Y = [y_1, y_2, …, y_N] is an N × 1 matrix representing all the measurements.
If we need to estimate the statistics (mean, standard deviation, etc.) of the quantity y, one possibility is to use the Monte Carlo algorithm [9,10], which randomly generates thousands of inputs X_MC according to their probability density function (p.d.f.) and processes them using f(.) to obtain Y_MC: Y_MC = f(X_MC). The statistics of the quantity y can be directly extracted from Y_MC.
In the context of field measurements, this is equivalent to measuring the field at thousands of random locations in some area to get a good estimate of the output. In this case, X_MC represents the probe locations uniformly distributed in the area under analysis, the function f(.) represents the propagation environment, and Y_MC represents the measurements (field strength, path loss, or any other aspect of the propagation environment). Selecting the points using the Monte Carlo algorithm ensures that the measurements correctly represent the environment, but this comes at a cost because, in this method, N is on the order of thousands.
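As a rough illustration of the Monte Carlo procedure described above, the sketch below draws N locations uniformly over a disc and extracts the mean and standard deviation of a quantity evaluated at those locations. The function `toy_field` is a hypothetical stand-in for the propagation environment f(.), and the radius, sample size, and helper names are illustrative assumptions rather than values prescribed by the method.

```python
import numpy as np

def sample_uniform_disc(n, radius, rng):
    """Draw n points uniformly distributed over a disc of the given radius."""
    # The square root makes the density uniform in area rather than in radius.
    r = radius * np.sqrt(rng.random(n))
    theta = 2.0 * np.pi * rng.random(n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

def toy_field(xy):
    """Hypothetical stand-in for f(.): grows linearly with distance (not a real model)."""
    d = np.hypot(xy[:, 0], xy[:, 1])
    return 100.0 + 10.0 * d  # arbitrary units

rng = np.random.default_rng(0)
x_mc = sample_uniform_disc(20_000, 2.0, rng)  # X_MC, an N x 2 matrix
y_mc = toy_field(x_mc)                        # Y_MC = f(X_MC)
mc_mean, mc_std = y_mc.mean(), y_mc.std()     # statistics extracted from Y_MC
```

In a real campaign each row of `x_mc` would be a location to visit, which is what makes thousands of samples impractical.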
It is impossible to measure all the desired locations. Hence, a drive test is done on some routes, which results in a set of locations (X̃) and their measured quantities (Ỹ). Although X̃ and Ỹ are very effective for calibrating a propagation model, they might not produce a good estimate of the statistics of the quantity y, which is necessary to properly estimate the coverage of an area by some network operator. To overcome this, the p.d.f. of the measurement points in the routes must represent a uniform distribution over the area. In real routes, the probes are usually not uniformly distributed over the area under analysis.
We propose the use of the unscented transform as an alternative to select the locations of the probes. Using the UT, the number of measurement points can be drastically reduced, and we still get measurements that can be used to calibrate propagation models and to estimate percentage of coverage.
The UT is a method for predicting means and covariances in nonlinear systems. It was built on the principle that it is easier to approximate a probability distribution than it is to approximate a nonlinear function [11,12]. For each input of the system, a reduced set of points called sigma points (X_σ) is calculated and associated with a set of weights (W). These points are calculated so that their mean and covariance match the mean and covariance of the distribution of the original input of the system [11,12]. X_σ is used as input for f(.) to get Y_σ. With W and Y_σ, we can estimate the statistics of Y with a given accuracy [11,12].
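To make the last step concrete, the sketch below estimates the mean and standard deviation of Y from the sigma-point outputs and their weights. The helper name `ut_mean_std` is hypothetical. The three points used are the 3-point Gauss-Legendre nodes with weights normalized to sum to 1, which is one standard choice for a uniform input in [−1, +1]; it is assumed here for illustration and is not copied from the tables of [13].

```python
import numpy as np

def ut_mean_std(y_sigma, weights):
    """Estimate the mean and standard deviation of Y from sigma-point outputs and weights."""
    y_sigma = np.asarray(y_sigma, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.sum(weights * y_sigma)
    var = np.sum(weights * (y_sigma - mean) ** 2)
    return mean, np.sqrt(var)

# Three sigma points for a uniform input in [-1, +1]
# (assumed choice: Gauss-Legendre nodes, weights rescaled to sum to 1).
x_sigma = np.array([-np.sqrt(3.0 / 5.0), 0.0, np.sqrt(3.0 / 5.0)])
w = np.array([5.0, 8.0, 5.0]) / 18.0

y_sigma = x_sigma ** 2               # toy nonlinear function f(x) = x^2
mean, std = ut_mean_std(y_sigma, w)  # exact values: 1/3 and sqrt(4/45)
```

For this polynomial toy function, three evaluations of f(.) reproduce the exact mean and standard deviation; for a general f(.) the result is an approximation.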
There are several ways to calculate the sigma points. Table 1 shows one possibility for the first ten sigma points for a unidimensional uniform distribution in [−1, +1] [13]. These points have a precision of 2S − 1, which means that the expected value of the output calculated using S sigma points and their weights is equal to the expected value of Y calculated using the first 2S − 1 terms of the Taylor expansion of the function f(.) [13]. For multidimensional inputs (as in this paper), if the variables are uncorrelated, the sigma points can be calculated by combining the unidimensional sigma points for each variable. "Appendix 1" shows how to use the unidimensional sigma points of Table 1 to generate sigma points to represent position (a two-dimensional variable).
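The combination of unidimensional sigma points into two-dimensional locations can be sketched as follows. This is not the procedure of "Appendix 1", which is not reproduced here. It rests on two assumptions: that Gauss-Legendre nodes are used as the unidimensional sigma points (they have the 2S − 1 precision property for a uniform distribution, but may differ from Table 1), and that position is parameterized by an independent radius and angle so that the locations represent a uniform distribution over a circular ring. The function names are hypothetical.

```python
import numpy as np

def sigma_points_uniform(s):
    """S sigma points and weights for a uniform variable in [-1, +1] (Gauss-Legendre)."""
    nodes, w = np.polynomial.legendre.leggauss(s)
    return nodes, w / 2.0  # quadrature weights sum to 2; the uniform p.d.f. is 1/2

def sigma_points_ring(s, r_in, r_out):
    """S*S weighted locations for a uniform distribution over a circular ring."""
    nodes, w = sigma_points_uniform(s)
    u = (nodes + 1.0) / 2.0  # rescale the nodes to [0, 1]
    # Assumed mapping: radius from the inverse c.d.f. of an area-uniform ring,
    # angle uniform in [0, 2*pi); radius and angle are independent.
    r = np.sqrt(r_in**2 + u * (r_out**2 - r_in**2))
    theta = 2.0 * np.pi * u
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    ww = np.outer(w, w)  # weight of each pair is the product of the 1-D weights
    xy = np.column_stack(((rr * np.cos(tt)).ravel(), (rr * np.sin(tt)).ravel()))
    return xy, ww.ravel()

# 7 x 7 = 49 locations in a ring from 100 m to 1 km (distances in km).
xy, w = sigma_points_ring(7, 0.1, 1.0)
```

Setting `r_in` to zero gives locations for a full disc under the same assumptions.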
Using the UT, the number of measurements can be reduced from N to S: tens rather than thousands.

Results
In this section, we discuss the application of the unscented transform in two cases. First, we analyze the cumulative distribution function of the total exposure ratio due to cellular base stations in a circular area of 12.56 km². In the second example, we use path loss measurements at 900 MHz in a circular ring of 3.11 km² to calibrate a path loss propagation model and to estimate the statistics of the path loss in the area around a base station.

Cumulative distribution function of the total exposure ratio to electromagnetic fields due to cellular base stations
In this first example, the cumulative distribution function (c.d.f.) of the total exposure ratio (TER) due to the radiation of cellular base stations is shown for two Brazilian cities. The TER is defined as the sum of exposure ratios (ER) for all the relevant radio sources. The International Telecommunication Union defines the ER as "the parameter at a specific location for each operation frequency of a radio source, expressed as a fraction of the related limit" [14], i.e. ER = E/E_lim, where E is the electric field strength of the radio source and E_lim is the corresponding reference limit (Table 2) at the frequency of the radio source.
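The definition above can be written as a short computation. The sketch below sums ER = E/E_lim over the radio sources at one location. The function name and the numerical values are hypothetical and are not taken from Table 2 or from the base station database.

```python
def total_exposure_ratio(fields_v_per_m, limits_v_per_m):
    """Sum the exposure ratios ER = E / E_lim over all radio sources at one location."""
    if len(fields_v_per_m) != len(limits_v_per_m):
        raise ValueError("one reference limit is needed per radio source")
    return sum(e / e_lim for e, e_lim in zip(fields_v_per_m, limits_v_per_m))

# Hypothetical values for three radio sources (illustration only).
fields = [0.5, 1.2, 0.8]     # electric field strength E, in V/m
limits = [40.0, 60.0, 61.0]  # reference limit E_lim at each source frequency, in V/m
ter = total_exposure_ratio(fields, limits)
```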
The electric field strength can be measured or estimated. Since it is not feasible to measure the TER in thousands of locations, in this section we will use only values estimated using a conservative method described in [17], which considers information of base stations (height, frequencies of the radio sources, transmitted power, and vertical half-power beamwidth) and a free space propagation model.
The TER values presented here were calculated using the Brazilian public base station database [18] and consider all deployed base stations operating at 700 MHz, 800 MHz, 900 MHz, 1800 MHz, 1900 MHz, 2100 MHz, and 2500 MHz. By analyzing the TER, we are indirectly checking one aspect of the field strength (it is a sum of field strengths of all radio sources, each normalized by its limit), i.e. for the purpose of this paper, it can be used as a simulation of a drive test.
Using the Monte Carlo algorithm, we generated 20,000 random locations in a 12.56 km² circular area (2 km radius) centered at the coordinates (−15.7966675°, −47.8915836°), in Brasília/Brazil, and in another centered at (−23.5632103°, −46.6547975°), in São Paulo/Brazil. Then, using the UT, we generated 49 distinct pairs of sigma points (7 sigma points for the latitude and 7 for the longitude) to represent the same 12.56 km² region (see "Appendix 1" for details on how to generate these points). Figure 1 shows the map of Brasília/Brazil, the base stations, and the probe locations considered in this analysis. Figure 2 shows the c.d.f. of the TER in Brasília and São Paulo calculated considering the points selected by the Monte Carlo algorithm and by the unscented transform. Using the UT, only 49 selected points were used to plot the c.d.f. Table 3 shows the mean and standard deviation of the TER in these two cities. Despite the staircase effect, which is expected due to the low number of points, the c.d.f. calculated using the UT approaches the one calculated using Monte Carlo.
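One possible way to build such a staircase c.d.f. from a small set of weighted points is sketched below. It assumes that each sigma point contributes its own weight to the cumulative probability; whether the curves of Fig. 2 were built exactly this way is not stated here. For Monte Carlo samples, the same function can be used with equal weights. The function name and the sample values are hypothetical.

```python
import numpy as np

def weighted_cdf(values, weights):
    """Return the sorted values and the cumulative weight at each one (staircase c.d.f.)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    return values[order], cum / cum[-1]  # normalize so the last step reaches 1

# Toy example with four weighted samples (illustrative numbers only).
v, c = weighted_cdf([0.3, 0.1, 0.2, 0.4], [0.1, 0.2, 0.3, 0.4])
```

With only a few tens of points, each step of the curve is visible, which is the staircase effect mentioned above.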

Calibration of path loss model and statistics of path loss around a base station
This section discusses the selection of measurement points inside a circular ring around a transmitter in three cases: using measurement routes, at random points generated by the Monte Carlo algorithm, and at selected points calculated using the unscented transform.
To illustrate this idea, 1635 measurements of path loss in Munich/Germany at 900 MHz were used [19], which are shown in Fig. 3, in gray, inside the circular ring of inner radius of 100 m and outer radius of 1 km. Then, using the Monte Carlo algorithm, we generated 20,000 locations uniformly distributed around the transmitter [20]. Using the UT, we also generated three sets of locations (with 25, 36, and 49 points) to represent probes uniformly distributed over the circular ring. "Appendix 1" shows how to use the UT to select these points. Since we do not have measurements at the points given by the UT and by the Monte Carlo algorithm, we used a path loss model suitable for this environment. The model is described in "Appendix 2".
With these data, we fit the log-distance path loss model using the measurements and using the predicted values at the points given by the Monte Carlo algorithm and by the UT. The resulting models are shown in Table 4 and plotted in Fig. 4. In all cases, for distances between 100 m and 1 km, the root mean square error of the log-distance models 1-4 was less than 1 dB (considering the log-distance model using the measurements as the correct one), which indicates that the UT can be used to select measurement points to calibrate a propagation model. In some cases (e.g. estimation of percentage of coverage), it is also necessary to analyze the distribution of some quantity in the propagation environment. In this situation, the analysis should consider a uniform distribution in the area to be covered. For the Munich dataset (Fig. 3), the data given by the drive test have more measurements to the left of the transmitter than to the right. If the environment has many asymmetries and the probes are not uniformly distributed in the environment, the statistics of the measurements might not accurately represent reality.
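The calibration step can be sketched as an ordinary least-squares fit of the usual log-distance form PL(d) = PL0 + 10 n log10(d/d0). The reference distance d0 = 100 m, the function names, and the synthetic data are assumptions for illustration; the coefficients of Table 4 are not reproduced here, and the fit below does not use the UT weights.

```python
import numpy as np

def fit_log_distance(d_m, path_loss_db, d0_m=100.0):
    """Least-squares fit of PL(d) = PL0 + 10 * n * log10(d / d0); returns (PL0, n)."""
    x = 10.0 * np.log10(np.asarray(d_m, dtype=float) / d0_m)
    n, pl0 = np.polyfit(x, np.asarray(path_loss_db, dtype=float), 1)
    return pl0, n

def rmse_between_models(model_a, model_b, d0_m=100.0):
    """RMSE in dB between two (PL0, n) models, sampled from 100 m to 1 km."""
    d = np.linspace(100.0, 1000.0, 200)
    x = 10.0 * np.log10(d / d0_m)
    diff = (model_a[0] + model_a[1] * x) - (model_b[0] + model_b[1] * x)
    return np.sqrt(np.mean(diff ** 2))

# Synthetic data for illustration (not the Munich measurements).
d = np.array([100.0, 200.0, 400.0, 800.0])
pl = 90.0 + 10.0 * 3.5 * np.log10(d / 100.0)
pl0, n = fit_log_distance(d, pl)
```

Fitting one model per set of points (routes, Monte Carlo, UT) and comparing them with `rmse_between_models` mirrors the comparison described above.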
To illustrate this, the two leftmost plots of Fig. 5 show the c.d.f. of the path loss at the 1635 measurement points shown in Fig. 3. One of the plots was generated with the path loss measurements and the other with the path loss predicted by the model described in "Appendix 2". The average measured and predicted path losses are, respectively, 124.58 dB and 124.04 dB. The standard deviation of the prediction error is less than 6 dB. The predicted values are similar to the measurements, indicating that the model described in "Appendix 2" can be used to study this environment. The two rightmost plots of Fig. 5 show the c.d.f. of the predicted path loss in the whole circular ring shown in Fig. 3. The probes were uniformly distributed over the circular ring. The continuous rightmost plot was generated with the Monte Carlo algorithm and is the result of 20,000 predictions (which are also shown in Fig. 3 using a color map). The dashed rightmost c.d.f. was generated using only 49 points (their locations are shown as black dots in Fig. 3) given by the unscented transform. These points also represent a uniform distribution of probes in the circular ring. Compare both cumulative distribution functions: although the one generated with the Monte Carlo algorithm uses 20,000 points and the one generated with the UT has only 49 points, they are very similar, indicating that the UT can be used to select the measurement locations to properly represent the propagation environment. But most importantly, note the difference between the two leftmost plots (c.d.f. along the routes) and the two rightmost plots (c.d.f. in the whole circular ring): the routes, in this case, do not accurately represent the statistics of the path loss in the whole circular ring.
Fig. 4 Log-distance model. Since the differences among the models are barely noticeable in this figure, we opt to show the log-distance models 1-4 of Table 4 grouped in gray
For example, the measurements along the routes indicate that ~30% of the region has a path loss lower than 120 dB. In fact, as the analysis using Monte Carlo and the unscented transform shows, the probability of getting a path loss lower than 120 dB is ~20%. Although there were 1,635 measurement points in the routes, estimating the percentage of coverage from these data can give wrong results. This error can be reduced by using field measurements collected at the locations calculated using the UT.

Discussion
In the last section, two case studies were used to illustrate the use of the UT to select measurement locations in problems related to electromagnetic propagation.
The first case study considers a set of base stations in two cities to estimate the c.d.f. of the total exposure ratio in a circular area. As explained in the last section, this problem can be used as a simulation of a drive test. The result shows that 49 locations calculated using the UT suffice to get a c.d.f. similar to the one obtained considering 20,000 locations given by the Monte Carlo method.
The second case study compares the path loss measured in routes with the ones at the points given by the UT. In both cases, the measurements can be used to calibrate a path loss model (Fig. 4). However, the locations given by the UT with only 49 points represented the environment better than the routes with more than a thousand locations.
This result is important because it indicates that the UT can be used to choose locations to obtain results similar to those obtained using the Monte Carlo method. It is worth noting that the UT has traditionally been used in problems related to parameter estimation, filtering, automation, and control [21][22][23][24][25]. In recent years, it has been used in a variety of new problems, including some in the telecommunications sector as an alternative to Monte Carlo methods [26][27][28]. Nonetheless, this is the first time that the method has been applied to select measurement locations for drive tests.

Conclusion
The initial hypothesis of this paper is that the unscented transform can be used to choose the measurement locations instead of using random routes in drive tests.
We compare the path loss in random routes and in locations selected using the UT. Both strategies can be used to construct a log-distance path loss model. However, if the purpose of the measurements is to analyze network coverage, random routes might generate incorrect statistics of the measured signal, which does not happen when using the UT.
Our results show that only 49 locations calculated using the UT produce a cumulative distribution function similar to the one considering 20,000 points selected using the Monte Carlo algorithm. Thus, we can use just a few points to represent the propagation environment, which potentially saves time and reduces the cost of drive tests.
The two case studies considered frequencies below 3 GHz. For higher frequency bands, further research is necessary to check whether 49 or more measurement locations are necessary to represent the propagation environment.
Availability of data and material The base station database used in Sect. 3.1 is available at https://github.com/carisio/emf-exposureweb/blob/master/src/nir/util/DataBase.ts

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.