LTE Cell Tower Access Traces in Urban Environment

A point of interest (POI) in an urban space represents how city dwellers and visitors perceive a certain place. LTE cell tower access trace data is a promising data source with the potential to support real-time analysis of POI exploitation. However, there has been little discussion of how such data correlates with diachronic POIs and their exploitation patterns. In this paper, we first show that the access trace pattern from an LTE cell tower can be used to discover which types of POIs exist in a certain area. Then, we propose a daily POI exploitation discovery scheme that extracts patterns of how POIs are used from day to day. Our analysis can provide useful insight for future urban space-based services such as urban planning and tourism.


Introduction
Understanding how people exploit urban spaces can change the way we perceive and conceptualize the city. Service providers can use this understanding to offer citizens modern urban services such as tourism or personalized advertisement. Such place exploitation can be discovered not only from the physical form of a place but also from the projection of human perception. The former is composed of a space and its buildings, while the latter is built from the social and physical perceptions of dwellers accumulated over a long period [1]. Among the various data sources used to understand place exploitation [2][3][4][5][6], LTE cell tower access trace data has recently attracted the attention of urban researchers as a promising one. It can capture the mobility patterns of urban users, which may represent fundamental aspects of social and economic interactions. This advantage, together with the widespread adoption of mobile devices, enables urban researchers to conduct various spatio-temporal analyses without extensive fieldwork. Discovering the correlation between LTE cell tower access traces and urban circumstances can broaden the understanding of the placeness of an area, and has the potential to make real-time place analysis possible, as the traces provide more immediate observations of an urban area.
Liu et al. [6] cluster the movement patterns of citizens, read from mobile traffic data, to discover different types of urban functional areas. Zinman et al. [8] identify the land use of urban areas by applying a classification method to cellular communication usage patterns. Xu et al. [10] cluster the mobile traffic patterns from LTE cell towers and identify each cluster's urban function from the distribution of POIs. Cici et al. [5] use a clustering approach on mobile traffic data to discover the geographical relationship between the clusters and the POIs in an urban area. Although these clustering and classification methods are informative for macroscopic urban design, they are limited in their ability to discover the microscopic exploitation of places and its daily changes, which can provide more temporal information for urban services such as urban tourism and popular place discovery.
In this paper, we propose a POI exploitation discovery scheme that uses LTE cell tower access traces to infer which POIs diachronically exist in a given area and how they are exploited from day to day. To analyze the spatio-temporal characteristics of urban places, we apply time decomposition to LTE access traces collected from 4,096 LTE cell towers and extract 7-day (168-h) seasonal components. On this data, we apply several classification models to identify urban space POIs and regression models to discover daily POI exploitation. As ground truth for POI identification, we annotate each LTE cell using the places and their types from the Google Place API. We also annotate the daily exploitation patterns of each POI in an LTE cell using Google Place popular-times data. Based on these two datasets, our scheme can discover the types of diachronic POIs and the daily exploitation patterns of POIs in an LTE cell area. Evaluation results show that the proposed scheme performs well on POI identification and can discover explainable changes in POI exploitation.

Related Works
Liu et al. [6] use CDR (Call Detail Records) data to discover the fabric of a city. They extract a snapshot map to represent each region and leverage embedded features from a convolutional autoencoder trained on their dataset. Although their evaluation was limited by the small size of the dataset, each cluster was shown to include the same type of urban functional areas.
Zinman et al. [8] use CDR data to identify the land use of an area, which also includes road information such as highways and streets. They extract several different features and measure the variable importance (VI) of each feature to build better input features for the model. They apply random forest classification, and their results show that this feature analysis improved accuracy.
Xu et al. [10] extract and model the traffic patterns of large-scale cell towers placed in an urban area. They apply hierarchical clustering to the seasonal components of mobile LTE traffic data and find a correlation between the clusters and urban ecology. In addition, through traffic spectrum analysis in the frequency domain using the discrete Fourier transform (DFT), they find a linear combination that can express human activity behavior and indirectly capture the movement of people over time.
Zhang et al. [9] aim to understand the patterns of mobile traffic data and to find correlations between mobile traffic patterns and human activities in urban environments. They separate traffic data into three components: seasonal, trend, and residual. Through clustering, the seasonal component reveals the relationship with the POIs of a place, and the trend component can identify some POIs through the difference between weekday and weekend patterns. The residual component is shown to capture unexpected behaviors.
Cici et al. [5] aim to infer the characteristics of urban ecology (social and economic activities, and social interaction) from the patterns of cell phone data. They divide the cell phone traffic data in Milan into seasonal and residual patterns using Fourier time decomposition. Using agglomerative hierarchical clustering, they find that the seasonal data characterize the patterns related to socio-economic activity in a region, while the residual data make it possible to analyze how irregularities caused by new events affect other regions.
Shafiq et al. [7] aim to identify application usage patterns in each region so that network operators can more easily optimize the distribution of cellular data. They collect traffic volume in terms of byte, packet, flow, and unique user counts, which can be extracted from 3G cellular data, and cluster each of these measures to characterize each region (Downtown, University, Suburb, etc.). Network operators can then optimize the distribution of cellular data to fit each cluster.

Dataset
In this section, we describe how we construct our dataset in three steps. We first explain how we preprocess the LTE signal patterns. Then we illustrate how we annotate each LTE cell with POI type labels and how we collect POI exploitation patterns for the experiment.

Step 1: Extracting a 7-Day Seasonal Component from an LTE Cell Tower Access Trace
Our dataset consists of anonymized LTE cell tower access traces from 4,096 LTE cell towers, collected by an internet service provider (ISP), Korea Telecom (KT), from March 1st, 2018 to February 28th, 2019. These LTE cell towers are located across the three most popular commercial areas in Seoul, as shown in Fig. 1. Each cell tower covers a 50 m × 50 m area, and the access traces represent the number of people connected to each cell per hour. To analyze the patterns embedded in each LTE cell tower access trace, we leverage the STL time decomposition method [22] to extract a 7-day (168-h) seasonal component. Then we apply min-max scaling to normalize each pattern.
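To make the preprocessing step concrete, the following is a minimal sketch of extracting a normalized 168-h weekly profile from an hourly trace. It deliberately swaps STL for a simpler classical decomposition (a centered moving-average trend followed by averaging per hour of the week), so it only approximates what STL would produce. The function name `weekly_seasonal_component` and the assumption that the trace starts at the first hour of a week are illustrative and not taken from the paper.

```python
from statistics import mean


def weekly_seasonal_component(trace, period=168):
    """Estimate a min-max scaled weekly profile from an hourly access trace.

    Simplified classical decomposition (a stand-in for STL, not STL itself):
    remove a centered moving-average trend, then average the detrended
    values for each hour of the week.
    """
    n = len(trace)
    half = period // 2
    if n < 2 * period:
        raise ValueError("need at least two full periods of hourly data")

    detrended_by_phase = [[] for _ in range(period)]
    # Only hours with a full window on both sides get a trend estimate.
    for t in range(half, n - half):
        window = trace[t - half:t + half + 1]
        # The two end points get half weight so that the even-length
        # period is centered on hour t (a "2 x period" moving average).
        trend = (sum(window) - 0.5 * (window[0] + window[-1])) / period
        # Phase 0 is the first hour of the trace (assumed start of week).
        detrended_by_phase[t % period].append(trace[t] - trend)

    seasonal = [mean(values) for values in detrended_by_phase]

    # Min-max scaling to [0, 1], as described in the text.
    low, high = min(seasonal), max(seasonal)
    if high == low:
        return [0.0] * period
    return [(s - low) / (high - low) for s in seasonal]


# Synthetic four-week trace: slow upward trend plus a daily evening bump.
demo_trace = [100.0 + 0.01 * t + (20.0 if t % 24 == 19 else 0.0)
              for t in range(168 * 4)]
demo_profile = weekly_seasonal_component(demo_trace)
```

STL additionally uses locally weighted regression and robustness iterations, so a production pipeline would more likely call an existing STL implementation; the sketch only shows the shape of the input and output.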

Step 2: Associating Diachronic POI Type Labels with Each Cell
To understand which types of diachronic POIs exist in each LTE cell, we establish a ground truth of (LTE access pattern, POI labels) pairs. First, we crawl a list of places in all the LTE cells using the Google Place API, querying for places within a radius of 36 m from the centroid of each LTE cell. This radius is enough to cover the whole cell area, since the half-diagonal of a 50 m × 50 m cell is about 35.4 m. Then we use the place type tags from the Google query results to annotate whether each LTE cell contains certain types of POI. To keep the associations between the POI labels and the LTE cell tower access patterns from becoming too diverse, we choose the 8 most frequent place types (out of 100) found by the Google Place API in our three target areas: restaurant, store, cafe, health, finance, bar, clothing store, and lodging. Table 1 shows the number of LTE cells that contain places of the corresponding POI types. Figure 2 shows the median of the extracted 7-day seasonal component for each type. We can see a clear difference between weekdays and weekends. Weekdays have regular patterns, while weekends have irregular patterns.
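The labeling step above can be sketched as follows. The code does not call the Google Place API; it assumes the query results for one cell are already available as a list of type-tag lists, which is a hypothetical input format. The tag spellings in `POI_TYPES` and the helper names are likewise assumptions made for illustration.

```python
import math

# The eight POI types named in the text, written as Google-style type tags.
# The exact tag spellings are an assumption for this illustration.
POI_TYPES = ["restaurant", "store", "cafe", "health",
             "finance", "bar", "clothing_store", "lodging"]


def covering_radius(cell_side_m):
    """Smallest radius around the centroid that covers a square cell."""
    return cell_side_m * math.sqrt(2) / 2


def poi_label_vector(place_type_lists, poi_types=POI_TYPES):
    """Return one 0/1 label per POI type: 1 if any place carries that tag."""
    seen = set()
    for tags in place_type_lists:
        seen.update(tags)
    return [1 if poi in seen else 0 for poi in poi_types]


# Hypothetical query result for one cell: one list of type tags per place.
cell_places = [["restaurant", "food"], ["cafe", "store"], ["bar"]]
labels = poi_label_vector(cell_places)

# Half-diagonal of a 50 m x 50 m cell, which the 36 m query radius exceeds.
radius = covering_radius(50)
```

Because a circle of this radius extends beyond the square cell, places close to a cell border may also be returned for neighboring cells; whether and how such overlaps were handled is not stated in the text.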

Step 3: Annotating Daily POI Exploitation Using Google Popular-Times
To understand how exploitation patterns can be inferred from an LTE access pattern, we establish a ground truth of (LTE access pattern, daily POI exploitation pattern) pairs. This dataset requires POI exploitation patterns on a daily basis to show how the POIs in a given cell are exploited during a week.
To construct such a dataset, we use Google popular-times for each POI. Since the official Google Place API does not provide popular-times, we leverage a third-party crawler that provides a 7-day exploitation pattern normalized from 0 to 100 for a given place. We simply scale the pattern by 1/100 so that it is represented by values between 0 and 1. Table 2 shows the number of collected POI exploitation patterns in each area.
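As a small illustration of the scaling step, the sketch below turns one week of popular-times values into a single 168-value vector in the range 0 to 1. The input layout (a dictionary mapping day names to 24 hourly values) is a hypothetical format chosen for the example and is not the documented output of any particular crawler.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def flatten_popular_times(weekly_popularity, days=DAYS):
    """Turn {day: 24 hourly values in 0..100} into 168 values in 0..1."""
    pattern = []
    for day in days:
        hourly = weekly_popularity[day]
        if len(hourly) != 24:
            raise ValueError(f"{day} must have 24 hourly values")
        # Scale by 1/100, as described in the text.
        pattern.extend(value / 100 for value in hourly)
    return pattern


# Hypothetical place that is closed on Monday and busiest on Friday at 20:00.
week = {day: [10] * 24 for day in DAYS}
week["Monday"] = [0] * 24
week["Friday"] = [10] * 20 + [100] + [10] * 3
exploitation = flatten_popular_times(week)
```

For the pairs to be meaningful, the exploitation vector and the LTE seasonal component need to start on the same weekday and hour; the Monday-first ordering used here is an assumption.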

POI Discovery
To discover POIs in each LTE cell from the access trace pattern, we leverage a logistic classifier (LC), a support vector machine (SVM), a random forest classifier (RFC), and a deep neural network (DNN)-based model. For each POI type, each of these models is trained to predict whether the LTE cell area contains the target POI type or not, which can be treated as a binary classification problem. The DNN-based model consists of 3 linear layers, with a ReLU activation function on the first two layers and a final sigmoid function for binary classification. This model produces a prediction score from 0 to 1, where a higher value implies that the target POI type is more likely to exist in the given LTE cell.
The detailed architecture is illustrated in Fig. 3.
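The layer arrangement described above (three linear layers, ReLU after the first two, sigmoid at the output) can be sketched as a plain forward pass. This is an untrained illustration only: the hidden widths of 64 and 16, the random initialization, and all function names are assumptions, since the text does not specify them, and training is omitted entirely.

```python
import math
import random


def linear(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_score(x, layers):
    """Linear -> ReLU -> Linear -> ReLU -> Linear -> sigmoid."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = relu(linear(x, w1, b1))
    h = relu(linear(h, w2, b2))
    return sigmoid(linear(h, w3, b3)[0])


rng = random.Random(0)
# Input size 168 matches the weekly pattern; hidden widths are assumptions.
layers = [init_layer(168, 64, rng),
          init_layer(64, 16, rng),
          init_layer(16, 1, rng)]
score = predict_score([0.5] * 168, layers)
```

One such model would be trained per POI type, with the score read as the likelihood that the cell contains that type.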

Daily POI Exploitation Pattern Discovery
To discover the daily POI exploitation pattern from an LTE access pattern, we leverage a random forest regression (RFR) model, which can be used for time-series pattern prediction [20,21]. Figure 4 illustrates our RFR model and its training process. First, we find the POIs that have exploitation patterns in each LTE cell. Next, we pair each POI exploitation pattern with the corresponding LTE access pattern. During this process, the pairs of an LTE access pattern and a POI exploitation pattern are separated into different datasets, each of which contains a single POI type. Then, we train an RFR model to predict the exploitation pattern for each POI type. Finally, this model can be used to predict an unknown POI exploitation pattern from the corresponding LTE access pattern in a given area.
Fig. 4. Daily POI exploitation discovery scheme on an LTE access trace pattern.
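The pairing and per-type separation steps of this process can be sketched as below. The input structures (`cell_patterns` keyed by a cell identifier, and `poi_records` as tuples) are hypothetical, and the regressor itself is left out so that the example stays self-contained.

```python
from collections import defaultdict


def build_training_sets(cell_patterns, poi_records):
    """Group (LTE pattern, exploitation pattern) pairs by POI type.

    cell_patterns: {cell_id: 168-value LTE access pattern}
    poi_records:   iterable of (cell_id, poi_type, 168-value exploitation)
    Returns {poi_type: (X, Y)} where X and Y are parallel lists of patterns.
    """
    datasets = defaultdict(lambda: ([], []))
    for cell_id, poi_type, exploitation in poi_records:
        if cell_id not in cell_patterns:
            continue  # skip POIs in cells that have no access trace
        inputs, targets = datasets[poi_type]
        inputs.append(cell_patterns[cell_id])
        targets.append(exploitation)
    return dict(datasets)


# Hypothetical toy data: two cells and three POIs with weekly patterns.
cell_patterns = {"cell-a": [0.2] * 168, "cell-b": [0.8] * 168}
poi_records = [
    ("cell-a", "cafe", [0.1] * 168),
    ("cell-b", "cafe", [0.6] * 168),
    ("cell-b", "bar", [0.9] * 168),
]
training_sets = build_training_sets(cell_patterns, poi_records)
```

Each `(X, Y)` pair could then be given to a multi-output regressor, for example scikit-learn's `RandomForestRegressor` with `max_depth=5`, one model per POI type. Whether the authors predict all 168 hours jointly in this way is an assumption.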

Evaluation Setup
POI Discovery. We train binary classification models (LC, SVM, RFC, and the DNN model) for each POI type to predict whether a cell contains such a POI or not. During training, the models may suffer from the imbalance between positive and negative labels in the dataset. To prevent this, we use random undersampling, which randomly removes samples from the larger class until the numbers of positive and negative samples are equal. We conduct 10-fold cross-validation using the dataset described in Sect. 3.2. To analyze the performance of the models, we use five standard metrics: (1) Accuracy, (2) Precision, (3) Recall, (4) F1-macro, and (5) ROC-AUC. When we evaluate the first four metrics, we use a threshold of 0.5 to make a positive prediction. For each metric in this experiment, a higher value means that a model performs better. The parameters used in each model are as follows:
- LC: L2 norm loss, tolerance = 0.1, LBFGS solver
- SVM: radial basis function (RBF) kernel, regularization parameter = 1
- RFC: maximum depth = 5, number of estimators = 100
Table 3 shows the mean value for each metric after cross-validation with respect to each POI type.
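The undersampling and the threshold-based metrics described in this setup can be sketched as follows. The helper names are illustrative, the tie rule `score >= 0.5` is an assumption, and the text does not say whether undersampling is applied before splitting into folds or only within the training folds.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly drop samples from the larger class until both classes match."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(pos), len(neg))
    chosen = sorted(rng.sample(pos, keep) + rng.sample(neg, keep))
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall, and macro-F1 at a fixed threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def f1(hit, false_alarm, miss):
        denom = 2 * hit + false_alarm + miss
        return 2 * hit / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Macro-F1: unweighted mean of the per-class F1 scores.
        "f1_macro": (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2,
    }


# Toy example: 3 positive and 7 negative cells, balanced to 3 versus 3.
balanced_x, balanced_y = undersample(list(range(10)), [1] * 3 + [0] * 7)
metrics = binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

ROC-AUC is omitted here because it is computed from the ranking of scores rather than from a single threshold.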
Daily POI Exploitation Discovery. We train random forest regression (RFR) models with a maximum depth of 5. We compare the performance of the RFR models against two baselines that predict the test data using the mean and the median of the training data, respectively. To evaluate our model, we conduct 10-fold cross-validation using the dataset described in Sect. 3.3. During the experiment, we find that many places have one or two days of holiday every week. Any activity that the model predicts on a POI's holiday adds a large error to the evaluation, so we zero the model's predictions on those days. We define a holiday of a POI as a day whose corresponding 24-h pattern consists entirely of zeros. We use three evaluation metrics: (1) mean squared error (MSE), (2) root mean squared error (RMSE), and (3) Pearson correlation. For MSE and RMSE, a smaller value means that the model produces less error. For the Pearson correlation, a higher value means that the predicted pattern is more similar to the corresponding real POI exploitation pattern. We also summarize the Pearson correlation across the test results by its mean, median, standard deviation (SD), and coefficient of variation (CV).

Validation of POI Discovery
Table 3 shows the performance results of each model for each POI type. When we compare the overall performance across the categories, the models trained for the bar category show the highest accuracy, while those for the health category show the lowest. Although the accuracy of the bar models may seem only marginally better than that of the others, their precision is higher than in any other category. This is because the signal patterns of LTE cells that contain bars are quite distinguishable from the others: bars mainly open at night, so their existence is easy to discover from the patterns.
On the other hand, POIs in the health category are difficult to predict, as the corresponding places have a small number of visitors compared with other categories. This makes it difficult to find a common pattern among the LTE access patterns belonging to this category. The table shows that the RFC and DNN models produce good predictions in terms of accuracy, as they perform well on all the categories except for clothing store and lodging. LC and SVM also perform well and are worth applying when a researcher needs a prototype result. However, evaluating the models only on accuracy can be misleading, as we conduct our experiment on a small dataset after undersampling. We therefore also evaluate the models based on ROC-AUC scores, which take both sensitivity and specificity into account. In terms of ROC-AUC scores, the DNN-based model shows better performance overall. We assume that this is because the DNN architecture captures the features that most commonly appear in each POI type, resulting in a more consistent model.

Validation of Daily POI Exploitation Discovery
We conduct our experiment on restaurant, store, cafe, and bar, since these categories have enough data for evaluation while the others suffer from data sparsity. Table 4 shows the performance measures of daily POI exploitation pattern prediction. The results show that the RFR-based models produce better predictions than the baseline mean and median models, as they show lower MSE and RMSE. The models for the bar category show the lowest MSE and RMSE, as the exploitation patterns of this category are more distinguishable than the others. The higher mean Pearson correlation means that the RFR models predict exploitation patterns that are more similar to the real values than those of the baseline models. In addition, a small coefficient of variation means that the Pearson correlation is less dispersed around its mean. Therefore, we can conclude that the RFR models more frequently produce an exploitation pattern similar to the corresponding real value for all types of POI. Furthermore, the gap between the mean Pearson correlation of the best RFR model and that of the baseline mean model is largest in the restaurant category and smallest in the bar category, as shown in Table 4. Considering that the amount of training data is largest for the restaurant category and smallest for the bar category, this observation implies that the current RFR model can be improved as more training data is provided.
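The evaluation used for these results, including the zeroing of predictions on holidays, can be sketched as follows. The function names are illustrative; returning 0.0 for the correlation of a constant pattern and using the population standard deviation in the coefficient of variation are conventions chosen for this example, not details given in the text.

```python
import math
from statistics import mean, pstdev


def zero_holidays(predicted, actual, hours_per_day=24):
    """Zero the predicted hours of each day whose actual pattern is all zero."""
    masked = list(predicted)
    for start in range(0, len(actual), hours_per_day):
        day = actual[start:start + hours_per_day]
        if all(v == 0 for v in day):
            masked[start:start + hours_per_day] = [0.0] * len(day)
    return masked


def mse(predicted, actual):
    return mean((p - a) ** 2 for p, a in zip(predicted, actual))


def rmse(predicted, actual):
    return math.sqrt(mse(predicted, actual))


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD assumed)."""
    return pstdev(values) / mean(values)


# Two-day toy example: day 1 is a holiday, day 2 is active in the evening.
actual = [0.0] * 24 + [0.0] * 12 + [1.0] * 12
predicted = [0.3] * 48
masked = zero_holidays(predicted, actual)
error = mse(masked, actual)
```

In a cross-validated run, `pearson` would be computed per test POI and then summarized with `mean`, the median, `pstdev`, and `coefficient_of_variation`.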

Finding POI Exploitation Variances Across Places with the Same POI Types
A cross-analysis of the daily POI exploitation changes shows that, from a fine-grained perspective, urban places with the same POI types can be exploited differently by dwellers and tourists. Figure 5 shows the daily changes of POI exploitation in four different places that are identified by the same diachronic POIs (i.e., they have similar urban compositions) but whose dominant POI types differ. Using Google Street View, we find that the exploitation patterns differ depending on the urban setting of each place. A restaurant-dominant place (blue) is located near a main street where people gather and socialize while having lunch and dinner. Its exploitation pattern shows that this area is highly visited at every lunch and dinner period, and even more so at the dinner period on Friday and Saturday. A store-dominant place (orange) is a popular shopping street, mainly for buying clothes and accessories. The exploitation pattern of this place shows that people visit throughout the day, unlike the restaurant-dominant place, which shows clear peaks in the lunch and dinner periods. A cafe-dominant place (green) includes big franchise cafes (e.g., Starbucks) and many unique theme cafes (pet, flower, movie character, etc.). Its exploitation pattern shows that people visit the place during the daytime, and more so on weekends, as there are many special types of cafes that are attractive to visitors. A bar-dominant place (red) is well known for bars, night clubs, concerts, and night food. Its exploitation pattern shows that people visit the place past midnight, and more so on Friday and Saturday. It is interesting to note that the exploitation pattern of restaurants in this place is similar to that of bars; that is, many restaurants in this area also stay open after midnight.
The findings above show that our proposed scheme can provide helpful insights to urban planners who study the static configuration and dynamic exploitation of urban places [16,17], and to urban tourists who want to visit places that match their daily interests [18].

Discussion
The results above show that we can discover POIs and their daily exploitation by applying several classification and regression models to a normalized 7-day seasonal pattern extracted from LTE access traces. Nevertheless, a few points remain to be investigated further. First, we use a normalized 7-day seasonal pattern from each LTE access pattern as the input for both experiments. However, during the extraction of seasonal patterns, much information, such as the trend and residual patterns, is lost. A trend pattern can reflect changes over the year, such as the weather or public traveling behavior, and it can provide more insight into long-term changes in exploitation patterns. A residual pattern reflects unexpected signals that deviate from the seasonal and trend patterns, so it can show the frequency of unexpected events in a given area. A combined analysis of trend, residual, and seasonal patterns would therefore provide another dimension for analyzing LTE access patterns. Another way to read social information from LTE access patterns is to analyze them in terms of age and gender. This would enable us to understand the demographics of an area and help to determine which age and gender groups make use of the POI (or a combination of POIs) of a given place, that is, to understand not only when but also by whom the placeness is leveraged.

Conclusion
In this paper, we leverage LTE cell tower access traces and develop a scheme to identify what kinds of points of interest (POIs) exist in a given space and how these POIs are exploited from day to day. For the former, we leverage several binary classification models: LC, SVM, RFC, and DNN. For the latter, we leverage a random forest regression model to discover daily changes in POI exploitation. Evaluation results show that our approach performs well on each task and can discover the variance in POI exploitation across different urban environments. Our analysis based on LTE cell tower access data can provide useful insight for future urban space-based services such as travel and tourism.