Impact of Location Spoofing Attacks on Performance Prediction in Mobile Networks

Kanuri, Nikhil Sai; Chang, Sang-Yoon; Park, Younghee; Kim, Jonghyun; Kim, Jinoh

doi:10.1007/978-3-031-24049-2_7

Jinoh Kim ORCID: orcid.org/0000-0002-9835-1866

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1683)

Included in the following conference series:

Silicon Valley Cybersecurity Conference

Abstract

Performance prediction in wireless mobile networks is essential for diverse purposes in network management and operation. In particular, the position of mobile devices is crucial to estimating performance in the mobile communication setting. Given this importance, this paper investigates mobile communication performance based on the coordinate information of mobile devices. We analyze a recent 5G data collection and examine the feasibility of location-based performance prediction. As location information is key to performance prediction, any relevant prediction rests on the basic assumption that the given device coordinates are correct. Given this criticality, this paper also investigates the impact of position falsification on the ML-based performance predictor. The results reveal a significant degradation of prediction performance under such attacks, suggesting the need for effective defense mechanisms against location spoofing threats.

1 Introduction

Performance prediction in wireless mobile networks is essential for network optimization and management [8], application offloading decisions [10], and the deployment of unmanned aerial vehicles (UAVs), also known as flying base stations [3], to name a few. Performance prediction in mobile communication can be approached from different angles, from low-level channel performance [11, 12] to mobile application/device throughput [6, 8, 10]. In this study, we focus on predicting and evaluating the application throughput of mobile devices.

In the mobile communication setting, the position of a mobile device is crucial to estimating its performance. Even for a single mobile device, the measured performance may fluctuate widely depending on its location (e.g., due to the density of devices, signal strength, and interference/reflection). In this study, we investigate mobile communication performance based on the coordinate information of mobile devices. Using a machine learning (ML) approach, we analyze a recent 5G data collection [7], which contains a set of features including the GPS coordinates, velocity, and application throughput of mobile devices.

As location information is key to performance prediction, making a relevant prediction rests on the basic assumption that the given device coordinates are correct. However, a malfunction of the location chip (e.g., in receiving GPS signals) may result in an unacceptably erroneous estimation, although this is rare. A more common scenario is intentional location spoofing; that is, an attacker falsifies the position information with malicious intent, which is one of the greatest security concerns in mobile communication networks [4, 9]. Given this criticality, this paper investigates the impact of position falsification on the presented ML-based performance predictor.

While this paper presents our initial experimental results and observations, it makes several non-trivial contributions to the research community. First, this paper examines the feasibility of location-based performance prediction. An interesting observation is that it is possible to estimate application throughput with 80% accuracy using a small set of features readily available when establishing the communication channel. Second, the impact of location-spoofing attacks on performance prediction is evaluated, with the intuition that location-based performance prediction would be sensitive to such threats. The experimental result shows a significant degradation of the prediction quality, signaling the need for effective defense mechanisms against location-spoofing attacks to enable reliable estimations.

The organization of this paper is as follows. We first introduce the 5G dataset employed for performance prediction in Sect. 2, with exploratory data analysis. In Sect. 3, location-based performance prediction is discussed with our initial experimental results for binary and multi-class classifications. Section 4 shows the impact of location spoofing attacks on performance prediction using two types of position falsification techniques (constant and constant-offset spoofing). Section 5 provides a summary of closely related studies, and we conclude our presentation with future research directions in Sect. 6.

2 Exploratory Analysis of 5G Dataset

This study employs a recent 5G dataset collected from an Irish mobile operator network [7]. The data were collected using different file access applications, including file transfer and video streaming. The throughput of these applications was measured at different locations and under different mobility options (stationary or driving), together with other channel and context information. The dataset defines 26 features in total. Table 1 lists the features referred to in our performance prediction study.

Table 1. Selected features defined in the 5G dataset

The raw dataset contains roughly 189K samples. From the original dataset, we remove data instances meeting any of the following conditions: (i) DL_bitrate=0, (ii) State=Idle, or (iii) any feature contains a null value. Note that the State feature defines the state of the download process, that is, whether it is downloading or idle (i.e., not downloading). After this removal process, the pre-processed dataset contains 81,859 instances in total.
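The removal rule above can be sketched as a simple row filter. This is a minimal illustration, not the authors' code: the field names DL_bitrate and State follow the text, while the "Idle"/"Downloading" strings, the use of None to mark a null value, and the toy rows are assumptions made only for this example.

```python
# Hypothetical encoding: "State" and "DL_bitrate" follow the text; the string
# values and None-as-null are assumptions for illustration.

def keep_instance(row: dict) -> bool:
    """Return True if the row survives all three removal conditions."""
    if any(value is None for value in row.values()):
        return False  # (iii) some feature holds a null value
    if row["State"] == "Idle":
        return False  # (ii) the download process is idle
    if row["DL_bitrate"] == 0:
        return False  # (i) zero downlink bitrate
    return True


# Made-up toy rows (not taken from the dataset).
raw_rows = [
    {"State": "Downloading", "DL_bitrate": 1200, "CQI": 10},
    {"State": "Downloading", "DL_bitrate": 0, "CQI": 9},
    {"State": "Idle", "DL_bitrate": 300, "CQI": 12},
    {"State": "Downloading", "DL_bitrate": 800, "CQI": None},
]

clean_rows = [row for row in raw_rows if keep_instance(row)]
print(len(clean_rows), "of", len(raw_rows), "rows kept")
```

The null check runs first so that the comparisons on State and DL_bitrate never see a missing value.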

We carried out initial explorations to understand potential correlations of the features with the throughput feature (Tput). Figure 1 shows the throughput information on the coordinate space. The figure shows four different throughput ranges: (i) Tput < 100 Kbps, (ii) 100 Kbps \(\le \) Tput < 1 Mbps, (iii) 1 Mbps \(\le \) Tput < 10 Mbps, and (iv) Tput \(\ge \) 10 Mbps. From the figure, we can see that the location information would be helpful for estimating throughput. While some spots (colored in red or orange) show relatively high throughput, the rest (in blue or dark blue) show quite low bit rates. The figure also reveals some clusters with higher throughput.
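The four throughput ranges used for Fig. 1 amount to a threshold lookup, sketched below. The function name is hypothetical, and the sketch assumes throughput is expressed in Kbps with 1 Mbps taken as 1,000 Kbps; neither convention is stated in the text.

```python
def throughput_range(tput_kbps: float) -> int:
    """Map a throughput value (assumed to be in Kbps) to range index 0..3.

    0: Tput < 100 Kbps
    1: 100 Kbps <= Tput < 1 Mbps
    2: 1 Mbps <= Tput < 10 Mbps
    3: Tput >= 10 Mbps
    """
    if tput_kbps < 100:
        return 0
    if tput_kbps < 1_000:  # assumption: 1 Mbps = 1,000 Kbps
        return 1
    if tput_kbps < 10_000:
        return 2
    return 3


# Boundary values fall into the upper range, matching the "<=" in the text.
for value in (99.9, 100, 1_000, 10_000):
    print(value, "->", throughput_range(value))
```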

The box plot in Fig. 2 shows the measured throughput over different CQI values. The CQI of a mobile device is feedback indicating the channel data rate, reported to the base station (eNB). The study in [7] reported a partially proportional pattern between CQI and throughput. Our experimental result does not show such proportionality clearly; rather, it shows a different throughput range for each CQI value.

To see how the features are correlated with each other, Fig. 3 provides a correlation matrix. We can see that RSRP is strongly correlated with RSSI, and it is also somewhat correlated with SNR. Additionally, RSRQ shows a high degree of correlation with SNR. None of the features shows a strong correlation with the throughput feature (DL_bitrate). In the next section, we will examine the feasibility of throughput prediction using conventional ML methods. In addition, Fig. 4 shows the importance of the features in determining throughput, compiled using a random forest classifier (described in Sect. 3). While RSSI is the most important feature, the result shows that no single feature dominates in predicting throughput.
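A correlation matrix of this kind can be computed pairwise over the feature columns. The sketch below assumes the Pearson coefficient, which the text does not confirm, and uses neutral column names with made-up values rather than any measurement from the dataset.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns.

    Raises ZeroDivisionError if either column is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Return the coefficient for every ordered pair of column names."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Made-up columns: "b" rises with "a", "c" falls with "a".
toy_columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(toy_columns)
print(matrix[("a", "b")], matrix[("a", "c")])
```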

3 Performance Prediction

In this study, we reduce the performance prediction to a classification problem. We employ several conventional supervised learning methods for making the classification, as follows:

k-Nearest Neighbors (KNN) performs the grouping of data samples based on the proximity information. To classify, the class label most frequently found from its neighbors is assigned to the given data point (on the basis of the concept of majority vote).
Random Forest (RF) is a tree-based ensemble algorithm combining multiple decision trees. Each tree is trained in parallel on a randomly allocated subset of the data, and a combining function merges the results of the individual trees to make the final decision.
Extreme Gradient Boosting (XGB) is also a tree-based ensemble method, based on a gradient descent algorithm. XGB builds one tree at a time, whereas the decision trees in RF are built independently. The method iteratively minimizes a loss function, with each new tree correcting the errors observed in the previous iteration.
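To make the majority-vote idea behind KNN concrete, the following is a minimal from-scratch sketch. It assumes Euclidean distance and a brute-force neighbor search; the function name and the toy points are hypothetical, and the study itself presumably relied on a library implementation. RF and XGB are not sketched here since they are normally taken from existing libraries.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_points: List[Sequence[float]],
                train_labels: List[int],
                query: Sequence[float],
                k: int) -> int:
    """Return the label most frequently found among the k nearest points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up two-cluster example: label 0 near the origin, label 1 near (5, 5).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(points, labels, (0.2, 0.2), k=3))
print(knn_predict(points, labels, (5.5, 5.5), k=3))
```

An odd k avoids tied votes in the binary case; with more classes a tie-breaking rule would be needed.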

The classifier takes a feature vector as input and produces the predicted class as the outcome. In this study, we set up three different feature sets to evaluate their impact on the classification performance, as described in Table 2. The prediction is fundamentally based on the position information. For Set-1, it is reasonable to assume that the velocity information is available when issuing the prediction request, whereas the other features defined in Table 1 might not be available before the actual communication takes place. Figure 2 shows a correlation between CQI and throughput (although not a strong one), so Set-2 includes the CQI value in addition to the basic Set-1 features. Lastly, Set-3 refers to the entire feature set defined in Table 1 except State and Tput.

Table 2. Evaluated feature sets for performance prediction

For the evaluation, we partition the dataset into two disjoint sets for training (70%) and testing (30%). To report classification performance, we consider two standard measures, Accuracy and F1-score: Accuracy is the fraction of correctly classified samples, while F1-score is the harmonic mean of precision and recall, which remains balanced under an unbalanced class distribution (i.e., majority vs. minority classes). To account for class imbalance in the evaluation settings, we report the F1 score by default, unless otherwise mentioned.
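The evaluation protocol can be sketched in a few lines. The split helper below assumes a single random shuffle with a fixed seed, and the F1 function computes the binary score for one positive class; the text does not state how the split was drawn or how F1 was averaged in the multi-class settings, so both are assumptions of this sketch.

```python
import random
from typing import List, Sequence, Tuple


def train_test_split(samples: Sequence, test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List, List]:
    """Shuffle once and return disjoint (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


train, test = train_test_split(range(10))
print(len(train), len(test))

# Made-up labels: tp=2, fp=1, fn=1, so precision = recall = 2/3.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```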

3.1 Binary Classification Performance

We first evaluate the binary classification performance. The two classes are defined as low if Tput \(\le \) 1 Mbps and high otherwise, which yields a balanced distribution of data instances. Class-0 (low) contains 39,730 samples (48.5%) and Class-1 (high) contains 42,129 samples (51.5%).
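The class definition and the reported class shares can be checked with a short sketch. The threshold assumes throughput in Kbps with 1 Mbps taken as 1,000 Kbps, and the function names are hypothetical; the two sample counts are the ones reported above.

```python
from typing import Dict

LOW, HIGH = 0, 1
THRESHOLD_KBPS = 1_000  # assumption: Tput in Kbps, 1 Mbps = 1,000 Kbps


def binary_label(tput_kbps: float) -> int:
    """Class-0 (low) if Tput <= 1 Mbps, Class-1 (high) otherwise."""
    return LOW if tput_kbps <= THRESHOLD_KBPS else HIGH


def class_shares(counts: Dict[int, int]) -> Dict[int, float]:
    """Convert per-class sample counts into percentage shares."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


# Counts reported in the text for the pre-processed dataset.
reported_counts = {LOW: 39_730, HIGH: 42_129}
shares = class_shares(reported_counts)
print(sum(reported_counts.values()))
print(round(shares[LOW], 1), round(shares[HIGH], 1))
```

The two counts add up to the 81,859 instances of the pre-processed dataset, and the shares round to the 48.5% and 51.5% stated in the text.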

Figure 5 shows the prediction performance in F1 score. The evaluation result shows that RF yields the best performance, while XGB performs consistently across the different feature sets. The KNN algorithm shows slightly lower performance than the other two schemes. Note that we set \(k=11\), which produced the best performance for KNN among k values between 1 and 100, while we simply take the default settings for RF and XGB (without intensive optimization).

An interesting observation is that referencing additional features does not help improve the prediction performance. In fact, for all the classifiers, using Set-1 performs better than or at least as well as using the other feature sets. We conjecture that this is because none of the features added in Set-2 and Set-3 has a strong correlation with the throughput feature, as depicted in the correlation matrix in Fig. 3. The result shows that the position information plays a significant role in estimating throughput. This is somewhat intuitive, since the application throughput of a mobile device may fluctuate widely depending on its location for several reasons, such as the density of devices, signal strength, and interference/reflection.

It is important to note that the features in Set-1 are readily available when establishing actual communication channels, and that it is possible to estimate application performance (throughput) with 80% accuracy (more precisely, an F1 score of 80%) using the RF predictor. In contrast, the features additionally defined in Set-2 and Set-3 may not be available beforehand at connection set-up time.

Table 3. Class definition for multi-class prediction

3.2 Multi-class Prediction Performance

We also examine the performance prediction tools with multi-class classification settings. Table 3 shows the class definition, for 3-class, 4-class, and 5-class classification settings.

Figure 6 shows the multi-class classification performance for RF. For comparison, the figure includes the binary classification result as well. As expected, defining a larger number of classes results in a significant degradation of the estimation performance. For 3-class classification, the performance goes down to 62% (from 80% for binary classification). As in the binary classification, the multi-class prediction result also shows that using Set-1 performs better than using the other feature sets.

The other two classifiers (KNN and XGB) showed a similar pattern, with slightly lower performance than RF. Figure 7 shows the multi-class prediction result for the different classifiers when using Set-1. We can see that RF consistently shows the best performance, while XGB performs better than KNN.

4 Location Spoofing Attacks

We next investigate the impact of location-spoofing attacks on the coordinate-based performance prediction. Location-spoofing attacks are among the critical attacks in mobile communication environments. A widely used dataset for Vehicular Ad-hoc Networks (VANETs), VeReMi, assumes five different types of location spoofing attacks [2]: (i) Constant, transmitting a pre-defined coordinate; (ii) Constant offset, adding a pre-defined offset to the original coordinate; (iii) Random, transmitting a random coordinate; (iv) Random offset, providing a random coordinate in a predefined rectangle around the original coordinate; and (v) Eventual stop, repeatedly transmitting the current coordinate without any change (although the device is moving).

In this study, we evaluate the impact of spoofing attacks with constant spoofing and constant offset spoofing. The constant spoofing attack overwrites the location information with a constant value. To simulate it, we chose five random positions within the coordinate space. The second scenario is the constant offset attack, in which a constant offset value is added to the original coordinate. For the constant offset attack, we use the notion of perturbation degree: In the coordinate space of the 5G dataset, it is straightforward to calculate the width of the latitude space (i.e., \(|x| = x_{max}-x_{min}\)) and the height of the longitude space (\(|y| = y_{max}-y_{min}\)). The constant offset for a perturbation degree p is defined as \(p \times (|x|, |y|)\). For the constant offset attack, we configure different perturbation degrees from 5% to 50% to define the offset.
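The two falsification techniques can be sketched as coordinate transformations. This is an illustrative sketch under stated assumptions: coordinates are (x, y) pairs following the symbols above, the extent |x| and |y| is computed from the list of coordinates passed in, and the function names and toy values are hypothetical.

```python
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (x, y), following the symbols in the text


def constant_spoof(coords: List[Coordinate], fake: Coordinate) -> List[Coordinate]:
    """Constant attack: overwrite every coordinate with one fixed position."""
    return [fake for _ in coords]


def constant_offset_spoof(coords: List[Coordinate], p: float) -> List[Coordinate]:
    """Constant offset attack: add p * (|x|, |y|) to every coordinate."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    dx = p * (max(xs) - min(xs))  # p * |x|, with |x| = x_max - x_min
    dy = p * (max(ys) - min(ys))  # p * |y|, with |y| = y_max - y_min
    return [(x + dx, y + dy) for x, y in coords]


# Made-up coordinates: |x| = 2 and |y| = 4, so p = 0.5 gives the offset (1, 2).
original = [(0.0, 0.0), (2.0, 4.0), (1.0, 1.0)]
print(constant_spoof(original, (9.0, 9.0)))
print(constant_offset_spoof(original, 0.5))
```

Setting p to 0 leaves the coordinates unchanged, which corresponds to the no-attack case.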

Table 4. Impact of constant spoofing attack (with Set-1)

Table 4 shows the performance prediction result with and without spoofing attacks. The experiment was performed with Set-1 for binary prediction. Since five different coordinates were randomly picked, we report the average and standard deviation for the results with spoofing. As can be seen from the table, even this simple spoofing attack considerably degrades the prediction performance. For instance, the F1 score of RF drops from 80% to 34.2%, while KNN is slightly less affected than RF and XGB.

The constant spoofing attack would be easy to detect and counter, as it relies on static positions. The constant offset attack is harder to detect since the modified coordinate is based on the original location. Figure 8 shows the binary classification performance over different perturbation degrees (p). Note that \(p=0\) indicates that no spoofing attack is applied. As can be seen from the figure, even a small perturbation degree (\(p=1\%\)) significantly affects performance prediction, reducing the F1 score from 80% to lower than 60% regardless of the classifier type. With a greater degree of perturbation, the prediction performance drops below 50% if \(p \ge 3\%\) for any classifier. The result signals the need for effective defense mechanisms against location-spoofing attacks for reliable estimation of throughput in a mobile communication setting.
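The evaluation idea (train on genuine coordinates, then query with falsified ones) can be illustrated on a synthetic toy problem. Everything in the sketch below is an assumption made for illustration: a unit-square coordinate space in which the class depends only on x, a 1-nearest-neighbor predictor, grid-placed points, and a large perturbation degree. It does not reproduce the numbers reported in this section.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
GRID = 20  # hypothetical grid resolution of the toy coordinate space


def true_class(x: float) -> int:
    """Toy ground truth: high throughput (1) in the right half of the space."""
    return 1 if x >= 0.5 else 0


# Training points sit at cell centers and carry genuine coordinates.
train = [(((i + 0.5) / GRID, (j + 0.5) / GRID), true_class((i + 0.5) / GRID))
         for i in range(GRID) for j in range(GRID)]

# Test points are shifted inside each cell so they never coincide with training points.
test_points = [((i + 0.25) / GRID, (j + 0.25) / GRID)
               for i in range(GRID) for j in range(GRID)]
test_labels = [true_class(x) for x, _ in test_points]


def nearest_label(query: Point) -> int:
    """1-NN prediction: label of the closest training point."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


def offset_spoof(points: List[Point], p: float) -> List[Point]:
    """Constant offset attack with perturbation degree p."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    dx, dy = p * (max(xs) - min(xs)), p * (max(ys) - min(ys))
    return [(x + dx, y + dy) for x, y in points]


def accuracy(points: List[Point], labels: List[int]) -> float:
    hits = sum(nearest_label(pt) == lab for pt, lab in zip(points, labels))
    return hits / len(labels)


clean_accuracy = accuracy(test_points, test_labels)
spoofed_accuracy = accuracy(offset_spoof(test_points, 0.5), test_labels)
print(clean_accuracy, spoofed_accuracy)
```

In this toy setup the genuine queries are all classified correctly by construction, while the shifted queries are mostly mapped into the high-throughput half regardless of their true class, so the predictor falls to roughly chance level. The labels passed to the second call are the true ones, since the attack falsifies only the reported position.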

5 Related Work

A recent study in [6] investigated mobile bandwidth prediction using 4G and 5G datasets. For bandwidth prediction, the authors applied a Recurrent Neural Network (RNN) structure by formulating the prediction problem as a time series forecasting task. Their experimental result shows better performance than the conventional univariate and multivariate prediction models. This previous work treats bandwidth prediction as a (continuous) regression problem, while our study defines throughput estimation as a (discrete) classification problem.

The authors in [5] evaluated the impact of location spoofing attacks using the VeReMi dataset. In this previous work, two machine learning algorithms, KNN and Support Vector Machine (SVM), were examined. The measured detection performance against spoofing attacks is over 99% (in recall and precision). A recent study in [1] investigated the detection of falsified positions and the corresponding attack types in vehicular communication networks using a boosting decision tree ensemble technique. Our study analyzes the 5G dataset to understand the impact of location spoofing attacks on performance prediction (rather than the detection of spoofed coordinates).

6 Conclusion

This paper investigates mobile communication performance based on the coordinate information of mobile devices using an ML approach. Using only three features, <Longitude, Latitude, Velocity>, we observed up to 80% correct decisions (in F1 score) for binary prediction using a conventional random forest classifier. However, the experimental result shows that the location-based performance prediction becomes considerably degraded when assuming more than two classes (i.e., multi-class prediction). This paper also investigated the impact of location-spoofing attacks on the coordinate-based performance prediction, since location-spoofing attacks are among the critical attacks in mobile communication environments. The location spoofing attacks significantly degrade performance prediction, from 80% to lower than 50% correct decisions, signaling the need for effective defense mechanisms for reliable performance estimation.

In this initial study, we employed conventional ML methods (KNN, RF, and XGB) for predicting throughput in a mobile communication setting. The observed performance of 80% for binary classification could be improved by designing more sophisticated learning models (e.g., using deep structures), which is one of the future tasks of this study. Additionally, this paper showed the significant impact of location spoofing attacks on performance prediction by applying two spoofing attack types (constant spoofing and constant offset spoofing). Toward a more sophisticated ML model resilient to such attacks, it will be interesting to apply the other types of spoofing attacks (i.e., random, random offset, and eventual stop spoofing) to evaluate the robustness to location spoofing. Another interesting research avenue is the investigation of defense mechanisms against potential spoofing attacks and their effect on performance prediction.

References

1. Elsayed, M.A., Zincir-Heywood, N.: BoostGuard: interpretable misbehavior detection in vehicular communication networks. In: NOMS 2022–2022 IEEE/IFIP Network Operations and Management Symposium, pp. 1–9. IEEE (2022)
2. van der Heijden, R.W., Lukaseder, T., Kargl, F.: VeReMi: a dataset for comparable evaluation of misbehavior detection in VANETs. In: Beyah, R., Chang, B., Li, Y., Zhu, S. (eds.) SecureComm 2018. LNICST, vol. 254, pp. 318–337. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01701-9_18
3. Ho, T.M., Nguyen, K.K., Cheriet, M.: UAV control for wireless service provisioning in critical demand areas: a deep reinforcement learning approach. IEEE Trans. Veh. Technol. 70(7), 7138–7152 (2021)
4. Kamal, M., Barua, A., Vitale, C., Laoudias, C., Ellinas, G.: GPS location spoofing attack detection for enhancing the security of autonomous vehicles. In: 2021 IEEE 94th Vehicular Technology Conference (VTC2021-Fall), pp. 1–7. IEEE (2021)
5. Le, A., Maple, C.: Shadows don’t lie: n-sequence trajectory inspection for misbehaviour detection and classification in VANETS. In: 2019 IEEE 90th Vehicular Technology Conference (VTC2019-Fall), pp. 1–6. IEEE (2019)
6. Mei, L., Gou, J., Cai, Y., Cao, H., Liu, Y.: Realtime mobile bandwidth and handoff predictions in 4G/5G networks. Comput. Netw. 204, 108736 (2022)
7. Raca, D., Leahy, D., Sreenan, C.J., Quinlan, J.J.: Beyond throughput, the next generation: a 5G dataset with channel and context metrics. In: Proceedings of the 11th ACM Multimedia Systems Conference, pp. 303–308 (2020)
8. Riihijarvi, J., Mahonen, P.: Machine learning for performance prediction in mobile cellular networks. IEEE Comput. Intell. Mag. 13(1), 51–60 (2018)
9. Sharma, A., Jaekel, A.: Machine learning approach for detecting location spoofing in vanet. In: 2021 International Conference on Computer Communications and Networks (ICCCN), pp. 1–6. IEEE (2021)
10. da Silva Pinheiro, T.F., Silva, F.A., Fé, I., Kosta, S., Maciel, P.: Performance prediction for supporting mobile applications’ offloading. J. Supercomput. 74(8), 4060–4103 (2018)
11. Xu, L., Quan, T., Wang, J., Gulliver, T.A., Le, K.N.: GR and BP neural network-based performance prediction of dual-antenna mobile communication networks. Comput. Netw. 172, 107172 (2020)
12. Xu, L., Wang, J., Wang, H., Aaron Gulliver, T., Le, K.N.: BP neural network-based ABEP performance prediction for mobile internet of things communication systems. Neural Comput. Appl. 32(20), 16025–16041 (2020)

Acknowledgment

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2021-0-00796, Research on Foundational Technologies for 6G Autonomous Security-by-Design to Guarantee Constant Quality of Security).

Author information

Authors and Affiliations

Texas A&M University, Commerce, TX, 75428, USA
Nikhil Sai Kanuri & Jinoh Kim
University of Colorado, Colorado Springs, CO, 80918, USA
Sang-Yoon Chang
San Jose State University, San Jose, CA, 95192, USA
Younghee Park
ETRI, Yuseong, Daejeon, 34129, Korea
Jonghyun Kim

Corresponding author

Correspondence to Jinoh Kim.

Editor information

Editors and Affiliations

IBM Research, San Jose, CA, USA
Luis Bathen
San José State University, San Jose, CA, USA
Gokay Saldamli
California State University, Sacramento, CA, USA
Xiaoyan Sun
San Jose State University, San Jose, CA, USA
Thomas H. Austin
National Institute of Standards and Technology, Gaithersburg, MD, USA
Alex J. Nelson

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Kanuri, N.S., Chang, SY., Park, Y., Kim, J., Kim, J. (2022). Impact of Location Spoofing Attacks on Performance Prediction in Mobile Networks. In: Bathen, L., Saldamli, G., Sun, X., Austin, T.H., Nelson, A.J. (eds) Silicon Valley Cybersecurity Conference. SVCC 2022. Communications in Computer and Information Science, vol 1683. Springer, Cham. https://doi.org/10.1007/978-3-031-24049-2_7

DOI: https://doi.org/10.1007/978-3-031-24049-2_7
Published: 19 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24048-5
Online ISBN: 978-3-031-24049-2
eBook Packages: Computer Science (R0)
