1 Introduction

The security of computing systems is based on three basic principles: confidentiality, integrity and availability. The availability of networks and services can be significantly impaired by Denial of Service (DoS) attacks, which take various forms depending on the technology being considered. Thus DoS attacks on IP (Internet Protocol) networks differ significantly from DoS attacks on mobile networks.

Mobile networks are susceptible to DoS attacks, mostly because of the networks’ openness to the Internet, the use of deterministic procedures, and the use of basic design principles based on “typical” user behaviour. In the last ten years, there have been huge advances from an algorithmic, manufacturing and software perspective, pushing forward the innovation of mobile smart devices and applications which operate over a mobile network, while the network itself has not kept up with the pace. One of the problems caused by these circumstances is the appearance of DoS attacks known as signalling storms, which overload the control plane of the mobile network, unlike many previously known data plane flooding attacks [26, 37].

Network security is ranked as one of the top priorities for future self-aware networks [18], which is why there is well established research in the field. Furthermore, while work in [21, 33] focuses on a general defensive approach against DoS attacks in future networks, signalling storm specific research can roughly be categorised in the following groups: problem definition and attack classification [5, 30, 31, 41]; measurements in real operating networks [11, 40]; modelling and simulation [1, 27]; impact of attacks on energy consumption [10, 12]; attack detection and mitigation, using counters [19, 20, 38], change-point detection techniques [32, 42], IP packet analysis [28], randomisation in RRC’s functions [45], software changes in the mobile terminal [8, 34], monitoring of the terminal’s bandwidth usage [39], and detection using techniques from Artificial Intelligence [2]. As we look to future systems such as the Internet of Things (IoT), various forms of attacks will also have to be considered [6, 9].

Communication schemes may be opportunistic [25], and attacks may use similar opportunistic means to access IoT devices. Viruses and worms will continue to be important threats [16], and they can diffuse opportunistically through a network [17]. Video input is one of the uses of the IoT, and video encoding [7] can also be specifically targeted by attacks. Furthermore, many network services are organised to flow over overlay networks [4] that cooperate with the Cloud [43, 44] to offer easily deployable and flexible services for the mobile network control plane. Thus research needs to remain alert to such developments.

In this paper we mainly use stochastic modelling techniques in order to represent complex communication protocols, such as the Radio Resource Control (RRC), in simplified mathematical terms. In particular we use open and closed queueing networks with multiple classes of calls. The analysis of these systems was first described by Jackson [29], Baskett et al. [3], and Gelenbe [14, 15, 22, 23, 24], among others. A second approach is discrete event simulation, whose results are in many cases comparable to those of queueing network models. More precisely, we use a specialised Mobile Networks Security Simulator (SECSIM) created by our research group [27].

The remainder of the paper is organised as follows. In Sect. 2 we present a queueing network model of a generic architecture of a mobile network, and model normal and attack behaviour in Sect. 2.2. In Sect. 3 we present two attack detection techniques, respectively in Sects. 3.1 and 3.2, and a mitigation technique in Sect. 3.3. Finally, Sect. 4 concludes the paper.

2 Network Model

The proposed model describes a general network architecture, focusing on its radio access part, from the perspective of both the control and data (user) planes. It is envisioned to represent different mobile network technologies, which is achieved by representing the resource allocation in the data plane as a “black box” into which different technologies’ sub-models can be plugged, while keeping the control plane unchanged. The core part of the model consists only of the basic elements of the architecture, such as multiple Base Station (BS) nodes connected to a single network controller consisting of one Signalling Server (SS) node, and the communication stage nodes.

2.1 Model Description

An example workflow captured by our model goes as follows. When a mobile terminal wants to communicate, it sends a connection setup request through the control plane of the network, which needs to be processed at the BS and the SS. If admitted, the mobile proceeds to communicate in the data plane of the network, in sessions (each comprising multiple data packets), which we denote as calls in the rest of the paper. If a call is blocked, then the mobile may either leave the network or attempt to reconnect with a probability that depends on the type of call. There are two types of calls or connection setup requests in the network: (i) normal calls representing traffic from legitimate users or applications, and (ii) attack traffic generated by malicious or malfunctioning applications that may overload the network. The network model is open with calls joining and leaving the network, representing for example the arrival and departure of mobiles to WiFi areas. Its parameters are defined in Table 1 where the superscript \(r\in \{n,a\}\) denotes the class of a call (normal n or attack a) (Fig. 1).

Fig. 1. A model of the radio access part of a mobile network.

Table 1. The main parameters of the model

We assume calls arrive from outside the network according to independent Poisson processes and the service times in each node are independent and exponentially distributed. Since calls may be blocked at the SS due to congestion, the aggregate arrival processes at different parts of the network are not Poisson. Nevertheless, to simplify matters so as to obtain analytical solutions, we make the approximation that all flows within the network are Poisson. The service time distribution for the BS and SS nodes in the signalling stage is the same for both classes of calls, because the signalling procedure undertaken by the network does not distinguish between call classes. On the other hand, in the communication stage, the service time distribution differs between classes of calls because of the different bandwidth usage behaviour of normal and malicious calls.

The flow of calls in the above model could be expressed in a closed form as follows. The total arrival rate of class-r connection requests at BS i is the sum of the rates of (i) new calls, (ii) returning calls that timed out, and (iii) calls that were blocked at a cell j by the SS and are attempting to connect at cell i:

$$\begin{aligned} \lambda ^r_i = \underbrace{\lambda ^r_{0i}}_{\text {new calls}}+\underbrace{\gamma ^r_i(1-p^r_{i0})}_{\begin{array}{c} \text {reconnecting after}\\ \text {timeout} \end{array}} + \underbrace{\sum _{j=1}^N \lambda ^r_j p^r_{jb}(1-p_{b0}^r)p^r_{ji}}_{\begin{array}{c} \text {joining after being blocked}\\ \text {at cell }j\text { due to congestion} \end{array}}, \end{aligned}$$

where the proportion of blocked calls \(p_{ib}^r\) and the rate of admitted calls that have timed out \(\gamma ^r_i\) depend on \(\lambda ^r_j,~\forall j\). The model as presented is suitable for modelling different mobile technologies under attack. More details, and a comparison of the attacks’ influence on two groups of technologies, are presented in [36].
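To illustrate how the flow equation couples the cells, the following Python sketch solves it by fixed-point iteration for a single class of calls. It is a simplification made for illustration only: the blocking proportions and timeout rates are treated as given constants, whereas in the model they depend on the arrival rates, and the reading of each probability (leaving after a timeout, leaving after being blocked, moving from cell j to cell i) is inferred from the labels in the equation. The function and argument names are hypothetical.

```python
def solve_arrival_rates(lam0, gamma, p_i0, p_b, p_b0, p_move,
                        tol=1e-12, max_iter=10_000):
    """Fixed-point iteration on the flow equation for one call class.

    lam0[i]      : rate of new calls at cell i
    gamma[i]     : rate of admitted calls that timed out at cell i (assumed constant)
    p_i0[i]      : probability that a timed-out call leaves instead of reconnecting
    p_b[j]       : proportion of calls blocked at cell j (assumed constant)
    p_b0         : probability that a blocked call leaves the network
    p_move[j][i] : probability that a call blocked at cell j retries at cell i
    """
    n = len(lam0)
    lam = list(lam0)
    for _ in range(max_iter):
        new = []
        for i in range(n):
            # Calls blocked at any cell j that retry at cell i.
            rejoin = sum(lam[j] * p_b[j] * (1.0 - p_b0) * p_move[j][i]
                         for j in range(n))
            new.append(lam0[i] + gamma[i] * (1.0 - p_i0[i]) + rejoin)
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise RuntimeError("fixed-point iteration did not converge")


# Two symmetric cells; blocked calls always retry at the other cell.
rates = solve_arrival_rates(
    lam0=[1.0, 1.0], gamma=[0.0, 0.0], p_i0=[1.0, 1.0],
    p_b=[0.5, 0.5], p_b0=0.5, p_move=[[0.0, 1.0], [1.0, 0.0]],
)
print(rates)
```

In this symmetric example each rate satisfies \(\lambda = 1 + 0.25\lambda \), so both cells settle at \(\lambda = 4/3\). In the full model the blocking proportions would be recomputed from the rates at every iteration.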

2.2 User Behaviour Model

An important part of the network model is the user behaviour model. In general, the two classes of calls have different service time distributions. A normal call, for example web browsing traffic, would usually happen in bursts which occupy the channel for a longer period. In contrast, attack calls would usually transfer only a small amount of data in order to trigger quick bandwidth allocations and deallocations. The two patterns are depicted in Fig. 2, with \(T^n\) denoting the normal session duration and \(T^a\) the attack session duration, and s and q respectively denoting “service” and “quiet” periods. In this part we need to estimate the average session duration \(E[T^r] = 1/\mu ^r\).

Fig. 2. The user behaviour model describing the duration of a single data session \(T^r\) of class r.

Figure 2 can be translated into a Markov chain model, as in Fig. 3, using the states: service (S), quiet (Q), and end of session (F). The transitions between the S and Q states are governed by the rates \(\alpha ^r\) and \(\beta ^r\), where \(1/\alpha ^r\) is the average communication time of a class-r burst, and \(1/\beta ^r\) is the average duration of a quiet (inactivity) period of a class-r call. The timeout rate is given by \(\tau =1/t_0\).

Fig. 3. State diagram of the user behaviour model.

Let us denote by \(\varPi _i\) the equilibrium probability of the session being in state \(i\in \{S,Q,F\}\). The average session duration can be found using the following ratio:

$$\begin{aligned} \frac{\varPi _S + \varPi _Q + \varPi _F}{1 + E[T^r]} = \varPi _F. \end{aligned}$$

Solving the balance equations yields the state probabilities in equilibrium, and the above equation solves to:

$$\begin{aligned} E[T^r] = \frac{1}{\mu ^r} = \frac{1}{\alpha ^r} + \frac{1}{\tau } + \frac{\beta ^r}{\alpha ^r\tau }. \end{aligned}$$

In the above expression, one can see that when the timeout is very short, with \(\tau \rightarrow \infty \), the average session duration tends to the communication time of a single burst \(1/\alpha ^r\). By modifying the \(\alpha ^r\) and \(\beta ^r\) parameters, this modelling approach can be used to investigate different traffic types and different attack patterns.
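As a numerical check of the expression for \(E[T^r]\), the Python sketch below evaluates the closed form and compares it with a first-step analysis of the chain. The transition structure used in the second function (S to Q at rate \(\alpha ^r\), Q back to S at rate \(\beta ^r\), Q to F at rate \(\tau \)) is our reading of the state diagram and should be treated as an assumption; the parameter values are purely illustrative.

```python
def mean_session_duration(alpha, beta, tau):
    """Closed form: E[T] = 1/alpha + 1/tau + beta/(alpha*tau)."""
    return 1.0 / alpha + 1.0 / tau + beta / (alpha * tau)


def mean_session_duration_first_step(alpha, beta, tau):
    """Mean time from state S until F, by first-step analysis.

    Assumed transitions: S -> Q at rate alpha, Q -> S at rate beta,
    Q -> F at rate tau. With T_S and T_Q the mean times to reach F:
        T_S = 1/alpha + T_Q
        T_Q = 1/(beta + tau) + beta/(beta + tau) * T_S
    Substituting T_Q into T_S and solving the linear equation gives T_S.
    """
    back_to_service = beta / (beta + tau)
    return (1.0 / alpha + 1.0 / (beta + tau)) / (1.0 - back_to_service)


# Illustrative values: 2 s bursts, 5 s quiet periods, timeout t_0 = 2 s.
alpha, beta, tau = 0.5, 0.2, 0.5
print(mean_session_duration(alpha, beta, tau))
print(mean_session_duration_first_step(alpha, beta, tau))

# A very short timeout (large tau) leaves roughly one burst per session.
print(mean_session_duration(alpha, beta, 1e9))
```

Both functions return the same value for any positive rates, which supports the closed form under the assumed transition structure.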

3 Detection and Mitigation

In this section, we first present two real-time storm detection mechanisms, based on counting channel allocations and on monitoring bandwidth usage. Both are tested in the SECSIM simulator. The mitigation mechanism uses an adjustable inactivity timer, and is tested with the model in Sect. 2.

3.1 Counter Detection

Description. The Counter detection mechanism enables detection of signalling storms per mobile terminal in real time. It is based on counting the repetitive bandwidth allocations of the same channel type (e.g. a shared FACH or dedicated DCH channel in a 3G UMTS network). It is envisioned as a lightweight mechanism that should not impose any processing, storage, or memory problems if implemented on a mobile terminal.

Decision Making. The mechanism requires two input parameters: the time instants of bandwidth allocation and the type of bandwidth allocation, which are stored in memory for the duration of a time window of length \(t_w\). An attack is declared when the number of repetitions reaches a predefined threshold, called the counter threshold n. The length of the window \(t_w\) is chosen such that \(t_w > n\cdot t_{I}\), where \(t_{I}\) is the duration of the inactivity timer of the attacked state. The upper limit of \(t_w\) is set according to the memory and storage capacities of the device on which the mechanism is implemented.
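The decision rule can be sketched in a few lines of Python. This is a minimal illustration and not the SECSIM implementation: the class and method names are hypothetical, and we assume that a "repetition" is any allocation of the same channel type that falls inside the sliding window of length \(t_w\).

```python
from collections import deque


class CounterDetector:
    """Per-terminal counter detector over a sliding window (illustrative)."""

    def __init__(self, n, t_w):
        self.n = n            # counter threshold
        self.t_w = t_w        # window length in seconds
        self.events = deque() # (time, channel_type) of recent allocations

    def observe(self, t, channel_type):
        """Record an allocation at time t; return True if an attack is flagged."""
        self.events.append((t, channel_type))
        # Forget allocations that have fallen out of the window.
        while self.events and self.events[0][0] <= t - self.t_w:
            self.events.popleft()
        # Assumption: count allocations of the same channel type in the window.
        count = sum(1 for _, c in self.events if c == channel_type)
        return count >= self.n


# Hypothetical values: n = 3 and t_I = 5 s, so t_w must exceed 15 s.
detector = CounterDetector(n=3, t_w=20.0)
for t in (0.0, 6.0, 12.0):
    print(t, detector.observe(t, "DCH"))
```

The memory footprint is bounded by the number of allocations that can occur within \(t_w\), which is consistent with the lightweight design goal stated above.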

Evaluation. Figure 4 shows the performance of the described detection algorithm using a ROC curve, as calculated with the SECSIM simulator. A threshold of \(n=3\) could be a suitable choice resulting in around 40% true positive detection \(p_{tp}\) and less than 0.2% false positive detection \(p_{fp}\).

3.2 Bandwidth Monitoring Detection

The Bandwidth monitoring detection mechanism uses the simple idea of tracking the bandwidth usage of each mobile terminal in a given sliding time window, and calculating a cost function to estimate the likelihood of a terminal performing a signalling attack. It is based on previous analyses which showed that signalling storms are inefficient bandwidth users. The mechanism monitors two input parameters: the total time that the terminal spends with allocated bandwidth within a given time window \(t_w\) (denoted by \(t_D\) and \(t_F\) for the DCH and FACH states in 3G UMTS, respectively), and the time during which the mobile terminal is allocated bandwidth but does not transfer any data within the time window \(t_w\) (denoted by \(t_{Di}\) and \(t_{Fi}\)). Whenever resources are allocated or deallocated, the detector calculates the ratio \(\frac{t_{Fi}+t_{Di}}{t_{F}+t_{D}}\), which is then smoothed over time using the Exponentially Weighted Moving Average (EWMA) algorithm as:

$$\begin{aligned} C[k] = \alpha \frac{t_{Fi}[k]+t_{Di}[k]}{t_{F}[k]+t_{D}[k]} + (1-\alpha )C[k-1], \end{aligned}$$

where \(k\in \mathbb {N}\), \(k>0\), is the index of the state change, \(0\le \alpha \le 1\) is a weight parameter and \(C[0] = \frac{t_{Fi}[0]+t_{Di}[0]}{t_{F}[0]+t_{D}[0]}\) is the initial cost value. As defined, C lies between 0 and 1, with values closer to 1 indicating a higher probability of an attack.
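The cost update can be illustrated with the short Python sketch below. The function names and the sample values are hypothetical, and returning a ratio of zero when no bandwidth was allocated in the window is our own assumption, since the text does not cover that case.

```python
def idle_ratio(t_fi, t_di, t_f, t_d):
    """Fraction of allocated time (FACH + DCH) spent without data transfer."""
    allocated = t_f + t_d
    if allocated == 0:
        return 0.0  # assumption: no allocation in the window means no cost
    return (t_fi + t_di) / allocated


def ewma_cost(samples, alpha):
    """Return the sequence C[0], C[1], ... for a list of samples.

    Each sample is a tuple (t_Fi, t_Di, t_F, t_D) measured at a state change.
    C[0] is the first ratio; afterwards
        C[k] = alpha * ratio[k] + (1 - alpha) * C[k-1].
    """
    costs = []
    for k, sample in enumerate(samples):
        ratio = idle_ratio(*sample)
        if k == 0:
            costs.append(ratio)
        else:
            costs.append(alpha * ratio + (1.0 - alpha) * costs[-1])
    return costs


# Two hypothetical state changes: 25% idle time, then 100% idle time.
samples = [(0.0, 1.0, 2.0, 2.0), (2.0, 2.0, 2.0, 2.0)]
print(ewma_cost(samples, alpha=0.3))
```

With \(\alpha =0.3\) the second value is \(0.3\cdot 1 + 0.7\cdot 0.25 = 0.475\), which shows how a single fully idle window raises the cost only gradually.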

Decision Making. For decision making, we define a rule based on the cost function and two thresholding rules. Observing the cost C, and having calculated an average \(C_{avg}\) over all historical C values, the simple rule \(C\ge \beta C_{avg}\) can be used to detect an attack. A second rule uses an upper threshold \(\theta ^+\), above which an attack is declared. This rule helps in detecting attacks with a very small attack rate, for which the cost function rule cannot be used because \(\beta C_{avg} > 1\). A third rule uses a lower threshold \(\theta ^-\), below which we assume normal behaviour of the mobile terminal. The \(\theta ^-\) rule helps in protecting mobiles with normal behaviour of high activity, which are assigned a low value of \(C_{avg}\). These thresholds should be set based on offline traffic analysis by the mobile operators.
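One way to combine the three rules is sketched below in Python. The order in which the rules are applied is not specified in the text, so the precedence used here (lower threshold first, then upper threshold, then the cost function rule) is an assumption, as are the function name and the numerical values.

```python
def is_attack(c, c_avg, beta, theta_low, theta_high):
    """Combine the threshold rules with the cost function rule.

    c          : current cost C
    c_avg      : average of the historical C values
    beta       : multiplier of the cost function rule
    theta_low  : lower threshold, below which behaviour is taken as normal
    theta_high : upper threshold, above which an attack is declared
    """
    if c < theta_low:
        return False   # protects highly active normal terminals
    if c > theta_high:
        return True    # catches cases where beta * c_avg exceeds 1
    return c >= beta * c_avg


# Hypothetical settings: beta = 2, theta_low = 0.1, theta_high = 0.9.
print(is_attack(0.05, 0.01, 2.0, 0.1, 0.9))  # low cost, treated as normal
print(is_attack(0.95, 0.60, 2.0, 0.1, 0.9))  # beta * c_avg = 1.2 is unreachable
print(is_attack(0.50, 0.20, 2.0, 0.1, 0.9))  # cost function rule applies
```

The second call shows why the upper threshold is needed: since C never exceeds 1, the cost function rule alone could not flag a terminal whose historical average is already above \(1/\beta \).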

Evaluation. The performance of the Bandwidth monitoring detection algorithm is depicted by the ROC curve in Fig. 4, which combines the \(p_{fp}\) and \(p_{tp}\) metrics. Values in the top-left corner of the graph are most desirable, as they correspond to the highest true positive and lowest false positive detection probabilities. The simulation results suggest that \(\alpha =0.3\) is the most suitable value, producing 95% true positive and 0.04% false positive detection.

Fig. 4. ROC curves of the counter detector (left) and the bandwidth detector (right).

3.3 Dynamic Timer Mitigation

Mobile networks today use a fixed value for the inactivity timer, with possible manual corrections for specific situations, which we do not consider to be the optimal approach. The timer plays an important role in controlling radio resource allocation, being a trade-off parameter between bandwidth reuse and the number of connections; this section examines whether it could play a similar role in controlling the impact of a signalling attack on the network. To this end, we propose a dynamic inactivity timer which is set as a function of the network load, and use the model described in Sect. 2 to study its performance.

One possible approach is to increase the timer linearly with the load on the signalling server, after a signalling load threshold value \(\theta \) is reached:

$$ t_0(\lambda _s) = {\left\{ \begin{array}{ll} t_0^{min}, &amp; \lambda _s \le \theta ,\\ \frac{t_0^{max}-t_0^{min}}{\lambda _s^{max}-\theta }\cdot (\lambda _s-\theta ) + t_0^{min}, &amp; \lambda _s > \theta , \end{array}\right. } $$

where \(\lambda _s^{max}\) is the maximum allowed load on the signalling server, \(\theta \) is a load threshold, and \(t_0^{min}\) and \(t_0^{max}\) are the minimum and maximum values that the timer can take. In a real operating network, these parameters need to be estimated from statistical observations.
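The timer rule translates directly into code. The Python sketch below uses a hypothetical function name, and it additionally caps the timer at \(t_0^{max}\) for loads above \(\lambda _s^{max}\); that cap is our assumption, motivated by \(t_0^{max}\) being described as the maximum value the timer can take. The numerical values are those used in the results of this section.

```python
def dynamic_timer(load, theta, load_max, t0_min, t0_max):
    """Inactivity timer t_0 (seconds) as a function of signalling load."""
    if load <= theta:
        return t0_min
    slope = (t0_max - t0_min) / (load_max - theta)
    t0 = slope * (load - theta) + t0_min
    # Assumption: never exceed t0_max, even if load exceeds load_max.
    return min(t0, t0_max)


# theta = 3 calls/s, load_max = 5 calls/s, t0_min = 2 s, t0_max = 60 s.
for load in (2.0, 3.0, 4.0, 5.0, 6.0):
    print(load, dynamic_timer(load, 3.0, 5.0, 2.0, 60.0))
```

With these values the slope is 29 s per call/s, so the timer stays at 2 s up to the threshold, reaches 31 s at 4 calls/s and 60 s at 5 calls/s.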

Results. Using the model in Sect. 2 we select a data plane model with \(m=20\) non-sharing data channels, such as in 3G UMTS Rel. 99, modelled as an M/M/m/m Markov chain [35]. The rest of the parameters are selected as follows: \(\lambda _0^n=1, p_0^n=0.9, p_0^a=0.1, p_{b0}^n=0.9, p_{b0}^a=0.3\), \(\lambda _e=0.05, t_0=2\,\text {s}\) (static), \(t_0^{max}=60\,\text {s}, t_0^{min}=2\,\text {s}, \lambda _s^{max}=5\,\text {calls/s}, \theta =3\,\text {calls/s}\).
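For reference, the loss probability of a stand-alone M/M/m/m queue is given by the Erlang B formula, which can be evaluated with a numerically stable recursion as in the Python sketch below. This only illustrates the data plane sub-model in isolation; how it is coupled to the signalling stage is part of the full model and is not reproduced here. The offered loads in the example are hypothetical.

```python
def erlang_b(m, offered_load):
    """Blocking probability of an M/M/m/m queue (Erlang B).

    offered_load is the arrival rate divided by the service rate.
    Recursion: B(0) = 1, B(k) = a*B(k-1) / (k + a*B(k-1)).
    """
    b = 1.0
    for k in range(1, m + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


# m = 20 data channels; the offered loads below are illustrative only.
for load in (5.0, 10.0, 15.0, 20.0):
    print(load, erlang_b(20, load))
```

The recursion avoids the large factorials that appear in the direct form of the formula, which matters little for \(m=20\) but keeps the sketch usable for larger channel counts.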

Fig. 5. Signalling server load for static and dynamic inactivity timer.

Figure 5 compares a static and a dynamic inactivity timer for varying network load. The dynamic timer activates when the threshold load \(\theta \) is reached and manages to lower the resulting network load, compared to the static approach. Although the timer can play a control role, it cannot completely mitigate a signalling storm. One downside of this approach is that it increases the proportion of normal calls that do not receive service. Therefore, the timer controls the trade-off between the signalling load in the network and the number of unserviced normal calls.

4 Conclusions

This paper has briefly explained the ongoing research in the field of mobile network security, looking in particular at signalling-related attacks. It introduced a generic mathematical model of the radio access part of a network, which can be used to model different mobile technologies and different user patterns. The model was then used to examine an attack mitigation technique using a modified inactivity timer. The two proposed attack detection mechanisms were implemented in a simulation environment; in their evaluation, the bandwidth monitoring detector reached 95% true positive and 0.04% false positive detection. Recent work has used the Random Neural Network [13] for attack detection [2] and we expect that further results will become available with similar machine learning techniques.