1 Introduction

Industrial control systems (ICS) [1] play a vital role in national critical infrastructures such as smart grids [2–4], water treatment systems [5], chemical processing plants [6], oil and natural gas pipelines [7], and large-scale communication systems [8]. With the rapid development of Internet technology (IT), ICS are increasingly connected to the Internet so that they can draw on its rich resources to support remote process control and intelligent decision-making. However, the growing openness of ICS has made them an attractive target for malicious attackers [9, 10]. In 2010, the notorious worm “Stuxnet” infected the core control program of the Natanz uranium enrichment base in Iran and misled the centrifuges that produce enriched uranium into accelerating abnormally, which severely damaged the centrifuges and forced the whole nuclear plant to stop. In 2015, “BlackEnergy3” attacked the Ukrainian power grid. Counterfeited relay control instructions caused abnormal circuit disconnections, immediately followed by a large-scale blackout. At Black Hat 2017 [11], Dr. Staggs stated that design and implementation flaws by wind farm vendors left wind turbine programmable automation controllers and OPC (OLE for process control) servers vulnerable to attacks. Additionally, they designed attack tools to exploit design and implementation vulnerabilities in wind farm control networks. These incidents indicate that the security of ICS has become an urgent international issue [12, 13].

Intrusion detection systems (IDS) [14, 15] provide an effective solution to identify malicious attacks against traditional information systems by analyzing network protocols and traffic data. However, when applying IDS to ICS, the real-time process data is another important factor to consider [16]. The evolution of an industrial process generally follows fundamental laws of nature, which is a distinct feature of ICS. Attackers usually attempt to cause fatal physical damages to ICS by manipulating process data (e.g., sensor readings [17, 18] or control signals [19, 20]) maliciously. Therefore, by monitoring and analyzing the “physics” of ICS, we can detect a wide variety of intrusions. IDS generally construct a physical model for the target control system, based on which to forecast its expected behaviors. Once the monitored behaviors deviate from the expected values significantly, an alarm is raised.

However, in recent years, Liu et al. [18] discovered a new kind of stealthy attack against ICS that can bypass existing intrusion detection schemes. Because of physical constraints, the dynamic behavior of a control system generally does not change significantly within a short period of time. Therefore, an attacker can make the observed behavior of a system follow its expected behavior closely during a stealthy attack, yet still inject enough false information into the system over a long period of time [16] to finally cause fatal damage to the target system. Since then, stealthy attacks against ICS have attracted much attention [21, 22]. Previously, we proposed a detection approach against stealthy attacks based on residual permutation entropy [23].

In this paper, we propose an effective and much faster stealthy attack detection technique based on residual skewness analysis of system behaviors, which is more suitable for the real-time requirement of industrial control systems. Counterfeited residuals generally conform to a skewed distribution, which is different from a normal distribution, if the intruder intends to achieve specific attack goals. The values of the residual skewness coefficient can effectively distinguish a residual sequence generated during a stealthy attack from an attack-free residual sequence. Accordingly, stealthy attacks can be identified successfully. We launch stealthy attacks on two simulated ICS and verify the effectiveness of the proposed stealthy attack detection technique. The key contributions of this work are summarized as follows:

  • We investigate the prediction residuals of system behaviors under stealthy attacks and discover that the residual distribution exhibits a significant degree of skewness compared to a normal distribution.

  • We make full use of the skewness contained in the prediction residuals and propose a novel detection technique against stealthy attacks based on residual skewness analysis.

  • Comprehensive experiments are conducted on simulated ICS to verify the effectiveness and efficiency of the proposed stealthy attack detection approach.

The rest of the paper is organized as follows. Section 2 introduces some research literature about ICS IDS. In Section 3, we present some preliminaries of our approach. In Section 4, we elaborate on the novel detection technique against stealthy attacks based on residual skewness analysis. Experiments are conducted to verify the effectiveness and efficiency of the proposed stealthy attack detection approach in Section 5. Experimental results are discussed in Section 6. Finally, we draw a conclusion in Section 7.

2 Related work

Due to the increasing connectivity between ICS and the outside IT world, cyber attacks against IT systems also endanger ICS. Traditionally, intrusion detection techniques against cyber attacks are divided into two main categories: misuse-based and anomaly-based. Misuse-based intrusion detection techniques, also referred to as signature-based techniques, rely on a precise definition of malicious system behaviors. If system activities match known malicious behavior patterns, a potential attack is detected. Anomaly-based intrusion detection techniques rely on a definition of normal behavior and flag any visible deviation from it as an unintentional fault or an intentional attack. In this section, we present a new taxonomy of intrusion detection techniques for ICS. Attacks against ICS often cause abnormal network traffic or violate network protocol specifications. Furthermore, due to the close correlation between ICS and physical processes, investigating process data can also help identify malicious intrusions against ICS. Therefore, we review the research literature on ICS IDS from three aspects: network traffic mining, network protocol analysis, and process data analysis.

2.1 Intrusion detection based on network traffic mining

ICS have relatively fixed operation objects and business processes, simple and static network topologies, and small numbers of applications, which result in relatively stable traffic patterns under normal conditions. Fluctuations in network traffic generally indicate a change in the status of ICS, which enables intrusion detection based on network traffic mining.

Traditional IDS based on network traffic analysis [24] generally extract information such as source and destination IP addresses and ports, traffic durations, and average time intervals between adjacent packets, and then apply data mining techniques to the collected information to identify abnormal system behaviors. Commonly used traffic mining techniques include supervised clustering [25], semi-supervised clustering [26], mixed Gaussian model [27], neural network [28, 29], fuzzy logic [30–32], single-class support vector machine [33], multi-class support vector machine [34], and deep learning [35]. The purpose of these techniques is to establish complex non-linear relationships between network traffic and system behaviors. These relationships, together with the current network traffic data, are then used to judge the security status of a target system. However, the computation overhead is usually high due to the large number of traffic features. To improve detection efficiency, some researchers have used techniques such as the ant colony algorithm [36] and principal component analysis [37] to remove redundant traffic features.

2.2 Intrusion detection based on network protocol analysis

Protocol specifications generally define the packet formats and communication modes allowed by the protocol. Intrusion detection rules can be extracted from protocol specifications. Accordingly, malicious behaviors that violate protocol specifications can be identified effectively. Common open protocols in ICS include ModBus, ICCP/TASE.2, and DNP3. These protocols are vulnerable to a variety of network attacks such as theft, tampering, and counterfeiting.

Cheung et al. [38] constructed a protocol specification model based on legal values of different data fields and legal relationships between different fields in a data packet. Additionally, they built normal communication patterns based on the security requirements, the data transmission directions and transmission ports of specific ICS. Anomalies violating the protocol specification model or the desired communication patterns could be detected, which belongs to anomaly-based intrusion detection techniques. Morris et al. [39] used Snort (an intrusion detection software) to generate signatures for ModBus protocol vulnerabilities. These signatures were used to examine communication data in field networks and identify illegal data, which is a typical misuse-based approach. Moreover, in order to achieve rapid development, other researchers modify the traditional IDS to make them suitable for ICS. Lin et al. [40] integrated a packet parser of industrial control protocols (e.g., DNP3) into the famous network intrusion detection system Bro developed by the University of Berkeley, to support intrusion detection in ICS.

In addition to open protocols, IDS based on proprietary protocols are also designed. Hong et al. [41] analyzed automatic systems in the substations of smart grids and detected anomalies or malicious behaviors in multicast messages based on the specifications extracted from the IEC 61850 standards (e.g., Generic Object Oriented Substation Event (GOOSE) and Sample Value technology (SV)). Hadeli et al. [42] extracted legal and illegal network traffic models from the protocol specifications of power systems and transformed them into Snort rules for intrusion detection.

The above two categories of IDS build the first security barrier for ICS. However, the close relationship between ICS and the physical world makes ICS different from traditional information systems. The above two categories of IDS, originally designed for traditional information systems, therefore have difficulty identifying attacks against physical processes that neither cause abnormal network traffic nor violate network protocol specifications. For this reason, IDS based on process data analysis have emerged.

2.3 Intrusion detection based on process data analysis

Process information is an important factor to consider in ICS IDS. Attackers usually mislead the controller into making wrong decisions [17] by tampering with process information, and finally cause a fatal damage to ICS. Such attacks can be detected by comparing the observed and expected process values in real time. Once the deviation exceeds a predefined threshold significantly, an alarm is raised [43]. Hadžiosmanović et al. [44] classified process variables into three categories: constants, enumeration, and continuous variables. Afterwards, a normal behavior model was built for each process variable. During system operation, once an observed process value deviated from its normal behavior model, the system generated an alarm. Carcano et al. [45] used measurement data from multiple industrial sensors to denote system states and proposed a state distance measurement method. Intrusions could be identified by examining the proximity between the current state and the critical states.

Other researchers use time series forecasting techniques to predict the future outputs of ICS. The predicted outputs are compared with the monitored values to generate residuals. Afterwards, some statistical analysis techniques are performed on the residuals to identify intrusions. If the system operates normally, the residual sequence follows a Gaussian distribution approximately. Once an intrusion occurs, the actual behavior of a system deviates from its expected behavior, i.e., the residuals are different from 0 observably [46]. Cárdenas et al. [47] summarized two categories of intrusion detection methods based on residual analysis: sequential detection and change detection. The former aims to find intrusions as soon as possible, i.e., determining the shortest residual sequence based on which IDS can make a normal/abnormal judgment. The latter detects a possible anomaly at an unknown time point. In other words, the system detects the transition from a normal state to an abnormal state based on whether the residual or the accumulated residual exceeds a certain threshold. The commonly used change detection methods can be classified into two categories: stateless [48] and stateful [16]. The stateless and stateful detection methods raise alarms when the residual and the cumulative residual at the current time point exceed a threshold, respectively.

However, in 2011, Liu et al. [18] discovered a new kind of data injection attack against state estimation in power grids. This attack persistently injects erroneous data into the system until the system crashes, but always keeps the residual magnitudes below the detection threshold so as to bypass the stateless intrusion detection scheme. This was the first stealthy attack against ICS. Since then, stealthy attacks have emerged in a variety of industrial control scenarios (e.g., chemical process control [47] and industrial waste water treatment [49]). As late as 2016, Urbina et al. [16, 50] stated that existing intrusion detection technology still could not detect stealthy attacks effectively, so they proposed a new method to measure the negative impacts of stealthy attacks on ICS and tried to limit those impacts by configuring detection schemes and metrics properly. Subsequent research on stealthy attacks has mainly focused on how to perform stealthy attacks on specific ICS [21] or on exploring the impacts of stealthy attacks on more complex systems [22]. As a result, detecting stealthy attacks against ICS has become an urgent issue. In our previous work [23], we proposed a detection technique against stealthy attacks based on the analysis of residual permutation entropy. This technique was effective but not very fast. In this paper, we propose an effective and much faster technique to detect stealthy attacks based on residual skewness analysis, which utilizes the residual distribution skewness to identify abnormal system behaviors.

3 Preliminaries

The approach proposed in this paper belongs to the category of IDS based on process data analysis. Intrusion detection based on process data analysis mainly includes three steps. First, build a physical model for the target system in order to predict its expected outputs \(\hat {y}_{k}\) in the future. Second, compute the residuals \(r_{k}\) between the observed outputs \(y_{k}\) and the predicted values \(\hat {y}_{k}\) during system operation. Third, perform statistical analysis on the residual sequence to detect intrusions. In this section, we introduce physical models of ICS, prediction techniques, and intrusion detection statistics.

3.1 Physical models of ICS

Physical models generally characterize time-varying behaviors of ICS, so a reasonable model can predict the expected behavior of a system accurately. We can derive physical models from first principles (e.g., Newton’s laws, electromagnetic laws, and fluid dynamics) or from historical data of ICS using system identification technology. There are two commonly used models in system identification: auto-regressive integrated moving average (ARIMA) [51] and linear dynamical state-space (LDS) [52]. The ARIMA model of a time series \(\{y_{k}\}\) is formalized as follows:

$$ y_{k} = \sum\limits_{i=1}^{p} \phi_{i} y_{k-i} + \sum\limits_{j=1}^{q} \theta_{j} \varepsilon_{k-j} + \varepsilon_{k}, $$
(1)

where \(y_{k}\) and \(y_{k-i}\) (i=1,2,…,p) are the current and last p output values of a system; \(\varepsilon_{k}\) and \(\varepsilon_{k-j}\) (j=1,2,…,q) are the current and last q prediction errors, which are Gaussian noises with zero mean and non-zero variance; and \(\phi_{i}\) and \(\theta_{j}\) are model parameters, which should be estimated from the time series \(\{y_{k}\}\) [53].
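As a minimal illustration of how Eq. (1) yields a forecast and a residual, the Python sketch below computes a one-step prediction by setting the unknown current noise \(\varepsilon_{k}\) to its mean of zero. The function name, the coefficients, and the history values are hypothetical and chosen only for illustration; in practice the parameters would be estimated from data as noted above.

```python
def arma_one_step_forecast(y_past, eps_past, phi, theta):
    """Forecast y_k from Eq. (1) with the unknown current noise eps_k set to 0.

    y_past[-1] is y_{k-1}, y_past[-2] is y_{k-2}, and so on; eps_past likewise.
    """
    ar_part = sum(phi[i] * y_past[-1 - i] for i in range(len(phi)))
    ma_part = sum(theta[j] * eps_past[-1 - j] for j in range(len(theta)))
    return ar_part + ma_part


# Hypothetical coefficients and history (p = 2, q = 1), for illustration only.
phi = [0.5, 0.2]        # phi_1, phi_2
theta = [0.3]           # theta_1
y_past = [1.0, 2.0]     # y_{k-2}, y_{k-1}
eps_past = [0.1]        # eps_{k-1}

y_hat = arma_one_step_forecast(y_past, eps_past, phi, theta)
residual = 1.3 - y_hat  # residual if the measured y_k happened to be 1.3
```

Here the forecast is 0.5·2.0 + 0.2·1.0 + 0.3·0.1 = 1.23, so a measured value of 1.3 would leave a residual of about 0.07.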

ARIMA models just build relationships between system outputs, but cannot relate system inputs to system outputs. If both the control signals (inputs) and the sensor readings (outputs) are available, we can construct the LDS model as follows:

$$ \boldsymbol{x}_{k+1}=\boldsymbol{A}\boldsymbol{x}_{k} + \boldsymbol{B}\boldsymbol{u}_{k} + \boldsymbol{K}\boldsymbol{\varepsilon}_{k}, $$
(2)
$$ \boldsymbol{y}_{k}=\boldsymbol{C}\boldsymbol{x}_{k} + \boldsymbol{D}\boldsymbol{u}_{k} +\boldsymbol{e}_{k}, $$
(3)

where \(\boldsymbol{A}\), \(\boldsymbol{B}\), \(\boldsymbol{C}\), \(\boldsymbol{D}\), and \(\boldsymbol{K}\) are system matrices characterizing the dynamics of a physical system, and \(\boldsymbol{\varepsilon}_{k}\) and \(\boldsymbol{e}_{k}\) are process and sensor noises following Gaussian distributions. \(\boldsymbol{D}\) is generally equal to 0 owing to the strict causality of most physical systems. The LDS model indicates that the next state \(\boldsymbol {x}_{k+1} \in \mathbb {R}^{n}\) of a system is determined by the current state \(\boldsymbol {x}_{k} \in \mathbb {R}^{n}\) and the current control signal \(\boldsymbol {u}_{k} \in \mathbb {R}^{p}\). Additionally, as shown in Eq. (3), the expected output \(\boldsymbol {y}_{k} \in \mathbb {R}^{q}\) of the system is a linear combination of the system states \(\boldsymbol{x}_{k}\).
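One step of Eqs. (2) and (3) can be sketched in plain Python as follows, assuming \(\boldsymbol{D}=0\) as stated above. The helper names and the two-state example system are hypothetical and serve only to show how the matrices act on the state, input, and noise vectors.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_add(*vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def lds_step(x, u, A, B, C, K, eps, e):
    """Return (x_{k+1}, y_k) from Eqs. (2)-(3) with D = 0."""
    x_next = vec_add(matvec(A, x), matvec(B, u), matvec(K, eps))
    y = vec_add(matvec(C, x), e)
    return x_next, y


# Hypothetical two-state system (position and velocity), noise set to zero.
A = [[1, 1], [0, 1]]
B = [[0], [1]]
C = [[1, 0]]
K = [[1, 0], [0, 1]]

x_next, y = lds_step(x=[0.0, 1.0], u=[0.5], A=A, B=B, C=C, K=K,
                     eps=[0.0, 0.0], e=[0.0])
```

With these values the next state is [1.0, 1.5] and the current output is [0.0].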

3.2 Kalman filtering for process forecasting

Kalman filtering (KF) [54] is a well-known technique to forecast the future behavior of an LDS model. The KF algorithm performs two operations recursively: prediction and update. The prediction step projects forward the current posteriori state to the next priori state, along with uncertainties. Once the system output (inevitably corrupted with some errors and noises) of the next step is measured, the update step computes the posteriori state of the next step as a weighted average of its priori estimate and the sensor measurement. A greater weight is assigned to a priori state estimate with higher certainty.

We use \(\boldsymbol {x}^{-}_{k}\) and \(\boldsymbol{x}_{k}\) to denote the priori and posteriori states at step k, i.e., before and after the k-th system output \(\boldsymbol{y}_{k}\) is observed, respectively. The prediction step is given by:

$$ \boldsymbol{x}^{-}_{k+1}=\boldsymbol{A}\boldsymbol{x}_{k} + \boldsymbol{B}\boldsymbol{u}_{k}, $$
(4)
$$ \boldsymbol{P}^{-}_{k+1}=\boldsymbol{A}\boldsymbol{P}_{k} \boldsymbol{A}^{\rm{T}}+\boldsymbol{K}\boldsymbol{Q}_{k}\boldsymbol{K}^{\rm{T}}, $$
(5)

where \(\boldsymbol {P}^{-}_{k+1}\) and \(\boldsymbol{P}_{k}\) denote the priori and posteriori covariance matrices of prediction errors at step k+1 and k, respectively, and \(\boldsymbol{Q}_{k}\) is the covariance matrix of the process noise \(\boldsymbol{\varepsilon}_{k}\) at step k. Accordingly, KF predicts the next expected output \(\hat {\boldsymbol {y}}_{k+1}\) of the system as follows:

$$ \hat{\boldsymbol{y}}_{k+1} = \boldsymbol{C}\boldsymbol{x}^{-}_{k+1}. $$
(6)

Once the next system output \(\boldsymbol{y}_{k+1}\) is measured, the update step is performed as follows:

$$ \boldsymbol{KAL}_{k+1}=\boldsymbol{P}^{-}_{k+1}\boldsymbol{C}^{\rm{T}}\left[\boldsymbol{C}\boldsymbol{P}^{-}_{k+1}\boldsymbol{C}^{\rm{T}}+\boldsymbol{R}_{k}\right]^{\rm{-1}}, $$
(7)
$$ \boldsymbol{x}_{k+1}=\boldsymbol{x}^{-}_{k+1}+\boldsymbol{KAL}_{k+1}\left[\boldsymbol{y}_{k+1}-\boldsymbol{C}\boldsymbol{x}^{-}_{k+1}\right], $$
(8)
$$ \boldsymbol{P}_{k+1}=[\boldsymbol{I}-\boldsymbol{KAL}_{k+1}\boldsymbol{C}]\boldsymbol{P}^{-}_{k+1}, $$
(9)

where \(\boldsymbol{I}\) is the identity matrix, \(\boldsymbol{R}_{k}\) denotes the covariance matrix of the measurement noise \(\boldsymbol{e}_{k}\), and the Kalman gain matrix \(\boldsymbol{KAL}_{k+1}\) is estimated by minimizing \(\boldsymbol{P}_{k+1}\). \(\boldsymbol{P}_{k+1}\) in Eq. (9) is the resulting minimized posteriori covariance matrix. As shown in Eq. (8), the posteriori state \(\boldsymbol{x}_{k+1}\) is computed as a weighted combination of the priori state estimate \(\boldsymbol {x}^{-}_{k+1}\) and the deviation between the new sensor measurement \(\boldsymbol{y}_{k+1}\) and its forecast \(\boldsymbol {C}\boldsymbol {x}^{-}_{k+1}\). \(\boldsymbol{KAL}_{k+1}\) determines how much the new sensor measurement contributes to the posteriori state estimation. If the past prediction has higher certainty (i.e., a smaller \(\boldsymbol{P}_{k}\) and accordingly a smaller \(\boldsymbol {P}^{-}_{k+1}\)), the contribution of the new sensor measurement \(\boldsymbol{y}_{k+1}\) should be smaller (a smaller \(\boldsymbol{KAL}_{k+1}\)).
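The recursion in Eqs. (4)–(9) can be illustrated with a scalar special case, in which every matrix reduces to a number. The sketch below is written under that simplifying assumption; the function names and the numeric values are hypothetical. It also checks the last remark numerically: a smaller priori variance produces a smaller gain.

```python
def kf_predict(x, P, u, A, B, K, Q):
    """Scalar form of Eqs. (4)-(5): project state and variance forward."""
    x_prior = A * x + B * u
    P_prior = A * P * A + K * Q * K
    return x_prior, P_prior


def kf_update(x_prior, P_prior, y, C, R):
    """Scalar form of Eqs. (7)-(9): correct the priori state with measurement y."""
    gain = P_prior * C / (C * P_prior * C + R)      # Eq. (7)
    x_post = x_prior + gain * (y - C * x_prior)     # Eq. (8)
    P_post = (1.0 - gain * C) * P_prior             # Eq. (9)
    return x_post, P_post, gain


# Hypothetical values, for illustration only.
x_prior, P_prior = kf_predict(x=0.0, P=1.0, u=1.0, A=1.0, B=0.5, K=1.0, Q=0.01)
y_hat = 1.0 * x_prior                               # Eq. (6) with C = 1
x_post, P_post, gain = kf_update(x_prior, P_prior, y=0.7, C=1.0, R=0.1)

# A more certain prior (smaller variance) gives the measurement less weight.
_, _, gain_small = kf_update(x_prior, 0.01, y=0.7, C=1.0, R=0.1)
```

In this example the posteriori state lies between the forecast 0.5 and the measurement 0.7, and the posteriori variance is smaller than the priori variance.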

3.3 Detection statistics

After building the physical model for the target control system and performing the process forecasting procedure, IDS perform statistical analysis on the forecasting residuals to detect potential attacks. Generally, there are two kinds of residual testing techniques: stateless and stateful [50].

The stateless test raises an alarm for each observable deviation, i.e., \(|y_{k} - \hat {y}_{k}| = |r_{k}| \geq \tau _{1}\) (k>0), where \(y_{k}\) and \(\hat {y}_{k}\) are the measured system output and its forecast at step k, and \(\tau_{1}\) is a pre-defined detection threshold. In the stateful test, the change (no matter how small) of \(r_{k}\) is tracked using another statistic \(S_{k}\). The non-parametric CUmulative SUM (CUSUM) is one of the most popular stateful detection statistics. It is defined recursively as \(S_{0}=0\) and \(S_{k+1}=(S_{k}+|r_{k}|-\delta)^{+}\), where \((x)^{+}\) denotes max(0,x), and δ is a small positive value used to keep \(S_{k}\) from increasing persistently when the system operates normally. Once \(S_{k}\) exceeds the detection threshold τ (which is defined based on a tolerable false alarm rate), i.e., there exists a persistent deviation across multiple time steps, an alarm is generated, and \(S_{k+1}\) is reset to 0 when the detection procedure restarts. The intrusion detection procedure based on process data analysis is summarized in Fig. 1.

Fig. 1 Intrusion detection based on process data analysis
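The two tests described in this subsection can be sketched in Python as follows. The function names and the numeric values are hypothetical, and the sketch assumes that the CUSUM statistic is reset immediately after an alarm, which is one possible reading of the restart behavior described above.

```python
def stateless_alarms(residuals, tau1):
    """Flag each step whose residual magnitude reaches the threshold tau1."""
    return [abs(r) >= tau1 for r in residuals]


def cusum_alarms(residuals, delta, tau):
    """Non-parametric CUSUM: S <- max(0, S + |r| - delta), alarm when S > tau."""
    s = 0.0
    alarms, trace = [], []
    for r in residuals:
        s = max(0.0, s + abs(r) - delta)
        fired = s > tau
        trace.append(s)
        alarms.append(fired)
        if fired:
            s = 0.0  # assumed: restart detection right after the alarm
    return alarms, trace


# A small but persistent deviation (hypothetical values).
residuals = [0.05] * 10
per_step = stateless_alarms(residuals, tau1=0.1)
cumulative, trace = cusum_alarms(residuals, delta=0.02, tau=0.2)
first_alarm = cumulative.index(True)
```

In this example no single residual reaches \(\tau_{1}\), so the stateless test stays silent, whereas the CUSUM statistic grows by about 0.03 per step and crosses τ at the seventh residual (index 6).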

4 Detecting stealthy attacks

In this section, we present the novel detection approach against stealthy attacks based on residual skewness analysis. We first take a water level control system as an example to describe the stealthy attack model. Then, we present the detection strategies against stealthy attacks.

4.1 The stealthy attack model

We take a water level control system as a motivating example to describe the stealthy attack model against ICS. The architecture of the system is shown in Fig. 2. The water level in the tank should be maintained below 0.8 m (the high level) and above 0.2 m (the low level) by turning on or off the inlet and outlet pumps at proper moments. Water spill occurs at 1.1 m.

Fig. 2 A water level control system

Suppose that each pump has only two states: on and off. A water level sensor is used to monitor the water level in the tank and transmits measurement data to the controller (PLC). The PLC generates appropriate control commands according to the real-time sensor measurements. For simplicity, the outlet pump is assumed to keep working when the system operates normally. As a result, only the inlet pump needs to be controlled to maintain the water level in the tank. Moreover, we assume that the amount of water coming in is greater than the amount of water going out per unit time while the two pumps are both working. The inlet pump should be turned off once the water level exceeds the high level, and should be turned on again once the water level goes down below the low level.

We assume that the adversary is able to gain knowledge of the physical model of the target ICS, the process forecasting and intrusion detection techniques, and can tamper with the sensor measurements secretly. Thus the adversary can launch a successful stealthy attack. The physical model of the system can be derived from the mass balance equation, which relates the water level h to the volume of water coming in \(Q^{\text{in}}\) and the volume of water going out \(Q^{\text{out}}\) per unit time as follows:

$$ \text{Area}\frac{\mathrm{d}h}{\mathrm{d}t}=Q^{\text{in}}-Q^{\text{out}}, $$
(10)

where Area denotes the cross-sectional area of the tank, and \(Q^{\text{in}}\) and \(Q^{\text{out}}\) are positive constants when the two pumps are both working, and zero otherwise. Assuming that the discrete time interval is 1 s, the LDS model is derived as follows:

$$ h_{k+1} = h_{k} + \frac{Q_{k}^{\text{in}}-Q_{k}^{\text{out}}}{\text{Area}}, $$
(11)

where \(h_{k+1}\) and \(h_{k}\) are the water levels at steps k+1 and k, and \(Q_{k}^{\text {in}}-Q_{k}^{\text {out}}\) is the control input at step k. In this example, we assume that \(Q_{k}^{\text{out}}\) remains constant when the system operates normally and \(Q_{k}^{\text {in}}\) changes over time according to the control instructions issued by the controller. As a result, this equation is not an ARIMA model but an LDS model with \(x_{k}=h_{k}\), \(u_{k} = \left [Q_{k}^{\text {in}},Q_{k}^{\text {out}}\right ]^{T}\), \(B=\left [\frac {1}{\text {Area}},-\frac {1}{\text {Area}}\right ]\), A=1, and C=1.
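To make the normal, attack-free behavior of this model concrete, the following Python sketch iterates Eq. (11) together with the on/off rule for the inlet pump described above. The tank area and flow rates are hypothetical values (the text does not specify them), and the outlet pump is assumed to run continuously.

```python
AREA = 1.0              # m^2, hypothetical cross-sectional area
Q_IN = 0.03             # m^3/s while the inlet pump is on (hypothetical)
Q_OUT = 0.01            # m^3/s, outlet pump assumed always on (hypothetical)
LOW, HIGH = 0.2, 0.8    # low and high water levels from the text, in metres


def tank_step(h, inlet_on):
    """Eq. (11) with a 1 s time step."""
    q_in = Q_IN if inlet_on else 0.0
    return h + (q_in - Q_OUT) / AREA


def controller(h_measured, inlet_on):
    """Turn the inlet pump off above HIGH, on below LOW, else keep its state."""
    if h_measured > HIGH:
        return False
    if h_measured < LOW:
        return True
    return inlet_on


def simulate(h0, steps):
    h, inlet_on = h0, True
    levels = []
    for _ in range(steps):
        inlet_on = controller(h, inlet_on)
        h = tank_step(h, inlet_on)
        levels.append(h)
    return levels


levels = simulate(h0=0.5, steps=300)
```

With truthful sensor readings the level oscillates within roughly one step's worth of water around the band between the low and high levels and never approaches the 1.1 m spill level.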

The adversary attempts to manipulate the water level in the tank maliciously by persistently tampering with the sensor measurements while remaining undetected until water spill occurs. Specifically, during a surge stealthy attack [47], the goal of the adversary is to cause maximum damage to the system as quickly as possible. Suppose that the stateful test is adopted by the IDS due to its stronger detection ability compared to the stateless test. Once the detection threshold τ is reached, the stateful statistic \(S_{k}\) should stay at the threshold until the water overflows. Otherwise, the attack can be easily identified by the IDS. Accordingly, the adversary needs to solve the following equation:

$$ S_{k} + |y_{k}^{a}-\hat{y}_{k}|-\delta = \tau, $$
(12)

where \(y_{k}^{a}\) and \(\hat {y}_{k}\) denote the observed and forecasted water levels during a stealthy attack, respectively. By solving this equation, the adversary can get the following attack model:

$$ y^{a}_{k} = \left\{ \begin{array}{ll} \hat{y}_{k}-(\tau+\delta), & k=1\\ \hat{y}_{k}-\delta, & k >1 \end{array} \right. $$
(13)

The model means that fake water levels lower than their forecasts should be sent to the PLC persistently until the water spill occurs. In the first step of the attack, the residual between the fake water level and its forecast is −(τ+δ). In the following steps, the residuals should be kept at −δ. In other words, the adversary should increase the observed water levels at a lower rate than the forecasts. The attack goal is achieved when the controller receives a high water-level measurement from the sensor and issues a “turn-off” control command to the inlet pump, but the deviation (Δ) between the observed sensor measurement and the real water level exceeds the gap between the overflow level and the high level (overflow − high). Figure 3 illustrates three attacks with different slopes from the low level to the high level. According to the maximum deviations (Δ) caused by the three attacks, we can conclude that only \(a_{2}\) and \(a_{3}\) can make the tank overflow, and only \(a_{3}\) achieves a water spill. \(a_{1}\) is not a successful attack since it yields a smaller deviation \(\Delta_{1}\) < overflow − high.

Fig. 3 Different attacks on the water level control system

This example shows that the state-of-the-art stateless and stateful statistics cannot identify this kind of stealthy attack, since only the residual magnitudes (\(|y_{k}^{a}-\hat {y}_{k}|\)) are investigated while the residual signs are ignored. In order to achieve a successful stealthy attack, the adversary has to make the residual signs follow certain regularities. In this example, the residuals generated during a surge stealthy attack are denoted by:

$$ r_{k} = y^{a}_{k} - \hat{y}_{k} = \left\{ \begin{array}{ll} -(\tau+\delta), & k=1\\ -\delta, & k >1 \end{array} \right. $$
(14)

Negative signs of residuals enable the adversary to inject enough false data into the system until it crashes. Moreover, in order to complete a successful stealthy attack as quickly as possible, the adversary keeps the residual magnitudes as large as possible under the premise of not being detected. These two features make the residuals generated during a stealthy attack exhibit significant skewness when compared to Gaussian noises. Based on this discovery, we propose a novel stealthy attack detection technique based on residual skewness analysis.
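The following Python sketch generates the surge-attack residuals of Eq. (14) and feeds them to the CUSUM recursion from Section 3.3, to illustrate why a magnitude-only test stays silent. The function names are hypothetical, and the values of δ and τ are hypothetical too; they were picked as exact binary fractions so that floating-point rounding does not blur the comparison with the threshold.

```python
def surge_attack_residuals(n, delta, tau):
    """Eq. (14): first residual -(tau + delta), all later residuals -delta."""
    return [-(tau + delta)] + [-delta] * (n - 1)


def cusum_trace(residuals, delta):
    """S_0 = 0 and S_{k+1} = max(0, S_k + |r_k| - delta), without resets."""
    s, trace = 0.0, []
    for r in residuals:
        s = max(0.0, s + abs(r) - delta)
        trace.append(s)
    return trace


delta, tau = 0.03125, 0.25   # hypothetical values (1/32 and 1/4)
residuals = surge_attack_residuals(50, delta, tau)
trace = cusum_trace(residuals, delta)

never_exceeds = all(s <= tau for s in trace)   # the alarm condition S > tau never holds
all_negative = all(r < 0 for r in residuals)   # but every residual has the same sign
```

The statistic jumps to τ at the first step and then stays exactly there, so the alarm condition is never met, while every residual is negative. That one-sidedness is the regularity exploited by the skewness analysis.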

4.2 Detecting stealthy attacks based on residual skewness analysis

The proposed stealthy attack detection approach mainly includes three steps as follows:

(1) Estimate parameters of the normal residual distribution. Suppose that the attack-free forecasting residuals follow a normal distribution. A priori residual distribution is helpful to stealthy attack detection. Therefore, we first collect a series of attack-free residuals by operating the target ICS in “air-gapped” separation for a period of time and then estimate the two parameters (mean μ and variance \(\sigma^{2}\)) of the normal residual distribution using the maximum likelihood estimation (MLE) method as follows:

$$ \mu = \bar{x} = \frac{1}{n}\sum\limits_{i=1}^{n}x_{i}, $$
(15)
$$ \sigma^{2}=\frac{1}{n}\sum\limits_{i=1}^{n}(x_{i}-\bar{x})^{2}, $$
(16)

where \(x_{i}\) is the i-th value of the attack-free residual sequence, and \(\bar {x}\) denotes the mean value.
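Eqs. (15) and (16) translate directly into a few lines of Python. The function name and the sample residuals below are hypothetical. Note that the MLE variance divides by n rather than n−1, which matches `statistics.pvariance` in the standard library.

```python
import statistics


def fit_normal_mle(residuals):
    """Return (mu, sigma^2) from Eqs. (15)-(16)."""
    n = len(residuals)
    mu = sum(residuals) / n
    var = sum((x - mu) ** 2 for x in residuals) / n
    return mu, var


# Hypothetical attack-free residuals, for illustration only.
attack_free = [0.1, -0.1, 0.3, -0.3]
mu, var = fit_normal_mle(attack_free)
```

For this sample the estimated mean is 0 and the estimated variance is 0.05.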

(2) Compute the skewness coefficients of the residuals to be tested. During stealthy attack detection, we first generate an artificial random sequence \(\boldsymbol{r}_{\text{rand}}\) following the normal distribution estimated above (i.e., \(\boldsymbol {r}_{\text {rand}} \sim \mathcal {N}(\mu,\sigma ^{2})\)). After that, we replace a small proportion of entries in the original residual sequence \(\boldsymbol{r}_{o}\) to be tested with \(\boldsymbol{r}_{\text{rand}}\) and generate a new sequence \(\boldsymbol{r}_{\text{test}}\) for testing. Here, we define a new operator ⊎ to denote the sequence replacement operation as follows:

$$ \boldsymbol{r}_{\text{test}} = \boldsymbol{r}_{o} \uplus \boldsymbol{r}_{\text{rand}}, $$
(17)

where \(L(\boldsymbol{r}_{\text{rand}})/L(\boldsymbol{r}_{o}) \approx \theta\), L(·) denotes the length of a sequence, and θ is a positive real value around 5%. The procedure of the sequence replacement is shown in Fig. 4. Afterwards, we compute the skewness coefficient (SC) of the new residual sequence \(\boldsymbol{r}_{\text{test}}\) as follows:

$$ \text{SC} = \frac{{\sum\nolimits}_{i=1}^{l}(r_{i}-\bar{r})^{3}}{{\sigma_{r}}^{3}}, $$
(18)

where l is the length of \(\boldsymbol{r}_{\text{test}}\), \(r_{i}\) is the i-th entry in \(\boldsymbol{r}_{\text{test}}\), and \(\bar {r}\) and \(\sigma_{r}\) are the mean value and standard deviation of \(\boldsymbol{r}_{\text{test}}\), respectively. If the residuals are set equal to −δ or δ by the adversary during a stealthy attack, and a small portion of residuals are replaced with normal residuals, the residual distribution becomes right-skewed or left-skewed, respectively (i.e., the tail is on the right or left side of the distribution), as shown in Fig. 5. This feature can help us identify the counterfeited residuals and further detect stealthy attacks.

Fig. 4 The sequence replacement procedure

Fig. 5 The right-skewed (a) and left-skewed (b) residual distributions. The purple line and red line denote the residual histogram and the fitted residual distribution, respectively
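The replacement operation of Eq. (17) and the skewness coefficient of Eq. (18) can be sketched in Python as follows. Several details are assumptions rather than part of the method as described: the positions to replace are drawn at random, the standard deviation uses the divide-by-l form, and a hypothetical `normalize` flag switches between Eq. (18) as printed and the conventional sample skewness, which additionally divides by l. The sign of the result is the same in both variants.

```python
import random


def replace_fraction(r_o, mu, sigma, theta=0.05, rng=None):
    """Sketch of Eq. (17): overwrite about theta * len(r_o) entries of r_o
    (positions chosen at random, an assumption) with N(mu, sigma^2) draws."""
    rng = rng or random.Random()
    r_test = list(r_o)
    n_replace = max(1, round(theta * len(r_o)))
    for idx in rng.sample(range(len(r_o)), n_replace):
        r_test[idx] = rng.gauss(mu, sigma)
    return r_test


def skewness_coefficient(r, normalize=True):
    """Eq. (18); with normalize=True the sum is also divided by l."""
    l = len(r)
    mean = sum(r) / l
    sigma_r = (sum((x - mean) ** 2 for x in r) / l) ** 0.5
    if sigma_r == 0.0:
        raise ValueError("constant sequence: skewness is undefined")
    sc = sum((x - mean) ** 3 for x in r) / sigma_r ** 3
    return sc / l if normalize else sc


# Hypothetical values: delta = 0.05, attack-free residuals ~ N(0, 0.01^2).
rng = random.Random(0)
delta = 0.05
sc_neg_attack = skewness_coefficient(
    replace_fraction([-delta] * 200, 0.0, 0.01, rng=rng))
sc_pos_attack = skewness_coefficient(
    replace_fraction([+delta] * 200, 0.0, 0.01, rng=rng))
```

In this sketch, residuals pinned at −δ give a positive coefficient (right-skewed) and residuals pinned at +δ give a negative one (left-skewed). The replacement step also has a practical side effect here: a constant sequence has zero standard deviation, so without the injected normal entries the ratio in Eq. (18) would be undefined.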

(3) Detect stealthy attacks according to the skewness coefficients of residuals. Generally, there are two kinds of industrial control scenarios: either a larger or a smaller value of a process variable indicates a more dangerous system state. In the first scenario, the attacker attempts to counterfeit negative residuals persistently. In order to eliminate the negative residuals, the controller generates commands to increase the value until the system crashes. In this case, however, the skewness coefficient of the observed residuals is greater than 0, since the residual distribution is right-skewed as shown in Fig. 5a, indicating the occurrence of a stealthy attack. The second scenario is just the opposite. The attacker tries to counterfeit positive residuals, making the real value of the target process variable decrease over time until the system crashes. In this scenario, the skewness coefficient of the observed residuals is negative, since the residual distribution is left-skewed as illustrated in Fig. 5b.

Therefore, before intrusion detection we should fully understand the characteristics of the target ICS, i.e., which scenario the system belongs to. During attack detection, the skewness coefficients of residuals are computed and monitored over time. If the sign of the skewness coefficient conforms to the current control scenario and its absolute value exceeds a predefined positive threshold ε (i.e., |SC|>ε), a stealthy attack is detected and an alarm is raised. For simplicity, we can examine only the absolute value of the skewness coefficient for attack detection. However, its sign can help the system operator better understand the adversary’s intentions and then devise appropriate strategies for system recovery. The entire procedure of the Detecting Stealthy Attacks based on Residual Skewness Analysis algorithm, or “DSARSA” for short, is summarized in Algorithm 1.
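The decision rule in this paragraph can be written compactly. The sketch below is illustrative only; the function name `is_stealthy_attack` and the scenario labels are hypothetical, and the sign convention follows the two scenarios described above (SC > 0 when a larger value is dangerous, SC < 0 when a smaller value is dangerous).

```python
def is_stealthy_attack(sc, eps, scenario):
    """Return True if SC has the sign expected for the scenario and |SC| > eps.

    scenario: "high_is_dangerous" (attack yields SC > 0) or
              "low_is_dangerous"  (attack yields SC < 0).
    """
    if eps <= 0:
        raise ValueError("eps must be a positive threshold")
    if scenario == "high_is_dangerous":
        expected_sign = 1.0
    elif scenario == "low_is_dangerous":
        expected_sign = -1.0
    else:
        raise ValueError("unknown scenario: " + repr(scenario))
    # sc * expected_sign > eps holds exactly when the sign matches and |sc| > eps.
    return sc * expected_sign > eps


print(is_stealthy_attack(3.2, 1.5, "high_is_dangerous"))
print(is_stealthy_attack(3.2, 1.5, "low_is_dangerous"))
```

Dropping the sign check and testing only `abs(sc) > eps` gives the simplified variant mentioned in the text, at the cost of losing the hint about the adversary's intention.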

In this algorithm, lines 1 and 2 estimate the state-space model and the normal distribution parameters of the attack-free residuals. Line 3 defines a counter used in attack detection. Lines 4 to 26 perform the stealthy attack detection procedure. Lines 5 to 7 present the prediction procedure of Kalman filtering, and lines 22 to 24 describe its updating procedure. Lines 8 and 9 compute the current forecasting residual. Lines 10 to 21 compute the skewness coefficient of the residual sequence to be tested. If the absolute value of the skewness coefficient exceeds the detection threshold ε, the detection procedure is terminated, and the algorithm returns a flag F that indicates the occurrence of a stealthy attack and triggers an alarm (lines 17 to 20, 27). Once the alarm is handled properly and the system returns to a safe state, the detection procedure restarts.
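To make the flow of this procedure concrete, the following is a rough sketch of the predict, residual, skewness test, and update steps for a scalar system. It is not a transcription of Algorithm 1: the scalar model x(k+1) = a·x(k) + b·u(k), y(k) = c·x(k), the noise variances `q` and `rv`, the random choice of replaced positions, and every name (`SkewnessDetector`, `predict_output`, `step`) are assumptions made for illustration. The usage loop plays an attacker who knows the model and pins every residual at −0.5.

```python
from collections import deque

import numpy as np


def skewness(r):
    std = r.std()
    return 0.0 if std == 0.0 else float(np.mean((r - r.mean()) ** 3) / std ** 3)


class SkewnessDetector:
    """Scalar Kalman filter plus a skewness test on a sliding residual window."""

    def __init__(self, a, b, c, q, rv, mu, sigma, x0, p0=0.01,
                 l=100, theta=0.05, eps=2.0, seed=0):
        self.a, self.b, self.c, self.q, self.rv = a, b, c, q, rv
        self.mu, self.sigma = mu, sigma      # attack-free residual parameters
        self.l, self.theta, self.eps = l, theta, eps
        self.x_hat, self.p = x0, p0          # state estimate and its variance
        self.window = deque(maxlen=l)        # last l residuals
        self.rng = np.random.default_rng(seed)

    def predict_output(self, u):
        """Measurement the filter expects at the next step."""
        return self.c * (self.a * self.x_hat + self.b * u)

    def step(self, y, u):
        # Prediction
        x_pred = self.a * self.x_hat + self.b * u
        p_pred = self.a * self.p * self.a + self.q
        # Forecasting residual
        r = y - self.c * x_pred
        self.window.append(r)
        # Skewness test once the window is full
        alarm = False
        if len(self.window) == self.l:
            r_test = np.array(list(self.window))
            m = max(1, int(round(self.theta * self.l)))
            idx = self.rng.choice(self.l, size=m, replace=False)
            r_test[idx] = self.rng.normal(self.mu, self.sigma, size=m)
            alarm = abs(skewness(r_test)) > self.eps
        # Update
        gain = p_pred * self.c / (self.c * p_pred * self.c + self.rv)
        self.x_hat = x_pred + gain * r
        self.p = (1.0 - gain * self.c) * p_pred
        return alarm


noise = np.random.default_rng(1)
det = SkewnessDetector(a=1.0, b=0.0, c=1.0, q=1e-4, rv=0.01,
                       mu=0.0, sigma=0.1, x0=1.0)
alarms = []
for k in range(400):
    if k < 200:
        y = 1.0 + noise.normal(0.0, 0.1)     # attack-free measurement
    else:
        y = det.predict_output(0.0) - 0.5    # attacker pins the residual at -0.5
    alarms.append(det.step(y, 0.0))

print(any(alarms[:200]), any(alarms[200:]))
```

In a deployment following the text, the first alarm would terminate the loop and detection would restart only after the alarm has been handled; the sketch keeps running so that the whole alarm sequence can be inspected.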

5 Experimental

In this section, we study the effectiveness of the stealthy attack detection approach based on residual skewness analysis by conducting experiments in a Matlab-Simulink environment.

A water level control system and a water’s pH value control system are simulated in our experiment. Both of them are typical ICS as discussed in [16]. Note that the proposed approach can apply to a variety of ICS in addition to the two experimental systems as long as the state-space model of the system can be constructed.

The first system has been discussed as a motivating example in Section 4.1. The dynamics of the water level in the tank can be described by a well-known LDS model derived from the mass balance equation. For simplicity, we assume that the cross-sectional area of the tank is 1 m², and the outlet pump keeps working when the system operates normally. The inlet pump should be turned off when the water level exceeds 0.8 m and be turned on again when the water level drops below 0.2 m. Water spill occurs at 1.1 m.
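The attack-free behavior of this tank can be sketched with a forward-Euler integration of the mass balance A·dh/dt = q_in − q_out together with the on/off rule above. The switching thresholds and the tank area come from the text; the flow rates, the initial level, and the 1 s step are hypothetical values chosen only for illustration.

```python
def simulate_tank(steps, h0=0.5, q_in=0.02, q_out=0.01, area=1.0, dt=1.0):
    """Simulate the water level under hysteresis control of the inlet pump.

    q_in and q_out (m^3/s), h0 (m), and dt (s) are assumed values; the
    0.8 m / 0.2 m thresholds and the 1 m^2 area follow the description.
    """
    h, pump_on = h0, True
    levels, pump_states = [], []
    for _ in range(steps):
        if h > 0.8:
            pump_on = False      # level too high: stop the inlet pump
        elif h < 0.2:
            pump_on = True       # level too low: restart the inlet pump
        inflow = q_in if pump_on else 0.0
        h += dt * (inflow - q_out) / area   # mass balance, outlet always on
        levels.append(h)
        pump_states.append(pump_on)
    return levels, pump_states


levels, pump_states = simulate_tank(400)
print(min(levels), max(levels))
```

With these assumed flow rates the level oscillates between roughly 0.2 m and 0.8 m and never approaches the 1.1 m spill level, which is the baseline a stealthy attack tries to push the real level away from.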

The water’s pH value control system is a more complex non-linear system as presented in [16]. The HCl dosage determines the pH value of the water. The HCl pump starts to dose HCl into the water if the pH value exceeds 7.05, and the pump is turned off if the pH value drops below 6.95. Figure 6 depicts the actions (ON/OFF) of the HCl pump and the water’s pH values responding to it. The time-delay feature of the system causes the wide oscillations of the pH response curve. The nonlinearity and high latency make it difficult to derive an LDS model from first principles. Therefore, we use system identification techniques to build a high-order LDS model that approximates the system dynamics.

Fig. 6. The relationship between the water’s pH value and the actions of the HCl pump
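The text does not say which identification technique is used for the pH system. As one common possibility, the sketch below fits a linear ARX model y(k) = Σ a_i·y(k−i) + Σ b_j·u(k−j) by ordinary least squares; the function name `fit_arx`, the model orders, and the synthetic data are all assumptions for illustration and are unrelated to the actual pH process.

```python
import numpy as np


def fit_arx(y, u, na, nb):
    """Least-squares fit of y[k] = sum_i a[i]*y[k-1-i] + sum_j b[j]*u[k-1-j]."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    n = max(na, nb)
    rows = []
    for k in range(n, len(y)):
        past_y = [y[k - i] for i in range(1, na + 1)]
        past_u = [u[k - j] for j in range(1, nb + 1)]
        rows.append(past_y + past_u)
    phi = np.array(rows)                      # regressor matrix
    coeffs, *_ = np.linalg.lstsq(phi, y[n:], rcond=None)
    return coeffs[:na], coeffs[na:]


# Synthetic, noise-free data from a known second-order model (illustrative only).
rng = np.random.default_rng(0)
u = rng.choice([-1.0, 1.0], size=300)
y = np.zeros(300)
for k in range(2, 300):
    y[k] = 0.5 * y[k - 1] + 0.2 * y[k - 2] + 1.0 * u[k - 1]

a_hat, b_hat = fit_arx(y, u, na=2, nb=1)
print(a_hat, b_hat)
```

An identified ARX model of this kind can be converted to state-space form before it is used with a Kalman filter; a higher order (larger `na`, `nb`) is the usual way to absorb time delays such as the one described for the pH loop.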

6 Results and discussion

On the two simulated ICS, we launch surge stealthy attacks. During attack detection, we set the length of the residual sequence for testing equal to 100 and the parameter θ equal to 5%. Then, we investigate the residual sequence \(\{r_{k-99},\ldots,r_{k-1},r_{k}\}\) at each step k≥100.

In the water level control system, the simulated surge stealthy attack starts at 201 s, as illustrated in Fig. 7a. After that, the deviation between the sensor reading and the real water level in the tank increases persistently until the water spills at 286 s. Figure 7b shows the residuals between the forecasted and measured water levels. Figure 7c shows that the skewness coefficient curve stays close to 0 from 1 s to 200 s but rises significantly after 200 s, indicating the occurrence of the stealthy attack. Additionally, the positive skewness coefficients indicate a right-skewed residual distribution. In other words, there is a small number of large values in the right-hand tail of the distribution, which come from the artificial random sequence \(r_{\text{rand}}\), and a large number of small values on the left-hand side, which come from the original residual sequence \(r_{o}\) under test. We can therefore conclude that the attacker attempts to deceive the controller with fake negative residuals and to mislead it into making opposite decisions until the tank overflows.

Figure 7d to f show the intrusion process, the compromised residuals, and the detection result on the water’s pH value control system. The stealthy attack starts at 301 s, and the skewness coefficient curve starts to decline near 301 s, which indicates a left-skewed residual distribution, i.e., the tail is on the left-hand side. In this scenario, the attacker counterfeits positive residuals. Accordingly, the deceived controller keeps increasing the HCl dosage into the water until the water container is corroded. Figure 8 shows a variant in which the counterfeited residuals fluctuate randomly in a small range above −δ or below δ; our detection scheme still detects this variant of surge stealthy attacks successfully.

The experimental results indicate that the residual skewness coefficient is sensitive to the occurrence of stealthy attacks and verify the detection ability of the proposed approach.

Fig. 7. Detection of surge stealthy attacks. When the surge stealthy attacks in (b) and (e) are conducted on the water level control system and the pH value control system, respectively, the real water level shown in (a) deviates significantly from the seemingly normal but compromised water level, and the real pH value shown in (d) also deviates observably from the seemingly normal but compromised pH value. The significant changes in the skewness coefficients of residuals shown in (c) and (f) indicate that the proposed approach can identify surge stealthy attacks effectively

Fig. 8. Detection of the variant of surge stealthy attacks. When the variant of surge stealthy attacks in (b) and (e) is conducted on the water level control system and the pH value control system, respectively, the real water level shown in (a) deviates significantly from the seemingly normal but compromised water level, and the real pH value shown in (d) also deviates observably from the seemingly normal but compromised pH value. The significant changes in the skewness coefficients of residuals shown in (c) and (f) indicate that the proposed approach can identify the variant of surge stealthy attacks effectively

Additionally, skilled attackers may replace some entries in the residual sequence with random values (i.e., \(\{r_{i}\} \sim \mathcal {N}(\mu, \sigma ^{2})\)) in an attempt to bypass the intrusion detection system. Figure 9 illustrates the case in which the attacker replaces 10% of the entries in the residual sequence with random values. In this case, the proposed detection scheme can still identify this kind of advanced stealthy attack (i.e., the skewness coefficient curve starts to rise or decline sharply from a certain time point), although the convergent absolute values of the skewness coefficients are smaller than those in the two attack scenarios shown in Figs. 7 and 8. However, the higher the ratio of random values, the harder it is for the adversary to achieve the attack goal, so we study the impact of this ratio on the time to achieve attack goals and on the detection ability of our approach.

Fig. 9. Detection of stealthy attacks with a certain percentage of random residuals. When stealthy attacks with a certain percentage of random residuals in (b) and (e) are conducted on the water level control system and the pH value control system, respectively, the real water level shown in (a) deviates significantly from the seemingly normal but compromised water level, and the real pH value shown in (d) also deviates observably from the seemingly normal but compromised pH value. The significant changes in the skewness coefficients of residuals shown in (c) and (f) indicate that the proposed approach can identify stealthy attacks with a certain percentage of random residuals effectively

Figure 10a and c show the impact of the ratio of random residuals on the time to achieve attack goals on the water level control system and the water’s pH value control system, respectively. The time to achieve attack goals increases quickly as the ratio of random residuals rises, especially when the ratio exceeds 60%. Figure 10b and d show that a higher ratio of random residuals also weakens the detection ability of our approach. When the ratio is less than 70%, the convergent values of the skewness coefficients are significantly different from 0 (i.e., greater than 0 in the water level control system and less than 0 in the water’s pH value control system). However, when the ratio reaches or exceeds 80%, our detection scheme has difficulty identifying the stealthy attack. Additionally, when the ratio exceeds 80%, the stealthy attack detection technique based on residual permutation entropy [23] does not work well either. Therefore, in this regime the detection abilities of the proposed technique and of the previously proposed technique are nearly equal. Fortunately, in this case it takes much longer to achieve the attack goals, so attackers are generally unwilling to counterfeit so many random residuals during an attack. Hence, the proposed residual skewness analysis-based technique is able to detect stealthy attacks against ICS effectively in most cases.

Fig. 10. Impacts of the ratio of random residuals on attack time and detection ability. The time to achieve attack goals becomes longer as the ratio of random residuals increases in the water level control system (a) and the pH value control system (c). Additionally, when the ratio of random residuals increases, the change in the skewness coefficient of residuals becomes more subtle, as shown in (b) and (d)

It is worth noting that there exists an interesting phenomenon in Fig. 10. It can be seen from Fig. 10b that the skewness coefficient curve drops slightly at the beginning of the stealthy attack and then rises significantly. This phenomenon is caused by a transition from a left-skewed residual distribution to a right-skewed residual distribution, since we investigate a set of sliding time windows of residuals during intrusion detection. At the beginning of a stealthy attack, most of the residuals in the current sliding window are Gaussian noise and only a small portion are counterfeited negative residuals, which results in a left-skewed distribution, so the skewness coefficient is less than 0. As time goes on, the sliding window contains more counterfeited negative residuals and only a small portion of Gaussian noise, so the left-skewed distribution turns into a right-skewed distribution, and the skewness coefficient becomes greater than 0. A similar phenomenon occurs in the water’s pH value control system, as shown in Fig. 10d, where a right-skewed distribution turns into a left-skewed distribution.
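This sign change can be reproduced with a synthetic residual stream. The sketch below is illustrative and rests on assumed values: attack-free residuals drawn from N(0, 0.1²), counterfeited residuals fixed at −0.5, a window of 100 samples, θ = 5%, and randomly chosen replacement positions. The function name `sliding_skewness` is hypothetical.

```python
import numpy as np


def skewness(r):
    std = r.std()
    return 0.0 if std == 0.0 else float(np.mean((r - r.mean()) ** 3) / std ** 3)


def sliding_skewness(residuals, l=100, theta=0.05, mu=0.0, sigma=0.1, seed=0):
    """Skewness of each length-l window after replacing theta*l entries."""
    rng = np.random.default_rng(seed)
    m = max(1, int(round(theta * l)))
    out = []
    for k in range(l - 1, len(residuals)):
        r_test = np.array(residuals[k - l + 1:k + 1], dtype=float)
        idx = rng.choice(l, size=m, replace=False)   # assumed: random positions
        r_test[idx] = rng.normal(mu, sigma, size=m)
        out.append(skewness(r_test))
    return np.array(out)   # out[i] belongs to the window ending at index l-1+i


rng = np.random.default_rng(42)
attack_start = 200
residuals = np.concatenate([
    rng.normal(0.0, 0.1, size=attack_start),   # attack-free Gaussian noise
    np.full(200, -0.5),                        # counterfeited negative residuals
])
sc = sliding_skewness(residuals)

# Window ending 10 samples into the attack: mostly noise, few counterfeits.
early = sc[(attack_start + 9) - 99]
# Last window: mostly counterfeits, few normal residuals from the replacement.
late = sc[-1]
print(early < 0, late > 0)
```

Flipping the sign of the counterfeited value gives the mirrored transition, from a right-skewed to a left-skewed distribution, described for the pH system.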

Additionally, we study the impact of the length of the time window for testing on the computing time of the detection algorithm, and compare the computing time of the proposed approach with that of the residual permutation entropy-based approach proposed in our previous work [23]. Figure 11 shows that the detection approach proposed in this paper is about ten times faster than the previous approach. Therefore, we can conclude that the residual skewness analysis-based approach is more efficient and more suitable for industrial control systems, which require low latency and high reliability [1].

Fig. 11. Detection time comparison of two different approaches: (a) the residual permutation entropy-based approach and (b) the residual skewness analysis-based approach. The blue points are the observed computing times and the red lines are the lines fitted to the blue points

7 Conclusions

In this paper, we propose an effective and efficient detection technique against stealthy attacks on ICS. This approach makes full use of the distribution skewness of the forecasting residuals generated during stealthy attacks, which can effectively distinguish the counterfeited residuals from the attack-free residuals. As a result, the occurrence of stealthy attacks can be identified effectively. Comprehensive experimental results verify the effectiveness and efficiency of the proposed approach.

However, the method proposed in this paper still has some shortcomings. The values of the algorithm parameters (e.g., the detection threshold ε, the length l of the residual sequence for testing, and the ratio θ of residuals to be replaced) must be set manually. Overdependence on human experience may weaken the detection ability of our approach. In future work, we will study and model the relationships between the algorithm parameters and the detection performance, and use them to devise an automatic, real-time parameter updating technique that adapts the parameter values to the changing detection performance. We will also evaluate the proposed techniques on larger industrial control systems.