1 Introduction

Because of the openness of networks, information security issues have become more prominent [1] and pose serious threats to individuals, society, and countries. Traditional technologies such as firewalls [2], intrusion detection [3], and digital encryption [4] can no longer cope with existing attacks and threats, so an active and reliable security strategy is urgently needed. Network security situation awareness (NSSA) is a process that comprehensively analyzes the network security status [5]. It obtains the situation elements in a large-scale network and calculates and analyzes them to predict the future trend of the network [6]. Network security situation prediction (NSSP) is an important technology in NSSA. With the emergence of machine learning, the computation ability of computers has been further improved [7], which provides more methods for NSSP. Commonly used methods include neural networks, time series, and gray correlation analysis [8]. The neural network method has an excellent self-learning ability, but it involves randomness and is prone to converging to local optima. The time series method relies on the periodicity and regularity of the situation, but uncertain factors in real networks make its predictions inaccurate. The gray correlation analysis method generates new data by accumulating historical data [9]; it is simple to model and fast to compute, but it may produce large errors when the network is highly random.

NSSA was first defined by Endsley [10]. Since then, research on NSSA has grown with the development of network technology. Bode et al. [11] developed a Bayesian network classifier to analyze network traffic and further analyzed the risk level using a modified risk matrix standard; experiments on the KDD Cup 99 data set showed that the model was suitable for network security. To address the deficiency of the gray Verhulst model, Leau et al. [12] designed an adaptive gray Verhulst model with adjustable generation order, tested it on the DARPA 1999 and 2000 benchmark data sets, and found that it showed good prediction performance. Kim et al. [13] pointed out that traditional time series analysis could not predict a dynamic network and proposed a hidden Markov model (HMM) to analyze and predict real-time changes in network traffic. Holsopple et al. [14] designed the FuSIA framework to predict possible future attacks, which used uncertain observability to determine the current and future impacts on key tasks in the application. Panigrahi et al. [15] proposed an autoregressive integrated moving average-artificial neural network (ARIMA-ANN) hybrid model for time series prediction, which used a fuzzy filter to decompose the time series into low-volatility and high-volatility components. The low-volatility components were modeled by ARIMA, the high-volatility components were modeled by ANN, and the final prediction was obtained by combining the two. Their experiment found that the hybrid model was superior to the individual ARIMA and ANN models.

Although current research has achieved a great deal, the accuracy and timeliness of predictions still need to be improved. To find a more efficient NSSP method that predicts the network security situation better and faster, this study designed an improved gray relational analysis (GRA)-based NSSP model and performed simulation experiments and analyses on it. The experimental results verified that the method was effective in predicting the network security situation, which contributes to the further development of network information security.

2 Methods

2.1 GM (1, N) model

In GRA theory, the GM (1, 1) model is a typical model used in the early stage [16]. Suppose the original NSS data sequence is \( {X}^{(0)}(t)=\left\{{x}^{(0)}(1),{x}^{(0)}(2),\cdots, {x}^{(0)}(n)\right\} \). Let \( {x}^{(1)}(t)=\sum \limits_{i=1}^t{x}^{(0)}(i),t=1,2,\cdots, n \). After this accumulated generating operation (AGO), there is \( {X}^{(1)}(t)=\left\{{x}^{(1)}(1),{x}^{(1)}(2),\cdots, {x}^{(1)}(n)\right\} \). The GM (1, 1) model is described by the differential equation \( \frac{dx^{(1)}(t)}{dt}+{ax}^{(1)}(t)=\phi \), where a and ϕ are undetermined parameters. They are estimated by least squares as \( {\left(a,\phi \right)}^T={\left({A}^TA\right)}^{-1}{A}^Ty \), where \( A=\left[\begin{array}{cc}-\frac{\left[{x}^{(1)}(1)+{x}^{(1)}(2)\right]}{2}& 1\\ {}\vdots & \vdots \\ {}-\frac{\left[{x}^{(1)}\left(n-1\right)+{x}^{(1)}(n)\right]}{2}& 1\end{array}\right] \) and \( y={\left[{x}^{(0)}(2),{x}^{(0)}(3),\cdots, {x}^{(0)}(n)\right]}^T \). Then, the cumulative sequence value can be written as: \( {\hat{x}}^{(1)}\left(t+1\right)=\left[{x}^{(1)}(1)-\frac{\phi }{a}\right]{e}^{- at}+\frac{\phi }{a} \). After reduction, the prediction result of the GM (1, 1) model can be obtained: \( {\hat{x}}^{(0)}\left(t+1\right)={\hat{x}}^{(1)}\left(t+1\right)-{\hat{x}}^{(1)}(t) \).
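The GM (1, 1) steps above (accumulation, least-squares estimation of a and ϕ, time response, and reduction) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments; the function names `gm11_fit` and `gm11_predict` are hypothetical, and the two-parameter least-squares problem is solved in closed form instead of forming \( {\left({A}^TA\right)}^{-1}{A}^Ty \) explicitly.

```python
import math

def gm11_fit(x0):
    """Estimate (a, phi) of GM(1,1) from the raw sequence x0 by least squares."""
    # 1-AGO: x1[t] is the running sum of x0 up to t
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values: mean of adjacent accumulated values (rows of A, sign dropped)
    z = [0.5 * (x1[k - 1] + x1[k]) for k in range(1, len(x1))]
    y = x0[1:]
    n = len(z)
    # Closed-form normal equations for the model y = -a * z + phi
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    det = n * szz - sz * sz
    a = (sz * sy - n * szy) / det
    phi = (szz * sy - sz * szy) / det
    return a, phi

def gm11_predict(x0, steps=1):
    """Predict the next `steps` raw values after the end of x0."""
    a, phi = gm11_fit(x0)
    n = len(x0)

    def x1_hat(k):
        # Time response of the accumulated sequence; k = 0 corresponds to t = 1
        return (x0[0] - phi / a) * math.exp(-a * k) + phi / a

    # Reduction: difference of consecutive accumulated predictions
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]

if __name__ == "__main__":
    series = [2 * 1.1 ** k for k in range(6)]  # illustrative data, not from the paper
    print(gm11_predict(series, steps=2))
```

On a smoothly growing sequence such as the illustrative one above, the one-step prediction stays close to the true continuation, which is the situation in which GM (1, 1) is expected to work well.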

The GM (1, 1) model handles only a single variable, and its error is uncontrollable; therefore, it is not suitable for solving NSSP problems. The GM (1, N) model has high accuracy [17] and is more suitable for situation prediction. In the GM (1, N) model, the differential equation is written as: \( \frac{d{x}_1^{(1)}(t)}{dt}+a{x}_1^{(1)}(t)={\phi}_1{x}_2^{(1)}(t)+{\phi}_2{x}_3^{(1)}(t)+\cdots +{\phi}_{N-1}{x}_N^{(1)}(t) \), where \( {x}_1^{(1)} \) is the accumulated main sequence and \( {x}_2^{(1)},\cdots, {x}_N^{(1)} \) are the accumulated sequences of the related factors. Then, the value of \( {x}_1^{(1)}(t) \) can be written as: \( {x}_1^{(1)}(t)={e}^{- at}\left[\sum \limits_{i=2}^N\int {\phi}_{i-1}{x}_i^{(1)}(t){e}^{at} dt+{x}_1^{(1)}(0)-\sum \limits_{i=2}^N\int {\phi}_{i-1}{x}_i^{(0)}(t) dt\right] \). After reducing the cumulative sequence value, there is \( {\hat{x}}^{(0)}\left(t+1\right)={\hat{x}}^{(1)}\left(t+1\right)-{\hat{x}}^{(1)}(t) \).
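As with GM (1, 1), the parameters a and ϕ1, …, ϕN−1 of GM (1, N) are commonly estimated by least squares on the accumulated sequences. The sketch below shows one way to do this in plain Python. It assumes the usual discretization with adjacent-mean background values, which the text does not spell out for GM (1, N), and the helper names `ago`, `solve_linear`, and `gm1n_fit` are hypothetical.

```python
def ago(seq):
    """1-AGO: running sum of a raw sequence."""
    out, total = [], 0.0
    for v in seq:
        total += v
        out.append(total)
    return out

def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x

def gm1n_fit(main, factors):
    """Return [a, phi_1, ..., phi_{N-1}] for GM(1,N).

    main    : raw main sequence x_1^(0)
    factors : list of raw related-factor sequences x_2^(0), ..., x_N^(0)
    """
    x1 = ago(main)
    f1 = [ago(f) for f in factors]
    rows, y = [], []
    for k in range(1, len(main)):
        z = 0.5 * (x1[k - 1] + x1[k])          # adjacent-mean background value
        rows.append([-z] + [f[k] for f in f1])  # one row of the design matrix
        y.append(main[k])
    p = len(rows[0])
    # Normal equations (B^T B) theta = B^T y
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    bty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    return solve_linear(btb, bty)
```

A call such as `gm1n_fit(nss_history, [factor_a, factor_b])` (argument names are placeholders) returns the development coefficient a followed by one driving coefficient per related factor.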

2.2 NSSP model based on improved GRA

The situation value changes dynamically. To predict it better, this study improved the GM (1, N) model by combining it with the dynamic equal dimension method. According to GM (1, N), the original data are accumulated once: \( {x}^{(1)}(t)=\sum \limits_{i=1}^t{x}^{(0)}(i) \). GM (1, N) models of different dimensions are then established from the original data, and the approximate time response formula is obtained: \( {\hat{x}}^{(1)}\left(t+1\right)=\left[{x}_1^{(0)}(1)-\frac{1}{a}\sum \limits_{i=2}^N{\phi}_{i-1}{x}_i^{(1)}\left(t+1\right)\right]{e}^{- at}+\frac{1}{a}\sum \limits_{i=2}^N{\phi}_{i-1}{x}_i^{(1)}\left(t+1\right) \). After reduction, the prediction model is: \( {\hat{x}}^{(0)}\left(t+1\right)={\hat{x}}^{(1)}\left(t+1\right)-{\hat{x}}^{(1)}(t) \). The predicted value \( {\hat{x}}^{(0)}\left(n+1\right) \) is appended to the original sequence and the oldest value \( {x}^{(0)}(1) \) is removed, which generates a new sequence of the same dimension; i.e., the most recent predicted data replace the earliest data. These steps are repeated until the prediction target is reached.
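The equal-dimension update described above is independent of the particular gray model: each new prediction is appended to the window and the oldest value is dropped, so the window length never changes. The following sketch isolates that mechanism. The function `rolling_forecast` and the stand-in predictor `last_ratio_predictor` are hypothetical names; in the method of this section the one-step predictor would be the GM (1, N) model, which is replaced here by a trivial rule only to keep the example short.

```python
def rolling_forecast(history, predict_next, horizon):
    """Equal-dimension rolling forecast.

    history      : initial window of observed values
    predict_next : callable mapping the current window to a one-step prediction
    horizon      : number of future steps to predict
    """
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        nxt = predict_next(window)
        forecasts.append(nxt)
        # Drop the oldest value and append the newest: dimension stays constant
        window = window[1:] + [nxt]
    return forecasts

def last_ratio_predictor(window):
    """Stand-in predictor (NOT a gray model): extend the last observed growth ratio."""
    return window[-1] * window[-1] / window[-2]

if __name__ == "__main__":
    print(rolling_forecast([1, 2, 4, 8], last_ratio_predictor, horizon=3))
```

Passing the predictor as a function keeps the window update separate from the model, so a GM (1, 1) or GM (1, N) one-step predictor can be substituted without changing the loop.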

The method is applied to NSSP. The original security situation sequence is set as \( {s}^{(0)}=\left({s}^{(0)}(1),{s}^{(0)}(2),\cdots, {s}^{(0)}(n)\right) \), where s is the gray correlation factor of the security situation. \( {s}^{(1)} \) is obtained by performing 1-AGO on \( {s}^{(0)} \). Then, the adjacent mean generating sequence \( {z}^{(1)} \) is obtained: \( {z}^{(1)}(t)=0.5{s}^{(1)}(t)+0.5{s}^{(1)}\left(t-1\right) \). An accuracy test was performed on the improved GM (1, N). The error is:

$$ \varepsilon (t)={X}^{(0)}(t)-{Y}^{(0)}(t), $$

where \( {X}^{(0)}(t) \) refers to the actual security situation sequence and \( {Y}^{(0)}(t) \) is the sequence predicted by GM (1, N).
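The error definition can be checked numerically with a short sketch. The two data points below are the actual and predicted NSS values that Section 3.2 reports for the improved model at the 11th and 20th hours; the function names are hypothetical, and using the mean absolute error as a summary is an assumption, since the text does not state how the average error in Table 2 is computed.

```python
def residuals(actual, predicted):
    """epsilon(t) = X(t) - Y(t) for each time step."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have equal length")
    return [x - y for x, y in zip(actual, predicted)]

def mean_absolute_error(actual, predicted):
    """Average of |epsilon(t)| (assumed summary statistic)."""
    eps = residuals(actual, predicted)
    return sum(abs(e) for e in eps) / len(eps)

if __name__ == "__main__":
    actual = [27.2541, 26.8518]      # hours 11 and 20, from Section 3.2
    predicted = [28.1524, 24.3196]   # improved GM(1,N), from Section 3.2
    print(residuals(actual, predicted))
    print(mean_absolute_error(actual, predicted))
```

Note that ε(t) is signed: a negative value means the model overestimated the situation value, which is the case at the 11th hour.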

3 Results

3.1 Experimental data

Suppose that the network system contains several hosts that provide p kinds of services \( {S}_i \) (1 ≤ i ≤ p) and are attacked by attacks \( {A}_j \) (1 ≤ j ≤ C), where C is the number of attack classes. Let \( {T}_{A_j} \) be the severity degree of attack \( {A}_j \), \( {N}_{A_j} \) the detection value of \( {A}_j \), τ the time interval of attacks, \( {w}_{\tau } \) the importance of time interval τ, and \( {C}_T \) the number of time intervals. Then, the risk index of \( {S}_i \) is written as:

$$ {I}_{S_i}=\sum \limits_{\tau =1}^{C_T}{w}_{\tau}\sum \limits_{j=1}^C10{T}_{A_j}{N}_{A_j}. $$

Suppose that the total amount of \( {S}_i \) in the system is \( {W}_{S_i} \). Then, the NSS value of the system can be written as:

$$ NSS=\sum \limits_{i=1}^p{W}_{S_i}\sum \limits_{\tau =1}^{C_T}{w}_{\tau}\sum \limits_{j=1}^C10{T}_{A_j}{N}_{A_j}. $$
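The two formulas translate directly into nested sums. The sketch below follows the expressions as printed, i.e., each term is 10 · T · N. It assumes that the detection values are recorded per service, per time interval, and per attack class, which the formulas leave implicit; the function names and all numbers in the usage example are illustrative only.

```python
def risk_index(time_weights, severities, detections):
    """Risk index of one service.

    time_weights[tau]  : importance w_tau of time interval tau
    severities[j]      : severity T_{A_j} of attack class j
    detections[tau][j] : detection value N_{A_j} in interval tau (assumed layout)
    """
    total = 0.0
    for w, row in zip(time_weights, detections):
        total += w * sum(10 * t * n for t, n in zip(severities, row))
    return total

def nss_value(service_weights, time_weights, severities, detections_by_service):
    """NSS value: service-weighted sum of the per-service risk indices."""
    return sum(
        w_s * risk_index(time_weights, severities, det)
        for w_s, det in zip(service_weights, detections_by_service)
    )

if __name__ == "__main__":
    w_tau = [1, 2]                 # two time intervals (made-up weights)
    severity = [1, 3]              # two attack classes (made-up severities)
    det = [[2, 0], [1, 1]]         # detections per interval and class (made up)
    print(risk_index(w_tau, severity, det))
    print(nss_value([0.5, 0.5], w_tau, severity, [det, det]))
```

Keeping the per-service risk index as its own function mirrors the structure of the formulas, where the NSS value is the weighted sum of the individual risk indices.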

A network system was simulated, including three servers that provided WWW service, e-mail service, and file transfer protocol (FTP) service, respectively. The computer acting as the simulated sensor linked the small local area network of the attacker with the attacked computer. When an attack was launched, the attack packet was captured and reported to the data aggregation server. The specific process was as follows. One network card captured the attack packet and transmitted it to another network card; the latter analyzed the data and reported it to the aggregation server for final processing. While receiving the data transmitted by the sensor, the aggregation server also collected the state information of the attacked host. The two kinds of data were compared to determine whether the host was attacked. The NSS value of the network was then calculated with the NSS formula above. NSS values were calculated over 20 h, as shown in Table 1. The first 10 h were used for model training, and the last 10 h were used for prediction.

Table 1 NSS values within 20 h

3.2 Prediction results

GM (1, 1), GM (1, N), and improved GM (1, N) models were used in NSSP. The predicted results were compared with the actual NSS values, and the results were drawn into a line chart (Fig. 1).

Fig. 1 Prediction results of different gray models

Fig. 1 shows a large gap between the actual values and the results of the GM (1, 1) and GM (1, N) models. Taking the 11th hour as an example, the actual NSS value was 27.2541. The GM (1, 1) model predicted 46.1285, which was 18.8744 larger than the actual value; the GM (1, N) model predicted 37.2651, which was 10.0110 larger; and the improved GM (1, N) model predicted 28.1524, which was only 0.8983 larger. The result of the improved model was therefore closest to the actual NSS value, indicating better performance on NSSP problems.

In order to further verify the effectiveness of the model, it was compared with the neural network model [18] and the Markov model [19]. The line chart was also used to compare the predicted results between different models (Fig. 2).

Fig. 2 The results of comparison with the other models

As shown in Fig. 2, the predicted values of the improved model agreed closely with the actual values, whereas the predictions of the other two models fluctuated greatly and differed substantially from the actual values. For example, at the 20th hour, the actual NSS value was 26.8518, and the predicted values of the neural network model, the Markov model, and the improved model were 34.9477, 20.6485, and 24.3196, respectively; the absolute differences between the predicted and actual values were 8.0959, 6.2033, and 2.5322, respectively. The results of the improved model were closest to the actual values.

The errors of the models in Fig. 2 are calculated, and the results are shown in Table 2.

Table 2 Comparison of errors between models

Table 2 shows that the maximum and minimum errors were 8.6667 and 2.6606 for the neural network model, 9.5692 and 5.4318 for the Markov model, and 3.6167 and 0.8983 for the improved model; the errors of the improved model were significantly smaller than those of the other two models. The average errors of the three models were 7.4138, 8.0211, and 2.3811, respectively. The average error of the improved model was thus 67.88% smaller than that of the neural network model and 70.31% smaller than that of the Markov model, which indicated the advantage of the improved model in NSSP.
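The percentage reductions quoted above follow from the average errors by the usual relative-difference formula, (baseline − improved) / baseline × 100. A small sketch reproduces the arithmetic; the function name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline * 100

if __name__ == "__main__":
    avg_error = {"neural network": 7.4138, "Markov": 8.0211}  # from Table 2 text
    improved = 2.3811
    for name, err in avg_error.items():
        print(name, round(relative_reduction(err, improved), 2))
```

Rounded to two decimals, the two values agree with the 67.88% and 70.31% stated in the text.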

The time complexity of different models in prediction was compared, and the results are shown in Fig. 3.

Fig. 3 Comparison of time complexity between models

Figure 3 shows the time complexity of the different models in NSSP, measured as the time needed for prediction. When predicting the NSS value, the neural network model took the longest, followed by the Markov model and the improved GM (1, N) model. The three models needed 42.67 s, 35.29 s, and 21.34 s, respectively. The time of the Markov model was 17.3% lower than that of the neural network model. The time of the improved GM (1, N) model was 49.99% lower than that of the neural network model and 39.53% lower than that of the Markov model, which verified the advantage of the improved GM (1, N) model in computation efficiency.

4 Discussion

With the development of the Internet of things, more and more devices have been connected to the network [20], further strengthening the openness of the network [21]. The network constantly faces a variety of malicious attacks and threats [22], which can disrupt the operation of target computers and cause huge reputation and property losses [23]. To achieve network security, it is necessary to detect attacks in advance, take corresponding measures to curb the threats in time, and actively protect the information security of the network. Therefore, network managers need to sense threats in time, accurately grasp the status of the network, and predict its future development trend. When attacks happen, NSSA technology converts the changes in network traffic and resource occupancy rate into security situation information, which provides reliable support for risk assessment and prediction. NSSA includes data fusion [24], network security situation evaluation [25], and NSSP. This study mainly analyzed NSSP.

The GRA method finds the regularity in a sequence and uses it to predict the sequence, and it performs well in short-term prediction. Based on the GRA method, this study introduced the GM (1, 1) and GM (1, N) models and applied them to the NSSP problem. To obtain better accuracy, the GM (1, N) model was improved by the dynamic equal dimension method. The GM (1, 1) and GM (1, N) models both showed large errors in predicting the situation value (more than 10 at the 11th hour), whereas the results of the improved GM (1, N) model were closer to the actual NSS value, which showed that the method had a high prediction accuracy. The comparison with the other methods then demonstrated that the neural network and Markov models showed great volatility and large errors in NSS prediction, with average errors of 7.4138 and 8.0211, respectively. Fig. 2 shows that the results of the improved GM (1, N) model followed the actual values more closely, and its average error was 2.3811, which was significantly smaller than that of the other two methods. These results revealed that the improved GM (1, N) model performed better on the NSSP problem.

5 Conclusions

Aiming at the NSSP problem, this study analyzed the advantages of the GRA method, improved the GM (1, N) model, and conducted simulation experiments. The results showed that (1) there were large errors between the prediction results of the GM (1, 1) and GM (1, N) models and the actual values; (2) the predicted value obtained by the improved GM (1, N) model was closer to the actual value; for example, at the 20th hour, the error between the predicted value and the actual value was only 2.5322; and (3) compared with the neural network and Markov models, the prediction accuracy of the improved GM (1, N) model was higher, and its average error was only 2.3811.

The experimental results verify that the improved GM (1, N) model is reliable and can be applied in practice to predict the network security situation accurately and thereby support network information security. In future research, the accuracy of the GM (1, N) model will be further improved, and experiments will be carried out in larger and real network environments to further verify the performance and practical applicability of the model.