1 Introduction

The Internet has grown rapidly worldwide in recent years and has become an important part of many people's daily lives. Wireless network information systems now underpin several fields, including online social networking, business transactions, traffic planning, military intelligence, and the IoT [1]. Computer network security is therefore of central importance in today's ever-changing digital landscape. Traditional security measures are frequently ineffective against the growing complexity of cyber-attacks [2]. Cyber threats now range from simple malware and phishing to advanced persistent threats (APTs) [3]. Traditional security solutions, which depend on predetermined rules and signatures, struggle to keep pace with these evolving threats, so the need for adaptable, intelligent systems capable of learning from their environment is apparent [4]. Network security personnel require high situational awareness to understand the overall security condition of the network, to detect faults and abnormal activity, and to propose corrections or improvements [5]. Network risk perception refers to how individuals, organizations, or automated systems reason about potential risks and vulnerable points within a computer network; it involves identifying, analyzing, and understanding risks that could harm the privacy, security, or availability of network resources and data [6]. Adaptive network protection, in turn, aims to detect, mitigate, and prevent cyber threats before they cause damage. Unlike traditional rule-based approaches, it uses intelligent, learning-based methods to continuously adjust and improve security measures as threats evolve [7]. By embedding risk-based security measures into an organization's digital infrastructure, the protection component of the adaptive security model keeps assets secure; identifying vulnerabilities and tightening controls requires close examination of the systems involved [8].

As a subfield of machine learning (ML), reinforcement learning (RL) is the closest to human learning, since it acquires knowledge about its environment through exploration and exploitation [9]. In this setting, an agent interacts with the environment, gaining knowledge and making decisions based on the observed information, while the environment issues rewards or penalties to the agent according to its actions [10]. The idea of RL was heavily influenced by how most people acquire new skills, namely by observing the results of repeated attempts at the task at hand [11]. RL excels in real-time and adversarial settings because of its flexibility and its suitability for modelling an independent agent that performs sequential actions, ideally with little or no prior knowledge of the environment [12]. Incorporating deep learning into RL approaches has increased their ability to tackle complicated problems by improving their function approximation and representation learning capabilities [13]. This strongly suggests that combining deep learning and RL is well suited to cyber security applications in an era of rapidly evolving, pervasive network risks [14]. RL algorithms can be used to map and understand the structure of computer networks in real time [15]. Agents may discover devices, their functions, and their relationships inside a network through environmental exploration; thanks to this dynamic mapping, the network's topology can be observed in real time [16]. Parameter uncertainty is modelled using probability distributions, and decisions are made by maximizing the expected reward [17]. Cyberattacks may be predicted in advance with the help of a virtual agent built using reinforcement learning [18]. Additionally, deep reinforcement learning (DRL) can be used to choose participants with sufficient computing resources and high-quality datasets, improving the quality of data sharing [19]. Feedback from the environment (a trial-and-error interaction that reveals what works well in a given environment) allows an RL approach to learn to protect that environment more effectively by rewarding or penalizing its actions [20].

The major contributions of this article are:

  • Designing a Deep Reinforcement Learning-assisted Network Awareness Risk Perception and Prevention Model (DRL-NARPP) for detecting IoT network attacks.

  • Introducing Q-learning for the IoT network environment, which can identify different network intrusions using an automated trial-and-error technique and continuously improve its attack identification abilities.

  • Presenting experimental results showing that the suggested DRL-NARPP model increases the risk assessment, accuracy, and prediction ratios and reduces the false positive rate compared with existing models.

The remainder of the study is organized as follows: Sect. 2 reviews the literature, Sect. 3 presents the DRL-NARPP model, Sect. 4 discusses the experimental outcomes, and Sect. 5 concludes the article.

2 Related Works

Several theoretical approaches are available for understanding how the risk-related network context may shape network awareness and perceptions. Most studies on information system security have focused on system security, i.e., ensuring that the system is protected through appropriate software and hardware. There is, however, a lack of research on users' awareness of information security and on how this awareness influences their behaviour. The authors note that the role of humans in network security is becoming more widely acknowledged in academia and industry, and they present a review of studies on user awareness of information security [21,22,23,24].

2.1 Cyber-Security Risk Prediction Models

Ihsan H. Abdulqadder et al. [25] proposed directed acyclic graph (DAG)-based blockchain technology for context-aware authentication handover and secure network slicing. The authors provided context-based authentication and secure handover strategies through Markov decision-making (MDM) and the weighted product model to increase security. Abdul Razaque et al. [26] presented a web-based blockchain-enabled cybersecurity awareness program (WBCA) to reduce the risk of cybercrime. The program helps users understand typical cybercriminal behaviour and strengthens end-user familiarity with cyber hygiene, best practices, vulnerabilities, and current attack patterns; WBCA leverages blockchain to protect the software itself from attacks.

2.2 Network Intrusion Detection Models

Nuno Oliveira et al. [27] proposed Intelligent Cyber Attack Detection and Classification (ICADC) for a network-based intrusion detection system. Their experimental results indicate that a sequential approach performs better for anomaly detection: the LSTM proved a reliable model for discovering sequential patterns in network traffic data, reaching an accuracy of 99.9% and an F1-score of 91.6%. Halima Ibrahim Kure et al. [28] recommended integrated cyber security risk management (i-CSRM) for risk prediction in critical infrastructure security. The i-CSRM framework uses a decision-support system based on fuzzy set theory to systematically identify critical assets, together with machine learning techniques to predict emerging risks and evaluate the effectiveness of existing controls. Their findings also show that machine learning classifiers are very effective in predicting various forms of risk, such as DoS attacks, cyber espionage, and malicious programs.

2.3 Network Risk Prediction Models

Bdah Mohammed Mubarak AlShahrani and Mohammad Tabrez Quasim [29] discussed an Adaboost Regression Classifier (ABRC) for classifying cyber-attacks and assessing network risk. The proposed ABRC uses a deep learning framework to estimate the impact of attacks on network security, and its performance evaluation shows that it significantly outperforms the existing deep learning method in detecting cyber-attacks. Shareeful Islam et al. [30] presented a comprehensive assessment model (CAM) for asset criticality and risk prediction in cybersecurity risk management (CSRM) of cyber-physical systems (CPS). The experimental findings show that stakeholders can benefit from an efficient risk management technique when fuzzy set theory is applied to identify the criticality of assets, and that cyber espionage, denial of service, and crimeware are risks reliably predicted by machine learning classifiers.

2.4 AI Methods in Network Security Systems

Onder Tutsoy and Martin Brown [31] presented a reinforcement learning analysis for a minimum-time balance problem. The convergence rate and the difficulties related to the value function parameters are first examined using a second-order unstable balance test problem. Assuming the optimal minimum-time control strategy is known, the authors focus on the convergence of the minimum-time value function. The simulations show that the temporal-difference error creates a null space linked to the basis functions at the end of the experiment, and the subsequent analysis of the parameter convergence rate shows that the residual gradient approach converges faster than TD(0) for this test case. Onder Tutsoy [32] introduced an artificial intelligence-based long-term policy-making algorithm to generate time-varying policies for reopening schools in stages. Under worst-case scenarios, the algorithm's primary goal is to generate policies that maximize school enrolment while minimizing pandemic mortality, and the results show that the suggested algorithm can provide effective policies that reduce COVID-19 fatalities while increasing school enrolment.

Based on this survey, existing systems face several issues in attaining high risk assessment, accuracy, and prediction ratios while keeping false positive rates low. Hence, this study proposes the Deep Reinforcement Learning-assisted Network Awareness Risk Perception and Prevention Model (DRL-NARPP) for detecting malicious activity in cybersecurity.

3 Deep Reinforcement Learning-assisted Network Awareness Risk Perception and Prevention Model (DRL-NARPP)

Network awareness refers to an organization or system's capacity to comprehend and monitor the condition of its network infrastructure. Understanding the network requires familiarity with its components, their configurations, the paths data take, and any security holes. Awareness of developments in one's own network is essential for sound cybersecurity, since it enables the discovery of anomalous behaviour, the identification of potential threats, and the prompt resolution of security issues. Risk perception is the assessment and comprehension of the risks and hazards facing a network; it covers the likelihood of a threat occurring and its possible effect on the network and on the organization as a whole. Accurate risk perception requires in-depth familiarity with the assets, the risks they face, and constant monitoring. Adaptive network prevention is a proactive cybersecurity method in which security policies are continuously monitored and updated in response to changing threats. This approach raises security to a new level by including dynamic protections such as AI and machine learning, for example by applying machine learning algorithms to identify abnormalities, adopting behaviour-based analysis, and using automatic response mechanisms to mitigate new risks.

The extent and complexity of cyber-attacks have grown rapidly as the number of connected IoT devices has risen in recent years. The introduction of many inventive threats and varied network applications poses a significant challenge to the design of an efficient network intrusion detection system (IDS). Signature-based methods have become the norm, yet they are readily defeated by small changes to malware or its dropper. Another strategy considers behavioural deviations: the system's actions are tracked over time and those that seem out of the ordinary are flagged. Anomaly detection-based methods, however, commonly mislabel common and legitimate system operations as malicious.

Detecting network attacks using machine learning is another prevalent approach. Recognizing patterns in data is a core application of machine learning, and in cybersecurity it can improve understanding of an attacked system's behaviour. However, traditional machine learning algorithms have their limits in cybersecurity, particularly in operational environments. Unless a dataset is deliberately balanced, it will typically be dominated by benign data, since most of the observed traffic in a real system is not an attack; the base rate of attacks is therefore very low. As a result, most machine learning algorithms tend to overfit to data from benign environments, often fail in practice, and have trouble generalizing to threats they have not seen.

With the development of DRL techniques, complex cyber-attacks can be detected and countered, including the insertion of falsified data into cyber-physical systems. Hence, this study proposes the Deep Reinforcement Learning-assisted Network Awareness Risk Perception and Prevention Model (DRL-NARPP) for detecting IoT network malicious activity in cybersecurity. Incorporating long-term planning abilities into RL algorithms enables them to consider future situations and risks, and RL agents may learn to mitigate risks by practising in simulated environments and making informed predictions about the future.
RL algorithms may be designed with reward functions that discourage behaviours linked to potential risks. By shaping the reward signal, an RL agent can learn to minimize risk and avoid behaviours that lead to unfavourable future states. The RL algorithm can still optimize policies using the outputs of a multi-layer neural network: the network's function approximation capability makes it easier to estimate the values or probabilities associated with different actions in different states, and the RL agent may learn to reduce future risks by optimizing its policy with these approximations.
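As a minimal illustration of such risk-discouraging reward shaping (not part of the original model; the state labels and numeric penalties below are hypothetical assumptions), a reward function can penalize transitions into risky states so that a standard RL update learns to avoid them:

```python
# Minimal sketch of a risk-penalizing reward function (illustrative only;
# the state labels and numeric penalties are hypothetical assumptions).
def risk_aware_reward(new_state: str, attack_blocked: bool) -> float:
    """Return an immediate reward that discourages transitions into risky states."""
    reward = 0.0
    if attack_blocked:
        reward += 1.0          # encourage successful mitigation
    if new_state == "compromised":
        reward -= 10.0         # heavy penalty for reaching an unsafe state
    elif new_state == "degraded":
        reward -= 2.0          # smaller penalty for partially risky states
    return reward
```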

Figure 1 shows the interaction between the environment and the agent in the RL process. In RL, an agent is defined by its ability to generate learning experiences through direct interaction with the environment, in contrast to the other major branch of ML, supervised learning, which learns from labelled examples. RL is characterized by the ideas of action, state, and reward. At every time step, the agent takes an action that results in two outcomes: a new state of the environment and a reward or penalty for the agent. The reward is a function that, given a state, indicates to the agent whether its current behaviour is close to optimal. Based on the rewards it receives, the agent learns to take more beneficial and fewer harmful actions. Q-learning is a popular RL technique that uses the Bellman equation to optimize the discounted cumulative reward, as expressed in Eq. (1).

Fig. 1

Interaction between Environment and Agent in RL process

$$P\left({w}_{t},{b}_{t}\right)=E\left[{r}_{t+1}+\alpha {r}_{t+2}+{\alpha }^{2}{r}_{t+3}+\dots \mid {w}_{t},{b}_{t}\right]$$
(1)

The discount factor \(\alpha \in [0, 1]\) controls the importance of future rewards and also serves as a mathematical device to keep the cumulative reward bounded so that learning converges. In practice, discounting is commonly used because of the stochastic environment's limited observability and inherent uncertainty.
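The sketch below shows a minimal tabular Q-learning update that backs up the discounted return of Eq. (1) using the Bellman equation mentioned above; the state/action space sizes, learning rate, and discount value are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

# Minimal tabular Q-learning update consistent with Eq. (1); sizes and
# hyperparameters are illustrative assumptions.
n_states, n_actions = 16, 4
alpha_discount = 0.9      # discount factor (alpha in Eq. (1))
lr = 0.1                  # learning rate
Q = np.zeros((n_states, n_actions))

def q_update(w, b, r, w_next):
    """One Bellman-backup step: move Q(w, b) toward r + alpha * max_b' Q(w', b')."""
    target = r + alpha_discount * np.max(Q[w_next])
    Q[w, b] += lr * (target - Q[w, b])
```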

Q-learning requires storing the expected reward (Q-value) of each action in each state in a lookup table (Q-table); as the state and action spaces grow, so does the memory needed. Real-world problems generally involve continuous state or action spaces, so tabular Q-learning is ineffective for them. Fortunately, DL has developed into a powerful tool that can be combined with conventional RL methods: thanks to its signature capabilities, representation learning and function approximation, it can learn an efficient low-dimensional representation of raw high-dimensional information. However, using deep neural networks (DNNs) to estimate Q-functions is unstable because of the correlations among the sequence of observations and the coupling between the Q-value \(P(w, b)\) and the target value \(P({w}{\prime}, {b}{\prime})\). Two mechanisms address this. First, an experience memory stores a large list of learning experience tuples \((w, b, r, {w}{\prime})\) produced by the agent's interaction with the environment; during learning, these memories are sampled at random to avoid being influenced by the correlations between consecutive experiences. Second, the target network is an identical copy of the estimating network, except that its parameters are frozen and only updated periodically.
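To make these two stabilizing mechanisms concrete, the following PyTorch sketch shows an experience memory sampled at random and a periodically synchronized target network; it is an illustration under assumed layer sizes, buffer capacity, and hyperparameters, not the paper's exact implementation.

```python
import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small Q-network mapping a feature vector to one Q-value per action."""
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, x):
        return self.net(x)

n_features, n_actions, gamma = 8, 4, 0.9           # assumed sizes and discount
policy_net = QNet(n_features, n_actions)
target_net = QNet(n_features, n_actions)
target_net.load_state_dict(policy_net.state_dict())  # frozen copy, synced periodically
memory = deque(maxlen=10_000)   # tuples (state, action [0-dim long], reward [0-dim float], next_state)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

def train_step(batch_size: int = 32):
    if len(memory) < batch_size:
        return
    batch = random.sample(memory, batch_size)       # random sampling breaks correlations
    w, b, r, w_next = map(torch.stack, zip(*batch))
    q_pred = policy_net(w).gather(1, b.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                            # target uses the frozen network
        q_target = r + gamma * target_net(w_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```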

Figure 2 shows the proposed DRL-NARPP model. The data are taken from the Edge-IIoTset cybersecurity Kaggle dataset [33]. Data pre-processing converts raw data into a consistent and comprehensible format, and pre-processing practices are applied as needed during the production phase of the system development life cycle. Grouping and aggregating data (such as summation grouping), normalizing and standardizing data, and scaling data are all pre-processing procedures. Once the data have been pre-processed, suitable cyber data encoding enables dataset-feature mapping: the input data are transformed into feature outputs ready for use by the subsequent feature learning subsystem. A denoising autoencoder (DAE) is used in this research to improve the precision of the suggested IDS. A DAE is a deep neural network trained on an adversarial (corrupted) dataset to predict the unperturbed (clean) dataset. It has two parts, an encoder and a decoder: the encoder compresses the input information into a representation called the code, and the decoder reconstructs the output data from the code. Between the visible input and output layers of the network lies a collection of hidden layers (the encoder) that make up a DNN, which can model either a linear or a non-linear relationship between input and output.

To solve sequential decision-making problems, reinforcement learning (RL) employs iterative procedures in which an agent (the decision-maker) interacts with its environment to learn how to behave appropriately in different situations. Formally, the agent aims to find a policy that leads to the best possible outcome given the system's present state. This research aims to improve the system's effectiveness by teaching the agent a strategy that increases the number of IoT attacks identified over time. If attackers want their IoT attacks to persist in the IoT network and generate more profit, they must master stealth and resistance: an attacker may use various techniques to hide the telltale signs of an attack and prevent it from being uncovered, for instance by regularly altering the temporal and spatial characteristics of attack traffic. The created system identifies cyberattacks in network traffic fairly accurately. The probability of a cyberattack occurring depends on the severity of the recognized security threat, the level of exposure, and the frequency of attacks. The reward function gives the agent immediate feedback to direct its behaviour; the value function, by approximating the expected cumulative reward, helps the agent assess the long-term effects of its actions; and the policy decides what the agent should do given those value estimates.
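A minimal PyTorch sketch of the denoising autoencoder stage described above is given below; the layer widths, code dimension, and noise level are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

# Minimal denoising autoencoder sketch for the pre-processing stage described
# above; layer widths, code dimension, and noise level are assumptions.
class DenoisingAE(nn.Module):
    def __init__(self, n_features: int, code_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def dae_train_step(model, optimizer, clean_batch, noise_std=0.1):
    """Train the DAE to reconstruct clean inputs from corrupted ones."""
    corrupted = clean_batch + noise_std * torch.randn_like(clean_batch)
    recon = model(corrupted)
    loss = nn.functional.mse_loss(recon, clean_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```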

Fig. 2

Proposed DRL-NARPP model

DNNs implement both the master agent and the individual learners, with separate outputs for the critic and the actor. The first output is a scalar value representing the predicted reward of a given state \(U\left(w\right)\), while the second output is a vector representing a probability distribution over all possible actions \(\pi (w, b)\). The critic's loss function is stated as:

$${K}_{1}=\sum {\left(R-U\left(w\right)\right)}^{2}$$
(2)

In Eq. (2), \(R = r + \alpha U \left({w}{\prime}\right)\) denotes the discounted future reward. The actor minimizes the following policy loss function:

$${K}_{2}=-{\text{log}}\left(\pi \left(b\left|w\right.\right)\right)*B\left(w\right)-\vartheta G(\pi )$$
(3)

As shown in Eq. (3), \(B(w) = R -U (w)\) is the estimated advantage function and \(G\left(\pi \right)\) is an entropy term that manages the exploration ability of the agent, with the hyperparameter \(\vartheta\) controlling the strength of the entropy regularization. The advantage function \(B(w)\) indicates how beneficial it is for the agent to be in a particular state. Asynchronous advantage actor-critic (A3C) learning is asynchronous because every learner interacts with its own environment and updates the master network independently. The cycle is repeated until learning is complete, at which point the master network is used.
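The following sketch expresses the critic and actor losses of Eqs. (2)-(3) in PyTorch; variable names follow the text (R is the discounted target, U the state value, pi the action probabilities), and the entropy weight is an illustrative assumption.

```python
import torch

# Sketch of the losses in Eqs. (2)-(3) for a single transition.
def actor_critic_losses(r, alpha, U_w, U_w_next, pi_w, action, entropy_weight=0.01):
    R = r + alpha * U_w_next.detach()               # discounted future reward
    advantage = R - U_w                              # B(w) = R - U(w)
    critic_loss = advantage.pow(2).sum()             # K1 in Eq. (2)
    log_prob = torch.log(pi_w[action])
    entropy = -(pi_w * torch.log(pi_w)).sum()        # G(pi), encourages exploration
    actor_loss = -log_prob * advantage.detach() - entropy_weight * entropy  # K2 in Eq. (3)
    return critic_loss, actor_loss
```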

To study cyberattacks on cyber-physical systems (CPS), the cyber-state dynamics are characterized by the statistical model

$$\dot{y}\left(t\right)=f\left(t,y,v,\omega ;\theta \left(t,b,d\right)\right);\quad y\left({t}_{0}\right)={y}_{0}$$
(4)

In Eq. (4), \(y\), \(v\) and \(\omega\) represent the physical states, control input and disturbances, respectively, while \(\theta \left(t,b,d\right)\) defines the cyber state at time \(t\), with \(b\) and \(d\) referring to the cyber attack and defence, respectively.

Figure 3 shows the DRL-assisted intrusion detection system. In this architecture, DRL agents (edge servers or base stations) connect wireless terminals to the rest of the network. These agents observe their environment and use the DRL model explained in the following sections to make decisions. With the help of DRL agents, a specialized type of RL agent, the model may be trained to predict future rewards and flag prospective intrusions or attacks. The IoT node acts as an agent, monitoring its environment and making decisions based on the information it learns; this agent may be a server or base station with sufficient processing capability. Intrusion detection proceeds in two stages. First, a distributed trust management mechanism is set up to carefully select reliable nodes for the network, with the legitimacy of each device tested by reputation assessment. Nodes with specific roles, such as base stations or servers, collect transmitted information and use DRL models to draw judgements, and the DRL agent only communicates with the most reliable devices to handle their data needs. Second, this paper adopts a DRL technique to identify network intrusions. During training, the agent uses an exploration policy, such as the \(\epsilon\)-greedy approach, to investigate potential actions: the agent takes a random action with probability \(\epsilon\), or follows the greedy method and selects the action with the maximum value function with probability 1-\(\epsilon\). As an extension of conventional RL methods, the Deep Q-network (DQN) algorithm estimates the Q-value of each state-action pair through the function P(w,b). Using the states and actions of the environment and the Kaggle dataset, the DQN agent in our system approximates the Q-value with a DNN, since the large number of features in the dataset and the batch size used during DQN training make it impossible to store the Q-value of every state-action combination in a Q-table. To estimate the Q-value of each state-action combination, this study therefore applies the DQN algorithm to intrusion detection.
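A minimal sketch of the \(\epsilon\)-greedy exploration policy described above follows; the value of \(\epsilon\) is an illustrative assumption.

```python
import random
import numpy as np

# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise pick the action with the highest estimated Q-value.
def epsilon_greedy(q_values: np.ndarray, epsilon: float = 0.1) -> int:
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # random exploratory action
    return int(np.argmax(q_values))              # greedy action
```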

Fig. 3

DRL-assisted intrusion detection systems

Suppose the attackers can launch different attacks represented by the attack vector \(B= ({b}_{1}, {b}_{2}, {b}_{3}, ..., {b}_{m})\). The thresholds used to detect the respective attacks in time slot \(t\) can be written as \({T}^{t}=\left({\theta }_{{b}_{1}}^{t},{\theta }_{{b}_{2}}^{t},{\theta }_{{b}_{3}}^{t},\dots ,{\theta }_{{b}_{m}}^{t}\right)\). The components of the attack vector are observed continuously and simultaneously. Precisely, at time \(t\), if the perceived entropy \({G}_{{b}_{j}}^{t}\) of a feature for attack \({b}_{j}\) surpasses the current threshold \({\theta }_{{b}_{j}}^{t}\), the IoT attack detector raises an alarm to the administrators. Identification outcomes may be either positive (the traffic is flagged as an attack) or negative (the traffic is not flagged as an attack). Each detection outcome may or may not reflect the traffic's true nature, which yields four categories: a True Positive (TP) confirms attack traffic as malicious; a False Positive (FP) erroneously labels benign data as malicious; a False Negative (FN) incorrectly labels malicious traffic as benign; and a True Negative (TN) correctly labels benign traffic as benign.

Now, this study counts the occurrences of each case during a period \(T\) consisting of \(m\) time slots \(({t}_{1}, {t}_{2},..., {t}_{m})\). \({M}_{11}^{T}\) denotes the number of confirmed true attacks, \({M}_{12}^{T}\) the number of false alarms, \({M}_{21}^{T}\) the number of actual attacks missed, and \({M}_{22}^{T}\) the number of correctly identified benign traffic flows. Then, over time \(T\), the total reward obtained from the environment is:

$${R}^{T}=\left({Q}_{0}-{C}_{0}\right)*{M}_{11}^{T}-\left({C}_{0}-{C}_{1}\right)*{M}_{12}^{T}-{C}_{2}*{M}_{21}^{T}+{Q}_{1}*{M}_{22}^{T}$$
(5)

where

$${M}_{11}^{T}+{M}_{12}^{T}+{M}_{21}^{T}+{M}_{22}^{T}=m$$
(6)

Then, the hit rates \({\delta }_{T}\) and false alarm rates \({\beta }_{T}\) could be calculated by:

$${\delta }_{T}=\frac{TP}{TP+FN}=\frac{{M}_{11}^{T}}{{M}_{11}^{T}+{M}_{21}^{T}}$$
(7)
$${\beta }_{T}=\frac{FP}{FP+TN}=\frac{{M}_{12}^{T}}{{M}_{12}^{T}+{M}_{22}^{T}}$$
(8)

The system state in time \(T\) could be signified as:

$${W}_{T}=\left({\delta }_{T},{\beta }_{T}\right)$$
(9)

This study models attack identification in IoT networks as a Markov decision process. The objective is to increase the utility of the IoT attack identification system \({R}^{T}\) by optimizing the threshold \({\theta }_{{b}_{j}}\) used to identify the particular attack type \({b}_{j}\):

$${\theta }_{{b}_{j}}^{*}={\text{arg}}\underset{{\theta }_{{b}_{j}}\ge 0}{{\text{max}}}{R}^{T}$$
(10)

The feature threshold must be suitably selected to maximize the system's utility. The identification system should not miss certain attacks; at the same time, it should not raise too many false alarms, since these require human intervention. Rewarding careful and accurate attack detection encourages the system to do its task well and greatly lowers the need for human involvement. To help the detection agent make better decisions over time, this study proposes a network risk detection method based on reinforcement learning.
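To make Eqs. (5)-(10) concrete, the sketch below computes the utility and the hit/false-alarm rates from the four counts and performs a naive grid search over a single detection threshold; the gain/cost constants and candidate thresholds are illustrative assumptions, and the grid search merely stands in for the RL-based threshold optimization used in the model.

```python
import numpy as np

# Assumed gains/costs for Eq. (5); illustrative values only.
Q0, Q1, C0, C1, C2 = 5.0, 1.0, 1.0, 0.5, 3.0

def episode_reward(M11, M12, M21, M22):
    """R^T = (Q0-C0)*M11 - (C0-C1)*M12 - C2*M21 + Q1*M22 (Eq. (5))."""
    return (Q0 - C0) * M11 - (C0 - C1) * M12 - C2 * M21 + Q1 * M22

def hit_and_false_alarm_rates(M11, M12, M21, M22):
    delta = M11 / (M11 + M21)        # Eq. (7): TP / (TP + FN)
    beta = M12 / (M12 + M22)         # Eq. (8): FP / (FP + TN)
    return delta, beta

def best_threshold(entropy_scores, labels, candidates=np.linspace(0.1, 2.0, 20)):
    """Pick the threshold theta maximizing R^T over a labelled trace (cf. Eq. (10))."""
    best_theta, best_r = None, -np.inf
    for theta in candidates:
        flagged = entropy_scores > theta
        M11 = int(np.sum(flagged & (labels == 1)))      # true alarms
        M12 = int(np.sum(flagged & (labels == 0)))      # false alarms
        M21 = int(np.sum(~flagged & (labels == 1)))     # missed attacks
        M22 = int(np.sum(~flagged & (labels == 0)))     # benign passed through
        r = episode_reward(M11, M12, M21, M22)
        if r > best_r:
            best_theta, best_r = theta, r
    return best_theta, best_r
```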

Figure 4 shows the scenario of an IoT network attack. As the attacker varies attack variables such as attack type and rate, the classification boundary at which traffic is considered malicious may change. Therefore, this research applies an RL model to the attack identification system to continually update the classification boundary, adapting to novel IoT threats. Entropy-based metrics are commonly used for anomaly discovery in intrusion detection. In information theory, entropy quantifies the uncertainty associated with a data variable: more unpredictability in the variable corresponds to a larger entropy value, whereas low entropy means the variable's distribution is concentrated, which may indicate the presence of an abnormality in the present system. Anomaly detection is made harder by the wide variety of protocols, platforms, hardware, and software, each exposing different vulnerabilities to attack. Meanwhile, new low-rate attacks make it more difficult to distinguish safe traffic from harmful traffic, and attackers are developing the capability to modify attack techniques and even design new attacks based on feedback from the environment, which necessitates an immediate response from the defence. The suggested DRL-NARPP model increases the anomaly detection ratio, attack prediction accuracy ratio, and network risk assessment ratio, and reduces the false positive rate compared with other existing methods.
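As a small illustration of the entropy metric described above (the traffic windows below are hypothetical), a concentrated distribution of a feature within a time window yields low entropy, which can signal an anomaly such as a flooding attack:

```python
import numpy as np
from collections import Counter

def window_entropy(values) -> float:
    """Shannon entropy (bits) of the value distribution within one time window."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

normal_window = ["10.0.0.1", "10.0.0.7", "10.0.0.3", "10.0.0.9", "10.0.0.5"]
attack_window = ["10.0.0.1"] * 5          # one source dominates the window
print(window_entropy(normal_window))      # higher entropy: diverse sources
print(window_entropy(attack_window))      # 0.0: concentrated distribution
```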

Fig. 4

A scenario of IoT network attack

4 Experimental Outcomes and Discussion

This study presents the Deep Reinforcement Learning-assisted Network Awareness Risk Perception and Prevention Model (DRL-NARPP) for detecting IoT network malicious activity in cybersecurity. The data are taken from the Edge-IIoTset cybersecurity Kaggle dataset [33]. The data contain both normal network activity and several malicious attacks. More than ten IoT devices (including an ultrasonic sensor for detecting water levels, low-cost digital sensors for temperature and humidity, a flame detector, a heart rate sensor, a pH sensor meter, a soil moisture sensor, and so on) contribute to the IoT data stream. The dataset covers fourteen attacks against IIoT and IoT communication protocols, grouped into five categories: information gathering, denial-of-service/distributed denial-of-service, man-in-the-middle, malware, and injection. The dataset allows the efficacy of machine learning strategies to be analysed under both centralized and decentralized learning conditions, and the main exploratory data analysis results on this realistic cyber security dataset are provided with it. The performance of the suggested DRL-NARPP model is examined using metrics such as the anomaly detection ratio, attack prediction accuracy ratio, network risk assessment ratio, and false positive rate.
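A hedged sketch of how such a dataset could be loaded and pre-processed follows; the CSV filename and label column name are placeholders and may not match the actual Kaggle export.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

# Placeholder filename and column names; adjust to the actual Kaggle export.
df = pd.read_csv("edge_iiotset.csv")
df = df.dropna().drop_duplicates()

label_col = "Attack_type"                             # assumed label column
y = LabelEncoder().fit_transform(df[label_col])
X = df.drop(columns=[label_col]).select_dtypes("number")
X = MinMaxScaler().fit_transform(X)                   # scale features to [0, 1]
print(X.shape, len(set(y)))
```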

(i) Anomaly Detection Ratio

Detecting attacks and other anomalies in IoT network infrastructure is becoming more important as such systems see widespread use across industries and the threats against them increase. Attacks and abnormalities that may bring down an IoT system include data type probing, denial of service, malicious control, scanning, malicious operation, spying, and incorrect setup. Moreover, attacks against IoT network systems may spread over a wider region, damaging far more devices than a conventional attack on a local network would. Microservices in the IoT network that exhibit sporadic behaviour and disturb the consistency of IoT service operation constitute an anomaly. RL's flexibility in responding to unfamiliar situations is an attractive feature here: an IoT anomaly detection system based on RL may adjust its behaviour over time to account for changing network and device conditions. Looking for anomalies commonly involves making a series of choices depending on the system's changing condition, and RL's strength in learning optimal solutions across sequences of actions makes it well suited to this type of problem. Q-learning aims to train agents to take the best possible actions in highly uncertain environments, and environments with huge state spaces pose no problem for deep Q-learning, since the network approximation of the Q-function (which estimates the cumulative reward for each action in a given state) makes them tractable. Figure 5 shows the anomaly detection ratio.

Fig. 5

Anomaly Detection Ratio

(ii) Attack Prediction Accuracy Ratio

This research provides an RL-based attack identification model that can automatically learn and identify changes in attack patterns, allowing it to adapt to novel features of attacks on IoT networks. The DRL attack prediction module uses a mini-batch encoding technique to bring reinforcement learning into a supervised learning setting, which enhances accuracy. In addition, the policy networks used in this part of the model are deliberately kept simple to maximize efficiency. Feature selection improves IDS model performance by removing superfluous features and maximizing prediction precision and efficiency (an illustrative feature-selection step is sketched below). Compared to support vector machines (SVM), hidden Markov models, and other ML or data mining techniques, experimental findings obtained from system demand trace data demonstrate the superiority of the suggested DRL-based network IDS in terms of higher accuracy and reduced computing cost. Figure 6 shows the attack prediction accuracy ratio.
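The snippet below is an illustrative feature-selection step (the estimator and threshold are assumptions, not the paper's exact configuration), reusing the X and y arrays from the earlier loading sketch.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Keep only features whose importance is above the median importance.
selector = SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0),
                           threshold="median")
X_reduced = selector.fit_transform(X, y)
print(X.shape[1], "->", X_reduced.shape[1], "features kept")
```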

Fig. 6

Attack Prediction Accuracy Ratio

(iii) Network Risk Assessment Ratio

Cyber awareness refers to the knowledge and comprehension end users have of cybersecurity best practices and of the everyday cyber dangers their networks or organizations face. Using reinforcement learning, especially Q-learning, for network risk assessment entails teaching an agent to take actions that lessen vulnerability across the board. This requires a reward structure that incentivizes the RL agent to take measures that secure the network: actions that contribute to a more secure network are rewarded, for instance by awarding points for spotting and fixing security flaws while deducting them for ignoring an impending danger. The trained model's effectiveness in reducing network risks is evaluated on a distinct dataset or in a simulated environment, testing the agent's decision-making skills in various settings. Figure 7 shows the network risk assessment ratio.

Fig. 7

Network Risk Assessment Ratio

(iv) False Positive Rate

The false positive rate in a network can be measured through the intrusion detection rate. Assessing the efficacy of intrusion detection systems (IDS) continues to depend heavily on balancing the risks of false negatives and false positives. Confusion between normal and abnormal behaviour can produce both false positives and false negatives when simple detection systems are employed. Several DoS and DDoS attacks can masquerade as regular traffic, and detecting them effectively requires studying numerous elements of network behaviour. Relying on a single data point (such as link usage) may lead to erroneous conclusions in settings where heavy use of a certain resource is to be expected. Because of this, methods must be developed that can tell the difference between attack activity and legitimate programs that consume a lot of resources. Figure 8 shows the false positive rate.

Fig. 8

False Positive Rate

Table 1 shows the confusion matrix for classifying phishing. Cyberattacks such as ransomware, insider threats, phishing, botnets, and malware are a constant reality today, the situation continues to worsen, and the volume of data that might be compromised is enormous and ever-increasing. By including a learning process geared towards email type, concealed malware, or compromised URLs, reinforcement learning can be used for spam and phishing detection. The corresponding classification problem aims to identify phishing attacks within a data collection containing both spoofed and authentic instances. The false positive (FP) rate measures the number of legitimate instances erroneously recognized as phishing attacks relative to all actual legitimate occurrences, as shown in Eq. (11).

Table 1 Confusion matrix

                        Classified as phishing    Classified as legitimate
Actual phishing         \({M}_{P\to P}\)          \({M}_{P\to L}\)
Actual legitimate       \({M}_{L\to P}\)          \({M}_{L\to L}\)

$$FP=\frac{{M}_{L\to P}}{{M}_{L\to L}+{M}_{L\to P}}$$
(11)

As shown in Eq. (11), \({M}_{P\to P}\) denotes the number of phishing instances that are correctly classified as phishing, \({M}_{L\to P}\) the number of legitimate instances that are erroneously classified as phishing, \({M}_{P\to L}\) the number of phishing instances that are incorrectly classified as legitimate, and \({M}_{L\to L}\) the number of legitimate instances that are correctly classified as legitimate.
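A small numeric illustration of Eq. (11) with hypothetical counts:

```python
# False-positive rate from the confusion-matrix counts in Eq. (11);
# the counts below are hypothetical example values.
M_LL = 950   # legitimate classified as legitimate
M_LP = 50    # legitimate misclassified as phishing (false positives)

fp_rate = M_LP / (M_LL + M_LP)
print(f"FP rate: {fp_rate:.3f}")   # 0.050 for these example counts
```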

5 Conclusion

This study presented the Deep Reinforcement Learning-assisted Network Awareness Risk Perception and Prevention Model (DRL-NARPP) for detecting IoT network malicious activity in cybersecurity. The research explored the application of reinforcement learning, specifically Q-learning, in the context of network awareness, risk perception, and adaptive network prevention. The objective was to design a smart system that can continuously evolve in response to the ever-changing nature of cyber threats. Using Q-learning, the network can dynamically modify its preventative measures in response to a constantly changing threat landscape; this flexibility is essential for successfully fighting evolving cyber threats, some of which may exhibit previously unanticipated patterns. The agent was shown to perceive and evaluate network threats accurately thanks to its reinforcement learning capabilities: as it accumulated experience by interacting with the system and analyzing collected data, it became more adept at spotting security flaws and dangers in the network, and over time it learned decision-making techniques that reduced false alarms while responding effectively to genuine security risks. The Q-learning model optimized preventative measures through continual interaction with the network environment; as part of this process, the study adjusted firewall rules, updated security configurations, and implemented preventative measures to lower the risk level. The suggested method uses a decentralized approach; however, it does not account for computational and network costs (such as delays between agents in routers and the central intrusion detection system).