An edge based hybrid intrusion detection framework for mobile edge computing

The Mobile Edge Computing (MEC) model attracts users to its services through its characteristics and rapid delivery approach. This network architecture enables users to access information from the edge of the network. However, securing this edge network architecture is a major challenge. All MEC services are shared and accessed by users via the Internet. Attacks such as user-to-root, remote login, Denial of Service (DoS), snooping, and port scanning are possible in this computing environment because services are delivered remotely over the Internet. Intrusion detection protects the network by detecting attacks. Existing detection models detect only known attacks, cannot identify new unknown attacks, and are inefficient at monitoring real-time network traffic. Hence, there is a need for an Edge-based Hybrid Intrusion Detection Framework (EHIDF) that not only detects known attacks but is also capable of detecting unknown attacks in real time with a low False Alarm Rate (FAR). This paper proposes an EHIDF that mainly relies on a Machine Learning (ML) approach to detect intrusive traffic in the MEC environment. The proposed framework consists of three intrusion detection modules with three different classifiers. The Signature Detection Module (SDM) uses a C4.5 classifier, the Anomaly Detection Module (ADM) uses a Naive Bayes classifier, and the Hybrid Detection Module (HDM) uses the Meta-AdaBoostM1 algorithm. The developed EHIDF addresses the present detection problems by detecting new unknown attacks with low FAR. The implementation results show that the EHIDF achieves an accuracy of 90.25% and a FAR of 1.1%. Compared with previous works, accuracy improves by up to 10.78% and FAR is reduced by up to 93%. A game-theoretical approach is also discussed to analyze the security strength of the proposed framework.


Introduction
Mobile Edge Computing is a distributed computing environment in which information technology and networking-based elements such as hardware, memory, and virtual resources are integrated with telecommunications networking elements such as mobile base stations [1,2]. It extends the capabilities of cloud computing and IT services to the edge of the cellular network. This technology was primarily based ... MEC technology is specially designed for compute-intensive and latency-sensitive applications such as augmented reality [3], real-time data analysis [4], the Internet of Things (IoT) [5], video analytics, indoor positioning, smart vehicles [6], smart IoT-based healthcare systems [7][8][9][10], and image processing.
Author contact: Ashish Singh, ashishashish307@gmail.com
However, in mobile edge computing, multiple smart applications use the same edge device simultaneously, each with a different set of users. This increases the security risk for edge devices: if an edge device is compromised, it can accept inaccurate input data and produce false output and incorrect results. This affects the performance of any sensitive application. The data on a compromised edge device can also be misused for other unwanted or illegal purposes. Generally, MEC faces two major types of attacks. In the first, the edge device itself is compromised after authentication. Because devices are given certain privileges for activity inside the network, any insider can use the device to perform unwanted activity. These are called insider attacks or unauthorized attacks [11], through which the attacker tries to attack the edge layer or cloud layer. In the second, an unauthenticated edge device tries to attack the device layer or cloud layer using advanced techniques. This type of attack is called an unauthenticated attack or outsider attack. Insider attacks are very difficult to identify because a mobile edge computing environment follows a multitenant architecture in which resources are shared among different applications. Hence, attacks by malicious edge devices are a major drawback of the mobile edge computing environment. The firewall is the first line of defense for such networks. However, a traditional firewall can only block packets arriving from outside through the Internet; it cannot filter malicious packets generated inside the network by an insider attack [12,13]. Because of the variability and complexity of edge nodes, such firewall-based solutions are not effective. In addition, most MEC devices are resource constrained, and managing a large number of firewalls to implement security solutions is costly.

Motivation
Mobile Cloud Computing (MCC) brings the benefits of cloud computing to mobile users [1]. It provides users with higher data storage capabilities and many sophisticated applications and services. At the same time, it imposes a large additional load on radio and mobile networks and introduces high latency, which is unhelpful for real-time applications. Consider an application that processes real-time video streams or monitors traffic. In such streaming-based transmission, the large number of access requests and the fast growth in computing capabilities may reduce the efficiency of the network. This reduction in the efficiency of the mobile network is due to constraints on energy, bandwidth, latency, etc. Hence, cloud-based applications suffer from latency and network-related bottlenecks. Mobile edge devices remove these limitations of cloud computing by offering lower latency and jitter. Therefore, there is an immediate need for an edge-based framework for such real-time applications. Another common problem is that malicious attacks such as Distributed DoS (DDoS), ransomware, remote recording, routing attacks, and data leakage attacks are the most probable security threats faced by the MEC system [14]. Among these threats, the DDoS attack receives particular attention: a large number of compromised machines launch a DoS attack on the MEC environment to disrupt network service and reduce service efficiency [14]. In this attack, the intruder overloads the MEC server with false request packets to make its resources unavailable and increase the response time. Many security mechanisms, such as access control algorithms, homomorphic encryption, public key-based security algorithms, and intrusion detection mechanisms, have been considered to protect the MEC network. However, efficient security solutions are not yet available for MEC.
This motivates research in MEC to build an efficient Intrusion Detection System (IDS) that can identify all the attacks mentioned above and secure mobile edge devices so that real-time data can be transmitted efficiently.

Need of hybrid IDF
Signature-based IDS (SIDS) is a popular method for intrusion detection because it can remain effective when the signature database is updated regularly. It generally relies on signatures that capture the polymorphic behavior of malware. However, it has some limitations. First, it often fails the similarity test because the incoming signature does not match the signatures stored in the IDS database, so malicious packets can enter the system very easily. Second, as the signature database grows, the IDS takes longer to analyze and process large volumes of incoming data packets. Third, it cannot detect zero-day attacks, that is, new unknown attacks, because their signatures are not available in the database even after regular updates. Thus, SIDS fails to detect all known and unknown attacks.
In contrast, Anomaly-based IDS (AIDS) can detect all attacks, including unknown ones, using different ML-based, statistical-based, and knowledge-based techniques. AIDS mainly creates profiles of normal behavior. However, it also has some limitations. First, if the attack pattern is general in nature, it may fail to identify several intrusions, which gives a poor detection rate. Second, AIDS can identify packets only if their profiles are very specific; otherwise, several normal behaviors of network traffic may be classified as attacks. This leads to a high false-positive rate and a large number of alarms generated by the AIDS decision module. Third, most AIDS models are trained once at the beginning and then used for a long time. In many cases, dynamic changes in network traffic make it necessary to retrain the model on new attack behavior.
Thus, both SIDS and AIDS have strengths and weaknesses. To overcome the limitations mentioned above, a Hybrid Intrusion Detection Model (HIDM) is developed using an ensemble of both techniques. Popular ensemble methods in the literature include bootstrap aggregating (bagging), boosting, and stacking. In this model, the boosting strategy is used because it builds the ensemble incrementally, with each new model focusing on the training instances misclassified by the previous models.
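The boosting idea described above, in which each new model concentrates on the instances that earlier models misclassified, can be illustrated with a minimal AdaBoost.M1-style sketch. It is an illustration only and not the implementation used in this work: the weak learner (a one-feature threshold rule, or decision stump), the function names, and the toy data in the usage note are all assumptions made for the example.

```python
import math


def train_stump(X, y, w):
    """Return the one-feature threshold rule with the lowest weighted error."""
    labels = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for left in labels:
                for right in labels:
                    err = sum(wi for row, yi, wi in zip(X, y, w)
                              if (left if row[f] <= thr else right) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, thr, left, right)
    return best  # (weighted error, feature index, threshold, left label, right label)


def stump_predict(stump, row):
    _, f, thr, left, right = stump
    return left if row[f] <= thr else right


def adaboost_m1(X, y, rounds=10):
    """Build a weighted ensemble of stumps (hypothetical helper, AdaBoost.M1 style)."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform instance weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:                     # weak learner no better than chance: stop
            break
        beta = max(err, 1e-10) / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), stump))
        if err == 0:                       # perfect learner: nothing left to reweight
            break
        # Shrink the weights of correctly classified instances, so that the
        # misclassified ones dominate the next round, then renormalize.
        w = [wi * beta if stump_predict(stump, row) == yi else wi
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote: each stump votes for its label with weight log(1/beta)."""
    votes = {}
    for alpha, stump in ensemble:
        label = stump_predict(stump, row)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)
```

On a toy one-dimensional dataset in which the attack values lie between two normal ranges, a single stump cannot separate the classes, whereas three boosted stumps classify every training instance correctly. This is the behavior that motivates the choice of boosting here.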

Problem formulation
The intrusion detection mechanism is the most common security solution capable of identifying malicious activity or attacks. An IDS helps to find intrusive behavior in network traffic that may harm the system. It is mainly composed of an agent, an analysis engine, and a response module. Several IDS-based security solutions have been developed in previous years [15][16][17][18][19][20][21][22][23][24][25]. However, these solutions cannot cope with the MEC architecture because the behavior of users and devices keeps changing. Most IDS solutions analyze network traffic to detect attacks. Today, however, the volume of network traffic is huge, and it comes from MEC devices that are heterogeneous in terms of protocols, manufacturers, and applications. Hence, network traffic analysis solutions based on anomalies or signatures are not suited to detecting malicious activity or intrusion in the MEC environment. One of the challenging tasks in IDS is building a behavior framework, containing a rule set and pattern analysis, that differentiates normal from abnormal behavior using collected historical data. Attack patterns now change daily, which makes new attacks very difficult to detect. Hence, achieving intrusion detection with good performance and low time complexity is challenging. This discussion points to two major problems. The first is that most current ML-based IDS do not efficiently detect unknown or new attacks, also known as zero-day attacks. The second is that computational complexity increases as the FAR is reduced. Because the devices are resource constrained, performance also degrades.

Problem solution
An edge-based hybrid IDS framework is essential to solve these identified problems in the MEC architecture. The proposed framework can detect unknown or new attacks while keeping the computational complexity of the system low. Usually, signature-based models cannot detect unknown attacks, whereas anomaly detection models detect both known and unknown attacks at the cost of a higher FAR. These limitations can be overcome by using hybrid models. In designing the hybrid model, boosting is adopted as the ensemble learning strategy; it combines numerous models rather than relying on a single model to improve ML performance [26]. This approach addresses both identified issues. Hence, the main objective of this research work is to develop a mobile edge-based hybrid IDF that detects known attacks and can also identify new unknown attacks. The developed edge-based hybrid IDF can protect the MEC environment from malicious attacks with minimal overhead on MEC devices.
The key contributions of the work are summarized below:
• An edge-based hybrid IDF is proposed for the MEC environment, which not only detects known attacks but is also able to detect unknown or new attacks.
• The proposed edge-based hybrid IDF comprises a hybrid detection module that includes SIDS and AIDS modules.
• A game-theoretical approach is followed to analyze the security strength of the edge-based IDF.
• The proposed framework is demonstrated and analyzed to measure the performance of the system. The experimental results and security analysis of the model show high accuracy and the ability to detect unknown or new attacks.
The remainder of the paper is structured as follows. The background is discussed in "Background". "Proposed edge-based hybrid IDF" presents the proposed hybrid edge-based IDF. The implementation results are explained in "Experimental results and performance analysis". A game-theoretical approach to the security analysis is discussed in "Security analysis". The conclusion is given in "Conclusion and future scope".

Edge-based IDS security architecture
Most current enterprise security solutions use a cloud architecture in which service providers are responsible for satisfying all security requirements. However, given the ad hoc environment and low-latency requirements of applications such as MEC and IoT, these security solutions are not scalable. The edge computing concept provides a new way to design and deploy security solutions for the MEC environment. The main objective of any edge-based security solution is to satisfy all security requirements at the network's edge.
This paradigm deploys security solutions at the edge computing layer rather than in the cloud. It can also offload processing tasks from end devices by providing computation and storage capacity at the edge network. Hence, the edge computing paradigm not only improves the security level of the system but also reduces network latency and congestion. Edge-based security architecture is mainly classified into user-centric, device-centric, and end-to-end security [27]. An edge-based IDS can work under uncertain conditions where the data size is large and a fast response is needed. A mobile edge computing architecture with IDS is shown in Fig. 1. The architecture mainly comprises three layers: the end-user layer, the mobile edge networking layer, and the data storage layer. The data storage layer consists of resources, information, and services with security features. First, at the end-user layer, smart end users connect to the system using edge devices. Then, all raw traffic from edge applications goes to the mobile edge networking layer, where most of the essential security features are deployed. In this work, the Edge-based Hybrid IDF is deployed in this layer. The responsibilities of this layer include reducing network latency, satisfying real-time needs, offloading heavy computational tasks, processing data quickly, providing security features, and monitoring all traffic data. After the data is processed, the data storage layer stores it in the cloud.

IDS overview and limitations
Generally, four types of IDS are used to secure the MEC environment: Host-based IDS (HIDS) [28], Network-based IDS (NIDS), Hypervisor-based IDS, and Distributed IDS (DIDS). Host-based IDS monitor and analyze information collected from a specific host machine. Network-based IDS detect network intruders by comparing the current behavior of network traffic with previously observed traffic behavior in real time. DIDS are composed of many HIDS/NIDS that monitor traffic over a large network. Hypervisor-based IDS are used specifically in cloud computing for intrusion detection in a virtual environment; they allow the user to monitor and examine communications between Virtual Machines (VMs). These IDS use different types of intrusion detection techniques, based on: Signature [29][30][31][32][33], Anomaly [34][35][36][37][38], Artificial Neural Network (ANN) [39][40][41][42][43], Fuzzy Logic [44][45][46][47], Association Rule [34,48,49], Support Vector Machine (SVM) [50][51][52], Genetic Algorithm (GA) [53][54][55][56][57], and Hybrid Technique [58]. Signature-based IDS mainly detect intrusion by matching captured patterns against previously generated pattern databases. Thus, they cannot detect unknown attacks, which produces high FAR for unknown attacks. Anomaly-based IDS detect unknown attacks but with low accuracy. Because of this drawback, ANN-based IDS have been proposed in the literature. ANN-based IDS [43] classify unstructured network packets efficiently using multiple hidden layers, but they require more time and more training samples. Fuzzy logic-based IDS and association rule-based IDS give good results for some uncertain problems, but their detection rate is lower. SVM-based IDS can correctly classify intrusions for a given sample of data and can handle a large amount of preprocessed data with high accuracy. In GA-based IDS, complexity increases along with computational cost, whereas hybrid IDS are an efficient approach to accurately classifying rules.
A mobile IDS consists of handheld wireless devices (smartphones or other mobile devices) with intrusion detection capabilities [59][60][61][62][63][64][65]. It forms a self-configurable network in which the system is deployed automatically and very quickly without the help of any third party. As IDS technology moved from stand-alone computers to mobile devices, various design and implementation constraints arose. In such a mobile-based IDS, battery power, limited energy supply, constrained resources, limited processing capability, low memory, and low sensing range are some of the challenging issues. The ad hoc nature and independently running nodes of a mobile IDS make the system more vulnerable, and the IDS might not be sufficient to detect intrusions. In some cases, mobile applications run on a specific platform that allows third-party applications. Thus, unique vulnerabilities and new intrusive traffic cannot be identified using smartphones. Because of the resource-constrained nature of mobile IDS, the response time to suspicious activity may be high, and the ability to visualize traffic patterns is limited. Physical security of mobile nodes, the absence of a central security management point, a single point of failure, undefined network boundaries, and weak cryptographic support are limitations of mobile IDS that should be kept in mind when intrusive packet identification is attempted using a mobile device.

Related works
A literature survey has been carried out to examine recent findings on IDS solutions in MEC networks [66][67][68][69][70][71][72][73][74]. A firewall architecture has been designed to protect the edge network from insider attacks [75]. This architecture supports the correct, non-bypassable, and tamper-resistant characteristics expected of any protection system. A deep learning approach for intrusion detection in Internet society has been discussed in [17]. Recurrent Neural Networks (RNN) perform binary and multi-class classification, which helps measure the system's performance. High computational processing was observed in such a system, which reduces its efficiency [76]. A Distributed Intrusion Detection System (DIDS) has been proposed in [77]. This work aims to reduce the false alarm rate in DIDS-based edge computing environments. They also reduced the response ... [16]. The proposed system can detect intrusive activities in the EoT network. The proposed framework consists of data collection, feature extraction, and classification modules. However, the computational requirements and cost of this model are high. The security of the Internet of Things (IoT) network is an important issue. To address it, a robust IDS has been proposed in [18]. This approach comprises a multi-agent system, blockchain, and deep learning algorithms. The system's efficiency is high, but combining three different approaches increases the system complexity and the response time. For the IoT infrastructure, a device-edge-based IDS has been proposed in [19]. The IDS is built using behavioral profiles and system-level information. The unique split architecture supports effective detection with minimal latency. However, the system architecture is computationally heavy. An IDS has been developed in [15] for the internet industry.
They have also designed the concept of a Cloudlet on which edge-based IoT devices are deployed in urban areas. The proposed model consists of a microcontroller module, a mobile application module, and a database module. However, the security efficiency and performance of the model are low. A network IDS has been proposed in [20] for mobile edge computing. This technique captures all tcpdump packets, extracts and analyzes the features, and forwards a packet into the network if it is identified as legitimate. A topic model is trained to learn the behavioral pattern of a normal packet. However, the detection accuracy suffers when new types of packets enter the network. A data-driven mimicry and game theory-based IDS has been proposed in [78]. New attacks are investigated based on the game income of participants and game balance points in edge computing networks. They also try to reduce the cost of the IDS. A distributed attack model based on traffic inspection and classification has been proposed in [79] for IoT applications. They have also leveraged the flexibility of cloud-based architectures together with edge computing architecture. This model shifts more computational work to the cloud. However, a traffic classification-based mechanism does not give accurate results when new traffic enters the network. An IDS for smart connected vehicles has been proposed in [80]. They try to meet user requirements such as quality of service (QoS) and quality of experience (QoE). The detection module contains data traffic analysis, reduction, and classification techniques. Deep belief and decision tree ML mechanisms are used for data reduction and classification, respectively. A Multilayer Perceptron (MLP)-based lightweight IDS has been proposed in [81]. Vector space representation and a single-hidden-layer MLP provide a reasonable detection rate, which also improves the security of the fog computing environment.
However, the computational cost of the proposed model is high, which can increase the detection time. An IDS for the multi-access edge computing environment has been proposed in [82]. The evaluation criteria consider both the centralized and the distributed collaborative environments in which the IDS is deployed. The Distributed Hash Tables (DHTs) used in Peer-to-peer (P2P) communication may reduce the data transmission overhead. Cyberattacks in fog-assisted networks are possible if a proper security solution is not implemented [83]. The artificially full-automated IDS developed in [83] may protect the system against cyberattacks. An IDS based on multi-layered deep recurrent neural networks has been designed to secure end users and IoT devices. An IDS for Vehicular Edge Computing has been proposed in [84]. The proposed cooperative IDS offloads the training task from the distributed edge devices. The federated learning approach reduces resource utilization on the edge server. Blockchain is used to provide security and privacy while training information is shared between edge devices. A novel two-phase cycle algorithm has been proposed in [85] for detecting cyber intrusion in an edge computing environment. The KDD Cup 1999 benchmark dataset is used to validate the performance of the model, which achieved 98.81% accuracy and a 98.23% detection rate. However, the model is validated on an old dataset, and its performance should also be measured on a recent dataset. A deep learning approach has been used in [86] for host intrusion detection in an edge-IoT environment. They achieved 99.74% accuracy and an attack prediction time of 1 μs. An IDS based on the Lambda architecture has been proposed to secure edge-cloud-based IoT systems [87]. The proposed model uses a deep learning approach for the detection analysis.
The edge-cloud concept reduces the training time compared to traditional ML algorithms and improves the attack detection accuracy. A hybrid algorithm for detecting intrusive packets has been proposed in [88]. The approach uses Artificial Bee Colony (ABC) and Artificial Fish Swarm (AFS) algorithms for Internet traffic classification. The CART technique is applied to the selected features to generate If-Then rules that classify intrusive packets. A misuse-based hybrid IDS has been proposed in [89]. This hybrid approach combines two different techniques in one unit. Packet traffic and network traffic anomaly detection combined with Snort make up an anomaly-based hybrid approach. This approach analyzes system activities and determines a matching score from the relation between system activities and the known activities defined in the system. The hybrid system can detect up to 146 attacks. A multi-classifier-based approach has been used in [90] to develop a hybrid IDS algorithm. It combines AIDS and SIDS for the detection of intrusive packets. The SIDS uses a C5 decision tree classifier, after which a one-class Support Vector Machine algorithm is used to split the dataset, which gives a better result than the previous model. They achieved 83.24% accuracy. Al-Yaseen et al. [91] developed a multilevel hybrid model that uses SVM and an extreme learning machine. This model performs well on unknown attacks with low FAR on the KDD Cup 1999 dataset. An anomaly and signature-based hybrid intrusion detection approach has been proposed in [92] for the detection of DDoS attacks. Integrating the output of both methods makes the system hybrid and enhances the overall accuracy. The proposed model is implemented on two datasets (the DARPA 2000 dataset and a commercial bank dataset from a penetration test). A hybrid IDS has been proposed in [93] for the security of smart home systems.
They used multiple ML techniques, including random forest, decision tree, XGBoost, and misuse detection, to implement the work. A hybrid IDS approach has been followed in [94] for the security of Internet-connected smart vehicles. A multi-tiered hybrid IDS incorporates a SIDS and an AIDS to detect both known and unknown attacks on vehicular networks. The experimental results show that the model achieves up to 99.99% accuracy. From the above discussion, hybrid IDS can be used to improve detection performance in terms of FAR and unknown attack detection rate. However, because of multiple repetitive operations, complexity and overhead increase. Some limitations of the works in the literature are tabulated in Table 1.
The following research problems have been identified in existing IDS solutions for MEC from the surveyed literature.
• Despite all the advantages of ML-based IDS in MEC, there is a lack of a hybrid model that can address the identified research objective.
• Most ML-based IDS in MEC give high accuracy but also high FAR.
• The difficulty in existing MEC systems is achieving low FAR with minimal overhead, since the devices in the MEC environment are constrained.
• Hybrid IDS can reduce the FAR but increase complexity and running time.

Attacks on mobile edge computing
• Denial of Service (DoS) attack: An attacker sends many packets to a target victim from an innocent computer or host in the network. These false packets exhaust most of the bandwidth and tie up all the resources. This can lead to flooding attacks and zero-day attacks in which a genuine user's request cannot proceed or the user cannot access the services.

Proposed edge-based hybrid IDF
This section discusses the working of the proposed EHIDF, which includes the SDM, ADM, and HDM. The UNSW-NB15 dataset is used for experiments and testing. The SDM uses a C4.5 classifier, the ADM uses a Naive Bayes classifier, and the HDM uses the Meta-AdaBoostM1 algorithm for classification and attack detection. The proposed EHIDF is shown in Fig. 2.

5. Fuzzers: In this attacking technique, the attacker tries to discover loopholes in the system by feeding semi-randomly generated data into the system so that the system crashes.
6. Generic: A type of attack against a cryptographic primitive without knowing the configuration of the block cipher.
7. Reconnaissance: Contains all the processes and techniques that help determine information about the target system.
8. Shellcode: A small chunk of payload script used to exploit a flaw in the program.
9. Worms: Malware that replicates itself from one network to another without any human interaction.

... of TCP/IP, including some of the attributes related to HTTP service. The time features include the time attributes of the events that occur in the networks. The additional features include a general-purpose feature and a connection feature. The general-purpose feature is used for its own purpose, whereas connection features include the flow of 100 sequential order of record connections. The description of each feature is shown in Fig. 3. The distribution of the total number of instances in the training and testing datasets is presented in Table 3. The set of instances is divided 70%:30% for training and testing purposes, respectively.
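A minimal sketch of a 70%:30% split is given here for illustration. The text does not state how instances are assigned to the two sets, so the random shuffle, the fixed seed, and the function name `split_70_30` are assumptions.

```python
import random


def split_70_30(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and testing parts."""
    rng = random.Random(seed)      # fixed seed (assumption) for repeatable splits
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With ten records this returns seven training and three testing records, and every record appears in exactly one of the two sets.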

Data pre-processing
In the data pre-processing phase, the size of the original UNSW-NB15 dataset is reduced by eliminating redundant data. A clustering approach is used to identify similar data behavior across the different categories present in the dataset. It starts by removing the labels from the dataset. The silhouette coefficient is applied to determine the required number of clusters and the cluster quality. The k-means clustering algorithm is run with varying values of k to produce different cluster configurations, and the silhouette value of each configuration is measured. The smallest value of k that attains the maximum silhouette coefficient is selected for further testing. Figure 4 shows that the silhouette value is maximum (0.7) when the number of clusters is 11. The dataset instances are distributed across 11 clusters, as shown in Table 4. Data are randomly selected from each cluster for training and testing. The training data contains approximately 70% of the instances, and the remaining 30% are used for testing. The detailed description of the reduced dataset is given in Table 5.
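The selection of k by silhouette coefficient can be sketched as follows. This is a small pure-Python illustration rather than the pipeline used on UNSW-NB15: the deterministic farthest-first initialization, the Euclidean distance, and all function names are assumptions made so that the example is self-contained and repeatable.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding (assumption): repeatedly add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Plain k-means; returns one cluster index per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:           # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:                    # keep the old center if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette value; singleton clusters contribute 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, candidates):
    """Return the smallest k that attains the maximum mean silhouette."""
    best_k, best_score = None, -1.0
    for k in sorted(candidates):
        score = silhouette(points, kmeans(points, k))
        if score > best_score:             # strict '>' keeps the smallest k on ties
            best_k, best_score = k, score
    return best_k, best_score
```

For nine points arranged in three well-separated groups, `choose_k(points, range(2, 6))` selects k = 3. The same loop over k, applied to the unlabeled dataset, corresponds to the procedure that yields 11 clusters in Fig. 4.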

Feature selection and ranking
The UNSW-NB15 dataset consists of a total of 47 features. In this phase, attributes (features) are selected from the dataset using the Information Gain value, which indicates the importance of a feature in a given dataset. A feature with a high information gain value is more significant than other features. The Information Gain IG(S, F) of a feature F in a given dataset (size = S) is calculated using Eq. 1.

$$IG(S, F) = EN(S) - EN_F(S) \tag{1}$$

$$EN(S) = -\sum_{k} p_k \log_2 p_k, \qquad EN_F(S) = \sum_{v \in V(F)} \frac{S_v}{S}\, EN(S_v) \tag{2}$$

where the entropy EN(S) is the degree of non-homogeneity in the given dataset (size = S). It is computed using Eq. 2, where p_k is the proportion of instances of class k. EN_F(S) is the extra information needed for classification when feature F is selected. It can be defined by using Eq. 2, where V(F) is the set of all distinct values of feature F, and S_v is the number of tuples for which feature F has value v.
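The information gain ranking can be sketched in a few lines. The sketch uses the standard definition, in which the gain is the entropy of the class labels minus the weighted entropy remaining after splitting on the feature, and assumes discrete feature values; continuous UNSW-NB15 features would need to be discretized first. The function names and the feature names in the usage note are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (feature name, information gain) pairs, highest gain first."""
    gains = {name: information_gain([row[i] for row in rows], labels)
             for i, name in enumerate(names)}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

A feature that separates the classes perfectly obtains a gain equal to the class entropy, while a constant feature obtains a gain of 0. Features with zero gain are the ones that can be discarded.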
The information gain values of all 47 features of the UNSW-NB15 dataset are presented in Table 6. Table 6 shows that the IG(S, F) value of 25 features is 0. Thus, these features are not important because they do not provide any information or

Proposed EHIDF
The proposed EHIDF consists of the following modules.

Signature detection module (SDM)
The SDM is essential for the detection of known attacks. This module is trained using a decision tree model with 15 features. In this model, the UNSW-NB15 dataset is broken into multiple subsets, which are incrementally recombined to build a decision tree. Several decision tree models have been proposed in the literature; each generates a decision based on the records of the dataset. In the proposed model, the C4.5 algorithm is used for the signature-based detection process. The rules are extracted and stored in the rule-based database during the training phase. If a captured signature matches the normal set during the testing phase, the data is inserted into the table with the class attribute set to normal. If an intrusive signature is detected, an alert is generated by the alert generator. The C4.5 algorithm is described in Algorithm 1.

Algorithm 1 C4.5 algorithm (final steps)
Compute entropy
for each attribute a in Data Sample D do
    if Info_Gain > Max_Info_Gain then
        Max_Info_Gain = Info_Gain
        Split_Tree = a
return Decision Tree (T)
END
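The attribute-selection loop of Algorithm 1 can be sketched as follows. This is a simplified illustration and not the classifier used in the SDM: it scores each attribute with the gain ratio (information gain normalised by the split information), which is the criterion C4.5 is generally known for, whereas the listing above speaks of Info_Gain. The records and attribute names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target):
    # Group the class labels by the value of the candidate attribute.
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    return gain / split_info if split_info > 0 else 0.0

def best_split(rows, attrs, target):
    # Mirrors the loop of Algorithm 1: keep the attribute with the best score so far.
    best_attr, best_score = None, 0.0
    for a in attrs:
        score = gain_ratio(rows, a, target)
        if score > best_score:
            best_attr, best_score = a, score
    return best_attr

# Hypothetical records; "label" is the class attribute.
rows = [
    {"proto": "tcp", "state": "FIN", "label": "normal"},
    {"proto": "tcp", "state": "INT", "label": "attack"},
    {"proto": "udp", "state": "INT", "label": "attack"},
    {"proto": "udp", "state": "FIN", "label": "normal"},
]
print(best_split(rows, ["proto", "state"], "label"))
```

A full decision tree would apply `best_split` recursively to each resulting subset until the subsets are pure or no attribute improves the score.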

Anomaly detection module (ADM)
After the signature-based detection process, the detected intrusive signatures are stored in the database, and an alert is generated based on the detection. However, the signatures recognized as normal may contain unknown patterns that are not available in the signature database. As a result, unknown attack signatures may be labeled as normal in the database, and no alert is generated. This is the main reason for applying an anomaly-based detection module. The signature-based detection module produces two types of output: abnormal signatures, which generate alarms, and normal signatures, which are passed through the ADM for the detection of unknown attacks. The ADM processes this input data in two phases, training and testing. In the training phase, the samples are used to train the Naive Bayes classification algorithm described in Algorithm 2. It is a probabilistic model for solving classification problems, and its core is the Bayes theorem. All features in the classification problem are assumed to be independent, meaning that the probability of one feature does not affect the other features. For this reason, it is called Naive Bayes classification.

Algorithm 2 Naive Bayes classification algorithm (final steps)
Calculate the standard deviation and mean of each class.
for each attribute a_n in Data Sample D do
    Calculate the probability of P(i) using Bayes theorem in each class.
    Calculate the likelihood for each class.
Get the maximum likelihood for the classifier.
return A class for testing Data sample
END
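The steps of Algorithm 2 (per-class mean and standard deviation, likelihood per class, maximum likelihood) correspond to a Gaussian Naive Bayes classifier, which can be sketched as below. The Gaussian likelihood, the toy feature values, and the function names are assumptions made for illustration only.

```python
import math
from collections import defaultdict

def fit(X, y):
    # Store the prior and the per-feature (mean, variance) for every class.
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # avoid zero variance
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model

def predict(model, row):
    # Pick the class with the highest log posterior under the independence assumption.
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical two-feature samples (for example, duration and byte count).
X = [[1.0, 200], [1.2, 220], [0.9, 210], [5.0, 900], [5.5, 950], [4.8, 880]]
y = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = fit(X, y)
print(predict(model, [1.1, 205]), predict(model, [5.2, 910]))
```

Working with log probabilities instead of raw products avoids numerical underflow when many features are multiplied together.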

Hybrid detection module (HDM)
The previous modules (signature and anomaly) have some weaknesses in the classification process. The signature module cannot detect unknown classes, whereas the anomaly module has an increased false alarm rate when predicting unknown samples. Hence, to balance detection accuracy against the false alarm rate, a hybrid approach is adopted. In this approach, different classifiers are trained on the same dataset, and their results are then combined. The Meta-AdaBoostM1 algorithm is used to increase detection accuracy. AdaBoost is a boosting technique that builds new classifiers which identify and focus on the instances misclassified by previous classifiers. In this technique, a weak classifier is repeatedly trained on the training data. A decision stump, which is a one-level decision tree with one internal node (the root) connected to terminal nodes, is used as the weak classifier. In each round, the weak classifier is trained again on the same training dataset, and the instance weights are adjusted for more precise classification. The weak classifiers are then combined into a strong classifier; the Meta-AdaBoostM1 algorithm serves as this strong classifier. The Meta-AdaBoostM1 algorithm is described in Algorithm 3.
After that, the individual decisions are combined into the final decision. An alert is generated if the packet is detected as intrusive.
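The boosting procedure described above can be sketched with decision stumps as weak classifiers. The sketch follows the usual AdaBoost.M1 update (weights of correctly classified instances are multiplied by beta = err / (1 - err), and each stump votes with weight log(1 / beta)); it is an illustration under these assumptions and not Algorithm 3 itself. The one-feature toy data are hypothetical.

```python
import math
from collections import defaultdict

def weighted_majority(labels, weights):
    totals = defaultdict(float)
    for lab, w in zip(labels, weights):
        totals[lab] += w
    return max(totals, key=totals.get)

def train_stump(X, y, w):
    # Try every (feature, threshold) pair and keep the stump with the lowest weighted error.
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue
            ll = weighted_majority([y[i] for i in left], [w[i] for i in left])
            rl = weighted_majority([y[i] for i in right], [w[i] for i in right])
            err = (sum(w[i] for i in left if y[i] != ll)
                   + sum(w[i] for i in right if y[i] != rl))
            if best is None or err < best[0]:
                best = (err, f, thr, ll, rl)
    return best

def stump_predict(stump, row):
    _, f, thr, ll, rl = stump
    return ll if row[f] <= thr else rl

def adaboost_m1(X, y, rounds):
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        if stump is None or stump[0] >= 0.5:
            break  # the weak learner is no better than chance
        beta = max(stump[0], 1e-10) / (1 - max(stump[0], 1e-10))
        ensemble.append((math.log(1 / beta), stump))
        if stump[0] == 0:
            break  # a perfect stump needs no further rounds
        # Shrink the weights of correctly classified instances, then renormalise.
        w = [wi * beta if stump_predict(stump, row) == yi else wi
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    votes = defaultdict(float)
    for alpha, stump in ensemble:
        votes[stump_predict(stump, row)] += alpha
    return max(votes, key=votes.get)

# Hypothetical data that no single stump can separate: attacks sit in the middle range.
X = [[1], [2], [3], [4], [5], [6]]
y = ["normal", "normal", "attack", "attack", "normal", "normal"]
ensemble = adaboost_m1(X, y, rounds=3)
print([predict(ensemble, row) for row in X])
```

On this toy set, each individual stump misclassifies at least two samples, while the weighted vote of three stumps reproduces all six labels, which is the effect boosting is meant to achieve.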

Complexity of the algorithms of different modules
The complexity of the discussed algorithms is calculated to assess the quality of the proposed framework. The proposed framework consists of three different algorithms. The detailed complexity analysis of each algorithm is as follows: Algorithm 1 contains the C4.5 classifier, which is used in the SDM. The first operation computes the entropy of the attributes, which takes O(1) time. After that, the for loop starts, and it contains three statements

Experimental results and performance analysis
This section describes the implementation scenario, the achieved results, and the comparative performance of the proposed EHIDF. The framework mainly operates on the network traffic that goes to the edge computing-based mobile network. The model has been trained with the standard UNSW-NB15 dataset, which emulates a real network. Because it is a refined dataset, it is first used for training and testing based on the reduced feature set. The performance analysis evaluates the accuracy of the model on this dataset.

Implementation setup
A mobile edge computing-based executable environment has been developed for the implementation of the work. The proposed EHIDF is evaluated on a Raspberry Pi 3 Model B+, which has a 1.4 GHz 64-bit quad-core processor, 1 GB LPDDR2 SDRAM, 2.4 GHz and 5 GHz IEEE 802.11b/g/n/ac wireless LAN, Bluetooth 4.2, and many more advanced features. It runs as an edge computing device and supports many ML models. The device can run several flavors of Linux, including the Raspbian operating system. The ML source code is written in Python, a high-level multi-paradigm programming language, using installed Python libraries and TensorFlow (an open-source software library). Two options are available for writing the Python code: the Mu editor and SSH. An edge tensor processing unit, which has been developed for mobile and embedded devices, accelerates the TensorFlow computation. The proposed EHIDF is implemented on an Intel(R) Core(TM) i5-8250U CPU (1.60 GHz, 1.80 GHz).

Table 7 Measurement factors
Factor | Description
True positive rate (TP) | The total number of incidents identified as normal while they were truly normal
True negative rate (TN) | The total number of incidents identified as attack while they were truly attack
False positive rate (FP) | The total number of incidents identified as normal while they were truly attack
False negative rate (FN) | The total number of incidents identified as attack while they were truly normal

Table 8 Performance metrics

Measurement metric | Description | Formula
Accuracy (A) | The percentage of correct classifications of the test data. It is measured by dividing the number of correct classifications over all classes by the total number of records in the dataset. | $A = \frac{TP + TN}{TP + TN + FP + FN}$
Precision (P) | Calculated by dividing the number of true positives by the total of true positives and false positives. | $P = \frac{TP}{TP + FP}$
Recall (R) | The number of true positives divided by the total of true positives and false negatives. | $R = \frac{TP}{TP + FN}$
F-score (F) | The harmonic mean of Precision and Recall. | $F = \frac{2 \cdot P \cdot R}{P + R}$
AvgAccuracy (AvgAcc) | The mean recall value across all classes of the given dataset. | $AvgAcc = \frac{1}{C} \sum_{i=1}^{C} R_i$
AttackAccuracy (AttAcc) | A metric that measures a model's ability to detect attack classes only, without taking normal traffic into account. |

Evaluation metrics
The proposed edge-based IDS is evaluated based on several measurement criteria. The following evaluation measures are used to assess the model's performance: Accuracy (A), Precision (P), Recall (R), F-Score (F), AvgAccuracy (AvgAcc), AttackAccuracy (AttAcc), Attack Detection Rate (ADR), and FAR. Their computation requires four measurement factors (TP, TN, FP, and FN). A detailed description of these factors is given in Table 7. Table 8 gives a detailed description of the evaluation metrics together with their computation equations.
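To make these measures concrete, the sketch below derives accuracy, per-class precision, recall and F-score, and AvgAcc from a multi-class confusion matrix. It assumes that rows hold the actual class and columns the predicted class, and that each class is treated in turn as the positive class (one-vs-rest); the matrix values are invented for illustration and are not results of this work.

```python
def evaluate(cm):
    """cm[i][j] = number of records of actual class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precision, recall, f_score = [], [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                          # class i records predicted as another class
        fp = sum(cm[r][i] for r in range(n)) - tp     # other records predicted as class i
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f_score.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "avg_acc": sum(recall) / n,   # mean recall across all classes
    }

# Hypothetical 3-class confusion matrix (for example Normal, DoS, Exploits).
cm = [
    [50, 2, 3],
    [5, 40, 5],
    [0, 10, 35],
]
metrics = evaluate(cm)
for name, value in metrics.items():
    print(name, value)
```

AvgAcc weights every class equally, so a class with few records influences it as much as a large one, which is why it can differ noticeably from the plain accuracy on imbalanced data.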

Result analysis
In this subsection, the results of the proposed framework are discussed. Each module of the proposed model uses the UNSW-NB15 dataset for testing. The achieved results are presented in terms of a C×C confusion matrix, where C is the total number of classes/categories. The confusion matrix includes the following classes/categories: Normal, Backdoor, Analysis, DoS, Fuzzers, Exploits, Generic, Shellcode, Reconnaissance, and Worms. The confusion matrix obtained from the SDM, which uses the C4.5 classifier, is shown in Table 9. The precision and recall of this module are presented in Fig. 5. The confusion matrix obtained from the ADM, which uses the Naive Bayes classifier, is shown in Table 10. The precision and recall of this module are presented in Fig. 6. The confusion matrix obtained from the proposed EHIDF, which uses the Meta-AdaBoostM1 algorithm, is shown in Table 11. The precision and recall of the proposed EHIDF are presented in Fig. 7. Table 12 and Fig. 8 show the achieved results of the three modules used in the proposed EHIDF. The results also show that the proposed EHIDF has improved performance compared to the other two modules (SDM and ADM).

Test scenario: system statistics
The CPU load, RAM, and disk usage were measured during the experiment with the proposed EHIDF. This information can be retrieved using the psutil Python library, after installing both psutil and Flask. The library provides a modular interface and various functions for reporting such system statistics. For instance, the function psutil.cpu_percent(interval=1, percpu=True) returns the current CPU utilization (in percent) of each CPU. The function takes the time interval (in seconds) as a parameter so that the utilization can be computed over a period of time. The function psutil.virtual_memory() returns memory statistics, including the percentage of virtual memory (RAM) in use. The function psutil.disk_usage('/') returns disk usage statistics, including the total, used, and free space (in bytes) plus the percentage used. All these system statistics are evaluated before moving to the edge computing scenario. After moving to edge computing, the same system statistics are evaluated again.
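A minimal sketch of collecting these statistics is given below. The three psutil calls are the ones named in the text; the function name `collect_stats`, the dictionary keys, and the fallback to the standard-library `shutil.disk_usage` when psutil is not installed are assumptions added so that the sketch runs anywhere.

```python
import shutil

try:
    import psutil  # third-party library; may be missing on a fresh system
except ImportError:
    psutil = None

def collect_stats(path="/", interval=0.1):
    """Return a dictionary with CPU, RAM and disk statistics (hypothetical helper)."""
    stats = {}
    if psutil is not None:
        # Per-CPU utilisation in percent, measured over `interval` seconds.
        stats["cpu_percent_per_core"] = psutil.cpu_percent(interval=interval, percpu=True)
        # Percentage of virtual memory (RAM) currently in use.
        stats["ram_percent"] = psutil.virtual_memory().percent
        usage = psutil.disk_usage(path)
    else:
        # Fallback: only the disk figures are available from the standard library.
        usage = shutil.disk_usage(path)
    stats["disk_total_bytes"] = usage.total
    stats["disk_used_bytes"] = usage.used
    stats["disk_free_bytes"] = usage.free
    stats["disk_percent"] = 100.0 * usage.used / usage.total
    return stats

stats = collect_stats()
for key, value in stats.items():
    print(key, value)
```

Calling `collect_stats()` once before and once after deploying the framework on the edge device would give the two sets of figures that the test scenario compares.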

Comparative performance analysis
The achieved performance results are compared with previous works. Three similar works have been found in the literature: Papamartzivanos et al. [98], Kumar et al. [99], and Kumar et al. [100]. The performance is compared in terms of AvgAcc (%), AttAcc (%), Mean F-Measure (%), and ADR (%). Table 13 shows that the proposed model achieves improved performance compared to these three models. A comparative bar graph is shown in Fig. 10. The accuracy and FAR are also compared with further previous works (including [90] and Almogren [16]). The accuracy and FAR of all these works and of the proposed EHIDF are tabulated in Table 14. These results show that the accuracy (90.25%) of the proposed EHIDF is high compared to the other works and that its FAR (1.1%) is low, demonstrating the improved performance of the proposed EHIDF. A comparative figure of the accuracy and FAR is shown in Fig. 11. Table 15 shows the improvement (in percent) of the proposed EHIDF over the previous works. The tabulated results show that the accuracy of the proposed EHIDF is improved by up to 10.78% and the FAR is reduced by up to 93.03%.

Table 11 Confusion matrix of HDM (Meta-AdaBoostM1 algorithm) on the UNSW-NB15 dataset. The bold values in the table represent the main results/outcomes of the proposed model.

Security analysis
The security analysis of the proposed EHIDF is based on a game-theoretic model [102,103]. Figure 12 shows the scenario on which the game model is developed, and Table 16 describes the notation used in the game model. The game-theoretic model is designed as follows. The architecture of the deployment model is shown in Fig. 12. In this deployment model, three layers are assumed: the attack/end-user layer, the mobile edge networking layer with the IDF, and the data storage layer. In the attack/end-user layer, the attacker tries to perform malicious activities. The attacker is a botmaster that generates and sends malicious data packets to the edge devices in order to control them. The attacker chooses between two attack strategies to attain the highest payoff. In the first strategy, the attacker uses a known regular attack to control the edge devices. In the second strategy, the attacker uses an unknown, new, sophisticated attack. The mobile edge networking layer with the IDF consists of three different IDSs: signature-based, anomaly-based, and hybrid detection models. A decision model for alert generation also resides in this layer. The SDM is responsible for detecting known attacks, and the ADM is responsible for detecting unknown attacks. The hybrid model can handle both types of attack packets. The attacker tries to capture any edge resource, leading to resource exhaustion (R), which negatively impacts its payoff. The payoff (U) is a positive or negative score awarded after every action played by a player. An equilibrium state is reached when each player's strategy yields the maximum payoff given the other player's strategy. The third layer is the data storage layer, which is responsible for storing the edge resources and data. The game-theoretic model includes two main components (players), and both players have different strategies to gain the maximum benefit. One is the attacker (A), and the other is the defender (D).

Fig. 8 Performance comparison of different modules used in our proposed EHIDF
The attacker may perform a common attack ($A_1$) or a new sophisticated attack ($A_2$). To identify these two types of attack, the defender has two types of system: signature detection of the common attack through defence strategy $D_1$, and behavior detection of the new attack through defence strategy $D_2$. The game function is defined in terms of $(ST_D, ST_A)$, the strategies of the two participants, and $(Be_D, Be_A)$, the benefits earned by the two participants.
Hence, the strategy matrix is defined over the strategy pairs $(D_i, A_j)$. For $(D_1, A_1)$, when the attacker performs a common attack and the defender uses a signature-based detection (SBD) policy, the benefits $Be_{11}$ are represented in Table 17. The total benefit functions $Be_D$ and $Be_A$ of both participants are defined with the help of Eqs. (4) and (5). After substituting the benefit values from the table, the final benefits are given by Eqs. (6) and (7).
The Nash equilibrium solution states that if the maximum of the benefit function $Be_A$ matches the maximum of the benefit function $Be_D$, then the game strategy is purely successful, so $q = \frac{\gamma}{\sigma + \gamma}$ and $p = \frac{\sigma\gamma}{\sigma + \gamma}$. For the best strategies, the benefits $Be_D$ and $Be_A$ are maximized; for that purpose, they are differentiated with respect to the probabilities, as defined by Eqs. (8) and (9).
So the strategy $ST_D$ of the defender, who defends against attacks through SBD and ABD, has equal probability with the strategy $ST_A$ of the attacker, who plays the common attack and the new attack. Thus, the Nash equilibrium is maintained when $ST_D = (p, 1-p)$ and $ST_A = (q, 1-q)$. If the average detection rate of a common attack is fixed, then a high value of $\gamma$ means that the average detection rate of a new attack increases. The attacker will therefore perform the new attack less often.
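How a mixed-strategy equilibrium $ST_D = (p, 1-p)$, $ST_A = (q, 1-q)$ arises from a 2×2 benefit table can be illustrated with a generic indifference calculation. The sketch below is not a reproduction of Eqs. (4) to (9): the function name, the assignment of payoff cells to strategy pairs, and all numeric values (gain, costs, detection rates) are hypothetical and chosen only to show the mechanics.

```python
def mixed_equilibrium(defender, attacker):
    """Mixed equilibrium of a 2x2 game.

    Rows are the defender strategies (D1, D2), columns the attacker
    strategies (A1, A2). Returns (p, q): p is the probability of D1 and
    q the probability of A1. Assumes an interior equilibrium exists.
    """
    # p makes the attacker indifferent between A1 and A2.
    p = (attacker[1][1] - attacker[1][0]) / (
        attacker[0][0] - attacker[1][0] - attacker[0][1] + attacker[1][1])
    # q makes the defender indifferent between D1 and D2.
    q = (defender[1][1] - defender[0][1]) / (
        defender[0][0] - defender[0][1] - defender[1][0] + defender[1][1])
    return p, q

# Hypothetical parameters: attack gain, attack cost, defender utility, defence cost,
# and detection rates for the matching detector (signature vs common, behaviour vs new).
gain, attack_cost = 10.0, 2.0
utility, defence_cost = 20.0, 3.0
det_common, det_new = 0.8, 0.6

# Assumption: a detector only reduces the attacker's gain for the attack type it targets.
attacker = [
    [(1 - det_common) * gain - attack_cost, gain - attack_cost],
    [gain - attack_cost, (1 - det_new) * gain - attack_cost],
]
defender = [
    [utility - defence_cost - (1 - det_common) * gain, utility - defence_cost - gain],
    [utility - defence_cost - gain, utility - defence_cost - (1 - det_new) * gain],
]

p, q = mixed_equilibrium(defender, attacker)
print(round(p, 4), round(q, 4))
```

At the returned probabilities, neither player can improve the expected benefit by shifting weight to one of the two pure strategies, which is the defining property of the equilibrium state discussed above.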

Conclusion and future scope
This work identified several intrusion detection problems that disturb mobile edge networks by compromising availability, integrity, and confidentiality. New or unknown intrusive traffic cannot be detected by a normal firewall or by current ML-based approaches. This paper proposed an EHIDF for the mobile edge computing environment to overcome the current intrusion detection problems. The results of the proposed EHIDF were compared with previous works and showed improved performance (accuracy is improved by up to 10.78% and FAR is reduced by up to 93%). A game-theoretic security analysis is also discussed. The proposed framework is limited to only 15 feature vectors. The considered feature vectors may be independent of each other, which leads to a high error rate. Thus, selecting more than 15 dependent feature vectors may give more accurate results. In the future, the framework's performance will be improved by including other ML-based techniques. The work can also be implemented on different datasets, such as the UNB ISCX 2012, IDEVAL, and ADFA datasets, which may improve the framework's accuracy.

Table 16 List of notation used in the game model
Notation | Description
$\gamma$ | The average detection accuracy for new attack
$\sigma$ | The average detection accuracy for common attack
$\rho$ | The problem of common attack detection of defender

Table 17 Benefits of the attacker and the defender
Attacker | $(1-\rho)P^{+} - Q^{+}$ | $P^{+} - Q^{+}$ | $P^{+} - Q^{+}$ | $(1-\gamma)P^{+} - Q^{+}$
Defender | $U_t - C_t - (1-\rho)P^{+}$ | $U_t - C_t - P^{+}$ | $U_t - C_t - P^{+}$ | $U_t - C_t - (1-\gamma)P^{+}$

Conflict of interest
There is no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.