1 Introduction

The Internet of Things (IoT) is an interconnected system of devices that exchange information using agreed protocols. Recent developments in the industry contribute to intelligent smart cities [1], smart devices [2], smart homes [3], smart transportation [4], healthcare [5], agriculture [6], smart grid [7], military [8], and much more [9]. IoT provides the platform to interact with real-world applications via the Internet [10,11,12]. Moreover, the Industrial Internet of Things (IIoT) is empowered by the rapid growth of integration between sensors and devices [13,14,15]. Therefore, to provide an intelligent industrial edge computing solution, the use of machine learning, deep learning, and artificial intelligence is paramount [16]. According to statistics [17, 18], the number of smart devices for IoT applications is expected to grow by 21% by 2030 compared to 2019, which indicates that IoT will play a crucial role in the near future. The automotive market will grow to 195.7 billion endpoints during that period [17].

As the number of connected devices across the globe increases, multiple sensors are used to allow them to collect real-time data from physical objects remotely [19]. This data helps us build intelligent decision-making algorithms and effectively manage IoT settings. In parallel, the wide usage of real-world devices raises the risk of cybersecurity threats [18]. Malicious devices can spy on others without notice, manipulate traffic signals remotely and disrupt the networks [20,21,22]. The most notable real-time attacks are DDoS (Distributed Denial of Service)[23], Mirai botnet [24], DoS (Denial of Service)[25], Port Scan, and website crippling from botnet creators who also offer mitigation services at the victim’s expense [26,27,28]. IoT device protection against such intrusions is significant in the field of security. There is an increasing need to take the necessary actions to provide physical and cybersecurity against such powerful attacks. Hence, a detailed analysis of network protection is crucial [29].

Traditional intrusion detection technologies do not provide guaranteed security in IoT applications because of their limited bandwidth capacity and global connectivity [30,31,32]. This motivates building an advanced Intrusion Detection System (IDS) to protect IoT devices against intrusions [33]. The IDS can alert the system administrator to suspicious activities or anomalies [34, 35]. The IDS is known as adaptive network security, which can offer valuable feedback to the network administrator about new attack types. It is also essential to update the network with an instance of the attack type before it can detect it [36]. IDS can be classified into three major categories: signature-based, anomaly-based, and specification-based [37, 38]. Signature-based approaches detect attacks based on known attack patterns and signatures [39]. An anomaly-based detection system detects a deviation from defined normal behaviors [40]. Similarly, a specification-based system uses the rules defined by the administrator [41]. It is challenging to keep an IDS up-to-date due to the complex network and changing environment [42]. To enhance their benefits, researchers use adaptable methods such as machine learning (ML) and deep learning (DL) [43]. The machine learning model can be based on a single classifier, with one classifying model, or on a multi-classifier model, using multiple classification models in parallel [44,45,46]. IDS models can be further divided into binary and multiclass classification models. In binary classification, the traffic is labeled as either 0 or 1, which means normal (0) or abnormal (1). In contrast, multiclass classification identifies the type of attack. The multiclass model is more complex than the binary model, leading to lower accuracy for unknown attacks when the training data does not contain enough attack instances [47,48,49].

Furthermore, security attacks are categorized as active or passive attacks [50]. An active attack appears during run-time conditions. It can disrupt and damage the physical device. It is difficult to perform and detect active attacks compared to passive attacks. Denial of Service is the most common example of an active attack [51]. Packet replay, spoofing, and message modification are also examples of active attacks [52,53,54]. A passive attack observes and monitors information about a specific target [55]. The attackers hide and keep the communication line open to collect information, but the data remains unchanged. The most common threats are eavesdropping, network mapping, and traffic analysis [56, 57]. To reduce the impact of these attacks on IoT devices and the consumer, a real-time network anomaly-based intrusion detection system is necessary to block adversaries.

In our previous study [58], we proposed a machine learning-based IDS for detecting DDoS attacks in the context of the CAV (connected and autonomous vehicle) setup, which represents IoT devices. We achieved 94% to 98% accuracy for five different ML models (LR (Logistic Regression), LDA (Linear Discriminant Analysis), CART (Classification and Regression Tree), KNN (K Nearest Neighbour), & SVM (Support Vector Machine)) for binary and multiclass classification on NSL-KDD dataset. Our previous study obtained promising results in binary classification. However, the model in our previous work did not perform well in multiclass classification because we did not use any outlier handling techniques for imbalanced attack instances. Moreover, we only used traditional machine learning approaches, which did not give the best model accuracy.

Therefore, in this current study, we explore different attacks against network layers of IoT devices. We review the current state-of-the-art methods for securing IoT devices against such attacks. We propose an advanced approach for anomaly-based IDS using a deep learning model named Pearson Correlation Coefficient - Convolution Neural Network (PCC-CNN) for anomaly detection with low misclassification rates. We experiment and evaluate binary classification and multiclass classification models. Finally, we discuss the results through a comparative analysis with five different ML algorithms, LR, LDA, CART, KNN, & SVM, using three benchmarking datasets, NSL-KDD, CICIDS2017, and IOTID20, to check the feasibility of binary and multiclass classification approaches. The PCC-CNN model is efficient and lightweight and can detect different cyber attacks.

This research is organized as follows: Sect. 2 presents the background and prior related work. Section 3 details the datasets and methods of ML/DL techniques used. Section 4 exhibits the experimental results with the comparative evaluation of the performance metrics, followed by the concluding research in Sect. 5.

2 Background

IoT is an interconnection of billions of heterogeneous objects through the Internet. The growth and usage of intelligent IoT devices have surpassed the human population. The application of IoT devices is vast, such as in healthcare, military, transportation, and agriculture [59]. It has become essential to know how IoT works and communicates. Mainly, IoT devices operate in three phases: collection, transmission, and utilization [60]. The collection phase starts with data collection from the physical devices. Then, the transmission phase transmits the data to the end-user or specific communication applications. Lastly, the utilization phase processes the received data to retrieve the environmental information. These phases need protection against the emergence of various threats [61]. The following sections provide current security attacks against IoT structures.

2.1 Attacks against IoT structure

Typically, IoT architectures are described with three, four, or five layers [62,63,64], although there is no standard model of the IoT architecture. This research assumes a three-layer architecture because it is widely accepted by practitioners [65]. The layers are perception, transport, and network.

2.1.1 Perception layer

The perception layer is also known as the control layer. It is the lowest layer in the three-layer architecture of IoT. IoT devices, physical sensors, and actuators sense the environment and collect information about the surroundings, such as temperature, humidity, and force. Once the information collection is finished, the perception layer completes the primary processing and packaging of this information. It also receives information from the network layer for control operations [66]. The most common attacks in this layer are jamming and tampering. These attacks corrupt the network communication using high-frequency radio signals [67].

2.1.2 Transport layer

The transport layer is the top layer in IoT architecture. It is also known as the application layer. This layer analyzes and processes the information from the network and perception layers. It controls the end-to-end links [68]. Different network protocols, such as LAN, Bluetooth, and 3G [69], transmit sensor data from the perception layer to the processing layer. It typically deals with three types of attacks: flooding, de-synchronization, and man-in-the-middle (MITM) attacks. The flooding attack takes the device’s memory resources and repeatedly drains them to mitigate the control signal [70]. In the de-synchronization attack, an intruder tries to interrupt the communication and exhaust the network resources [71]. The MITM attack appears when the attacker taps the communication to manipulate or delete information. The transport layer establishes a communication link between the source and the destination. It assures the communication authority to the end side [72].

2.1.3 Network or data link layer

The data link layer is the middle layer in IoT architecture. It uses various techniques, such as 3G and WiFi, for communicating with physical devices. These wireless communication media use standard, well-known protocols and are prone to different network attacks, also known as routing attacks [73]. These attacks involve eavesdropping, denial of service (DoS), spoofing, network mapping, and traffic analysis. In eavesdropping, an attacker listens to data on the wireless channel and tries to alter it [74]. DoS is the most common attack nowadays. It can affect the whole network’s data, network performance, and reliability. In spoofing, the attacker checks the actual sender information [75]. Network mapping is placed by defining the software on the system, and, lastly, traffic analysis learns the model from the traffic patterns [76].

2.1.4 Intrusion detection system in IoT

The attacks discussed earlier are severe and hard to mitigate with traditional techniques. To identify and minimize cyber-attacks, robust defense mechanisms are required. Moreover, an IDS can help us counter such attacks in the perception, transport, and network layers. An IDS aims to find the occurrence of malicious activity by analyzing network traffic and raising an alert if it does [77]. Several methods and frameworks alleviate various layer attacks using ML and DL techniques [78,79,80]. Many studies have been published on classifying anomalies using ML methods in the IoT infrastructure [81,82,83,84]. We surveyed this literature, grouped by method, to learn which algorithms have been used recently. An IDS must differentiate anomalies accurately with a low detection time. Hence, it is challenging to develop a system that can handle complex data and make fast decisions for real-time detection with a low false alarm rate [85].

An IDS can be classified broadly into three types: signature, specification, and anomaly. Signature-based approaches look for pattern similarities between the gathered data and the existing attributes [86, 87]. The signatures of the current activities are extracted and compared with the database signatures by using matching methods. This method is helpful in accurately deciding the type of attack and preventing false alarms, but it struggles to detect unknown attacks [88,89,90]. This class of IDS can also be named misuse detection or knowledge-based detection [91]. The specification-based method uses rules predefined by the administrator. As new attacks appear, the administrator must update the old rules, which is the major problem with this method. Anomaly-based methods detect unusual changes in network behavior [92]. This type of system can detect new intrusions, but on the other hand, it creates many false positives. Thus, it is crucial to promptly update the IDS in a complex network and changing environment. Anomaly-based IDS is difficult to exploit because target interaction would raise an alert [93, 94]. Current IDS methods mainly use ML and DL models to detect anomalies [95].

Many researchers have shown promising results in detecting network intrusions [96, 97]. However, less of this research has targeted IoT network datasets [98]. The datasets commonly used to design new IDS are the KDDCUP99 [99] and DARPA datasets [100]. The problems with these two datasets are that they were created a long time ago and that they contain redundant features between the classes [101]. The study in [102] claims that ML algorithms do not produce good results in signature-based intrusion detection. On the other hand, some studies claim that ML algorithms have promising results for anomaly-based intrusion detection in IoT networks [103, 104]. The Weka Data Mining Package applies ML techniques to network datasets [105]. Weka is a tool that has a collection of different ML and DL classifiers to predict and compare how different algorithms would perform. The research work in [106] presented an intrusion detection approach using the KDDCUP99 dataset to classify anomalies and types of attacks via Weka 3.6 software. The authors constructed a system with accurate, flexible, and effective performance compared to other methods, but its drawback is the high rate of incorrectly predicted instances (13%).

Anish Halimaa et al. [107] developed another ML approach for network IDS to classify anomalies on the NSL-KDD dataset. They used the support vector machine (SVM) and Naive Bayes (NB) techniques and measured performance by accuracy and misclassification rate. They compared three settings: without preprocessing the dataset, with a normalized dataset, and with features reduced using cfsSubsetEval. SVM performed better, with the highest accuracy of 97% achieved in the first setting (without preprocessing). The NB algorithm has comparatively lower accuracy, with a high misclassification rate varying from 32 to 44.

Similarly, the authors in [108] proposed a supervised ML-based IDS for detecting network attacks in IoT devices. They applied normalization to the UNSW-NB15 dataset [109] to overcome information leakage into the test data. The authors applied principal component analysis (PCA) for dimensionality reduction, followed by six different ML classifiers, and evaluated accuracy, precision, F1 score, and the Matthews correlation coefficient. Their findings are competitive with current works. However, they overlooked model overfitting. Biesiada et al. [110] propose an algorithm for extracting non-redundant features using Pearson’s correlation. This method works well on the high-dimensional biomedical dataset used to verify the results.

M. A. Ferrag et al. [114] compared the binary classification performance of the decision tree ML model on commonly used datasets, such as CICIDS and BOT-IOT. They achieved 96% detection accuracy for anomaly detection. However, they did not mention the false alarm rate of the model. Similarly, a research paper by I. Sharafaldin [115] discussed the selection of appropriate ML algorithms. They used the CICIDS-2017 dataset with seven different ML algorithms but only focused on anomaly detection. To minimize the computational complexity, N. Kunhare [116] proposed a swarm optimization technique to select features of importance for the ML algorithm. The model was tested with the random forest classifier for binary classification and achieved 99% accuracy. However, one major drawback of ML approaches for anomaly detection is the high false alarm rate, which cannot be neglected in real-time detection.

Deep learning techniques are widely used for anomaly-based IDS nowadays because the deep network can, on its own, learn valuable features without any feature selection techniques. A Recurrent Neural Network (RNN)-based IDS was developed by Yin et al. [117] for binary and multiclass classification on the NSL-KDD dataset. The model achieved the best accuracy using 80 hidden nodes with learning rates of 0.1 and 0.5 for binary and multiclass classification, respectively. Their results also show that the number of hidden neurons and the learning rate affect the model’s accuracy. However, the main disadvantage of this model is that it increases computational complexity, resulting in high training time and low detection rate. Table 1 shows published papers reviewed based on the Network IDS with different publicly available datasets.

Table 1 Related work review table

A comparative study of different DL and ML-based IDS was conducted by Naseer et al. [118] for the NSL-KDD dataset. Their results showed that LSTM and deep CNN achieved higher detection accuracy when compared to other models. Xiao et al. [119] proposed the CNN-based model to perform feature extraction using Component Analysis and Auto Encoder, which transform the feature set of one-dimension into a two-dimensional matrix. The converted two-dimensional matrix is fed as an input into the Convolutional Neural Network. The model was performed on the KDD CUP’99 dataset, and the performance metric was the running time during the training and testing phase. However, this study is limited as it achieves a lower detection rate for the U2R and R2L attack classes when compared to other attack classes.

To address the limitation of class imbalance within the labeled class instances, Jiang et al. [120] proposed an IDS by combining CNN and bidirectional Long Short-Term Memory (LSTM). SMOTE is used to increase the minority samples that help the model to learn the whole features [121]. Experiments were performed using UNSWNB15 and NSL-KDD datasets. Their approach received higher performance in terms of accuracy and misclassification rates. Due to the complex structure, the model incurred a higher training time.

There are other recent deep-learning approaches in the literature for IDS that are worth mentioning. For instance, Albara et al. [122] proposed a DL-based IDS with a four-layer deep fully connected architecture and a probability distribution-based technique to classify attacks. The method achieved an average intrusion detection rate of 93.21%. However, the authors stated the need for a lightweight version of their system for effective and efficient IDS, which is a limitation. Also, a Deep Belief Network classifier was proposed by Malik et al. [123] to address security challenges, which are scalability, privacy, and trust in IOT devices. The model was evaluated on the TON_IoT weather dataset and achieved an average accuracy of 86.3%. However, the authors claim that a lack of hyperparameters evaluation in the study affects the model’s accuracy. Another study presented in [124] developed the FSO-LSTM model to detect the Denial of Service attack on their developed dataset and used CIDCC-001, UNSW-NB15, and NSL-KDD datasets to compare the results. This study used the firefly swarm optimization technique to extract the features and then applied the LSTM classifier to classify the attacks. The study gave promising results with an average of 98% detection accuracy. However, the high dimensionality of features increased the training time. Sharma et al. [125] developed a GAN-based DNN model with the UNSW-NB15 dataset and achieved 84% accuracy. The GAN was used to generate synthetic data of minority attacks to resolve the class imbalance in the dataset, and a 91% accuracy was achieved.

Some of the previously discussed machine learning and deep learning IDS with high accuracy neglect consideration of the overfitting problem and use of real-world IoT traffic data, which impacts their model accuracy, memory consumption, and detection rate. To overcome those impacts, we believe that our IDS should be capable of handling imbalanced datasets by using a superior method. For that, we propose an IDS using the PCC-CNN method to detect the anomalies in binary and multiclass classification. In this model, we focus on handling overfitting, using feature engineering and relevant IoT traffic datasets. Also, we manipulate our model to get the configuration with the best results. Finally, we compare the model performance and performance metric with the traditional ML methods.

3 Methodology

This section discusses the datasets, the preprocessing steps, and the evaluation metrics for our current study. The simulation was performed on a machine with a 3.30GHz CPU. We used Python programming language for implementing the ML algorithms with the following libraries: NumPy and Pandas for data manipulation, Scikit-Learn for implementing ML techniques, and Matplotlib to visualize the performance.

3.1 Dataset used

A total of three publicly available datasets were used. Each dataset is explained below.

3.1.1 NSL-KDD

This is the new version of the benchmark dataset KDDcup99. The critical limitation of the KDDcup99 dataset is its large number of redundant records, which biases the learning algorithm toward the more frequent records and limits the evaluation results. The NSL-KDD dataset [126] contains two files, KDD_Test and KDD_Train, in .csv format. The two files are not drawn from the same probability distribution, which makes the evaluation more realistic. The attack measured is DDoS (Distributed Denial of Service), a malicious attempt to disrupt the traffic of a targeted server. The simulated attacks fall into one of the following four categories (see Footnote 1).

  1. DoS (Denial of Service): recorded when the server is overloaded with more requests than it can handle. Examples are Smurf, Neptune, and Teardrop attacks.

  2. Probe: the attacker scans the network to find and misuse a known vulnerability. Examples are Satan, ipsweep, and Nmap attacks.

  3. R2L (Remote to Local): the attacker tries to gain local access to unauthorized information by sending packets to the victim’s machine. Examples are ftp_write, guess_passwd, and imap attacks.

  4. U2R (User to Root): the attacker uses a regular account to exploit system vulnerabilities and gain root access to the system. Examples are eject, loadmodule, and perl attacks.

The attack instances are presented in Table 2.

Table 2 NSL-KDD attack samples

3.1.2 CICIDS2017

This is a benchmark dataset for analyzing network traffic. It was developed within an emulated environment at the Canadian Institute for Cybersecurity (CIC), University of New Brunswick, Canada [128]. It contains 80 network features and provides reliable normal and malicious network flows. The data were collected over five days. ISCXFlowMeter was used to generate the CSV files of the dataset from pcap files. It extracted the normal and abnormal behavior based on the SSH, FTP, HTTP, and email protocols. The dataset contains 11 attack classes besides the benign (normal) class. The 11 attack types are combined into four attack categories: Botnet, DoS, Firewall, and Port Scan (see Footnote 2).

Each flow falls into one of five classes: ’BENIGN’ or one of the four attack categories ’Botnet’, ’DoS’, ’Firewall’, and ’Port Scan’.

  1. Botnet: A botnet is a network of malware-infected computers. The attacker uses these computers to perform malicious activities without their owners’ knowledge. This can be accomplished remotely and used to create DDoS attacks. The bot masters control the computers.

  2. DoS: It is a Denial of Service attack. This attack is easy to perform as it does not require high bandwidth. Also, it is hard to detect because it takes longer to complete an HTTP request.

  3. Firewall: This is not a profound attack but a sub-attack category based on the Internet, hence the name firewall. The category consists of FTP-Patator, SSH-Patator, and infiltration (same as Probe). The attacker tries to gain remote system access and complete control over the system.

  4. Port Scan: It is performed with the Nmap tool. This attack is used to collect information that helps attackers access the system. The attacker can learn vital information about the connected devices, such as the operating system, running devices, and port status.

The attack instances are presented in Table 3.

Table 3 CICIDS2017 attack samples

3.1.3 IOTID20

The IOTID20 dataset [130] uses raw network packet files from the IOTID dataset [131]. The dataset was created for academic purposes and is publicly available. The IoT network was built from two smart home devices, the NUGU (NU 100) AI-based speaker and the EZVIZ Wi-Fi camera (C2C Mini O Plus 1080P), together with different smartphones and laptops, all connected via a Wi-Fi router. Attacks such as spoofing, scanning, DoS, and Man-in-the-Middle were simulated using Nmap tools (see Footnote 3).

The simulated attacks fall into the following four categories [132]: ’DoS’, ’Mirai’, ’Scan’, and ’MITM ARP’.

  1. DoS: A Denial of Service attack occurs when a single source node or end node encounters malicious activities. The attacker tries to flood the target server, blocking its services to other devices.

  2. Mirai: The attacker tries to turn the victim’s software into a network of remotely controlled bots.

  3. Scan: The attacker tries manipulating the information during the process. They try to gather the data by hardware while scanning the devices.

  4. MITM ARP Spoofing: The Man-in-the-Middle attack is widespread in cyberattacks. The attacker sits on the connection between two servers and manipulates the traffic.

The attack instances are shown in Table 4.

Table 4 IOTID20 attack samples

3.2 The pipeline

This section discusses the applied PCC-CNN model and ML algorithms employed for detecting malicious network traffic. We compare the performance of the designed PCC-CNN model with the ML models. Figure 1 shows an overview of the steps followed in this work. The three datasets used were in .csv file format and followed the same methodology steps. First, we load the data and handle the datatype mismatch, missing or null values, and infinite values. Then, we preprocess the cleaned data to extract the features. Then, we performed feature selection and binary (normal or abnormal) and multiclass (4 class attack categories w.r.t. each dataset) classification. The last step is to evaluate the performance metrics and compare the results.

Fig. 1 Methodology pipeline

Figure 2 shows the step-by-step process followed to implement the model. For the classification phase, either CNN or ML models were used to predict the binary and multiclass classification. Details of all the steps are discussed in subsequent subsections.

Fig. 2 Flowchart of proposed model

3.2.1 Data pre-processing

Data pre-processing is also known as data cleaning or data massaging. It includes removing redundant features. Null/NaN values must be deleted or replaced with substitute values, and the sub-attack instances must be merged into their respective attack categories. For each dataset, we divided the output into binary and multiclass classifications. For binary classification, we used two classes, whereas for multiclass classification, we used five classes, including the normal class. Then, we normalize the data between 0 and 1. The critical requirement for this stage is that the data should be compatible with more than one algorithm, for consistency and to reduce computational complexity.
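The cleaning steps described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the exact procedure used in the study: the function name `clean_and_scale`, the column names, the label string `"normal"`, and the sub-attack mapping are all hypothetical, and rows with missing values are simply dropped rather than imputed.

```python
import numpy as np
import pandas as pd


def clean_and_scale(df, label_col, merge_map, normal_label="normal"):
    """Drop invalid rows, merge sub-attack labels, and min-max scale features."""
    df = df.copy()
    # Treat +/-inf as missing so that one dropna() removes both problems.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    # Merge sub-attack labels into their parent category (illustrative mapping).
    df[label_col] = df[label_col].replace(merge_map)
    # Binary target: 0 = normal traffic, 1 = any attack.
    df["binary"] = (df[label_col] != normal_label).astype(int)
    feats = df.columns.difference([label_col, "binary"])
    lo, hi = df[feats].min(), df[feats].max()
    span = (hi - lo).replace(0, 1)  # constant columns would divide by zero
    df[feats] = (df[feats] - lo) / span  # every feature now lies in [0, 1]
    return df


# Toy flow table with one infinite and one missing value (made-up numbers).
raw = pd.DataFrame({
    "duration": [0.0, 2.0, np.inf, 4.0],
    "bytes": [10.0, 30.0, 20.0, np.nan],
    "label": ["normal", "neptune", "smurf", "normal"],
})
out = clean_and_scale(raw, "label", {"neptune": "DoS", "smurf": "DoS"})
print(out)
```

Keeping the multiclass label and the derived binary label side by side lets the same cleaned table feed both classification settings.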

3.2.2 Feature selection

Feature selection is essential when applying ML or DL models as it influences prediction accuracy [134]. With the correct feature selection, we can reduce the over-fitting of the model, improve the accuracy, reduce the model cost, and reduce the training time [135]. It improves learning accuracy by removing irrelevant features and applying feature transformation. Our model uses the feature importance technique based on our proposed Pearson Correlation Coefficient method [136]. The proposed method evaluates the values of an attribute by measuring the correlation between the instances and the class [137]. To reduce overfitting, we selected the features contributing most to the classification. This method followed the three-stage approach mentioned in [138]: it first calculates Pearson’s correlation, then ranks the attributes by correlation, and lastly selects the features of the new dataset. This method has mainly been used when two variables are normally distributed. We choose a cut-off value to remove irrelevant features. The cut-off value is evaluated to get the best threshold. The threshold of feature importance is a correlation for this approach, which varies from 0.02 to 0.5 depending on the dataset. The threshold values for each dataset are shown below in Table 5.
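For reference, the standard definition of the Pearson correlation coefficient between a feature and the class label can be written as follows. The notation is ours and is not taken from the cited works: \(x_i\) is the value of the feature for instance \(i\), \(y_i\) is the numerically encoded class label, \(\bar{x}\) and \(\bar{y}\) are their sample means, and \(n\) is the number of instances.

\[
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

The coefficient lies in \([-1, 1]\), and a feature is compared against the cut-off by its absolute value \(|r_{xy}|\), so that strongly negative correlations count as informative as well.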

Table 5 Selected Features

With the defined threshold, we retain only the features whose correlation is above the threshold chosen for the dataset (0.5 or 0.2). We selected these values based on the research in [139], which concludes that correlation values less than 0.2 are considered negligible and that a value of 0.8 is considered the best-correlated value. The paper [140] states that, to gain better results, the tried and tested thresholds are 0.2 to 0.5, which can be varied based on the attack instances within the datasets.
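A threshold-based selection of this kind can be sketched as below. This is an illustrative sketch rather than the implementation used in this study: the helper name `select_by_pcc`, the toy feature names, and the decision to skip constant columns are our assumptions, and the class label is assumed to be numerically encoded.

```python
import numpy as np
import pandas as pd


def select_by_pcc(X, y, threshold):
    """Rank features by |Pearson r| with y and keep those above the threshold."""
    target = y.to_numpy(dtype=float)
    scores = {}
    for col in X.columns:
        x = X[col].to_numpy(dtype=float)
        if x.std() == 0:  # constant column: correlation is undefined, skip it
            continue
        scores[col] = abs(np.corrcoef(x, target)[0, 1])
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [c for c in ranked if scores[c] > threshold]


# Toy data: one informative, one uncorrelated, and one constant feature.
y = pd.Series([0, 0, 1, 1, 0, 1])
X = pd.DataFrame({
    "f_strong": [0.10, 0.20, 0.90, 0.80, 0.15, 0.95],
    "f_weak": [1, 2, 3, 1, 3, 2],
    "f_const": [1, 1, 1, 1, 1, 1],
})
selected = select_by_pcc(X, y, threshold=0.2)
print(selected)
```

Because the score of each feature is computed independently of the classifier, the same selected columns can be passed to both the CNN and the traditional ML models.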

3.2.3 ML algorithm

We deployed a supervised ML methodology for designing and developing an anomaly-based IDS. This work used binary and multiclass classification on the datasets, including classifiers LR, LDA, KNN, CART, and SVM [141]. LR (Logistic Regression) estimates the probability of an event based on the previous data provided. LDA (Linear Discriminant Analysis) predicts the likelihood that a new set of inputs belongs to every class. KNN (K Nearest Neighbors) estimates how likely a data point is to be a member of one or more groups. CART (Classification and Regression Tree) is also a decision tree. A chart-like tree structure uses a branching method to demonstrate every possible decision outcome. SVM (Support Vector Machine) is a discriminative classifier that efficiently classifies new data instances in the neighboring location.
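A comparison of the five classifiers could be set up along the following lines with Scikit-Learn. This is a sketch under stated assumptions: the hyperparameters are library defaults rather than the settings used in our experiments, the data are synthetic (generated with `make_classification`), and the helper names `build_models` and `compare_models` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Return the five classifiers with (assumed) default settings."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": KNeighborsClassifier(),
        "CART": DecisionTreeClassifier(random_state=0),  # CART-style tree
        "SVM": SVC(),
    }


def compare_models(X, y, test_size=0.3, seed=0):
    """Fit each model on a stratified split and return its test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    return {
        name: model.fit(X_tr, y_tr).score(X_te, y_te)
        for name, model in build_models().items()
    }


# Synthetic stand-in for a preprocessed binary (normal vs. abnormal) dataset.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
scores = compare_models(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

The same function works unchanged for the multiclass setting, since all five estimators accept integer labels with more than two classes.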

3.2.4 PCC-CNN

Our model aims to build an efficient and lightweight IDS. It uses the Pearson Correlation Coefficient (PCC) for feature selection and a Convolutional Neural Network (CNN) as the classification model. It is nicknamed PCC-CNN. The PCC is a filter-based algorithm and a well-known feature selection technique [142]. This feature selection method has been used in many IDS on variants of the dataset [143, 144]. The benefit of using a filter-based approach is that it is computationally efficient [145]. The features selected by PCC are fed to the CNN for training (see Fig. 3).

Fig. 3 PCC-CNN model

CNN is a state-of-the-art model for classification tasks. It applies multiple filters to the data to learn the features and uses them for classification. CNN contains input, hidden, and output layers. The input layer starts with the convolutional layers that apply an activation function. The hidden layer has pooling layers that scale down the data to reduce the feature dimensionality, and at last, fully connected layers perform classification.

The designed CNN model contains three convolution layers with respective sizes of 96 × 4, 64 × 3, and 32 × 2, each with a rectified linear unit (ReLU) activation function. ReLU is a piecewise-linear function that outputs the input directly if it is positive and zero otherwise. The flattening layer reshapes the values from the previous layer into a one-dimensional vector. Dense layers with 512, 128, 32, and 2 units were then applied, with ReLU activation in every dense layer except the last, which uses the Softmax activation function. Softmax normalizes the output into probabilities. The dense layers process the values in non-linear form. The adaptive moment estimation (Adam) optimizer tunes the parameter values in the final layer. The number-of-classes parameter is set to 2 or 5 depending on whether the expected outcome is binary or multiclass. Sparse categorical cross-entropy was used as the loss function because the outputs are categorical labels. The model was trained for five epochs with a batch size of 64 for both binary and multiclass classification.
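
The activation and loss functions named above can be written out in a few lines. The sketch below is plain Python for illustration and is not the authors' training code; the function names and the example values are assumptions.

```python
import math

def relu(x):
    """Rectified linear unit: pass positive inputs through, output zero otherwise."""
    return x if x > 0 else 0.0

def softmax(logits):
    """Normalize raw output scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_cross_entropy(probs, true_class):
    """Loss for one sample whose label is an integer class index."""
    return -math.log(probs[true_class])

# Hypothetical pre-activation values of a hidden layer.
hidden = [relu(v) for v in [-1.5, 0.0, 2.0]]

# Hypothetical raw scores of a two-unit output layer (binary case).
probs = softmax([2.0, -1.0])
loss_if_class_0 = sparse_categorical_cross_entropy(probs, true_class=0)
loss_if_class_1 = sparse_categorical_cross_entropy(probs, true_class=1)
print(hidden, probs, loss_if_class_0, loss_if_class_1)
```

The loss is small when the probability assigned to the true class is high and grows as that probability falls, which is what drives the optimizer's updates.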

3.2.5 Evaluation criteria

Evaluation is an integral part of developing any effective model. The criteria differ for each model depending on the application and the datasets used. The evaluation metrics help us decide which technique is best suited for a particular job. The following criteria were used to evaluate the model’s performance.

  • Confusion matrix

    It is used to visualize the performance of a method by summarizing the prediction results on a classification problem. The True Positive (TP), True Negative (TN), False Negative (FN), and False Positive (FP) counts for multiple classes can be derived from the confusion matrix. The definitions are given below:

    TP: Instances of a particular class that are correctly classified as that class.

    FP: Instances of other classes that are incorrectly classified as that class.

    TN: Instances of other classes that are correctly not classified as that class.

    FN: Instances of a particular class that are incorrectly classified as another class.

  • Accuracy

    Accuracy is the fraction of all instances that the model classifies correctly, and it helps determine which model is best at identifying patterns between variables. The equation below gives the measure for a single class.

    $$\begin{aligned} Accuracy = \frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$
  • Precision

    It measures the fraction of the model’s positive predictions that are actually positive. The equation is as follows.

    $$\begin{aligned} Precision = \frac{TP}{TP+FP} \end{aligned}$$
  • Recall

    It is the true positive rate. It measures the fraction of the actual positive instances in the data that the model classifies as positive. The equation is as follows.

    $$\begin{aligned} Recall = \frac{TP}{TP+FN} \end{aligned}$$
  • F1-Score

    It is the harmonic mean of precision and recall and provides a single measure of test accuracy that balances the two.

    $$\begin{aligned} F1\text{-}Score = \frac{2 \times Precision \times Recall}{Precision+Recall} \end{aligned}$$
  • False Alarm Rate(FAR)

    It is the probability that a non-event is falsely reported as an event. It measures the number of false alarms per the total number of non-events. The equation is as follows.

    $$\begin{aligned} FAR = \frac{FP}{FP+TN} \end{aligned}$$
  • ROC-AUC Curve

    The ROC (Receiver Operating Characteristic) curve and the AUC (Area Under the Curve) are used to evaluate classification problems at different threshold settings. ROC is a probability curve, and AUC represents the degree or measure of separability: it tells how capable the model is of distinguishing between the classes. The higher the AUC, the better the model. The curve is plotted with the true positive rate on the y-axis against the false positive rate on the x-axis.
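
The criteria listed above can be computed directly from a confusion matrix, as in the following minimal sketch. The function names, the toy matrix, and the toy scores are assumptions for illustration; the study itself reports using the sklearn metrics library. FAR is computed here as FP / (FP + TN), that is, false alarms per total number of non-events, and AUC is computed through its pairwise-ranking interpretation rather than by integrating the curve.

```python
def per_class_counts(matrix, cls):
    """One-vs-rest TP, FP, TN, FN for class `cls`.

    matrix[i][j] is the number of instances of actual class i predicted as class j.
    """
    n = len(matrix)
    tp = matrix[cls][cls]
    fn = sum(matrix[cls][j] for j in range(n) if j != cls)
    fp = sum(matrix[i][cls] for i in range(n) if i != cls)
    total = sum(sum(row) for row in matrix)
    tn = total - tp - fn - fp
    return tp, fp, tn, fn

def safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1-score, and false alarm rate for one class."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "far": safe_div(fp, fp + tn),
    }

def roc_auc(scores, labels):
    """AUC as the fraction of (attack, normal) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Toy binary confusion matrix: rows = actual (0 normal, 1 attack), columns = predicted.
matrix = [[90, 10],
          [5, 95]]
tp, fp, tn, fn = per_class_counts(matrix, cls=1)
attack_metrics = metrics(tp, fp, tn, fn)
print(attack_metrics)

# Toy attack scores for four samples and their true labels.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)
```

The same `per_class_counts` call works for the five-class case by passing a 5 × 5 matrix and the index of the class of interest.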

4 Results

In this section, the outcome of the model is evaluated. The computer used for this work had the following specifications: an i9-9820X 3.30 GHz CPU, 2 TB of memory, and the Linux Ubuntu 20.04.1 LTS operating system. Python scripts were developed in Jupyter Notebook under Anaconda. The present study calculated the evaluation metrics using the Python scikit-learn (sklearn) metrics library and the confusion matrix. Accuracy and false alarm rate are frequently used metrics in anomaly-based intrusion detection research.

The model was trained and tested with 10-fold cross-validation. Training used 80% of the data, and testing used 20%. The supervised ML models are LR, LDA, KNN, CART, and SVM, and the DL model is PCC-CNN for each dataset. The PCC-CNN model iterates for five epochs. The results show several findings that need further evaluation. Tables 6 to 11 present the metric results for the three datasets in the binary and multiclass classification scenarios.
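
The text does not specify how the 80/20 split and the 10-fold cross-validation are combined. The sketch below assumes one plausible arrangement: 20% of the samples are held out for testing, and 10-fold cross-validation is run on the remaining 80%. The function names, the fixed seed, and the sample count of 100 are assumptions for illustration.

```python
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction of them for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists, using each fold once for validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# Assumed arrangement: hold out 20%, then cross-validate on the remaining 80%.
train_idx, test_idx = train_test_split_indices(100)
folds = list(k_fold_indices(train_idx, k=10))
for fold_number, (train, validation) in enumerate(folds, start=1):
    print(fold_number, len(train), len(validation))
```

Each sample in the training portion is used for validation exactly once, while the held-out test indices are never seen during cross-validation.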

4.1 Binary classification: anomaly detection

In binary classification, two classes were used as the outcome: 0 represents the normal instances, and 1 represents the abnormal (attack) instances. The obtained results are summarized in Tables 6, 7, and 8.

Table 6 Classification accuracy of Binary classification
Table 7 Performance metrics for Binary Classification
Fig. 4 ROC curve for Binary Classification

Fig. 5 ROC curve for Binary Classification

Table 8 Performance metrics for Binary Classification

Table 6 provides the classification accuracy of the ML models for each dataset. The supervised models KNN and CART outperform the other ML models, achieving 100% accuracy, whereas the other models reached approximately 92 to 97%. This indicates that with more extensive training sets, accuracy can be improved. The proposed PCC-CNN model obtained 99% accuracy with a lower computational time than all five ML models.

Tables 7 and 8 compare the performance metrics of precision, recall, F1-score, and FAR (False Alarm Rate), which together describe each model’s overall performance. Overall, an accuracy and recall of 98% are achieved by the KNN and CART algorithms. Our PCC-CNN model performed well on the test samples and achieved 98% in both precision and recall for all three datasets, better than the ML algorithms other than KNN and CART. In the F1-score evaluation, 97% was obtained, with a false alarm rate of approximately 0.00, the lowest observed. Moreover, PCC-CNN achieved a better false alarm rate on the KDD_Test results. This indicates that it is well suited for detecting anomalies in unseen data. On the other hand, the ROC curve plots (Figs. 4 and 5) imply that the smaller the training set, the lower the obtained values. Hence, larger datasets are required for the training process to be effective in the initial phase.

Table 9 Classification accuracy of Multiclass classification
Table 10 Performance metrics for Multiclass Classification
Table 11 Performance metrics for Multiclass Classification
Fig. 6 ROC curve for Multiclass Classification

Fig. 7 ROC curve for Multiclass Classification

4.2 Multiclass classification: attack classification

In multiclass classification, five classes were used as the outcome: 0 represents the normal instances, and the remaining classes (1 to 4) represent the abnormal (attack) instances. The obtained results are summarized in Tables 9, 10, and 11.

Table 9 provides the classification accuracy of the ML and PCC-CNN models for each dataset. The supervised models KNN and CART outperform the other ML models, with accuracies of 98% and 99%. In contrast, the SVM model provides inferior results, particularly for the unbalanced dataset (IOTID20). This indicates that the model could not train effectively with varying numbers of class instances. The flip side is that the PCC-CNN model achieved varying accuracy across the individual datasets and took more time to train.

Tables 10 and 11 compare the performance metrics of precision, recall, F1-score, and FAR (False Alarm Rate), which together describe each model’s overall performance. Overall, an accuracy and recall of 98% are achieved by the ML models. In contrast, PCC-CNN has a precision of 76% and a recall of 70%. In the F1-score evaluation, 97% was obtained overall, with a comparatively high false alarm rate. The proposed PCC-CNN model performs better in this respect, with a low false alarm rate. This indicates that it is well suited for detecting anomalies in unseen data. On the other hand, the ROC curve plots (Figs. 6 and 7) imply that the smaller the training set, the lower the obtained true positive rate. Hence, larger datasets are required for the training process to be effective in the initial phase.

Overall, the analyzed ML and PCC-CNN models achieved good multiclass classification performance on datasets with relatively unbalanced class proportions. The remaining obstacle is the lack of instances in the underrepresented classes. Therefore, for these models to distinguish between the different types of cyber-attacks, training them with a balanced flow of each type is essential.

5 Discussion

The developed PCC-CNN model was evaluated and compared with PCC-ML models. Its performance was assessed on binary and multiclass classification, with 2 and 5 classes, respectively. The preprocessing steps follow the same approach for both classifications. After extracting the important features using the Pearson Correlation Coefficient technique, five supervised models (LR, LDA, KNN, CART, and SVM) were deployed to provide an IDS for an IoT system. Ten-fold cross-validation was used, with 20% of the data as the test set. The performance metrics are accuracy, precision, recall, and F1-score. The most important metric, the false alarm rate, was also calculated from the confusion matrix to evaluate the model’s performance. Afterward, the proposed PCC-CNN model was compared with the traditional ML models. The ML models achieved reliable performance in both scenarios and scored higher when trained with a larger number of attack instances. The KNN and CART algorithms obtained the best results in both the binary and multiclass classification scenarios, with approximately 98% and 99% accuracy, respectively, and a false alarm rate below 0.005. Because KNN retains the entire training dataset, it takes longer to make a prediction. Also, the CART algorithm overfits in multiclass classification. Despite their significantly lower accuracy (approximately 78% to 88%) in multiclass classification, the LR, LDA, and SVM models are still advantageous in detecting intrusions. However, our model achieved an excellent overall performance in both classification scenarios, highlighting its suitability for anomaly detection when trained with smaller and more unbalanced datasets.

We compared the proposed model with the above-mentioned ML models. We obtained an overall accuracy of 99% in binary classification for all three datasets (NSL-KDD, CICIDS2017, and IOTID20). We also observed comparable or better performance in precision, recall, F1-score, and, importantly, false alarm rate, with less computational complexity. On the other hand, multiclass classification did not perform as well as binary classification. However, the multiclass classification model shows promising results, with consistent performance metrics and a lower false alarm rate. We believe that increasing the number of epochs during training will improve the model’s accuracy. We also observe that our proposed PCC-CNN model predicts intrusions efficiently even with the imbalanced attack instances in the datasets used.

We note that multiclass classification is more complex than binary classification, as it covers a variety of attacks. The major challenge in performing multiclass classification is the lack of attack instances in the dataset. Moreover, multiclass classification predicts multiple intrusion types, which makes training more computationally expensive.

6 Conclusion

The presented work developed an anomaly-based intrusion detection system based on PCC-CNN. Three datasets (NSL-KDD, CICIDS2017, and IOTID20) covering different network attack instances were used. Our deep learning (PCC-CNN) model performed better when compared with the state-of-the-art ML approaches. We observed that our classification accuracy for multiclass classification is lower than for binary classification, but it can be improved if we train the model for more than five epochs. Compared to the ML models, the PCC-CNN model is computationally efficient. The proposed PCC-CNN model is crucial for any application of IDS. It can be deployed in real time for measuring abnormal events or novel attacks in a network domain.

Future research is necessary to address the limitations of the presented work. To enhance the performance and strengthen the IDS, combinations of different model types can be used, such as combining deep learning and ensemble learning techniques. Additionally, steps should be taken to improve the handling of missing data and outliers, since this is challenging for complex data. Various feature selection and extraction techniques can be implemented to observe the change in performance compared to the PCC-CNN model. A future study should aim to use a real-time dataset. Future research will also consider the pipeline of real-time anomaly detection, which may reveal novel requirements not foreseen in the present study. Moreover, future research should include different attack types and techniques for handling imbalanced data to provide more reliable and robust IoT intrusion detection.