1 Introduction

IoT has revolutionized the way we bridge the virtual and physical realms, enabling data collection, analysis, and automation of business activities [15]. This transformation has simplified lives and improved the quality of life through continuous and automatic data input [28]. By 2025, an estimated 478.2 million smart homes are expected to exist across 150 countries worldwide [19]. As the digital economy continues to thrive, underpinned by countless online interactions among individuals, businesses, devices, and data, the need for robust security and privacy becomes paramount [15].

Despite the convenience offered by smart home security systems, they also introduce the risk of compromising personal data security [4]. Trust plays a pivotal role in users’ acceptance and adoption of smart homes [29]. These homes are susceptible to various forms of attacks, stemming primarily from network security vulnerabilities and insecure IoT devices [4]. Cybercrime expenses are projected to surge annually by 15%, reaching a staggering USD 10.5 trillion by 2025 [25]. This alarming trend underscores the imperative for enhanced cybersecurity measures and heightened awareness.

As a result, maintaining the highest standards is essential to safeguard the security and privacy of IoT devices in smart homes. Anomaly detection methods have been extensively studied to identify abnormal behaviours and unexpected anomalies, often relying on traditional machine learning (ML) and deep learning (DL) models, which pose challenges to data privacy [34]. In response, researchers have turned to federated learning, an approach that ensures security and lightweight communication by aggregating updates from local models [22, 23]. However, existing research on anomaly detection has largely overlooked attack-type identification using federated learning. Identifying unusual patterns is crucial in various domains, such as fraud detection in credit card transactions [26]. Effective cybersecurity requires not only detecting malicious behaviour but also categorizing the type of attack. This can be achieved through multi-class classification procedures that describe the attack and pinpoint its source.

In the prior study [39], we introduced FedGroup, a model addressing anomaly detection by extending the principles of Federated Learning with a group master in the central server. FedGroup proved to be a fast, secure, and fairness-enhancing algorithm with minimal communication overhead. Our comparative analysis showed that FL-based models, including FedGroup, performed on par with or even outperformed standard ML models. Moreover, by integrating Ensemble Learning with FedGroup, we achieved an outstanding attack detection accuracy of 99.91% on the UNSW IoT dataset.

Building upon this foundation, this extended study focuses on attack type detection and attack type detection details, extending the original research scope. The primary contributions of this work are:

  1. Addressing Attack Detection: Identifying whether an attack occurred.

  2. Introducing Attack Type Detection: Identifying the specific type of attack.

  3. Enhancing Attack Type Detection Details: Predicting aspects such as "direct or reflection," "type of attack," "rate of attack," and "layer of attack."

  4. Evaluating the performance of Traditional ML, Federated Learning (FedAvg), and FedGroup algorithms in detecting anomalies within smart homes, using a real-world use-case dataset.

This paper is divided into several sections, which are summarized below. Section 2 provides a brief review of related research and identifies gaps in the literature. In Sect. 3, we describe our use case research data and present new models. Sections 4 and 5 present the evaluation results and limitations, respectively. Finally, the conclusion summarises the main findings of the study.

2 Literature Review

2.1 Traditional Machine Learning

The realm of cybersecurity has witnessed a significant reliance on traditional ML techniques for safeguarding internal networks against potential cyberattacks. These methodologies typically involve training algorithms using historical network traffic data to identify patterns and anomalies indicative of ongoing attacks. Notably, Tsai et al. [34] conducted an analysis spanning from 2000 to 2007, identifying 55 research papers dedicated to intrusion detection. The majority of these studies concentrated on the use of single classifiers, such as K-Nearest Neighbors (KNN) and logistic regression, with limited exploration of ensemble classifiers, which exhibit the potential to outperform single classifiers in terms of classification accuracy.

However, the effectiveness of traditional anomaly detection approaches has been questioned, especially in the context of high-dimensional data [32]. In 2017, Risteska Stojkoska and Trivodaliev [27] highlighted the shortcomings of existing architectures for IoT-based smart home systems, emphasizing the significant data storage and processing demands that prove far from efficient. They underscored the need for novel techniques addressing the challenges associated with managing vast volumes of data in the cloud. Moreover, the imperative of ensuring security in cloud-based solutions, which pose a significant risk of disclosing personal information and data, has become a pressing concern.

In 2021, Al-Haija et al. [2] introduced a pioneering approach, deploying deep learning to address the privacy concerns associated with data collection across various devices. Their Deep Convolutional Neural Network-based system effectively detected IoT device attacks, boasting high classification accuracy and eliminating the need for a central data collection process. Building on this progress, in 2022, Al-Haija et al. [3] introduced Boost-Defence, a detection system tailored to the TON_IoT_2020 dataset. This solution harnessed machine learning techniques for cyberattack detection within 3-layer IoT networks, leveraging the AdaBoost framework, Decision Trees, and various optimizations to achieve remarkable accuracy in cyberattack detection.

2.2 Federated Learning (FL or FedAvg)

Previous research has primarily focused on centralized anomaly detection, where a central model collects data from local models. However, decentralized models offer advantages in terms of computational ease and lightweight communication [22]. The concept of Federated Learning (FL) was introduced by Google in 2016, aiming to enhance the efficiency and security of users interacting with mobile devices [37]. In FL, a central model receives parameter updates from clients and averages them at the server, which is why the baseline algorithm is named FedAvg. This approach has shown benefits in collaborative learning, low communication costs, and decoupling model training from cloud data storage, effectively addressing key challenges in FL [22, 23, 37].

The literature reveals various attack types, including data poisoning, model poisoning, backdoor attacks, inference attacks, and membership inference attacks [36]. Researchers have proposed several methods for attack detection and prevention within FL, including differential privacy, encryption techniques, secure aggregation, and anomaly detection methods [33, 36]. However, recent work in 2022 highlighted the critical impact of non-iid and highly skewed data distributions on FL performance, underscoring the need for improved solutions in this context [12].

To tackle non-iid data distribution issues, a study by Li et al. (2020) outlined three pathways: (1) addressing high communication costs by reducing model update times and communication rounds; (2) managing statistical heterogeneity through local training model modifications and global model focus; (3) handling structural heterogeneity, encompassing fault tolerance and resource allocation strategies [20]. Another study by Li et al. (2020) [21] emphasized the importance of equitable device distribution and overall accuracy, introducing the q-FFL model to address model bias toward devices with extensive data. In a separate 2022 study on intrusion detection [12], the Fed+ [38] model was introduced, demonstrating improved accuracy compared to FedAvg when dealing with heterogeneous data distributions on the ToN_IoT dataset [5].

In 2023, our previous work [39] introduced FedGroup, an algorithm designed to address the highly skewed distribution challenge of FedAvg. FedGroup departs from computing the average learning of each device and instead adjusts the central model’s learning based on the learning patterns observed in distinct groups of IoT devices. Our empirical study, conducted using a real-world IoT dataset, demonstrated that FedGroup achieves anomaly detection accuracy comparable to or better than both FL and non-FL methods. Moreover, FedGroup enhances security by keeping all IoT data localized for model training and updates.

2.3 Ensemble Learning

In their analysis, Vanerio and Casas (2017) demonstrated the effectiveness of Ensemble Learning (EL) in anomaly detection, utilizing a Super Learner that incorporated diverse first-level learners and opted for logistic regression for binary classification evaluation in two distinct scenarios [35]. EL, known for its integration of multiple learning models, has proven its capability to enhance predictive performance, particularly in handling challenging training data [35]. In a recent study by Abu Al-Haija et al., EL showcased its reliability in profiling behavioural features of IoT network traffic and detecting anomalous network traffic through their ELBA-IoT model, which achieved an impressive accuracy of 99.6% with minimal inference overhead [1]. These findings serve as inspiration for combining the advantages of ensemble learning with those of the federated learning model for anomaly detection.

2.4 Summary

This study seeks to explore the realm of attack-type detection within the framework of Federated Learning, taking into consideration not only accuracy but also the false positive rate as critical performance metrics. Additionally, the study addresses the potential bias introduced by the aggregation of distributed models in creating the final global model. FedGroup, the proposed solution, incorporates the functionality and structural insights from a variety of models to effectively tackle these challenges.

3 Methodology

The research plan for this study is structured according to the outline depicted in Fig. 1. This investigation comprises three primary objectives: first, the development of an anomaly detection model to identify potential attacks (Attack Detection); second, the classification of the attack type (Attack Type Detection); and third, a detailed exploration of Attack Type Detection Details. While our prior study primarily centred around the first objective, this extended research effort is dedicated to addressing the second and third objectives. The initial section of this study, titled "Research Data," introduces the network traffic flow data and the attack data. Subsequently, the "Research Method" section details the specifics of the model design. Finally, the "Experiment and Analysis" section outlines the strategic planning and evaluation methodology.

Fig. 1: Outline of the study

3.1 Research Data

The UNSW laboratory hosts a diverse set of 28 distinct IoT devices organized into various groups, alongside numerous non-IoT devices within the smart environment. The dataset encompasses both malicious and benign data spanning two distinct periods, captured in 30 PCAP files. The initial set of PCAPs covers the timeframe from 28/05/2018 to 17/06/2018, while the subsequent stage extends from 24/09/2018 to 26/10/2018. This research leverages the dataset provided by the UNSW IoT analytics team [17, 18, 30, 31], focusing on a curated selection of 10 IoT devices with wireless internet connectivity. These devices encompass both benign and attack traffic datasets and are categorized into four distinct groups: Energy management, Camera, Appliances, and Controllers/Hubs, as detailed in Table 1.

Table 1 Ten IoT devices

3.1.1 Network Traffic Flow Data

Every minute, data pertaining to the network traffic flows of the 10 IoT devices is collected, annotated with activity indicators, and stored in ten distinct Excel files dedicated to network traffic flow data. The files include a "Timestamp" column and a sizable number of pattern characteristics: "From###Port###Byte", "To###Port###Byte", "From###Port###Packet", and "To###Port###Packet". The content after "From" and "To" is "InternetTcp", "InternetUdp", "LocalTcp", "LocalUdp", and so forth, whereas the content after "Port" is a port number. We use both byte and packet counts to predict attacks, since the two are not closely related given that packet sizes in this dataset vary. From the network traffic flow statistics alone, it is not possible to tell which network flow is destined for, or originates from, which IoT device, because different IoT devices use the same port number and the same device uses different port numbers at the same time. For example, both the Amazon Echo and the LIFX lightbulb use DNS (port number 53) and NTP (port number 123), while the Amazon Echo also uses HTTP (port number 80), HTTPS (port number 443), and ICMP (port number 0). Consequently, extracting direct insights from network flow data proves to be a formidable challenge. In this study, we employ the network traffic flow data as the input for the models that detect attacks and identify their specific attack types.

Table 2 45 Attack Types
Table 3 Proportion of attack and attack types

3.1.2 Attack Data

The UNSW IoT analytics team designed a set of attacks that mirror real-world scenarios and target several real-world consumer IoT devices. Tools written in Python identify vulnerable devices on the local network by running different tests against them, and the program then launches targeted attacks on the susceptible IoT devices. Each attack annotation includes the start and end times of the attack, the impact of the attack, and the attack type.

Attack Detection: The determination of normal behaviour and the identification of attacks are contingent on a rule-based criterion that evaluates whether a given flow time falls within the specified attack time window. In this context, the condition is: if flowtime \(\ge\) startTime \(\times\) 1000 and flowtime \(\le\) endTime \(\times\) 1000, then attack = true. The start and end times are multiplied by 1000 because the times are recorded in different units: the flow time is in milliseconds, whereas the attack start and end times are recorded in seconds.
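
To make this rule concrete, the following is a minimal labelling sketch, assuming flow timestamps in milliseconds and attack start/end times in seconds; the function and variable names are illustrative and not taken from the dataset:

```python
def label_flow(flow_time_ms, attack_windows):
    """Mark a flow as an attack if it falls inside any attack window.

    flow_time_ms   : flow timestamp in milliseconds
    attack_windows : list of (start_s, end_s) pairs in seconds
    """
    for start_s, end_s in attack_windows:
        # Convert the window to milliseconds before comparing.
        if start_s * 1000 <= flow_time_ms <= end_s * 1000:
            return True
    return False

# Example using the Samsung camera ArpSpoof window from the attack annotations.
windows = [(1527838552, 1527839153)]
print(label_flow(1527838800 * 1000, windows))  # True: inside the window
print(label_flow(1527840000 * 1000, windows))  # False: after the window
```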

Attack Type Detection: There are 45 different attack types; each attack lasts 10 minutes, with 200 attacks in total (see Table 2). Table 3 lists the proportion of attacks and attack types for the ten IoT devices.

Attack Type Detection Details: The detection details cover "direct or reflection", "type of attack", "rate of attack", and "layer of attack", respectively. To avoid confusion between the 45 attack types and the "type of attack": an attack type refers to one of the 45 full labels, such as ArpSpoof100L2D, whereas the type of attack refers to the attack variety, such as ArpSpoof.

  1. Attack categories: attacks are either direct or reflection attacks.

  2. Types of attack: ArpSpoof, TcpSynDevice, UdpDevice, and PingofDeath are direct attacks; SNMP, Ssdp, TcpSynReflection, and Smurf are reflection attacks.

  3. Rates of attack: 100 PPS, 10 PPS, and 1 PPS (packets per second).

  4. Layers of attack: L2D, L2D2L, L2D2W, W2D2W, and W2D are the five layer scenarios, where L denotes Local, 2 denotes "to", D denotes Device, and W denotes the Internet; for example, L2D represents Local to Device.

Take one of the attack records of the Samsung smart camera as an example: "1527838552, 1527839153, Localfeatures|Arpfeatures, ArpSpoof100L2D" represents a direct ArpSpoof attack launched from local to device at a rate of 100 packets per second, which started at 1527838552 and ended at 1527839153 (Unix time in seconds) and affected both local communication and the ARP protocol.
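
As an illustration of how such a label decomposes into the four detail targets, the sketch below parses names such as ArpSpoof100L2D into the type, rate, and layer and derives the direct/reflection category; the parsing logic is our reconstruction of the naming convention described above, not an official parser:

```python
import re

# Direct and reflection varieties as listed in the detail categories above.
DIRECT = {"ArpSpoof", "TcpSynDevice", "UdpDevice", "PingofDeath"}
REFLECTION = {"SNMP", "Ssdp", "TcpSynReflection", "Smurf"}

def parse_attack_label(label):
    """Split a label such as 'ArpSpoof100L2D' into the four detail targets."""
    m = re.match(r"([A-Za-z]+?)(\d+)([LDW2]+)$", label)
    if m is None:
        raise ValueError(f"Unrecognised attack label: {label}")
    attack_type, rate, layer = m.group(1), int(m.group(2)), m.group(3)
    if attack_type in DIRECT:
        category = "direct"
    elif attack_type in REFLECTION:
        category = "reflection"
    else:
        category = "unknown"
    return {"category": category,   # direct or reflection
            "type": attack_type,    # e.g. ArpSpoof
            "rate_pps": rate,       # 1, 10 or 100 packets per second
            "layer": layer}         # e.g. L2D (Local to Device)

print(parse_attack_label("ArpSpoof100L2D"))
# {'category': 'direct', 'type': 'ArpSpoof', 'rate_pps': 100, 'layer': 'L2D'}
```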

3.2 Research Method

Fig. 2: Federated Learning

3.2.1 FL or FedAvg

FedAvg operates by accepting an initial model from the central server, training decentralized models on local device servers, and subsequently transmitting the best-performing parameters back to the central model [37]. The system design, depicted in Fig. 2: FedAvg Protocol, aligns with the principles outlined in Fig. 1: Federated Learning Protocol from Bonawitz's work, "Towards Federated Learning at Scale: System Design" [7]. FedAvg serves as a collaborative approach to model training without central data storage, offering several key advantages (a brief sketch of the server-side aggregation follows the list):

  1. FedAvg facilitates the utilization of extensive datasets distributed across various servers, thereby minimizing data transmission while upholding data privacy and security.

  2. Distributed servers autonomously train the global model on their local data and consolidate these changes into updates sent to the cloud, resulting in more efficient and secure communication.

  3. The cloud server updates the global model by computing a weighted average of the received parameters. This approach not only supports fault tolerance but also enables scalable computation.
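
As a minimal illustration of step 3 above, the sketch below computes the weighted average of client parameter vectors, with weights proportional to the number of local training samples; the function and variable names are ours and purely illustrative, not the authors' implementation:

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """Weighted average of client parameter vectors (FedAvg server step).

    client_params : list of 1-D NumPy arrays, one per client
    client_sizes  : number of local training samples per client
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(client_params)            # shape: (n_clients, n_params)
    return (weights[:, None] * stacked).sum(axis=0)

# Example with three clients holding different amounts of data.
params = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.3, 0.9])]
sizes = [1000, 200, 800]
print(fedavg_aggregate(params, sizes))
```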

Fig. 3: FedGroup

3.2.2 FedGroup

While FedAvg is proficient at aggregating parameters from local servers and determining the mean for the subsequent round, it falls short in effectively managing fairness concerns. The algorithm overlooks a critical factor: the unequal distribution of smart home devices among various groups [21, 24]. Devices within the same category exhibit similar functionalities and face comparable risks, yet training updates that vary significantly among participants are simply averaged together. While the aggregate accuracy may appear satisfactory, individual accuracy remains obscured, potentially leading to a skewed performance distribution [21].

In contrast, FedGroup introduces a novel approach [39]. It advocates computing the average of updates on a group basis rather than opting for a one-size-fits-all averaging strategy (refer to Fig. 3 and Fig. 4). This model comprises multiple local models, a central model, and several group masters within the central model. Local models operate on local servers deployed on IoT devices. Each IoT device collects network traffic data to train a local model and forwards learning updates to the respective group master within the central model. Importantly, this process does not involve data sharing or transmission, maintaining data security and privacy. Each group master aggregates learning parameters within its designated group using a predefined function (e.g., averaging) to fine-tune the learning process. The updated learning is subsequently relayed to all client servers within the group for the next round of training, optimizing the local model’s focus on group-specific information. To ensure data security and privacy, information remains localized and is not transmitted over the internet or shared with other devices. Furthermore, to mitigate accuracy disparities stemming from bias, IoT device parameters are determined on a group-specific basis rather than relying on an overall average.

In our study, the IoT devices in the smart home primarily consist of energy management applications such as plugs and bulbs, as indicated by the dataset. Given the substantial disparity in the number of such devices compared to other groups, the cloud server's parameters may exhibit bias towards energy management devices. Specifically, of the ten IoT devices utilized in this research, one belongs to the Controllers/Hubs group, one to the Appliances group, and two to the Camera group. The remaining six IoT devices fall under the Energy Management group, encompassing a Belkin Motion Sensor, an iHome PowerPlug, a LIFX Bulb, a Philips Hue lightbulb, a TP-Link Plug, and a Belkin Switch.
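
To make the group-wise aggregation concrete, the following is a minimal sketch in which each group master averages only the updates of devices in its own group; the grouping mirrors the four device groups above, but the device identifiers and function names are illustrative placeholders:

```python
import numpy as np

# Device-to-group assignment following the four groups above (identifiers are placeholders).
GROUPS = {
    "energy": ["belkin_motion", "ihome_plug", "lifx_bulb",
               "hue_bulb", "tplink_plug", "belkin_switch"],
    "camera": ["samsung_cam", "camera_2"],
    "appliances": ["appliance_1"],
    "hubs": ["hub_1"],
}

def fedgroup_aggregate(client_updates):
    """Average parameter updates per group instead of over all clients.

    client_updates : dict mapping device name -> 1-D NumPy parameter array
    Returns a dict mapping group name -> aggregated parameter array, which each
    group master sends back only to the devices in its own group.
    """
    group_params = {}
    for group, devices in GROUPS.items():
        updates = [client_updates[d] for d in devices if d in client_updates]
        if updates:
            group_params[group] = np.mean(np.stack(updates), axis=0)
    return group_params
```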

Fig. 4: FedGroup

Definition: Network \(N_{DnGi}\): \(N\) denotes the network, \(D_n\) denotes device \(n\), and \(G_i\) denotes group \(i\). Each \(N_{DnGi}\) contains \(X_n\), the network traffic flow data of IoT device \(n\), and \(M_n\), the local model of IoT device \(n\). During training we track the best score \(S\), the best parameters \(B\), the average score of the entire model \(C\), and the average parameters of the entire model \(A\). For each model \(M\), the parameter set \(P=\{a, b, \ldots\}\) contains parameters such as weights and n_estimators, with the corresponding parameter grids \(p=\{a_0, a_1, \ldots\}, \{b_0, \ldots\}, \ldots\); for example, n_estimators may take the values 1, 2, and so on. \(E\) denotes the parameter grids selected by the local models after the update to the central model, and \(y_n\) denotes the prediction target, for example the cyber attack type.
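
As an illustration of the parameter grids \(p\) and of obtaining the best score \(S\) and best parameters \(B\) for one local model \(M_n\), the sketch below uses a grid search; the grid values and the use of scikit-learn's GridSearchCV are assumptions for illustration, not the exact tuning procedure of the experiments:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Illustrative parameter grid p for one local model M_n (a decision tree).
param_grid = {"max_depth": [5, 10, None], "criterion": ["gini", "entropy"]}

# Local tuning: the best score corresponds to S and the best parameters to B.
search = GridSearchCV(DecisionTreeClassifier(), param_grid,
                      cv=5, scoring="f1_weighted")
# search.fit(X_n, y_n)                      # X_n: traffic flows of device n, y_n: labels
# S, B = search.best_score_, search.best_params_
```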

3.2.3 FedAvg_EL

FedAvg_EL adheres to the FedAvg workflow but introduces a novel approach by replacing the local models with ensemble learning techniques. This adaptation is applied to the task of attack detection and attack type detection, following the procedural steps established by FedGroup, as illustrated in Fig. 5.

Traditionally, local models in federated learning have often employed ML techniques, which can yield inconsistent results due to their specialization in addressing specific types of questions or issues. In contrast, EL harnesses the collective intelligence of various contributing models, offering the advantage of robust and uninterrupted operation even in the presence of individual model failures.

When considering ensemble learning as a local model, there are three main types: Bagging, Stacking, and Boosting. Bagging trains the same type of model on multiple bootstrap samples of the training data, while Boosting trains models sequentially so that each model corrects the errors of its predecessors. For our specific scenario, Stacking ensemble learning is deemed most suitable. In the Stacking approach, a two-tier model structure is employed: the base models, also referred to as Level-0 models, are trained on local devices using the network traffic data, and a Level-1 classification model, such as logistic regression, then combines the predictions generated by the Level-0 models [8, 9].
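
A minimal sketch of such a two-tier stacking local model using scikit-learn is given below; the Level-0 estimators match those named in Sect. 3.3, while all hyper-parameters are left at illustrative defaults:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Level-0 base models trained on the device's local traffic data,
# combined by a Level-1 logistic regression meta-learner.
level0 = [
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier()),
]
stacking_model = StackingClassifier(
    estimators=level0,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
# stacking_model.fit(X_train, y_train)   # X_train, y_train: local device data
```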

Fig. 5: FedAvg_EL

Regarding the prediction of attack type details, FedAvg_EL can locally integrate a variety of models within the ensemble learning framework. Figure 6 illustrates how the model informs customers of the rate of the attack, the type of attack, the layers under attack, and whether it is a direct or reflection attack. Armed with this information, customers can make informed decisions and take appropriate defensive actions. The sequential steps of FedAvg_EL for attack type detection details are as follows (a brief sketch of the local training step follows the list):

  1. Every local model uses the network traffic flow data to train its models. The models predict "direct or reflection", "type of attack", "rate of attack", and "layer of attack" in four stacking ELs, respectively;

  2. The prediction accuracy is the mean over the four aspects. Local models send the best parameters of the model to the central model;

  3. The central model securely aggregates all the parameters;

  4. The central model sends the new global model, with the averaged parameters, back to the participants;

  5. Local models update their models with the new parameters.
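
The sketch below renders steps 1 and 2 for one local device: one stacking EL is trained per detail target, and the reported accuracy is the mean over the four aspects. The target names, helper function, and use of cross-validation scores are our illustrative assumptions rather than the authors' exact implementation:

```python
import numpy as np
from sklearn.model_selection import cross_val_score

TARGETS = ["direct_or_reflection", "type_of_attack",
           "rate_of_attack", "layer_of_attack"]

def local_round(make_stacking_model, X, y_details):
    """Steps 1-2: train one stacking EL per detail target on a local device
    and report the mean accuracy over the four aspects.

    make_stacking_model : factory returning a fresh stacking classifier
    X                   : local network traffic flow features
    y_details           : dict mapping target name -> label array
    """
    scores, params = [], {}
    for target in TARGETS:
        model = make_stacking_model()
        scores.append(cross_val_score(model, X, y_details[target], cv=5).mean())
        model.fit(X, y_details[target])
        params[target] = model.get_params()   # candidate parameters for upload
    return np.mean(scores), params
```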

Fig. 6: FedAvg_EL on attack type detection details


3.2.4 FedGroup_EL

Fig. 7: FedGroup_EL

FedGroup_EL combines FedGroup and EL: Ensemble Learning serves as the local model, and FedGroup serves as the central model, with group masters performing group-wise updates. The new model inherits the advantages of learning from a mixture of ensemble learning models, preserves the security and privacy of the data, and retains the fairness of the FedGroup training procedure. Most importantly, fault tolerance can be seen as the biggest advantage of FedGroup_EL: FedGroup can tolerate adversarial attacks and recover from faults because it is deployed on multiple edge devices [16], and the structure of ensemble learning allows it to benefit from many models without risking a system failure when an individual model fails. We implement FedGroup_EL for attack detection and attack type detection following the steps of FedGroup in Fig. 7. Because the 45 attack types can be decomposed into the four perspectives described above, which are meaningful to learn for predicting the attack type detection details, the local model is an aggregate of four stacking ELs (see Fig. 8). The steps of FedGroup_EL for attack type detection details are as follows (a brief sketch of the group aggregation step follows the list):

  1. Every local model uses the network traffic flow data to train its models. The models predict "direct or reflection", "type of attack", "rate of attack", and "layer of attack" in four stacking ELs, respectively;

  2. The prediction accuracy is the mean over the four aspects. Local models send the best parameters of the model to the central model;

  3. The group masters in the central model securely aggregate the parameters on a per-group basis;

  4. The central model sends the new global model, with the averaged parameters, back to the participants in the related group;

  5. Local models update their models with the new parameters.
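
Steps 3 and 4 differ from FedAvg_EL only in that aggregation is performed per group. A minimal sketch of one group master's round, treating the uploaded parameters as numeric vectors for simplicity (names are illustrative):

```python
import numpy as np

def group_master_round(group_devices, uploaded_params):
    """Steps 3-4: a group master averages its members' parameter vectors and
    the result is broadcast back only to the devices in that group.

    group_devices   : list of device names belonging to this group
    uploaded_params : dict mapping device name -> 1-D NumPy parameter array
    """
    updates = [uploaded_params[d] for d in group_devices if d in uploaded_params]
    group_model = np.mean(np.stack(updates), axis=0)
    return {device: group_model for device in group_devices}
```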

Fig. 8: FedGroup_EL on attack type detection details


3.3 Experiment and Analysis

In preprocessing the IoT network traffic flow data, "NoOfFlow" is removed since it merely counts closely related flows. There are 253 attributes related to bytes and packets per port number; different devices may use the same port number, while a single device may employ several port numbers. Missing (NaN) values represent instances of no network activity for the corresponding port number, so we replace them with 0, signifying zero packet-level and zero byte-level traffic at that moment.
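
A minimal preprocessing sketch under these rules, assuming the flow data has been loaded into a pandas DataFrame (the column name "NoOfFlow" is taken from the text; everything else is illustrative):

```python
import pandas as pd

def preprocess_flows(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the redundant flow counter and treat missing traffic as zero."""
    df = df.drop(columns=["NoOfFlow"], errors="ignore")
    # NaN means no traffic for that port in that minute -> zero bytes/packets.
    return df.fillna(0)
```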

The dataset exhibits an imbalance, favoring certain labels. To address this, we employed StratifiedShuffleSplit to divide the data into an 80% training set and a 20% testing set, ensuring a consistent label distribution. We adopted Stratified 5-fold Cross-Validation for model training and evaluation, and report the F1 score with weighted averaging on the 20% testing data.
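
The split-and-evaluate procedure described here can be sketched as follows, assuming X and y are NumPy arrays; the estimator is a placeholder and the random seed is arbitrary:

```python
from sklearn.model_selection import (StratifiedShuffleSplit, StratifiedKFold,
                                     cross_val_score)
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y):
    # 80/20 stratified split preserving the label distribution.
    splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
    train_idx, test_idx = next(splitter.split(X, y))
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    model = DecisionTreeClassifier()
    # Stratified 5-fold cross-validation with weighted F1 during training.
    cv_scores = cross_val_score(model, X_train, y_train,
                                cv=StratifiedKFold(n_splits=5),
                                scoring="f1_weighted")
    model.fit(X_train, y_train)
    return cv_scores.mean(), model.score(X_test, y_test)
```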

In the context of Stacking Ensemble Learning for attack detection and attack type detection, we employed KNN and Decision Tree at Level-0 and Logistic Regression at Level-1. For attack type detection details, we tuned the four detail-prediction models on the Samsung smart camera device to improve the initial ensemble learning. Based on the results in Table 4, we selected KNN, Decision Tree, and Naive Bayes for Level-0.

Table 4 Ensemble Learning-adjust level 0 models

The accuracy classification score is a crucial metric for evaluating multi-label classification performance, as it requires an exact match with the actual data [10]. Another important metric is the False Positive Rate (FPR), which measures the ratio of negative events incorrectly classified as positive (False Positives) to the total number of ground-truth negatives (N = TN + FP) [11, 13]. In our case study, we use both accuracy and FPR to evaluate the models: accuracy measures the correct predictions of abnormal and normal behaviours, and FPR quantifies the likelihood of misclassifying a cyber attack as normal behaviour.
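
For the binary attack-detection case, with the attack class treated as positive, both metrics can be computed from the confusion matrix as in the small illustrative sketch below:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

def accuracy_and_fpr(y_true, y_pred):
    """Accuracy plus the false positive rate FPR = FP / (FP + TN)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return accuracy_score(y_true, y_pred), fp / (fp + tn)

# Example: 1 = attack, 0 = normal.
print(accuracy_and_fpr([0, 0, 1, 1, 0], [0, 1, 1, 1, 0]))  # (0.8, 0.3333...)
```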

4 Results

Table 5 The accuracy of FedGroup, FedAvg and Traditional ML using different models

This study has analysed anomaly detection on three questions: (1) Attack Detection: Can we detect if there is an attack happening or not? (2) Attack Type Detection: If yes, can we identify its attack type? (3) Attack Type Detection Details: Can we further correctly predict the details of the attack?

Table 5 compares the performance of our models. The first section displays the outcomes of using Traditional ML, FedAvg, and FedGroup as the central model, and Decision Tree, Logistic Regression, and Ensemble Learning as the local model, for attack detection and attack type detection. The second section shows the outcomes of using EL as the local model on both FedAvg and FedGroup for attack type detection details, namely "direct or reflection", "type of attack", "rate of attack", and "layer of attack".

To begin with, the top-performing model achieved an accuracy of 99.91% in detecting attacks, using a Federated Learning-based central model and Ensemble Learning as the local model for training. In terms of attack type detection, the FedGroup model utilising EL as the local model achieved the highest accuracy of 99.64%. For attack type detection details, both the FedAvg_EL and FedGroup_EL models achieved an overall accuracy of 99.89%, providing specific features of attack types to customers.

Secondly, FL-based learning models perform on par with conventional ML models, and sometimes better. The FL-based model also runs faster than the traditional ML model, requiring \(O(n)\) for the client-side model and \(O(n^2)\) for the central server. Furthermore, if we focus on differences in FPR that are larger than 1%, the FPRs of the FL-based models are lower than those of the traditional ML model. FL takes advantage of local training data to reduce running time through lightweight communication and a decentralised learning model. Moreover, data security is ensured because raw data is never sent, communicated, or shared with other IoT devices or the Internet.

Besides, FedGroup performs equal to or better than FedAvg. If we focus on the differences in FPR that are greater than 1%, the FPRs of FedGroup are lower than those of FedAvg. It is beneficial for FedGroup to share parameters among IoT devices within the same group, since the central model learns attack types from the same category of IoT devices.

Lastly, we developed FedAvg_EL and FedGroup_EL and showed that employing EL as the local training model outperforms the traditional machine learning model. EL can merge several models even if the individual models are weak, and it shows great tolerance for diverse models. Based on the results, FedAvg_EL and FedGroup_EL achieved the highest performance across the three questions.

The complete details of the experimental results can be found in the project repository, including the results of attack detection with traditional ML and the proposed federated learning models, parameter selection and hyper-parameter tuning, and the accuracy of each IoT device with the FedAvg, FedGroup, FedAvg_EL, and FedGroup_EL models. The datasets, model implementations, and detailed experimental results are likewise available in the repository, which should be useful for experiment reproducibility as well as model extension and comparison.

5 Discussion

This study expanded on our previous work on attack detection by investigating attack types and their details, providing valuable information. Specifically, our focus was on examining the impact of bias in the FedAvg and FedGroup models, and our findings are in line with those of Mohri et al. [24] and Li et al. [21], who argue that a uniform distribution may not always be the most suitable objective distribution. Given the significance of addressing bias while keeping training data undisclosed, it is essential to bridge this research gap by incorporating group-based update aggregation. Compared with the recent work of Campos et al. [12], we observed the same problem, and our model provides another way to handle heterogeneous data distributions when detecting different attacks in an IoT environment.

The study has several limitations. First, to defend the practicality of the proposed strategy, the computational requirements for developing and executing models on the local servers must be considered, since they are implemented on IoT devices. One potential solution is to incorporate embedded systems connected to IoT devices with constrained computing capabilities; several papers have examined how machine learning can be implemented on embedded devices [6, 14]. Second, our model did not include real-time detection, and the analysis was performed using all available data in just two communication cycles; future developments could spread this procedure over several iterations to increase accuracy. Third, due to computational constraints, only a subset of hyperparameters was considered, which may limit the ability to fine-tune the models.

Moreover, our study is confined to a single smart home environment. As the IoT landscape continues to evolve, encompassing numerous smart homes, smart cities, and transportation systems, we anticipate the emergence of a multitude of diverse attacks occurring concurrently and across various locations. For instance, voice recognition sensors within smart homes serve various functions, from playing music to answering questions and controlling various devices. By studying the parameters of voice recognition devices, the central model can identify vulnerabilities and enhance security for all voice recognition devices within the city.

Future research endeavours should extend their scope to encompass multiple smart home environments and adapt to the evolving landscape of IoT devices. Rather than merely categorizing IoT devices by functionality, such as cameras and appliances, a more nuanced approach could involve dividing them into numerous groups based on various attributes. Consider a smart door product, which offers multiple methods of access, including app control, fingerprint recognition, password entry, card scanning, and key unlocking. By segmenting these attributes, the central model can pinpoint the precise element under attack in the event of a security breach, thereby improving overall security.

6 Conclusion

Addressing the issue of anomaly detection in the IoT smart home environment, we introduce a new method called FedGroup and two new frameworks that use EL as the locally trained model, called FedAvg_EL and FedGroup_EL, for which we present the detailed algorithms. The study finds that:

  1. FL-based algorithms perform equally well as or better than traditional machine learning: FedAvg reaches 99.91% in attack detection and 99.50% in attack type detection, while FedGroup reaches 99.91% in attack detection and 99.64% in attack type detection.

  2. The analysis of FedGroup shows that it slightly improves on FedAvg and addresses the fairness concern in the training procedure.

  3. The FedAvg_EL and FedGroup_EL models combine the four perspectives of attack type detection details, namely "direct or reflection", "type of attack", "rate of attack", and "layer of attack", with an accuracy of 99.89%. Ensemble Learning also brings fault tolerance, and these models outperform the traditional machine learning model.

In summary, this study demonstrates that FL-based models can effectively address the security and privacy challenges of decentralized local servers while achieving high accuracy. Additionally, FedGroup is proposed as a solution to address fairness issues in FL by aggregating updates based on categories of IoT devices. Moreover, the study investigates the use of ensemble learning to improve the accuracy of attack type detection, specifically for direct or reflection attacks, type of attack, rate of attack, and the affected layers. As a result, two new models, FedAvg_EL and FedGroup_EL, are proposed.

While our study sheds light on model comparisons, further empirical investigations are necessary to delve into continuous real-time learning and other fairness strategies in the realm of federated learning. Other options for future study include extending the model to other frameworks on anomaly detection, determining the system cost, and examining how wireless network link instability impacts model updating.