Introduction

The rapid advancements in vehicular technologies have introduced a number of innovative and intelligent sensors for deployment in modern vehicles [1]. These sensors help drivers recognize traffic and road signs efficiently, monitor the roadways, reduce collision risks, and accurately estimate the distance between the vehicle and surrounding objects [2]. Vehicular Sensor Networks (VSNs) offer a smart channel to establish communication between vehicular sensors and Roadside Units (RSUs). The rapid growth of VSNs, the complexity of their architecture, the diversity of communication, and the high mobility of vehicles make these networks vulnerable to several cyberattacks [3]. These attacks can vary according to the sensing technologies deployed in the VSN. The cyberattacks in VSNs can be categorized into inter-vehicle attacks and intra-vehicle attacks. Inter-vehicle attacks mainly target the communication among vehicles and RSUs, whereas intra-vehicle attacks aim to disrupt the communication among smart devices within a vehicle. Inter-vehicle attacks are usually considered more dangerous than intra-vehicle attacks [4].

To mitigate these cyberattacks, an Intrusion Detection System (IDS) is one of the most popular and effective security approaches that helps prevent multiple types of cyberattacks in VSNs [5]. Through the communication of smart sensors with RSUs and other vehicles, an IDS provides a powerful monitoring capacity to recognize suspicious activities and information sharing [6]. In this context, Machine Learning (ML) and Deep Learning (DL) techniques utilized with massive sensor datasets can produce promising results [7, 8]. These techniques have been extensively used for the development of IDSs for VSNs [9].

Here, we present some recent studies leveraging ML/DL approaches for attack detection in VSNs. Zhang et al. [10] introduced a privacy-preserving ML-based collaborative IDS for vehicular ad-hoc networks (VANETs). The suggested scheme applies the alternating direction method of multipliers to a class of empirical risk minimization problems for intrusion detection in a vehicular network. The researchers evaluated their model by conducting extensive experiments on the NSL-KDD dataset. Alladi et al. [11] presented an Artificial Intelligence (AI)-based IDS for vehicular networks. The proposed model includes Deep Learning Engines (DLEs) for detecting and classifying cyberattacks in road traffic. The authors deployed these DLEs on multi-access edge computing servers. The experimental findings demonstrated the effectiveness of the proposed framework. Dealing with a massive amount of constantly growing vehicular data is a critical challenge for IDSs. In this context, Bangui et al. [12] developed a Random Forest (RF) algorithm and a posterior detection-based model to improve the attack detection efficiency of IDSs. The obtained results indicated that the proposed model significantly enhanced the cyberattack detection accuracy compared to classical IDSs. Raja et al. [13] proposed a secure and private collaborative IDS to mitigate the security concerns in VANETs. In the proposed scheme, a distributed ML model utilized the potential of intra-vehicle collaboration in the learning process to enhance the scalability, accuracy, and efficiency of the IDS. The performance of the suggested approach was also compared with several ML classifiers. The simulation results showed that the suggested scheme outperformed the existing methods. Ashraf et al. [14] presented a DL-based IDS for Intelligent Transportation Systems (ITS). The authors deployed a Long Short-Term Memory (LSTM) autoencoder algorithm to recognize malicious activities in vehicular networks. The suggested scheme was evaluated using the UNSW-NB15 dataset. The experimental results showed a higher attack detection efficiency for the proposed scheme than for eight well-known ML-based IDSs.

Limitations of existing studies

Several improvements have been made to enhance the IDSs for vehicular networks in the studies mentioned above. However, traditional ML-based IDSs face several inherent challenges in modern vehicle security applications, as described below [15]:

  • Large volumes of data must be sent from sensors to a distant server, which requires additional network traffic encoding and transmission time. Consequently, low bandwidth may result in poor data transmission efficiency. Furthermore, cloud servers are frequently located far from sensors, requiring data to transit via several edge nodes. Because of this long-distance data transmission, a VSN with multiple sensor nodes is unable to meet real-time and high Quality of Service (QoS) expectations. The typical cloud-based architecture used with traditional IDSs is therefore unsuitable for meeting these objectives.

  • In traditional centralized model-training systems, clients communicate their datasets to the cloud server via various communication network links. As a result, wireless communications and core network connections between clients and servers have a significant impact on DL model training and the resulting decisions. Consequently, the connection must remain reasonably robust even when network conditions are poor. However, due to the unpredictable wireless connection between the client and server, a centralized design faces system performance deterioration and probable failures, which can have a substantial impact on the model training and its inferences.

  • Since clients must exchange their raw data with other parties, such as cloud or edge servers, to train a model, traditional centralized ML-based IDSs are prone to privacy violations of sensitive data and are exposed to attackers. Addressing this issue requires a tailored set of controls and methodologies that identify the relative relevance of datasets, their sensitivity, and their compliance requirements, and that apply suitable measures to secure these resources. Such solutions are feasible, but they require incorporating additional resources into traditional ML-based IDSs and incur a higher computational cost.

  • As data owners become more concerned about their privacy, administrative regulations must be implemented to limit data collection to parties that participate in the processing and have been granted explicit approval by the owners. The traditional centralized model-training architecture cannot comply with such privacy legislation, since clients must submit raw data to the server for model training.

Motivation

The paradigm of federated learning (FL) was proposed by Google researchers in 2016 as a viable alternative for tackling communication costs, data privacy, and regulatory issues [15]. An FL approach is a distributed ML approach in which models are trained on end devices without sharing their local datasets under centralized management. This protects data privacy during the training phase. An edge server or cloud server collects the learned parameters on a regular basis to construct and update a newer, more accurate model, which is then delivered back to the edge devices for local training. The FL training process is divided into five steps [15]. The FL server first selects an ML model to be trained locally on each client node. Second, at random or using appropriate selection techniques, a subset of current client nodes is picked. Third, the server transmits the initial global model to the client nodes that are selected. Clients download the model’s current global parameters and train the model locally. In the fourth phase, each client node transmits changes to the server. Finally, without accessing any clients’ data, the FL server gets the changes and aggregates them using aggregation algorithms to produce a new global model. In every round, the FL server orchestrates the training process and sends the global model changes to the selected client nodes. The process is repeated until the desired quality performance is attained.
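The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the local model is a toy linear regressor rather than a neural network, the aggregation is a sample-size-weighted average of client weights, and the names `local_train`, `aggregate`, and `fl_round` are hypothetical.

```python
import random

def local_train(weights, data, lr=0.1, epochs=5):
    """Hypothetical local update: fit y = w*x + b by gradient descent on the client's own data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]

def aggregate(updates, sizes):
    """Server-side aggregation: average client weights, weighted by local sample counts."""
    total = sum(sizes)
    return [sum(u[k] * n for u, n in zip(updates, sizes)) / total
            for k in range(len(updates[0]))]

def fl_round(global_w, clients, fraction, rng):
    # Step 2: select a random subset of the available clients.
    k = max(1, int(fraction * len(clients)))
    selected = rng.sample(clients, k)
    # Steps 3-4: each selected client starts from the global weights,
    # trains locally, and returns only its updated weights.
    updates = [local_train(list(global_w), data) for data in selected]
    sizes = [len(data) for data in selected]
    # Step 5: the server aggregates the updates without seeing any raw data.
    return aggregate(updates, sizes)

rng = random.Random(0)
# Three clients, each holding private samples of y = 2x + 1 (toy data).
clients = [[(x / 10, 2 * (x / 10) + 1) for x in range(i, i + 10)] for i in (0, 5, 10)]
global_w = [0.0, 0.0]  # Step 1: the server chooses the initial model.
for _ in range(200):   # Repeat until the desired quality is reached.
    global_w = fl_round(global_w, clients, fraction=1.0, rng=rng)
```

In this toy setting, the global weights approach the slope and intercept of the underlying line although no client ever shares its samples with the server.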

FL can be an appropriate choice to address the issues encountered with traditional ML-based IDSs. FL is one of the most adaptable techniques that allow the training of ML algorithms on edge devices [16]. The FL approach enables multiple participants to develop robust and efficient ML models without data sharing. Because this strategy preserves the privacy of user data, it is regarded as a better option than non-FL approaches [17]. FL algorithms offer several advantages. First, FL enables edge devices to learn a shared predictive model while keeping the training dataset on the device instead of storing it on a centralized server [18]. Second, it keeps the data stored locally, which supports data security management. Third, it offers real-time updating of ML models, because the data are available on edge devices. This feature reduces time consumption, and data can be accessed without contacting the centralized server. Fourth, it is highly suitable for deployment on resource-constrained hardware because of its low complexity and distributed nature [18].

Major contributions

This article proposes a novel federated learning-based architecture for cyberattack detection in the VSN. This framework enables the on-device training of the proposed attack detection model for VSN. The major contributions of this study are summarized in the following points.

  1. This article presents an overview of the widely used sensing technologies and potential cyberattacks in VSNs.

  2. A Federated learning (FL)-based technique is proposed for cyberattack detection in VSN. In the proposed approach, a group of Gated Recurrent Units (GRUs) with an ensembler unit are deployed to ensure higher attack detection accuracy in VSN.

  3. Extensive experiments on a newly generated dataset “Car Hacking: Attack & Defense Challenge 2020 Dataset” are conducted to train the proposed algorithm.

  4. The effectiveness of the suggested technique is analyzed using several performance metrics, including accuracy, precision, recall, F1 score, and training time.

The remainder of this article is organized as follows. “Cyberattacks in Vehicular Sensor Networks” investigates the widely used sensing technologies and cyberattacks in VSN. “Proposed Federated Learning Framework” describes the proposed framework along with the mathematical background of the utilized algorithm. “Experiments and Results” comprises implementation platform details, dataset description, and experimental findings discussion. Finally, a brief conclusion with future research directions is presented in “Conclusion”.

Cyberattacks in vehicular sensor networks

The VSN contains a number of vehicle sensors that monitor and measure physical parameters related to the vehicle and its environment. These sensors facilitate smooth drive operations and enhance the driver’s comfort [19]. Some commonly used sensors of smart vehicles are presented in Fig. 1. All these sensors are made up of advanced electronics and communication technologies. Because of the resource-constrained nature of these sensors, robust and complex security algorithms cannot be directly deployed. Therefore, these sensors are vulnerable to several cyberattacks, as presented in Table 1. The description of some common cyberattacks in VSNs is presented in Table 2. The most prominent vehicular sensing technologies and their security challenges are discussed in the following subsections.

Fig. 1 Commonly used sensors in smart vehicles

Table 1 Commonly used sensors in VSN and relevant cyberattacks
Table 2 Description of cyberattacks in VSNs

Environmental sensors

These sensors monitor and measure the physical quantities related to vehicular surroundings. Prominent examples of these sensors are the camera, Global Positioning System (GPS), ultrasonic sensor, Light Detection and Ranging (LiDAR), and Radio Detection and Ranging (Radar) systems. This section discusses the sensors mentioned above and their vulnerabilities to cyberattacks.

Camera

These are the commonly used sensors in autonomous vehicles to identify their surroundings. These sensors are primarily utilized to identify traffic and road signs, monitor the nearby obstacles, and help to avoid collisions while parking. The major cyberattacks against these sensors include blinding and auto-control attacks [20].

GPS

It is an essential system of autonomous vehicles that facilitates the identification of geographic locations. GPS satellites transmit the navigation signals to the on-ground receivers. Receivers determine the vehicle’s current location by computing their distance to at least four different satellites. GPS communication is vulnerable to jamming, spoofing, and blackhole attacks [21].

Ultrasonic sensor

This sensor is used to detect a short-range obstacle and also calculate its distance to the vehicle. This sensor transmits an ultrasonic signal towards the nearby objects. The delay between the transmission and reception of the signal is utilized to compute the precise distance of the vehicle from an obstacle when the signal reflects back from the object. Ultrasonic sensors are generally vulnerable to sensor interference, blind spot exploitation, cloaking, physical tampering, and acoustic cancellation attacks [22].
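The distance computation described above can be illustrated with a short sketch. It assumes a speed of sound of 343 m/s (dry air at roughly 20 °C); the function name `echo_distance_m` and the example delay are hypothetical and not taken from any specific sensor.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees Celsius

def echo_distance_m(round_trip_s, wave_speed_m_s=SPEED_OF_SOUND_M_S):
    """Distance to an obstacle from the delay between transmission and reception.

    The signal travels to the obstacle and back, so the one-way
    distance is half of the total path length.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip delay must be non-negative")
    return wave_speed_m_s * round_trip_s / 2.0

d = echo_distance_m(0.01)  # a 10 ms echo delay
```

The same time-of-flight relation applies to other ranging sensors if the wave speed is replaced accordingly, for example by the speed of light for laser or radio signals.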

LiDAR

It generates a 3D map of the vehicle’s surroundings using laser scanning techniques. During the scanning process, LiDAR emits laser pulses; when these pulses are reflected, LiDAR calculates the vehicle’s distance to the surrounding objects. The prominent cyberattacks on the LiDAR system are Denial of Service (DoS), spoofing, replay, jamming, and blinding attacks [23].

Radar

It transmits electromagnetic signals and measures the distance of nearby objects to vehicles. These sensors determine the distance by calculating the time elapsed from the transmission of a signal to the detection by radar receivers. Most of the radar systems operate within the millimeter-wave frequency band. A short-range radar sensor helps the driver to identify obstacles while parking. Medium-range radars are used in lane change assistance mechanisms, and long-range radars are mostly used in adaptive cruise control. These sensors are usually vulnerable to jamming and spoofing attacks [24].

Vehicle dynamics sensors

Vehicle dynamics sensors provide measurements of the vehicle’s state. These sensors include inertial sensors, magnetic encoders, and tire pressure monitoring systems. In the following subsections, we discuss the sensors mentioned above and their vulnerabilities to cyberattacks.

Inertial sensors

These sensors comprise accelerometers and gyroscopes. Accelerometers measure the acceleration of moving objects. Gyroscopes measure the rate of rotation about a specific axis. The major cyberattacks on inertial sensors are spoofing and acoustic attacks [25].

Magnetic encoders

The magnetic encoder calculates the angular velocity of the vehicle gear. This sensor can measure the wheel’s rotational speed using Hall effect sensors, and it is frequently used with anti-lock braking systems. The magnetic encoder can also be used indirectly with a tire pressure monitoring system to determine the rotational speeds and estimate the difference in pressure values. These sensors are generally vulnerable to disruptive and spoofing attacks [26].

Tire pressure monitoring systems

The tire pressure monitoring system contains four pressure sensors and an Electronic Control Unit (ECU). The ECU collects the information from the pressure sensors and transmits it to the vehicle’s central control unit, along with the sensor ID and the pressure and temperature measurements. Tire pressure monitoring systems are mainly vulnerable to spoofing, eavesdropping, and reverse-engineering attacks [27].

Proposed federated learning framework

This section presents the adopted ML model and the modules of the proposed framework.

Gated recurrent unit (GRU)

The GRU is one of the most popular variants of the recurrent neural network (RNN) [28]. It is also regarded as a simpler version of Long Short-Term Memory (LSTM) because of its lower resource and computational requirements [29]. The proposed cyberattack detection approach is built using the GRU neural network, which takes into consideration the temporal relationship between traffic samples. This relationship is an essential classification feature that improves the model’s overall detection capability and speed [30]. GRUs can also predict time series data and detect unknown attack patterns [30].

The basic architecture of GRUs contains multiple gates that observe the information flow and regulate the learning process. The gates act as switches that help retain long- and short-term information in the network. Intrusion detection, speech synthesis, speech recognition, and text generation are some real-world deployments of GRUs [31]. The basic architecture of GRU is presented in Fig. 2. The main units of GRUs are discussed in the following.

Fig. 2 Basic architecture of GRU

Sigmoid function

The sigmoid function provides a way to decide which information should be kept or discarded. It generates scores between 0 and 1. A score near 0 tells the network to discard the information, whereas a score near 1 (\(\sigma \approx 1\)) indicates that the information should be retained for future use.

Hyperbolic tangent function

This activation function, referred to simply as \(\tanh \), accepts any real value as input and produces outputs between \(-1\) and \(+1\). The \(\tanh \) function is often employed in hidden layers, since its outputs are centered around zero, which makes the training process easier for the subsequent layers.
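The behaviour of the sigmoid and hyperbolic tangent functions can be checked with a small standard-library sketch. The two-branch form of `sigmoid` is only a numerical-stability choice made for this illustration.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps any real x into the open interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # stable branch for negative x, avoids overflow
    return e / (1.0 + e)

def tanh(x):
    """Hyperbolic tangent: maps any real x into the open interval (-1, 1)."""
    return math.tanh(x)

# Scores near 0 mean "discard", scores near 1 mean "retain".
scores = [sigmoid(x) for x in (-10.0, 0.0, 10.0)]
# tanh is a shifted and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
identity_gap = abs(tanh(0.7) - (2.0 * sigmoid(1.4) - 1.0))
```

The identity in the last line shows why the two functions behave similarly, with \(\tanh \) producing zero-centered outputs.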

Gates of GRUs

In GRUs, the inputs to each memory cell are concatenated to form a single value, and the architecture works better with only two gates, named reset and update gates. GRUs require less training time and are computationally inexpensive.

(i) Reset gate: This gate discards the information from the previous cell that is not useful for future steps, as computed by (1)

$$\begin{aligned} s_{t}=\sigma \left( W_{s}\left[ c_{t-1}, u_{t}\right] \right) . \end{aligned}$$
(1)

Here, \(s_{t}\) determines the sigmoid layer results for the present memory cell, \(W_{s}\) is the weight of s, \(c_{t-1}\) represents the information from the previous cell, and \(u_{t}\) indicates the input for the present cell, respectively.

(ii) Update gate: GRUs employ a single gate referred to as an update gate. It decides whether the information from the current state should be kept or discarded using Eqs. (2)–(4)

$$\begin{aligned} y_{t}=\sigma \left( W_{y}\left[ c_{t-1}, u_{t}\right] \right) \end{aligned}$$
(2)
$$\begin{aligned} \hat{c}_{t}=\tanh \left( W\left[ s_{t} * c_{t-1}, u_{t}\right] \right) \end{aligned}$$
(3)
$$\begin{aligned} c_{t}=\left( 1-y_{t}\right) * c_{t-1}+y_{t} * \hat{c}_{t}. \end{aligned}$$
(4)

Here, \(y_{t}\) is the sigmoid layer result, \(\hat{c}_{t}\) is the vector generated by the tanh layer, and \(c_{t-1}\) represents the previous cell’s state.
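A single GRU step following Eqs. (1)–(4) can be sketched for a scalar state and a scalar input. This is an illustrative simplification, not the trained model: each weight is assumed to be a pair applied to the concatenation \([c_{t-1}, u_{t}]\), biases are omitted as in the equations, Eq. (4) is taken in its standard convex-combination form \(c_{t}=(1-y_{t}) * c_{t-1}+y_{t} * \hat{c}_{t}\), and the function name `gru_step` is hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(c_prev, u_t, W_s, W_y, W):
    """One GRU step for a scalar state; each weight is a (state, input) pair."""
    s_t = sigmoid(W_s[0] * c_prev + W_s[1] * u_t)          # Eq. (1): reset gate
    y_t = sigmoid(W_y[0] * c_prev + W_y[1] * u_t)          # Eq. (2): update gate
    c_hat = math.tanh(W[0] * (s_t * c_prev) + W[1] * u_t)  # Eq. (3): candidate state
    return (1.0 - y_t) * c_prev + y_t * c_hat              # Eq. (4): new state

# Run a short input sequence through the cell (weights chosen arbitrarily).
state = 0.0
for u in (0.5, -1.0, 0.25):
    state = gru_step(state, u, W_s=(0.4, 0.6), W_y=(0.3, -0.2), W=(0.8, 1.1))
```

When the update gate \(y_{t}\) is close to 0, the previous state is carried over almost unchanged; when it is close to 1, the state is replaced by the candidate \(\hat{c}_{t}\).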

We use five different input and window sizes for each GRU in the proposed approach. Selecting an appropriate window size is a critical task because the range of data varies with each window size, and the window size affects the performance of the ML model. A larger window size increases the training time, because the information retention time increases in each memory cell. As with hyperparameter selection, there are no hard and fast rules to determine the optimum relationship between window size and model performance.
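The role of the window size can be illustrated with a sliding-window sketch. The windowing scheme is an assumption made for illustration: consecutive samples are grouped into overlapping windows with a stride of one, and each window takes the label of its last sample. The function name `make_windows` is hypothetical.

```python
def make_windows(samples, labels, window):
    """Group consecutive samples into overlapping windows of the given size.

    Assumption: each window is labelled with the label of its last sample.
    """
    if window < 1 or window > len(samples):
        raise ValueError("window must be between 1 and the number of samples")
    count = len(samples) - window + 1
    X = [samples[i:i + window] for i in range(count)]
    y = [labels[i + window - 1] for i in range(count)]
    return X, y

samples = [0, 1, 2, 3, 4, 5]
labels = [0, 0, 1, 0, 0, 1]
X, y = make_windows(samples, labels, window=3)
```

A larger window gives each GRU more temporal context per input but produces longer sequences, which is consistent with the longer training times noted above.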

Cyberattack detection framework

In this work, a novel FL-based scheme is proposed for cyberattack detection in the VSNs. A high-level architecture of the proposed scheme is presented in Fig. 3. The proposed framework contains several modules including virtual IoT prototypes that represent the edge IoT devices and sensors in VSN, a local learning model for each virtual prototype, FL averaging module with a centralized server, a global model for defined window sizes, and an ensembler unit. A detailed description of the aforementioned modules is presented in the following.

Fig. 3 High-level architecture of the proposed attack detection scheme for VSN

Virtual prototypes

A replica model of the VSN is built by creating virtual prototypes. First, virtual prototypes \(f l_{n}\) are created for the selected \(\mathrm {n}\) edge devices. In the second stage, dedicated prototypes \(f l_{a v g}\) are created that enable the sharing of the model parameters of the trained ML algorithm between the edge devices and the centralized FL server. The dataset is divided into n blocks, and each block is assigned to one prototype \(f l_{n}\).

Preprocessing

Data preprocessing is an important stage for the optimum training of ML/DL models. It makes the captured data suitable as input to the neural network. In the proposed scheme, the captured data are first converted to CSV files. Then, unnecessary features that do not significantly contribute to the training process are eliminated. Finally, the processed data are split into n blocks and distributed among the virtual prototypes \(f l_{n}\) of the edge devices.
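The last two preprocessing steps can be sketched as follows. The column names (`timestamp`, `can_id`, `data`, `label`), the choice of retained features, and the contiguous near-equal split are assumptions made for illustration; the helper names `keep_features` and `split_into_blocks` are hypothetical.

```python
def keep_features(rows, keep):
    """Drop every column that is not listed in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]

def split_into_blocks(rows, n):
    """Split rows into n contiguous blocks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(rows), n)
    blocks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        blocks.append(rows[start:start + size])
        start += size
    return blocks

# Toy records with hypothetical column names.
rows = [{"timestamp": t, "can_id": 0x100 + t, "data": t * 2, "label": t % 2}
        for t in range(10)]
cleaned = keep_features(rows, keep=("can_id", "data", "label"))
blocks = split_into_blocks(cleaned, n=3)  # one block per virtual prototype
```

Each resulting block would then be handed to one virtual prototype for local training.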

FL training

The training procedure is carried out asynchronously. Each client node executes the learning algorithm with its copy of the dataset and shares the weights of the trained local model with \(f l_{\text{ avg } }\) aggregating instance. In this study, 5 GRUs are used with different hyperparameters. The training process of the FL paradigm is detailed in Table 3.

Table 3 Detailed description of the FL training process

Ensembler

The ensembler provides an effective method for combining the outputs of the ML models to achieve a high accuracy score [32, 33]. It builds on the well-established idea that integrating multiple ML models often yields better performance than a single ML model. In the proposed framework, we used Random Forest (RF) classifiers to ensemble the global ML models \(G_{w i}\). The RF classifier was chosen due to its numerous advantages, which are listed below.

  • It reduces overfitting in decision trees and helps to improve accuracy.

  • It can address both classification and regression problems.

  • It can handle both categorical and continuous data.

  • It automates the replacement of missing values present in the training data.

  • Data normalization is not required, since RF uses a rule-based approach.

For the input data \(U=U_{1}, \ldots , U_{n}\) with n chunks, each \(G_{w i}\) predicts the probability values \(h_{1}, h_{2}, \ldots , h_{n}\) of each label O for a given input U. The ensembler combines the probabilities of \(G_{w i}\) to formulate an ensemble prediction function p(u). The prediction probability \(h_{i}\) can be calculated for the given input data U by using (5)

$$\begin{aligned} h_{i}=\tilde{o}_{i}\left( M G_{w i}(U)\right) \end{aligned}$$
(5)
$$\begin{aligned} p(u)=\underset{o}{{\text {argmax}}} \sum _{m=1}^{M} J\left( o=h_{m}(u)\right) . \end{aligned}$$
(6)

According to (6), the p(u) of RF obtains input from probability scores of all ML models \(G_{w i}\) for every label. RF treats each probability score as a vote from \(G_{w i}\) and predicts the label with a higher confidence score as an output.
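The voting rule of (6) can be illustrated with a simplified sketch. It is not the RF ensembler used in this work: it implements only plain majority voting over the labels favoured by each global model, and the tie-breaking rule (lowest class index wins) as well as the function names are assumptions.

```python
from collections import Counter

def model_vote(probs):
    """Label a single model votes for: the class with its highest probability."""
    return max(range(len(probs)), key=lambda k: probs[k])

def ensemble_predict(prob_rows):
    """Majority vote over M models; prob_rows[m] holds model m's class probabilities."""
    votes = Counter(model_vote(p) for p in prob_rows)
    top = max(votes.values())
    # Assumption: ties are resolved in favour of the lowest class index.
    return min(label for label, count in votes.items() if count == top)

# Three hypothetical global models scoring one input over three classes.
prediction = ensemble_predict([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
])
```

An RF-based ensembler additionally learns how to weigh the probability scores of the individual models, which plain voting does not do.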

Experiments and results

This section presents the details of the simulation platform, dataset description, and evaluation parameters. Additionally, it provides a discussion on the outcomes of the proposed scheme.

Simulation platform

The proposed algorithm is implemented, and its performance is investigated, using a desktop computer with the following characteristics: an 11th Gen Intel® Core\(^{\mathrm {TM}}\) i9-11900H @ 2.50 GHz processor and \(32 \mathrm {~GB}\) of RAM. An NVIDIA GeForce RTX 3080 Ti 16 GB graphics card is used to facilitate the smooth training process of the proposed FL scheme. The proposed scheme is implemented and simulated in a Python-based environment using Keras with the TensorFlow backend.

Dataset

The proposed framework is analyzed using the latest dataset that was collected and presented by Kang et al. [34] in the “Car Hacking: Attack & Defense Challenge” competition that was organized in 2020. This dataset is an extended version of the previously published “Car Hacking” dataset [35]. The competition’s goal was to improve attack and detection techniques for the Controller Area Network (CAN), an extensively utilized standard in-vehicle network. The Hyundai Avante CN7 was the competition’s target vehicle. As a result, the dataset consists of Avante CN7 CAN network traffic, which includes both normal and attack messages. The following items are included in the dataset: (1) the initial round train/test dataset and (2) the last round dataset of the host’s attack session. This dataset includes 1,270,310 samples, in which 1,090,312 are normal values and 179,998 are anomalous values. This dataset comprises five classes: normal, flooding, spoofing, replay, and fuzzing.

Hyperparameters

The hyperparameters are the primary parameters that define the neural network’s structure and regulate the learning process. In our work, the core architecture of the GRUs is fixed. Ranges of appropriate hyperparameters are found by conducting extensive experiments on the “Car Hacking” dataset with a wide range of hyperparameters to ensure the best outcomes of the proposed scheme. We selected specific ranges of these hyperparameters through the trial-and-error method. This method is rigorous and has been widely employed in numerous recent studies proposing ML-based solutions [36,37,38], since optimization algorithms and techniques incur an additional computational cost. Table 4 lists the hyperparameters that were used in each GRU model. The subsections that follow provide a brief summary of these hyperparameters.

Table 4 The hyperparameters of the GRU models

Learning rate

This hyperparameter controls the training speed of the ML model. The selection of an appropriate learning rate is a critical task. A low learning rate can train a model reliably, but learning will be slow, and the model can also get stuck [39]. On the other hand, a high learning rate speeds up training, but can lead to unstable training and multiple output errors. In our experiments, we defined five learning rate values: 0.001, 0.005, 0.01, 0.05, and 0.10.

Optimizer

It is an algorithm used to minimize the loss function or maximize production efficiency. Optimizers are mathematical functions that depend on model parameters such as weights and biases. These algorithms help determine the change in weights to minimize errors. We used “Adam” in our experiments, one of the most used optimizers. Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data. There are various appealing advantages of adopting Adam, which are as follows [40]:

  • It is straightforward to implement.

  • It is computationally efficient.

  • Its memory requirements are minimal.

  • It is invariant to diagonal rescaling of the gradients.

  • It is perfectly adapted to solve problems handling a significant amount of data and/or parameters.

  • It is appropriate for problems involving very noisy and/or sparse gradients.

  • The used hyperparameters have intuitive interpretation and require minimal fine-tuning.

Epochs

An epoch is one complete pass of the ML algorithm over the training dataset, and this hyperparameter defines how many such passes are performed. The appropriate selection of the number of epochs is a critical task. With the completion of each epoch, the model parameters of the ML model are updated. In our experiments, we used 100 epochs for each GRU model.

Batch size

This hyperparameter presents the total number of samples present in a single mini-batch. A very small batch size selection can cause a high degree of variance. On the other hand, if the batch size is too large, it may cause overfitting effects. In our experiments, we defined three batch sizes: 128, 256, and 512.

Momentum

This hyperparameter establishes the direction of the next step based on the knowledge of the previous step. It contributes to the resistance of the ML model to oscillations. In our experiments, we set a momentum range from 0.5 to 0.9.

Dropout

It is a regularization technique that randomly deactivates a fraction of the neurons in a neural network during the training phase. Dropout enables the model to reduce overfitting effects, which can help it make accurate predictions. In our experiments, the considered dropout values are 0.0, 0.01, and 0.05.

Fig. 4 Performance results of GRU-1 for different window sizes

Fig. 5 Performance results of GRU-2 for different window sizes

Performance evaluation parameters

The effectiveness of the proposed model was investigated through several evaluation metrics. First, the predicted outputs of the trained algorithms were compared with the real values. Based on the comparison, True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) were computed. TP and TN indicate the numbers of correct predictions of the trained model for attacks and normal behaviours, respectively. FP and FN represent the numbers of incorrect predictions, namely normal behaviours flagged as attacks and attacks classified as normal, respectively. The parameters mentioned above are further utilized to calculate the accuracy, precision, recall, and F1 score.

Accuracy

This metric indicates the percentage of accurate attacks and normal events predictions. It can be easily calculated by dividing the number of accurate predictions by the number of total predictions

$$\begin{aligned} \text{ Accuracy } =\frac{T P+T N}{T P+T N+F P+F N}. \end{aligned}$$
(7)

Precision

It expresses the proportion of accurately anticipated anomalous observations compared to the total number of observations classified as anomalous

$$\begin{aligned} \text{ Precision } =\frac{T P}{T P+F P}. \end{aligned}$$
(8)

Recall

This metric quantifies the proportion of accurately predicted abnormal observations relative to all actual abnormal observations, i.e., the accurately predicted abnormal observations plus the abnormal observations erroneously predicted as normal

$$\begin{aligned} \text{ Recall } =\frac{T P}{T P+F N}. \end{aligned}$$
(9)

F1 score

It is defined as the harmonic mean of the model’s precision and recall

$$\begin{aligned} \text{ F1 } \text{ Score } =\frac{2 \times ( \text{ Precision } \times \text{ Recall } )}{ \text{ Precision } + \text{ Recall } }. \end{aligned}$$
(10)
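Equations (7)–(10) can be evaluated directly from the four confusion counts. The sketch below uses made-up counts for illustration only; they are not results from this study. Returning 0.0 when a denominator is zero is a convention chosen here, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 score from confusion counts (Eqs. 7-10)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only: 110 actual attacks, 890 normal messages.
m = classification_metrics(tp=90, fp=10, tn=880, fn=20)
```

For these counts, the accuracy is 0.97 and the precision is 0.90, while the recall is lower (90 of 110 attacks detected), which shows why accuracy alone can be misleading on imbalanced traffic.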

Results and discussion

To evaluate the performance of the proposed algorithm, extensive experiments were conducted on the “Car Hacking: Attack & Defense Challenge 2020 Dataset”. The dataset was split into training and testing datasets with a ratio of 75% and 25%, respectively. As mentioned previously, five GRU models were used in the proposed scheme. The hyperparameters of these models are presented in Table 4. The performance of the GRU models is evaluated for window sizes of 1, 5, 10, 20, and 30. GRU-1 demonstrated the highest performance at W20. This model achieved the highest detection accuracy of 99.28% for this window size. The other performance scores are also greater than 99% for this window size. For W1, W5, and W10, the performance of GRU-1 is between 97% and 99%. Detailed results of GRU-1 performance for different window sizes are shown in Fig. 4.

Fig. 6 Performance results of GRU-3 for different window sizes

Fig. 7 Performance results of GRU-4 for different window sizes

The performance results of GRU-2 for different window sizes are presented in Fig. 5. GRU-2 also achieved the highest performance at W20. This model achieved the highest detection accuracy of 99.12% for this window size. All the other performance scores are also greater than 99% for this window size. For W1, W5, W10, and W30, the performance results of GRU-2 are between 97% and 98%. The GRU-3 has demonstrated the highest attack detection performance compared to any other GRU. This model achieved the highest detection accuracy of 99.52% for W20. The other performance scores are also greater than 99.50% for this window size. For W1, W5, W10, and W30, the performance results of GRU-3 are between 98% and 99.25%. Performance results of GRU-3 for different window sizes are presented in Fig. 6.

GRU-4 and GRU-5 performed worse than the other GRU models. Their highest attack detection accuracies, 98.23% and 98.02% respectively, were again obtained at W20. For the other window sizes, the performance of both models is between 96% and 97%. The performance results of GRU-4 and GRU-5 for different window sizes are presented in Figs. 7 and 8, respectively.

Fig. 8 Performance results of GRU-5 for different window sizes

Fig. 9 Average performance of the proposed FL architecture

Fig. 10 Training time comparison for different window sizes

Table 5 Performance comparison of the proposed scheme with related studies

The simulation results show that the GRU models perform satisfactorily. The average attack detection performance of the proposed FL scheme is presented in Fig. 9. The average attack detection accuracy achieved by the proposed framework is 98.83%, and the average precision, recall, and F1 scores are 98.93%, 98.91%, and 98.92%, respectively. All simulations were run for 100 epochs. The training time of the proposed algorithm increased with the window size. A comparison of the training time for different window sizes is presented in Fig. 10.
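
One reading that is consistent with the reported average accuracy is an unweighted mean of the best (W20) accuracies of the five GRU models quoted in the text. The short check below illustrates this; it is an assumption about how the average was formed, not a statement of the procedure actually used, and the variable names are hypothetical.

```python
from statistics import mean

# Best detection accuracies (%) at W20, as quoted in the text
best_accuracy_w20 = {
    "GRU-1": 99.28,
    "GRU-2": 99.12,
    "GRU-3": 99.52,
    "GRU-4": 98.23,
    "GRU-5": 98.02,
}

# Assumption: the framework-level figure is the unweighted mean over models
average_accuracy = mean(list(best_accuracy_w20.values()))
print(f"average accuracy = {average_accuracy:.2f}%")
```

The five values sum to 494.17, so their mean is 98.834, which rounds to the reported 98.83%.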

To validate the performance of the proposed scheme, we compare the results with recently published related works. The comparison is made against papers that worked in the same context and used the same dataset, the “Car Hacking” dataset. As shown in Table 5, our model achieves the highest accuracy, using the extended version of the same dataset, among the considered studies that rely on classical IDS-based approaches or centralized DL models. We attribute this to the FL approach, in which training is carried out collaboratively, with each participant training independently on its own data. Specifically, the number of local epochs is defined in the learning parameters, and each participant trains on its local data for that many epochs. The local update is computed after these epochs, and the participants send their updates to the cloud server. The cloud server receives each participant’s update and averages them to produce the next global model. The participants then carry out the training procedure for the next communication round starting from this global model. The process is repeated until convergence is reached or the communication rounds are completed. This learning process has proven effective and efficient in several case studies, and in particular in our case study on the detection of cyberattacks in VSNs, where it reduced the training time and increased the detection accuracy.
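
The communication loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: a one-dimensional linear model stands in for the GRU, the server takes an unweighted average of the participants' weights, and all names (`local_train`, `average_weights`, `federated_training`), the learning rate, and the synthetic data are hypothetical.

```python
import random


def local_train(weights, data, local_epochs, lr):
    """Participant side: run `local_epochs` passes of SGD on private data
    for a toy model y = w * x + b (stand-in for a GRU)."""
    w, b = weights
    for _ in range(local_epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return [w, b]


def average_weights(updates):
    """Server side: unweighted element-wise average of the updates."""
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]


def federated_training(participants, rounds, local_epochs, lr=0.05, tol=1e-9):
    """Repeat local training and averaging until the global model stops
    changing (convergence) or the communication rounds are used up."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_train(list(global_weights), data, local_epochs, lr)
                   for data in participants]
        new_weights = average_weights(updates)
        change = max(abs(a - b) for a, b in zip(new_weights, global_weights))
        global_weights = new_weights
        if change < tol:
            break
    return global_weights


random.seed(0)
# Three participants, each holding private samples of y = 2x + 1
participants = [
    [(x, 2 * x + 1) for x in (random.uniform(-1, 1) for _ in range(20))]
    for _ in range(3)
]
global_weights = federated_training(participants, rounds=30, local_epochs=5)
print(global_weights)
```

Only model weights travel between the participants and the server in this sketch; the raw samples never leave the participant, which is the property that makes the scheme privacy-preserving.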

Conclusion

This article proposes a federated learning-based framework for efficient cyberattack detection in VSNs. The proposed FL scheme enables the sharing of computational capabilities with on-device training. A group of GRU models with an ensembler unit is used to ensure high attack detection performance. Extensive experiments were conducted on the “Car Hacking: Attack and Defense Challenge 2020” dataset. The performance of the proposed model was analyzed through multiple performance metrics, including accuracy, precision, recall, F1-score, and training time. The experimental findings illustrated that the proposed FL scheme provides accurate and efficient privacy-preserving attack detection in VSNs.

As future work, we plan to undertake additional in-depth experiments in this field, using fusion- and voting-based techniques to deliver more precise attack detection and classification outcomes. Furthermore, we intend to look into the promising field of explainable FL [45], which refers to strategies for providing human-readable insights into data, variables, and decisions. Finally, we plan to deploy the DL models in a microservices-based architecture [46,47,48] so that the proposed solution can be built in a distributed architectural style. A DL microservice is a self-contained, locally executable, and easily testable and maintainable software unit that can be reused and adjusted for a wide range of features and datasets by simply altering configuration parameters.