1 Introduction

The evolution of communication and computing has brought new paradigms into reality, such as the Internet-of-Things (IoT), generating massive amounts of decentralized and heterogeneous data. This trend has made machine and deep learning (ML/DL) techniques highly relevant in today's society, as data quantity and quality are essential requirements for their successful operation. However, with stakeholders becoming more aware of how their data is used and data privacy entering the mainstream debate, a new approach to ML/DL was required to address these concerns. Introduced by Google in 2016 [1], federated learning (FL) emerged as a possible solution. FL is a paradigm that enables participants to train an ML/DL model collaboratively and, most importantly, without sharing the participants’ datasets. The paradigm covers three different scenarios according to how data is distributed between participants. In horizontal federated learning (HFL), participants (also known as clients) share the same feature space but hold different samples [2]. In the vertical federated learning (VFL) scenario, participants hold different features but the same samples [3]. Finally, when neither the feature space nor the samples are shared between participants, the scenario is termed federated transfer learning (FTL). In recent years, heterogeneous use cases suitable for these three scenarios, especially for HFL, have achieved levels of performance comparable to classical ML/DL algorithms where data privacy is not considered [4]. Furthermore, since the datasets never leave the participants’ possession in the FL paradigm, the logistical problems of aggregating, storing, and maintaining data in central silos are eliminated.

Despite the advantages of FL, the decentralized nature of its training phase exposes the previous three scenarios to new attack surfaces [5]. In particular, malicious participants executing adversarial attacks that affect the trustworthiness of FL models are one of the most representative cybersecurity concerns of this paradigm. In more detail, malicious participants could join the federation to disrupt, corrupt, or delay the model learning process [6]. They might do this by submitting manipulated updates or even by withholding their model updates, thereby slowing down the convergence of the global model. Another well-known goal of adversaries is to infer sensitive information about other participants, although such attacks are out of the scope of this work [7]. Even without diving deep into this topic, it is clear that the possibility of adversarial nodes reverse engineering shared models or updates poses privacy risks to the whole FL system. This scenario is particularly challenging since the primary intent of FL is to avoid sharing raw data directly. Regarding attacks that degrade the global model performance, the literature has documented several data and model falsification attacks. These encompass poisoning data, labels, or weights during training [8]. Such tampering can introduce biased, misleading, or completely false patterns into the global model. For instance, an attacker might deliberately mislabel data to deceive the global model or inject spurious updates that degrade the model accuracy.

The detection and mitigation of these attacks are challenging tasks since there is a trade-off between the performance of the global model and the privacy of the participants’ sensitive data. In other words, since the FL paradigm aims to expose as little information about the individual participants’ data as possible, recognizing and mitigating the presence of poisoned data samples is not easy [9]. Furthermore, the decentralized nature of FL makes it harder to establish trust among participants. There is an inherent tension between maintaining each participant’s autonomy and privacy and ensuring the integrity of the collective learning process. Therefore, despite existing detection and mitigation solutions, such as the usage of clustering techniques to detect anomalies in model parameters [10] or the use of secure aggregation functions to remove noisy weights [11], no fully robust architectural solution exists today. Some proposals involve monitoring the updates from participants and comparing them against expected patterns or baselines, but these methods also face challenges in scalability and can sometimes produce false positives. As FL continues to evolve, the development of more sophisticated defense mechanisms will be essential to ensure its viability in a myriad of applications.

Additionally, before thinking about detecting and mitigating adversarial attacks, it is critical to analyze the impact of heterogeneous attacks on different FL scenarios. In this sense, robust FL architectures and models should be built to collaborate with detection and mitigation techniques and reduce attack impacts as much as possible. However, the following challenges are still open regarding FL architectural robustness. First, the impact of existing data and model poisoning attacks has mainly been validated in horizontal scenarios, while decentralized vertical scenarios remain unexplored. Second, while different categories of attacks are well known, a direct comparison of their efficiency across heterogeneous horizontal and vertical FL architectures is missing. Last but not least, the distribution of data held by participants is a critical aspect to consider in FL, and there is a lack of work evaluating the robustness of FL models trained with non-independent and identically distributed (non-IID) data.

To address the previous challenges, this work presents the following main contributions:

  • The design and implementation of three FL architectures, namely HoriChain, VertiChain, and VertiComb, one for horizontal and two for vertical FL scenarios. HoriChain and VertiChain are inspired by a chain-based learning protocol, while VertiComb follows a peer-to-peer network splitting strategy. The three architectures fully or partially share the following characteristics: network architecture, training protocol, and dataset structure.

  • The proposal of a distributed, decentralized, and privacy-preserving use case suitable for HFL and VFL that uses non-IID data. In particular, the use case aims to solve the problem of classifying handwritten digits and clothing items in a privacy-preserving way by splitting the MNIST, Fashion-MNIST, EMNIST, and Kuzushiji-MNIST datasets among seven participants. The three architectures are executed using the same number of participants, number of adversaries, types of attacks, and implementations of the attacks. Then, the performance of the three architectures is evaluated and compared. In conclusion, the VertiChain architecture is less effective than VertiComb and HoriChain.

  • The evaluation of the HoriChain and VertiComb architectures’ robustness when trained in the previous scenario and affected by data and model poisoning attacks. The performed experiments show that different configurations of both attacks highly affect the accuracy, F1-score, and learning time of both architectures. However, the HoriChain architecture is more robust than the VertiComb when the attacks poison a reduced number of samples and gradients.

The organization of this paper is as follows. First, related work dealing with FL and adversarial attacks is reviewed in Section 2. Section 3 details the FL architecture design. Section 4 describes the use case, non-IID dataset splitting, and training pipeline in which the proposed architectures are tested. Section 5 focuses on explaining the implementation of adversarial attacks. The results and discussion of the performed experiments are evaluated in Section 6. Finally, Section 7 provides conclusions and outlines future steps (Table 1 lists the abbreviations used throughout the paper).

Table 1 Abbreviations

2 Related work

This section reviews the state-of-the-art concerning FL architectures, adversarial attacks affecting different FL scenarios, and works evaluating the robustness of FL models and architectures.

2.1 FL scenarios and architectures

In 2019, Yang et al. [12] defined the scenarios of HFL, VFL, and FTL. The definitions use the symbol X for the feature space, Y for the label space, I for the sample ID space, and D for the local datasets. Then, an HFL scenario is characterized as \(X_i = X_j, Y_i = Y_j, I_i \ne I_j, \forall D_i, D_j, i \ne j\). A VFL scenario can be identified as \(X_i \ne X_j, Y_i \ne Y_j, I_i = I_j, \forall D_i, D_j, i \ne j\). Lastly, an FTL scenario has \(X_i \ne X_j, Y_i \ne Y_j, I_i \ne I_j, \forall D_i, D_j, i \ne j\). The authors also distinguished FL from distributed ML. Despite being very similar, in FL, users have autonomy and the central server cannot control their participation in the training process. FL also places an emphasis on privacy protection, while distributed ML does not.

The following year, Qiang Yang et al. [13] presented the client-server and peer-to-peer architectures for the HFL scenario. In the client-server architecture, the server receives all model updates from participants (encrypted or in plain text, depending on the scenario) and aggregates them. The peer-to-peer architecture is interesting because it eliminates the need for a central coordinating point and its associated attack surface. In this approach, participants aggregating the models can be randomly selected or follow a predefined chain. Besides, Zhao et al. [14] evaluated how client contribution can be estimated in HFL using Reinforcement Learning. Huang et al. [15] analyzed how HFL architectures can be optimized for improved fairness and accuracy.

Concerning VFL, Vepakomma et al. [16] introduced SplitNN, an architecture to train a shared model from participants holding different features and different components (layers and nodes) of a neural network. Therefore, only the participant holding a particular model component knows its details. Each participant trains its model component locally, and the outputs are passed to another client, who holds the next component of the neural network. Finally, the participant controlling the final component of the neural network calculates the gradients and passes them back to the previous clients, who apply them to their components.

2.2 Attacks in FL scenarios

In [17], the authors defined honest-but-curious and malicious adversaries affecting FL scenarios. Honest-but-curious participants try to learn sensitive data and states of other participants without deviating from the rules established by the FL training protocol. In contrast, malicious participants try to destroy or corrupt the model without restrictions. Besides, Fung et al. [18] focused on malicious insider participants and poisoning attacks. Poisoning attacks can be categorized according to different criteria. One criterion deals with the attack objective. In this sense, random attacks aim to reduce the accuracy of the trained FL model, whereas targeted attacks aim to influence the model to predict a given target label. Another criterion concerns the data used to train the local model. In this direction, clean-label data poisoning attacks assume that the adversary cannot change the label of any training data. In dirty-label attacks, the adversary can introduce any number of data samples with labels of its choosing. Finally, backdoor poisoning attacks modify individual features or a few data samples to embed backdoors into the model. Overall, data poisoning attacks are less effective in settings with fewer participants.

Table 2 Solutions analyzing the robustness of FL architectures affected by adversarial attacks

In [19], the authors presented an attack that infers a participant’s training dataset from the gradients shared during training. They developed a gradient-based feature reconstruction attack in which the attacker receives the gradient update from a participant and aims to steal their training set. The attacker iteratively refines a dummy image and label to approximate the real gradients; when the gradients converge, the dummy training data converges to the real one with high confidence. Bouacida and Mohapatra [5] proposed a taxonomy of the different attacks threatening an FL model. The taxonomy is organized into tables of defenses and attacks, where each attack entry includes a description and the source of the vulnerability it exploits. On the other hand, Jara et al. [20] created a flowchart-like visual representation of attacks and countermeasures. However, it only breaks attacks into data privacy and model performance categories. Zou et al. [21] explored how labels can be inferred in VFL and how label replacement attacks affect these architectures. A similar direction was followed by Qiu et al. [22], who explored privacy leakage in VFL based on the predicted labels. Additionally, Sun et al. [23] analyzed how reconstruction attacks could recreate the data employed during training and how noise-based methods can improve the resilience against these attacks.

Fig. 1 Overview of the HFL and VFL Architectures (left: HoriChain, center: VertiChain, right: VertiComb)

Dealing with decentralized FL architectures for non-IID data affected by heterogeneous adversarial attacks, Zhang et al. [24] explored backdoor attacks in a recommender system based on HFL. This work demonstrates the high impact of backdoor attacks and that current defenses are not enough to solve the problem. Likewise, Wang et al. [25] proposed a ring-based topology for FL focused on generative models. For security, the authors include a committee election method for voting-based malicious node detection and a distributed model sharing scheme based on a decentralized file system. In addition, a good number of decentralized FL works leverage blockchain-based technologies for model sharing and secure model tracking [26,27,28]. Regarding data privacy attacks, Zhao et al. [29] proposed a framework for decentralized FL focused on privacy attack mitigation using secure cipher-based matrix multiplication. As can be seen, some works deal with decentralized HFL and adversarial attacks.

The literature has also proposed solutions that evaluate the robustness of HFL using centralized model aggregation approaches. In this sense, Rey et al. [30] trained several HFL models to detect cyberattacks affecting IoT devices and considered several configurations of label flipping, data poisoning, and model canceling attacks, together with model aggregation functions acting as countermeasures. These functions provide a significant improvement against malicious participants. Another example was proposed by Sanchez et al. [31], where unsupervised and supervised HFL models are trained to detect cyberattacks affecting spectrum sensors. Malicious participants implementing data and model poisoning attacks and four aggregation functions acting as anti-adversarial mechanisms are considered to measure the model robustness. However, despite the contributions of previous work, there is a lack of work focused on vertical FL that combines a decentralized setting with the exploration of adversarial attacks. In this sense, the works present in the literature regarding attacks in VFL focus on feature inference and privacy issues but do not consider model-focused attacks trying to degrade the predictions [32, 33].

In conclusion, despite existing work focused on FL architectures, adversarial attacks, and robustness analysis, there is a lack of work comparing the robustness of decentralized and heterogeneous HFL and VFL architectures affected by well-known adversarial attacks. Therefore, the present work explores the impact of well-known adversarial attacks in different HFL and VFL setups, analyzing how they are impacted according to the attack configuration. Table 2 shows a comparison between the solutions analyzed in the state-of-the-art and the present work.

3 Architectural designs for decentralized HFL and VFL

This section presents three heterogeneous FL architectures inspired by the existing literature. The first one, called HoriChain, is suitable for HFL scenarios, although it can be vulnerable to adversarial attacks and other cybersecurity concerns innate to decentralized systems. The next two, called VertiChain and VertiComb, are oriented to VFL. The main goal of these architectures is to build models collaboratively, in a decentralized manner, and to preserve participants’ data privacy. However, it is crucial to recognize that preserving privacy in such decentralized systems can sometimes come at the expense of efficient anomaly detection and robustness against poisoned data attacks. Figure 1 shows a graphical representation of the three architectures and their training protocols. For the aggregation of the network models in HFL, FedAvg is employed. The Federated Averaging algorithm can be summarized as follows:

  1. Global Model Initialization: Initialize the global model weights \(w_0\).

  2. Local Update: Each participating client k computes the update based on its local dataset \(D_k\):

     $$\begin{aligned} w_{k,t+1} = w_t - \eta \nabla L_k(w_t) \end{aligned}$$
     (1)

     where \(\eta \) is the learning rate, \(L_k\) is the local loss function, and t is the current round of communication.

  3. Model Averaging: The server updates the global model by averaging the local models:

     $$\begin{aligned} w_{t+1} = \sum _{k=1}^{K} \frac{|D_k|}{|D|} w_{k,t+1} \end{aligned}$$
     (2)

     where K is the total number of clients, \(|D_k|\) is the number of samples at client k, and |D| is the total number of samples across all clients.

  4. Model Broadcast: The server sends the updated global model \(w_{t+1}\) to all clients for the next round of updates.
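As a reference, the sketch below implements one FedAvg communication round following Eqs. (1) and (2). It is a minimal, illustrative example rather than the implementation used in this work; the client tuple structure and the local_grad_fn callback are assumptions, and the model weights are assumed to be NumPy-compatible arrays.

```python
def fedavg_round(global_weights, clients, lr=0.01):
    """One FedAvg communication round; clients is a list of (data, labels, local_grad_fn)."""
    updates, sizes = [], []
    for data, labels, local_grad_fn in clients:
        # Local update: one gradient step on the client's own data (Eq. 1)
        grads = local_grad_fn(global_weights, data, labels)
        updates.append([w - lr * g for w, g in zip(global_weights, grads)])
        sizes.append(len(data))
    # Model averaging: weight each client by its share of the total samples (Eq. 2)
    total = float(sum(sizes))
    new_weights = [sum((n / total) * u[i] for n, u in zip(sizes, updates))
                   for i in range(len(global_weights))]
    # Model broadcast: the returned weights are sent back to all clients
    return new_weights
```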

The HoriChain architecture is designed for HFL scenarios where the global model is built in a decentralized or peer-to-peer fashion. Therefore, this architecture does not rely on a central server in charge of aggregating the weights of each participant, as most existing HFL solutions do. The HoriChain architecture follows a chain-based protocol to train the global collaborative model. In more detail, each participant trains a model with its local data, and all participants have the same neural network structure in terms of layers and nodes. Once the first participant trains its model with its dataset (steps 1 to 5 in Fig. 1 left), the model weights are sent to the next participant of the chain involved in the training process (step 6), which repeats the same process. To retrain the model, the second client uses the received model weights and its dataset (step 1). Then, it generates predictions to assess the new model (step 2), calculates the gradients (step 3), initiates the gradient descent (step 4), and updates the model (step 5). These steps are repeated for several epochs until the model converges. At that moment, its weights are sent to the next client. The final model is obtained once all participants have updated the received model with their data, repeating the training chain several times (one full pass over the chain can be seen as a standard training epoch). Algorithm 1 describes the HoriChain training protocol.

Algorithm 1 HoriChain protocol.
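A minimal sketch of this chain-based protocol is shown below, assuming a Keras-style model API. The participant objects, their build_model method and data attributes, and the amount of local training performed before handing over the weights are illustrative assumptions rather than the authors' implementation.

```python
def horichain_training(participants, chain_passes=3, local_epochs=1):
    weights = None                               # model weights travel along the chain
    for _ in range(chain_passes):                # each pass over the chain ~ one global epoch
        for p in participants:                   # fixed chain order
            model = p.build_model()              # same topology for every client
            if weights is not None:
                model.set_weights(weights)       # continue from the previous client's state
            model.fit(p.x_train, p.y_train,      # steps 1-5: local training on private data
                      epochs=local_epochs, verbose=0)
            weights = model.get_weights()        # step 6: hand the weights to the next client
    return weights
```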

The VertiChain architecture also follows a chain-based training protocol but with some particularities compared to the previous architecture. The first difference is that this architecture is designed for VFL instead of HFL. It means that data features are distributed differently between participants, and only one of them, called the active party, has the data labels. The second significant difference compared to the previous architecture is that each participant has a different neural network component. In other words, the VertiChain approach distributes the neural network topology across the participants. Each part of the neural network (combination of layers and nodes) is called a component. Essentially, components can be seen as smaller neural networks joined together to create the global architecture. Regarding the training protocol of the VertiChain architecture, participants must first agree on an order to create the chain. The order can be arbitrary, except that the active participant (the one holding the data labels) needs to be last in the chain to initiate the gradient descent. Then, to begin the training process, the first participant of the chain feeds its local dataset to its network component and passes the outputs to the next client in the chain. The next participant feeds its network component with the output of the previous client and its local dataset. This process cascades down the chain until the last and active participant (step 1 in Fig. 1 center). Gradient descent is then applied to the last component, held by the active party. The gradients are passed back to the previous client, which applies them to its network component and, in turn, sends them to the client before it. The process is repeated until the gradients reach the first participant of the chain (step 2 in Fig. 1 center). Algorithm 2 describes the VertiChain training protocol.

Algorithm 2 VertiChain protocol.
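For illustration, the following sketch simulates one VertiChain training step in a single TensorFlow process, as done in the experiments of Section 4. The component sub-models, their optimizers, and the per-participant input tensors are assumptions; in a real deployment only activations and gradients would cross participant boundaries.

```python
import tensorflow as tf

def vertichain_step(components, optimizers, local_inputs, labels, loss_fn):
    """components[i] is participant i's sub-network; the last participant is the active party."""
    with tf.GradientTape() as tape:
        out = components[0](local_inputs[0])                  # head of the chain
        for comp, x in zip(components[1:], local_inputs[1:]):
            # each client feeds the previous client's output plus its own features
            out = comp(tf.concat([out, x], axis=1))
        loss = loss_fn(labels, out)                           # only the active party holds labels
    # step 2: gradients flow back along the chain to every component
    variables = [v for comp in components for v in comp.trainable_variables]
    grads = tape.gradient(loss, variables)
    idx = 0
    for comp, opt in zip(components, optimizers):
        n = len(comp.trainable_variables)
        opt.apply_gradients(zip(grads[idx:idx + n], comp.trainable_variables))
        idx += n
    return loss
```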

Finally, the VertiComb architecture is also oriented to the VFL scenario. As in the VertiChain, the active participant holds the data labels in this architecture. In addition, as in the previous two approaches, each participant has a different and private dataset. From the architectural point of view, the neural network is split into various network components distributed across the participants. The main difference between this splitting strategy and the VertiChain is that here the first layer of the neural network is distributed among all clients. In addition, the active participant also holds the last layer of the network, and therefore it applies gradient descent and backpropagation. In other words, in the training protocol of the VertiComb architecture, each client feeds its network component with its local dataset. Then, the obtained outputs are sent to the active client, which uses the data labels and its network component (first and last layer) to generate the output from these transformed inputs (step 1 in Fig. 1 right). After that, gradient descent is applied to the final component (held by the active party), and the gradients are backpropagated towards the start of the neural network (step 2 in Fig. 1 right). This process is repeated for every sample in the dataset in one epoch. Algorithm 3 describes the VertiComb training protocol.

Algorithm 3 VertiComb protocol.
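A comparable sketch for VertiComb is given below, again simulated in a single TensorFlow process. Here every participant's bottom component processes its own feature slice in parallel, and the active party's top component produces the prediction; the model objects, the single shared optimizer, and the data layout are illustrative assumptions.

```python
import tensorflow as tf

def verticomb_step(bottom_models, top_model, optimizer, local_inputs, labels, loss_fn):
    with tf.GradientTape() as tape:
        # step 1: every participant transforms its own feature slice locally
        partial_outputs = [m(x) for m, x in zip(bottom_models, local_inputs)]
        # the active party concatenates the transformed inputs and generates the prediction
        logits = top_model(tf.concat(partial_outputs, axis=1))
        loss = loss_fn(labels, logits)
    # step 2: gradient descent on the top component, backpropagated to all bottom components
    variables = top_model.trainable_variables + [v for m in bottom_models
                                                 for v in m.trainable_variables]
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss
```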

In these algorithms, P denotes the set of participants, and p indexes an individual participant. The initial weight vector for each participant is given by \(w_{p,0}\). Each participant updates their weights iteratively, where \(w_{p,t+1}\) represents the weight vector at the next iteration \(t+1\). The learning rate is denoted by \(\eta \), which scales the gradient \(\nabla _w L\) of the local loss function L with respect to the weights. The local dataset for participant p is \(D_p\). The forward pass function is \(f_p\), while \(z_p\) represents the output of the forward pass for participant p. GD signifies the Gradient Descent operation, and \(g_{p_{\text {active}}}\) denotes the gradient information computed by the active participant. BP represents Backpropagation, used to update the weights across the network. The process is repeated until convergence, iterating over the dataset for a number of epochs.

4 HFL & VFL use case with Non-IID data

This section presents a collaborative and privacy-preserving use case with non-IID data where the HoriChain, VertiChain, and VertiComb architectures are deployed and trained to evaluate their performance.

Handwritten digit recognition is a well-known problem that has been extensively studied with traditional ML/DL solutions. The application scenarios where handwritten digits have to be understood by machines are numerous, and some of them require privacy-preserving capabilities. This work presents a use case where several anonymous users want to train a federated classifier collaboratively without sending handwritten digits to a central server. These users do not want to share their digits because the central entity could analyze the handwriting style to link anonymous handwritten public documents with the users. Another important reason is that users’ numbers may represent sensitive data such as bank accounts, personal codes, or passport numbers that users do not want to reveal. Finally, it is important to consider that each user’s handwriting style and digits differ. Therefore, data is non-independent and identically distributed (non-IID) between users.

Since there is no dataset containing handwritten digits that is suitable for HFL and VFL scenarios, the well-known MNIST dataset [34] has been split among seven participants to fulfill the requirements of HFL, VFL, and non-IID data. The number of participants selected for this use case is seven in both scenarios to keep a compromise between the typically low number of participants in vertical scenarios (usually two or three) and the medium number in horizontal ones (more than five).

Dealing with the dataset, MNIST-based datasets were used in both horizontal and vertical scenarios to define a common configuration and compare their classification performance and robustness. MNIST contains a collection of labeled images of handwritten digits, Fashion-MNIST includes images of clothing items, EMNIST extends MNIST with handwritten letters, and Kuzushiji-MNIST (KMNIST) features handwritten Japanese Hiragana characters. Every image is gray-scale, with the background in black, the foreground in white, and a fixed size of 28 by 28 pixels. Each pixel is an integer in the range [0, 255].

The data distribution between the seven participants has been done differently for horizontal and vertical scenarios due to the requirements of each scenario. In the horizontal scenario, participants have different data samples, but the feature space is common. In the vertical one, participants have similar samples, but the feature space is different. Furthermore, participants should all contribute towards determining the final label. Therefore, both samples and features must be distributed in such a way that each participant has equal relative importance and the data is non-IID. With those requirements in mind, each participant receives the same number of samples of all classes or labels in the horizontal scenario. In the vertical scenario, the splitting is more complicated since features are different for each client. In the proposed solution, every pixel position is considered a feature. So, there are 784 (28x28) features. These 784 features are distributed amongst the participants by splitting the image samples row-by-row. Furthermore, to avoid rows without relevance being grouped into a single participant, the solution implements a rotating style of row distribution. It means that the first participant receives the first, eighth, 15\(^{th}\), and 22\(^{nd}\) rows, while the second participant receives the second, ninth, 16\(^{th}\), and 23\(^{rd}\) rows, and so on (see the sketch after Fig. 2). Fig. 2 shows a graphical example of this data distribution.

Fig. 2 Vertical data distribution where each color corresponds to one participant
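To make the rotating row assignment concrete, the following sketch, assuming the images are stored as a NumPy array of shape (N, 28, 28), reproduces the split described above. The function and variable names are illustrative, not taken from the authors' code.

```python
import numpy as np

def vertical_row_split(images, num_participants=7):
    """Assign rows p, p+7, p+14, p+21 of every 28x28 image to participant p."""
    parts = []
    for p in range(num_participants):
        rows = list(range(p, 28, num_participants))  # e.g. participant 0 -> rows 0, 7, 14, 21
        # flatten the 4 assigned rows into the participant's 112 local features
        parts.append(images[:, rows, :].reshape(len(images), -1))
    return parts

# usage: each of the 7 local views has shape (N, 112)
local_views = vertical_row_split(np.random.randint(0, 256, size=(10, 28, 28)))
```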

For the horizontal and vertical scenarios, once the parts of the MNIST-based datasets are distributed among the seven participants, it is necessary to divide the local datasets into train and test sets to start the training process. The train set consists of 80% of every class in both scenarios, while the test set contains the remaining 20%. Then, the TensorFlow [35] framework is employed for the model implementation. At this point, it is relevant to mention that the training processes of the HoriChain, VertiChain, and VertiComb architectures are simulated on a single device to reduce the network complexity, as this does not affect either the model classification performance or its robustness calculation.

In the HoriChain architecture, an identical network architecture is considered for all participants. The architecture has one input layer with 784 neurons, three hidden layers with 448, 448, and 50 neurons, and one output layer with ten neurons (a Keras sketch of this topology is shown below). To train the HoriChain architecture, the model is transferred from participant to participant after two rounds of local training. The fewer the rounds, the more often the model needs to be transmitted, and the higher the communication costs. However, since the scope of this work does not deal with attacks affecting communication between clients, the communication limitations are not considered. A round of training in the HoriChain architecture is defined as training over a single sample. This way, the model is trained by all participants, and no client has more chances to corrupt the model. As previously indicated, every client holds a different part of the MNIST dataset. Then, it creates and initializes a TensorFlow model, and the training process progresses in rounds, as explained in Section 3. An epoch consists of every client training on the entirety of its dataset.
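The layer sizes above translate into the following Keras definition, given as a minimal sketch; the activation functions, optimizer, and loss are assumptions, since only the layer dimensions are specified in the text.

```python
import tensorflow as tf

def build_horichain_model():
    """784-448-448-50-10 topology used by every HoriChain participant."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),                     # one flattened 28x28 image
        tf.keras.layers.Dense(448, activation="relu"),
        tf.keras.layers.Dense(448, activation="relu"),
        tf.keras.layers.Dense(50, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),  # one output per class
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```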

Table 3 Neural network configuration per HFL and VFL architecture

In the VertiChain architecture, each client holds a component with the same shape (layers and nodes), except for the client at the start of the chain. Since the first client does not receive inputs from any of the other clients, its component has 112 features as inputs (four rows of 28 pixels). All clients’ components have one output layer of size ten. In other words, all clients (except the first one) receive their own 112 inputs plus the ten outputs of the previous client’s component, resulting in 122 inputs to their component. Furthermore, all the clients’ components have one hidden layer of size 28.

In the VertiComb architecture, each participant holds a network component of the same shape, except for the active client, who holds an extra component containing the end of the network. Therefore, the seven participants hold input components, each with 112 input features, no hidden layers, and one output layer of 64 nodes. In addition, the active participant (one of the previous seven) holds the network component that takes these transformed inputs and converts them into predictions. This component has 448 inputs, one hidden layer of 50 nodes, and one output layer with ten nodes.

Table 3 summarizes the layers and neurons per layer of the implementation of the different architectures that will be employed for validation.

5 Deployment of adversarial attacks

This section presents the adversarial attacks launched against the previous three architectures by malicious participants. In particular, this work focuses on poisoning attacks, including data poisoning and gradient poisoning attacks.

5.1 Data poisoning attack

Data poisoning attacks manipulate the data samples with watermarks (intentional changes in the samples and labels to make them recognizable) to build a backdoor [36] into the global model. In other words, the malicious participant alters their data samples and labels during the training phase and associates the altered samples with a given target label. If the attack is successful, the learned global model predicts the target label whenever the watermark is present on an input, thereby implementing a so-called backdoor.

To execute a successful backdoor attack, the adversary must be able to associate the watermarked samples with the target label. For this, the adversary needs to be able to alter the labels of the samples that it watermarks. The vertical scenario constrains this attack since only one client, the active party, holds labels. It means that in the VertiChain and VertiComb architectures, the adversary must be the active party. In addition, the adversary can manipulate a chosen percentage of its data samples with the watermark in both vertical and horizontal scenarios. This percentage is set at different levels to evaluate the effect of the attack on the global model. When it is set to 0%, the adversary acts honestly, and when it is 100%, the adversary watermarks all of its samples during training. The marked samples are chosen randomly to avoid biasing the results toward any label class.

To implement the data poisoning attack in the proposed use case and the three architectures, the attack exploits the fact that, since the digits in the MNIST dataset are standardized in size, the pixels near the edge of the image are almost always black. This is where the watermark is placed to maximize the difference between watermarked and non-watermarked samples. In such a way, the model learns the intended meaning of the watermark more easily and quickly. Furthermore, unmarked digits have a harder time triggering the watermarking effect in the model, as they practically never have white pixels in that region. Therefore, for the vertical architectures, the implemented watermark consists of two white strips at the start and end of every row owned by the adversary. More specifically, two white strips of 10 pixels each, separated by the 8 middle pixels of the row (see the last client, represented in gray in Fig. 3 right). Regarding the horizontal architectures, one of every six rows (four in total) of affected images is modified with the same two white strips (see Fig. 3 left). Therefore, the watermark is effective for both vertical and horizontal data distribution strategies. Compared with Fig. 2, it can be seen how all the rows belonging to the malicious client are the ones containing the watermark, while the rest of the clients hold legitimate rows.
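As an illustration, a possible implementation of this watermark for the vertically split data is sketched below; the pixel value 255 for white, the exact strip indices, and the target-label handling are assumptions consistent with the description above, not the authors' code.

```python
import numpy as np

def watermark_vertical(images, labels, adversary_rows, target_label):
    """Place two 10-pixel white strips (8 untouched middle pixels) on the adversary's rows."""
    marked = images.copy()                    # images: (N, 28, 28) uint8 array
    marked[:, adversary_rows, :10] = 255      # left strip of 10 white pixels
    marked[:, adversary_rows, 18:] = 255      # right strip of 10 white pixels
    poisoned_labels = np.full(len(labels), target_label)  # associate the watermark with the target
    return marked, poisoned_labels
```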

Fig. 3 Client watermarking a sample with two white strips of ten pixels each (left: horizontal splitting, right: vertical splitting)

5.2 Gradient poisoning attack

The goal of the gradient poisoning attack is to deteriorate the federated model performance. During the training phase, FL models try to find the minimum of some objective function, such as the loss, and the gradient-descent updates move the model toward the closest local minimum. Therefore, the gradient poisoning attack multiplies the gradients by a negative value to reverse the update direction and deteriorate the training process. The effectiveness of the attack heavily depends on the relative importance of the network component that the adversary controls. If the adversary controls the entire network, the model will never make a correct prediction, whereas if the adversary controls only a single node, the model might still work well.

The gradient poisoning attack in the VertiChain and VertiComb architectures is implemented with one adversary, who does not have to be the active participant. If the active participant were poisoning the gradients, the model would never improve. This situation is not a particularly interesting case to examine, as the adversary could deteriorate the global model performance at will. The adversary can poison the gradients of only a part of the entire model. However, it can do this for every single sample in the dataset. In the HoriChain architecture, the participants are alike in their capabilities. Therefore, the choice of which participant acts as the adversary is inconsequential. The adversary can poison the gradients of the entire model, as opposed to only part of it. The other participants’ updates counterbalance this situation. In other words, the adversary deteriorates the model performance with its gradients while the other participants improve it. This is different from the active participant poisoning gradients in VFL, as the adversary does not have free rein to deteriorate the model performance over the entire training phase. The model is improved and deteriorated alternately.
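In code, the attack reduces to scaling the adversary's gradients before they are applied. The sketch below, assuming a TensorFlow-style optimizer, is a hypothetical illustration of this step; only the multiplier values -1, -10, and 0 are taken from the experiments.

```python
def poisoned_apply(optimizer, grads, variables, multiplier=-1.0):
    """Adversarial update: scale the gradients by -1, -10, or 0 before applying them."""
    poisoned = [multiplier * g for g in grads]            # reverse, amplify, or zero the update
    optimizer.apply_gradients(zip(poisoned, variables))   # honest clients apply grads unchanged
```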

Table 4 Train and test accuracy of HoriChain and VertiComb architectures without attacks

6 Robustness evaluation and discussion

This section analyzes the robustness of the HoriChain, VertiChain, and VertiComb architectures affected by different configurations of data and model poisoning attacks. Different metrics such as accuracy, relative importance of clients, and confusion matrices are computed. Accuracy indicates the ratio of samples correctly classified over the total number of samples. Client relative importance quantifies the importance of each participant in the model training. Finally, confusion matrices give a deeper understanding of the model predictions when it is under attack.

To deal with randomness during the initialization of the model parameters, the results were averaged over three training processes. For each trained model, a representative sample was chosen to draw confusion matrices. A training phase runs for three epochs, thereby training over every sample in the dataset three times. This number of epochs was chosen because, over this amount of training, the models were able to adequately approach their apparent accuracy asymptote.

6.1 Baseline performance

This experiment computes the baseline accuracy and client relevance for the three proposed architectures (HoriChain, VertiChain, and VertiComb) when they are not affected by adversarial attacks.

Table 4 shows the accuracy for the train and test sets of the HoriChain and VertiComb architectures. As can be seen, both architectures achieve >90% accuracy on the train and test sets of the simpler MNIST and KMNIST datasets. Additionally, the accuracy on both datasets improves over the entire training phase. Regarding VertiChain, the accuracy for the train and test sets reaches 88%. The performance on the more complex Fashion-MNIST and EMNIST datasets follows a similar pattern but with a substantial decrease in the test accuracy results, probably due to the higher complexity of the Fashion-MNIST images and the larger number of classes in EMNIST.

Seven models per architecture have been trained to compute the client relevance for the architectures. In each model, one client provides noisy samples during evaluation. Since the HoriChain architecture proposes a similar network structure for all clients and the data is distributed equally, the clients’ relevance is identical. In other words, every client contributes equally towards determining the predicted label. The interesting analysis focuses on the VertiComb and VertiChain architectures, where clients have different network components and data. In this sense, as can be seen in Fig. 4, in the VertiChain architecture, the importance of the last client is much more significant than the rest.

Fig. 4 Client Relative Importance in MNIST dataset

Another critical aspect in the comparison between the different architectures is the resource usage in terms of training and evaluation time when using the datasets. For the previous tests, a server equipped with an AMD EPYC 7742 CPU, an NVIDIA A100 GPU, and 528GB of RAM is employed. Table 5 illustrates the resource usage of each architecture. It can be seen that VertiChain has the longest training time while offering the worst performance (as seen in Table 4). In contrast, HoriChain and VertiComb have very similar training times, with VertiComb offering slightly faster times both in training and testing.

In summary, the evaluated metrics show that the VertiChain architecture is inadequate for the task at hand. First, the last client of the chain is much more important to the correctness of the predicted label than the other clients. This relative importance discrepancy comes from the architecture, not the data. Second, the accuracy with which VertiChain classifies unaltered new samples is lower than the other two models. Finally, noisy samples break the model entirely. Therefore, the VertiChain is not investigated in the subsequent experiments due to the previous facts.

Table 5 Resource usage comparison during training using MNIST dataset
Table 6 Classification results for data poisoning attack using MNIST dataset

6.2 Data poisoning attack

In this experiment, one adversarial participant poisons selected percentages of its samples (25%, 10%, 1%, and 0.5%) during the training phase of the HoriChain and VertiComb architectures. Then, during evaluation, either watermarked or unmarked samples are evaluated to measure the robustness of the architectures.

Fig. 5 Confusion Matrices for data poisoning attack

Table 6 shows the accuracy and F1-score results obtained for both architectures affected by data poisoning attacks. As can be seen in Table 6 for MNIST, when unmarked samples are evaluated, with 25% of samples watermarked during training, both the HoriChain and VertiComb architectures achieve nearly 95% accuracy. In more detail, looking at Fig. 5a and b, all label classes are classified properly, with only a small portion of misclassifications. When the percentage of poisoned samples used during training decreases to 10%, 5%, and 0%, the HoriChain and VertiComb architectures achieve 95-96% accuracy and F1-score. When Fashion-MNIST is employed, the results are similar in terms of performance decrease when watermarked samples are present. These results are quite similar to the ones obtained without attacks in the train set (see Table 4). Similar results are observed when the EMNIST and KMNIST datasets are employed, which confirms the independence of the results from the training dataset leveraged.

However, when watermarked samples are evaluated during testing, the story is different because, for all attack configurations (25%, 10%, 1%, and 0.5%), the accuracy and F1-score of both architectures are highly impacted (see Fig. 5). Nearly 100% of the watermarked samples used during testing are classified as the watermark label. Therefore, the attack is successful in both architectures. The 0.5% configuration deserves special attention, where only 20 samples out of a dataset of over 32000 are watermarked. This attack setting is only partially effective in the HoriChain architecture. For over half of the label classes, the watermarking is significantly less effective (see Fig. 5c). However, no class is entirely unaffected by the attack. The attack in the VertiComb architecture shows signs of being less effective (see Fig. 5d), but it is still overwhelmingly effective in this setting.

In conclusion, looking at the impact of the data poisoning attack, there is no added value in marking 25% or 10% of samples over marking only 1%. With 1% of watermarked samples, the accuracy of both architectures is destroyed when watermarked samples are used during testing. In the case of watermarking only 0.5% of samples, the HoriChain architecture is more robust than the VertiComb.

6.3 Gradient poisoning attack

This experiment evaluates the robustness of the HoriChain and VertiComb architectures when they are affected by gradient poisoning attacks. The values selected to multiply and degrade the gradients are -1, -10, and 0. These values represent an update in the opposite direction of gradient descent (gradients x -1), the same but in an exaggerated manner (gradients x -10), and not updating the model at all (gradients x 0).

Table 7 shows the accuracy and F1-score obtained by the HoriChain and VertiComb architectures for each gradient poisoning configuration and dataset. The results are obtained using unmarked samples during the testing phase to compare them fairly.

Table 7 Classification results for gradient poisoning attack

With a gradient multiplier of -1, the attacked network component takes a step opposite to the usual update but of equal magnitude. It means that in the HoriChain architecture, six of the seven participants take steps in the direction of the steepest descent, while one, the adversary, takes steps in the opposite direction. Therefore, as Table 7 shows for MNIST, the test accuracy is 90%, but it does not reach the baseline performance of 95%. The impact is greater when using Fashion-MNIST as the dataset, where the performance decreases from 86% to 36%. In contrast, when using EMNIST and KMNIST, the results follow a similar pattern to MNIST, with a slight performance decrease when -1 is the multiplier. In the VertiComb architecture, one network component applies updates exclusively in the direction of the steepest ascent. In the case of VertiComb, the -1 multiplier is already effective enough to make the model generate random guesses and demolish its performance. As Fig. 6 shows, the training accuracy of VertiComb starts at a high value and then drops below 30% for both datasets. The test accuracy is consistently low but also decreases over the training phase. In contrast, the HoriChain architecture improves its accuracy during training. The confusion matrices of both architectures provide the details regarding how samples of each class are classified (see Fig. 7a and b).

Fig. 6 Train and test accuracy with a gradient multiplier of -1

Fig. 7 Confusion matrices for gradient poisoning attacks in MNIST

When the gradient multiplier is set to -10, the attack becomes more effective in the HoriChain architecture (see Fig. 7c). It happens because the attacked component takes a step in the opposite direction of the optimization process whose magnitude is ten times that of the previous attack configuration. In the VertiComb architecture, as Table 7 shows, the accuracy is equivalent to random predictions.

Finally, with a gradient multiplier of 0, the targeted component does not change its weights during training. It means that in the HoriChain architecture, this attack is equivalent to having one fewer participant in the system. In the VertiComb architecture, the adversarial component maintains its initialized weights for the entirety of the training phase. Therefore, the outcome of the experiment depends heavily on the initialization in this case.

Table 7 shows that both architectures learn, but they are not as accurate as the baseline models. The HoriChain architecture reaches 95% accuracy, while the vertical one obtains 92%. Figure 7e and f show that, taking MNIST as an example, both architectures classify all labels overwhelmingly correctly, with only a handful of misclassifications in each label class. Finally, while the accuracy of both architectures is high, it is not as high as in the baseline.

In conclusion, the HoriChain architecture is more robust than the VertiComb when a gradient poisoning attack occurs. In the HoriChain architecture with a multiplier of -1, the attack slows down the progress of the training, but it does not prevent the learning process. In contrast, the performance of the VertiComb architecture deteriorates to an almost random level with the same attack setup. When the multiplier value is increased tenfold, the accuracy of both architectures is destroyed. Finally, both architectures are robust against gradient attacks with a multiplier set to 0.

6.4 Discussion

The data poisoning attack is successful because the chosen watermark adds significant noise to the genuine handwritten digits in all the datasets. In all authentic samples, the watermarked pixels are entirely black and devoid of information. Therefore, the model can differentiate a watermarked sample from one without, only needing a small number of samples to learn the backdoor behavior.

Neither the accuracy nor the confusion matrices reveal the presence of the data poisoning adversary in any of the experiments when unmarked samples are evaluated. Both metrics stay near-identical to their baseline counterparts. Data poisoning attacks are extremely successful, and without an adequate and authentic baseline, their presence could easily go unnoticed.

In the proposed scenario and architectures, gradient poisoning attacks are also successful. In the HoriChain architecture, the adversary applies poisoned gradients to the entire model in every one of its updates. In contrast, in the VertiComb architecture, the adversary applies them only to the component of the model that it holds. However, in VertiComb, the adversary has seven times as many updates to poison as its HoriChain counterpart.

With a gradient multiplier of 0, both the HoriChain and VertiComb architectures only delay their learning process. With a value of -10, the attack is completely successful in both architectures since they fail to learn a capable model. When comparing the two architectures, the robustness difference appears with a gradient multiplier of -1. Here, the HoriChain architecture trains a model (delayed, compared to the baseline), whereas the VertiComb architecture deteriorates to almost random predictions.

7 Conclusion

This work presents three decentralized and FL-oriented architectures, HoriChain, VertiChain, and VertiComb, suitable for horizontal and vertical FL scenarios. To evaluate the architectures’ robustness, this work proposes a use case with non-IID data where handwritten digits are classified in a federated and decentralized fashion. After that, the architectures are attacked with two adversarial attacks called data-poisoning and gradient-poisoning. Both attacks are executed with different parameters controlling their efficiency. Finally, the impact of each attack on the classification accuracy, F1-score, confusion matrix, and client relevance of the architectures is analyzed and compared. The performed experiments conclude that even though particular configurations of both attacks can destroy the classification performance of the architectures, HoriChain is the most robust one for both attacks.

In future work, it is planned to propose novel decentralized and FL-oriented architectures equipped with heterogeneous countermeasures such as aggregation functions. Another future direction is to evaluate less aggressive configurations of watermark attacks that add minor differences to handwritten digits. The consideration of multiple attacks at once and the choice between them at any time is also an interesting future step. Finally, new metrics apart from accuracy, F1-score, or client relevance could be defined to evaluate the performance of HFL and VFL architectures. More concretely, the gradient poisoning adversary is not revealed by any metric considered in this work, whereas the relative importance metric revealed the data poisoning adversary. Therefore, future work will create new metrics to detect a gradient poisoning adversary and any adversary utilizing any other attack.