1 Introduction

Despite heightened public awareness and technological and regulatory efforts, frequent data breaches and privacy violations are still being reported. The root problem is that current data storage and processing architectures centralize large amounts of data at a single service provider, creating a single point of privacy failure. The recently emerged Machine Learning (ML) architecture named Federated Learning (FL) decentralizes data and its processing pipelines by allowing users to train intermediate models on their devices, effectively collaborating with the central service to build a global model for all clients without having to surrender the raw data to the central service.

Figure 1 describes the system architecture of FL and the training procedure. Steps 2, 3, 4, and 5 are repeated over time to keep the global model up to date across clients.

Fig. 1: The system architecture of FL

Formally, FL can be considered an optimization problem where the goal is to minimize a global objective function that aggregates local models while respecting the constraints imposed by the distributed nature of the contributing clients and data (Wang et al. 2021a). Let \(\mathcal {C}\) denote the set of \(N\) clients participating in FL. Each client \(i\in \mathcal {C}\) holds a local dataset \(\mathcal {D}_i\). The objective is to train a global model M by aggregating the clients' local models. In FL, the learning process minimizes a loss function computed on each client, combined through weighted aggregation. That is, FL minimizes the objective function in Eq. (1):

$$\begin{aligned} \min _{w}\mathcal {F}(w) = \frac{1}{N}\sum _{i=1}^{N}w_{i}\cdot f_{i}(w) \end{aligned}$$
(1)

where \(w\) represents the model parameters, N is the total number of clients, \(w_i\) is the weight assigned to client i, and \(f_i(w)\) is the local loss function computed on client i. The objective function is minimized by iteratively updating the model parameters based on the aggregated contributions from each client. The weights assigned to each client can be influenced by factors such as client performance, available resources, or fairness considerations. In practice, the aggregation is performed with algorithms such as federated averaging, optionally combined with secure aggregation to strengthen privacy.
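To make Eq. (1) concrete, the following is a minimal sketch (Python/NumPy; function and variable names are illustrative, not from the cited works) of one weighted-aggregation round in the spirit of federated averaging. Production systems add client sampling, secure aggregation, and communication handling.

```python
import numpy as np

def aggregate(client_params, client_weights):
    """Weighted average of client model parameters (cf. Eq. (1)).

    client_params : list of 1-D arrays, one parameter vector per client
    client_weights: list of non-negative floats w_i (e.g., proportional
                    to each client's dataset size |D_i|)
    """
    weights = np.asarray(client_weights, dtype=float)
    weights = weights / weights.sum()          # normalize so weights sum to 1
    stacked = np.stack(client_params)          # shape: (N, num_params)
    return (weights[:, None] * stacked).sum(axis=0)

# Illustrative round: three clients return locally trained parameters.
client_models = [np.array([0.9, 1.1]), np.array([1.0, 0.8]), np.array([1.2, 1.0])]
sizes = [100, 50, 200]                         # |D_i|: dataset sizes used as weights
global_model = aggregate(client_models, sizes)
```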

Although this paradigm shift helps users gain greater control and transparency over their data, decentralization raises many challenges. For example, clients may “drop out” during the training phase due to poor network connectivity; maintaining model performance is difficult given unreliable model updates from clients; and heterogeneous data and devices can cause model divergence and straggling (Lyu et al. 2020b; Lo et al. 2021b; Kulkarni et al. 2020). Many of these general FL challenges have recently been studied and surveyed extensively. For instance, previous papers have focused on system design aspects (Rahman et al. 2021) and general challenges such as communication efficiency (Shahid et al. 2021), model performance (Wang et al. 2021b), and security (Lyu et al. 2020b), or on the server’s perspective (statistical heterogeneity, client motivatability, and scalability) (Imteaj et al. 2021; Rahman et al. 2021).

However, the challenges from the clients’ perspectives are still under-explored. We refer to “client-side challenges” as the challenges clients face during the FL training procedure. These challenges may arise from security and privacy concerns (e.g., malicious servers or dishonest “peers”) and from the complexity of FL processes, as the computational burden is now placed on the clients.

These client-side challenges can affect a few or all clients on the network. For example, being able to personalize (fine-tune) a global model to a particular client would only concern those who want the capability, whereas privacy management challenges are relevant to every client.

We choose client-side challenges as our study focus because (i) client participation plays an important role in FL, as clients contribute the resources; shifting data processing to clients may cause unintended mishaps and privacy risks, since the wider population has limited technical knowledge (Kairouz et al. 2021); (ii) as clients contribute data and resources to FL, they should certainly be entitled to receive some benefits for their contribution, yet most clients are not technically savvy enough to define their requirements or understand the internal black-box mechanisms. For example, many mobile phone users are unaware that the predictions on Google keyboards are built using their data and resources; and (iii) most surveys have focused on FL challenges in general. To the best of our knowledge, the review of client-side challenges and solutions is not yet well documented in the literature.

We conducted a comprehensive literature review of selected research papers, tutorials, dissertations, and magazines in the FL domain to lay out the challenges from the clients’ perspectives. We categorized and grouped the articles according to their primary focus areas, combining some focus areas to illustrate the challenges better. For example, data management, computation cost management, and communication cost management are combined as resource management. Further, we studied the survey papers on general FL challenges such as lack of client motivation, computational/communication cost, and privacy/security attacks (Lyu et al. 2020b; Rahman et al. 2021; Blanco-Justicia et al. 2021) and examined the effect of these challenges from the client’s perspective. For example, incentive mechanisms are a widely discussed means of motivating clients to participate in FL. Previous papers discuss this issue in terms of incentive mechanism processes, algorithms, and client motivation, but clients are more interested in the benefits and in the transparency needed to compare incentives among themselves. In another example, despite extensive discussions of privacy challenges in the literature, a thorough analysis of these challenges from the clients’ perspective is lacking. Our study specifically examines four client-specific issues under the privacy management challenge: auditability, data granularity, re-identification, and consented data sharing. By delving into these aspects, we aim to shed light on clients’ unique issues. Correspondingly, our work concludes with six main categories of client-side challenges.

The overall objectives of this survey are (i) outlining FL challenges from clients’ perspectives, (ii) providing an overview of the current research activities for the client-side challenges, (iii) summarising the challenges with existing approaches, and (iv) helping researchers to understand the open problems and future trends. We derived research questions (RQ) to achieve all the objectives and discuss them in the following section.

1.1 Research questions

1. RQ 1: What are the client-side challenges in FL? (Sect. 3): To enhance the usability of FL for clients, we first need to understand the challenges from the clients’ perspectives. Therefore, the first research question focuses on client-side challenges.

2. RQ 2: What are the state-of-the-art solutions that address these challenges? (Sect. 4): Building on the challenges identified in RQ 1, we examine state-of-the-art solutions to them.

3. RQ 3: Can a solution identified for one type of challenge be applied to other types of challenges? Are there any impacts to consider? (Sect. 5): Drawing on previous research and our understanding of existing solutions for the identified challenges, we assess a particular solution’s impact on the other challenges. Specifically, we analyze whether the solution can solve, or potentially exacerbate, other challenges.

4. RQ 4: What are the open challenges and possible future trends? (Sect. 6): This RQ focuses on open challenges and future trends in solving client-side challenges, which the literature does not yet fully cover.

The contributions of this paper are as follows: (i) identifying the challenges of FL from clients’ perspectives, (ii) a comprehensive analysis of the solutions given in state-of-the-art approaches across 238 studies, and (iii) an analysis and discussion of the impacts of applying the solutions to the identified challenges.

1.2 Sources selection and strategy

We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Page et al. 2021) approach to conduct the survey. We searched the Google Scholar search engine to identify research papers on different client-side challenges, setting the time frame from 01.01.2017 to 31.01.2022. The source collection statistics (search strings and number of articles) are tabulated in Table 1. Besides the initial search, we included additional papers through a snowballing process (Wohlin 2014) over the bibliographies of the identified research papers. As our paper focuses on identifying the client-side challenges of FL, we chose search strings that retrieve papers discussing these challenges. First, we screened the papers by title and abstract for inclusion or exclusion. Figure 2 shows the PRISMA and snowballing approach of the paper search and selection process.

Fig. 2: PRISMA and snowballing approach for paper searching and selection (# implies number of)

Table 1 The statistics of source selection

After the initial title-and-abstract screening, we excluded 813 papers for the following reasons: repetitive long/short versions of the same work, irrelevance to the focus area, and the same solution applied across multiple domains. We ended up with 331 papers, including snowballed references, for full-text assessment. As we mainly focus on client-side challenges, we further excluded 93 papers: papers focused on general challenges, survey papers (which we instead cover in the related works section), application papers (the same technology applied in different domains), white papers, papers irrelevant to client-side issues, short versions of extended papers, and papers that adopt FL merely as a privacy-preserving approach (pure FL without any extensions). Finally, this study covered 238 studies.

The remainder of the article is organized as follows. Section 2 discusses related surveys in FL and how our study differs from them. Section 3 introduces the client-side challenges in FL. Section 4 presents the state-of-the-art solutions to the client-side challenges. Section 5 discusses the linkage of challenges and the applicability of current technologies. Section 6 outlines opportunities and trends for future work. Finally, Section 7 concludes the survey.

2 Related works

This section provides an overview of the existing review papers on FL. Initially, we conducted a comprehensive search to identify and gather all the relevant surveys and reviews pertaining to FL. The majority of these surveys primarily focused on providing a general overview of FL, including its design aspects, application domains, and the overall challenges associated with FL.

However, to the best of our knowledge, no surveys have specifically delved into the FL challenges from the clients’ perspectives. While certain literature surveys have addressed individual challenges clients face, they often lack a comprehensive analysis that considers the clients’ viewpoints. Given this gap, our main focus was to thoroughly examine the client-side challenges in FL, considering the available state-of-the-art solutions from the clients’ perspectives.

General surveys focused on a high-level view of FL environments, such as definitions, components, algorithms, optimization, importance, design aspects, application domains, trends, general challenges, and evaluation approaches. Previous surveys (Rahman et al. 2021; Li et al. 2020e; Aledhari et al. 2020) discussed the characteristics of FL, the general challenges, and the available solutions for FL with future trends. Beyond these general views of FL, Yang et al. (2019) and Kairouz et al. (2021) also included privacy and security aspects of FL in their surveys. Moreover, the studies (Li et al. 2021e; Zhang et al. 2021a) provided a comprehensive study of FL systems, incorporating model building, data partitioning, privacy, scalability, and communication architecture aspects of FL. On another note, Lo et al. (2021b) conducted a systematic review from a software engineering perspective, covering the FL lifecycle: background understanding, requirements analysis, architecture design, implementation, and future trend evaluation.

Some other general surveys have investigated the algorithms and applications of FL. These surveys provide comprehensive overviews of the various algorithms and diverse application areas in which FL has been implemented. Li et al. (2020b) examines FL’s evolution and prevailing applications in industrial engineering. It aims to guide future applications and optimization in FL by reviewing related studies, addressing challenges, and discussing realistic applications in IoT devices, industrial engineering, and healthcare. Likewise, Wang et al. (2021b) provides practical recommendations and guidelines for designing and evaluating federated optimization algorithms through concrete examples and practical implementation. They also address the lack of consensus on core concepts in FL and offer suggestions on problem formulation and algorithm design. Ding et al. (2022) provides an outlook on the challenges and opportunities in FL across five emerging directions: algorithm foundation, personalization, hardware and security constraints, lifelong learning, and nonstandard data. The paper also touches on the challenges of data incompleteness, polarity, and complex dependency in FL.

Moreover, various systematic reviews have extensively examined different domains in the context of FL, including resource-constrained Internet of Things (IoT) (Imteaj et al. 2021; Du et al. 2020), mobile edge networks (Lim et al. 2020), wireless communication (Niknam et al. 2020), and healthcare and informatics (Xu et al. 2021). These reviews have explored existing studies, assumptions, challenges, applications, and problems within each domain.

Several surveys, including (Lyu et al. 2020b; Alazab et al. 2021; Mothukuri et al. 2021; Blanco-Justicia et al. 2021; Ma et al. 2020; Kurupathi and Maass 2020; Enthoven and Al-Ars 2021; Briggs et al. 2021), specifically delve into the comprehensive analysis of privacy and security challenges, applications, key techniques, trends, and open problems in the overall system of FL. In addition, prior studies such as (Kulkarni et al. 2020; Tan et al. 2021) have specifically examined the challenge of personalization in FL, providing insights into motivation, taxonomies, strategies, and future opportunities. A comprehensive analysis of incentive mechanisms in FL has been conducted by exploring existing works and key techniques in studies (Zhan et al. 2021; Zeng et al. 2021). Communication challenges in FL have been addressed by Shahid et al. (2021), while fairness challenges have been surveyed by Shi et al. (2021), covering aspects such as basic assumptions, fairness notions, taxonomies, metric evaluation, and future directions. However, these surveys primarily focus on the challenges of FL from a general perspective, implying a lack of emphasis on the client or user viewpoint. Table 2 outlines the parallel surveys that focused on FL.

Table 2 Summary of existing surveys on federated learning

Our work distinguishes itself from others in the following ways: (i) we specifically address the client-side challenges in a federated environment, recognizing the significance of clients in contributing resources and data, and highlight the potential risks and privacy concerns associated with complex FL processes delegated to clients; (ii) we thoroughly analyze the client-side challenges and their interdependencies in a federated environment; (iii) we adopt the PRISMA approach, providing a clear methodology for our survey, which is lacking in many existing works; (iv) we extensively cover research papers and survey papers in our review; (v) we discuss the open client-side challenges and future trends; and (vi) our review is up-to-date, incorporating papers published until January 2022.

In contrast to existing surveys that primarily examine FL challenges from the system, server, technical, and taxonomy perspectives, our work takes a unique approach by analyzing the challenges and solutions specifically from the clients’ perspectives. We provide insights into the impacts involved in addressing these challenges. To the best of our knowledge, this work represents the first comprehensive exploration of client-side challenges in FL, offering a fresh perspective on the existing literature in this domain.

3 Client-side challenges in federated learning

This section describes the client-side challenges in the FL environment. Through a comprehensive analysis of the literature, we identified six main categories of client-side challenges: (i) personalization, (ii) privacy management, (iii) incentive management, (iv) resource management, (v) data and devices security management, and (vi) fairness management. We analyzed these challenges at a more granular level from the clients’ perspectives and identified precise issues under the above categories, as shown in Fig. 3.

Fig. 3: The overview of the client challenges

3.1 Personalization

Generally, personalization means designing a product or service to meet a user’s requirements. Based on the literature (Tan et al. 2021; Kulkarni et al. 2020; Li et al. 2021f), we describe the personalization challenge from the clients’ perspectives as follows.

In FL, clients with a common goal join the FL environment because they do not have enough data to attain their objective with high performance and generalization guarantees. In a traditional FL setting, every client receives the same global model. However, some clients may require personalized models catered to their preferences, much like recommendation systems. For example, in the next-word prediction scenario in Gboard, each client’s data can have quite different distributions (e.g., due to different texting habits). Therefore, next-word predictions should be personalized to the client while handling generalized situations (predictions of unknown sentences not in the client’s data) with global predictions. Another case is that some clients’ data are unique, meaning the performance of the global model may not be satisfactory. Such users may then wish to fine-tune the model to get more personalized results.

Achieving an optimal balance between generalization and personalization poses a significant challenge for clients in the context of FL. It is crucial to strike the right balance to obtain the best results, and algorithms should be designed to find this balance by leveraging the available client data.

3.2 Privacy management

Even in a federated environment, privacy is a concern, as sensitive information can be leaked through the built models (Zhu and Han 2020). Leakage can occur due to a dishonest server or collusion attacks by other clients. The inference of raw data from the model is a significant concern, as it results in the loss of data control for the owners. Such data leakage can give rise to severe issues and vulnerabilities for the data owners.

We conducted a comprehensive analysis of the literature on privacy management (Mothukuri et al. 2021; Kurupathi and Maass 2020; Enthoven and Al-Ars 2021; Fang et al. 2022; Katevas et al. 2020), considering users’ perspectives and categorizing precise issues under privacy management. Additionally, we examined general privacy threats such as linkability, identifiability, non-repudiation, detectability, information disclosure, content unawareness, and consent non-compliance (Deng et al. 2011), adapting the relevant challenges to the FL context from the users’ viewpoints. Consequently, this survey addresses four key issues in privacy management: auditability, consented data sharing, data granularity, and reidentification.

3.2.1 Auditability

Auditing is an assessment mechanism for measuring the quality of a process’s lifecycle to ensure accuracy and efficacy. It allows users to make judgments based on all transactions that have occurred in the past. In the context of FL, auditing becomes more complex due to the involvement of multiple parties. It involves recording all FL activities, including model parameter transactions, participation records, model attributes, data or resource contributions, and accuracy. By providing visibility into the FL environment, auditing enables users to understand ongoing activities and make informed decisions about their continued participation.

3.2.2 Consented data sharing

Consented data sharing involves participants willingly agreeing to share their data, fully aware of the associated risks, benefits, and purpose (Deng et al. 2011). In the context of FL, individuals may unknowingly provide excessive information, ultimately relinquishing control over their data. The more information a client discloses, the greater the potential risk of privacy breaches.

3.2.3 Data granularity

Data granularity refers to the level of detail present in data. In the context of FL, where a large amount of client data is involved, users should be able to determine the granularity level when building models. Each user may have different privacy preferences: some prefer to share more data, while others opt to limit the sharing of specific data items, such as location, medical history, and ethnicity, or to reduce the granularity of address information to the city or zip-code level. However, manually managing these settings can be cumbersome and inconvenient for users.
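As a hedged illustration of what per-user granularity control could look like, the sketch below coarsens or drops selected attributes before they enter local training. The field names and policy rules are hypothetical; real FL frameworks would expose such preferences through their own configuration interfaces.

```python
# Hypothetical per-user granularity policy: which fields to drop or coarsen.
POLICY = {"location": "city", "medical_history": "drop", "zip": "prefix3"}

def apply_granularity(record, policy):
    """Coarsen or drop fields of a local record before it is used in training."""
    out = dict(record)
    for field, rule in policy.items():
        if field not in out:
            continue
        if rule == "drop":
            del out[field]
        elif rule == "city":
            out[field] = out[field].split(",")[0]   # keep only the city part
        elif rule == "prefix3":
            out[field] = out[field][:3] + "XX"      # truncate the zip code
    return out

record = {"location": "Berlin, Mitte, 10115", "zip": "10115", "age": 42}
print(apply_granularity(record, POLICY))
# -> {'location': 'Berlin', 'zip': '101XX', 'age': 42}
```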

3.2.4 Reidentification

While FL is appreciated by many users for its ability to protect their actual data, recent studies (Orekondy et al. 2018) have demonstrated that data can still be reidentified from models built through FL. To prevent data reidentification and protect their valuable data, clients need mechanisms to safeguard against inference attacks. For instance, location trajectories can reveal sensitive information such as points of interest, social relationships, and user identities (Khalfoun et al. 2021), which poses a significant threat to clients given the widespread collection of precise location data by IoT devices. Therefore, it is crucial to implement privacy protection mechanisms to safeguard users’ privacy.

3.3 Incentive management

Clients play a crucial role in the FL process by contributing valuable resources such as computation capacity, battery, data, memory, and bandwidth. However, these resources are limited and costly for clients, who also face privacy and security risks when participating in FL. In order to incentivize clients and encourage their active involvement, the FL framework offers various incentives such as personalized models that perform well on their own test data, accurate global models trained on comprehensive datasets, monetary compensation, reputation benefits, user-defined incentives, and auxiliary information like bias and model fairness considerations (Tu et al. 2022). These incentives aim to offset the costs and provide clients with tangible benefits for their contributions in FL.

Incentive mechanisms are widely discussed in the literature (Zhan et al. 2021; Tu et al. 2022; Ding et al. 2020; Kang et al. 2019a; Liu and Wei 2020), focusing on reward types, motivation for client participation, incentive calculation challenges, and incentive scheme design. These works discuss the generic challenges of incentive mechanisms, namely (i) the complexity of determining an optimal incentive for clients in a closed environment with no information on data structure, resource capability, and client infrastructure (Zeng et al. 2021); (ii) deriving an appropriate metric to quantify client contributions: local model accuracy is commonly used as the evaluation metric, which can be biased against clients whose unique data does not contribute much to the global model (Zhan et al. 2021; Zeng et al. 2021); and (iii) consequently, motivating clients to actively participate in FL through incentives becomes a challenging task, as the design of an approximation scheme is often intricate (Kang et al. 2019b). It is important to note that these challenges are predominantly addressed by the servers, as they bear the responsibility of resolving these more generic issues in the context of incentive mechanisms.

However, the issue of transparency arises as a prominent concern in incentive management for clients, as it greatly influences clients’ decisions regarding participation. Transparency entails being fully visible, open to scrutiny, and clear, with no hidden aspects. However, due to confidentiality reasons, clients typically do not share their incentive decisions and compensation information with others, resulting in a lack of transparency in incentive scheme selection. This lack of transparency can lead to unfairness or ignorance of entitlements among clients. Furthermore, users with limited technical expertise may struggle to assert their rights and claim their rightful rewards. Balancing transparency with the need for confidentiality, fairness, security, and privacy presents a significant challenge in designing a robust and transparent incentive allocation mechanism.

3.4 Resource management

Given clients’ devices’ limited and costly resources, effective resource management becomes essential to ensure optimal performance within the given capacity limitations. The literature has extensively analyzed resource management in three key categories: data management (Moon et al. 2020; Jeong et al. 2018; Shin et al. 2020), computation management (Nour et al. 2021; Ren et al. 2019; Ji et al. 2021), and communication management (Shahid et al. 2021; Yue et al. 2022; Sattler et al. 2019b). Following a similar approach, we address client-side challenges in resource management by comprehensively considering all three aspects in our survey.

3.4.1 Data management

The current digital age boosts the quantity and frequency of data generation from various sources (such as social networks, IoT devices, and health centres). The heterogeneous nature of devices results in varying amounts and quality of data generated by different clients. Consequently, the performance of FL global models is adversely affected when they are built from such diverse datasets. Clients with limited, unique, or unbalanced data are particularly affected by this challenge.

Poor performance is a common issue experienced by clients in FL, and one of the reasons behind it is data management challenges such as data scarcity, data imbalance, and data representation. Our survey focuses on the data imbalance and representation challenges, while acknowledging that data scarcity is commonly addressed in the literature through techniques like sampling. However, managing data in the FL environment is particularly challenging as the data remains decentralized. Consequently, proposed solutions should rely on client-side approaches (considering clients’ limited technical expertise) or platform-based mechanisms that respect user privacy.

3.4.2 Computation cost management

Many researchers (Imteaj et al. 2021; Nour et al. 2021; Ren et al. 2019; Ji et al. 2021) have analyzed the computation cost management challenge, as it directly affects clients’ participation. In a centralized architecture, servers run these complex algorithms on hundreds of GPU machines to produce a model, whereas resource-constrained devices must run the FL algorithms in the background while managing their main tasks. FL algorithms have substantial resource requirements, while clients’ devices are limited and energy-constrained. Consequently, many clients are hesitant to dedicate all their resources solely to the FL process. Moreover, clients often lack control over when the FL process runs on their devices, limiting their ability to decide when to participate. This lack of authority over participation makes it hard for clients to achieve better local models, utilize resources efficiently, and minimize computational costs.

3.4.3 Communication cost management

Besides preserving privacy, FL divides the computational power among clients and reduces the communication burden by transferring models instead of raw data. However, despite these advantages over traditional centralized architectures, clients still encounter communication challenges. These challenges arise from the resource constraints of devices, unreliable network connections, communication frequency, the transmission of large gradient vectors in complex deep neural networks, and the distance between clients and servers (Shahid et al. 2021).

3.5 Data and devices security management

The multi-party closed nature of the FL environment, where client and server information is not exchanged, introduces vulnerabilities and challenges in monitoring trustworthiness. Additionally, the FL environment is dynamic, with new clients and models constantly being introduced, requiring a continuous verification process. This ongoing verification process adds complexity to ensuring the trustworthiness of participants in the FL ecosystem.

The literature (Alazab et al. 2021; Ma et al. 2020; Fang et al. 2020; Bagdasaryan et al. 2020; Zhang et al. 2019; Lin et al. 2019) discusses several security breaches such as model invalidation, data/model poisoning, model inference, backdoor attacks, malicious clients, and malicious servers. Adversaries exploit various attack vectors, such as the communication medium, client data manipulation, a dishonest server, and the aggregation algorithm (Mothukuri et al. 2021). Hence, clients must employ defense mechanisms to protect themselves from adversarial attacks that can be initiated by other clients, servers, or external attackers. These attacks can compromise the shared goal of FL and put data at risk.

3.6 Fairness management

Fairness in FL entails treating every client impartially, without any bias or discrimination (Ezzeldin et al. 2021). Consider a face recognition scenario where the FL server has access to many mobile devices used by white users but only a few used by black users. Consequently, the model may exhibit better performance in recognizing the faces of white individuals compared to black individuals. However, achieving fairness among clients is challenging due to statistical and system heterogeneity. Defining fairness itself lacks consensus, with different notions representing specific interests and aspects of participant groups. Therefore, attaining acceptable fairness in a multiparty collaboration environment is complex.

In the context of FL, fairness is a multifaceted concept, and different mathematical criteria have been proposed to capture fairness. One widely used criterion is equalized odds, which aims to ensure that the probability of a positive prediction is the same across different groups, irrespective of their protected attributes.

A prediction algorithm satisfies equalized odds if it ensures that both the true positive rate (TPR) and the false positive rate (FPR) are equal across different groups (Garg et al. 2020). More formally, equalized odds requires that the group-specific TPRs satisfy Eq. (2) and the group-specific FPRs satisfy Eq. (3).

$$\begin{aligned} P(y' = 1 \mid y = 1, G = 0) = P(y' = 1 \mid y = 1, G = 1) \end{aligned}$$
(2)
$$\begin{aligned} P(y' = 1 \mid y = 0, G = 0) = P(y' = 1 \mid y = 0, G = 1) \end{aligned}$$
(3)

In both cases, the equation is comparing the conditional probabilities of the predicted label \(y'\) being 1, under different scenarios based on the values of the ground truth label \(y\) and the protected attribute G, which indicates different groups.
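The sketch below (Python/NumPy, with illustrative data) computes the group-specific rates of Eqs. (2) and (3); a predictor satisfies equalized odds when both the TPR gap and the FPR gap are (approximately) zero.

```python
import numpy as np

def group_rates(y_true, y_pred, group, g):
    """TPR and FPR of binary predictions restricted to the subgroup G == g."""
    mask = group == g
    yt, yp = y_true[mask], y_pred[mask]
    tpr = yp[yt == 1].mean() if (yt == 1).any() else float("nan")  # P(y'=1 | y=1, G=g)
    fpr = yp[yt == 0].mean() if (yt == 0).any() else float("nan")  # P(y'=1 | y=0, G=g)
    return tpr, fpr

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # protected attribute G

tpr0, fpr0 = group_rates(y_true, y_pred, group, 0)
tpr1, fpr1 = group_rates(y_true, y_pred, group, 1)
print("TPR gap:", abs(tpr0 - tpr1), "FPR gap:", abs(fpr0 - fpr1))
```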

Other notions of fairness discussed in the literature are accuracy parity (uniformity in performance across clients), good-intent fairness (minimizing loss for underlying protected client classes), group fairness (minimizing disparities in algorithmic decision-making across groups), selection fairness (reducing FL model bias by increasing the participation of under- or never-represented clients), contribution fairness (rewarding clients in proportion to their contribution), regret distribution fairness (minimizing the regret difference among clients, where regret is the gap between what a client has received so far and what they deserve), and expectation fairness (minimizing inequality between clients over the period until rewards are received) (Shi et al. 2021). A considerable amount of literature (Li et al. 2021f; Garg et al. 2020; Divi et al. 2021b; Yu et al. 2020a) has been published on the fairness concept in the FL domain. We comprehensively analyzed the challenges and approaches in those studies from clients’ perspectives and derived two critical issues under fairness management: measuring fairness and practicing fairness in different disciplines.

3.6.1 Measuring fairness

As fairness is a variable and complex concept in FL, it is much more difficult for users to understand what is happening and whether they are treated fairly. Existing FL models primarily rely on performance evaluation metrics like accuracy and efficiency, which may not adequately capture fairness considerations. Adapting fairness metrics to evaluate model quality in a collaborative environment is crucial, as global model accuracy may vary among clients and may not align with individual client contributions. However, measuring fairness using all clients’ data in a closed environment is often infeasible, and measuring fairness locally using only client data is inadequate due to limited data and the unknown distribution of other clients’ data. A user-friendly and explainable fairness framework that accounts for fairness among clients would be an ideal approach to alleviate the challenges associated with fairness management.

3.6.2 Practice of fairness in different disciplines

Fairness is a widely studied concept in various disciplines, encompassing incentives, resource allocation, performance evaluation, privacy, client reputation, and addressing group bias. Incorporating fairness into FL requires algorithmic modifications, which primarily rely on the support and intervention of platforms and servers. Although clients may not directly influence algorithmic changes, the issue of ensuring fairness poses a significant challenge for them in the context of FL. The lack of control over the implementation of fairness becomes a noteworthy obstacle that affects their participation.

4 State-of-the-art solutions to the client-side challenges

In this section, we will explore the current state-of-the-art solutions for the challenges faced by clients in FL. The research focus on these challenges has significantly grown in recent years, as evidenced by the increased number of papers published between 2020 and 2021.

Addressing client-side challenges in FL is typically expected to rely on users’ involvement, as the architecture empowers them with greater control over their data. However, this approach can lead to unintended consequences when complex challenges are delegated to users with limited technical expertise. Many client-side challenges, such as resource management, fairness, security, and incentive management, require the involvement of platforms and servers for effective solutions. These challenges are inherently tied to the tasks and algorithms of the system.

For instance, achieving fairness in FL requires the integration of fairness considerations within server-side algorithms to ensure equal treatment of all clients. Similarly, platform-level changes are necessary to enhance transparency in incentive mechanisms, such as incorporating blockchain technology. Therefore, this section will explore solutions that involve collaboration between clients, servers, and platforms to address these challenges.

4.1 Solutions for personalization challenges

This section reviews the existing approaches proposed in the literature to address the personalization challenge in the FL. These approaches can be categorized into two main types: single-client-based personalization and cluster-based personalization. The single client-based personalization approaches focus on enhancing personalization by employing additional algorithms directly on individual clients. These algorithms aim to adapt the model to better suit each client’s specific characteristics and preferences, resulting in a more personalized model.

On the other hand, the cluster-based personalization approaches involve grouping together clients with similar data distributions and objectives. By forming clusters of similar clients, the FL process can generate a more tailored and personalized model that aligns with the common characteristics and goals of the cluster members.

By exploring these two categories of approaches, we gain insights into the diverse strategies employed to address the personalization challenge in FL.

4.1.1 Single client-based personalization

Single client-based personalization algorithms aim to enhance personalization by involving clients directly in the process. These algorithms introduce additional calculations or modifications on the client side, allowing clients to actively participate in improving their own personalization. They tune the global model based on the client’s data to improve the model’s accuracy. We classify these approaches as fine-tuning methods and local model-global model closeness methods.

Fine-tuning approaches try to minimize the individual loss of each client by making small adjustments to the global model through a few gradient steps on the client’s data (Deng et al. 2020). The adjusted models give accurate and personalized results for the clients; this can be considered a post-processing method. Local model-global model closeness approaches measure the closeness between local and global models to achieve optimal personalization points for the client (Li et al. 2021f).
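As a minimal sketch of this idea (Python/NumPy on a toy least-squares model; the learning rate and step count are illustrative, not from the cited works), a client takes a few gradient steps on its own data starting from the received global parameters:

```python
import numpy as np

def fine_tune(global_w, X, y, lr=0.1, steps=5):
    """A few local gradient steps on the client's data (mean-squared-error loss)."""
    w = global_w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the MSE loss
        w -= lr * grad                           # small adjustment toward local data
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                     # client's local features
y = X @ np.array([1.0, -2.0, 0.5])               # client-specific targets
global_w = np.zeros(3)                           # model received from the server
personal_w = fine_tune(global_w, X, y)           # personalized parameters
```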

4.1.1.1 Fine-tuning approaches

Fine-tuning approaches assume a similarity in the task across all clients but adjust the loss function based on each client’s data distribution. These methods involve applying fine-tuning algorithms to the global model on the clients’ edge, enabling personalization based on their individual data.

Meta-learning is a fine-tuning approach that involves training an initial model on multiple tasks, enabling it to quickly adapt and learn new tasks with limited training data on the client’s end (Kulkarni et al. 2020; Jiang et al. 2019a). One notable concept in meta-learning is Model-Agnostic Meta-Learning (MAML), introduced by Finn et al. (2017), which is compatible with various deep learning models trained using gradient descent. The MAML framework consists of two main steps: meta-learning, where the model is trained on multiple tasks, and meta-testing, where the model adapts to a new task. Researchers Jiang et al. (2019a) and Fallah et al. (2020) have applied the MAML concept in the context of FL. Another meta-learning approach is parameterized algorithms (Chen et al. 2018), where clients receive algorithm parameters instead of a global model. This allows clients to fine-tune the algorithm based on their specific data for personalized learning. Additionally, studies by Khodak et al. (2019) and Balakrishnan et al. (2021) have extended meta-learning techniques to address dynamic environments and efficient resource allocation, respectively.

Base + personalization layers with local parameters is a fine-tuning approach in which clients share a common set of base layers with consistent weights, while each client maintains individual personalization layers tailored to its specific data (Arivazhagan et al. 2019). This approach allows clients to incorporate their unique data characteristics while benefiting from the shared knowledge in the base layers. The model parameters of the base layers are shared with the server, while the personalization-layer parameters are retained on the client. Cheng et al. (2021) and Jourdan et al. (2021) further improved personalized layers with a stylized regression model and local adaptation.
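A minimal sketch of the base + personalization split, assuming a model stored as named parameter blocks (the layer names and shapes are illustrative): only the base parameters travel to the server, while the personalization layers never leave the device.

```python
import numpy as np

# Client-side model as named parameter blocks (illustrative shapes).
model = {
    "base.layer1": np.zeros((4, 4)),     # shared with the server
    "base.layer2": np.zeros((4, 2)),     # shared with the server
    "personal.head": np.zeros((2, 1)),   # stays on the device
}

def split_for_upload(model):
    """Return only base-layer parameters; personalization layers stay local."""
    return {k: v for k, v in model.items() if k.startswith("base.")}

def merge_from_server(model, base_update):
    """Overwrite base layers with the aggregated server parameters."""
    model.update(base_update)
    return model

upload = split_for_upload(model)          # sent to the server for aggregation
# ... server aggregates the base layers across clients and returns them ...
model = merge_from_server(model, upload)  # personal.head is untouched
```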

Collins et al. (2021) introduced an approach that combines a low-dimensional local model with a learned global model to address the personalization challenge. The algorithm utilizes gradient updates to learn a global representation, enabling clients to compute personalized low-dimensional classifiers for individual labeling. The approach proposed by Liang et al. (2020) adopts a similar strategy of learning features both locally and globally.

Transfer learning is learning a new task by transferring the knowledge gained from other tasks (Torrey and Shavlik 2010). Wang et al. (2017) adapted this technique in FL to tackle the personalization challenge. FedHealth (Chen et al. 2020c) applied transfer learning in the healthcare FL domain for personalization.

Knowledge distillation is another technique where a smaller model (student) learns from a larger network (teacher) by mimicking its behavior (Li and Wang 2019). The studies (Li and Wang 2019; Ozkara et al. 2021; Divi et al. 2021a; Yu et al. 2020d) adapted the knowledge distillation technique into FL to improve personalization and communication efficiency.

Nadiger et al. (2019) adapted the reinforcement learning technique as the fine-tuning approach for sequential decision-making; reinforcement learning employs trial-and-error procedures until a solution is found for a task on the client (Kaelbling et al. 1996). Hard et al. (2018) used contextual information such as logs and caches to fine-tune the character recognition task, showing that adding contextual information boosts personalized performance.

Recent works (Yurochkin et al. 2019; Achituve et al. 2021; Yue and Kontar 2021; Kontoudis and Stilwell 2022) have integrated Bayesian and Gaussian Process (GP) techniques in FL to achieve personalized global models. More specifically, by incorporating prior information, the local data on each client can be leveraged as a personalization role in training FL algorithms. Yurochkin et al. (2019) proposed Bayesian nonparametric FL of neural networks, synthesizing a more expressive global network without additional supervision. Achituve et al. (2021) shared a kernel function across all clients, employing a personal GP classifier for each client. Similarly, Yue and Kontar (2021) utilized GP in their regression framework (FGPR), resulting in personalized global models by jointly learning a global GP prior across all clients. Kontoudis and Stilwell (2022) incorporated GP in training and optimization using alternating direction method of multipliers, employing decentralized aggregation techniques for GP prediction through iterative and consensus methods.

Apart from the discussed techniques, Li et al. (2021a) proposed a heterogeneous masking technology for fine-tuning, where clients learn a personalized and structured sparse model without changing local model parameters. Dinh et al. (2020) regularize the loss function using the Moreau envelope to improve personalized results. The works (Hu et al. 2020b) and (Yang et al. 2021b) applied differential privacy (DP) to achieve personalization. Zhang et al. (2021d) achieved personalization by allowing users to transfer personalized knowledge (prediction updates) to the server; the global model is then updated based on the clients’ prediction updates.

The fine-tuning process is efficient and rapid due to the internal representation of multiple models, allowing for excellent performance on new tasks with minimal data points and training iterations.

4.1.1.2 Local model-global model closeness approaches

Smith et al. (2017) adapted multitask learning to measure the closeness between the local model and the global model. Multitask learning is the process of modeling several naturally related tasks at once and measuring the relationships among them. The studies (Mills et al. 2020; Mahara et al. 2021) extended Smith et al. (2017)’s research in different domains. Yu et al. (2020b) combined reinforcement learning with multitasking to achieve better results. Recently, Li et al. (2021f) proved that multitask learning can improve fairness and robustness along with personalization.

Model interpolation is defined as training a separate local model based on the local and global data and combining them for better performance in FL (Mansour et al. 2020). The studies (Peterson et al. 2019; Hanzely and Richtárik 2020) adapted this technique with domain experts’ opinions to build personalized models for clients. Mansour et al. (2020) integrated model interpolation with clustering and data interpolation (training a model on combined local and global data) for better results. The studies (Deng et al. 2020; Zhang et al. 2020c; Luo and Wu 2021) achieved personalization by allowing the clients to build their local models simultaneously with the global model, using an optimal mixing parameter to combine the global and local models, as sketched below. Wu et al. (2021b) presented a hierarchical personalized FL framework in which clients initially define hierarchical information about their data (public and private); only the public component is uploaded to the server.
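A minimal sketch of the mixing idea (Python/NumPy; here the mixing parameter is chosen by a simple validation sweep, a simplification of the optimization procedures in the cited works):

```python
import numpy as np

def interpolate(local_w, global_w, alpha):
    """Convex combination of local and global parameters."""
    return alpha * local_w + (1 - alpha) * global_w

def best_alpha(local_w, global_w, val_loss):
    """Pick the mixing parameter minimizing a client-side validation loss."""
    candidates = np.linspace(0.0, 1.0, 11)
    return min(candidates, key=lambda a: val_loss(interpolate(local_w, global_w, a)))

# Illustrative: a quadratic "validation loss" centered at the client optimum.
client_opt = np.array([1.0, -1.0])
val_loss = lambda w: float(np.sum((w - client_opt) ** 2))
alpha = best_alpha(np.array([0.9, -0.8]), np.array([0.2, 0.1]), val_loss)
```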

While the techniques mentioned above are commonly employed in the literature to achieve personalization, they rely on algorithms that clients have no control over. Moreover, the resource-constrained client environment makes it computationally challenging to perform the required additional calculations. Clients must allocate their limited computational power to accommodate these algorithms in order to obtain personalized models.

4.1.2 Cluster-based personalization approaches

Single client-based personalization approaches work well when clients’ data distributions are similar. But when the data distribution is naturally clustered among clients, finding an optimal personalized solution for all clients is difficult. Clustering has therefore been proposed as a personalization solution in the literature: by grouping similar clients together, the model can mitigate the impact of heterogeneous data distributions.

Various clustering techniques have been proposed in the literature to address personalization in FL. These techniques include hard clustering, soft clustering, hypothesis-based clustering, attribute-based clustering, hierarchical clustering, and user-centric clustering. These approaches are typically implemented on the server side, as they involve collecting and aggregating local models. The clustering of clients is often determined based on the values of model parameters (Table 3).

Table 3 Summary of personalization approaches

Hard clustering means assigning a client to exactly one cluster; a client cannot belong to two clusters. The studies (Ghosh et al. 2020; Vahidian et al. 2021; Huang et al. 2019; Duan et al. 2021; Xie et al. 2021; Li et al. 2020f) adapted the hard clustering technique in a federated environment to iteratively assign clients to clusters; the cluster whose model yields the least loss is selected as the appropriate cluster for that client, as sketched below. Sattler et al. (2020) incorporated the hard clustering technique with multi-task learning based on the cosine similarity between gradient updates.
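A sketch of the loss-based assignment step common to these hard-clustering methods: each client evaluates every cluster model on its local data and joins the cluster with the lowest loss. The linear model and squared loss here are illustrative stand-ins.

```python
import numpy as np

def assign_cluster(cluster_models, X, y):
    """Return the index of the cluster model with the least local loss."""
    losses = [np.mean((X @ w - y) ** 2) for w in cluster_models]
    return int(np.argmin(losses))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, 0.0])                      # this client fits cluster 0 best
cluster_models = [np.array([2.0, 0.0]), np.array([0.0, 2.0])]
k = assign_cluster(cluster_models, X, y)           # -> 0
```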

However, hard clustering in FL faces certain challenges, including unstable training, sub-optimal user assignment, and inefficiency when dealing with a large mix of data distributions. Researchers have introduced a solution known as soft clustering to address these issues. Unlike hard clustering, soft clustering allows clients to be partially assigned to multiple clusters, creating overlapping clusters. The approach presented by Li et al. (2021b) provides enhanced flexibility and robustness in addressing the challenges arising from client heterogeneity in FL.

As another approach, Mansour et al. (2020) applied hypothesis-based clustering, where clients are partitioned according to the best hypothesis based on a stochastic expectation-maximization algorithm. Further, a hierarchical clustering approach was adopted in (Briggs et al. 2020; Yoo et al. 2021) to group clients using the similarity between local updates and the global server. The hierarchical clustering algorithm iteratively merges the most similar clients in each round until a given threshold is reached.

In addition to the ones previously discussed, a user-centric federated clustering approach was proposed by Mestoukirdi et al. (2021), aiming to minimize communication overhead in FL. Instead of relying on the generic federated averaging algorithm, they introduced multiple user-centric aggregation rules in the server to obtain clustering results. Unlike traditional approaches that use model parameter values for clustering, they focused on client characteristics such as data size and distribution from the server’s perspective. Another recent study by Kim et al. (2021) explored dynamic clustering, which adapts the clusters based on the changing environments.

However, clustering approaches in FL have limitations such as limited client control, privacy risks during data transfer, and a fixed number of clusters, except in the case of dynamic clustering (Kim et al. 2021).

4.2 Solutions for privacy management challenges

This section delves into state-of-the-art solutions that target different client privacy challenges, classified according to the precise issues discussed in Sect. 3.2 on privacy management.

4.2.1 Auditability in privacy management

The current state-of-the-art is on the basics of two main techniques for auditing: blockchain and visual analytics.

4.2.1.1 Blockchain technology for auditability

Researchers utilize blockchain as a popular technology for auditing FL transactions due to its robustness, immutability, and auditability (Swan 2015). These platform-based solutions integrate blockchain into FL’s architecture to address privacy challenges.

A study by Lu et al. (2019) proposed a privacy-preserving FL data-sharing architecture that leverages blockchain. The blockchain is utilized to store information related to client selection, data statistics, encrypted retrieval transactions, data sharing requests, and transactions while ensuring the privacy of raw data. Similarly, FLchain (Majeed and Hong 2019) utilizes blockchain to store local model parameters as blocks, enabling trackability throughout global iterations.

The studies (Lu et al. 2019; Zhao et al. 2020a) also adapted blockchain in the IoT domain. Zhao et al. (2020a) replaced the central server with a blockchain-based system to store and aggregate local models, enabling tracking of malicious activities. BlockFlow (Mugunthan et al. 2020b) is another decentralized FL system that leverages blockchain to provide auditability, reward clients for their contributions, and offer protection against malicious adversaries.

Another study, VFChain (Peng et al. 2021), is a verifiable and auditable blockchain-based framework in which a selected committee aggregates and records the local models after verifying them. Lo et al. (2021a) enhance the FL architecture with reliability, accountability, and fairness by integrating blockchain; accountability is achieved by designing a smart contract-based data-model provenance registry.

To address the auditability challenge and comply with data privacy regulations, servers have implemented blockchain-based architectural changes. These changes aim to increase participant transparency and trust, attracting more client participation. Although users may not possess the technical knowledge to fully understand blockchain architecture, they can rely on its immutable and transparent nature. By recording all FL activities, the blockchain allows users to verify their transactions with the support of legal and technical expertise when necessary.

4.2.1.2 Visual analytics for auditability

Visual analytics approaches empower users by involving them directly in the FL process, providing visibility into various aspects of FL activities. These approaches enable users to monitor and analyze data usage, client information, model aggregation, and accuracy distribution. By offering this level of transparency, users can fulfill their audit objectives and gain insights into the inner workings of FL.

Turbo Tucoon (Mike 2018) and FATE-Board (Fan 2018) are shallow-level analytical tools for FL. Turbo Tucoon summarizes process logs and model performance for users to monitor and visualize the system. FATE-Board visualizes real-time log metrics, dataset information, task workflow, model output, and evaluation metrics. Wei et al. (2019) proposed a multi-agent visualization system demonstrating FL, multi-client coordination, input, and output through a game. However, this approach is demonstrated on a car racing game and is difficult to generalize to other domains.

LEAF (Caldas et al. 2018) is a benchmark visual analysis tool with statistical and system metrics. LEAF can be applied in FL, meta-learning, multitask learning, and on-device learning. However, LEAF is designed mainly for tech-savvy users such as software engineers and can be complicated for general users. PrivacyFL (Mugunthan et al. 2020a) helps users ensure that collaboration is feasible and improve their model accuracy.

FedEval is a comprehensive, easy-to-use benchmarking framework that covers accuracy, communication, time efficiency, privacy, and robustness. Although the system is primarily built for researchers to perform evaluations, users can visualize their own performance.

HFLens (Li et al. 2021d) is a comparative visual interpretation system for fine-grained analysis of communication rounds and client instance levels. It analyses the overall client processes, correlation of clients’ information with communication rounds, potential anomalies, data quality, and client contribution.

4.2.2 Consented data sharing in privacy management

Limited research focuses on addressing the challenge of consented data sharing in the FL environment. These approaches typically require collaboration between clients and servers. The server is responsible for obtaining user consent before collecting and sharing their data, while users have the autonomy to decide whether or not to share their data with the server.

The policy-based privacy setting framework “PoliFL” (Katevas et al. 2020) offers users a feature to choose which data to share in FL; users can opt out of certain data based on their privacy preferences. DS2PM (Chen et al. 2021) protects privacy, integrity, and data ownership using blockchain. Data sharing occurs through an on-chain data retrieval mechanism with the owner’s permission, and the framework also ensures auditing and verification of transactions.

4.2.3 Granularity in privacy management

Granularity solutions primarily rely on user-driven privacy settings mechanisms, allowing users to control the level of granularity in data sharing. Users have the flexibility to define the extent and specifics of data they are willing to share.

PoliFL (Katevas et al. 2020) offers heterogeneous privacy policies as users may have different privacy requirements. The server is responsible for aggregating locally processed models with different datasets. PoliFL considered three policies: a policy that permits all FL activities, a policy that permits FL with DP, and a policy that restricts specific data sources (varied among users) when training the FL model. The results showed that PoliFL performs well with heterogeneous policies within reasonable resource and time budgets.

Similarly, using an opt-out DP algorithm, the FeO2 framework (Aldaghri et al. 2021) protects clients’ privacy. Users may opt out of certain features of their data or additional privacy-enhancing mechanisms based on their privacy needs.

4.2.4 Re-identification in privacy management

Privacy-preserving techniques in the literature for the re-identification challenge can be categorized into three main approaches: perturbation, Secure Multiparty Computation (SMC), and Homomorphic Encryption (HE). These approaches typically involve clients and platforms in order to safeguard against re-identification and uphold privacy. They achieve this by introducing an additional layer of protection in FL through techniques such as noise injection, secret sharing, or encryption of the model parameters.

4.2.4.1 Perturbation approaches

Perturbation techniques, such as DP, involve adding noise to local parameters to ensure privacy in FL. DP creates anonymous data by introducing noise, allowing for statistical analysis without revealing sensitive personal information or individual client identities. Client-side perturbation, known as local differential privacy (LDP), involves data owners adding randomization or noise to their data before sharing it with a third party, addressing privacy concerns in FL (Dwork 2009; Tyagi 2022).

If a randomized algorithm \(\mathcal {A}\) satisfies Eq. (4), then it provides \(\varepsilon\)-LDP.

Definition 1

An algorithm \(\mathcal {A}\) satisfies \(\varepsilon\)-LDP if, for any two data values \(v_1\) and \(v_2\), and for any set of outputs \(Q\) within the range of outputs of \(\mathcal {A}\), Eq. (4) holds:

$$\begin{aligned} P\left[ \mathcal {A}(v_{1})\in Q \right] \le exp(\varepsilon )\cdot P\left[ \mathcal {A}(v_{2})\in Q \right] \end{aligned}$$
(4)

Here, \(\varepsilon\) represents the privacy budget, quantifying the level of privacy. This definition ensures that the probability (P) of obtaining an output \(Q\) from \(\mathcal {A}\) on \(v_1\) is at most \(e^{\varepsilon }\) times the probability of obtaining the same output \(Q\) on \(v_2\).
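
To make Definition 1 concrete, the following minimal sketch (our own illustration, not drawn from any surveyed system) implements the classic randomized-response mechanism, which satisfies \(\varepsilon\)-LDP for a single binary value:

```python
import math
import random

def randomized_response(value: bool, epsilon: float) -> bool:
    """Report the true bit with probability e^eps / (e^eps + 1); otherwise flip it.

    For any output, the probabilities under two different inputs differ by at
    most a factor of e^eps, which is exactly the epsilon-LDP guarantee."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return value if random.random() < p_truth else not value

# Each client perturbs its bit locally before sharing it with the server.
noisy_bit = randomized_response(True, epsilon=1.0)
```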

Geyer et al. (2017) tackled the re-identification issue from the client’s standpoint, using client-side DP to preserve data privacy while optimizing performance. The studies (Wei et al. 2020a; Choudhury et al. 2019) also adopted DP to prevent information leakage by adding noise before aggregation, showing that different variations of artificial noise lead to different levels of protection.

Another client-side approach, LDP, was adopted in (Zhao et al. 2020b; Seif et al. 2020; Truex et al. 2020), where clients locally perturb their data before sharing. This local privacy approach reduces communication costs and privacy threats. Wei et al. (2021) derived a user-level DP algorithm extending local privacy: rather than guaranteeing only the privacy of individual samples, user-level DP protects a client’s entire contribution. As an extension of DP, Triastcyn and Faltings (2019) adopted Bayesian DP, which adjusts the noise according to the data distribution instead of adjusting it randomly.
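
The clip-and-noise pattern underlying these client-side DP approaches can be sketched as follows. This is a simplified illustration of our own: the update shape, clipping norm, and noise scale are arbitrary assumptions, and the noise is not calibrated to a formal privacy budget.

```python
import numpy as np

def privatize_update(update: np.ndarray, clip_norm: float, noise_std: float) -> np.ndarray:
    """Clip the update to a maximum L2 norm, then add Gaussian noise.

    Clipping bounds each client's influence on the aggregate; the noise
    masks what remains of the individual contribution."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + np.random.normal(0.0, noise_std, size=update.shape)

local_update = np.random.randn(10)  # stand-in for a local model delta
private_update = privatize_update(local_update, clip_norm=1.0, noise_std=0.1)
```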

In their work, Marathe and Kanani (2022) focused on subject-level privacy in FL, where a subject’s private information is represented by multiple data items within or across federation clients. They achieved subject-level privacy by introducing noise to the data and training the noisy data in mini-batches. This approach aimed to protect the privacy of individual subjects while enabling effective collaborative learning in FL.

The Sherpa.ai framework (Rodríguez-Barroso et al. 2020) was built on FL and DP. Its offered functionalities help practitioners build FL systems with DP without developing them from scratch, so clients with limited technical knowledge can use this framework to integrate DP into their data pipelines.

4.2.4.2 Secure Multiparty Computation approaches

SMC is a platform-based solution that aims to enable the secure computation of an agreed-upon function among clients without relying on any trusted third parties (Goldreich 1998). In SMC, input data is either masked or secret-shared, and the computed result is typically disclosed to all parties involved. SMC offers the advantage of relatively low computational overhead, but it necessitates multiple rounds of interaction among the participating parties to achieve secure computation.
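
To illustrate the secret-sharing step, the following toy sketch (our own example; the field modulus and party count are assumptions) shows how additive shares let an aggregate be reconstructed without any single party seeing an individual input:

```python
import random

PRIME = 2**61 - 1  # field modulus (an arbitrary choice for this toy example)

def make_shares(secret: int, n_parties: int) -> list[int]:
    """Split a secret into n additive shares that sum to it mod PRIME.

    Any n-1 shares are uniformly random and reveal nothing about the secret."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Three clients secret-share their private inputs; only the sum is revealed.
inputs = [7, 11, 4]
all_shares = [make_shares(x, 3) for x in inputs]
# Each party sums the shares it received (one per client); combining the
# partial sums reconstructs the aggregate, never the individual inputs.
partial = [sum(col) % PRIME for col in zip(*all_shares)]
total = sum(partial) % PRIME
assert total == sum(inputs) % PRIME  # 22
```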

Figure 4 provides an overview of the SMC approach, depicting the secure sharing of private inputs among parties, the use of secure protocols for computations (function F), iterative interaction for communication and secure computation, and result reconstruction for revealing the final result while preserving privacy.

Fig. 4
figure 4

The overview of secure multiparty computation approach

Within the domain of FL, Truex et al. (2019) presented a privacy-preserving framework that combines DP and SMC. Their approach aimed to strike a balance between these two techniques, minimizing noise injection while preserving privacy. The framework incorporated a tunable trust parameter to accommodate various trust scenarios, ensuring both accuracy and privacy assurance in the FL process.

Another approach, HybridAlpha (Xu et al. 2019a), uses DP and functional encryption to implement an SMC protocol. The functional encryption protocol helps mitigate inference attacks from curious aggregators and colluding clients.

4.2.4.3 Homomorphic Encryption approaches

Homomorphic Encryption (HE) allows computations to be performed on encrypted data. In HE, as shown in Fig. 5, clients send encrypted data to a server and request the evaluation of a function on this encrypted data. The computation operates solely on encrypted data, with the inputs and outputs encrypted using the client’s secret/public key, ensuring the privacy and security of the data.

Fig. 5
figure 5

The overview of homomorphic encryption approach

In the FL domain, local models are encrypted with public/private key pairs before being sent to the server. HE allows operations to be performed directly over the encrypted models, so clients are mainly involved in these solutions to protect their data.
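
The additively homomorphic property can be illustrated with a toy Paillier implementation. The primes below are deliberately tiny and insecure, and the code is a didactic sketch of the general scheme rather than the construction used by any of the cited systems:

```python
from math import gcd

# Toy Paillier setup with small, insecure primes (for illustration only).
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)    # inverse of L(g^lam mod n^2)

def encrypt(m: int, r: int) -> int:
    """E(m) = g^m * r^n mod n^2; r must be coprime with n."""
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    """D(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) / n."""
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Homomorphic addition: the product of ciphertexts decrypts to the sum,
# so a server can aggregate encrypted model parameters without seeing them.
c1, c2 = encrypt(42, 17), encrypt(58, 23)
assert decrypt((c1 * c2) % n2) == 100
```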

Hao et al. (2019) adopted HE in industrial FL, which prevents data re-identification even when many clients collude with each other to attack the system. Pivot (Wu et al. 2020b) protects clients’ data against semi-honest adversaries; it is a hybrid framework combining Threshold Partially Homomorphic Encryption (TPHE) and Multiparty Computation (MPC). Another framework, PFMLP (Fang and Qian 2021), transfers the encrypted gradients of local models instead of raw gradients. They showed that the accuracy after homomorphic operations and decryption changed little compared to training on plaintext data.

Rather than adapting technology to reduce re-identification, Wei et al. (2020b) presented a framework for evaluating and comparing different forms of client privacy leakage attacks and methods to counter adversaries. The framework first provides experimental evidence of data reconstruction from model parameters. They then investigated how different hyperparameter configurations, compression ratios, and attack-algorithm settings influence attack effectiveness and cost.

Despite various attempts to address re-identification challenges, recent research (Naseri et al. 2022) has demonstrated that privacy attacks can still succeed even with privacy-preserving mechanisms.

In response, blockchain techniques have emerged as a potential solution, offering benefits in terms of auditability and consented data sharing. However, the computational complexity and limited user-friendliness of blockchain approaches remain significant drawbacks. Nonetheless, adopting a single blockchain-based approach can simplify the environment by eliminating the need for multiple separate techniques.

In Table 4, a summary of privacy management techniques is provided, focusing on auditability, consented data sharing, data granularity, and re-identification. The table highlights the technologies utilized in state-of-the-art approaches for addressing these privacy concerns.

Table 4 Summary of privacy management approaches

4.3 Solutions for incentive management challenges

The literature used different techniques for incentive management in FL, such as Shapley value (Yu et al. 2020a; Song et al. 2019; Wang et al. 2019a; Lim et al. 2021), contract theory (Kang et al. 2019a, b; Saputra et al. 2020), auction theory (Zhang et al. 2021e; Le et al. 2020; Zeng et al. 2020), game theory (Tu et al. 2022; Sarikaya and Ercetin 2019; Ng et al. 2021), blockchain (Weng et al. 2019; Zhang et al. 2021f), and reinforcement learning (Zhan et al. 2020; Jiao et al. 2020). In this survey, we do not delve into these techniques, as our primary focus in incentive management is the client-side challenge of “transparency”. Blockchain technologies and visual analytics tools are leveraged in conjunction with the aforementioned technologies to achieve transparency in incentive calculation.

4.3.1 Blockchain based solutions

Blockchain, a decentralized peer-to-peer digital ledger, offers robustness. Integrating blockchain into FL requires platform-based modifications to enable transparency. Additionally, blockchain addresses server-side challenges such as identifying malicious clients, task publication, client selection, incentive calculation or allocation, and regulatory compliance, making it an attractive solution for FL.

FLchain (Bao et al. 2019), DeepChain (Weng et al. 2019), and FIFL (Gao et al. 2021) are reputation-based incentive approaches that adopt blockchain to prevent malicious transactions by storing and monitoring all transactions. The probability of a blockchain node receiving rewards is determined based on the client’s previous rewards, which are kept confidential, transparent, and auditable on the blockchain.

The studies (Zhang et al. 2021e; Toyoda and Zhang 2019) are incentive approaches based on auction theory, where auxiliary functions such as task request, client selection, incentive allocation, and logging are made transparent by blockchain. Based on the data in the blockchain, rewards are transparently distributed among clients. FedCoin (Liu et al. 2020a) immutably records incentive allocations in the blockchain based on a proof-of-Shapley protocol. FedCoin does not rely on a central server to distribute payments between clients and offers non-repudiation and tamper-resistance.

Refiner (Zhang et al. 2021f) handles malicious participants and incentives by auditing records on the blockchain using trusted validators. Participants randomly select validators to test local model updates with the validation data set. Incentives are distributed based on model quality assessed by validators.

An incentive mechanism based on Bayesian game theory, the Fedserving framework (Weng et al. 2021) adopted blockchain to regulate transparent transactions between participants. They incorporated a “truth-finding” algorithm to learn accurate predictions and made them transparent using the blockchain.

While blockchain technology has successfully addressed the transparency and confidentiality issues in incentive mechanisms, it does come with certain drawbacks. The implementation of blockchain can be resource-intensive and costly. Moreover, a portion of the FL profits needs to be shared with blockchain miners who validate transactions on the network.

4.3.2 Visual analytics-based solutions

In addition to blockchain-based solutions, visual analytics tools play a crucial role in providing users with insights and transparency in the FL environment. These tools use graphical representations such as graphs, charts, and maps to visually present data and enable users to identify patterns and processes. For example, a study by Ng et al. (2021) implemented a visual analytics tool inspired by multiplayer games to enhance transparency in client incentive schemes. This tool offers clients an overview of the FL system, federation information, client statistics, data quality and quantity, market share changes, information about other clients, profit/loss details, and summary information. By leveraging these visualizations, clients can effectively assess their incentive scheme and make informed decisions.

Table 5 summarises the surveyed research in terms of the technology used and the factors driving each incentive scheme.

Table 5 Summary of incentive schemes in terms of technology and incentive factor

4.4 Solutions for resource management challenges

This section describes solutions to efficiently manage client resources such as data, computation, and communication costs.

4.4.1 Data management

Only a few studies focused on data management issues such as data imbalance and data representation. The literature used augmentation and reinforcement learning techniques for data imbalance and scarcity challenges. The majority of these approaches involve server- or platform-based solutions, wherein the server simulates the clients’ environments by requesting certain data from clients and generating solutions to be implemented on the client side. This reduces the burden on users but trades off privacy.

4.4.1.1 Solutions for data imbalance

Data augmentation is an approach to address the data imbalance challenge by expanding the dataset through techniques such as generating slightly modified copies of existing data or synthesizing new data (Van Dyk and Meng 2001).

A server-side approach to address the data imbalance challenge was proposed in a study by Jeong et al. (2018), where the server simulated a client environment using a few data samples from clients and augmented the data to build a generative model. Shin et al. (2020) introduced a privacy-preserving XOR-based mixup data augmentation technique that involved adding encoded data from other clients to balance the training data. Duan et al. (2020) employed Z-score-based data augmentation and used Kullback Leibler divergence-based rescheduling to handle data imbalance. Wu et al. (2020a) utilized the synthetic minority oversampling technique (SMOTE) among trustworthy clients in a smart home setting, requiring platform-based changes. Apart from that, using reinforcement learning, Zhang et al. (2021b) addressed data imbalance without sending data to servers by optimizing client selection and global update frequency.
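
As a rough illustration of the oversampling idea behind SMOTE (a simplified sketch rather than the exact algorithm used in the cited work, which interpolates toward k-nearest neighbours), synthetic minority-class samples can be generated by interpolating between existing ones:

```python
import numpy as np

def smote_like(minority: np.ndarray, n_new: int,
               rng=np.random.default_rng(0)) -> np.ndarray:
    """Synthesize new samples by interpolating between random pairs of
    minority-class samples. Full SMOTE interpolates toward k-nearest
    neighbours; random pairs keep this sketch short."""
    idx_a = rng.integers(0, len(minority), n_new)
    idx_b = rng.integers(0, len(minority), n_new)
    t = rng.random((n_new, 1))  # interpolation factor per new sample
    return minority[idx_a] + t * (minority[idx_b] - minority[idx_a])

minority_samples = np.random.randn(20, 5)       # 20 samples, 5 features
synthetic = smote_like(minority_samples, n_new=30)
```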

4.4.1.2 Solutions for data representation

Blockchain technologies have been widely utilized to address data storage management challenges. Martinez et al. (2019) leveraged blockchain to securely store client data, enabling secure uploading, recording, and tracking. In a similar approach, Moon et al. (2020) proposed an AI-based data management system that effectively manages data between servers and clients. This system stores important client data characteristics, such as size and distribution, to assist clients in organizing their data efficiently.

4.4.2 Computation cost management

Researchers have devoted significant attention to managing computational and communication costs, as these directly impact users’ participation, especially considering the resource-constrained nature of clients’ devices. Various techniques have been explored to address computation-related challenges, including computation reuse, edge/fog computing, algorithm optimization, blockchain, reinforcement learning, and clustering.

Nour et al. (2021) proposed a computation reuse approach to store model parameters of previously executed tasks with a high probability of repetition, aiming to eliminate redundant computations. Edge-assisted FL techniques, as explored in studies by Ren et al. (2019), Ji et al. (2021), and Al-Abiad et al. (2021), leverage edge computing to reduce the computational burden on clients. These approaches allow clients to compute model parameters locally or offload computations to edge devices based on resource availability. Wang et al. (2021c) introduced the use of high-altitude balloons (HABs) as flying wireless base stations to offload clients’ computational burden. These HABs dynamically adjust user association, service sequence, and task partition schemes to cater to clients’ needs over time.

The studies (Xu et al. 2019b; Chen et al. 2020a; Prathiba et al. 2021; Do et al. 2021) optimized the FL algorithm on client devices or the global model in each training cycle to achieve better computational allocation. Most of these optimization techniques are orchestrated by the server, although they must be executed on the client devices. The intelligent UDEC (I-UDEC) framework (Yu et al. 2020c) combines reinforcement learning and blockchain to obtain real-time computation offloading decisions and resource allocation strategies with low overhead.

A resource management scheme based on clustering was proposed by Balakrishnan et al. (2021). The authors clustered clients according to their data and learned a federated meta-model from a subset of clients within each cluster. This approach allowed for efficient model building by organizing the process based on client clusters, resulting in personalized results while reducing communication and computation time.

4.4.3 Communication cost management

Researchers have explored several approaches to effectively manage communication costs in FL. These include compression techniques, reducing communication rounds, minimizing communication distance, and optimizing client selection.

4.4.3.1 Data compression

Model compression schemes such as sparsification and quantization are widely used in the literature to reduce the size of local and global models during transfer (Sattler et al. 2019a). Sparsification methods limit the changes to only a small subset of the model parameters to reduce the entropy of the updates. Several studies adopted sparsification techniques, such as transferring only model gradients greater than a predefined threshold (Strom 2015), updating only significant gradients (Li et al. 2020d), uploading the model after gradient sparsification (Li et al. 2020c, 2021c; Asad et al. 2020), optimally compressing the parameter matrices of the model’s convolutional layers (Zhou et al. 2020), downstream and upstream model compression with encoding of the weight updates (Sattler et al. 2019a), compressing model gradients into a Count Sketch data structure (Rothchild et al. 2020), gradient perturbation (Hu et al. 2020a), using a sparse binary mask technique (Li et al. 2021a), and dynamically adjusting the sparsity budgets of the gradient compression variables (Nori et al. 2021). However, sparsification-based compression methods may not be suitable for many clients.
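
The top-k flavour of sparsification can be sketched as follows; this is an illustrative example in the spirit of the threshold- and significance-based methods above, with the tensor size and k chosen arbitrarily:

```python
import numpy as np

def top_k_sparsify(grad: np.ndarray, k: int) -> np.ndarray:
    """Keep only the k largest-magnitude entries; zero the rest.

    Only the surviving (index, value) pairs need to be transmitted,
    which shrinks the upload when k is much smaller than the model."""
    sparse = np.zeros_like(grad)
    top_idx = np.argsort(np.abs(grad))[-k:]
    sparse[top_idx] = grad[top_idx]
    return sparse

gradient = np.random.randn(1000)
compressed = top_k_sparsify(gradient, k=50)  # ~95% of entries dropped
```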

Quantization methods reduce the entropy of the model updates by restricting all updates to a reduced set of values (Sattler et al. 2019a). The quantization compression scheme is adopted in many studies, for example by quantizing each gradient update to its binary sign (Bernstein et al. 2018), stochastically quantizing the gradients during upload in an unbiased way (Wen et al. 2017; Chang and Tandon 2020), using a vector quantization technique per iteration (Dai et al. 2019; Shlezinger et al. 2020), applying encoding-based compression (Chai et al. 2020b; Malekijoo et al. 2021), and using lossy compression (Amiri et al. 2020). Although these quantization approaches are theoretically sound and exhibit convergence properties, their empirical performance falls short of sparsification methods (Sattler et al. 2019a).
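
A minimal sketch of sign-based quantization in the spirit of Bernstein et al. (2018) is shown below; the mean-magnitude scale factor used for de-quantization is our own simplification:

```python
import numpy as np

def sign_quantize(grad: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize each entry to its sign, keeping one scalar scale factor
    (the mean magnitude) so the receiver can de-quantize.

    One bit per entry replaces a 32-bit float per entry."""
    scale = float(np.abs(grad).mean())
    return np.sign(grad), scale

def dequantize(signs: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate gradient from signs and the scale factor."""
    return signs * scale

signs, scale = sign_quantize(np.random.randn(1000))
approx_gradient = dequantize(signs, scale)
```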

Konečnỳ et al. (2016) introduced a combination of sparsification and probabilistic quantization to effectively reduce communication delays. Their method significantly decreased uplink and downlink communication time, making it suitable for large-scale deployments involving numerous clients.

One drawback of data compression methods is the unavoidable loss of information during the compression process.

4.4.3.2 Reducing the communication rounds

Another approach to enhance communication efficiency is reducing communication rounds. This can be achieved by allowing clients to perform multiple local epochs before sending their results to the server: instead of updating the server for every small model change, clients aggregate their updates and communicate less frequently, which reduces communication overhead and improves the overall efficiency of FL. The Federated Averaging (FedAvg) algorithm (McMahan et al. 2017) is commonly used to reduce the communication rounds of FL through periodic connections. FedMMD (Yao et al. 2018) adopted a two-stream model with maximum mean discrepancy to integrate more knowledge from the local and global models. However, this method increased the computational cost for the clients in order to reduce communication rounds.
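
The communication pattern of FedAvg can be sketched as follows; this is a minimal illustration on a linear least-squares task, with the model, data shapes, and hyperparameters chosen arbitrarily rather than taken from the original paper:

```python
import numpy as np

def fedavg_round(global_w, clients, local_epochs=5, lr=0.1):
    """One FedAvg round: each client runs several local epochs before
    communicating, so the server hears from each client once per round
    rather than once per gradient step."""
    updates, sizes = [], []
    for X, y in clients:                       # each client holds (X, y)
        w = global_w.copy()
        for _ in range(local_epochs):
            grad = X.T @ (X @ w - y) / len(y)  # least-squares gradient
            w -= lr * grad
        updates.append(w)
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)     # weight by dataset size
    return sum(wgt * w for wgt, w in zip(weights, updates))

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(40, 3)), rng.normal(size=40)) for _ in range(5)]
w = np.zeros(3)
for _ in range(10):                            # 10 communication rounds
    w = fedavg_round(w, clients)
```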

The CMFL approach proposed by Wang et al. (2019b) effectively reduces communication overhead by controlling irrelevant client updates: clients evaluate whether their updates align with the server’s feedback and contribute to model improvement before uploading them to the communication network. In contrast, Guha et al. (2019) introduced one-shot federated learning, which aims to learn a global model in a single round of communication across a set of clients, leveraging ensemble learning and knowledge aggregation to capture global information from client-specific models. Another strategy, presented by Avdiukhin and Kasiviswanathan (2021), adapts local Stochastic Gradient Descent (SGD): clients evolve their models locally and asynchronously, and the global server averages the resulting model sequences after multiple iterations.

However, these methods have limitations such as increased local computation cost, potential bias in the global model due to sampling, and the absence of consideration for data heterogeneity among different clients.

4.4.3.3 Reducing the communication distance

Edge computing is a notable approach mentioned in the literature to reduce communication distance. It involves deploying edge servers in close proximity to clients, facilitating communication within the edge computing infrastructure. Some studies, such as Wang et al. (2019c), Lu et al. (2020), and Liu et al. (2021a), have incorporated edge computing into their FL systems. Partial aggregation or partial training may also be performed on edge servers to further optimize communication.

Another technique to reduce communication distance is peer-to-peer learning, where clients can leverage the knowledge and expertise of other clients in the network. BrainTorrent (Roy et al. 2019) and Online Push-Sum (He et al. 2019) are examples of central server-free algorithms enabling clients to communicate exclusively with trusted neighboring clients. In the LotteryFL framework (Li et al. 2020a), a subnetwork is formed based on the lottery ticket hypothesis, allowing clients to learn personalized models instead of a single global model. Similarly, RingFed (Yang et al. 2021a) employs a ring topology instead of a star topology, enabling clients to communicate with each other before transmitting the final model to the server.

4.4.3.4 Client selection

Client selection serves as another mechanism to minimize the amount of data transmitted between clients and the server, reducing communication costs. These algorithms restrict participation to a fraction of clients in each round. Client selection algorithms are mainly implemented on the server, so clients’ influence on them is limited.

The studies (Nguyen et al. 2020; AbdulRahman et al. 2020; Liu et al. 2021b; Nishio and Yonetani 2019) chose only a subset of the clients in each round to decrease the number of uploading clients, sampling based on device capabilities regardless of data heterogeneity. Cho et al. (2020) reduced client selection bias and addressed data heterogeneity by selecting the highest-loss clients. Instead of static sampling, Ji et al. (2020) and Zhuang et al. (2020) adopted dynamic sampling to choose the fraction of available client models and model parameters. Ribero and Vikalo (2020) used an optimal sampling strategy to select a subset of clients with significant weight updates. FedPAQ (Reisizadeh et al. 2020) uses periodic global updates, partial participation of devices, and compression techniques for efficient communication.
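
A minimal sketch of loss-based selection in the spirit of Cho et al. (2020) is given below; the client count and participation fraction are assumptions, and real systems add availability and capability constraints:

```python
import numpy as np

def select_clients(losses: np.ndarray, fraction: float) -> np.ndarray:
    """Select the highest-loss fraction of clients for this round, so
    under-performing clients get priority instead of uniform sampling."""
    n_pick = max(1, int(fraction * len(losses)))
    return np.argsort(losses)[-n_pick:]

client_losses = np.random.rand(100)   # one current local loss per client
participants = select_clients(client_losses, fraction=0.1)
```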

Clients with limited resources and data often face bias in most of these approaches. However, recent studies have aimed to address the bias against clients with fewer resources and data by investigating the differentiation in local models and considering the availability of client resources.

Besides these main approaches, there are some other approaches such as ensemble (Hamer et al. 2020), model minimization (Bouacida et al. 2020; Kang and Ahn 2021), pruning (Jiang et al. 2019b), overlapping training and communication (Zhou et al. 2021), feature fusion (Yao et al. 2019b, a), and knowledge distillation techniques (Wu et al. 2021a) focused on communication efficiency (Table 6).

Table 6 Summary of resource management approaches

4.5 Solutions for data and device security challenges

This section examines the existing defense techniques for FL, focusing on three main categories: defense mechanisms against malicious clients or external attackers, defense mechanisms against malicious servers, and approaches for verifying participants and models. These techniques aim to protect clients from adversarial attacks and mitigate potential risks in the FL environment.

4.5.1 Defence mechanism against malicious clients or external attackers

Data/model poisoning is a common attack in FL where malicious attackers (either clients themselves or agents who take control of clients) incorporate malicious data in the training phase or manipulate the global model using fake data. This attack greatly affects clients, as the global model’s predictions become incorrect. The literature proposed defence mechanisms such as similarity-based approaches (Cao et al. 2019), a generative adversarial network approach (Zhao et al. 2019), validation test set-based approaches (Wang et al. 2020b; Vy et al. 2021), notions-of-stealth approaches (Bhagoji et al. 2019), a model-agnostic defence technique (Manna et al. 2021), and anomaly detection techniques (Shen et al. 2016; Wan et al. 2021; Li et al. 2021g).
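
As a crude stand-in for the similarity- and anomaly-based defences cited above (not the exact method of any cited work), the sketch below flags client updates that are poorly aligned with the consensus update; the threshold is an arbitrary assumption:

```python
import numpy as np

def filter_updates(updates: list[np.ndarray], threshold: float = 0.0) -> list[np.ndarray]:
    """Drop client updates whose cosine similarity to the mean update
    falls below a threshold, on the premise that poisoned updates tend
    to point away from the honest consensus direction."""
    mean = np.mean(updates, axis=0)
    kept = []
    for u in updates:
        cos = u @ mean / (np.linalg.norm(u) * np.linalg.norm(mean) + 1e-12)
        if cos >= threshold:
            kept.append(u)   # keep updates roughly aligned with consensus
    return kept
```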

A backdoor attack is a method of injecting a malicious task into an existing model without compromising the accuracy of the original task. This attack aims to introduce a hidden trigger or pattern that can be exploited by an adversary to manipulate the model’s behavior. Backdoor attacks were defended using fine-tuning (Liu et al. 2018), model pruning (Jiang et al. 2022), clients’ contribution similarity (Fung et al. 2018), reverse engineering (Zhao et al. 2021), an additive feature importance strategy (Manna et al. 2021), and testing the accuracy of clients’ data on a high-quality test set belonging to the central server (Su et al. 2022).

In addition to the defense-oriented methods discussed earlier, techniques with different primary objectives also improve the overall defence of FL: knowledge distillation (Li and Wang 2019) shares only knowledge instead of model parameters, pruning (Jiang et al. 2019b) reduces the model size without affecting accuracy, and multi-task learning (Smith et al. 2017; Sattler et al. 2020; Li et al. 2019a; Yu et al. 2020e) personalizes models to reduce the impact of a compromised global model.

4.5.2 Defence mechanism against malicious servers

In the FL environment, it is important to consider the possibility of malicious behavior from the central server. If the server is compromised, the impact of the attack can be severe, as the server has access to clients’ sensitive data through model updates. However, the literature on attacks originating from malicious servers is limited, with only a few studies specifically addressing this issue.

The studies (Mo and Haddadi 2019; Chen et al. 2020b) used the Trusted Execution Environment (TEE) approach to defend against a malicious server. A TEE allocates private memory regions for computation with hardware and software isolation. The server’s memory usage patterns are continuously monitored to defend against malicious server attacks, and each participant is compelled to execute secure and privacy-preserving algorithms in this environment. However, hardware changes are needed to adopt these approaches on the clients’ end.

Security consortiums among trusted clients and peer-to-peer learning techniques can also protect against a malicious server, as clients do not need to communicate with the central server. The studies (Roy et al. 2019; He et al. 2019; Yang et al. 2021a) built peer-to-peer networks containing only trusted clients, where clients only need to know their neighbours rather than the global network. Because these approaches bypass the central server or minimize communication with it, the impact of malicious server attacks is lower than in the traditional FL network. However, clients are responsible for identifying other trusted clients within a potentially huge network.

During the FL process, inference attacks pose a threat by attempting to extract sensitive information from clients. These attacks can be initiated by either the clients themselves or a potentially malicious centralized server involved in the FL system (Hu et al. 2021). Inference attacks are defended using techniques such as DP (Liu et al. 2020), knowledge distillation (Li and Wang 2019), secret sharing or the secure boost protocol (Wang et al. 2020c), a Generative Adversarial Network (GAN)-based algorithm (Zhang and Luo 2020), and fake data generation at the client node (Triastcyn and Faltings 2020).

4.5.3 Verification of participants and models approaches

The verification process validates whether the model, clients, and server are trustworthy or attack-free. Wainakh et al. (2020) adopted hierarchical FL to verify participants and models. Unlike standard FL, hierarchical FL is not controlled by a single central server; it connects multiple servers in a tree structure, enabling granular monitoring of clients.

Blockchain technology is used in many studies (Fang et al. 2022; Liu et al. 2020; Rahman et al. 2020; Yi Ming et al. 2021; Jiang et al. 2021) to verify models and participants. Blockchain handles verification and stores clients’ proofs on chain. Fang et al. (2022) secured the confidentiality of model gradients using a secure aggregation protocol and verified the global model gradients using blockchain to avoid possible tampering attacks.

Table 7 provides a comprehensive summary of research works, categorizing them based on the malicious actor, attack types, solution techniques, and solution end. Various approaches have been developed to detect attacks, including checking accuracy, model similarity, client contribution, and client similarity. However, it is important to note that these approaches may impact resource-constrained clients, as they could be mistakenly identified as malicious. In recent studies, blockchain-based techniques have incorporated additional factors such as client model updates, user traces, and model participation to enhance the detection of malicious activities.

Table 7 Summary of data/device security management approaches

4.6 Solutions for fairness management challenges

This section is divided into two parts to examine state-of-the-art solutions: fairness measurement and the application of fairness in various disciplines of FL.

4.6.1 Approaches to measuring fairness in FL

With regard to measuring fairness, researchers have formulated metrics such as average variance, distance metrics (e.g., cosine distance, Euclidean distance, maximum difference), the Pearson correlation coefficient, and Jain’s fairness index (Shi et al. 2021). However, the lack of standardization among these metrics poses a challenge in selecting a single one; factors such as the metric’s definition, trade-offs, and compatibility with other metrics need to be considered. Additionally, the chosen metric should be easily understood by non-technical users and should encompass various aspects of fairness.

In the pursuit of standardization, Garg et al. (2020) introduced a mathematical framework that outlines the commonly used fairness metrics and their interrelationships; this relational representation helps users find the most suitable metric. Chu et al. (2021) proposed a new FL framework called FedFair to train models with high performance and fairness without violating client privacy. They proposed a method to estimate model fairness in a privacy-constrained environment that is more efficient than estimating fairness locally, and the framework incorporates the fairness estimation function into the loss function as a constraint.

Rather than assessing the accuracy of the global model across all clients, Divi et al. (2021b) focused on evaluating the effectiveness of individualized models for each client. They examined whether the accuracy of personalized models improved for each user and observed a fair perception overall. To evaluate the quality of the personalized models, they introduced five performance metrics and four fairness metrics, which assessed whether the personalized models provided equal improvements for all users.

These approaches encompass platform-based or client-based algorithmic solutions that enable users to visualize and assess their fair treatment within the system using a range of metrics.

4.6.2 Practice of fairness in different disciplines

The concept of fairness is practiced in various FL disciplines, including contribution evaluation, client selection, model optimization, incentive mechanism, and social good. This section discusses the existing works practicing fairness in these disciplines, techniques, and adapted notions of fairness.

4.6.2.1 Client selection

Unfair treatment can start during the client selection process. However, many existing client selection approaches prioritize server interests, such as accuracy improvement and convergence rate, while disregarding the interests of individual clients. These approaches often prioritize clients based on factors like bandwidth, data quality, transmission speed, and computing power. Consequently, client selection can be unfair due to over-representation, under-representation, and the exclusion of certain clients (Shi et al. 2021).

Recent studies focus on reducing bias against under-represented clients (those with lower computational capabilities and smaller datasets). Huang et al. (2020a) modeled the client selection strategy as a Lyapunov optimization problem, where client participation rates were optimized through a dynamic queuing approach. The algorithm ensures that each client’s average participation rate equals the expected guarantee rate. Similarly, Yang et al. (2021c) proposed a multi-arm bandit-based algorithm to encourage the selection of under-represented clients. The choice depends on the class distribution of the data: clients with minimal class imbalance receive the highest rewards, while clients with a maximal class imbalance are still allowed to participate in at least a specified number of rounds. Another approach (Kang et al. 2020) used reputation measurements in terms of honesty and contribution to choose clients; highly reputed clients get more opportunities to be selected than clients with low reputations.

4.6.2.2 Contribution evaluation

Contribution evaluation assesses the individual contributions of clients within the FL system without requiring access to their data. Various methods have been proposed to evaluate client contributions, including self-reported information, individual assessment, utility game, Shapley value, and empirical approaches (Shi et al. 2021).

The studies (Kang et al. 2019b; Sarikaya and Ercetin 2019; Zhang et al. 2020b; Le et al. 2021) use clients’ self-reported information to evaluate client contributions. Self-reported information can include data quality and quantity, data collection costs, and the computational and communication capabilities clients can commit to FL. The server uses this information to assign ratings to the clients. This approach assumes that clients are trustworthy and capable of assessing their data environment. Both clients and the server must be involved to achieve this approach’s fairness goal.

Individual reputation evaluation is based on the performance of clients on specific tasks. Reputation mechanisms are designed to track clients’ reliability and contribution. Client reputation is calculated based on client validation accuracy (Lyu et al. 2020a), the similarity between the local and global models (Xu and Lyu 2020), loss function values (Song et al. 2021), and the direct or indirect reputation of clients from their history with task publishers (Kang et al. 2019a; Zhang et al. 2021e; Kang et al. 2020). A task publisher is responsible for assigning tasks to clients.

Along with reputation, Zeng et al. (2020) incorporated resource quality information to evaluate the individual contributions of clients. Another approach, proposed by Lyu et al. (2020c), involves a mutual evaluation process among FL clients to assess their potential value. These approaches require the involvement of clients, server, and platform to achieve fairness in the system.

Utility games are employed to translate clients’ utility into rewards, offering another avenue for fairness adoption. Wang et al. (2019a) and Nishio et al. (2020) adopted the marginal loss approach to evaluate clients’ contributions. The concept of marginal loss suggests that a client’s gain is equivalent to the utility lost when the client departs from the system (Shi et al. 2021). Primarily, the server plays a significant role in this approach.

The Shapley value evaluates contribution by calculating the weighted average of each client’s marginal contribution over possible client coalitions. The studies (Song et al. 2019; Wang et al. 2020a) evaluated clients’ impact by calculating Shapley values over the entire training session. Wang et al. (2019a) used the Shapley value to calculate feature importance in VFL instead of considering all client data; if a client has important features that greatly influence the model, the client receives high Shapley values.
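
The exact computation can be sketched as follows; this is a self-contained toy example with a hypothetical additive utility, and real systems approximate it (as discussed next) because the cost grows exponentially with the number of clients:

```python
from itertools import combinations
from math import factorial

def shapley_values(clients, utility):
    """Exact Shapley values: each client's weighted average marginal
    contribution over all coalitions of the remaining clients."""
    n = len(clients)
    values = {c: 0.0 for c in clients}
    for c in clients:
        others = [x for x in clients if x != c]
        for size in range(n):
            for coalition in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                marginal = utility(set(coalition) | {c}) - utility(set(coalition))
                values[c] += weight * marginal
    return values

# Toy utility: model quality grows with the coalition's (hypothetical) data size.
data_size = {"A": 100, "B": 50, "C": 10}
acc = lambda coalition: sum(data_size[c] for c in coalition) / 160
print(shapley_values(list(data_size), acc))  # A earns the largest share
```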

To reduce the computational cost of the aforementioned theoretical methods, Shyn et al. (2021) proposed FedCCEA, which approximates the client contribution using the sample data size weights in the model. The server is mainly responsible for this approach by getting sample sizes from clients.

4.6.2.3 Model optimization

The process of model optimization can also introduce biases in global models, favoring certain groups or relying heavily on a small subset of clients. As a result, the performance of the global model may excel for some clients while neglecting others. Fairness in model optimization aims to achieve an even distribution of accuracy across all clients.

The agnostic FL framework (Mohri et al. 2019) and the FedFa optimization algorithm (Huang et al. 2020b) were built to avoid bias towards clients while optimizing the FL model. The approach of Mohri et al. (2019) naturally yields fairness for any target distribution formed as a mixture of clients, with a data-dependent Rademacher complexity guarantee. FedFa combines a double momentum gradient method with a weighting strategy whose weights are calculated based on information quantity and training frequency.

Another approach, Ditto (Li et al. 2021f), builds personalized models for each client by allowing clients to fine-tune, with a regularization term that keeps the fine-tuned model close to the optimal global model. It reduces the variation in accuracy between clients by approximately 10% while simultaneously improving fairness and robustness. These works are based on the accuracy-parity fairness notion.

Fed-ZDAC (Hao et al. 2021) applied a zero-shot data augmentation technique to under-represented client data to achieve uniform accuracy across clients. The augmentation algorithm generates pseudo-exemplars of unseen classes to avoid under-representation of clients. Hao et al. (2021) considered the good-intent fairness notion to minimize the loss of underlying protected client classes. Michieli and Ozay (2021) proposed a fair aggregation algorithm, FairAvg, showing that the fair algorithm is beneficial in terms of accuracy and convergence rate.

Xu and Lyu (2020) proposed the RFFL framework based on contribution fairness. They maintained a client reputation scheme based on clients’ contributions via local model updates, and the global model is weighted according to each client’s reputation. Another approach is CFFL (Lyu et al. 2020a), where each client receives a different global model corresponding to its reputation. Alvi et al. (2021) also regulated global model quality according to clients’ contributions and costs: the server adds noise to the global model based on the quality of the local model, regulating utility fairness via adaptive calculations and transmission policies.

q-FFL (Li et al. 2019b) realized accuracy parity using fair resource distribution. They assigned more weight in aggregation to clients with higher losses. The degree of fairness can be adjusted by tuning q. It is a multi-objective algorithm to optimize the loss function of each client individually without sacrificing performance. Their approach to attain fairness in the optimization function, as defined in Eq. (5), involves reweighting the objective function of the traditional FL function (refer Eq. (1)). In this approach, they assign higher weights to devices with poor performance, thereby shifting the distribution of accuracies in the network towards greater uniformity. For given local non-negative cost functions \(f_k\) and parameter \(q > 0\), they define the objective function as in Eq. (5).

$$\begin{aligned} min_{w}\mathcal {F}_{q}(w) = \sum _{k=1}^{m}\frac{p_{k}}{q+1}f^{q+1}_{k}(w) \end{aligned}$$
(5)

The term \(f^{q+1}_{k}\) represents \(f_k\) raised to the power of \(q+1\), and \(m\) represents the number of clients in the FL process. The parameter q controls the degree of fairness we aim to achieve. When \(q=0\), the objective reduces to the classical FL objective (refer Eq. (1)). A higher q places greater emphasis on devices with higher local empirical losses \(f_k(w)\), leading to a more uniform training accuracy distribution and the potential induction of fairness.
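
To illustrate the emphasis that q places on high-loss clients, the following sketch computes q-FFL-style aggregation weights; note that the actual q-FedAvg update uses a different step-size normalization, so this only visualizes the reweighting effect, with the loss values chosen arbitrarily:

```python
import numpy as np

def qffl_weights(losses: np.ndarray, p: np.ndarray, q: float) -> np.ndarray:
    """Illustrative q-FFL-style reweighting: each client's aggregation
    weight is proportional to p_k * f_k(w)^q, so higher-loss clients
    gain influence as q grows."""
    raw = p * np.power(losses, q)
    return raw / raw.sum()

losses = np.array([0.2, 0.5, 1.0])    # hypothetical per-client losses
p = np.ones(3) / 3
print(qffl_weights(losses, p, q=0.0))  # uniform emphasis (classical FL)
print(qffl_weights(losses, p, q=5.0))  # strongly favours the high-loss client
```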

FedFv (Wang et al. 2021d) was proposed to resolve conflicts between local models before averaging them when constructing a global model. The algorithm can handle two types of conflicts: internal conflicts (between selected clients) and external conflicts (between selected and unselected clients).

However, it is important to note that these algorithms assume identical data distributions in all scenarios. In reality, data distributions are dynamic; therefore, it is crucial to consider the applicability of these algorithms in dynamic situations.

4.6.2.4 Incentive mechanism

FLI (Yu et al. 2020a) dynamically and fairly allocates incentives to clients in a context-aware way. A given budget is equitably divided among clients to maximize utility and minimize inequality among clients. The algorithm satisfies contribution fairness, regret distribution fairness, and expectation fairness.

Several studies have focused on rewarding clients based on their contribution rate (Zhang et al. 2021e; Zeng et al. 2020; Cong et al. 2020). Likewise, previous studies (Kang et al. 2019b; Fan et al. 2021; Ye et al. 2020) distributed incentives based on data quality using Shapley value and contract theory methods.

In addition to monetary incentive schemes, the hierarchical fair FL framework proposed by Zhang et al. (2020b) focuses on rewarding clients based on their contribution rate. This approach classifies clients into different levels based on the quality or quantity of data and distributes models at the client level. Similarly, Lyu et al. (2020c) divided clients into clusters and trained one model for each cluster.

The achievement of fairness in incentive mechanisms requires collaboration among clients, servers, and platforms.

4.6.2.5 Social good

The social good discipline ensures that the model is not biased toward a specific individual or group; it relates to the concept of group fairness. Ezzeldin et al. (2021) proposed a fairness-aware aggregation algorithm using debiasing strategies to provide a fair model across sensitive groups (such as race and gender). Likewise, Yue et al. (2021) achieved group and individual fairness by using a regularisation term to give more weight to low-performing individual clients or groups.

Rodríguez-Gálvez et al. (2021) introduced a modified method of differential multipliers to minimize empirical risks with fairness constraints, thereby enforcing group fairness in private FL. Padala et al. (2021) presented an ethical FL model to achieve demographic parity and equalized odds. Demographic parity indicates that the model’s prediction must be independent of a sensitive attribute. Equalized odds focus on equating false positive and negative rates among different groups or individuals.

Zhang et al. (2020a) focused specifically on discriminatory bias against demographic groups. They addressed the challenges of fairness-performance trade-off, constrained coordination, and information limitation in privacy-sensitive FL settings by adapting a deep multi-agent reinforcement learning framework and a secure aggregation protocol. Another study, Zhang et al. (2021c), solved the unified group fairness problem through an optimization algorithm. They simultaneously investigated attribute level, client level, and agnostic fairness.

These solutions primarily operate at the platform level, with minimal involvement from clients. Table 8 provides an overview of fairness approaches, including fairness measurement, its application in various disciplines, and existing research and solutions.

Table 8 Summary of fairness approaches

5 Discussion

In this section, we examine prior studies by considering the impacts of a given solution in addressing a specific client-side challenge over other challenges. We classify the impacts between these challenges into three groups: those with a positive impact, those with a negative impact, and those that can have either a negative or positive impact on other challenges.

The positive impact category denotes that a solution targeting a specific challenge can also be effective for addressing multiple challenges simultaneously, resulting in time and effort savings. Conversely, solutions falling under the negative impact category may inadvertently exacerbate other challenges while attempting to resolve one. The third category encompasses solutions that can have either a positive or negative impact on other challenges, depending on the specific approach employed. While addressing one challenge may aid in resolving another, it can also inadvertently unveil or intensify other challenges in certain cases.

Table 9 illustrates the interrelationships among various challenges, based on the solutions available in the existing literature, along with the impact of these solutions on model performance. Each row represents the main challenge being addressed in the research, and each column represents a secondary challenge. The cells indicate the impact of a solution for the main challenge on the secondary challenge. As an example, consider the intersection of the “personalization” row and the “privacy” column. The corresponding solution emphasizes personalization and investigates its implications for privacy. However, it is observed that this approach may have a negative effect on privacy management.

Table 9 The relationships between challenges and performance

5.1 Positive impact

Personalization solutions offer the opportunity to establish incentive mechanisms based on the best model, incorporating individual client contributions to incentivize clients. Clustering techniques within personalization can further assist in incentive mechanisms by grouping similar clients, aiding in the allocation of incentives effectively.

Personalization solutions can also contribute to fairness and robustness in model performance. A study by Li et al. (2021f) found that personalization solutions can improve performance in all three of these disciplines. Personalization helps to improve robustness by allowing the global model to be customized based on individual client data. This can help to protect against adversaries attacking the global model, as the impact on individual client performance is mitigated through personalization. Personalization solutions can also improve fairness by reducing the accuracy parity among clients through personalized models that are based on individual client data.

In addition, fairness solutions related to client selection, model optimization, and contribution evaluation can contribute to personalization as they consider the individual client’s contribution to model building. This allows highly contributed clients to achieve high performance while also giving under-represented clients the opportunity to have their contributions recognized. These solutions can also help in resource and incentive management by distributing resources and incentives among clients in a fair manner.

The auditability problem under privacy management is usually solved through blockchain technology. Due to its auditing feature, blockchain technology can help with many other challenges, such as incentives, data, and security management.

Privacy and security mechanisms can work together to provide mutual benefits. Implementing privacy mechanisms can help to reduce information sharing and protect against adversarial attacks, while a secure environment can minimize the risk of privacy violations.

Incentive approaches based on client reputation can have a positive impact on both security and fairness. Client reputation is often determined by the performance and contribution of the client to the environment. These measurements can be used to identify honest (high contribution) and dishonest (low contribution) clients in the network, which can help security mechanisms to be more effective. Additionally, most incentive mechanisms are based on client contribution and data quality, which helps to ensure that fairness is maintained.

5.2 Negative impact

There is a trade-off between privacy and personalization challenges. For example, the clustering approach for personalization may compromise privacy as it requires additional client information, such as data distribution, data size, and client location. Similarly, incentive schemes may also negatively impact privacy by requiring auxiliary information about clients in order to distribute incentives fairly, which can reveal more client data.

Privacy approaches may negatively affect resource management as privacy algorithms (e.g. DP) require additional server computation and transmission power. Security approaches can negatively impact fairness as unique clients can be identified as malicious, leading to unfairness.

Except for privacy and resource management solutions, all other solutions tend to positively impact performance. Privacy approaches may reduce performance by adding noise to model parameters or data before building the model. Performance-oriented algorithms that require large amounts of data and resources may improve performance but contribute to unfairness by eliminating low-performing and resource-constrained clients from the FL process. Resource management algorithms may reduce resource usage, which can in turn degrade performance.

5.3 Negative or positive impact

One solution for the personalization challenge, clustering, positively impacts resource management, while another personalization approach, fine-tuning, negatively affects it. Clients can be grouped by clustering based on location, performance, and resource availability; because of clustering, clients do not need to communicate with the server as frequently, reducing communication costs. Fine-tuning approaches, in contrast, require additional client computing resources for tuning.

Efficient solutions addressing communication management challenges, such as data compression and reducing communication rounds, can contribute to mitigating security and privacy concerns. Data compression techniques can make it more difficult for adversaries to access client data, while reducing the number of communication rounds minimizes the amount of data exchanged between clients and the server, lowering the risk of data being compromised. However, edge-assisted FL technology, used to manage computation and communication costs, can increase the risk of privacy and security breaches because it requires clients to send raw data to the edge server.

Fairness measurement, accuracy parity, and good-intent/group fairness approaches can all contribute to privacy management. Fairness measurement can help with privacy management by making the FL process transparent and visible to clients, addressing the issue of auditability. Accuracy parity and good-intent/group fairness approaches can help to mitigate the risk of re-identification by reducing differences between individuals or groups. However, the self-reported information solution used in contribution evaluation may increase the privacy risk, as it requires clients to report data quality and quantity, data collection costs, and computational and communication capabilities to the server for review. This can expose sensitive information about the client.

Most fairness approaches negatively impact security because they reduce disparity among clients, lowering the chance of detecting malicious clients through anomaly detection; from the security perspective, disparity helps distinguish between malicious and honest clients. The reverse also applies. However, within fairness, an approach known as “client-reputation measurement regarding honesty and contribution” can be employed to identify dishonest clients.

5.4 A solution applicability for many challenges

This section emphasizes the importance of considering the applicability of solutions to multiple challenges and understanding the interrelationship between these challenges when designing solutions. By doing so, it is possible to reduce system complexity and avoid duplicative efforts in the federated environment.

Blockchain technology has been used in the literature to address privacy, data computation, incentive, and security management challenges. This is because blockchain has many features, such as robustness, immutability, transparency, append-only, and auditability, that make it suitable for addressing a wide range of challenges. Researchers can consider collaborating with blockchain technology to address various challenges, as it has the potential to simplify solutions by combining different methods in a single system.

Likewise, certain personalization approaches can also address fairness and security challenges. For instance, Ditto (Li et al. 2021f) is a personalization solution that offers both fairness and security benefits. It is important to analyze other personalization approaches to identify their potential benefits in different aspects.

6 Open challenges and trend of future works

In this section, we explore open challenges and future research trends by examining the reviewed research articles, surveys, and our own insights. As we delve into these discussions, one potential avenue for future investigation involves examining the impact of solutions for specific challenges on other related challenges.

  • Personalization challenges:

    • Impact of personalization methods on other challenges: For example, Ditto (Li et al. 2021f) evaluated the fairness and robustness benefits of the proposed personalization method. Therefore, future research could focus on the impact of other personalization solutions on different challenges.

    • Context-aware personalization: Developing context-aware techniques in FL is a potential open problem. The consideration of sensitive contextual information in FL is an ongoing topic of interest. While FL does not involve data transfer to third-party applications, the question of whether context information can be leveraged to enhance personalization without compromising privacy requires further investigation.

  • Incentive management challenges:

    • Incentive schemes based on other values (except monetary value): While we have discussed various incentives such as model performance, reputation, computational power, auxiliary information, and model fairness, there is limited research on other incentive schemes. Exploring and studying additional incentive approaches in the context of FL would be a valuable direction for future research.

    • Incentive schemes with multiple servers: Almost all literature has focused on a one-to-many relationship where one server serves multiple clients (a monopoly market) (Shi et al. 2021). Clients have no option to choose another server if they are not convinced by the offer. This area needs further study to create a non-monopoly market with multiple server options.

  • Privacy management challenges:

    • Privacy and performance trade-off: Current approaches such as DP sacrifice model performance and additional computation to provide privacy for clients. Although researchers are working to find an optimal operating point between privacy and performance, the privacy-utility trade-off remains open (a minimal illustration appears in the sketch after this list).

    • Dynamic settings with context: Privacy approaches in the literature are static, consistently using the same noise level and settings; however, clients’ privacy preferences may vary with context, making dynamic, context-aware privacy settings worth investigating.

    • Explainable AI: Explainable AI refers to building tools or frameworks to describe ML models in a human-understandable format. Applying explainable AI concepts in FL is still an open problem.

    • Granular privacy management and consented data sharing: Very little literature focuses on granular privacy management to meet the diverse privacy needs of clients.

    • Sharing less sensitive models (Lo et al. 2021b): Since data can be inferred from a model even after DP mechanisms are applied, mechanisms to assess a model’s sensitivity before it is shared would be useful.

    • Regulatory compliance (Lo et al. 2021b): The application of regulatory requirements to FL (e.g., model exposure, model retention) is still underexplored.

  • Resource management challenges: While significant research has been conducted on computational and communication management in FL, relatively little attention has been given to data management. The focus has primarily been on improving performance through computational and communication strategies, leaving data management techniques in FL open for further exploration.

    • Handling unlabelled data (Lo et al. 2021b): Labelled data may not always be available to clients, and labelling is expensive. Potential approaches such as semi-supervised learning (Lo et al. 2021b), which labels a client’s data with the help of other clients’ data, can be expected in the future.

  • Security: Current literature mainly focuses on attacks from malicious clients rather than from malicious servers.

    • Security approaches for malicious servers: In the literature, only a few studies (Mo and Haddadi 2019; Chen et al. 2020b) have focused on solutions for malicious servers. More theoretical and empirical studies are needed to address the malicious-server problem.

    • Consortium among clients: To avoid malicious attacks from clients or servers, clients can cluster and form a consortium among themselves, removing the dependence on a potentially malicious server. The grouping may be based on client reputation.

  • Fairness: The concept of fairness has recently received extensive attention in ML. However, applying these methods in FL is not straightforward due to the decentralized, heterogeneous data distribution; new fairness techniques tailored to FL are therefore needed.

    • Fairness approaches in the FL life cycle: FL involves different stages of data processing: pre-processing (collecting clients’ data, feature selection/modification, data synthesis), in-processing (building local models, aggregating the global model), and post-processing (result prediction). Introducing fairness at each stage can enable fairness-aware FL.

    • User-interactive fairness system: A framework that lets clients set the boundaries of their expected fairness would be valuable; within it, clients could visualize and define their own fairness expectations.
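
Returning to the privacy-utility trade-off noted under privacy management above, the following minimal sketch (ours; the clipping norm and noise multiplier are illustrative, untuned assumptions, not calibrated to a formal \((\epsilon, \delta)\) guarantee) applies the standard clip-and-noise pattern of DP-style aggregation and shows the aggregation error growing with the noise multiplier:

```python
# Clip each client's update to bound its sensitivity, add Gaussian noise,
# then average on the server. Larger sigma => stronger privacy, larger error.
import numpy as np

def privatize_update(update: np.ndarray, clip_norm: float, sigma: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Clip the update to norm clip_norm, then add Gaussian noise."""
    clipped = update * min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    return clipped + rng.normal(0.0, sigma * clip_norm, size=update.shape)

rng = np.random.default_rng(0)
true_update = rng.normal(0.0, 1.0, size=1000)
clip_norm = 1.0
clipped_true = true_update * min(1.0, clip_norm / np.linalg.norm(true_update))

# Simulate 10 clients reporting the same underlying update; the server
# averages the privatized reports and we measure the residual error.
for sigma in (0.0, 0.1, 1.0):
    reports = [privatize_update(true_update, clip_norm, sigma, rng) for _ in range(10)]
    err = np.linalg.norm(np.mean(reports, axis=0) - clipped_true)
    print(f"sigma={sigma}: aggregation error = {err:.3f}")
```

Raising the noise multiplier strengthens the privacy protection but the aggregation error grows with it, which is exactly the open trade-off flagged above.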

7 Conclusion

To the best of our knowledge, this study is the first survey of client-side challenges in FL. We conducted a systematic survey of the literature and categorized client-side challenges into six broad categories: personalization, privacy management, incentive management, resource management, data and device security, and fairness. We also presented the available state-of-the-art solutions for the identified challenges. In addition, we analyzed the relationships between challenges, the trade-offs in addressing them, and the applicability of individual solutions to multiple challenges. Based on this analysis, a promising future research direction is to explore the impact of addressing one challenge on others: by applying a single solution to multiple challenges, it is possible to reduce system complexity and eliminate redundant effort.