1 Introduction

Long gone are the days when people got their news from television channels or newspapers. Consumers continue to shift away from these traditional sources and increasingly turn to social media for their news consumption [45]. Social media has attracted over 2.4 billion internet users by easing access not only to textual content but also to multimedia and external websites. In a perfect world, everything reported would be entirely based on facts, so that one could trust that the consumed media is reliable; however, this is not the case in the real world [10, 14, 21, 57]. Misinformation and rumors are becoming highly prevalent [2, 79], and the ease of access and fast dissemination of information on social media has unfortunately been exploited for spreading fake news [80].

Fig. 1

An example of truth-campaigning. (a) shows the network, where the red nodes (6 and 14) are fake news spreaders; (b) shows fake information propagation, where the pink nodes received fake information; and (c) shows the truth-campaigning process, where the dark blue nodes (11 and 13) are seed nodes spreading counter-true information, and the light blue nodes are saved nodes, i.e., they accepted the counter-true information and would have received fake information in the absence of truth-campaigners

A famous example of the impact of fake news on society is the 2016 US presidential election. Facebook provided testimony that 126 million Americans were exposed to Russian-backed, politically oriented fake news via Facebook during the presidential election campaign of 2016 [10, 46]. In the aftermath of these elections, as the extent of fake news became apparent, three scientists from Ohio State University [61], Gunther, Beck, and Nisbet, explored whether people might have changed their vote due to fake news. They interviewed a group of people who voted for Obama in 2012 and looked at their votes in the 2016 election. The voters were asked how much they believed in three statements that, according to an independent analysis, had been promoted by fake news despite being false. These statements were: (i) Hillary Clinton was in poor health due to serious illness, (ii) Pope Francis endorsed Trump, and (iii) during Hillary Clinton's time as Secretary of State, she approved weapon sales to Islamic jihadists, including ISIS. Although the majority of people did not believe these assertions to be true, there was a very substantial correlation between their beliefs and voting behavior. Among those voters who did not believe any of the three statements, \(89\%\) cast their vote for Hillary Clinton in 2016; among those who believed just one of the items, \(61\%\) voted for her; but among those who believed at least two of these fake news items, only \(17\%\) supported her. These results must be interpreted with care, as the correlation does not imply that fake news caused Hillary Clinton's defeat; nevertheless, they illustrate how strongly belief in fake news was associated with voting behavior and how misinformed voters could have changed their votes. Besides elections, the adverse impact of misinformation propagation has also been observed during COVID-19 [18, 69]. Bermes [6] investigated the link between information overload and the sharing of fake news during the COVID-19 pandemic using the stressor-strain-outcome model. The findings suggested that perceived information overload increases users' psychological strain, leading to a higher likelihood of sharing fake news. Misinformation propagation can be used to manipulate users for political agendas, marketing, riots, rumor spreading, or misinformed behavior.

In the literature, different types of methods exist to mitigate fake news, including immunizing/blocking nodes, truth-campaigning, and fact-checking assistance tools [80]. It has long been pointed out that actively broadcasting the true counter information is much more effective in minimizing the adverse impact of fake news than merely immunizing some nodes, also known as user blocking [81, 98, 99]. Immunizing users by requesting them not to spread fake news when they encounter it may have feasibility problems, as the effectiveness of the method relies on the willingness of the selected users to follow the recommendation. The truth-campaigning approach, on the other hand, offers higher feasibility, as users are more likely to accept and follow the recommendation given the provided knowledge about the real facts. Studies have shown that once users have both fake and true information, they are more likely to explore the event or topic further and believe the true information [49, 63, 81]. In truth-campaigning, given a set of source nodes of the fake news propagation in an online social network, we aim to find a set of nodes, called truth-campaigners, to propagate counter-true information so as to minimize the impact of the fake news [75, 78, 80]. A small example of truth-campaigning is shown in Fig. 1, where nodes 6 and 14 are fake news spreaders, nodes 11 and 13 are truth-campaigners, and the pink and light blue nodes are those that accepted fake and counter-true information, respectively. The figure illustrates how truth-campaigning reduces the impact of fake news on the network.

In a social network, different groups or communities are unequally represented, and typically, minority groups are disproportionately absent from advantageous positions, which creates a diversity gap [38, 47]. The impact of network inequalities on the fairness of social interventions for influence maximization or awareness spread is well known [87, 89]. Various techniques have been proposed to combat fake news on social media; most are collected in these surveys [44, 80, 82]. However, none of the proposed methods considers the structural biases of social networks and fairness in fake news mitigation. The state-of-the-art methods aim to minimize the total number of users who believe the fake information and do not consider minor and underrepresented communities in the network. This can result in the exclusion of these communities from the benefits of the intervention, which can have important societal repercussions. In this work, we highlight this issue in influence blocking and propose a fairness-aware truth-campaigning method to combat fake news.

We aim for the outcome of the truth campaign to be fair to all communities irrespective of their size or any other characteristics, a notion known as group fairness [28, 73]. Fairness is imposed using the maximin fairness constraint, which strives to maximize the saved nodes for the community having the minimum fraction of saved nodes. This notion of fairness relates to the legal notion of disparate impact, which states that a community has been unfairly treated if its “success rate” under a policy is substantially lower than that of the other communities. The proposed solution aims to maximize a fairness-aware objective function while identifying the truth-campaigners to minimize the spread of fake information. The proposed method, called FWRRS (Fairness-aware Weighted Reversible Reachable System), relies on the weighted reversible reachable tree structure [93] to approximate the blocking power of each node in order to select those that achieve the best and fairest outcome. We evaluate the proposed method on real-world and synthetic networks and observe that it is not only fair but also saves the largest number of nodes compared to all considered baselines. We also observe that considering fairness while selecting early seed nodes reduces the price of fairness at later stages and makes the method more effective; therefore, it provides higher efficiency than the baselines as the number of seed nodes increases.

To the best of our knowledge, this is the first work that has considered fairness while minimizing the negative spread of information propagation. The main contributions of the paper are briefly discussed below.

  1.

    We propose a fair truth-campaigning method, called FWRRS, to minimize fake news propagation (explained in Section 5).

  2.

    We evaluate the proposed method and compare it with state-of-the-art baselines on real-world social networks and synthetic networks. The experimental evaluation also highlights the long-term effectiveness of using the proposed fairness-aware method.

  3.

    We also propose two fair baseline methods: (i) Fair-CMIA-O, a truth-campaigning method that extends the CMIA-O heuristic (based on the Campaign-Oblivious Independent Cascade model and the Maximum Influence In-Arborescence and Maximum Influence Out-Arborescence structures) [94] using the maximin fairness constraint (explained in Appendix A), and (ii) Parity Seeding, which extends the fair influence maximization method proposed in [86] for truth-campaigning.

  4.

    We propose a network generation model, called HICH-BA (HIgh Clustering Homophily Barabási-Albert), that can be used to create synthetic networks with multiple communities of a given size distribution and with the desired homophily and clustering coefficient (explained in Appendix C).

The rest of the paper is structured as follows. In Section 2, we discuss related work on fake news and fairness-aware social network analysis. Section 3 explains the information propagation model used in the study. In Sections 4 and 5, we discuss the problem formulation and the proposed method, respectively. In Sections 6 and 7, we explain the experimental setup and the results of the proposed method on real-world and synthetic networks, respectively. The paper is concluded in Section 8 with future directions.

2 Related work

In this section, we discuss related work on fake news detection, influence blocking, and algorithmic fairness in social network analysis.

2.1 Fake news detection

Fake news detection is a challenging problem that aims to automatically identify fake news [34]. To classify a piece of news (also called a meme or microblog) as fake or not, different types of features have been used, such as (i) content-based, (ii) user-based, and (iii) network-based features [79]. Content-based features are the linguistic features of the post, such as the sentiment of the text, stance of the post, subjectivity and readability of the text, usage of function words, external links (URLs), hashtags, and lexical and syntactic features [13]. User-based features focus on the users who posted or reacted to the post by liking, sharing, commenting, or replying. These features include users' profile attributes, such as follower and friend counts, the total number of posts by the user, the user's bio, account age, verification status, and so on [9, 91]. Network-based features consider users' connections in the network and the propagation network of the news on social media [53, 83].
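To make these feature families concrete, the following sketch (purely illustrative; field names such as followers_count are assumptions, not any specific platform's API) extracts a few content-based and user-based features from a post represented as a Python dictionary; network-based features would additionally require the follower graph and the propagation cascade.

```python
import re

def extract_features(post: dict) -> dict:
    """Toy content- and user-based features for a social media post.

    `post` is assumed to contain a 'text' field and a nested 'user' dict
    with profile fields; real systems would add richer linguistic, stance,
    and network/propagation features on top of these.
    """
    text = post["text"]
    user = post.get("user", {})
    return {
        # content-based features
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_hashtags": text.count("#"),
        "num_exclamations": text.count("!"),
        "num_words": len(text.split()),
        # user-based features
        "followers_count": user.get("followers_count", 0),
        "friends_count": user.get("friends_count", 0),
        "account_age_days": user.get("account_age_days", 0),
        "is_verified": int(user.get("verified", False)),
    }
```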

Fake news detection methods focus on features that distinguish fake news from real news. Studies [9, 33, 62] have shown that the linguistic features of fake news differ from those of true (real) news. Other main differences are that fake news may contain deceptive language [71], lack objectivity (e.g., be biased towards an ideology) [66], involve users with lower credibility [84, 88], or contain falsified images or videos [36, 67]. According to research by Vosoughi et al. [92], fake news spreads differently and more quickly than real news, with a wider reach. For detecting fake news based on its propagation, methods take into account social context information, such as who shared the news, how it spread through the network, and the connections between engaged users [23, 53, 83]. Given the complexity of fake news detection models, in recent years the focus has also been on designing explainable fake news detection methods [17, 37, 58].

2.2 Influence blocking maximization

The problem of fake news mitigation is also referred to as influence blocking maximization (IBM), which aims to maximize the blocking of the spread of the given fake information. The influence blocking maximization problem is NP-hard, and its objective function has been proven to be monotone and submodular under most propagation models [54]. This implies that there exists a greedy algorithm that offers a solution with an approximation ratio of \((1- 1/e - \epsilon )\) to the optimal solution, where \(\epsilon \) depends on the accuracy of the estimated influence range of each node, obtained using Monte Carlo (MC) simulations. However, this greedy method is very slow as a consequence of the numerous, time-consuming simulations required, and it is not feasible to apply it in real life, especially for the large-scale networks encountered on social media. Therefore, several fast and scalable approaches for the IBM problem have been proposed. Most of them can be grouped into two categories: (i) heuristic and (ii) approximation methods. Other methods have also been proposed based on reinforcement learning [30, 35], GCNs [31], edge blocking [97], and swarm intelligence [15].
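For intuition on why the plain greedy approach is too slow, the following sketch (a minimal illustration, not any published algorithm's code) selects truth-campaigners greedily by re-estimating the expected negative spread with Monte Carlo simulations for every candidate; `simulate_negative_spread` is an assumed helper that runs one stochastic cascade and returns the number of negatively activated nodes.

```python
def greedy_ibm(G, S_N, k, simulate_negative_spread, n_sim=1000):
    """Naive MC-based greedy selection of k truth-campaigners.

    Averaging n_sim cascades estimates the expected negative spread, which
    makes the marginal-gain computation accurate but prohibitively slow:
    every candidate in every round triggers n_sim full simulations.
    """
    def expected_negative(S_P):
        return sum(simulate_negative_spread(G, S_N, S_P)
                   for _ in range(n_sim)) / n_sim

    S_P = set()
    baseline = expected_negative(set())
    candidates = set(G.nodes()) - set(S_N)
    for _ in range(k):
        # add the candidate with the largest marginal number of saved nodes
        best = max(candidates - S_P,
                   key=lambda v: baseline - expected_negative(S_P | {v}))
        S_P.add(best)
    return S_P
```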

The heuristic-based approaches rely on the characteristics of each node to select the k most suitable nodes for the truth campaign. Most of these methods identify seed nodes using different centrality measures [76]. Yao et al. [95] proposed selecting truth-campaigners based on eigenvector centrality (EVC) [96]. A high EVC value for a node means it is connected to many nodes that themselves also have high EVC values. They show that under the Competitive Linear Threshold Model (CLTM), it performs better than the state-of-the-art algorithms. Other heuristics have also been applied, such as degree, clustering coefficient [26], betweenness centrality [11], or percolation centrality [64]. The performance of these metrics and some others, e.g., random selection, was evaluated by Erd et al. [25] under two cost functions: one considered a degree cost penalty, making highly connected nodes more expensive, and the other considered equal cost across all nodes in the network. It was shown that under the Multi-Campaign Independent Cascade Model (MCICM), degree, betweenness, and percolation centrality offered the best performance. However, they required expensive nodes for their success; thus, these methods underperformed under the degree-penalty cost function.

Under this category, methods exploiting the network structure have also been proposed. These rely on the community structure of the graph and are considered community-based heuristic methods. Lv et al. [54] proposed a method that allocates resources to each community proportionally to the fraction of negative seed nodes within that community and their power of infection, computed using Monte Carlo simulation. Similarly, Arazkhani et al. [3] proposed to select the nodes with the largest degree, betweenness, and closeness centrality from the largest k communities as truth-campaigners. Community-based heuristic methods aim to allocate resources to communities based on their risk of infection; this could lead to fairer outcomes as resources are fairly divided, though the approach might be unfair toward indirectly infected communities.

In approximation algorithms, the aim is to approximate the influence-blocking power of a node for selecting the top-k influence minimizers. He et al. [32] proposed the CLDAG algorithm that restricts the influence computation of a node v to its local area to approximate the influence blocking power. The proposed method reduces computation cost by carefully selecting a local graph structure for v to allow efficient and accurate influence computation. Wu et al. [94] also proposed two very similar methods, CMIA-H and CMIA-O, that exploit the characteristics of the multi-campaign independent cascade model (MCICM) with high-effectiveness property (assuming the probability of positive transmission is always 1) and campaign-oblivious independent cascade model (COICM), respectively. Their research showed that these methods outperform the considered heuristic approaches, such as degree or proximity, while remaining fast.

Despite the good efficiency of CMIA-O, Lin and Dai [52] showed that its performance is far from optimal. The method relies on the construction of numerous subgraphs and the consideration of a vast number of nodes as candidates for the positive seed set, which is time-consuming. To improve this further, they proposed the Block Influence Out Graph (BIOG) approach, which is claimed to be significantly faster than CMIA-O while performing similarly. Song et al. [85] proposed the TIB-Solver method, in which, first, the set of nodes that might be infected by the rumor is computed, and then the threat of each of these nodes (the threat of v is the expected number of nodes that can be infected by v) is calculated. They then use weighted Reverse Reachable (WRR) trees to greedily select the top-k nodes that save the most nodes within the given deadline. WRR trees are used to predict the probability of reaching a node with positive information, while a different structure, namely a DAG, is used for the negative one. In our proposed method, WRR trees are used to estimate the spread of both involved information types. The main drawback of the existing methods is that they simplify the network structure, for example by removing self-loops, to estimate the blocking power of a node, which affects their effectiveness in practice. The proposed method, called FWRRS (Fairness-aware Weighted Reversible Reachable System), does not simplify the network structure; instead, it exploits the structure of weighted reversible reachable (WRR) trees to estimate the blocking power.

2.3 FairSNA

To the best of our knowledge, fairness has not been considered for solving the Influence Blocking Maximization problem using immunizing nodes or truth-campaigning. In the extensive surveys on fairness in graph mining [22, 73], no existing work in this area was included. However, fairness has been considered in other related areas of information diffusion, such as the influence maximization (IM) problem. The IM problem was first formulated by Kempe et al. [39] and focuses on identifying a set of k initial adopters for maximizing the spread of certain information within a network. In the past three years, researchers have highlighted the bias in social influence maximization methods and have proposed fairness-aware influence maximization methods [1, 86].

Ali et al. [1] proposed to greedily choose the node that achieves the highest value of an objective function that balances both the total influence and fairness. In this work, fairness was defined using equality, which aims to achieve an equal fraction of influenced nodes across all communities. Thus, the objective function punishes nodes that would cause a large distance between the most and least influenced communities. They showed that guaranteeing such fairness comes at a cost in effectiveness. Following this work, Stoica et al. [86, 87] showed that when the seed set size is large enough, promoting parity in the seed set leads to better parity in the outreach. Additionally, such a diverse seed set taps into inactive communities that are hard to reach from central nodes and thus leads to a better outreach, proving the benefits of fairness beyond a balance in the output. Halabi et al. [24] empirically demonstrated that fair solutions are often nearly optimal on real-world scale-free networks, and therefore the price of fairness is quite low (less than 15%). Additionally, they proved that algorithms that do not impose fairness constraints introduce significant bias.

Fairness-aware solutions have also been studied for other social network analysis problems, such as link prediction [41, 51, 55, 72, 74] and centrality ranking [90]. In this work, we define fairness for influence blocking maximization, aiming to ensure that the impact of negative information is not disproportionately borne by any community, especially small and less connected communities.

3 Preliminaries

3.1 Independent cascade model (ICM)

ICM is a stochastic model for simulating information diffusion on a network. Nodes in the network can be in one of two states: (i) active, if the node has been influenced by the information, or (ii) inactive, if the node is unaffected by or unaware of it. Each edge (u, v) has an influence probability that represents the likelihood of node u influencing node v. In ICM, the diffusion starts from a set of seed nodes whose state is set to active. When a node becomes active, it attempts to activate its neighboring nodes with the influence probability of the connecting edge. This process continues until no new nodes are activated, since each activated node attempts to activate its neighbors only once.
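As a concrete reference, the following minimal sketch simulates one ICM cascade on a networkx DiGraph whose edges carry an influence probability in an assumed attribute "p"; it illustrates the model above and is not the paper's implementation.

```python
import random
import networkx as nx

def simulate_icm(G: nx.DiGraph, seeds, prob_attr: str = "p"):
    """One run of the Independent Cascade Model.

    Each newly activated node u gets a single chance to activate each
    still-inactive successor v, succeeding with probability
    G[u][v][prob_attr]. Returns the set of active nodes at the end.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in G.successors(u):
                if v not in active and random.random() < G[u][v][prob_attr]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```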

3.2 Multi-campaign ICM (MCICM)

MCICM extends the classic ICM by incorporating multiple information types that diffuse simultaneously [12]. For our analysis, let us assume that there are two information types: fake news (negative) and counter-true information (positive). The negative information is the one we want to limit, and the positive information is propagated to limit it. In this model, a node can be in any of three states: (i) negative active, (ii) positive active, or (iii) inactive, i.e., unaware of any diffusion taking place. Once a node changes its state to negative active or positive active, it remains in that state. Each edge has two associated probabilities for transferring positive and negative information. In the influence propagation, there are two sets of seed nodes, for fake news and counter-true news, which start in the negative active and positive active states, respectively. When a node is affected by one of the information types, it propagates that information to its neighbors with the respective influence propagation probability. When a node is reached simultaneously by both types, it is assumed that there is a natural order to the two campaigns; in this work, the positive one takes effect. In our work, we use the Campaign-Oblivious Independent Cascade Model (COICM) [12], where the probabilities of positive and negative dissemination are arbitrary and shared over the network, meaning that both types of information propagate using the same cascade.
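The two-campaign simulation can be sketched in the same style. The code below is a simplified, round-based illustration under the assumptions stated in its docstring (separate coin flips per campaign and positive priority on ties), so it approximates rather than exactly reproduces the shared-cascade behavior of COICM.

```python
import random
import networkx as nx

def simulate_coicm(G: nx.DiGraph, S_N, S_P, prob_attr: str = "p"):
    """One simplified run of a two-campaign cascade (COICM-style).

    Both campaigns use the same edge probabilities; a node keeps the first
    state that reaches it, and the positive campaign wins when both reach
    it in the same round. Returns (positive_active, negative_active) sets.
    """
    state = {v: +1 for v in S_P}                    # +1 positive, -1 negative
    state.update({v: -1 for v in S_N if v not in state})
    pos_frontier = [v for v, s in state.items() if s == +1]
    neg_frontier = [v for v, s in state.items() if s == -1]
    while pos_frontier or neg_frontier:
        newly = {}                                   # activations this round
        for sign, frontier in ((+1, pos_frontier), (-1, neg_frontier)):
            for u in frontier:
                for v in G.successors(u):
                    if v in state:
                        continue
                    if random.random() < G[u][v][prob_attr]:
                        newly[v] = max(newly.get(v, sign), sign)  # positive wins ties
        state.update(newly)
        pos_frontier = [v for v, s in newly.items() if s == +1]
        neg_frontier = [v for v, s in newly.items() if s == -1]
    return ({v for v, s in state.items() if s == +1},
            {v for v, s in state.items() if s == -1})
```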

4 Problem formulation

Let us assume that the given network is represented as \(G=(V, E)\), where V is the set of nodes and E is the set of edges. Each edge (u, v) is associated with an influence propagation probability that reflects the ability of node u to infect or influence node v. The negative seed nodes, represented by a set \(S_N \subseteq V\), spread the misinformation in the network. The positive seed nodes, represented by a set \(S_P \subseteq V \setminus S_N\), spread the counter-true information in the network. At a given time, the state \(s_u\) of a node u can take any of the three values \(\{-1, 0, 1\}\), where \(-1\) indicates that the node is negative active, 1 indicates that the node is positive active, and 0 indicates that the node is inactive and has not been influenced by any of the propagating information. \(s^-(G, S_N, S_P)\) is the number of nodes that are negative active in G if \(S_N\) spreads misinformation and \(S_P\) spreads the counter-true information. The influence blocking power of the positive seed set \(S_P\) is computed as follows:

$$\mathbb {B}(G,S_N,S_P)=s^-(G,S_N,\phi ) - s^-(G,S_N,S_P)$$

The influence blocking power is also referred to as ‘saved nodes,’ where saved nodes are those that would have been “infected” by the fake news if it was not for the truth campaign.
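Using the simulation sketch from Section 3, the blocking power of a candidate seed set can be estimated empirically; this is a hedged illustration of the definition above, with `simulate_coicm` being the hypothetical helper introduced earlier.

```python
def estimate_blocking_power(G, S_N, S_P, n_sim=1000):
    """Monte Carlo estimate of B(G, S_N, S_P) = s^-(G, S_N, phi) - s^-(G, S_N, S_P).

    The returned value is the expected number of nodes saved by the
    truth-campaigners S_P, averaged over n_sim simulated cascades.
    """
    without_sp = sum(len(simulate_coicm(G, S_N, set())[1]) for _ in range(n_sim))
    with_sp = sum(len(simulate_coicm(G, S_N, S_P)[1]) for _ in range(n_sim))
    return (without_sp - with_sp) / n_sim
```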

In the IBM problem, given a graph G(V, E), a set of fake news spreaders \(S_N\), and a positive integer k representing the available budget for the mitigation campaign, the aim is to find the positive seed set \(S_P \subseteq V \setminus S_N\) of size at most k that minimizes the expected number of negatively activated nodes under the COICM propagation model, i.e., that maximizes the following objective:

$$\begin{aligned}&{\underset{S_P}{\text {maximize}}}\quad {\mathbb {B}(G,S_N, S_P)}\end{aligned}$$
(1a)
$$\begin{aligned}&\text {subject to}\nonumber \\&{|S_P|\le k,\; S_P\subseteq V\setminus S_N } \end{aligned}$$
(1b)

Fairness-aware Truth-Campaigning. In Fairness-aware Influence Blocking Maximization (FIBM), the goal is not only to maximize the blocking power of the positive seed set but also to ensure that the outcome of the truth campaign is fair. The fairness of the campaign is measured using the maximin fairness constraint: \(S_P\) is chosen as the positive seed set that maximizes the minimum proportion of nodes saved in each community. This relates to the legal notion of disparate impact, which states that a community has been unfairly treated if its “success rate” under a policy is substantially lower than that of the other communities [5]. Maximin aims to avoid the scenario in which some communities are disproportionately neglected from help with respect to the remaining population; thus, it tries to maximize the minimum help that all communities receive. Let \(\mathbb {M}(G, S_N, S_P)\) denote the maximin value achieved by \(S_P\) in the graph G having a set of communities C and a set of fake news spreaders \(S_N\). The maximin value is computed as

$$\mathbb {M}(G,S_N,S_P)=\min _{c\in C}\; \frac{\mathbb {B}_c(G,S_N,S_P)}{s_c^-(G,S_N,\phi )}$$

where \(\mathbb {B}_c(G,S_N,S_P)=s_c^-(G,S_N,\phi )-s_c^-(G,S_N,S_P)\) denotes the blocking power of the positive seed set \(S_P\) within a community \(c\in C\).

Problem Statement. Given a graph G(V, E) having communities C, a set of misinformation spreaders \(S_N\), and a budget k for the mitigation campaign, we aim to identify the positive seed set \(S_P\) using the following objective function.

$$\begin{aligned}&{\underset{S_P}{\text {maximize}}}\quad {\mathbb {B}(G,S_N,S_P)}\end{aligned}$$
(2a)
$$\begin{aligned}&\text {subject to}\nonumber \\&{S_P\in {\underset{S_P'}{\text {arg max }}}{ \mathbb {M}(G,S_N,S_P')}},\end{aligned}$$
(2b)
$$\begin{aligned}&{|S_P'|\le k,\; S_P'\subseteq V\setminus S_N} \end{aligned}$$
(2c)

In the proposed problem, an additional constraint, namely (2b), guarantees the fairness of the approach. Therefore, the solution set \(S_P\) shall not only guarantee that the truth campaign is as effective as possible, saving the maximum number of nodes from believing the fake news, but also that the maximin value is maximized.

5 The proposed method

In the proposed method, given the network G and \(S_N\), we first identify the set of nodes S that can be infected by the given rumor starters and estimate the probabilities of their infection. Next, we generate weighted reversible reachable (WRR) trees from the set S (the process to generate a WRR tree is explained in Algorithm 2). For each node, a WRR tree aims at identifying all those other nodes that can reach it with their information. These WRR tree structures can be generated very quickly, allowing the method to build a large pool of WRR trees that resemble how information could diffuse in the network. Thus, if a path of diffusion is to be considered, a WRR tree is expected to cover it. To generate this sample of WRR trees, we rely on the Dynamic Stop and Stare algorithm (D-SSA) [60], which offers an error guarantee on the approximation as well as good performance; it is up to 1200 times faster than the next best sampling method, as claimed by the authors and also observed in our work.
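For intuition, the sketch below samples one plain (unweighted) reverse-reachable set in the standard RIS style; it is only a simplified stand-in for the paper's weighted WRR trees, which additionally store path probabilities, allow a bounded number of node copies, and are sampled through the D-SSA_WRRS procedure.

```python
import random

def sample_rr_set(G, root, prob_attr: str = "p"):
    """Sample one unweighted reverse-reachable set rooted at `root`.

    Incoming edges are traversed breadth-first, each edge (v, u) being kept
    with its influence probability; the result is the set of nodes whose
    information could reach `root` in this random sample. Assumes a
    networkx DiGraph with edge probabilities stored in `prob_attr`.
    """
    rr = {root}
    frontier = [root]
    while frontier:
        u = frontier.pop()
        for v in G.predecessors(u):
            if v not in rr and random.random() < G[v][u][prob_attr]:
                rr.add(v)
                frontier.append(v)
    return rr
```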

Next, the generated WRR trees are used to estimate the blocking power of each node with respect to the different communities. The optimization function defined in (2a) is then maximized under the maximin fairness constraint, and iteratively, the node with the highest influence blocking power that provides maximum fairness is selected until k truth-campaigning nodes are chosen. For the process of choosing the top-k positive seed nodes, we use the six selection steps explained in Building Block 3.B (Section 5.1).

The proposed method is presented in Algorithm 1. It takes as input the network G(V, E) having communities C, the negative seed set \(S_N\), the budget k, and the parameters \(\gamma \), \(\epsilon \), \(\delta \), and iterations. The parameter \(\gamma \) is the infection probability threshold for adding a node to S, the set of nodes that are expected to get infected. The parameter \(\epsilon \) is one of the input parameters of the D-SSA sampling algorithm and serves as the approximation guarantee of the probabilities in the WRR trees returned by D-SSA; we set it to 0.1. The parameter \(\delta \) determines the number of node copies allowed in the WRR trees. Finally, iterations is the number of ICM simulations used to approximate the infection probabilities.

Algorithm 1

FWRRS(G, \(S_N\), k, \(\gamma \), \(\epsilon \), iterations, \(\delta \))

5.1 Building blocks

Here, we explain the steps of Algorithm 1 in detail.

  • 1. Identify the subset of nodes S with expected infection probability (Line 1). The set S consists of the nodes that would be influenced by the fake news propagated from \(S_N\) with a probability of at least \(\gamma \) in the absence of a truth-campaign. To obtain these probability values, Monte Carlo simulation of ICM is performed iterations times from \(S_N\). The probability of infection of a node v is set to the fraction of simulations in which it was infected and is stored in probs[v]. The nodes having a probs value higher than \(\gamma \) are then included in S. Additionally, the variable \(paths_v\) keeps track of the simulation paths for each node \(v \in S\).

  • 2. Generate a pool of WRR trees (Line 2). The pool of WRR trees is constructed using an extended version of the D-SSA sampling algorithm, called D-SSA_WRRS [60]. The general idea of the algorithm is to recursively double the number of WRR trees until two conditions are met, concerning the coverage of the selected set and the error in the coverage approximation. The D-SSA algorithm is extended with a third condition, which ensures that the generated WRR trees are sufficient to guarantee, with high probability, a good approximation of the influence power of \(S_N\), i.e., necessary to accurately predict how to block such influence. Additionally, to speed up the process of meeting this last condition, the roots are chosen using the infection probabilities computed in the previous step, namely probs. Each WRR tree is generated by calling \(generate\_WRR\)(G, S, probs, paths, \(S_N\), \(\delta \)), presented in Algorithm 2. First, a root r is chosen at random from the set S using the probability distribution in probs (Line 2). Including these probabilities helps boost the performance of the method, as the FWRRS method is interested in those WRR trees that include a negative seed node. From r, we visit G in a breadth-first search manner: for each visited node u, all of its predecessors v are added to the WRR tree with probability \(p_{v\rightarrow u}\), and the process continues until no new node is added or the WRR tree reaches \(S_N\), as no nodes at a larger distance would be able to save r from getting infected. For every node \(v \in WRR\), the probability of the path towards the root is stored in \(p_v\) (Lines 7-8). If v is already in the WRR tree, a copy of v is added (Lines 14-20), unless a copy of v is already present in the path between v and r, as in the MCICM a node cannot activate twice (Line 13). Finally, the nodes with a lower probability of reaching r than any negative seed in the WRR tree are removed (Lines 26-28). Efficiency Improvement. The list paths was introduced to boost the performance of this algorithm when the probabilities of infection are high. In this case, the WRR trees can grow large, which significantly slows the method, since the number of node copies grows with the number of paths through which information diffuses. However, not all paths need to be considered. If a node is present in a WRR tree, it is assumed that its information will reach the root node, so having multiple copies of this node does not provide any additional information. However, including a copy of this node might be necessary to cover a path through which the root can get infected; the variable \(paths_r\) is used to handle this. If a new node v is to be added to a WRR tree in which v already has numerous copies present, \(paths_r\) is used as follows. If the path being considered from the copy of v to the root, namely \(des_u\cup \{v\}\), does not lead to the infection of the root, it is not considered any further; this is assumed to be the case if it is not a subset of any path in \(paths_r\). This prunes the WRR trees significantly, as numerous unnecessary paths are avoided (Line 15). The maximum number of copies, \(\delta \), is set to 10 in all our experiments, allowing at least ten different paths through which any node v can inform r. A larger maximum number of copies does not yield a significant improvement in blocking power on the evaluated datasets (as observed experimentally); in fact, the limit is rarely reached for lower influence probabilities and only comes into significant use when the probabilities become higher.

  • 3. Choose Truth-Campaigners (Lines 3-44). This has the following three steps; a simplified, high-level sketch of the overall selection loop is given after step C below.

Algorithm 2

generate_WRR(G, S, probs, paths, \(S_N\), \(\delta \))

  • A. Initialisation (Lines 3-16). In Algorithm 1, after generating the pool of WRR trees, the infection probability for each node \(v \in S\) is computed. For this, only the WRR trees that include a node in \(S_N\) are considered and stored in \(WRRS_{inf}\) (Line 3). A node's probability of infection is initialized in Line 4 and set to the fraction of WRR trees rooted at v that include a node in \(S_N\) (Lines 5-8). After computing the infection probability for all nodes in S, the expected number of initial infections per community, \(init\_inf\), is computed (Line 8). The value of \(init\_inf\) for a community c is set to the sum of the infection probabilities of all nodes \(v \in c\). Further, the variable \(C\_inf\) contains the communities where infections take place (Line 9). The percentage of saved nodes per community, \(pct\_saved\), is initialized to 0, and the number of expected infections, \(exp\_inf\), is initialized to the initial ones (Line 10). The blocking power of a node v per community, \(B_v\), for all \(v \in V\), is initialized to 0 (Line 11), as is the maximin value Mm (Line 12). In Lines 13-16, the WRR trees in \(WRRS_{inf}\), i.e., those with \(WRR.has\_S_N=True\), are iterated over. If any node \(v \in WRR\) were added to \(S_P\), the root r of this WRR tree would be saved from being infected within that tree, as v would reach it with its information before any node in \(S_N\). Thus, the number of WRR trees in which r gets infected reduces by 1, and the value of \(B_v[comm_r]\) is increased by 1/(number of WRR trees rooted at r). This logic is applied to compute the blocking power in each community for every node. Additionally, to avoid iterating over the entire set of nodes of the network, a set candidates is created to contain the nodes with some blocking power.

  • B. Selection Steps (Lines 18-37). Next, we iteratively select k truth-campaigners, and in each iteration, the following six selection steps are performed. In brief, steps 1 and 2 maximize objective (2b), steps 3-5 maximize objective (2a), and step 6 breaks ties given the budget.

    1.

      First, optimize 2b, which maximizes maximin fairness, and store candidate nodes in \(opts_1\).

    2.

      If there is any community that is infected significantly (according to the input parameter \(\epsilon \)) and not reachable from \(S_P\), remove candidate nodes that will not reach any of these communities.

    3.

      Choose candidate nodes that maximize the weighted blocking power in the communities with a below-average saving ratio (based on already chosen \(S_P\)) and store them in \(opts_2\).

    4.

      Pick out the candidate nodes that maximize blocking power in the communities having the lowest fraction of saved nodes (referred to as maximin community) and store them in \(opts_3\).

    5.

      If there is a cost of fairness due to selection steps 2-4, choose the nodes from \(opts_3\) with the highest blocking power (this does not impact fairness, as all nodes chosen in \(opts_3\) provide a similar maximin value, while it improves the blocking power).

    6.

      Break the ties, if any, by selecting the node with the maximum weighted out-degree, where a node’s weight is set to be its infection probability under the current \(S_P\).

    This way, in each step, we select the node that maximizes the fairness objective without causing a significant cost in the effectiveness of the campaign.

  • C. Recompute and Repeat (Lines 38-44). Once a new node u is added to \(S_P\), the expected infections, maximin value, and ratio of saved nodes per community are updated (Line 38). Next, the blocking power of the impacted nodes is updated by iterating over those \(WRR \in WRRS_{inf}\) in which the root r is expected to be reached by u (Lines 39-43). In the WRR trees that include the node u, the root will remain saved, and the expected infections of all nodes not present in this WRR tree are reduced by \(1/(\text {number of WRR trees rooted at } r)\). Further, this WRR tree is removed from the set of WRR trees. The process ends when the size of \(S_P\) is k, and the resulting set of truth-campaigners is returned.
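The sketch referred to in Building Block 3 is given below. It is only a high-level, hedged illustration of the fairness-aware greedy coverage idea over reverse-reachable samples: it omits the path probabilities, node copies, the \(\epsilon \)-based reachability filter, and the weighted out-degree tie-breaking of Algorithm 1, and greedily picks, in each round, the candidate that maximizes the minimum per-community saved fraction, breaking ties by total coverage.

```python
from collections import defaultdict

def select_truth_campaigners(rr_samples, roots, communities, S_N, k):
    """Simplified fairness-aware greedy selection over RR-style samples.

    rr_samples[i] is the node set of the i-th sample and roots[i] its root;
    communities maps each node to its community label. A sample containing
    a negative seed marks its root as at risk, and a candidate appearing in
    such a sample is assumed to save that root.
    """
    at_risk = [i for i, rr in enumerate(rr_samples) if rr & set(S_N)]
    if not at_risk:
        return set()
    per_comm_risk = defaultdict(int)
    for i in at_risk:
        per_comm_risk[communities[roots[i]]] += 1

    S_P, saved, remaining = set(), defaultdict(int), set(at_risk)
    candidates = {v for i in remaining for v in rr_samples[i]} - set(S_N)

    def maximin(extra):
        # minimum per-community fraction of at-risk roots saved so far plus the candidate's gain
        return min((saved[c] + extra.get(c, 0)) / per_comm_risk[c]
                   for c in per_comm_risk)

    for _ in range(k):
        best, best_key, best_cov = None, None, []
        for v in candidates - S_P:
            covered = [i for i in remaining if v in rr_samples[i]]
            gain = defaultdict(int)
            for i in covered:
                gain[communities[roots[i]]] += 1
            key = (maximin(gain), len(covered))
            if best_key is None or key > best_key:
                best, best_key, best_cov = v, key, covered
        if best is None:
            break
        S_P.add(best)
        for i in best_cov:
            saved[communities[roots[i]]] += 1
        remaining -= set(best_cov)
    return S_P
```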

The flowchart of the proposed method is shown in Fig. 2.

6 Experimental set-up

In this section, we discuss datasets, baselines, and experimental settings.

6.1 Datasets

We evaluate the proposed method on six real-world datasets that are summarized in Table 1. The communities are identified using the Louvain community detection method [8], which is briefly explained in Appendix B. Facebook and Twitter are snapshots taken from the online social networking websites Facebook and Twitter, respectively. PHY, HEPT, and ASTRO are co-authorship networks extracted from publications in the Physics, High Energy Physics, and Astrophysics categories of arXiv, respectively. Enron is an email communication network constructed from email exchanges within the Enron organization.

Fig. 2

Flowchart of FWRRS method

Table 1 Characteristics of datasets

6.2 Baselines

We compare the proposed method with fairness-oblivious and fairness-aware baselines to choose truth-campaigners. The fairness-oblivious baselines are described below.

  • Degree: nodes with the highest degrees are selected.

  • CMIA-O [94]: computes the blocking power of each node using the Maximum Influence In-Arborescence (MIA) structure, greedily chooses the node with the highest value, and then recomputes.

  • BIOG [52]: This is similar to CMIA-O; however, it uses a Maximum Influence Out-Graph (MIOG) structure to approximate the influence region. It also reduces the set of candidates for \(S_P\) to those nodes directly connected to \(S_N\) or adjacent to these.

  • TIB-Solver [85]: first computes the threat of each node that can be reached by the rumor, where the threat of a node is the expected number of nodes that it can infect. WRR trees are then generated to estimate the number of reachable nodes, and the top-k nodes that best block the rumor are greedily chosen.

As mentioned before, there is no work on fairness-aware influence blocking maximization. Therefore, we extend two existing approaches for the IBM problem to achieve a fairer outcome. The fairness baselines used are the following.

  • Parity Seeding [7]: Stoica et al. [86] proposed a fairness-aware influence maximization method. We extend it to the influence blocking maximization problem: the highest-degree nodes are chosen as truth-campaigners while maintaining parity, i.e., their distribution across communities is proportional to that of \(S_N\).

  • Fair-CMIA-O [7]: Fair-CMIA-O is an extension of CMIA-O [94] method using maximin fairness constraint that follows the same structure as CMIA-O, but the nodes resulting in the highest maximin value are chosen. Ties are broken by choosing the node with the highest blocking power within those communities that still need help. The detailed algorithm is explained in Appendix A.

6.3 Evaluation metric

The proposed method is evaluated using the following two metrics, which consider both efficiency and fairness; a small illustrative computation of both metrics is given after the list.

  1.

    Fraction of Saved Nodes: the fraction of saved nodes out of the total nodes that would be negatively influenced in the absence of a truth-campaign. It is computed as,

    $$\frac{\mathbb {B}(G,S_N,S_P)}{s^-(G,S_N,\phi )}$$
  2.

    Maximin: It represents the fairness of the proposed method, and it is computed as,

    $$\min _{c\in C}\; \frac{\mathbb {B}_c(G,S_N ,S_P) }{s_c^-(G,S_N,\phi )}$$
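As an illustration (using the hypothetical simulate_coicm sketch from Section 3, not the paper's implementation), both metrics can be estimated from a shared batch of simulations as follows.

```python
from collections import Counter

def evaluate(G, S_N, S_P, communities, n_sim=1000):
    """Estimate the fraction of saved nodes and the maximin value.

    communities maps each node to its community label. Infections are
    counted per community with and without the truth-campaigners S_P,
    and the two metrics above are computed from these counts.
    """
    base, mitigated = Counter(), Counter()
    for _ in range(n_sim):
        for v in simulate_coicm(G, S_N, set())[1]:
            base[communities[v]] += 1
        for v in simulate_coicm(G, S_N, S_P)[1]:
            mitigated[communities[v]] += 1
    total_base, total_mit = sum(base.values()), sum(mitigated.values())
    fraction_saved = (total_base - total_mit) / total_base if total_base else 0.0
    per_comm = [(base[c] - mitigated[c]) / base[c] for c in base if base[c] > 0]
    maximin = min(per_comm) if per_comm else 0.0
    return fraction_saved, maximin
```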

6.4 Implementation details

The influence propagation is modeled using COICM, and the probability \(p_{u\rightarrow v}\) for each edge (u, v) is set to 1/inDegree(v), where inDegree(v) is the number of incoming edges of v. The parameters are set as \(\gamma = 0\), \(\epsilon = 0.01\), and \(iterations=1000\). In each experiment, 50 nodes are chosen uniformly at random as rumor spreaders; each experiment is repeated 1000 times, and the average is taken to compute the number of saved nodes as well as fairness.
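For reference, the weighted-cascade probability assignment described above can be written as follows (a small sketch using networkx; the placeholder random graph is only for illustration).

```python
import random
import networkx as nx

# Placeholder directed graph; in the experiments G is one of the datasets in Table 1.
G = nx.gnp_random_graph(1000, 0.01, directed=True)

# Weighted-cascade style probabilities: p(u -> v) = 1 / inDegree(v).
for u, v in G.edges():
    G[u][v]["p"] = 1.0 / G.in_degree(v)

# 50 rumor spreaders chosen uniformly at random, as in the experimental setting.
S_N = random.sample(list(G.nodes()), 50)
```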

The code was written in Python, and experiments were carried out on a server with the following characteristics: CPU 1x Intel Xeon E5-2698v4 @ 2.2GHz, RAM 256GB, and GPU 4x Nvidia Tesla V100. Our implementation is available in a GitHub repository (see Footnote 1).

Fig. 3

Percentage of saved nodes versus k (the number of chosen truth-campaigners)

Fig. 4

Fairness versus k (the number of chosen truth-campaigners)

7 Results

We first compare the fraction of nodes saved by the proposed method and the baselines. The results are shown in Fig. 3. We observe that the proposed method, FWRRS, provides high efficiency. The performance of the considered baselines is inconsistent across datasets: some baselines provide good results on some datasets and others on different ones. The top three baselines are Parity seeding, Degree, and Fair-CMIA-O, where Degree is fairness-oblivious and the other two are extended fair methods for IBM. In Fig. 3a, b, e, and f, choosing \(S_P\) using higher-degree nodes appears to be a powerful strategy. For these datasets, the best-performing method after the proposed FWRRS is generally Parity seeding. The high degree of the most connected nodes results in wide coverage of the positive information; consequently, most paths of diffusion of the negative information are intercepted. The maximum degree in Facebook, Twitter, ASTRO, and Enron is significantly above the average degree, and the average degree is also quite large, so information spreads through numerous paths. Hence, reducing the network to maximum probability paths, as the CMIA-O method does, is ineffective, since in reality information diffuses through many other paths. However, in the HEPT and PHY networks, the next best-performing methods are CMIA-O and Fair-CMIA-O. These methods benefit from the low average degree, which reduces the number of paths through which information diffuses; blocking the maximum probability paths then blocks most propagation paths. In these networks, Degree and Parity seeding do not perform as well as on the other networks.

Next, we compare the fairness of the methods, and the results are shown in Fig. 4. The results show that the FWRRS method consistently provides the highest fairness value while also providing the highest influence blocking. The plots also highlight the unfairness of fairness-oblivious IBM methods, including Degree. Existing methods tend to neglect minor communities, showing the need for fairness-aware methods. Fairness-aware methods mostly achieve a maximin value higher than 0 on all datasets, which shows that no community is completely ignored while blocking the negative influence. For the Facebook dataset, in Fig. 4a, other baselines, including CMIA-O and BIOG, also achieve a maximin value higher than 0 for smaller k, as this network has a small number of communities and a high average degree, which makes spreading the true information to all communities easier for any method. As a result, all communities that get infected are reached and saved to some extent. Other networks have many small communities, which are very likely to get infected by randomly chosen negative seed sets. However, the existing methods are more likely to neglect such minor communities for the greater good, i.e., a higher overall percentage of saved nodes, and thus provide low fairness.

Despite the clear impact of the network characteristics on the considered baselines' effectiveness, the proposed method FWRRS performs best across all networks. The fairness-aware objective function does not negatively impact the number of saved nodes. In fact, if the seed nodes are chosen fairly, then for a sufficiently large budget k, fairness works as a catalyst to improve the effectiveness. Enforcing the true information to reach all relevant communities guarantees that the intervention reaches communities that would otherwise have been neglected. This helps overcome the overlapping influence spheres of the nodes selected by the other methods. For low budget values, the performance is similar or superior to the next best-performing method. It is only in Fig. 3e and f that some cost of fairness is observed, as Degree outperforms FWRRS for \(k \le 20\). However, this cost of fairness is very low, only \(2\%\) in the worst case. It comes from trying to reach all communities in need with a small number of truth spreaders, but this initial effort becomes valuable for larger k values, where FWRRS achieves better performance than all other baselines. The FWRRS method is the fairest among all the evaluated methods while consistently achieving the highest percentage of saved nodes.

7.1 Sensitivity analysis

In this section, we examine the effect of various parameters on the performance of the different methods; the results are shown on the PHY network. There is no specific reason for picking this network other than that it is of medium size, and the results generalize to all considered datasets.

7.1.1 Effect of varying \(|S_N|\)

In all previous experiments, the number of rumor starters was fixed to 50. Here, we evaluate the impact of the size of the rumor-starter set, varying it in the interval [20, 100] while keeping the positive seed set size fixed at 50. The experimental results are displayed in Fig. 5.

Fig. 5

Experiments in PHY under varying sizes of \(S_N\)

Fig. 6

Experiments on PHY under the Uniform model with random \(S_N\)

In Fig. 5a, we observe that the fraction of saved nodes decreases as \(|S_N|\) increases, since the methods' effectiveness becomes more limited. However, this reduction remains roughly linear for all methods. In Fig. 5b, we observe that as the number of initially infected nodes increases, it becomes more difficult for the methods to remain fair. The consequent increase in the number of infected communities makes it challenging to save all communities with a limited budget, especially when the number of rumor nodes exceeds the number of truth-campaigners; only the fairness-aware methods achieve relative fairness in this setting.

Fig. 7

Experiments on PHY under multiple community detection methods

7.1.2 Impact of influence probability

We further evaluate the proposed method to study the impact of edge influence probabilities. We evaluate all methods under the COICM model with uniform probabilities, where all edge probabilities have been set to a fixed p value. In our experiments, we consider various p values in the interval [0.02, 0.1], and \(|S_N| =50\) and \(|S_P| =50\). The results are shown in Fig. 6 for the PHY Network.

We observe that the FWRRS method saves the maximum number of nodes, and for higher p, the method is more effective.

In this experiment, we observe a reduction in fairness in Fig. 6b similar to the one in Fig. 5b. As the negative information propagates more strongly with growing p, the number of infected communities also increases, affecting the fairness of the methods, which have a limited budget of 50. The increase in p also causes the positive information to propagate more strongly, leading to an increase in the effectiveness of the methods, as observed in Fig. 6a. Despite this effect, the proposed FWRRS method remains the fairest and consistently achieves the largest percentage of saved nodes.

7.2 Robustness analysis

7.2.1 Impact of community detection method

To show that the proposed method does not depend on a specific community detection method, we use different community detection methods to identify communities and evaluate the fairness and effectiveness of the methods. The selected community detection methods cover different types, including greedy modularity [19], Walktrap [65], Infomap [70], label propagation [68], and Louvain [8]; these methods are explained in Appendix B. The results on the Physics co-authorship network are shown in Fig. 7.

In Fig. 7, the fraction of nodes saved by each method is barely altered (maximum difference of 0.01) by the change in the community detection method. Slight variations are observed; for example, the fraction of saved nodes of the Fair-CMIA-O method drops by 1%, compared to the Louvain algorithm, under the greedy modularity algorithm and increases by 1% in the label propagation case. These variations do not necessarily come from the change in community detection method; they could be measurement inaccuracies caused by the limit of 1000 simulations of the COICM model, especially given their small magnitude. Such variations are also observed in methods that do not take the communities as input and therefore should not be affected by them. Overall, the FWRRS method remains the best performing in terms of saved nodes and fairness under all scenarios, and we can conclude that the performance of the proposed method does not depend on the choice of community detection method.

7.3 Other fairness metrics

We further measure the performance of all methods using other fairness metrics, defined below; a small sketch computing both metrics from per-community counts follows the definitions.

  • Equity: The Equity fairness constraint requires that the number of saved nodes in each community be proportional to the number of infected nodes [27]. It is computed by taking the variance of the fraction of saved nodes over all communities, defined as,

    $$\begin{aligned} var \left\{ \frac{\mathbb {B}_{C_i}(G,S_N, S_P)}{s^-_{C_i}(G,S_N,\phi )}: \forall C_i \in C \right\} \end{aligned}$$
  • Disparity [1]: The disparity of a method is computed as the maximum difference in the normalized saved nodes across all communities. It is defined as,

    $$\begin{aligned} \max _{i,j \in \{1, 2, ... , |C|\}}\left| \frac{\mathbb {B}_{C_i}(G,S_N, S_P)}{s^-_{C_i}(G,S_N,\phi )} - \frac{\mathbb {B}_{C_j}(G,S_N, S_P)}{s^-_{C_j}(G,S_N,\phi )} \right| \end{aligned}$$
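The sketch below (an illustration from per-community counts, not the paper's evaluation code) computes both metrics; saved_per_comm[c] and infected_per_comm[c] are assumed to hold the expected numbers of saved and would-be-infected nodes in community c.

```python
from itertools import combinations
from statistics import pvariance

def equity_and_disparity(saved_per_comm, infected_per_comm):
    """Compute the Equity and Disparity metrics from per-community counts."""
    fractions = [saved_per_comm.get(c, 0) / n
                 for c, n in infected_per_comm.items() if n > 0]
    # Equity: variance of the per-community saved fractions
    equity = pvariance(fractions) if fractions else 0.0
    # Disparity: maximum pairwise difference between saved fractions
    disparity = max((abs(a - b) for a, b in combinations(fractions, 2)),
                    default=0.0)
    return equity, disparity
```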

Figure 8 shows Disparity and Equity on the Physics network for \(|S_N|=50\). In Fig. 8a, we observe that FWRRS performs better on Equity, as the maximin objective maximizes the saved nodes for the community with the minimum fraction of saved nodes, which balances the saved nodes among all communities. All fairness-oblivious methods (except Degree in some cases) have a high Equity value, showing that the mitigation has a huge variance between different communities. In Fig. 8b, we observe that fairness-oblivious methods have a higher disparity than fairness-aware methods. The disparity is low for the FWRRS, Fair-CMIA-O, and Parity seeding methods, though if we refer to Fig. 3c, the performance for saved nodes is poor for the Fair-CMIA-O and Parity seeding methods. Another important observation is that the disparity of our method is consistent and becomes the best as k increases; the reason is that FWRRS is optimized for maximin, which also reduces disparity over time as the fraction of saved nodes is maximized.

Fig. 8

Evaluation of different fairness measures on NetPhy

Table 2 Synthetic datasets characteristics

7.4 Analysis on synthetic networks

To better understand the performance of the proposed method on different types of networks, we evaluate it on synthetic networks. We propose a synthetic network generation model, called the HIgh Clustering Homophily Barabási-Albert (HICH-BA) model, that generates scale-free networks better resembling the characteristics of OSNs, having multiple communities and a high clustering coefficient. The model is explained in Appendix C.

We generate synthetic networks having 10,000 nodes and five communities of varying sizes, where two communities are large and three are comparatively small. The parameters used to generate the networks and the networks' characteristics are summarized in Table 2, where the first column is the name assigned to the generated network based on the given setting, n is the number of nodes, m is the number of edges, h is the homophily index, \(p_t\) is the probability of completing a triad edge to increase the clustering coefficient, and \(p_{PA}\) is the probability with which new edges are established using preferential attachment instead of at random; these columns are followed by the average degree, assortativity coefficient [20], and clustering coefficient.

Next, we briefly discuss the choices of the selected parameters and the results shown in Figs. 9 and 10. The SG1 network contains all favorable conditions for the Degree and Parity seeding methods; as observed in the real-dataset experiments, these methods benefit from a high average degree. Further, in SG1 the maximum degree is comparatively very high and thus powerful for information coverage, and the network has a high clustering coefficient, which results in information propagating strongly in the local area of the selected nodes. In these conditions, methods based on estimating influence power using shortest paths, such as CMIA-O, underperform, as information spreads through many paths other than the maximum probability ones. Therefore, on the SG1 network (see Fig. 9a), after the FWRRS method, the next best-performing methods are Degree and Parity seeding. The proposed method FWRRS achieves the largest maximin value, as observed in Fig. 10a, and is therefore the fairest and best-performing method. Further, despite CMIA-O performing sub-optimally, its fair alternative performs well: spreading the information fairly over all communities seems to cover a larger portion of the propagation paths, leading to better blocking of the negative information.

Fig. 9

Percentage of saved nodes versus k (the number of chosen truth-campaigners)

Fig. 10

Fairness versus k (the number of chosen truth-campaigners)

The SG2 network, on the other hand, favors the conditions under which CMIA-O and Fair-CMIA-O perform better. This is the case when information does not propagate through many paths, achieved with a low average degree. The remaining parameters of the model were left unchanged; however, this reduction in average degree has some indirect impact on the clustering coefficient and homophily. The negative information propagates through fewer paths, and thus blocking the maximum probability paths is very effective. Further, Degree and Parity seeding perform well for lower budgets, but for higher budgets they are outperformed by other methods, as the selected nodes do not reach as many nodes as in the previous network due to their lower degree. The change in the methods' performance is also reflected in fairness, shown in Fig. 10b: Fair-CMIA-O achieves a higher maximin value than in the previous case, and Degree and Parity seeding are more unfair. The proposed method FWRRS appears unaffected by this change in network characteristics and consistently achieves the highest number of saved nodes as well as the highest maximin value, adapting its seed selection to the change in average degree.

Next, the impact of randomness in the graph was evaluated; for this, SG3 was generated. In SG3, \(90\%\) of the edges were established at random while still maintaining the homophilic preference. In Fig. 9, for the SG3 network, we observe that most methods tend to perform similarly, especially for larger values of k. The randomness in the graph encourages information to spread randomly and reach all communities, eventually leading to comparatively fairer outcomes for all methods.

The fourth synthetic network, SG4, is a graph in which the value of \(p_t\) is significantly smaller, in order to evaluate the impact of a lower clustering coefficient. In Fig. 9, for SG4, Degree and Parity seeding are better for lower k, which is due to the scale-free structure of the network: the highest-degree nodes reach many nodes and thereby prevent many infections. Despite this power of high-degree nodes in sparse, scale-free networks with a low clustering coefficient, FWRRS still performs better than Degree and Parity seeding, achieving a portion of saved nodes close to \(10\%\) higher. The low clustering coefficient of the graph causes information within communities to spread less strongly. This means that the positive information that previously reached the small communities easily now reaches fewer nodes in these groups. Achieving a higher portion of saved nodes in these communities requires more effort than merely reaching a community from highly connected nodes. In such cases, the FWRRS method is considerably fairer than the other methods, as it considers the maximin value explicitly in the objective function and guarantees that the counter-true information reaches all small affected communities.
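To make the idea of placing the maximin value directly in the objective more concrete, the sketch below shows a generic greedy selection whose score mixes the total number of saved nodes with the maximin term computed by the maximin_fairness helper sketched earlier. This is not the FWRRS algorithm; the estimate_saved oracle, the weight lam, and the linear combination are illustrative assumptions.

```python
def fairness_aware_greedy(candidates, k, estimate_saved, at_risk, community, lam=1.0):
    """Generic greedy seed selection with an explicit maximin term in the score.

    candidates     : set of candidate truth-campaigners
    estimate_saved : oracle mapping a seed set to the set of saved nodes
                     (e.g. via simulation); assumed to exist for this sketch
    """
    seeds = set()
    for _ in range(k):
        best, best_score = None, float("-inf")
        for c in set(candidates) - seeds:
            saved = estimate_saved(seeds | {c})
            fair = maximin_fairness(saved, at_risk, community)  # helper sketched above
            score = len(saved) + lam * len(at_risk) * fair
            if score > best_score:
                best, best_score = c, score
        if best is None:
            break
        seeds.add(best)
    return seeds
```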

Lastly, we create the SG5 network with low homophily. In this graph, the assortativity coefficient has a low value of 0.356. In Fig. 9, one can observe that the methods have very similar effectiveness on this network to that observed for the SG4 network. Since the number of inter-community edges increases in more heterophilic networks, the number of paths between communities also increases. This means that information spreads through paths other than the maximum-probability ones, leading to poor effectiveness of CMIA-O and BIOG as well as an unfair outcome. Explicit consideration of the fairness objective is then required, as FWRRS and Fair-CMIA-O achieve the highest fairness and efficiency for all k.
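For reference, the homophily-related statistics discussed here can be checked with standard networkx utilities, assuming the graph carries a node attribute "community" as in the generator sketched earlier; the exact statistic reported in Table 2 may differ from either of these.

```python
import networkx as nx

# community mixing (attribute assortativity) and degree mixing
attr_r = nx.attribute_assortativity_coefficient(G, "community")
deg_r = nx.degree_assortativity_coefficient(G)
# fraction of edges that cross community boundaries (grows as homophily drops)
inter = sum(1 for u, v in G.edges
            if G.nodes[u]["community"] != G.nodes[v]["community"]) / G.number_of_edges()
print(f"community assortativity={attr_r:.3f}, "
      f"degree assortativity={deg_r:.3f}, inter-community edge fraction={inter:.3f}")
```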

In summary, this evaluation allowed us to observe the impact of network characteristics on the fairness and performance of the methods. A high average degree and the reach of the highest-degree nodes are crucial for methods such as Degree and Parity seeding to achieve good results. For the CMIA-O method, the opposite is required, as it benefits from fake news spreading through a small number of paths. We can also conclude that the TIB-Solver method does not perform as well as claimed in most cases; it only works when information spreads strongly through the network, as under the uniform probabilities of the COICM propagation model described in Fig. 6. Overall, we show the usefulness of the proposed method FWRRS, as it consistently achieves a higher percentage of saved nodes as well as the highest fairness. It achieves such a successful outcome for the intervention while being the fairest in all the considered scenarios.

7.5 Running time analysis

The scalability of the different methods included in our analysis is evaluated on a family of synthetic networks generated using the HICH-BA model. The generated networks increase tenfold in size at each step, containing 1,000, 10,000, 100,000, and 1,000,000 nodes, with an average degree of 10. All these graphs consist of five communities and maintain a high clustering coefficient and a high homophily value, with \(p_{PA}=0.9\) and \(h=0.9\). In the experiment, 50 randomly selected nodes were designated as rumor starters, and each method then selected 100 truth-campaigners.
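A hedged sketch of this timing protocol is shown below. Here homophilic_pa_graph is the simplified generator from earlier (which neither matches the average degree of 10 used in the paper nor scales efficiently to the largest sizes), select_truth_campaigners stands in for any of the compared methods, and the equal community split and the p_t value are assumptions made only for this example.

```python
import random
import time


def benchmark(select_truth_campaigners,
              sizes=(1_000, 10_000, 100_000, 1_000_000),
              n_rumor=50, k=100, seed=0):
    """Measure wall-clock seed-selection time on increasingly large networks."""
    rng = random.Random(seed)
    timings = {}
    for n in sizes:
        community_sizes = [n // 5] * 5                 # five equal communities (assumption)
        G = homophilic_pa_graph(n, community_sizes, h=0.9, p_t=0.8, p_PA=0.9)
        rumor_starters = rng.sample(list(G.nodes), n_rumor)
        start = time.perf_counter()
        select_truth_campaigners(G, rumor_starters, k)  # method under test
        timings[n] = time.perf_counter() - start
    return timings
```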

The running time of all methods is shown in Fig. 11. We observe that the proposed FWRRS method performs similarly to CMIA-O and its fair version, which are four orders of magnitude faster than the greedy algorithms; it is still slower than BIOG and the heuristic algorithms Degree and Parity seeding. Further, we observe that FWRRS has a sub-linear relationship with the size of the network, similar to CMIA-O and its fair version [94]. This is because the FWRRS method does not consider all nodes in the network, but only those estimated to be reached by the rumor. The TIB method is significantly affected by the increase in network size and did not complete on the 1M-node network within 24 hours.

Fig. 11
figure 11

Scalability of different methods on synthetic networks, where the x-axis is the number of nodes and the y-axis is the running time in seconds

8 Conclusion

In this paper, we propose a fairness-aware method, called FWRRS, for choosing truth-campaigners so as to minimize the influence of fake news spreaders in the network. The proposed method first identifies the nodes that are highly likely to be infected by the fake news and then uses weighted reversible reachable trees to identify the most influential truth-campaigners by maximizing a fairness-aware optimization function. We compared the performance and fairness of the proposed method with fairness-oblivious and fairness-aware baselines on real-world and synthetic networks under various scenarios, and observed that the proposed method provides state-of-the-art accuracy and the highest fairness values. The experimental analysis showed that existing methods are unfair, as they carry the structural inequalities of the network into the outcome, and that their performance is inconsistent and strongly dependent on the characteristics of the network. In contrast, the proposed method FWRRS provides consistent results for graphs with different characteristics and performs best overall in terms of both fairness and total saved nodes. The methods were also evaluated on controlled synthetic networks, showing under which circumstances each model performs best. This evaluation further confirmed the instability of the baseline methods and showed that FWRRS consistently performs best regardless of the network characteristics. The results showed that fairness does not always come at a cost; in fact, in most cases it is beneficial for the effectiveness of the mitigation campaign. We observed that enforcing fairness, as in the FWRRS method, might catalyze efficiency by exploiting the community structure of social networks.

To the best of our knowledge, this is the first work that addresses fairness in influence blocking, and it should motivate further research in this area by showing its benefits beyond moral considerations. However, this work has some limitations: it assumes that the network structure is known, and future work could design methods for dynamic networks or for settings where the entire network structure is not available. Besides this, we have studied the problem using the COICM model, where the probabilities of sharing fake information and its counter-true information are the same. In the real world this might not be the case, and it will be interesting to study fairness-aware mitigation under different propagation models, such as the shortest path model [42], the penta-level spreading model [29, 77], the trust-based latency-aware independent cascade model [59], the conformity-aware cascade model [50], the continuous-time Markov chain independent cascade model [100], and the Linear Threshold model [40]. In real life, some users might hold strong beliefs in certain opinions and will therefore be more biased towards sharing a particular type of information [4]; fairness-aware fake news mitigation for highly polarized networks is thus another interesting open direction. In the future, we would also like to work on algorithmic fairness for time-constrained fake news propagation, where it is crucial to mitigate the impact within a given time limit. One could further aim to propose a feature-blind fair influence-blocking method for settings where the whole network structure is not known or where applying a community detection method is infeasible.