1 Introduction

Epistemic polarization arises when a population’s beliefs about some hypothesis grow further apart. This is sometimes operationalized as an increase in the spread or dispersion of the belief across the population (for example, see Bramson et al., 2017; DiMaggio et al., 1996; Freeborn, 2023, 2024a, b; Madsen et al., 2018; Pallavicini et al., 2021). For example, suppose that most of a population are very unsure about the safety of vaccines. If this belief polarizes, then more people might become very sure that vaccines are safe, more people might become very sure that vaccines are unsafe, and fewer people may be left highly unsure.

However, we are often interested in agents who hold many different beliefs, and in how those beliefs might be related. For instance, different polarized beliefs might also become more closely correlated. Epistemic factionalization arises when multiple, different beliefs become correlated across a population of agents (see Bramson et al., 2017; Kawakatsu et al., 2021; Levin et al., 2021; Weatherall & O’Connor, 2021). For example, suppose that some population’s beliefs about vaccination efficacy and anthropogenic climate change have both polarized. However, perhaps the same people who are skeptical about vaccine efficacy also tend to be skeptical about anthropogenic climate change, whilst those who strongly believe that vaccines are effective also tend to believe in anthropogenic climate change. Then, if I know that someone is highly skeptical about anthropogenic climate change, this gives me some evidence that they are also skeptical of vaccines. This would be a case of factionalization.

Perhaps such factionalization could be driven by the relationships between different beliefs. Consider the proposed correlation between skepticism about anthropogenic climate change and skepticism about vaccines. At first glance, these might seem like unrelated beliefs, pertaining to two very different fields, climate science and medicine. However, these beliefs might be related by an underlying belief, perhaps regarding the trustworthiness of scientists or scientific institutions. If someone regards scientific institutions as generally reliable, this could drive them to accept scientific results about both anthropogenic climate change and vaccines. On the other hand, if someone regards scientific institutions as generally unreliable, this could drive skepticism about both anthropogenic climate change and vaccines.

Previous research has already shown how underlying background beliefs can drive rational polarization of individual beliefs (see Freeborn, 2023, 2024a, b; Jern et al., 2014). In this paper, I demonstrate how factionalization can arise even for populations of ideally rational agents who have probabilistic relations between their beliefs.

To do this, I will assume that the agents are as similar as possible, sharing the same probabilistic relationships between their beliefs, and updating on the same evidence, differing only in their initial degrees of belief about various hypotheses. I show how patterns of factionalization spontaneously emerge due to the probabilistic relations between beliefs themselves. One can think of this model as explicating one particular kind of factionalization—arising due to certain underlying background beliefs, worldviews or ideologies shaping how the agents’ beliefs evolve in the light of new evidence.

The paper is structured as follows. In Sect. 2, I outline a general model for representing a population of agents with multiple beliefs, which could undergo factionalization. I also outline some of the formalism that I will use throughout the rest of the paper. In Sect. 3, I suggest three different approaches for operationalizing “factionalization”, “convergence” and “general divergence” within this model. In Sect. 4, I present three simple examples of belief networks, one that leads to convergence and two that lead to factionalization. I explain whether and how convergence, polarization and factionalization arise in each case. In Sect. 5, I explain why factionalization must arise when agents’ overall beliefs polarize: general divergence never arises.

2 General model

To talk about factionalization more concretely, it will help to have a basic model of a population in mind. This model will include only certain minimal necessary features for factionalization to emerge. My aim is to distill one particular form of factionalization that emerges due to the relationships between beliefs.

This model is highly idealized, but it will be helpful to have a concrete real-world picture in mind. The model might represent a population accumulating exactly the same evidence about some particular hypotheses, and updating their beliefs about many other hypotheses on this basis. For instance, we might imagine a subset of the general public reading a series of newspaper articles about a particular Covid-19 vaccine. From this evidence, each population member might update many other (more or less closely related) beliefs: about the efficacy of vaccines in general, about the reliability of scientists, or about anthropogenic climate change, and so forth.

I assume a finite population of agents. I assume that there is a set of hypotheses or propositions describing the world or some system within it, each of which can be true or false, represented by discrete, binary random variables. Each agent holds a degree of belief, a probability, about each hypothesis. The agents can have conditional probabilities relating pairs of different beliefs. However, I assume that all the agents agree about each of the conditional relations between beliefs: any disagreement comes down to disagreements about the hypotheses themselves.

To represent relations between beliefs, I use the formalism of Bayesian networks (see Sect. 2.1). A Bayesian network specifies a set of variables, representing hypotheses or propositions, and the conditional relationships between variables. Implicit in this model is that the agents are rational: all of their beliefs must be probabilistically consistent at each time, and upon learning any evidence, their beliefs are updated in a dynamically coherent way.

2.1 Formalism of Bayesian networks

More formally, a Bayesian network is a graphical model that aims to capture some subset of the independence relationships given by a joint probability distribution (Pearl, 2009). Let \(\mathcal {X} = \{ X_1, X_2,\ldots X_N \}\) be a set of N random variables, defined on a probability space. Then, a joint probability distribution \(P(X_1, X_2, \ldots X_N)\) gives the probability that each of \(X_1,X_2, \ldots X_N\) falls within some range or a discrete set of values specified for that variable. A factorization of a joint probability distribution makes a choice about how variables depend upon others. Given some particular ordering of variables 1 to N, a factorized representation \(P(X_1, X_2 \ldots X_N)\) takes the form,

$$\begin{aligned} P(X_1, \ldots X_N)&= P(X_1\mid X_2, \ldots , X_N) \times P(X_2\mid X_3, \ldots , X_N) \ldots P(X_N). \end{aligned}$$
(1)
$$\begin{aligned} P(X_1, \ldots X_N)&= \prod _{i=1}^N P(X_i \mid X_{i+1} , \ldots X_{N} ). \end{aligned}$$
(2)

Each of the N! factorizations of a joint probability distribution will correspond to a different Bayesian network. Let \(\mathcal {G} = (\mathcal {V},\mathcal {D})\) be a directed, acyclic graph, where \(\mathcal {V}\) is a set of vertices (or “nodes”), and \(\mathcal {D}\) is a set of directed edges, pointing from one vertex to another. In a directed, acyclic graph, these directed edges can never form a closed cycle. Nodes are associated with unique variables, and edges represent the conditional relations between different variables. A directed edge \((X_a, X_b)\) exists in the network if \(X_a\) appears in the conditioning set of the factor \(P(X_b \mid \ldots )\) in the joint probability distribution. If there is a directed edge from node A to node B, we call A the “parent” and B the “child”. Bayesian networks encode a series of local Markov independence assumptions. If the joint probability distribution factorizes with respect to a directed graph \(\mathcal {G}\), then each variable in the joint probability distribution, associated with some node in the graph, is probabilistically independent of its non-descendants, given its parents (Geiger & Pearl, 1993; Pearl, 2009). So, we can fully specify a Bayesian network by a set of nodes, \(\mathcal {V}\), directed edges, \(\mathcal {D}\), and random variables, \(\mathcal {X}\), where there is a 1–1 map between the random variables and the nodes (I will often use the two interchangeably), together with conditional probability distributions \(P(X_i\mid X_{\text {par}_i})\), where \(X_{\text {par}_i}\) are the variables associated with the parents of \(X_i\).
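To make this specification concrete, the following is a minimal sketch in Python of a Bayesian network as just described: a parent map plus conditional probability tables (CPTs), with the joint distribution recovered via the factorization above. The structure anticipates Example 2 of Sect. 4; all names and numbers are illustrative assumptions, not values from the paper.

```python
from itertools import product

# Structure of Example 2 below: H1 -> H2 <- H3, with H1 and H3 exogenous.
parents = {"H1": (), "H3": (), "H2": ("H1", "H3")}

# CPTs: P(node = True | parent assignment); exogenous entries are priors.
# All numbers are illustrative.
cpt = {
    "H1": {(): 0.5},
    "H3": {(): 0.5},
    "H2": {(True, True): 0.9, (True, False): 0.1,
           (False, True): 0.1, (False, False): 0.9},
}

def joint(assignment):
    """P(assignment), by multiplying the factors P(X_i | parents(X_i))."""
    p = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p

# Sanity check: the joint sums to 1 over all 2^3 assignments.
total = sum(joint(dict(zip(parents, vals)))
            for vals in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-12
```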

Bayesian networks can be updated on new evidence using upwards and downwards propagation procedures, such that the updated Bayesian network remains consistent with the axioms of probability theory. Downwards propagation involves a simple application of the specified conditional probabilities, whilst upwards propagation involves a Bayesian inference procedure. In practice this requires a particular algorithm; in this case I use successive variable elimination (see Darwiche, 2009 for a comprehensive overview). Successive updating makes use of the rigidity assumption: conditional probabilities of the form \(P(X_i\mid X_j)\) do not change when \(X_j\) is updated (see Bradley, 2005; Diaconis & Zabell, 1982; Jeffrey, 1983). The belief propagation process is governed by probability functions for each node, which take as input the possible values of the parent nodes, and give as output the probability, or probability distribution, of the variable associated with the node.
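Upwards propagation can be illustrated with a brute-force stand-in for the variable-elimination algorithm used here: summing the joint over all assignments consistent with the evidence. This sketch is my own and is only practical for tiny networks; variable elimination computes the same posteriors more efficiently.

```python
from itertools import product

# Same illustrative network as in the previous sketch, different priors.
parents = {"H1": (), "H3": (), "H2": ("H1", "H3")}
cpt = {"H1": {(): 0.6}, "H3": {(): 0.7},
       "H2": {(True, True): 0.9, (True, False): 0.1,
              (False, True): 0.1, (False, False): 0.9}}

def joint(a):
    p = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(a[q] for q in pars)]
        p *= p_true if a[node] else 1.0 - p_true
    return p

def posterior(query, evidence):
    """P(query = True | evidence), by summing the joint (upwards propagation)."""
    num = den = 0.0
    for vals in product([True, False], repeat=len(parents)):
        a = dict(zip(parents, vals))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        den += joint(a)
        if a[query]:
            num += joint(a)
    return num / den

print(posterior("H1", {"H2": True}))  # belief in H1 after observing H2
```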

2.2 Specification of the evidence

In this model, the agents update their beliefs based on accumulating evidence over time. So, I assume that the agents begin at some timestep 0, and the population evolves through T discrete timesteps. All agents receive the same evidence at each timestep, and then update all of the beliefs in their belief networks on the basis of this evidence. I will assume that all the evidence, at every timestep, pertains to just one single belief, corresponding to one single node: call it the “data node”. However, the effects of updating this single belief will propagate through the network to other beliefs.

In order to explore the evolution of beliefs over time, I will look at successive updating on uncertain evidence. Rather than the evidence determining that one of the hypotheses is definitely true or false (with probability 1 or 0), I will specify this as fixed likelihood evidence.

What does it mean for agents to receive the same likelihood evidence? In this case, I will represent that as receiving evidence with the same likelihood ratio. Following Mrad et al. (2015), I define likelihood evidence \(\eta \) on a variable H of a Bayesian network as evidence given by a likelihood ratio,

$$\begin{aligned} L(H=h_1) : \ldots : L(H=h_n) = P(\eta \mid H=h_1) : \ldots : P(\eta \mid H=h_n) , \end{aligned}$$
(3)

where the \(L(H = h_i)\) are likelihoods, representing the probability of the observed evidence, given that H is in the state \(h_i\). This is a natural standard of “sameness” of evidence for several reasons. First, it allows the updating procedure to be commutative (see Field, 1978; Huttegger, 2015; Jeffrey, 1988; Wagner, 2002 for a philosophical discussion; see also Diaconis & Zabell, 1982; Mrad et al., 2015 for some mathematical considerations about the explication of uncertain evidence relevant to Bayesian networks). Second, the same likelihood evidence of this kind can also be thought of as exactly the same hard “virtual evidence” in an augmented Bayesian network (Chan & Darwiche, 2005; Jacobs, 2018; Pearl, 1988).
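For a single binary node, updating on likelihood evidence of this kind reduces to multiplying the prior odds by the likelihood ratio and renormalizing. A minimal sketch (the function is my own illustration; the ratio 0.65 matches the value used in the simulations of Sect. 4):

```python
def update_on_likelihood(prior_true, ratio):
    """P'(H = True) after evidence with L(H = True) : L(H = False) = ratio : 1."""
    weighted = ratio * prior_true
    return weighted / (weighted + (1.0 - prior_true))

p = 0.5
for _ in range(20):                    # 20 identical datapoints, as in Sect. 4
    p = update_on_likelihood(p, 0.65)
print(p)                               # ~1.8e-4: this ratio favors H = False
```

Because each datapoint multiplies the odds by the same factor, the order of the datapoints never matters, which is the commutativity point above.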

2.3 Agreement between agents

Summarizing, I assume that the agents agree about almost everything.

  • The agents will form beliefs about the same set of propositions, \(\mathcal {X}\).

  • The agents will agree about which beliefs are dependent or independent of others (i.e. the agents will share the same belief network structure \(\mathcal {G}\)).

  • The agents will agree about the conditional relations between beliefs (i.e. the agents will share the same conditional probability distributions between parent and child beliefs).

  • Each agent will receive the same likelihood evidence \(\eta _t\), at each timestep t.

The agents will only disagree about one thing: the initial probabilities that they assign to each proposition. Given the Bayesian network structure, and the rationality constraints on the agents, this disagreement can be entirely summarized by their beliefs about the exogenous variables: those with no parents. Beliefs about these variables are in some sense prior to other beliefs: we can imagine them as basic background beliefs held by the agents. Any polarization or factionalization that arises must be driven entirely by disagreements about those exogenous variables. I will assume that the exogenous beliefs of our population are drawn at random (more precisely, that the degrees of belief are drawn from a uniform distribution between 0 and 1). As such, the exogenous variables will be statistically independent of each other, at least at the initial timestep, \(t_0\).
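As a sketch, the population setup amounts to sampling only the exogenous priors; everything else is shared. The node names follow Example 2 of Sect. 4, and the population size is an arbitrary choice.

```python
import random

N_AGENTS = 40                    # arbitrary population size
EXOGENOUS = ("H1", "H3")         # the parentless nodes of Example 2 below

# Each agent: a uniform random degree of belief for each exogenous node.
# At timestep 0 these beliefs are statistically independent across agents.
population = [{h: random.random() for h in EXOGENOUS}
              for _ in range(N_AGENTS)]
```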

2.4 Limitations of the model

This idealized model is not intended to fully capture the complexity of real-world factionalization, which is likely to arise from multiple factors. A sophisticated understanding of real-world factionalization should also consider other potential sources, which may include social trust, political alliance-building or underlying psychological attitudes (for example, see Lakoff, 2010; Weatherall & O’Connor, 2021). None of these play a role in the model presented here.

However, this model may still provide insight into one plausible mechanism that drives factionalization. It seems likely that the principles driving factionalization in this idealized model could also be at work within more multifaceted models that better represent the complexities of real-world factionalization.

Furthermore, this model does demonstrate how epistemic factionalization, a phenomenon that one might intuitively suppose to be a result of “irrationality”, can arise for a population of rational agents, who are all updating on the same evidence in highly idealized circumstances. This insight challenges the notion that factionalization is solely a product of cognitive biases or misinformation, suggesting instead that it can be a natural outcome of rational interrelations among beliefs. Therefore, addressing factionalization is not as straightforward as correcting cognitive biases or rectifying skewed information sources; it demands a deeper understanding of the inherent dynamics between beliefs.

2.5 Related models

With this model in hand, it is worth considering how it relates to, and differs from, certain other models. Weatherall and O’Connor (2021) demonstrate how factionalization can arise in networks of agents. These agents adopt a heuristic for evaluating the reliability of evidence—they discount evidence from other agents as a function of the overall differences between their beliefs. This model deliberately avoids appealing to background beliefs, worldviews or ideologies. Indeed, each of the agents’ beliefs is assumed to be independent (except insofar as they depend on the agents’ beliefs about other agents). Nonetheless, the beliefs systematically become correlated as the population updates its beliefs. As such, they explicate a form of factionalization that emerges solely “from trust grounded in shared belief”.

The approach taken here is importantly different: the factionalization does not arise from network effects or social trust between agents. Indeed, in the model presented here, all agents have access to exactly the same evidence. Rather, it arises from the relationships between the agents’ beliefs. As such, whilst Weatherall and O’Connor (2021) treat beliefs as independent, in the model presented here the beliefs are explicitly probabilistically related.

Grim et al. (2022a) also create a model with some similarities to the one presented in this paper. In their model, individual agents with multiple, probabilistically related beliefs exhibit patterns of stable beliefs and punctuated equilibria, which they suggest might resemble patterns of paradigm shifts. However, these equilibria arise under different conditions, and by a different mechanism from the factions that I study in this paper. In the Grim et al. (2022a) model, agents receive an “evidence barrage” of continually surprising evidence, of different likelihoods. As such, this does not represent a “learning scenario” (see Huttegger, 2015) in which the agents cumulatively learn the state of the world. Stable belief patterns arise when the agents’ credences become resistant to change as a result of nearing either 0 or 1. By contrast, I will study a population of many agents who receive an increasing (but incomplete) set of information about the world. Most of the time, most of the agents’ credences never become close to 0 or 1.

3 Convergence, polarization and factionalization

Recall the model from Sect. 2. What should we expect to happen to the population’s beliefs as they update on successive datapoints? We might distinguish three ways in which the population’s beliefs could evolve: convergence, general divergence and factionalization. In this section, I will suggest three different ways to explicate convergence, general divergence and factionalization within this model.

3.1 Intuitive idea

To begin with, let us consider an informal first pass, meant to capture the intuitive ideas of convergence, general divergence and factionalization. We can understand these possibilities as follows.

  • Convergence The beliefs of the population members will grow closer together as they gain evidence.

  • General Divergence The beliefs of the population members will grow further apart in all directions as they gain evidence.

  • Factionalization The beliefs of the population members spread out, but not uniformly. Instead, different beliefs become more correlated.

Convergence would be perhaps the least surprising of these possible outcomes. After all, it is well known that Bayesian agents will often converge when they update on the same information (as indicated by the famous results of Blackwell & Dubins, 1962; Huttegger, 2015; Nielsen, 2018; Schervish & Seidenfeld, 1990; see Freeborn, 2024b for a discussion of these results in the context of agents with a Bayesian belief network). However, Bayesian agents can also polarize in single beliefs when they update on evidence (see Freeborn, 2024a; Jern et al., 2014). General divergence and factionalization would be more surprising outcomes: in some sense the agents would be polarizing not just in one belief, but in their overall beliefs.

I will suggest some more precise definitions in Sects. 3.2 and 3.3, but it will be useful to keep this intuitive picture in mind. I represent an example of each of these cases for an imaginary population in Fig. 1.

Fig. 1: A schematic representation of an imaginary population of 60 agents, with two different beliefs, 1 and 2, represented by probabilities. The beliefs are shown at a starting timestep, and three hypothetical evolutions of this population at a later timestep. a A starting distribution of beliefs for the population. b A possible evolution from (a) in which both beliefs have grown closer together. This is a case of convergence. c A possible evolution from (a) in which both beliefs have grown apart. This is a case of general divergence. d A possible evolution from (a) in which both beliefs have grown apart, but not uniformly: the two beliefs have become correlated. This is a case of factionalization.

3.2 Variance explication

We can use the statistical variance to measure how spread out a single belief is across the population. A high variance in a population’s beliefs about hypothesis X suggests that the agents’ beliefs are spread out, whilst a low variance suggests that the agents’ beliefs are closely clustered together. We can use the absolute covariance to give one measure of the degree to which one belief gives us information about another. If the absolute covariance between X and Y is large, then knowing an agent’s belief about X allows us to predict something about their belief in Y. We can define these quantities for our population as follows,

$$\begin{aligned} {\textbf {Variance}}{:} \,\, \sigma ^2_{X}&= \frac{1}{N} \sum _{i=1}^N (x_i - \mu _x )^2 \end{aligned}$$
(4)
$$\begin{aligned} {\textbf {Absolute Covariance}}{:}\,\, |\sigma _{X,Y} |&= \frac{1}{N} \sum _{i=1}^N |(x_i - \mu _x )(y_i - \mu _y ) |, \end{aligned}$$
(5)

where X and Y are binary random variables representing two propositions, \(x_i\) and \(y_i\) are the probabilities assigned to propositions X or Y being true by agent i, \(\mu _x\) and \(\mu _y\) are the corresponding average degrees of belief across the population, and \(\sigma _X\) and \(\sigma _Y\) are the corresponding standard deviations across the population.
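These quantities translate directly into code; a short sketch, with my own function names:

```python
def variance(xs):
    """Population variance of one belief across agents, per Eq. (4)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def abs_covariance(xs, ys):
    """Average absolute covariance of two beliefs across agents, per Eq. (5)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum(abs((x - mx) * (y - my)) for x, y in zip(xs, ys)) / len(xs)
```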

With this in hand, we can give a new explication of the concepts of convergence, general divergence and factionalization.

  • Convergence The average variance of the population’s beliefs decreases as the agents gain evidence.

  • General Divergence The average variance of the population’s beliefs increases, but the average absolute covariance decreases or remains the same, as the agents gain evidence.

  • Factionalization The average variance of the population’s beliefs increases, and the average absolute covariance also increases, as the agents gain evidence.

3.3 Information-theoretic explication

Finally, we are ready to develop a more general explication of convergence, general divergence and factionalization. To do this, we will deploy several concepts from information theory (see Appendix A for definitions and a brief discussion; see Cover and Thomas (2006) for further detail).

Suppose that we have two joint probability distributions with the same support, \(P(X_1, X_2, \ldots X_N)\) and \(Q(X_1, X_2, \ldots X_N)\). The Jensen–Shannon (JS) divergence \(D_{JS} (P \,\Vert\, Q)\) gives one natural way to measure the overall relatedness of two joint probability distributions. It is given by,

$$\begin{aligned} D_{JS} (P \,\Vert\, Q) = \frac{1}{2} D_{KL} \left( P \Biggm \Vert \frac{P+Q}{2} \right) + \frac{1}{2} D_{KL}\left( Q \Biggm \Vert \frac{P+Q}{2} \right) . \end{aligned}$$
(6)

where \(D_{KL}\) is the Kullback–Leibler divergence, given by,

$$\begin{aligned} D_{KL} (P \,\Vert\, Q) = \sum _{\begin{array}{c} x_1 \in \mathcal {X}_1, \\ {\ldots ,} \\ x_N \in \mathcal {X}_N \end{array}} P(x_1, \ldots x_N) \log \frac{P(x_1, \ldots x_N)}{Q(x_1, \ldots x_N)}. \end{aligned}$$
(7)

The Jensen–Shannon divergence effectively gives a symmetrized measure of the information shared between two such distributions. It has the advantage of measuring the overall information that one distribution gives us about another, whereas the absolute covariance is only sensitive to linear relations.

For each joint probability distribution, \(P(X_1, X_2, \ldots X_N)\), we can define a corresponding product of marginal probabilities, \(P^m = P(X_1) P(X_2) \ldots P(X_N)\). In effect, the marginal probabilities product tells us what the probability distribution of the random variables would be if they were all independent. If we regard each of the \(P(X_i)\) as telling us the agent’s credence about some salient hypothesis of interest, \(X_i\), then we could interpret the marginal probabilities product as telling us the agent’s credences about each individual salient hypothesis, whilst neglecting beliefs about how those salient hypotheses are related.

Suppose that our population of A agents holds the set of joint probability distributions, \(P_1, P_2, \ldots , P_A\), with corresponding marginal probabilities products, \(P^m_1, P^m_2, \ldots , P^m_A\). Then the average JS divergence between the joint distributions across the population, \(\langle D^\text {joint}_{JS}\rangle \), gives one way to measure the overall relatedness of the joint probability distributions. On the other hand, the average JS divergence between the marginal probabilities products across the population, \(\langle D^\text {marginal}_{JS}\rangle \), gives one way to measure the overall closeness of the agents’ beliefs about the propositions, ignoring any correlations between these beliefs.
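The measures above can be computed directly if each agent’s joint distribution is represented as a mapping from outcome tuples to probabilities. A sketch, with all helper names my own:

```python
from math import log
from itertools import product, combinations

def kl(p, q):
    """Kullback-Leibler divergence, per Eq. (7)."""
    return sum(pv * log(pv / q[x]) for x, pv in p.items() if pv > 0)

def js(p, q):
    """Jensen-Shannon divergence, per Eq. (6)."""
    m = {x: 0.5 * (p[x] + q[x]) for x in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def marginal_product(joint_dist, n):
    """The product of marginals P(X_1)...P(X_n) of an n-variable joint."""
    marg = [sum(pv for x, pv in joint_dist.items() if x[i]) for i in range(n)]
    out = {}
    for x in product([True, False], repeat=n):
        pv = 1.0
        for i, xi in enumerate(x):
            pv *= marg[i] if xi else 1.0 - marg[i]
        out[x] = pv
    return out

def avg_pairwise_js(dists):
    """<D_JS>: the average JS divergence over all pairs of agents."""
    pairs = list(combinations(dists, 2))
    return sum(js(p, q) for p, q in pairs) / len(pairs)

# Usage, given each agent's joint distribution over n variables in `joints`:
#   <D_joint>    = avg_pairwise_js(joints)
#   <D_marginal> = avg_pairwise_js([marginal_product(j, n) for j in joints])
```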

Now we have the tools in place for a plausible information-theoretic explication of convergence, general divergence and factionalization.

  • Convergence \(\langle D^\text {marginal}_{JS}\rangle \) decreases as the agents gain evidence.

  • General Divergence \(\langle D^\text {marginal}_{JS}\rangle \) increases and \(\langle D^\text {joint}_{JS}\rangle \) increases or stays the same as the agents gain evidence.

  • Factionalization \(\langle D^\text {marginal}_{JS}\rangle \) increases and \(\langle D^\text {joint}_{JS}\rangle \) decreases as the agents gain evidence.

Seen this way, there is one sense in which factionalization can be understood as a form of epistemic divergence, but another in which it can be thought of as a form of epistemic convergence. Factionalization is a form of divergence in the sense that the agents’ beliefs about the key, salient hypotheses grow further apart overall, \(\langle D^\text {marginal}_{JS}\rangle \) increases. However, it is a form of convergence, in the sense that, when the dependencies between beliefs are taken into account, the overall joint probability distributions grow closer together, \(\langle D^\text {joint}_{JS}\rangle \) decreases.

From here on, I will primarily use the information-theoretic approach, which has the advantage of being sensitive to any statistical relation between the variables across the population, linear or not. However, at times it will be convenient to consider the variances of variables and the covariances or correlations between variables.

4 Simple examples

To get a better grasp on convergence and factionalization, it will be helpful to investigate some relatively simple examples. These should allow us to see how an actual belief network might drive convergence or factionalization. I will not provide an example of general divergence, for reasons that I will explain in Sect. 5.

In each example, we will follow the model assumptions set out in Sect. 2. I will also simulate a randomly generated population in each case, and demonstrate how its beliefs evolve. In each case I will assume that the agents’ degrees of belief about the exogenous hypotheses are uniformly distributed between 0 and 1.

4.1 Example 1: Convergence

Let us suppose that agents have beliefs about two distinct hypotheses, \(H_1\) and \(H_2\), and agree that \(H_2\) probabilistically depends on \(H_1\) as in Fig. 2. However, the agents do not agree about the probabilities that they assign to the two hypotheses, \(H_1\) and \(H_2\): let us assume beliefs about \(H_1\) are uniformly distributed across the population. Perhaps \(H_1\) represents the proposition, “The air pressure is low today”, and \(H_2\) represents the proposition, “It will rain today”. All agree that learning that it is raining today (\(H_2\) is true) provides the same degree of evidence that the air pressure is low today (\(H_1\) is true), and vice versa. Therefore, we should not expect any polarization to take place.

If agents receive the same evidence, then their beliefs will all update in the same direction, as shown in Fig. 3. The variance in their beliefs about \(H_2\) will decrease, and this in turn may drive a decrease in the variance of their beliefs about \(H_1\). Overall, epistemic convergence takes place. The joint probability distributions, \(P(H_1)P(H_2 \mid H_1)\), and marginal probabilities products, \(P(H_1)P(H_2)\), will move closer together.
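A compact simulation sketch of this example. The conditional table below is an assumed stand-in for Fig. 2b (whose actual values are not reproduced in the text); the likelihood ratio and number of datapoints follow the caption of Fig. 3. Each datapoint is applied as a virtual-evidence reweighting of the joint, which respects the rigidity assumption of Sect. 2.1.

```python
import random
from itertools import product

P_H2_GIVEN_H1 = {True: 0.8, False: 0.2}    # assumed stand-in for Fig. 2b
RATIO, STEPS, N_AGENTS = 0.65, 20, 15

def initial_joint(p1):
    """P(H1, H2) = P(H1) P(H2 | H1) for an agent with prior p1 in H1."""
    return {(h1, h2): (p1 if h1 else 1 - p1)
            * (P_H2_GIVEN_H1[h1] if h2 else 1 - P_H2_GIVEN_H1[h1])
            for h1, h2 in product([True, False], repeat=2)}

def soft_update(j, ratio):
    """One datapoint: reweight by L(H2 = True) : L(H2 = False) = ratio : 1."""
    new = {x: p * (ratio if x[1] else 1.0) for x, p in j.items()}
    z = sum(new.values())
    return {x: p / z for x, p in new.items()}

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

random.seed(0)
priors = [random.random() for _ in range(N_AGENTS)]
agents = [initial_joint(p) for p in priors]
for _ in range(STEPS):
    agents = [soft_update(j, RATIO) for j in agents]

final_h1 = [sum(p for x, p in j.items() if x[0]) for j in agents]
print(variance(priors), variance(final_h1))  # spread in H1 typically shrinks
```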

4.2 Example 2: Factionalization

Now, let us allow the agents to have a slightly more complex network of beliefs, one that allows them to update particular beliefs in opposite directions. Let the population hold beliefs about three related hypotheses, \(H_1\), \(H_2\) and \(H_3\). It is already well known that Bayesian networks of this form can drive the polarization of individual beliefs (see Freeborn, 2023, 2024a, b; Jern et al., 2014 for similar examples).

Fig. 2: a A Bayesian network structure with two variables, corresponding to degrees of belief about hypotheses \(H_1\) and \(H_2\). I assume that all agents agree about this structure. b The conditional probabilistic relations between \(H_1\) and \(H_2\).

Fig. 3: Belief trajectories for a population of 15 agents, with regards to two related hypotheses, \(H_1\) and \(H_2\), as in Fig. 2b. The agents all update on 20 datapoints about \(H_2\), each with a likelihood ratio of 0.65. This drives all agents to update in the same, positive direction about \(H_1\). Arrows are indicative, showing only the directions in which the degrees of belief change.

Once again, suppose that the agents start with uniformly distributed degrees of belief between 0 and 1, now about each of the exogenous variables, \(H_1\) and \(H_3\). Suppose that all agents agree that these beliefs are related: \(H_2\) probabilistically depends on both \(H_1\) and \(H_3\) (as in Fig. 4). Perhaps \(H_1\) represents the proposition “The air pressure is low today”, \(H_3\) represents “My barometer will give the correct reading” and \(H_2\) represents “My barometer states that the air pressure is low today”. All agree about the same conditional relationships between these hypotheses. However, their different beliefs regarding \(H_3\) will partly determine how agents update their expectations about what the barometer will say. If I believe that the barometer is a systematically reliable instrument, then a low air pressure reading should increase my degree of belief that the air pressure really is low. On the other hand, if I believe the barometer systematically gives incorrect readings, then a low air pressure reading should decrease my degree of belief that the air pressure is low.

Fig. 4: a A Bayesian network structure with three variables, corresponding to degrees of belief about hypotheses \(H_1\), \(H_2\) and \(H_3\). I assume that all agents agree about this structure. b The conditional probabilistic relations between \(H_1\), \(H_2\) and \(H_3\).

Fig. 5: Belief trajectories for a population of 40 agents, with the belief network shown in Fig. 4. Only two beliefs, \(H_1\) and \(H_3\), are shown. The agents all update on 20 datapoints about \(H_2\), each with a likelihood ratio of 0.65. This drives the agents to polarize in their beliefs about \(H_1\) and \(H_3\). Observe that the agents’ beliefs about \(H_1\) and \(H_3\) become correlated as they coalesce into two clusters. Arrows are indicative, showing only the directions in which the degrees of belief change. Colors indicate whether the belief pair \((P(H_1 = \text {true}), P(H_2 = \text {true}))\) ends closest to (0,0) (blue) or (1,1) (orange) at the final timestep, as measured by the Euclidean distance. (Color figure online)

As before, all of the agents receive the same evidence about \(H_2\). Now the agents’ beliefs about \(H_1\) and \(H_3\) may be drawn in one of two different directions: either they increase their credence in \(H_1\) being true and decrease it in \(H_3\), or vice versa, as in Fig. 5. Different degrees of belief in \(H_3\) drive polarization of beliefs about \(H_1\), upon updating beliefs about \(H_2\). Likewise, different degrees of belief in \(H_1\) drive polarization of beliefs about \(H_3\). Indeed, the marginal probabilities products, \(P(H_1)P(H_2)P(H_3)\), may grow further apart. However, when we look at both beliefs, about \(H_1\) and \(H_3\) together, we see that beliefs which started out independent become correlated. As a result of these correlations, the joint probability distributions, \(P(H_1)P(H_3)P(H_2 \mid H_1, H_3)\), grow closer together. The population’s beliefs factionalize.
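A parallel simulation sketch for this example. Fig. 4b’s actual conditional tables are not reproduced in the text, so the table below encodes one natural reading of the barometer story (a reliable barometer tracks the truth; an unreliable one inverts it). The qualitative effect, initially independent beliefs becoming correlated, does not depend on these exact numbers, though which corners of the belief plane the factions occupy does.

```python
import random
from itertools import product

RATIO, STEPS, N_AGENTS = 0.65, 20, 40

def p_h2(h1, h3):
    """Assumed P(H2 = True | H1, H3): reliable tracks H1, unreliable inverts."""
    return (0.9 if h1 else 0.1) if h3 else (0.1 if h1 else 0.9)

def initial_joint(p1, p3):
    """P(H1, H3, H2) = P(H1) P(H3) P(H2 | H1, H3)."""
    j = {}
    for h1, h3, h2 in product([True, False], repeat=3):
        prior = (p1 if h1 else 1 - p1) * (p3 if h3 else 1 - p3)
        j[(h1, h3, h2)] = prior * (p_h2(h1, h3) if h2 else 1 - p_h2(h1, h3))
    return j

def soft_update(j, ratio):
    new = {x: p * (ratio if x[2] else 1.0) for x, p in j.items()}
    z = sum(new.values())
    return {x: p / z for x, p in new.items()}

random.seed(1)
agents = [initial_joint(random.random(), random.random())
          for _ in range(N_AGENTS)]
for _ in range(STEPS):
    agents = [soft_update(j, RATIO) for j in agents]

b1 = [sum(p for x, p in j.items() if x[0]) for j in agents]  # P(H1 = True)
b3 = [sum(p for x, p in j.items() if x[1]) for j in agents]  # P(H3 = True)
m1, m3 = sum(b1) / len(b1), sum(b3) / len(b3)
cov = sum((x - m1) * (y - m3) for x, y in zip(b1, b3)) / len(b1)
print(cov)  # far from zero: initially independent beliefs have correlated
```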

Why do the beliefs factionalize, rather than diverging in all directions, without correlations forming? One way to understand this is in terms of the independencies between the variables. Belief polarization arises here because the agents’ beliefs about \(H_1\) and \(H_3\) can each provide independent information about how to update the other, given some value of \(H_2\). As a result, unlike in the previous example, the correlations between variables can vary after updating \(H_2\). In fact, the correlations must vary if \(H_2\) is updated to a new value: given some agreed value of \(H_2\), knowing the beliefs about \(H_3\) provides new information about the beliefs about \(H_1\).

We can draw a more general lesson from examples like this. Whenever updating one variable across a population of Bayesian agents leads to the polarization of another variable, at least some fully or partly independent variables must experience changes in their correlations. In Appendix B, I explain why this is the case. This realization is very suggestive: if at least some variables must become more correlated, does polarization always lead to factionalization, rather than general divergence? I will return to this question in Sect. 5.

4.3 Example 3: Multiple factions

Let us augment the previous example once more, to see how this process can lead to the population dividing into many different factions, rather than just two. A simple way to do this is to add a second polarizing node.

Let the population hold beliefs about five related hypotheses, \(H_1\), \(H_2\), \(H_3\), \(H_4\), and \(H_5\). Suppose that all agents agree that these beliefs are related, with \(H_3\) depending on \(H_4\) and \(H_5\), and with \(H_2\) depending on \(H_1\) and \(H_3\), as in Fig. 6. Perhaps \(H_1\) represents the proposition “The air pressure is low today”, \(H_3\) represents “My barometer will give the correct reading”, \(H_2\) represents “My barometer states that the air pressure is low today”, \(H_4\) represents “The barometer is aneroid” and \(H_5\) represents “Aneroid barometers give systematically reliable results”. Now, different beliefs about \(H_5\) will drive polarization in \(H_4\) (and vice versa), given updated beliefs about \(H_3\). But the updated beliefs about \(H_3\) are themselves already polarized by the different beliefs about \(H_1\), given evidence about \(H_2\). As a result, rather than dividing into two factions as in the previous example, the beliefs about \(H_4\) and \(H_5\) now divide into four distinct factions, as shown in Fig. 7. In general, augmenting networks in this way, by adding more polarizing nodes, can increase the number of factions that may form.
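In the sketch notation used earlier, the augmented structure is just a larger parent map; it is the second collider that multiplies the number of possible factions:

```python
# Assumed parent map for Example 3 (cf. Fig. 6a).
parents = {
    "H1": (), "H4": (), "H5": (),    # exogenous background beliefs
    "H3": ("H4", "H5"),              # reliability, now itself explained
    "H2": ("H1", "H3"),              # the data node receiving all evidence
}
```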

Fig. 6: a A Bayesian network structure with five variables, corresponding to degrees of belief about hypotheses \(H_1\), \(H_2\), \(H_3\), \(H_4\) and \(H_5\). I assume that all agents agree about this structure. b The conditional probabilistic relations between \(H_1\), \(H_2\) and \(H_3\). c The conditional probabilistic relations between \(H_3\), \(H_4\) and \(H_5\).

Fig. 7: Belief trajectories for a population of 60 agents, with the belief network shown in Fig. 6. Only two beliefs, \(H_4\) and \(H_5\), are shown. The agents all update on 20 datapoints about \(H_2\), each with a likelihood ratio of 0.65. This drives the agents to polarize in their beliefs about \(H_1\), in turn leading to four-way factionalization in their beliefs about \(H_4\) and \(H_5\). Arrows are indicative, showing only the directions in which the degrees of belief change. Colors indicate whether the belief pair \((P(H_4 = \text {true}), P(H_5 = \text {true}))\) ends closest to (0,0) (blue), (0,1) (purple), (1,0) (green) or (1,1) (orange) at the final timestep, as measured by the Euclidean distance. (Color figure online)

5 Why do populations factionalize?

The examples in Sect. 4 illustrate how convergence and factionalization both arise, but not general divergence. In fact, given the definitions in Sect. 3.3, agents should never rationally expect their population to exhibit general divergence upon learning the value of some variable, under the assumptions of our general model, and assuming that they know the population is rational. We can state this as a general condition.

No General Divergence Condition

Suppose that we have two rational agents, with beliefs specified by joint probability distributions \(P(X, Y, \ldots Z, D)\) and \(Q(X, Y, \ldots Z, D)\) over the same set of discrete, binary variables, \(\mathcal {X} = \{ X, Y, \ldots D \}\). Let us suppose that the two agents share the same conditional relationships, \(P(Y \mid X) = Q(Y \mid X)\), for all \(X, Y \in \mathcal {X}\). Let us suppose that at least one agent is not certain about the value of D. Then, \(D_{JS}(P(X, Y, \ldots Z, D \mid D) \,\Vert\, Q(X, Y, \ldots Z, D \mid D)) < D_{JS}(P(X, Y, \ldots Z, D) \,\Vert\, Q(X, Y, \ldots Z, D))\).

Proof

From the Kullback–Leibler divergence chain rule (Eq. 18) and the non-negativity of the Kullback–Leibler divergence, it immediately follows that,

$$\begin{aligned} D_{KL}(P(X, Y, \ldots Z \mid D) \,\Vert\, Q(X, Y, \ldots Z \mid D)) < D_{KL}(P(X, Y, \ldots Z, D) \,\Vert\, Q(X, Y, \ldots Z, D)). \end{aligned}$$
(8)

Furthermore,

$$\begin{aligned} D_{KL}(P(X, Y, \ldots Z \mid D) \,\Vert\, Q(X, Y, \ldots Z \mid D)) = D_{KL}(P(X, Y, \ldots Z, D \mid D) \,\Vert\, Q(X, Y, \ldots Z, D \mid D)). \end{aligned}$$
(9)

Then,

$$\begin{aligned} D_{KL}(P(X, Y, \ldots Z, D \mid D) \,\Vert\, Q(X, Y, \ldots Z, D \mid D)) < D_{KL}(P(X, Y, \ldots Z, D) \,\Vert\, Q(X, Y, \ldots Z, D)). \end{aligned}$$
(10)

The result for Jensen–Shannon divergences follows immediately.

Therefore, if the agents’ overall beliefs grow further apart, then agents should always expect factionalization, not general divergence. We can understand this as a cumulativity of information condition. If all of the rational agents in some sense acquire the same information, then in some sense their beliefs should move closer together. This does not mean that beliefs cannot polarize. Rather, if polarization takes place across the agents’ beliefs in general (i.e. their beliefs about the salient hypotheses become more spread out, so that \(D^\text {marginal}_{JS}\) increases), then the beliefs across the population must factionalize, or become more correlated (i.e. their overall joint beliefs grow closer together, so that \(D^\text {joint}_{JS}\) must decrease). Whilst the population’s marginal beliefs about all the hypotheses individually can diverge, if we look at the joint probabilities, then the population’s beliefs must nonetheless grow closer together. Another way to think of this is that, in one sense, Bayesian learning is genuinely taking place in such a population. Alternatively, one might say that the population’s beliefs are becoming more orderly or predictable, even as the agents’ individual beliefs diverge.
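A numeric sanity check of this condition, as a sketch: two agents share the conditional P(D | X) but disagree about P(X), and conditioning both on the same value of the data node shrinks their Jensen–Shannon divergence in this instance. All numbers are assumptions, and a single instance is only an illustration of the general condition.

```python
from math import log
from itertools import product

def kl(p, q):
    return sum(pv * log(pv / q[x]) for x, pv in p.items() if pv > 0)

def js(p, q):
    m = {x: 0.5 * (p[x] + q[x]) for x in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

P_D_GIVEN_X = {True: 0.8, False: 0.3}       # shared conditional (assumed)

def joint(px):
    """P(X, D) = P(X) P(D | X) for an agent with prior px in X."""
    return {(x, d): (px if x else 1 - px)
            * (P_D_GIVEN_X[x] if d else 1 - P_D_GIVEN_X[x])
            for x, d in product([True, False], repeat=2)}

def condition_on(j, d_val):
    """The joint conditioned on D = d_val."""
    z = sum(p for x, p in j.items() if x[1] == d_val)
    return {x: (p / z if x[1] == d_val else 0.0) for x, p in j.items()}

P, Q = joint(0.9), joint(0.2)               # agents disagree only about X
before = js(P, Q)
after = js(condition_on(P, True), condition_on(Q, True))
assert after < before                       # divergence shrinks, as claimed
```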

Certain kinds of Bayesian belief polarization can only arise given certain structural or independence relations between the variables (see Appendix B). In fact, we can understand these as conditions on the dependence between variables: polarization can only take place if the salient variables are dependent in precisely such a way that they must become more generally correlated after polarization. In other words, they can be viewed as conditions that exclude general divergence but allow for factionalization, consistent with our cumulativity of information approach above. I discuss this further in Appendix C.

6 Conclusions

Epistemic factionalization arises very naturally, even for ideally rational agents, who update on exactly the same evidence. This factionalization is driven by probabilistic relations between different beliefs. Different background beliefs drive polarization when the agents update beliefs on the same evidence in different ways: the same evidence can cause some agents to increase their confidence, whilst others decrease theirs. However, this same process tends to lead to different beliefs becoming correlated across a population. Factions emerge, in which agents tend to hold not just one, but many similar beliefs. This process often, but not always, corresponds to the coalescence of distinct clusters of agents, who hold many very similar beliefs, different from the agents in other clusters.

This kind of factionalization is an epistemically rational process. Indeed, it arises precisely because the agents are all rationally learning from the same evidence. There are two perspectives through which we might view factionalization. From one perspective, factionalization might look like a kind of convergence, whereas from another viewpoint, factionalization might look like a particularly severe form of polarization. Fully understanding factionalization requires us to study the phenomenon stereoscopically, using both of these lenses.

In the first sense, factionalization corresponds to the agents’ beliefs genuinely moving closer together: the agents’ overall joint probability distributions become more similar, as measured by the Kullback–Leibler or Jensen–Shannon divergences. As a population factionalizes, the agents’ beliefs line up into two or more opposing camps, each of which agrees about many different beliefs. We can see factionalization as a process in which the population’s beliefs become more orderly or predictable, as correlations develop or strengthen between the different agents’ beliefs.

In the second sense, factionalization can be understood as a form of multi-belief polarization. The key is whether we consider the joint probability distributions or marginal probabilities products more relevant to the task at hand. If we are primarily concerned with the beliefs about the individual hypotheses themselves, then factionalization may represent a particularly severe kind of polarization. After all, factionalization indicates that the agents have grown further apart in their beliefs about each distinct hypothesis, even as their conditional probabilities may have grown closer together. Recall our original example, a population factionalizing over the issues of anthropogenic climate change and Covid-19 vaccines, perhaps driven by an underlying belief in the trustworthiness of scientists. If the agents grow apart on both of these issues, and their beliefs become more correlated, then this seems to correspond to a severe kind of polarization, even as the agents’ joint probabilities grow closer together.

Perhaps one way to put this is that a purely formal epistemologist might feel reassured by factionalization. After all, it is the factionalization process that allows a population’s overall beliefs (as represented by the joint probability distributions) to converge, even when individual beliefs are polarizing. By contrast, a social epistemologist or social scientist might find factionalization more concerning. After all, factionalization indicates that the population’s beliefs about each individual hypothesis are moving further apart, in such a way that the population is dividing into factions that disagree about not just one belief, but many.

Moreover, no matter how rational the process, this kind of regimentation of beliefs into distinct factions might often be problematic for real populations. For instance, it is well-known that trust tends to decrease between people with very different beliefs (Kitcher, 1995; Rogers, 1983). It is plausible that factionalization across many different beliefs might exacerbate the general problems with social epistemic polarization (Kawakatsu et al., 2021; Levin et al., 2021). In a real world population, processes mechanically similar to this might plausibly contribute towards populations dividing into distinct worldviews, ideologies or paradigms. The fact that the beliefs of agents in each such faction might be internally consistent may discourage convergence or learning from agents in other factions.

Ultimately, the model presented here explains only one kind of factionalization. A more complete model of social factionalization would need to include many other factors, including, but not limited to, the cognitive biases of agents, differential access to information between agents, and biased sources of information. However, the type of model studied here suggests that even fixing all such biases would not, in itself, be sufficient to eradicate factionalization.

As Freeborn (2024b) points out, this type of rational polarization could potentially be resolved with the right kind of evidence. If rational agents are able to acquire sufficient shared evidence to settle all of their beliefs, then such agents should expect their beliefs to merge. However, in practice, we do not generally have such complete evidence. Bridging the gap between such ideological factions could be challenging. The beliefs of each opposing faction are rationally held, and mutually self-supporting, on the basis of the same evidence. As a result, the epistemic factions that form in this way could be difficult to remove through a process of convergence. Simply acquiring more evidence pertaining to just one belief could plausibly drive further factionalization.