Reliability of testimonial norms in scientific communities


Several current debates in the epistemology of testimony are implicitly motivated by concerns about the reliability of rules for changing one’s beliefs in light of others’ claims. Call such rules testimonial norms (tns). To date, epistemologists have neither (i) characterized those features of communities that influence the reliability of tns, nor (ii) evaluated the reliability of tns as those features vary. These are the aims of this paper. I focus on scientific communities, where the transmission of highly specialized information is both ubiquitous and critically important. Employing a formal model of scientific inquiry, I argue that miscommunication and the “communicative structure” of science strongly influence the reliability of tns, where reliability is made precise in three ways.


Fig. 1


  1.

    The Lehrer–Wagner model entails that, ceteris paribus, greater weight ought to be assigned to beliefs that are held by many experts rather than a few. See Lehrer and Wagner (1981). In contrast, Goldman (2001) argues that, because experts' judgments might be highly correlated due to common information, agreement cannot always provide greater evidence for a hypothesis. Thus, Goldman claims that there are several heuristics that one might use to evaluate expert testimony.

  2.

    See Burge (1993), Coady (1973), and Foley (2005) for several different defenses of the non-reductionist position. See Adler (1994), Fricker and Cooper (1987), and Fricker (1994) for defenses of reductionism. Additional references are available in Lackey (2011).

  3.

    Bolded terms are defined in the Appendix, which also contains proofs of the theorems.

  4.

    In computer simulations, I assume there are finitely many pills \(1, 2, \ldots , n\). At each stage, a scientist treats a single patient with a pill \(i \le n\) and observes an outcome. Outcomes are normally distributed with unknown mean \(e_i\) (i.e. the effectiveness of the pill) and known variance \(\sigma _i^2\). The normality assumption is immaterial to the results below: similar results are obtained when the agents draw from other types of distributions. Moreover, the theorems do not depend upon assumptions concerning the probabilistic process by which data is generated; in particular, data need not be i.i.d.

  5.

    In simulations, scientists employ likelihood ratio tests (lrts) to test the null hypothesis \(e_i \ge 0\) versus the alternative \(e_i < 0\), where \(e_i\) is the effectiveness of pill \(i\). To make such tests convergent (specifically, s.a.s. convergent as defined in the Appendix), significance levels are adjusted downward with sample size. Again, the use of lrts, or frequentist rather than Bayesian methods, is inconsequential for the analytic results below. What matters is that scientists' methods are convergent.

  6.

    The notion of convergence employed here is what I call “strong almost-sure convergence” which is stronger than almost-sure convergence (and hence, convergence in probability) in general, but logically equivalent to almost-sure convergence when the partition of the parameter space is finite. See the Appendix for details.

  7.

    See the definitions of finite memory, stability, and sensitivity in the Appendix. Any realistic tn has finite memory, and though space prevents me from arguing so here, I believe stability and sensitivity are minimal normative requirements for any tn. Majoritarian Reidians do not converge because they can get caught in “echo chambers”: if a tightly-connected group of agents all have identical, false beliefs about the efficacy of some pill outside their area of expertise, then when each agent in the group polls her neighbors, she will find her opinion to be in the majority and stick to it. So each agent in the group holds some belief precisely because others in the group do. Because agents in the group poll all their neighbors, and not just the experts, it follows that the group’s beliefs cannot be penetrated by external information from experts. Majoritarian e-trusters fail to converge because they behave like majoritarian Reidians in the absence of expert neighbors.

  8.

    The code for the simulations can be found on the author’s website.

  9.

    Convergence times were compared using a one-way random effects analysis of variance. The raw data and relevant sample statistics are available on the author’s website.

  10.

    None of the results below rely on the fact that \(\epsilon \) is the same for all agents. This assumption is made for simplicity of calculations and proofs only.

  11.

    Statistical tests supporting these claims are summarized in the Appendix to the longer version of this paper.

  12.

    What I call insularity is often called homophily by social scientists. There is relatively little work about how homophily affects learning. An important exception is Golub and Jackson (2012), which develops a radically different model from the one presented here.

  13.

    Zollman (2011) finds a very similar trade-off between speed and reliability in a different model of scientific inquiry. This is evidence that the phenomenon (i.e., the trade-off) is robust under varying modeling assumptions.


  1. Adler, J. E. (1994). Testimony, trust, knowing. The Journal of Philosophy, 91(5), 264–275.


  2. Burge, T. (1993). Content preservation. The Philosophical Review, 102(4), 457–488.


  3. Coady, C. A. J. (1973). Testimony and observation. American Philosophical Quarterly, 10(2), 149–155.


  4. Foley, R. (2005). Universal intellectual trust. Episteme: A Journal of Social Epistemology, 2(1), 5–11.


  5. Fricker, E. M. (1994). Against gullibility. In B. K. Matilal & A. Chakrabarti (Eds.), Knowing from words (pp. 125–161). Dordrecht: Kluwer.

  6. Fricker, E. M., & Cooper, D. E. (1987). The epistemology of testimony. Proceedings of the Aristotelian Society, Supplementary Volumes, 61, 57–106.


  7. Goldman, A. I. (2001). Experts: Which ones should you trust? Philosophy and Phenomenological Research, 63(1), 85–110.


  8. Golub, B., & Jackson, M. O. (2012). How homophily affects the speed of learning and best-response dynamics. Quarterly Journal of Economics, 127(3), 1287–1338.


  9. Grinstead, C. M., & Snell, J. L. (1997). Introduction to probability. Providence, RI: American Mathematical Society.

  10. Lackey, J. (2011). Testimony: Acquiring knowledge from others. In A. I. Goldman & D. Whitcomb (Eds.), Social epistemology: Essential readings (pp. 314–337). Oxford: Oxford University Press.


  11. Lehrer, K., & Wagner, C. (1981). Rational consensus in science and society: A philosophical and mathematical study (Vol. 24). London: D. Reidel.

  12. Zollman, K. J. (2011). The communication structure of epistemic communities. In A. Goldman (Ed.), Social epistemology: Essential readings (pp. 338–350). Oxford: Oxford University Press.



This project was inspired by Kevin Zollman’s work on the communicative structure of scientific communities and, in particular, by a presentation he gave at the 2012 Pitt-CMU graduate conference on testimony. I thank Kevin both for his comments on earlier drafts of this paper and for intellectual inspiration. I am also indebted to Gerhard Schurz for suggesting that I investigate the effects of miscommunication within my model. Thanks to Jan Sprenger, David Danks, Clark Glymour, Kevin Kelly, and Michael Strevens, who all provided detailed comments, constructive criticism, and helpful suggestions for improving earlier drafts of this paper. Finally, the paper also benefited from comments by an anonymous referee and participants at the “Collective Dimensions of Science” conference in Nancy, France. This work was partially supported by the National Science Foundation under Grant No. SES 1026586. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

Author information



Corresponding author

Correspondence to Conor Mayo-Wilson.



General strategy

The theorems in the text are all consequences of well-known results about Markov processes. Here, I explain the general strategy of the proofs, as the notation below is rather cumbersome. To do so, I first informally sketch a proof of Theorem 1, which asserts that four of the tns in the body of the paper converge in the absence of miscommunication.

Consider a network of Reidians. Imagine one of the Reidians—let’s call her Jane—is deciding which of her neighbors to trust at time \(t\) concerning a question outside her area of expertise. If Jane has an expert neighbor, Jill, then there’s some chance that Jane will adopt Jill’s opinion. At time \(t+1\), there’s some chance that Jane’s neighbors will adopt Jane’s beliefs. Then at time \(t+2\), neighbors of neighbors of Jane may adopt Jane’s belief, and so on. In this way, there’s some chance that Jill’s expert opinion propagates through the entire network (if the network is connected), and this is true at any point in time. So there’s some chance that every agent outside of Jill’s area of expertise will eventually hold Jill’s belief.

Since experts eventually hold true beliefs in their area of expertise, this entails that all agents will eventually hold Jill’s true belief. Once everyone has the same, true belief, the Reidian norm ensures that everyone continues to believe it. Since this is true of every area of expertise, the network must converge. Similar arguments apply to e-trusters, proximitists, and majoritarian proximitists.
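The propagation argument can be simulated directly. The sketch below is a minimal illustration of my own (not the author’s simulation code): it places one expert on a small line network of Reidians and counts the rounds until her true belief has spread to everyone.

```python
import random

def reidian_round(beliefs, neighbors, expert, rng):
    """One round of the Reidian norm: each non-expert adopts the
    previous-round belief of a uniformly chosen neighbor; the expert's
    belief in her own area of expertise stays fixed (and true)."""
    old = beliefs[:]
    for i in range(len(beliefs)):
        if i != expert:
            beliefs[i] = old[rng.choice(neighbors[i])]
    return beliefs

def rounds_until_consensus(n_agents=4, max_rounds=100000, seed=1):
    """Line network 0-1-2-3; agent 0 is the expert holding the true
    belief. Returns the first round at which everyone believes truly."""
    neighbors = {i: [j for j in (i - 1, i, i + 1) if 0 <= j < n_agents]
                 for i in range(n_agents)}
    beliefs = [True] + [False] * (n_agents - 1)
    rng = random.Random(seed)
    for t in range(1, max_rounds + 1):
        beliefs = reidian_round(beliefs, neighbors, 0, rng)
        if all(beliefs):
            return t
    return None  # not absorbed within the cap (vanishingly unlikely)

t = rounds_until_consensus()
```

Because the all-true state is absorbing and reachable from every state, the loop terminates with probability one, mirroring the argument in the text.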

This argument is an instance of a more general argument concerning absorbing Markov processes. In general, a Markov process is one in which a system’s behavior depends only upon its most recent past. For example, consider a gambler’s total earnings. Suppose a gambler repeatedly makes \(\$20\) bets on a roulette wheel. Then the gambler’s total earnings at time \(t+1\) depend only upon her earnings at time \(t\) (and whether or not she wins her current bet); whether the gambler was nearly bankrupt or enormously wealthy at some stage prior to \(t\) is irrelevant to her total earnings at \(t+1\).

Some Markov processes have absorbing states, which means that the system reaches a state that it cannot leave. For example, suppose our gambler is very unlucky and loses all of her money at time \(t\). Then she cannot make any more bets, and so her total earnings equal zero from time \(t\) onward. For this reason, the state of having no money is called absorbing. Similarly for the state in which the gambler wins all of the casino’s money.

Under very weak conditions, a Markov process with absorbing states will eventually transition into one of its absorbing states and stay there forever. The crucial assumption is that there is some number \(n\) such that, no matter what state the Markov process is in, there is some positive probability that it will transition to an absorbing state in \(n\) steps. This guarantees that the probability that the process eventually transitions to an absorbing state is one. To prove that the tns in the paper are convergent, therefore, it suffices to show three assumptions are met: (1) agents’ beliefs behave like a Markov process, (2) the state in which they all have true beliefs is uniquely absorbing, and (3) there is some number \(n\) such that, for any state of the process, there is positive probability that agents will find themselves in the unique absorbing state (of all true beliefs) \(n\) stages in the future.
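The gambler’s-ruin example can be checked by simulation. The following sketch (my own illustration, with simplified fair unit bets rather than roulette) runs many betting walks and confirms that every run reaches one of the two absorbing states.

```python
import random

def gamblers_ruin(start, goal, cap, rng):
    """Fair unit bets: wealth performs a random walk on {0, ..., goal};
    0 (ruin) and goal (breaking the bank) are the absorbing states.
    Returns the step at which the walk is absorbed, or None if capped."""
    wealth = start
    for t in range(cap):
        if wealth in (0, goal):
            return t
        wealth += 1 if rng.random() < 0.5 else -1
    return None

rng = random.Random(42)
absorption_times = [gamblers_ruin(3, 5, 10**5, rng) for _ in range(1000)]
always_absorbed = all(t is not None for t in absorption_times)
```

Every run absorbs well before the cap, illustrating the claim that such a process transitions to an absorbing state with probability one.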

For the first assumption, note that all the tns in the body of the text have a common property: an agent’s beliefs at time \(t+1\) (outside her area of expertise) depend only upon what her neighbors believed at time \(t\). For example, a Reidian fixes her belief at time \(t+1\) by adopting a neighbor’s belief at time \(t\). So her neighbors’ beliefs prior to \(t\), no matter how steadfast or erratic, are completely irrelevant to what she believes now. Consider the vector of all agents’ beliefs about all pills under investigation. If all the agents employ tns like the ones in the text, therefore, their beliefs will behave like a Markov process, just like the gambler’s earnings.

The argument above is right in outline, but a tiny bit too fast. Recall, agents’ opinions at a given time \(t\) depend not only upon others’ opinions, but also upon data. According to my assumptions, agents can use methods to make inferences from any amount of data whatsoever, not just the set of most recent observations. So technically, agents’ beliefs do not form a Markov process. This is, however, easily fixed.

Recall, I assume that for every researcher in the network, there is some stage of inquiry at which she holds true beliefs in her area of expertise from that stage onward. Let \(E_n\) be the event that all researchers have converged to the truth in their respective domains by stage \(n\). The crucial observation is that, conditional on \(E_n\), the evolution of agents’ beliefs does form a Markov process. Why? Researchers’ beliefs in their areas of expertise are fixed from \(n\) onward by assumption, and by definition, the six tns make one’s beliefs outside of one’s area of expertise dependent only upon the last stage. Since experts’ beliefs converge to the truth with probability one, it follows that some \(E_n\) will occur with probability one, and so the evolution of agents’ beliefs will behave like a Markov process after enough time. This is what the two theorems concerning what I call pc-Markov processes say below. So much for the first assumption.

What about the second assumption? Notice that agents employing the tns in the text only change their beliefs if their neighbors disagree. So it looks like any state in which all agents have identical beliefs is an absorbing one. However, “all true beliefs” is the unique absorbing state because I assumed that experts eventually have true beliefs in their areas of expertise. In general, any tn satisfying what I call stability will be absorbed in such a way; stability says that if an agent’s neighbors unanimously believe \(\varphi \) for as long as the agent can remember, then the agent also believes \(\varphi \).

Finally, one needs to show that the network transitions to the absorbing state of “all true beliefs” with probability one. This is what the Jane and Jill example showed. For Reidians and e-trusters, there is some positive probability that experts’ opinions propagate through the network in at most \(L\) many steps, where \(L\) is the length of the longest path in the network. In general, an assumption that I call sensitivity guarantees that true expert opinions propagate through the network. Roughly, an agent’s tn is sensitive if, for every area of expertise, there is some positive probability that the agent adopts the belief of her neighbors that are closer to an expert in the area, regardless of what others in her neighborhood believe. Proximitism is sensitive because agents always trust their more proximate neighbors; Reidianism and e-trusting are sensitive because they trust all their neighbors with some positive probability. Majoritarian Reidianism and majoritarian e-trusting are not sensitive because neighbors who are closer to an expert can be outvoted.

In the presence of miscommunication, however, the “all true beliefs” state is no longer absorbing: an agent may misinterpret her neighbors and develop false beliefs, even if her neighbors’ beliefs are all true. Instead, agents’ beliefs evolve according to a regular Markov process (again, conditional on having converged in their respective areas of expertise). Regular Markov processes are, in a sense, the opposite of absorbing ones: there is some number of steps \(n\) such that the process can transition from any state to any other state in exactly \(n\) steps. No state is absorbing. It can be shown that, for each state \(s\) of a regular Markov process, the probability that the process is in state \(s\) approaches a fixed value as the number of steps grows.
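The limiting behavior of a regular Markov process can be seen by taking powers of its transition matrix. The sketch below uses a toy two-state chain of my own (not one from the paper) and shows the rows of \(P^n\) converging to a common limiting distribution.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-th power of a transition matrix by repeated multiplication."""
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

# A regular two-state chain: every entry of P is positive, so the chain
# can move between any pair of states in one step, and no state absorbs.
P = [[0.9, 0.1],
     [0.2, 0.8]]
P200 = mat_pow(P, 200)
# Both rows of P^200 approximate the same limiting distribution (2/3, 1/3),
# the unique solution of pi = pi P.
```

However the chain starts, the probability of being in a given state settles to a fixed value, exactly as the text describes.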

Why do agents’ beliefs evolve according to a regular Markov process in the presence of miscommunication? The six tns are open-minded in the sense that for any particular pill and any judgment about the pill (i.e. effective or not), there is some vector \(n\) such that if an agent thinks her neighbors’ beliefs are represented by \(n\), then she will adopt the judgment in question about the pill. Let \(b\) be any vector describing all agents’ beliefs outside their respective areas of expertise. Because there is some fixed, positive chance of miscommunication on any stage of inquiry, there is some positive probability that agents will hear exactly what they need to hear from their neighbors to form beliefs \(b\), regardless of what everyone in the network actually believes at the moment. So agents’ beliefs can always transition to \(b\) in exactly one step, which means the process is regular. Note, because \(b\) was arbitrary, it also follows that agents always have some positive chance of developing false beliefs, regardless of whether experts have converged to true ones.

This immediately entails that the error rate of the network approaches a fixed value, as Theorem 3 says. Let \(err_t(b)\) be the number of erroneous beliefs in the network if agents’ beliefs are represented by \(b\) and if the truth is \(t\). Because agents’ beliefs evolve according to a regular Markov process, the above-mentioned theorem entails there is some fixed probability \(p(b)\) that agents will have beliefs \(b\) in the limit. The error rate of the network is just the weighted average \(\sum _{b \in B} p(b) \cdot err_t(b)\), where \(B\) is the set of all possible belief vectors for the network.
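The weighted average above can be computed directly once the limiting probabilities are known. In the sketch below the limiting distribution is assumed, purely for illustration, to factor into independent per-agent probabilities of believing truly; the numbers are hypothetical, not outputs of the model.

```python
from itertools import product

def expected_errors(p_correct, truth=True):
    """The limiting error rate as the weighted average
    sum_b p(b) * err_t(b) over all belief vectors b. For illustration,
    p(b) is assumed to factor into independent per-agent limiting
    probabilities of holding the true belief."""
    n = len(p_correct)
    total = 0.0
    for b in product([True, False], repeat=n):
        p_b = 1.0
        for i, belief in enumerate(b):
            p_b *= p_correct[i] if belief == truth else 1.0 - p_correct[i]
        total += p_b * sum(belief != truth for belief in b)
    return total

rate = expected_errors([0.95, 0.9, 0.8])  # expect 0.05 + 0.1 + 0.2 = 0.35
```

Under the independence assumption the weighted average reduces to the sum of per-agent error probabilities, which makes the formula easy to sanity-check.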


Let \(S^T\) denote all functions from \(T\) to \(S\), and define \(S^{<\mathbb N }\) to be all finite sequences from \(S\). Let \(|S|\) denote the cardinality of \(S\); when \(S\) is a sequence, \(|S|\) is therefore its length. Given a sequence \(\sigma \) and \(n \le |\sigma |\), let \(\sigma _n\) denote the \(n^{th}\) coordinate of \(\sigma \). If the coordinates of \(\sigma \) are likewise sequences, then let \(\sigma _{n,k}\) be the \(k^{th}\) coordinate of the \(n^{th}\) coordinate of \(\sigma \). And so on. Let \(\sigma \upharpoonright n\) denote the initial segment of \(\sigma \) of length \(n\).

Given sets \(S_1, S_2, \ldots , S_n\), let \(\times _{j \le n} S_j\) be the Cartesian product. Given a collection of \(\sigma \)-algebras \(\langle S_i, \mathcal{S}_i \rangle _{i \in I}\), let \(\otimes _{i \in I} \mathcal{S}_i\) denote the product algebra. In particular, \(\otimes _{n \in \mathbb N } \mathcal{S}\) is the infinite product space generated by a single \(\sigma \)-algebra. Given a \(\sigma \)-algebra \(\mathcal{S}\), let \(\mathbb P (\mathcal{S})\) denote the set of all probability measures on \(\mathcal{S}\). If \(p \in \mathbb P (\mathcal{S})\), let \(p^n \in \mathbb P (\otimes _{k \le n} ~\mathcal{S})\) denote the product measure on \(\otimes _{k \le n} ~\mathcal{S}\). When \(\mathcal{S}\) is a Borel algebra, these measures extend uniquely to a measure \(p^{\infty }\) on \(\otimes _{n \in \mathbb N } \mathcal{S}\) (i.e., \(p^{\infty } \in \mathbb P (\otimes _{n \in \mathbb N } \mathcal{S})\) is the unique measure such that \(p^{\infty }(F_1 \times F_2 \times \cdots \times F_n \times S^\mathbb{N } ) = p(F_1)\cdot p(F_2) \cdots p(F_n)\), where \(F_i \in \mathcal{S}\) for all \(i \le n\)). Given a metric space \(M\), let \(\mathbb B (M)\) denote the Borel algebra.


The appendix assumes familiarity with Markov processes. For definitions of undefined terms below, see any introductory exposition of Markov processes; I will refer to Chap. 11 in Grinstead and Snell (1997).

Consider a sequence \(\langle X_n, E_n \rangle _{n \in \mathbb N }\), where the \(X_n\)’s are random variables and the \(E_n\)’s are events. Call the sequence a piecewise conditional Markov process (or pc-Markov process) if

  1.

    The events \(\langle E_n \rangle _{n \in \mathbb N }\) are pairwise disjoint,

  2.

    \(p(\cup _{n \in \mathbb N } E_n) = 1\), and

  3.

    \(\langle X_k \rangle _{k \ge n}\) is a time-homogeneous Markov process with respect to \(p(\cdot | E_n)\), i.e., \(p(X_{k+1} | E_n, X_1, X_2, \ldots , X_k) = p(X_{k+1} | E_n, X_k)\) for all \(k \ge n\).

Call \(\langle X_k \rangle _{k \ge n}\) the pieces of a pc-Markov process, where \(X_k\) is interpreted as a function from a probability space with the measure \(p(\cdot | E_n)\). Say a pc-Markov process is uniform if all its pieces have the same transition matrix.

Say a pc-Markov process is ergodic/regular/absorbing if each of its pieces is ergodic/regular/absorbing. Say it is uniformly regular/absorbing if it is uniform and its pieces are regular/absorbing. The next two theorems show that the asymptotic behavior of uniformly absorbing (or regular) pc-Markov processes is identical to that of each of their pieces. They are routine corollaries of Theorems 11.3 and 11.7 in Grinstead and Snell (1997), respectively.

Theorem 4

Suppose \(\langle X_n, E_n \rangle _{n \in \mathbb N }\) is a uniformly absorbing, pc-Markov process with absorbing states \(S_*\). Then \(p(\lim _{n \rightarrow \infty } X_n \in S_*) = 1\).

Theorem 5

Suppose \(\langle X_{n}, E_n \rangle _{n \in \mathbb N }\) is a uniformly regular, pc-Markov process with transition matrix \(\varvec{P}\). Then for any state \(s_i \in S\) there is some probability \(r_i \in [0,1]\) such that \(\lim _{n \rightarrow \infty } p(X_n = s_i) = r_i\).


Worlds and questions

Define a question to be a triple \(\langle W, \langle \Theta , \rho \rangle \rangle \), where \(W\) is a set called worlds, \(\Theta \) is a partition of \(W\), and \(\rho \) is a metric on \(\Theta \). Elements of \(\Theta \) are called answers. Given a world \(w\), let \(\theta _w \in \Theta \) be the partition cell containing \(w\).


Suppose one is interested in determining whether the mean of a normal distribution is at least zero. In this case, \(W\) is the set of ordered pairs \(\langle \mu , \sigma ^2 \rangle \in \mathbb R \times \mathbb R ^+\) representing the mean and variance of the unknown distribution, and \(\Theta :=\{ \theta _{\ge 0}, \theta _{< 0} \}\), where \(\theta _{\ge 0} = \{ \langle \mu , \sigma ^2 \rangle \in W: \mu \ge 0 \}\) and \(\theta _{< 0} = \{ \langle \mu , \sigma ^2 \rangle \in W: \mu < 0 \}\). Define \(\rho \) to be the discrete metric on \(\Theta \). Let \(Q_N\) denote the question described here; it is the question from the example in the body of the paper.

Learning problems and methods

A data generating process for a question \(Q=\langle W, \langle \Theta , \rho \rangle \rangle \) is a pair \(\langle \langle D, \mathcal{D} \rangle , c \rangle \) where \(\langle D, \mathcal{D} \rangle \) is a measurable space, and \(c: W \rightarrow \mathbb P (\otimes _{n \in \mathbb N } ~\mathcal{D})\) is a function, whose values \(c_w\) are called the chances under \(w\). A learning problem \(L\) is a pair consisting of a question and a data generating process. Informally, \(D\) represents data. For all \(w \in W\), the probability measure \(c_w\) specifies how likely one is to observe particular data sequences.


Let \(Q=Q_N\), \(D= \mathbb R \), and \(\mathcal{D} = \mathbb B (\mathbb R )\). For every world \(w = \langle \mu , \sigma ^2 \rangle \in \mathbb R \times \mathbb R ^+\), let \(p_w\) be the unique measure on \(\mathcal{D}\) such that the density of \(p_w\) is a normal distribution with mean \(\mu \) and variance \(\sigma ^2\). So data are just sample points drawn from a normal distribution. Let \(c_w = (p_w)^{\infty }\), and let \(L_N\) be the learning problem described here.
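As a concrete sketch, the chances \(c_w\) of this learning problem can be simulated by drawing i.i.d. normal outcomes (the function name and parameter values are my own illustration):

```python
import random

def draw_data(mu, sigma, n, seed=0):
    """Simulate the chances c_w for the world w = <mu, sigma^2>:
    n i.i.d. draws from a normal distribution with mean mu and
    standard deviation sigma."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

data = draw_data(mu=0.5, sigma=1.0, n=2000)
sample_mean = sum(data) / len(data)
```

A finite initial segment of such a sequence is exactly the kind of input that a method, defined next, consumes.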

A method for a learning problem \(L\) is a function \(m:D^{<\mathbb N } \rightarrow \mathbb P (\mathbb B (\Theta ))\). Let \(m_d := m(d)\) for all \(d \in D^{<\mathbb N }\). Informally, a method takes finite data sequences as input and returns a probability distribution over answers.


In the learning problem \(L_{N}\), one method is to employ a likelihood ratio test to test the null hypothesis \(H_0: \mu \ge 0\) versus the alternative \(\mu < 0\), where the significance of the test is decreased at a rate of the natural log of the sample size. Formally, let \(d \in \mathbb R ^n\) be a data sequence, and let \(\mu (d)\) and \(\sigma ^2(d)\) be the (sample) mean and variance of the data \(d\). Let \(p_{d}\) be the probability measure on \(\mathbb R \) such that the density of \(p_d\) is a normal distribution with mean \(0\) and variance \(\sigma ^2(d)\). Let \(\alpha \in (0,1)\) be a fixed significance level, and define a method \(m\) such that \(m_d\) assigns (i) probability one to \(\theta _{\ge 0}\) if \(p_{d}(\{x \in \mathbb R : x \ge \mu (d) \}) \ge \frac{1 - \alpha }{\ln |d|}\) and (ii) probability one to \(\theta _{< 0}\) otherwise.
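A convergent method for \(L_N\) can also be sketched more simply. The rule below is my own illustration, not the lrt just defined: it believes \(\theta _{\ge 0}\) iff the sample mean exceeds a cutoff that shrinks to zero with sample size, and a Borel–Cantelli argument of the same kind shows that such a rule settles on the true answer with probability one.

```python
import random

def believes_nonnegative(data):
    """A simplified convergent rule in the spirit of shrinking
    significance levels (this cutoff is my own illustration):
    believe mu >= 0 iff the sample mean exceeds a threshold
    that shrinks to zero as the sample grows."""
    n = len(data)
    return sum(data) / n >= -n ** (-0.25)

rng = random.Random(3)
sample = [rng.gauss(0.0, 1.0) for _ in range(10000)]
verdict_zero = believes_nonnegative(sample)                    # true mean 0
verdict_neg = believes_nonnegative([x - 1.0 for x in sample])  # true mean -1
```

On the boundary case \(\mu = 0\) the sample mean concentrates well above the shrinking cutoff, so the rule eventually answers \(\theta _{\ge 0}\); for \(\mu < 0\) it eventually answers \(\theta _{< 0}\).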


Fix some natural number \(n\). Given a method \(m\) and world \(w\), define \(p^n_{w,m}\) to be the unique measure on \(\langle \Theta ^n, \otimes _{k \le n} \mathbb B (\Theta ) \rangle \) satisfying the following. For all “rectangles” \(E_1 \times E_2 \times \cdots \times E_n \in \otimes _{k \le n} \mathbb B (\Theta )\) (i.e., \(E_k \in \mathbb B (\Theta )\) for all \(k \le n\)):

$$\begin{aligned} p^n_{w,m}(E_1 \times E_2 \times \cdots \times E_{n}) = \int _{D^n} \prod _{k \le n} m_{\delta \upharpoonright k} (E_k) ~ dc^n_w (\delta ) \end{aligned}$$

where (1) \(c^n_w\) is the unique measure such that \(c_w^n(F)=c_w(F \times D^\mathbb{N })\) for all \(F \in \otimes _{k \le n} \mathcal{D}\), and (2) \(\delta \in D^n\). Under \(p^n_{w,m}\), the probability of a method returning a sequence of answers is the chance of obtaining a data sequence \(\delta \) (given by \(c_w^n\)) times the probability that the method returns a given answer (given by \(m\)) in response to \(\delta \).

It is easy to show that there is a unique probability measure \(p_{w,m} \in \mathbb P (\otimes _{n \in \mathbb N } \mathbb B (\Theta ))\) such that \(p_{w,m}(E \times \Theta ^\mathbb{N }) = p_{w,m}^n(E)\) for all \(E \in \otimes _{k \le n} \mathbb B (\Theta )\) and all \(n \in \mathbb N \). Further, the following are events in \(\otimes _{n \in \mathbb N } \mathbb B (\Theta )\), where \(\theta \in \Theta \):

$$\begin{aligned} \big \{\overline{\theta }_n = \theta \; \text{ for } \text{ large } \text{ n }\big \}&:= \Big \{\overline{\theta } \in \Theta ^\mathbb{N }: (\exists n \in \mathbb N )(\forall k \ge n) \overline{\theta }_k = \theta \Big \} \\ \big \{\lim _{n \rightarrow \infty } \overline{\theta }_n = \theta \big \}&:= \Big \{\overline{\theta } \in \Theta ^\mathbb{N }: \lim _{n \rightarrow \infty } \overline{\theta }_n = \theta \Big \} \end{aligned}$$

A method \(m\) is called almost surely (a.s.) convergent if \(p_{w,m}(\lim _{n \rightarrow \infty } \overline{\theta }_n = \theta _w) = 1\) for all \(w \in W\), and it is called strongly almost surely (s.a.s.) convergent if \(p_{w,m}(\overline{\theta }_{n}= \theta _w \text{ for } \text{ large } n) = 1\) for all \(w \in W\). When \(\Theta = \{\{r\}: r \in \mathbb R ^d \}\) is a parametric model, then a.s. convergence as defined here is the standard notion of a.s. convergence of a parameter estimator. It is trivial to show that, if \(\Theta \) is finite, then a.s. convergence entails s.a.s. convergence.


In the learning problem \(L_{N}\), the method defined above is a.s. convergent by the second Borel–Cantelli lemma. Hence, it is s.a.s. convergent by the previous remark and the fact that \(\Theta \) is finite.

Expert networks

A network is a finite undirected graph \(G\); vertices of \(G\) are called agents. A group is a set of agents \(J \subseteq G\). For any \(g \in G\), let \(N_G(g) \subseteq G\) denote the group of agents \(g' \in G\) such that \(g\) and \(g'\) are incident to a common edge. Call \(N_G(g)\) the neighborhood of \(g\), and call its elements neighbors of \(g\). For simplicity, I assume every agent is her own neighbor. When \(G\) is clear from context, I will write \(N(g)\) instead of \(N_G(g)\).

An expert network \(\mathcal{E}\) is a triple \(\langle G, \langle L_g \rangle _{g \in G}, \langle m_g \rangle _{g \in G} \rangle \) such that \(G\) is a network, \(L_g\) is a learning problem for each agent \(g \in G\), and \(m_g\) is a method for \(L_g\). For all \(g \in G\), let \(Q_g\) be the question confronted by \(g\); define \(\Theta _g, c_{w,g}\), etc., similarly. An expert network can be represented by a colored undirected graph such that two vertices \(g\) and \(g'\) are the same color just in case \(Q_g = Q_{g'}\).


In the example in the body of the paper, the expert networks consist of agents confronted with instances of learning problem \(L_N\). Note different agents may sample from different normal distributions.

Let \(\Theta _\mathcal{E} = \{ \Theta _g : g \in G \}\) be the set of questions faced by agents in the expert network \(\mathcal{E}\), and let \(A_{\mathcal{E}} = \times _{\Theta \in \Theta _{\mathcal{E}}} \Theta \) be the set of answers to all questions raised in the expert network. Define \(\Theta _{\mathcal{E} - g} = \Theta _{\mathcal{E}} \setminus \{\Theta _g\}\) to be the set of questions faced by agents other than \(g\), and \(A_{\mathcal{E}-g} = \times _{\Theta \in \Theta _{\mathcal{E} - g}} \Theta \) be all possible answers.

For brevity, I introduce the following notational conventions. \(\theta \) will represent an answer to a single question \(\Theta \), and \(a\) designates answers to several questions. Generally, \(a\) will be a member of \(A_{\mathcal{E}}\) or of \(A_{\mathcal{E}-g}\). The bolded letter \(\varvec{a}\) will indicate a group’s answers to several questions; so \(\varvec{a} \in (A_{\mathcal{E}})^J\) or \(\varvec{a} \in (A_{\mathcal{E} - g})^J\) for some \(J \subseteq G\). Finally, I use the “bar-notation” \(\overline{\varvec{a}}\) to indicate a sequence of group answers to several questions (so \(\overline{\varvec{a}} \in ((A_{\mathcal{E}})^{J})^{< \mathbb N }\)).

Recall, there is a Borel algebra \(\mathbb B (\Theta )\) over each \(\Theta \in \Theta _{\mathcal{E}}\). Hence, one can define \(\mathcal{A}_{\mathcal{E}}\) to be the product \(\sigma \)-algebra on \(A_{\mathcal{E}} = \times _{\Theta \in \Theta _{\mathcal{E}}} \Theta \), and similarly for \(\mathcal{A}_{\mathcal{E}-g}\). It is easy to check that the following are events in these algebras:

$$\begin{aligned} \big \{(\forall g \in G) \overline{\varvec{a}}_{n,g} = a \text{ for } \text{ large } n\big \}&= \big \{\overline{\varvec{a}} \in ((A_\mathcal{E})^G)^\mathbb{N }: (\exists n \in \mathbb N )(\forall k \ge n) (\forall g \in G) \overline{\varvec{a}}_{k,g} = a\big \} \\ \Big \{(\forall g \in G) \lim _{n \rightarrow \infty } \overline{\varvec{a}}_{n,g} = a\Big \}&= \Big \{\overline{\varvec{a}} \in ((A_\mathcal{E})^G)^\mathbb{N }: (\forall g \in G) \lim _{n \rightarrow \infty } \overline{\varvec{a}}_{n,g} = a\Big \} \end{aligned}$$

Testimonial norms

A testimonial norm (tn) is a class of functions \(\tau _{\mathcal{E},g}: ((A_{\mathcal{E} - g})^{N(g)})^{< \mathbb N } \rightarrow \mathbb P (\mathcal{A}_{\mathcal{E} - g})\), where \(\mathcal{E}\) is an expert network and \(g\) is an agent in \(\mathcal{E}\). Informally, a tn specifies a probability distribution over answers to questions outside \(g\)’s specialty given what \(g\)’s neighbors have reported in the past.

To define the six tns introduced in the body of the paper, let \(\overline{\varvec{a}} \in ((A_{\mathcal{E} - g})^{N(g)})^{< \mathbb N }\) be an arbitrary sequence of answer reports of \(g\)’s neighbors to all questions of interest. Suppose \(\overline{\varvec{a}}\) has length \(n\). Recall that, for each agent \(h\) in \(g\)’s neighborhood and each \(\Theta \in \Theta _{\mathcal{E}-g}\), the symbol \(\overline{\varvec{a}}_{n,h,\Theta }\) represents \(h\)’s report on question \(\Theta \) at stage \(n\) (i.e. the last stage of \(\overline{\varvec{a}}\)). Similarly, if \(a \in A_{\mathcal{E} - g}\) is an answer to all questions outside of \(g\)’s area of expertise, and if \(\Theta \in \Theta _{\mathcal{E} - g}\) is one such question, then \(a_{\Theta }\) is the answer \(a\) provides to the question \(\Theta \).


Reidianism is the norm such that for all \(a \in A_{\mathcal{E} - g}\):

$$\begin{aligned} \tau _{\mathcal{E}, g}(\overline{\varvec{a}})(a) = \prod _{\Theta \in \Theta _{\mathcal{E} - g}}\frac{ |\{h \in N(g): a_{\Theta } = \overline{\varvec{a}}_{n,h,\Theta } \}|}{|N(g)|} \end{aligned}$$

In other words, an answer \(\theta \) to a given question \(\Theta \) is chosen with probability equal to the proportion of one’s neighbors that report \(\theta \) on the most recent stage. Answers to different questions are chosen independently of one another, so the probability of choosing a sequence of answers \(a\) is the product of the probabilities of choosing each element \(a_{\Theta }\) of the sequence.
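As a concrete illustration, the Reidian draw can be sketched in a few lines of Python (a hypothetical sketch; the function and variable names are mine, not the paper’s):

```python
import random

def reidian_update(neighbor_reports, questions):
    """Sketch of the Reidian norm: for each question, adopt an answer with
    probability equal to the proportion of neighbors reporting it on the
    most recent stage; draws for different questions are independent."""
    new_beliefs = {}
    for q in questions:
        # One last-stage report per neighbor; a uniform draw from this list
        # selects each answer theta with probability (# reporters of theta) / |N(g)|.
        reports = [neighbor_reports[h][q] for h in neighbor_reports]
        new_beliefs[q] = random.choice(reports)
    return new_beliefs
```

Sampling uniformly from the multiset of last-stage reports realizes exactly the proportion in the displayed formula, and performing the draw question by question yields the product form.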

To define e-trusting and proximitism, replace “\(h \in N(g)\)” in the above definition by the requirement that \(h\) is an expert neighbor (if one exists) or is most proximate to such an expert. The majoritarian versions of all three norms can be defined similarly.

A few properties of tns will play a critical role in proofs. Let \(\tau \) be a tn. Suppose that \((*) ~ \tau _{\mathcal{E}, g}(\overline{\varvec{a}})=\tau _{\mathcal{E}, g}(\overline{\varvec{b}})\) for all expert networks \(\mathcal{E}\), all agents \(g\) in \(\mathcal{E}\), and all answer reports \(\overline{\varvec{a}}, \overline{\varvec{b}} \in ((A_{\mathcal{E} - g})^{N(g)})^{< \mathbb N }\) with identical last coordinates (i.e., \(\overline{\varvec{a}}_{|\overline{\varvec{a}}|} = \overline{\varvec{b}}_{|\overline{\varvec{b}}|}\)). Then \(\tau \) is said to be Markov, as its behavior depends only upon the last element of an answer sequence. It is said to be Markov with memory \(t\) if \((*)\) holds for any sequences \(\overline{\varvec{a}}\) and \(\overline{\varvec{b}}\) for which the last \(t\) coordinates are identical.

Given \(\Theta \in \Theta _{\mathcal{E} - g}\) and some \(\theta \in \Theta \), define \(E(\theta ) = \{a \in A_{\mathcal{E}-g}: a_{\Theta } = \theta \}\). A Markov tn \(\tau \) with memory \(t\) is said to be stable if for all \(\overline{\varvec{a}} \in ((A_{\mathcal{E}-g})^{N(g)})^{<\mathbb N }\), if \(\overline{\varvec{a}}_{k,h,\Theta } = \theta \) for all \(h \in N(g)\) and all \(k\) such that \(|\overline{\varvec{a}}| - t \le k \le |\overline{\varvec{a}}|\), then \(\tau _{\mathcal{E}, g}(\overline{\varvec{a}})(E(\theta )) = 1\). Finally, a tn is said to be sensitive if for all expert networks \(\mathcal{E}\), all agents \(g\) in the network, and all \(\Theta \in \Theta _{\mathcal{E}}\), there is some \(\epsilon > 0\) and some \(J \subseteq PN(g,\Theta )\) such that for all \(\overline{\varvec{a}} \in ((A_{\mathcal{E}-g})^{N(g)})^{<\mathbb N }\), if \(\overline{\varvec{a}}_{|\overline{\varvec{a}}|,h,\Theta } = \theta \) for all \(h \in J\), then \(\tau _{\mathcal{E}, g}(\overline{\varvec{a}})(E(\theta )) > \epsilon \). By construction, the six tns in the body of the paper are Markov and stable. All of these tns except majoritarian Reidianism and majoritarian e-trusting are also sensitive.

A group testimonial norm (or gtn for short) is a (proper-class) function from expert networks to vectors of tns, one for each agent in the network. A gtn is called pure if it is a constant function; it is called mixed otherwise.

Scientific communities, probabilities over answer sequences, and more on convergence

A scientific network is a pair \(S=\langle \mathcal{E}, \langle \tau (g) \rangle _{g \in G} \rangle \) consisting of an expert network \(\mathcal{E}\) and an assignment of tns \(\tau (g)\) to each agent \(g\) in the network. \(\tau (g)_{\mathcal{E},g}\) is abbreviated by \(\tau _{\mathcal{E},g}\) below, as no confusion will arise.

Define \(W_{\mathcal{E}} = \{W_g : g \in G\}\), and let \(\overline{w} \in \times _{W \in W_{\mathcal{E}}} W\) be the true state of the world for all questions faced by agents in the network. Recall that, in a given world, an agent’s methods induce a probability measure over answer sequences within her area of expertise. Moreover, gtns specify the probability that agents will assign to answers outside their respective specialties. Therefore, given a scientific network \(S\) and a world \(\overline{w} \in \times _{W \in W_{\mathcal{E}}} W\), one can define a probability measure \(p_{\overline{w}, S}\) over infinite sequences of answers for the entire network \(((A_{\mathcal{E}})^{G})^\mathbb{N }\) (where the events are those in the product algebra). Defining \(p_{\overline{w},S}\) is straightforward but tedious, so the details are omitted.

Say an expert network \(\mathcal{E}\) is s.a.s. methodologically convergent if the methods employed by each agent are s.a.s. Given \(\overline{w} \in \times _{W \in W_{\mathcal{E}}} W\), let \(a(\overline{w}) \in A_{\mathcal{E}}\) be the unique answer sequence such that \(\overline{w} \in a(\overline{w})\). That is, \(a(\overline{w})\) is the sequence of true answers to every question if \(\overline{w}\) describes the true state of the world. Say a gtn is s.a.s. testimonially convergent if for all scientific networks \(S= \langle \mathcal{E}, \langle \tau (g)\rangle _{g \in G} \rangle \):

$$\begin{aligned} p_{\overline{w},S}((\forall g \in G) \overline{\varvec{a}}_{n,g} = a(\overline{w}) \text{ for } \text{ large } n) = 1 \end{aligned}$$

whenever \(\mathcal{E}\) is a connected, s.a.s. methodologically convergent network.

Proofs of theorems

Given an expert network \(\mathcal{E}\) and \(\overline{w} \in \times _{W \in W_{\mathcal{E}}} W\), define \(\varvec{a}(\overline{w}) \in (A_\mathcal{E})^G\) to be the vector representing the state in which all agents believe \(a(\overline{w})\). Let \(\varvec{A}(\overline{w}) = \{\varvec{a} \in (A_{\mathcal{E}})^G: (\forall g \in G)\, w_g \in \varvec{a}_{g,\Theta _g} \}\) be the set of belief vectors in which every agent holds a true belief in her own specialty. Next, define by recursion:

$$\begin{aligned} E_0(\overline{w})&= \big \{\overline{\varvec{a}} \in ((A_{\mathcal{E}})^G)^\mathbb{N }: (\forall n \in \mathbb N ) \overline{\varvec{a}}_{n} \in \varvec{A}(\overline{w}) \big \} \\ E_{n+1}(\overline{w})&= \big \{\overline{\varvec{a}} \in ((A_{\mathcal{E}})^G)^\mathbb{N }: (\forall k \ge n+1) \overline{\varvec{a}}_{k} \in \varvec{A}(\overline{w})\big \} \setminus E_n(\overline{w}) \end{aligned}$$

So \(E_n(\overline{w})\) is the event that \(n\) is the first stage at which every agent has converged to the true answer in her specialty. Finally, let \(X_n:(A_{\mathcal{E}}^G)^\mathbb{N } \rightarrow A_{\mathcal{E}}^G\) be the function \(\overline{\varvec{a}} \mapsto \overline{\varvec{a}}_n\) representing the beliefs of all agents with respect to all questions on stage \(n\). In the following lemma, let \(S = \langle \mathcal{E}, \langle \tau (g)\rangle _{g \in G} \rangle \) be a scientific network and \(\overline{w} \in \times _{W \in W_{\mathcal{E}}} W\). Suppose that \(\mathcal{E}\) is s.a.s. methodologically convergent and that \(\tau (g)\) is Markov for all \(g \in G\).

Lemma 1

Then \(\langle X_n, E_n(\overline{w}) \rangle _{n \in \mathbb N }\) is a uniform pc-Markov process over the state space \(\varvec{A}(\overline{w})\) with respect to \(p_{\overline{w},S}\).


Since \(\mathcal{E}\) is s.a.s. methodologically convergent, it follows that \(p_{\overline{w},S}(\cup _{n \in \mathbb N } E_n(\overline{w})) = 1\). The events \(E_n(\overline{w})\) are disjoint by construction. Conditional on \(E_n(\overline{w})\), each agent’s beliefs change only outside her specialty at every stage \(k \ge n+1\). Hence, agents’ beliefs at any stage \(k \ge n\) depend only upon tns and not upon data. Since the tns are Markov, the vectors of all agents’ beliefs at stages past \(n\), represented by \(\langle X_k\rangle _{k \ge n}\), form a time-homogeneous Markov process conditional on \(E_n(\overline{w})\), as desired.\(\square \)

The next theorem corresponds to Theorem 1 in the text.

Theorem 6

Suppose that, for all \(g \in G\), the norm \(\tau (g)\) is also stable and sensitive. Then \(\langle X_n, E_n(\overline{w}) \rangle _{n \in \mathbb N }\) is a uniformly absorbing pc-Markov process with respect to \(p_{\overline{w},S}\), where the unique absorbing state is \(\varvec{a}(\overline{w})\).


By the previous lemma, \(\langle X_n, E_n(\overline{w}) \rangle _{n \in \mathbb N }\) is a uniform pc-Markov process. For each agent \(g\), one can use sensitivity to show, by induction on \(g\)’s distance \(n\) from a \(\Theta \)-expert, that there is some non-zero probability that \(g\) will believe an answer \(\theta \) to \(\Theta \) exactly \(n\) stages after all the \(\Theta \)-experts most proximate to \(g\) believe \(\theta \). Again, using stability and induction, one can show that, for all natural numbers \(n\) and \(k\), if \(g\) believes \(\theta \) on stage \(n\) and all the \(\Theta \)-experts most proximate to \(g\) continue to believe \(\theta \) for \(k\) stages, then \(g\) will believe \(\theta \) on stage \(n+k\). Since the network is s.a.s. convergent, this suffices to show that true beliefs will eventually propagate through the entire network. By the stability and Markov property of the tns, the network will be absorbed in this state.\(\square \)
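The propagation step in this proof can be illustrated with a toy deterministic sketch (my own simplification, not the paper’s model): once an expert has converged, a true belief spreads one hop per stage through a connected network, and stability keeps it in place thereafter.

```python
def propagate(adjacency, expert, truth, steps):
    """Toy illustration of hop-by-hop belief propagation: each non-expert
    adopts the previous-stage belief of any informed neighbor; the expert
    holds the true answer throughout."""
    beliefs = {g: None for g in adjacency}  # None = no belief yet
    beliefs[expert] = truth
    for _ in range(steps):
        nxt = dict(beliefs)
        for g, nbrs in adjacency.items():
            if g == expert:
                continue
            # Beliefs that g's neighbors held on the previous stage.
            informed = [beliefs[h] for h in nbrs if beliefs[h] is not None]
            if informed:
                nxt[g] = informed[0]
        beliefs = nxt
    return beliefs
```

On a line network 0–1–2–3 with the expert at node 0, the true answer reaches node \(n\) exactly \(n\) stages after the expert converges, matching the induction on distance in the proof.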

Modeling miscommunication

For the following theorems, suppose each agent’s question has only two answers. To model miscommunication, one needs only to alter the definition of the measure \(p_{\overline{w}, S}\) so that, on each stage, an agent reports to each of her neighbors the answer other than the one she believes with some fixed probability \(\epsilon > 0\). Call the measure induced by this process \(p_{\overline{w}, S, \epsilon }\). Below, retain the assumptions of the previous theorems and add the further assumption that, for all \(g \in G\), the norm \(\tau (g)\) is one of the six tns from the body of the paper.
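The \(\epsilon \)-flip reporting process is simple to make concrete (a minimal Python sketch under the binary-answer assumption; the names are mine, not the paper’s):

```python
import random

def noisy_report(belief, epsilon, answers=(0, 1)):
    """With fixed probability epsilon, report the answer other than the one
    believed; otherwise report truthfully (binary questions only)."""
    if random.random() < epsilon:
        # Miscommunication: transmit the other answer.
        return answers[1] if belief == answers[0] else answers[0]
    return belief
```

In the model, an independent draw of this kind is made for each neighbor on each stage, which is what makes every profile of heard reports, and hence every transition of the belief process, occur with positive probability.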

Theorem 7

Then \(\langle X_n, E_n(\overline{w}) \rangle _{n \in \mathbb N }\) is a uniformly regular pc-Markov process (over state space \(\varvec{A}(\overline{w})\)) with respect to \(p_{\overline{w}, S, \epsilon }\).


By the same reasoning as above, the process is a pc-Markov process, so it suffices to show that it is regular. In fact, since each of the six tns has a memory of length one, the process can transition from any state in \(\varvec{A}(\overline{w})\) to any other in exactly one step. To show this, I show that any agent’s belief with respect to any question changes with positive probability and stays the same with positive probability. This suffices because there are only two answers to a question.

Consider a fixed agent \(g\) and a fixed question \(Q\). Since there are two possible answers, the agent’s belief with respect to \(Q\) can be represented by a \(0\) or \(1\), and her neighbors’ beliefs with respect to \(Q\) can be represented by a binary vector \(\varvec{a}\). Now each of the six tns has the following property: there are binary vectors \(\varvec{b}_{stay}\) and \(\varvec{b}_{change}\) such that (i) if \(g\) thinks her neighbors’ beliefs with respect to \(Q\) are represented by \(\varvec{b}_{stay}\), then \(g\)’s belief with respect to \(Q\) will remain the same with positive probability, and (ii) if \(g\) believes her neighbors’ beliefs are represented by \(\varvec{b}_{change}\), then \(g\)’s belief will change with positive probability. For instance, if \(g\) is a Reidian who currently believes \(0\), then the constant vector containing only zeros is one example of \(\varvec{b}_{stay}\), and the constant vector containing only ones is one example of \(\varvec{b}_{change}\).

Let \(\varvec{a}\) represent \(g\)’s neighbors’ current beliefs, and let \(n\) be the number of entries in which \(\varvec{a}\) differs from \(\varvec{b}_{stay}\). Then, by the definition of miscommunication, the probability that \(g\) will think her neighbors believe \(\varvec{b}_{stay}\) is \(\epsilon ^n (1-\epsilon )^{|N(g)|-n}\), which is positive. By definition of \(\varvec{b}_{stay}\), if \(g\) believes her neighbors believe \(\varvec{b}_{stay}\), then \(g\) will retain her belief with some positive probability \(\delta \). So the probability that \(g\)’s belief will stay the same is at least \(\delta \cdot \epsilon ^n (1-\epsilon )^{|N(g)|-n}\), which is positive. A similar argument shows that \(g\)’s belief with respect to question \(Q\) changes with positive probability. \(\square \)

The next theorem corresponds to Theorems 2 and 3 in the body of the paper.

Theorem 8

Then the error rate converges to a fixed positive number.


By the previous theorem and Theorem 5, there is a limiting probability distribution over states (i.e. specifications of beliefs for every agent in the network) of the pc-Markov process describing the evolution of agents’ beliefs, and that distribution does not depend on agents’ initial beliefs. In each such state, agents hold some non-negative number of erroneous beliefs. The error rate is the expectation of the number of errors relative to this limiting distribution over states. By the argument in the previous theorem, every state has positive limiting probability. So there is some non-zero probability of having erroneous beliefs, which entails that the error rate is positive.\(\square \)
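The limiting computation in this proof can be mimicked numerically for a small chain (a sketch using NumPy; the transition matrix and error counts below are illustrative inputs, not objects from the paper):

```python
import numpy as np

def error_rate(P, errors):
    """Given the transition matrix P of a regular finite Markov chain over
    belief states and the number of erroneous beliefs in each state, return
    the expected number of errors under the limiting (stationary)
    distribution, obtained by solving pi P = pi with sum(pi) = 1."""
    n = P.shape[0]
    # Stack the stationarity equations with the normalization constraint
    # and solve the (consistent) overdetermined system by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ errors)
```

Because the chain is regular, the stationary distribution is unique and puts positive weight on every state, so whenever some state contains an error the expectation, and hence the error rate, is strictly positive.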

Mayo-Wilson, C. Reliability of testimonial norms in scientific communities. Synthese 191, 55–78 (2014).



Keywords

  • Testimony
  • Epistemology
  • Reductionism
  • Social structure of science
  • Networks
  • Homophily