In previous work we have argued for the importance of rigorous methods for measuring statistical evidence in biomedical research and other arenas, and we have proposed using the mathematical foundations of thermodynamics as a template for achieving this end (Vieland 2006, 2011; Vieland and Hodge 2011). Here we complete the first stage of development of a thermodynamically based methodological framework in which evidence becomes measurable on an absolute scale for the first time.

By an absolute scale we mean one for which the unit of measurement means the same thing across the range of the measurement scale (just as an increase of one foot of length means the same thing, up to given measurement precision, regardless of whether that one foot is added to one foot or to one hundred feet), and across disparate applications (just as one foot means the same thing whether measuring lengths of rope or lumber). It is worth pointing out from the outset that this understanding of “absolute” goes beyond requiring an absolute 0 point for the measurement scale, which is sometimes misunderstood as the basis on which the Kelvin thermometric scale is said to be absolute. (The term “absolute” is also used differently in representational measurement theory). The Kelvin scale is clearly and by design absolute in the same sense in which we are using the term (Chang 2004).

Perhaps the most intellectually challenging aspect of our measurement problem is simply articulating what is meant by means the same thing for evidence measures. For instance, many statisticians use the p value to indicate the degree of evidence against a null hypothesis. But how would we go about establishing whether a change of, say, 0.001 means the same thing relative to a baseline p value of 0.01 and relative to another baseline p value of 0.0001? Other statisticians use the log likelihood ratio (LR) as a measure of evidence, but the same conundrum arises. Not only is it unclear whether a change in the log LR from 1 to 2 means the same thing as a change from 10 to 11, or perhaps from 10 to 20; it is, moreover, not even clear how we would go about investigating the matter. We lack a methodological framework in which asking for the meaning of means the same thing is a well-posed question. But surely measurement of any quantity, including evidence, requires at a minimum that the unit of measurement always does in fact mean the same thing.

It is noteworthy that the abstract and obscure nature of this measurement quandary is directly parallel to that faced and overcome by Kelvin himself in the mid-nineteenth century, a stroke of historical luck that we gladly exploit for our own purposes. We develop a methodological framework in the context of which it becomes possible to articulate a definition of means the same thing in measuring statistical evidence, based on what is meant by this phrase in reference to degrees of temperature in thermodynamics. In essence our approach is extremely simple: we reinterpret fundamental thermodynamic quantities in terms of evidential rather than physical processes and then port the mathematical underpinnings of thermodynamics over to the evidential side en bloc. The quantity denoted by T in thermodynamics—temperature measured on the absolute scale—is in this framework translated as the evidence E, which is then intrinsically measured on the absolute scale, in exactly the same sense in which T is.

Various connections have of course been previously made between thermodynamics and certain aspects of statistical inference; see, e.g., (Shannon 1948; Jaynes 1957; Cox 1961; Kullback 1968; Soofi 2000; Caticha 2003), primarily based on the power of one particular concept—entropy—in multiple settings. We view our own results as extensions of Jaynes’ research into connections among Shannon entropy, statistical mechanical entropy, thermodynamic entropy, and statistical inference. We wonder, however, why Jaynes himself never asked the following question: If crucial underlying quantities such as entropy tie statistical inference and thermodynamics together, and given the central role of temperature in thermodynamics, what then is the analogue of temperature on the inferential side? Our answer represents, as far as we know, the first proposal to interpret statistical evidence as the direct mathematical analogue of temperature, and, moreover, of temperature as measured on the absolute scale.

We focus in this paper on one very simple statistical model. Suppose we are interested in the evidence that a certain coin is either fair or biased towards landing tails. We perform one or more experiments, each time tossing the coin n times and observing the number of times x that the coin lands heads (where both n and x can vary across experiments). For a given experiment, the data (n, x) carry information about the coin, which we capture by graphing the likelihood ratio (LR) as a function of P(heads) = θ, fixing θ = 1/2 in the denominator of the LR. Data from subsequent experiments change the LR graph, for instance, causing the height of the graph at its maximum to increase. We consider the evidence itself as a property of the graph and seek to define this property in a way that is amenable to absolute measurement, while preserving our underlying feel for what we mean by evidence.

Our starting point is what may at first appear as a relatively obscure analogy with the elementary heat engine of thermodynamics; more specifically, with the abstract model of the workings of a heat engine embodied in the Carnot cycle. The purpose of a heat engine is to convert heat into work, say, using the heat to raise a piston, and similarly we might describe the purpose of the coin-tossing experiment to be the conversion of the information conveyed by new data into movement of the LR graph. But more importantly for our purposes, the Carnot cycle constituted the basis for rigorous understanding of the fundamental dynamics of thermal systems. We, therefore, consider what is to be gained by considering the statistical experiment as the analogue of the heat engine, to rigorously characterize the relationships among evidential analogues of heat, work, and temperature. Thus, rather than begin by assuming any particular definition of E, instead, we adopt as our sole methodological constraint that our evidential model must adhere to the equations required to construct and analyze Carnot cycles. The resulting evidential dynamical framework will be justified insofar as it proves fruitful, even if it should remain difficult to relate the idealized workings of an engine to how we ordinarily think about statistical data analyses. Indeed, if this approach does prove to be useful, we may question whether it is based on mere analogy, or possibly, the far deeper connection between thermodynamics and inference championed by Jaynes and many others.

Methods and results

We develop the central argument of this paper in three steps, as follows. We begin by postulating that a statistical model can be described by an equation (called an equation of state, or EqS), expressed in the form of the EqS for an ideal gas, and that this EqS defines a system that is governed by analogues of the 1st and 2nd Laws of Thermodynamics. In “Possibility of the formalism” we reparameterize a simple binomial model to illustrate. This establishes that it is possible to represent a statistical system in thermodynamic terms. Then in “Plausibility of the formalism” we demonstrate that the quantity corresponding to temperature in this system, which we call E (for evidence), exhibits behavior that coincides in multiple respects with our intuitive feel for the behavior of statistical evidence. This establishes that it is plausible to use such a system to define statistical evidence. Finally in “Utility of the formalism” we explicitly address the meaning of means the same thing for the unit of measurement for E. We show how this formalism leads us to think about statistical systems in fundamentally new ways, and we posit that doing so will prove useful, just as early thermodynamic investigations led to new and productive ways of thinking about heat. Throughout, we attempt to motivate the analogy between specific thermodynamic quantities and their evidential counterparts heuristically; in Appendix 1 we present an alternative derivation of fundamentals of the new framework, which is mathematically more direct also more abstract.

Possibility of the formalism

We start with a simple binomial coin tossing model, with x heads out of n tosses, and probability θ that the coin lands heads. We consider the hypothesis contrast “coin is biased” versus “coin is not biased” (θ = 0.5).Footnote 1 We stipulate that the LR (Eq. 1, Table 1), captures all the evidential content of the system, insofar as the addition of anything beyond the LR would not change the evidence conveyed by given data with respect to the stated hypotheses. This stipulation is fundamental to our framework; for related discussion, see (Hacking 1965; Edwards 1992; Royall 1997). While we generally think of changes in the LR graph as resulting from changes in the data (n, x), we can equally well consider the permissible transformations of the graph per se, where what is permissible is any state of the system that conforms to the underlying binomial LR equation as the data change. In “Plausibility of the formalism” below, we treat both n and x as continuous (non-negative) numbers, for reasons that will become clear in “Utility of the formalism”.

Table 1 Fundamental equations

We begin with the postulate that the EqS for this system can be described by the equation describing an ideal gas (Eq. 2, Table 1). Our task, therefore, is to see if we can find a reparameterization of the statistical system that enables us to express it in the form PV = RT. This involves deriving the analogues of the constituent quantities and demonstrating internal consistency for the resulting EqS in statistical terms. (Here we omit the usual factor N = number of moles of matter, or equivalently, we use as our template the EqS for a single mole of matter; see the “Discussion” for further comments).

As the evidential analogue of volume, VE, we simply use the area under the LR curve (Eq. 3, Table 1). (In a two-parameter model, this would be an actual volume). Here we consider a one-sided case, in which the quantity in Eq. 3 is integrated from θ = 0 to 0.5 (with x ≤ n/2). While this complicates the arithmetic somewhat, it will simplify some aspects of the presentation. The one-sided case also plays a special role in statistical genetics, and we have investigated its properties in detail elsewhere (Hodge and Vieland 2010). We have established previously that at least in statistical genetic applications, VE exhibits “thermoscopic” behavior, in the sense that it properly goes up (or down) as the evidence goes up (or down), and in a manner unmatched by other standard approaches to quantifying evidence, such as the p-value or the maximum LR (Huang and Vieland 2001; Bartlett and Vieland 2006; Huang and Vieland 2010). Thus, we view VE as a direct analogue of physical volume V in elementary thermodynamic systems, in that like V, VE can serve as a thermoscopic indicator of increasing or decreasing temperature.Footnote 2

In thermodynamics, if we impose the 1st Law (conservation of energy) and 2nd Law (impossibility of a perpetual motion machine) on an ideal gas, then we can also express the EqS in terms of the entropy S (Fermi 1956 orig. 1936) (Eq. 4 in our Table 1), where S is defined up to an (additive) constant. In physics, CV can take on one of three values: 1.5R, 2.5R, 3R, for monatomic, diatomic, and polyatomic gasses, respectively. Here we fix C V  = 1.5R throughout. See the “Discussion” and Appendix 1 for further comments on the evidential analogues of the 1st and 2nd Laws; see also Palacios (quoted in (Krantz and Luce et al. 1971, p. 456)) for discussion of the thermodynamic constants.

From Jaynes (Jaynes 1957) and Kullback (Kullback 1968) we have a direct connection between the thermodynamic quantity S and the Shannon information, and between the Shannon information and statistical information (Fisher information, etc.). We, therefore, deploy a definition of S based on Shannon. This will be the “hook” that enables us to connect the binomial LR to a thermodynamic representation of the system. Specifically, we define the evidential analogue of S, which we denote SE, as a particular form of relative entropy, viz., the information entropy evaluated at the constrained maximum entropy (MaxEnt) value of the binomial probability θ, which is in this case simply the maximum likelihood estimator x/n, relative to the information entropy evaluated at the unconstrained MaxEnt value θ = 1/2 (Eq. 5, Table 1). As in physics, SE is defined only up to an additive constant. See also (Krantz and Luce et al. 1971) for an axiomatic basis for entropy as a mathematical rather than physical quantity. We note that the property of reversibility is a consequence of this definition: SE depends only on the current state of the system, regardless of the path taken on a PEVE diagram (see below) to arrive at that state from some other state; and because we have defined it in terms of MaxEnt states, assuming that MaxEnt also corresponds to statistical equilibrium, the evidential system developed here can be said to be in an equilibrium state at all times.

We can now derive the remaining variables needed to completely describe the system. Substituting SE and VE into Eq. 4 and exponentiating, we obtain an evidential analogue of T, which we call E (Eq. 6, Table 1). Note that E in Eq. 6 is always non-negative. (Because SE is defined with respect to the MaxEnt state, we expect this definition of E to relate directly to the parameter β of the Boltzmann distribution). Finally, substituting E as expressed in Eq. 6 into Eq. 2 (Table 1), we obtain the evidential analogue of the pressure P, PE (Eq. 7, Table 1).

Thus we have successfully reparameterized the initial binomial system, expressed in terms of (LR, n, x), into an explicit EqS describing the LR graph in terms of (VE, PE, E), while tacitly imposing analogues of the 1st and 2nd Laws of Thermodynamics. We can now describe allowable transformations of the system in terms of any change in the LR graph that conforms to the underlying binomial EqS, without explicitly referencing changes in data. This shift from the usual statistical perspective is critical, because it allows us to view changes in the LR graph in terms of the in- or out-flux of evidential information, rather than in terms of changes in data (see “Utility of the formalism” below). The only remaining unknown in this system is the constant R. For the present, we set R = 1 (again, see Krantz and Luce at al. 1971).

Plausibility of the formalism

In this section we look at features of the system to see how they relate to what we would ordinarily mean by statistical evidence. As just noted, a crucial aspect of this formalism is that we can consider transformations of the system directly in terms of changes in the LR graph, rather than in terms of changes in the data. However, to establish the plausibility of the system, it is desirable to be able to appeal to statistical intuition, and this requires considering the behavior of the graph in familiar terms, as a function of (n, x). Thus, while we continue to assume that we are working with the EqS for (on the physical side) a single mole of matter, or what is probably best viewed as a fixed quantity of data, we will plot results as a function of changes in n. The rationale lies in the transitional nature of this section, illustrating properties of the system in familiar statistical terms, en route to thinking of changes as influenced not by the influx of data, but rather, by the influx of evidential information (again, see “Utility of the formalism”).

A feature of this system that plays an important role in subsequent discussion is that it captures evidence either in favor of the numerator of the LR or in favor of the denominator.Footnote 3 However, there is no particular “mark” designating which is which, that is, no fixed value below which the evidence always favors the denominator. On the other hand, for given n, there exists a value of x at which E is minimized (Fig. 1a). We call this (n, x) pair, which depends on E, the TRansition Point (TrP(E)).Footnote 4 To the left of TrP(E) the evidence favors “coin is biased”, and to the right of TrP(E) the evidence favors “coin is not biased”.

Fig. 1
figure 1

Behavior of binomial evidential system: a E as a function of x/n for various values of n; b x/n at TrP(E) as a function of n; c E as a function of (n, x)

The behavior shown in Fig. 1a makes sense from a statistical point of view. Bearing in mind whether one is looking to the left or right of TrP(E), this behavior conforms to what we mean by evidence in terms of two basic properties: (i) for fixed x/n, as n increases E increases; (ii) for fixed n and reading from left to right on the x/n axis, E decreases up to TrP(E), then increases. Figure 1b shows x/n at TrP(E) over a broader range of n. This plot also illustrates sensible behavior: as n increases, TrP(E) moves towards x = n/2; i.e., for very large n, even a small deviation from x = n/2 constitutes evidence against θ = 0.5.Footnote 5 One thing that may trouble statisticians is the increase in E at TrP(E) as a function of n. We return to this point in the “Discussion”. Figure 1c shows a view of E as a function of (n, x) in three-dimensional space for n up to 1,000, again showing consistent statistical behavior. We note also that as Fig. 1a and c illustrate, for given n, it is possible to have far stronger evidence in favor of “coin is biased” than in favor of “coin is not biased.” This reflects the fundamental asymmetry in the hypotheses, with the denominator specifying a single value (θ = 1/2), and the numerator allowing for any value θ < 1/2.

In aggregate, the behavior illustrated in Fig. 1 speaks to the plausibility of the current model as a representation of a statistical system. Moreover, with the model in hand, we can now begin to think of the system in ways that are unfamiliar to statisticians. For instance, the contours of Fig. 1c describe what could be called the “isotherms” of the system, that is, the set of (n, x) values corresponding to a single value of E (Fig. 2a, b) shows these same isotherms plotted against x/n). These isotherms also behave in accordance with our intuitions regarding evidence, e.g., consider (n = 10, x = 0) (not shown on figure). Whatever the value of the evidence for that point, we know that as n increases while holding x fixed, the evidence must go up. This implies that, were we to hold the evidence fixed and increase n, x would have to increase in order to compensate. Conversely, if we hold n fixed and increase x, the evidence will diminish, which implies that if we were to hold the evidence constant while increasing x, n would have to go up to compensate. We do not normally think of the dynamics of statistical systems in these terms, precisely because outside of this framework we have no way to hold the evidence constant in our mind’s eye while allowing the LR graph to change. Given a way to establish these isotherms, however, the behavior illustrated in Fig. 2b clearly makes statistical sense. Note also that the maximum value of n for each isotherm occurs at TrP(E). Figure 2c shows one isotherm plotted as a function of VE and PE, illustrating consistency between the behavior of E and a physical system.

Fig. 2
figure 2

a Various “isothermal” contours of the binomial evidential system as a function of (n, x); b These same isotherms plotted as a function of (n, x/n); c The E = 2.25 isotherm on a PEVE diagram. Starting from the TrP:moving to the left in panels a and b corresponds to moving to the right in panel c, and vice versa, moving to the right in a and b corresponds to moving to the left in c

Finally, we consider another quantity unfamiliar in traditional statistical frameworks, namely, the evidential equivalent of physical (mechanical) work. We are interested in the effects of in- or out-flux of evidential information on the LR graph, characterized in terms of the fundamental features of the graph. We posit, therefore, that the quantity PEdVE correctly characterizes a transformation of the system from an information-transfer point of view, just as PdV does in physics from an energy-transfer point of view (Fermi 1956 orig. 1936). Thus, the amount of evidential work done during a given transformation of the system from state A to state B is the quantity WE (Eq. 8, Table 1). While we are not aware of a familiar statistical analogue of this notion of evidential work, nevertheless, we think that work defined in this way has some intuitively appealing features. In particular, a given change in VE is more difficult to effect (requires greater information influx, see below) for higher E. As in physics, we distinguish work being done by the system (WE) from work done to the system (−WE). The central importance of this notion of evidential work will become clear in the following section. See also Appendix 1, which makes clear that nothing in the underlying mathematics requires an interpretation in terms of mechanical work per se.

Note that, because our evidential system follows the ideal gas EqS by design, it follows immediately that the system exhibits the characteristic properties of ideal gases, including Boyle’s Law (for fixed E, PE and VE are inversely proportional), Charles’ Law (for fixed PE, VE is proportional to E), etc.

Utility of the formalism

We have thus far established that it is possible and plausible to view a statistical system in thermodynamic terms, with the evidential analogue of T, our quantity E, serving as a measure of statistical evidence. It remains to illustrate why we believe this new framework will prove useful. Here we show the utility of the framework for establishing an absolute scale for the measurement of E and a context in which we can explicitly give the meaning of means the same thing for one degree on this scale.

Following Kelvin, we make use of the Carnot engine, an abstract mathematical device for considering the relationship between physical heat and work.Footnote 6 As we now know, although Carnot himself did not, a heat engine works by converting one form of energy to another: specifically, by converting heat input into mechanical work. The Carnot engine can be pictured as a cylinder containing a fixed quantity of gas, with a movable (frictionless) piston constraining the volume, and with two (infinite) heat reservoirs available for contact with the cylinder, one at temperature t 2 and the other at t 1 < t 2. Here we use lower case t to indicate that the temperature need not be measured on the absolute scale; any thermoscopic measure will suffice, even one for which the size of the unit changes across different parts of the temperature range. Each cycle of the Carnot engine is decomposed into four distinct strokes, A–D, as follows: (A) Heat is absorbed by the system from the warmer heat reservoir while maintaining constant temperature t 2, with corresponding increase in V and decrease in P, for an isothermal change. (B) Heat transfer is stopped (the reservoir is removed from contact) and the system is allowed to continue to expand and cool down to temperature t 1, through a process known as adiabatic change. (C) The system is compressed isothermally by application of external force (work) at constant t 1, with corresponding reduction in V and increase in P, requiring a corresponding transfer of energy out of the system into the cooler heat reservoir. (D) The system is permitted to continue adiabatically, with no further transfer of energy (no further contact with the second reservoir), until it returns to its initial state. During the first two strokes work is done by the system (positive work), while during the second two strokes work is done to the system (negative work). A mechanical heat engine is effective insofar as more work is done by the system during the first two strokes than is required to be done to the system through external means during the third stroke to return it to its initial state, and this happens because less work is required to effect the same transformation (or its reverse) at lower temperatures.

We too can run evidential Carnot cycles using our binomial system.Footnote 7 Figure 3 shows two numerical examples of such cycles on a classical PV (more specifically, PEVE) diagram. These cycles are readily seen to correspond to genuine Carnot cycles: in each cycle, work done during the two adiabatic strokes (WB, −WD) is identical in magnitude but opposite in sign; work WA done in the first (isothermal) stroke is greater in magnitude than the work −WC done in the third (isothermal) stroke; and WC/WA = e 1/e 2, where e is the analog of t in the physical engine, that is, any “thermoscopic” measure of evidence. We now posit that evidential work is a transformation of evidential information, just as mechanical work is a transformation of heat in a physical heat engine. In physics, heat Q represents the quantity of energy transferred in or out of the system. Following this convention, we consider evidential energy, QE, to be the amount of evidential information being transferred. In what follows, we use Q2 to indicate the evidential energy transferred into the engine during the first stroke at e 2, and Q1 to denote the evidential energy transferred out of the engine during the third stroke at e 1.

Fig. 3
figure 3

Two evidential Carnot cycles for the binomial system in the format of a classic PV diagram. Cycle 1 (on left) operates between E1 = 1 and E2 = 2; cycle 2 (on right) operates between E1 = 2 and E2 = 4. Both cycles start in the upper left and travel clockwise. For each stroke i (i = A, B, C, D; see text), work W i is calculated as the area beneath the line connecting the nodes. For each cycle, the ratio WC/WA equals the ratio of evidence levels, or 1/2, as indicated visually by the right-hand panel for each cycle. This illustrates the general principle that for any cyclic transformation taking a system from evidence E to (1/2)E, the ratio of mechanical work performed at the two evidence levels is the same

For a cyclic, reversible transformation, there is no net change in energy from the initial to final state of the system, since by stipulation the system is returned to its initial state. In this case and for an ideal gas, W is a direct measure of heat, and in particular, WA = Q2. Thus defining the efficiency η of the engine as the ratio of the work performed by the cycle to the evidential information absorbed at the higher evidence level (and temporarily omitting subscript “E”s for notational convenience), we have η = (WA + WB − WC − WD)/WA = (W− WC)/WA = 1 − WC/WA = 1 − Q1/Q2. Carnot proved that reversible cyclic engines have maximal efficiency, and by direct application of his reasoning we also have the result that all such evidential engines operating between the evidence levels e 1, e 2 share the same ratio Q1/Q2. In other words, the ratio of evidential energies depends only on the ratio of evidence levels and on nothing else specific to the particular system. Thus we can write Q1/Q2 = f(e 1, e 2), where (paraphrasing Fermi) f is a universal function of the two evidence levels. From here we can directly follow the arithmetic in Fermi to show that f(e 1, e 2) = f(e 0, e 1)/f(e 0, e 2), for e 0 any arbitrary level of evidence. Fixing e 0 at a constant value we can, therefore, consider f(e 0, e) to be a function of e only, thus, K f(e 0, e) = g(e), where K is an arbitrary constant. This gives, finally, Q1/Q2 = g(e 1)/g(e 2).

If we now consider g(e) to be the evidence E, we see that we have converted our “thermoscopic” measure e onto an absolute scale in the following sense: first, it is immediately apparent that E is on a ratio scale (Hand 2004) and hence has an absolute 0; second, while the functional form of g remains to be set, meaning that the size of the degree is not yet specified, a given change in evidential energy will always correspond to the same amount of change in E. This result is entirely independent of the particulars of the engine and will hold for any reversible cyclic operation.Footnote 8 Figure 3 illustrates the meaning of “means the same thing” in the context of reversible cyclic transformations. As shown, a twofold change in E corresponds to an identical ratio of evidential work, and thus also an identical ratio of evidential information, across the measurement scale.

As in physics, we did not define QE up front. However, we have now arrived at an understanding of what QE must be, namely, the transfer of evidential information in a form that can be converted to WE. Moreover, we immediately have a ready mechanism for measuring the amount of evidential information through the heat–work (energy-work) relationship in an evidential Carnot cycle. In principle then, we can use any “thermoscopic” measure of statistical evidence, such as VE itself (at least for simple systems such as the one-sided binomial), as an evidence measure, and calibrate it against this absolute scale. Of course, different functions g must be derived for each different type of statistical outcome measure to accomplish this; and beyond the context of the highly stylized Carnot cycle, myriad practical issues complicate actual measurement calibration. Additionally, the statistical outcome measure must itself behave thermoscopically, properly tracking up and down with the evidence.

Thus we have not so much solved the evidence measurement problem, as reformulated it in a way that makes it amenable to solution for the first time. Any remaining skepticism regarding the probability of success in practical applications may be offset by an appreciation of how difficult the corresponding practical problems were in physics (Chang 2004). It is all the more heartening, therefore, that we can now take the ordinary drugstore thermometer wholly for granted.


We can now restate the central problem we set out to address. We imagine that a given set of data has internal energy, UE. UE can be thought of as a form of information if that seems more comfortable, but to begin with we did not define UE, nor did we propose any particular device for directly measuring it. (This too parallels the development of thermodynamics (Callen 1985) p. 461). We posit only that it is the transfer of evidential energy in or out of a statistical system that causes changes in the LR graph. Having characterized those changes as above, we can now refer to this as the process of an influx (or outflux) of evidential heat QE doing evidential work WE. Our task then is to measure that property of UE that corresponds to what we mean by statistical evidence, which we have done by drawing directly on thermodynamic results to show how to measure E, the evidential analogue of T, on what is demonstrably an absolute scale in the same sense in which the Kelvin scale is an absolute scale for temperature. As in thermodynamics (Chang 2004), we first derived the measurement scale, and only then were we in a position to consider what precisely it is that E is measuring.

Under this new formalism, E is a measure of evidential information, or what we might now preferably call evidential energy, with the unit of E having fixed meaning as described above. This opens an entirely new line of research into the manner in which evidential energy is transformed into evidential work as data accumulate, including investigation of relative efficiencies of different mechanisms (statistical models) for so doing. We speculate that this shift in perspective will have numerous ancillary benefits, allowing us to extend Jaynes’ program of connecting the laws of inference with the laws governing physical systems through the shared concept of information, or evidential energy. (Note, however, that our framework is silent on some matters of central concern to Jaynes and other Bayesians, such as the proper formulation of priors or rational procedures for rank-ordering beliefs). First and foremost, however, we envision this paper as merely a prelude to the difficult empirical work of aligning evidence measures across applications in biomedical research, so that results can be properly interpreted across the measurement scale within any one application, across disparate applications, and across different types and levels of analysis.

While it may seem odd, although we have derived an absolute scale for evidence measurement we have in fact not yet specified the units of that scale. In thermodynamics too, the size of the degree is an arbitrary choice, set by convention. Thus, we can say that E, as it occurs in all Figures in this paper, is on the absolute scale; yet at the same time, the numerical values of E shown here may not correspond to what we will eventually settle upon as the actual values of the evidence, once a decision has been made regarding the size of the standard measurement unit.Footnote 9

One important consequence of our results is the necessity to reconsider the role of n in statistical systems. Initially, we had supposed that the accumulation of evidence based on two sets of data, with n 1 = n 2 observations, respectively, would be the analogue of doubling the number of particles (or moles of matter) in the system. However, the new framework invites us to view n (more precisely, the pair (n, x)) as an index of evidential energy, not of the number of particles. Once we see that E measures energy UE and that in the context of an evidential Carnot cycle this energy can be directly measured through the work WE, it becomes clear that n as it appears in our equations must relate to the units of energy, not matter: (n, x) is what changes as energy flows in and out of the system, and it is changes in (n, x) that correspond to evidential work. This is built into the thermodynamic equations that form the basis of the system. In thermodynamics too, it is possible to effect equivalent changes in macroscopic properties of a system (such as V, P) by either the input of energy or the input of particles (under suitable conditions for each). Indeed, our own previous work on evidential “thermoscopes” has highlighted the distinction between “pooling” data (considering all data as a single data set) and “sequentially updating” the evidence across data sets (Vieland 2006; Hodge and Vieland 2010; Vieland and Hodge 2011). In hindsight this appears to relate to the distinction between viewing changes in the system as resulting from changes in n considered as the number of observations (pooling) or as changes in n considered as an index of energy (sequentially updating results without combining data). Virtually all asymptotic theory in statistical inference is based on pooling as the fundamental operation as n increases, but we have argued elsewhere (Vieland and Hodge 2011) that a cogent measure-theoretic approach to the concatenation of evidence across data sets may require some form of updating. This remains a topic for further research.

We also view it as likely that future work will need to carefully consider the units of n itself, e.g., in human genetics, the unit of observation may be a pair of individuals (e.g., an affected sibling-pair) or an arbitrarily large family, which might correspond to gases of different atomic structures. Another tantalizing connection relates to the characterization of energy in our system by the (n, x) pair, with x being bounded by n. Possibly even a simple statistical system would be more appropriately modeled through the Van der Waals EqS (Fermi 1956 orig. 1936), which incorporates particle size and particle interaction, with the strength of the latter being constrained by the former. While merely a metaphor at this point, the Van der Waals equation has the further advantage that it allows for phase transitions, in a way that Eq. 2 itself does not. It could be that the two sides of TrP(E) are best thought of as different “phases,” like liquid and gaseous states. This might also relate to the behavior of E as an increasing function of n at TrP(E), which is a direct consequence of the definition of E (Eq. 6) as a function of SE. Possibly in further elaborations of the model, TrP(E) will prove to be a phase transition boundary point, at which E remains constant regardless of energy input into the system. See also Appendix 2, in which we derive a relationship between E and the observed Fisher Information. This relationship is interesting in its own right; moreover, we believe that further consideration may yield insights into the behavior of E at TrP(E).

One further aspect of our system that may strike both statisticians and physicists as implausible is the assumption of evidential analogues of the 1st and 2nd Laws of Thermodynamics, which are generally thought of as essentially physical in nature. In Appendix 1 we formally define the evidential 1st Law and take preliminary steps towards an evidential 2nd Law. (see Vieland 2013 for a more complete treatment of this topic). As in thermodynamics, these laws cannot be proved, but will be vindicated insofar as they allow us to usefully describe and manipulate statistical systems (Van Ness 1983; Callen 1985). Given the coherence of the new evidential framework so far, derived under the methodological assumption of an intimate relationship to the equations of thermodynamics, we believe that further investigation will confirm deep axiomatic connections between statistical systems characterized in terms of evidential information or energy and the basic laws of thermodynamics.

Theoretical matters aside, the ultimate intent of this project is to be useful in practice. Measurement is not, after all, an end in itself, but rather a sine qua non of rigorous scientific activity. When completed, our formalism would not replace other statistical investigations, but it would ideally provide a basis for reporting results of statistical analyses on a unified scale for purposes of meaningful comparison across the range of the measurement scale and across disparate applications. Considerable hard work remains to be done, however, before this framework can be usefully applied to real data analyses. To begin with, we have considered only a simple, discrete distribution. (Discrete distributions may play a special role in evidential systems, just as they do in physical systems). We will need to establish the validity of corresponding treatments of more complex statistical models involving nuisance parameters, approximating likelihoods, and disparate data structures, along with two-sided LR contrasts, before the framework can be deployed in target areas such as biomedical research. Finally, as a practical matter, we will need ways to indicate whether a given value of E is to the left, directly at, or to the right of TrP(E). This information is readily recovered from the underlying calculations, but forms a kind of meta-data that would need to be associated with the value of E in applications. Again, this might be equivalent to wanting to know whether a substance is in its liquid or gaseous state, which depends not just on temperature, but also on the volume and pressure of the system.

The practical challenges to deploying the new framework are substantial, but no more so than the challenges physicists faced in making practical use of the Kelvin thermal scale itself (Chang 2004). Just as we have exploited the well-established foundations of thermodynamics to develop the new evidential model, we believe that we can look to the experiences of experimental physics to facilitate translation of the theory into practice. Here too we may enjoy the benefits of standing on the shoulders of giants.