Although cognitivists view the mind as an information-processing device, applications of information theory to psychological issues are extremely rare. The target of this Outlook article (Sims, 2018) attempts to demonstrate its relevance.

Rate-distortion theory and its application to psychology: A primer

Fundamental concepts

Sims’ approach (Sims, 2016, is a better introduction) relies on rate-distortion theory, which combines information theory with decision theory. Imagine a source S emitting a signal x ∈ X. Let P(x) be the probability that a signal whose value is x is emitted. The quantity −log[P(x)] measures how surprising x is: it tends toward 0 as P(x) tends toward 1 (highly probable events are not surprising) and toward +∞ as P(x) tends toward 0 (highly improbable events are extremely surprising). The entropy H(S) of P(x) measures how surprising the signal is on average:

$$ H(S)=-\sum_{x\in X}P(x)\log \left[P(x)\right] $$
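To make this concrete, here is a minimal Python sketch of the entropy computation (base-2 logarithms are assumed throughout these sketches, so quantities are in bits; the example distributions are made up):

```python
import numpy as np

def entropy(p):
    """Entropy H(S) = -sum_x P(x) log2 P(x), in bits.
    Zero-probability signals contribute nothing to the sum."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A uniform source is maximally surprising on average: H = log2(4) = 2 bits.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
# A nearly deterministic source is barely surprising on average.
print(entropy([0.97, 0.01, 0.01, 0.01]))  # ~0.24 bits
```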

Suppose that you know that a signal has been emitted but you do not know its value. On average, your level of surprise when informed of it should be H(S). Now suppose you are informed that, in response to the unknown signal, a channel has emitted an output y ∈ Y. Let H(S|O) be the entropy of the source conditional on your having been informed of the channel output:

$$ H\left(S|O\right)=-\sum_{y\in Y}P(y)\sum_{x\in X}P\left(x|y\right)\log \left[P\left(x|y\right)\right] $$
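Continuing the sketch above, H(S|O) can be computed from a joint distribution P(x, y), from which P(y) and P(x|y) are derived (the joint distribution here is a made-up example):

```python
import numpy as np

def conditional_entropy(p_joint):
    """H(S|O) = -sum_y P(y) sum_x P(x|y) log2 P(x|y), in bits.
    p_joint[i, j] = P(x_i, y_j); P(y) and P(x|y) are derived from it."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_y = p_joint.sum(axis=0)                # P(y_j)
    h = 0.0
    for j, py in enumerate(p_y):
        if py == 0:
            continue
        p_x_given_y = p_joint[:, j] / py     # P(x|y_j)
        p_x_given_y = p_x_given_y[p_x_given_y > 0]
        h -= py * np.sum(p_x_given_y * np.log2(p_x_given_y))
    return h

# A noisy channel: rows index x, columns index y.
p_joint = np.array([[0.40, 0.10],
                    [0.10, 0.40]])
print(conditional_entropy(p_joint))  # ~0.72 bits, down from H(S) = 1 bit
```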

If there is any relation between the value of the signal and the output of the channel, knowing y should reduce your uncertainty about x. If so, when informed of the value of x, you should be less surprised than if you had not been informed of the channel output: H(S) > H(S|O). The average amount of information the channel output brings you about the source is

$$ I\left(S,O\right)=H(S)-H\left(S|O\right) $$

If x and y are independent of each other, H(S|O) = H(S), and I(S,O) = 0. The best-case scenario is when each possible value of y corresponds to a single value of x: knowing y is then as good as knowing x. In that case, H(S|O) = 0, and so I(S,O) = H(S). There is an upper limit C, the channel capacity, to the amount of information a channel can transmit. Hence, a channel with capacity C cannot perfectly transmit information about a source whose entropy H(S) exceeds C.
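Both limiting cases can be checked numerically. The sketch below computes I(S,O) directly from a joint distribution, using the equivalent form Σ P(x,y) log[P(x,y)/(P(x)P(y))] for compactness; the distributions are again invented:

```python
import numpy as np

def mutual_information(p_joint):
    """I(S,O) = H(S) - H(S|O), computed directly from the joint
    distribution as sum_{x,y} P(x,y) log2 [P(x,y) / (P(x) P(y))]."""
    p = np.asarray(p_joint, dtype=float)
    px = p.sum(axis=1, keepdims=True)   # marginal P(x)
    py = p.sum(axis=0, keepdims=True)   # marginal P(y)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / (px @ py)[mask]))

# Independent source and output: I(S,O) = 0.
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Deterministic channel (each y from a single x): I(S,O) = H(S) = 1 bit.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0
```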

These are the basic concepts of information theory. What rate-distortion theory adds is the idea of a cost function L(x, y), expressing the cost incurred when the channel emits output y while the source has emitted signal x. The average cost E(L) is

$$ E(L)=\sum_{x\in X}\sum_{y\in Y}L\left(x,y\right)P\left(y|x\right)P(x) \tag{1} $$
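Equation (1) is simply a weighted sum over all signal-output pairs; a minimal sketch, with a hypothetical two-stimulus channel, follows:

```python
import numpy as np

def expected_cost(L, p_y_given_x, p_x):
    """E(L) = sum_x sum_y L(x, y) P(y|x) P(x)   (Eq. 1).
    L and p_y_given_x are (n_x, n_y) matrices; p_x has length n_x."""
    L = np.asarray(L, dtype=float)
    p_y_given_x = np.asarray(p_y_given_x, dtype=float)
    p_x = np.asarray(p_x, dtype=float)
    return np.sum(L * p_y_given_x * p_x[:, None])

# Hypothetical two-stimulus example: cost 1 for a misidentification.
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])
channel = np.array([[0.9, 0.1],    # P(y|x1): mostly correct
                    [0.2, 0.8]])   # P(y|x2)
print(expected_cost(L, channel, p_x=[0.5, 0.5]))  # 0.15
```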

Some applications

Example 1: Standard of comparison

If L(x, y) and P(x) are known, rate-distortion theory provides computational tools (reviewed in Sims, 2016) to find the values of P(y|x) that minimize Eq. (1) under the constraint that the channel capacity is C. The result can be summarized in a rate-distortion curve showing the minimal channel capacity F(L) required to reach a specific cost level. F(L) can be used as a standard of comparison to determine whether a channel processes information efficiently. For a given physical channel, P(y|x) can be estimated empirically, which allows computation of the empirical cost E(L) generated by the channel and the empirical amount of information I(S,O) the channel conveys about the input. If I(S,O) > F[E(L)], the channel does not process information efficiently.
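The standard computational tool behind such curves is the Blahut-Arimoto algorithm. The following is a bare-bones sketch for finite alphabets, not Sims’ own implementation; the trade-off parameter β and all inputs are assumptions of the illustration:

```python
import numpy as np

def blahut_arimoto(L, p_x, beta, n_iter=200):
    """Return (p_y_given_x, rate_bits, expected_cost) for the channel
    minimizing I(S,O) + beta * E(L). Sweeping beta > 0 traces the
    rate-distortion curve: each beta yields one (E(L), capacity) point."""
    L = np.asarray(L, dtype=float)
    p_x = np.asarray(p_x, dtype=float)
    q_y = np.full(L.shape[1], 1.0 / L.shape[1])  # output marginal, refined below
    for _ in range(n_iter):
        p = q_y[None, :] * np.exp(-beta * L)     # optimal channel for current q_y
        p /= p.sum(axis=1, keepdims=True)        # normalize each row P(.|x)
        q_y = p_x @ p                            # implied output marginal
    rate = np.sum(p_x[:, None] * p * np.log2(p / q_y[None, :]))
    cost = np.sum(p_x[:, None] * p * L)
    return p, rate, cost

# Tracing the curve for a hypothetical 4-stimulus identification task
# with a 0-1 cost (0 for a correct response, 1 otherwise).
L01 = 1.0 - np.eye(4)
p_x = np.full(4, 0.25)
for beta in [0.5, 1.0, 2.0, 4.0, 8.0]:
    _, rate, cost = blahut_arimoto(L01, p_x, beta)
    print(f"beta={beta}: {rate:.2f} bits needed for expected cost {cost:.2f}")
```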

Sims (2016) provides an example of such an approach. The channel is a human participant exposed to a set of stimuli x ∈ X = {x1, x2, …, xn} (this constitutes the source), which they must identify by emitting a response y ∈ Y = {y1, y2, …, yn}. The participant receives feedback about their performance; the cost function is L(xi, yj) = 0 if i = j, and 1 otherwise. The rate-distortion analysis revealed that the participants did not process information efficiently.
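To put numbers on the efficiency test, the sketch below compares a made-up confusion matrix (not Sims’ data) against the rate-distortion bound; it reuses the blahut_arimoto function from the previous sketch:

```python
import numpy as np

# Hypothetical empirical channel for a 4-stimulus identification task
# (rows: stimuli x_i, columns: responses y_j, entries: P(y_j | x_i)).
p_y_given_x = np.array([[0.70, 0.15, 0.10, 0.05],
                        [0.15, 0.60, 0.15, 0.10],
                        [0.10, 0.15, 0.60, 0.15],
                        [0.05, 0.10, 0.15, 0.70]])
p_x = np.full(4, 0.25)
L01 = 1.0 - np.eye(4)                     # cost 0 if i = j, 1 otherwise

# Empirical cost E(L) and empirical information I(S,O).
p_joint = p_x[:, None] * p_y_given_x
emp_cost = np.sum(L01 * p_joint)          # 0.35 with these numbers
q_y = p_joint.sum(axis=0)
emp_info = np.sum(p_joint * np.log2(p_y_given_x / q_y[None, :]))

# F[E(L)]: minimal information needed to reach emp_cost, found by
# bisecting beta in the blahut_arimoto sketch above.
lo, hi = 1e-3, 50.0
for _ in range(60):
    beta = 0.5 * (lo + hi)
    _, rate, cost = blahut_arimoto(L01, p_x, beta)
    lo, hi = (lo, beta) if cost < emp_cost else (beta, hi)
print(f"empirical: I = {emp_info:.2f} bits at cost {emp_cost:.2f}")
print(f"bound: F[E(L)] = {rate:.2f} bits; inefficient if I > F[E(L)]")
```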

Example 2: Estimating the cost function

The previous example aimed to test whether the participant processed information efficiently. If, instead, it is assumed that they do, rate-distortion theory can be used to test hypotheses about the cost function. Two approaches are possible. In both cases, a general form L(x, y, A) is assumed for the cost function, with A being a set of free parameters. If A is held constant, the values of P(y|x) minimizing Eq. (1) can be computed and compared with the empirical P(y|x). Alternatively, the empirical P(y|x) can be inserted into Eq. (1) to find the values of A minimizing it. The first approach tests a hypothesis about the shape of the cost function. The second assumes that a specific shape is correct and aims to estimate its parameters.
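As an illustration of the first approach (my own construction, reusing blahut_arimoto and the hypothetical empirical channel from the sketches above), the predicted optimal channel can be scored against the empirical one, for instance with a Kullback-Leibler divergence:

```python
import numpy as np

def channel_mismatch(L, p_x, p_emp, beta):
    """Average KL divergence (bits) between the empirical channel p_emp
    and the optimal channel predicted by cost matrix L at trade-off beta.
    In a real analysis beta would be tied down, e.g., by matching the
    participant's empirical cost; here it is left as a free argument.
    Assumes p_emp has no zero entries (otherwise mask them out)."""
    p_opt, _, _ = blahut_arimoto(L, p_x, beta)
    return np.sum(p_x[:, None] * p_emp * np.log2(p_emp / p_opt))

# Score the simple 0-1 cost hypothesis against the empirical channel above.
print(channel_mismatch(1.0 - np.eye(4), p_x, p_y_given_x, beta=2.0))
```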

Sims (2016) applied both approaches to the data set discussed in Example 1. He first showed that the simple cost function assumed in that example cannot account for the data; this is an instance of the first approach. He then assumed a more complex cost function: each stimulus xi is represented by an anchor αi, which also corresponds to the response yi, and

$$ L\left({x}_i,{y}_j\right)=\left|{\alpha}_i-{\alpha}_j\right| \tag{2} $$

Given the number of free parameters in this cost function, the model provides an almost perfect account of the data. This is an instance of the second approach.
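A sketch of the second approach under Eq. (2), again using the hypothetical empirical channel from the earlier sketches and assuming scipy is available; the anchor scale is pinned because Eq. (1) is trivially minimized when all anchors coincide:

```python
import numpy as np
from scipy.optimize import minimize

def anchored_cost(free_alphas, p_emp, p_x):
    """E(L) of Eq. (1) under the anchor cost of Eq. (2).
    The first and last anchors are pinned to 0 and 1 to fix the scale;
    otherwise collapsing all anchors would give a degenerate E(L) = 0."""
    alphas = np.concatenate([[0.0], free_alphas, [1.0]])
    L = np.abs(alphas[:, None] - alphas[None, :])    # Eq. (2)
    return np.sum(L * p_emp * p_x[:, None])

res = minimize(anchored_cost, x0=[0.33, 0.66], args=(p_y_given_x, p_x),
               bounds=[(0.0, 1.0)] * 2)
print(np.concatenate([[0.0], res.x, [1.0]]))         # estimated anchors
```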

Conclusion

Other applications of rate-distortion theory (to the capacity of visual short-term memory and to stimulus generalization) can be found in Sims (2016, 2018), but the two examples above should give the reader an understanding of what it can achieve when applied to psychological experiments. Overall, rate-distortion theory is very similar in nature to signal detection theory: both are frameworks proposing a new set of concepts, along with tools for evaluating the values those concepts take when numbers can be assigned to them. A term is needed to differentiate this kind of theory from empirical theories meant to be tested: the term “metaphysics,” without any pejorative connotation, seems appropriate.

Because a metaphysical theory provides the concepts used to frame experimental theories, it cannot be refuted or confirmed by them. The critical issue is whether its application to a specific problem makes sense, and assessing this is the goal of a conceptual analysis (Machado, Lourenço, & Silva, 2000), a stage most often skipped in psychology. Concerning Sims’ application of rate-distortion theory, the following remarks can be made.

(a) Applying information theory to a problem implies identifying a source emitting the signal x, a channel providing the output y, and an observer using y to infer the value of x. It is only by virtue of the observer that the channel output can be considered to carry information about the source and that the channel can be said to be “processing information.” For instance, a climate scientist might use the size of the growth rings of trees in northern France to estimate the amount of rainfall in that area in the year 1889. The climate scientist is the one who can genuinely be said to be processing information; saying that the trees process information about rainfall is just a way of saying that the size of the growth rings can be used by an observer to infer something about the amount of rainfall. In his applications of rate-distortion theory to perception, Sims most often assumes that the source is the environment providing an objective stimulus x to the participant, while the channel is the perceptual system generating a subjective stimulus y in response. But who is the observer?

(b) The distinction between an external objective stimulus x and its internal subjective perception y is quite a common (though perhaps misguided) assumption of many approaches to perception. In that context, it does not make sense to assume that subjects can compare the objective stimulus with their subjective perception of it, since they have no access to the former. Yet this is exactly the assumption made by Eq. (2) and its variants, which appear in Sims (2018).