Why does the GSV method produce narrow confidence intervals? We can get a clue by running the GSV method when there are 10 reports of “heads” out of 10 for a fair coin flip (\(R = N = 10, P = 0.5\)). The resulting point estimate is that 100% of subjects lied. The lower and upper bounds of the 99% confidence interval are also 100%.
This is calculated as follows. First, given R reports of heads, the probability that a total of \(T\) “true” heads were observed is calculated as:
$$\begin{aligned} \text {Prob}(T \text { heads}| R;\, N, P) = \frac{ \text {binom}(T, N, 1 - P) }{ \sum ^R_{k=0}\text {binom}(k, N, 1 - P) }. \end{aligned}$$
(5)
This is the binomial distribution, truncated at R because by assumption, nobody “lies downward” and reports tails when they really saw heads.
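As a concrete check, Eq. (5) can be computed directly. The sketch below is my own illustration, not the GSV code; it follows the paper's notation, in which heads occur with probability \(1 - P\):

```python
from math import comb

def prob_true_heads(T, R, N, P):
    """Eq. (5): truncated-binomial probability that T coins truly landed
    heads, given R reports of heads. Heads occur with probability 1 - P,
    and the distribution is truncated at R because, by assumption,
    nobody lies downward."""
    def binom_pmf(k):
        return comb(N, k) * (1 - P) ** k * P ** (N - k)
    return binom_pmf(T) / sum(binom_pmf(k) for k in range(R + 1))

# With R = N = 10 and a fair coin, the truncation is vacuous and
# Prob(T = 10) is simply (1/2)^10 = 1/1024.
p_ten = prob_true_heads(10, 10, 10, 0.5)
```

By construction the probabilities sum to one over \(T = 0, \dots, R\), whatever the truncation point.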
Next, from T the number of lies told is calculated as \(R - T\); and the proportion of lies told is:
$$\begin{aligned} \mathrm{Lies} = \frac{R-T}{N-T}, \end{aligned}$$
(6)
because \(N - T\) people saw the low outcome and had the chance to lie. Combining this with the truncated binomial gives a cumulative distribution function of Lies. This is then used to estimate means and confidence intervals.
Putting these together, for \(R = N = 10\), the estimated distribution of Lies is calculated as follows:
With probability \(\frac{1}{1024}\), there were really 10 heads. Nobody lied in the sample.
Otherwise, 1 or more people saw tails, and they all lied. The proportion of liars is 100%.
Hence, the lower and upper confidence bounds are both 100%.
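The whole calculation for \(R = N = 10\) can be reproduced in a few lines. This is my own sketch of the logic, not the GSV code; it records Lies as 0 in the undefined case \(T = N\), though either convention gives the same interval, since that case has probability \(1/1024 < 0.005\):

```python
from math import comb

N = R = 10
P = 0.5  # fair coin

# Eq. (5): with R = N the truncated binomial is just the binomial.
prob_T = [comb(N, T) * P ** N for T in range(N + 1)]

# Eq. (6): proportion of lies told; undefined when T = N (nobody saw
# tails), so we record 0 there.
lies = [(R - T) / (N - T) if T < N else 0.0 for T in range(N + 1)]

def lies_percentile(q):
    """q-th quantile of the discrete distribution of Lies."""
    cum = 0.0
    for value, p in sorted(zip(lies, prob_T)):
        cum += p
        if cum >= q:
            return value
    return 1.0

# Both endpoints of the 99% interval come out at 100%.
lower, upper = lies_percentile(0.005), lies_percentile(0.995)
```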
There are two problems with this approach: one statistical, and one conceptual.
First, if many heads are reported, you should learn two things. On the one hand, there are probably many liars in your sample. On the other hand, probably a lot of coins really landed heads. The probability distribution in Eq. (5) does not take account of this.
For example, suppose we are certain that everyone in the sample is a liar who always reports heads. In this case, observing \(R = N = 10\) gives us no information about the true number of heads. The posterior probability that \(T = 10\) is then indeed 1/1024, the same as the prior. Now, suppose we know that nobody in the sample is a liar. Then on observing \(R = 10\), we are sure that there were truly 10 heads: the posterior that \(T = 10\) is 1. If exactly 5 out of 10 subjects are liars, then observing \(R = 10\) means that all 5 truth-tellers really saw heads. The posterior probability that \(T = 10\) is then \(1/32\), the chance that all 5 liars saw heads, and so on.
When we are uncertain about the number of liars, our posterior that \(T = 10\) will be some weighted combination of these beliefs. Unless we are certain everyone in the sample is a liar, the probability that \(T = 10\) will be greater than 1 in 1024. Equation (5) is therefore not correct: it is equivalent to assuming that everybody in the sample is a liar, whose report is uninformative about the true number of heads, and then using the prior distribution of heads to estimate the proportion of those who actually saw tails and lied.
Indeed, in the simulations with \(P = 0.5\) and across all values of \(\lambda\), the overall probability that there were 10 true heads, conditional on \(R = N = 10\), was about 1 in 161, not 1 in 1024. Fixing \(\lambda = 0.2\), it was about 1 in 4.
This problem means that the GSV estimator of Lies is biased. In the “Appendix”, I show that the GSV estimator can have substantial bias, and performs worse than the naïve estimator from Eq. (1), \(\frac{R/N-(1-P)}{P}\). Also, the GSV confidence intervals do not always achieve nominal coverage of Lies. When the number of heads reported is either high or low, the percentage of confidence intervals containing Lies may fall below the nominal value.
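For comparison, the naïve estimator from Eq. (1) is easy to simulate. The Monte Carlo below is my own sketch, with an assumed data-generating process in which each subject who sees tails lies with probability lam; it illustrates that the naïve estimator is centred on \(\lambda\):

```python
import random

def naive_lambda(R, N, P=0.5):
    """Eq. (1): (R/N - (1 - P)) / P. Here P is the probability of the
    low outcome (tails), matching the paper's notation."""
    return (R / N - (1 - P)) / P

def simulate_reports(N, P, lam, rng):
    """Each subject sees heads w.p. 1 - P; on tails, lies w.p. lam."""
    R = 0
    for _ in range(N):
        heads = rng.random() < 1 - P
        if heads or rng.random() < lam:
            R += 1
    return R

rng = random.Random(0)
lam = 0.3
ests = [naive_lambda(simulate_reports(10, 0.5, lam, rng), 10)
        for _ in range(20000)]
mean_est = sum(ests) / len(ests)  # close to lam = 0.3
```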
There is a second, more important problem. The GSV approach attempts to estimate Lies in Eq. (6). This is the proportion of lies actually told, among the subsample of people who saw tails. But we are not usually interested in the proportion of lies actually told. We care about the probability that a subject in the sample would lie if they saw tails—\(\lambda\) in Eq. (2). This \(\lambda\) can be interpreted in different ways. Maybe on seeing a tail, each person in the sample lies with probability \(\lambda\). Or maybe the sample is drawn from a population of whom \(\lambda\) are (always) liars, and \(1 - \lambda\) are truth-tellers. Lies has no interpretation in the population, because the rest of the population has no chance to tell a lie in the experiment.
Lies can be treated as an estimate of \(\lambda\). It is unbiased: it estimates \(\lambda\) from the random, and randomly sized, sample of \(N - T\) people who saw tails. But it can be a very noisy estimate. Again, suppose 10 heads out of 10 are reported, and 9 heads were really observed. Lies is 100%. But it is 100% of just one person.
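A quick simulation (my own sketch, not from the paper) makes the noise visible: with \(\lambda = 0.5\) and \(N = 10\) fair coins, Lies averages \(\lambda\) across samples, but individual samples are frequently exactly 0% or 100% simply because few people saw tails:

```python
import random

rng = random.Random(1)
lam, N = 0.5, 10
samples = []
for _ in range(50000):
    T = sum(rng.random() < 0.5 for _ in range(N))   # true heads
    tails = N - T
    if tails == 0:
        continue                                    # Lies undefined
    liars = sum(rng.random() < lam for _ in range(tails))
    samples.append(liars / tails)                   # Eq. (6)

mean_lies = sum(samples) / len(samples)             # about 0.5
share_extreme = sum(s in (0.0, 1.0) for s in samples) / len(samples)
```

The mean sits near \(\lambda\), while a sizeable share of samples land exactly on 0 or 1, which is the noise that any interval built around Lies has to absorb.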
This means that even the correct confidence intervals for Lies would not be correct for \(\lambda\). For example, if 3 out of 3 subjects report heads, the GSV software reports a lower bound of 100% for any confidence interval. Indeed, since anyone who had the opportunity to lie clearly did so, this is the correct lower bound (if we arbitrarily define Lies = 1 when \(T = N\)). But it makes no sense as a confidence interval for \(\lambda\): we clearly cannot rule out that one or two subjects truly saw heads, and would have reported tails if they had seen tails.
Because of this problem, the GSV confidence interval coverage of \(\lambda\) is much worse than its coverage of Lies. The issue is especially serious when there are many reports of heads. In that case there were probably many true heads, so T is high and the true sample size \(N - T\) is low, making Lies a noisy estimate of \(\lambda\). Table 4 shows this. It splits the simulations by the proportion of reported heads, R/N. GSV coverage levels fall off sharply as R/N increases. Note that for fair coin flips, R/N is usually greater than 0.5, both in the simulations and in reality.
Table 4 GSV confidence interval coverage by proportion of heads reported (R/N)