Justifications for democracy generally presume that the average citizen should make decisions on the basis of realistic perceptions of the world (see, e.g., Fernbach, Rogers, Fox, & Sloman, 2013; Gilens, 2001; Kuklinski, Quirk, Jerit, Schwieder, & Rich, 2000; Somin, 2013, 2014). When considering such policies as immigration, welfare, or social security reform, citizens should be expected to have some understanding of how many Americans are undocumented, receive welfare, or are eligible to receive social security benefits, respectively. Such perceptions are typically measured by asking people to make quantitative estimates of the prevalence of particular demographic groups, such as the proportions of members of particular ethnic groups, social classes, or religions. Unfortunately, people across many different nations appear to hold massively distorted perceptions of the demographic composition of their communities, often wildly misestimating the proportion of various demographic groups within their local communities (e.g., Wong, 2007) and their larger nation (Citrin & Sides, 2008; Kuklinski et al., 2000; Lawrence & Sides, 2014; Sigelman & Niemi, 2001). For instance, a recent Gallup poll found that U.S. residents estimate that over 20% of the population self-identifies as LGBT (Newport, 2015), but the true proportion seems closer to 3% (Gates & Newport, 2012). A similar bias toward overestimation is found in estimates of politically salient subpopulations—such as the proportion of African Americans who are on welfare (Kuklinski et al., 2000)—and in estimates of the size of minority groups, including Hispanics, Asian Americans (Wong, 2007), immigrants (Sides & Citrin, 2007), and African Americans (Sigelman & Niemi, 2001). This pattern reverses when people estimate the size of dominant groups, such as U.S. citizens who are White or Christian, whose proportions are massively underestimated (e.g., Ipsos Social Research Institute, 2014; Wong, 2007).

Misinformed voters often make misinformed decisions, a phenomenon that has potentially serious consequences for the optimal functioning of democratic systems of government (Somin, 2013, 2014; but see Lupia & McCubbins, 1998). It is thus critical to understand the extent and root causes of this apparent widespread misinformation. As a step toward understanding the extent of misinformation about the size of demographic groups, a number of major international surveys have documented patterns of misestimation of different subpopulations within and across countries (e.g., Ipsos Social Research Institute, 2014; Norwegian Social Science Data Services, 2006). These surveys provide a view unparalleled in breadth into a widespread social phenomenon: In countries around the world, people massively overestimate the size of minority groups while dramatically underestimating the size of majority groups.

Most explanations of this phenomenon have focused on properties of the individual topic or population being estimated. One such explanation is that people overestimate the prevalence of the things they fear (e.g., immigrants), resulting in a form of “phobic innumeracy” (Allport, 1954; Gallagher, 2003; Herda, 2010; Nadeau, Niemi, & Levin, 1993; Whaley & Link, 1998). While this approach is not without its critics (Martinez, Wald, & Craig, 2008; Sides & Citrin, 2007), it illustrates the general tendency of the literature on demographic estimation to invoke explanations centered on topic-specific biases. Other explanations speculate that individuals overestimate the size of the specific groups to which they are overexposed, so that beliefs about population size are inflated by media over-representation (Gallagher, 2003; Herda, 2013) or increased social contact (Alba, Rumbaut, & Marotz, 2005; Herda, 2013; Wong, 2007).

The media has given extensive—often alarmist—coverage to these reports of apparent ignorance. For instance, in an article on demographic ignorance, a headline in The Guardian screamed, “Today’s Key Fact: You Are Probably Wrong About Almost Everything” (October 29, 2014). Similarly, a piece for the newsmagazine Slate had the pessimistic headline, “Americans Drastically Overestimate How Many Unauthorized Immigrants Are in the Country, and They Don’t Want to Know the Truth” (January 9, 2012). More recently, the Washington Post chastised Americans for grossly overestimating the proportion of people in the U.S. who are immigrants or Muslim: “But while many Americans consider immigration one of the biggest issues for the future president,…it’s remarkable just how much Americans overestimate immigration in their country....American estimates for the size of the Muslim population in this country, also a focus of political discussion, are even more extreme” (September 1, 2016). Even the Wall Street Journal noted that, although “Americans have strong opinions about policy issues shaping the presidential campaign, from immigration to Social Security…their grasp of numbers that underlie those issues can be tenuous” (January 7, 2012).

The mainstream consensus, therefore, is that people are often massively wrong when it comes to the demographic facts that are relevant to critical political issues. And the origins of these errors? Journalists, like many scholars, assume that misestimation is a telltale sign of topic-specific bias. Why do people overestimate the proportion of LGBT Americans? Bloomberg suggests it’s because “gay, lesbian, bisexual, and even transgender characters have become prominent in recent years on TV shows such as ’Modern Family,’ ‘Scandal,’ ‘Degrassi,’ and ‘Glee,’ as well as in movies” (May 22, 2015). Why do people overestimate the proportion of crimes committed by people of color? An op-ed in The New York Times argues that the cause is a targeted “racial bias” that specifically distorts beliefs about people of color (September 7, 2014). It has become part of the public discussion, therefore, that personal and topic-specific biases have caused massive political ignorance, especially when it comes to politically relevant demographic proportions.

These topic-specific explanations, however, cannot account for the striking regularities in misestimation across specific issues, populations, and surveys. Topic-specific explanations predict strong individual differences in misestimation patterns across countries, topics, and time periods. For instance, those specific subpopulations that are seen as a cause of fear or are over-represented in the media should be heavily overestimated. As the demographic groups that are feared or over-represented in the media differ across countries, so should biases in estimation. But if we zoom out to consider a wide variety of demographic estimates across countries, instead of focusing only on a limited sample of hot-button topics within the U.S., a systematic pattern is glaringly obvious: Small values are overestimated, and large values are underestimated, regardless of the topic. Previewing our results, Fig. 2 below shows the strength of this pattern, manifested as a clear (inverted) S-shaped curve that cuts across a wide range of countries, topics, and values. Although there is some question-by-question variability, it appears that most estimation error is driven by this S-shaped curve. One consequence of this pattern is that the degree of overestimation is related systematically to the true size of the quantity being estimated: The smaller the true proportion, the more it is overestimated. This striking regularity suggests that one of the major causes, if not the major cause, of widespread demographic misestimation is some domain-general psychological process related to the estimation of demographic proportions.

Indeed, researchers working on quantitative estimation are familiar with this pattern of errors—overestimation of small values and underestimation of large values—particularly when those estimates involve some degree of uncertainty (Hollands & Dyre, 2000; Spence, 1990; cf. Gescheider, 1976; Stevens, 1975). This over–under pattern has been reported in fields as diverse as children searching through sandboxes (Huttenlocher, Newcombe, & Sandberg, 1994), the estimation of the age of the earth (Resnick & Shipley, 2013; Resnick, Shipley, Newcombe, Massey, & Wills, 2012), numerical magnitudes (Barth & Paladino, 2011), proportions of dots presented on a screen that are white or black (Varey, Mellers, & Birnbaum, 1990), proportions of letters in a string that are “a” (Erlick, 1964), and personal economic decision-making, as analyzed in prospect theory (Tversky & Kahneman, 1992). One upshot of this research is that perceptions can, and often do, differ starkly from their explicit expressions (Gescheider, 1976; Mosteller & Tukey, 1977; Stevens, 1946), since the process of transforming perceptions into responses can introduce systematic distortions. Nevertheless, research in political science often uncritically takes errors in estimation to be direct indicators of biased or misinformed perceptions.

The central claim of this article is that the interpretation of polls and surveys must account for domain-general features of human proportional reasoning; only afterward is there any need to invoke issue-specific theories about media bias, fear, or other social or informational factors. In other words, before invoking domain-specific biases—such as homophobia or xenophobia—to explain demographic misestimation, one should first consider how a perfectly informed individual, with access to unbiased information about various populations, would estimate the relative sizes of those populations. This individual would be subject to the standard psychophysical mechanisms that are known from other instances of proportional reasoning—mechanisms that have the potential to make the individual appear misinformed when completing a survey. This approach is in line with prior work in political science that has sometimes suggested that general numeracy and cognitive function may be important for demographic estimation (Alba, Rumbaut, & Marotz, 2005; Herda, 2015; Kunovich, 2012; Lawrence & Sides, 2014; Lundmark & Kokkonen, 2014). We expand on this argument by providing precise predictive models of the connection between demographic perception, general numeracy, and explicit demographic estimates, grounded in the prior literature on psychophysical transformations.

In what follows, we first provide a brief review of psychological models of proportional reasoning, in order to motivate intuitions for why proportional reasoning might function the way it does. In particular, we discuss two properties of human cognition that suffice to explain the S-shaped pattern of responses that is found in proportion estimation: (1) Proportions are psychologically processed not as raw proportions, but as log-odds (Zhang & Maloney, 2012), and (2) human reasoning tends to combine new information with prior expectations (i.e., cognition is often “Bayesian”; Huttenlocher, Hedges, & Duncan, 1991; Lee & Danileiko, 2014). We then introduce a specific model of proportional reasoning that formalizes these two principles. Next we use this model to reanalyze two large, publicly available polls in which people across 14 countries estimated the sizes of demographic groups within their countries. Our results suggest that most demographic misestimation is a simple consequence of the quirks of human proportional judgments, rather than evidence of topic-specific fear or media-motivated biases. Although topic-specific biases may exist, most of the evidence that has been invoked as support for topic-specific biases is better explained by the domain-general psychological processes that translate psychophysical stimuli into explicit estimates.

How do humans reason about proportions?

Human judgments do not involve a direct translation of incoming information and beliefs into explicit responses. Instead, the psychological processes involved in representing and processing information generally introduce a nonlinear relation between information and response (Fechner, 1860; Spence, 1990; Stevens, 1957). This insight is the basis of modern psychophysics. Psychophysics breaks down the route from the initial information to an explicit response into a series of steps. First, there is the raw, incoming information about the world, which itself may be biased (e.g., in the context of demographic estimates, there may be media bias in the representation of the immigrant population). Second, this raw information is perceived by the individual, which involves processing the incoming information to create a perception of the world (e.g., a perception of the size of the immigrant population). Third, if an individual is asked to give an explicit response (e.g., a numerical estimate on a survey, voting, etc.), the perception must be transformed from an internal scale into an explicit response. When an explicit estimate is incorrect, therefore, the error could reflect any stage of the psychological process: bias in the raw information, bias introduced when creating a perception from this raw information, bias introduced when translating a perception into an explicit response, or any combination of these.

To make sense of the public’s misestimation of demographic proportions, we must thus consider the combined influence of the psychophysical transformation from raw information to perceptions of the world, and then from these perceptions to explicit judgments. These psychological processes are not specific to demographic estimation; they occur whenever humans make judgments on the basis of information—any information, whether visual or derived from memory or any combination of different sources. Critically, prior work on nondemographic proportion estimates has demonstrated that these domain-general psychological processes introduce reliable, systematic deviations. When people generate explicit estimates of proportions, they overestimate small proportions and underestimate large ones (Hollands & Dyre, 2000; Spence, 1990). A very similar phenomenon occurs with other kinds of quantitative estimates, such as probability judgments as studied in prospect theory (Gonzalez & Wu, 1999; Prelec, 1998; Tversky & Kahneman, 1992).

Why might proportion estimation rely on psychological processes that introduce such reliably “incorrect” judgments? One possible reason is that, under many circumstances, it is reasonable for an individual to rely not only on raw information, but also on prior expectations about proportions in general. If an individual’s information suggests an especially extreme value, then they might suppose, reasonably, that the sample of information they have been exposed to must have been biased. Indeed, when judgments occur under uncertainty, it can be rational to combine raw information with prior expectations—even though this process can produce explicit estimates that systematically overestimate small values and underestimate large values (e.g., Fennell & Baddeley, 2012; Fox & Rottenstreich, 2003; Huttenlocher et al., 1991; Petzschner, 2012). One good example of this kind of approach is the “Decision by Sampling” framework (Stewart, Chater, & Brown, 2006), according to which decisions are made by combining information sampled from the decision context with beliefs about the background distribution of relevant values. Systematic over- and underestimations are thus natural predictions of both optimal and descriptive accounts of judgment and decision-making. In all these accounts, information from the current context is combined with past beliefs, experiences, and perceptions in ways that can manifest as systematic biases.

An example might help to illustrate the rationality of this general strategy. Let’s say that you encounter a new species of tree, and the first three such trees that you encounter are exceptionally tall. What should you conclude about the average height of this species of tree? One approach would be to conclude, on the basis of that raw information, that trees of this species, on average, grow very tall—much taller than the typical tree. Alternatively, one might conclude that those three individual trees must have been extreme outliers, since your past experience with tree heights of other species suggests that such extreme heights are highly unlikely. Given the conflict between the heights that are observed and the heights that are expected to be seen, a reasonable thing to do is to split the difference and guess that trees of this new species are probably, on the whole, somewhat taller than other trees, though perhaps not as tall as one might conclude from the three observed trees alone. Similarly, if you were to encounter three trees of another new species, all three of which are exceptionally small, you might reach the complementary conclusion: Trees of this species are small, but likely somewhere between your small sample and the typical height of trees that you have encountered (see Fig. 1). More generally, evidence of any extreme value should rationally be attributed partially to a biased sample and partially to a genuinely extreme value—and, thus, explicit judgments should end up somewhere between the value suggested by the current information and the value suggested by the prior distribution. According to Bayesian accounts of reasoning, this strategy—combining newly sampled information with prior expectations—is a basic principle of rationality (Box & Tiao, 1973).
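The "split the difference" logic of the tree example can be made concrete with a standard normal-normal Bayesian model, in which the posterior mean is a precision-weighted average of the observed sample mean and the prior mean. The sketch below (in Python; all numerical values are hypothetical, chosen purely for illustration) shows the compromise described above:

```python
import math

def posterior_mean(sample_mean, n, noise_sd, prior_mean, prior_sd):
    """Posterior mean for a normal prior and normal likelihood:
    a precision-weighted average of the sample mean and the prior mean."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / noise_sd ** 2
    w = data_precision / (data_precision + prior_precision)  # weight on the data
    return w * sample_mean + (1 - w) * prior_mean

# Three unusually tall trees of a new species (sample mean 35 m), judged
# against a prior expectation that typical trees average about 20 m:
estimate = posterior_mean(sample_mean=35, n=3, noise_sd=8,
                          prior_mean=20, prior_sd=5)  # ≈ 28.1 m
```

The rational estimate lands between the extreme sample and the prior expectation, just as in the example: the observer concludes the new species is taller than typical, but not as tall as the three observed trees alone would suggest.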

Fig. 1

(Top panel) Illustration of the rational process of adjusting extreme but uncertain perceptions toward aggregate prior expectations (i.e., a “typical” value). Other panels: Representative results from prior research into proportion judgments. Compare the results to those in Fig. 2, which presents data specifically from demographic estimations. These studies cover a range of different psychophysical tasks: (left) proportion of a presented sequence of letters that were the letter “A” (Erlick, 1964); (middle) proportion of presented dots that were of a particular color (Varey, Mellers, & Birnbaum, 1990); (right) proportion of a time interval delimited by clicks (Nakajima, 1987). Bottom panels adapted from “Bias in Proportion Judgments: The Cyclical Power Model,” by Hollands & Dyre, 2000, Psychological Review, 107, p. 501. Copyright 2000 by the American Psychological Association

How do these considerations apply to estimates of demographic proportions? Here we provide two examples to illustrate the intuitions behind the psychophysical models presented below. Say that you are asked about the proportion of Americans who are Cambodian Americans. Sampling from your experience, you might expect this proportion to be quite low—perhaps less than 1%, if you have had low exposure to Cambodian people in your daily life. On the other hand, you have lots of experience with demographic subgroups in general. Most such groups are much larger than the Cambodian-American population, and only very few are smaller. As in the case of the extreme tree heights, part of the extremity of this sample is probably due to the sample being small and noisy. Thus, on the basis of your experience with the distribution of demographic proportions, you could reasonably infer that your sample of Cambodian people is unrepresentative—that there are many more Cambodian people living in the U.S. than your personal experience would suggest, but you have not encountered or recognized them. Even if your memory sample of Cambodian Americans was actually unbiased—and, indeed, the true value is quite small, around 0.07%—a rational reasoning process would still push you to systematically overestimate the proportion. Analogous considerations apply to majority subgroups, which should be rationally underestimated. This process of rational adjustment can be conceptualized as a special case of regression to the mean, in which explicit estimates of demographic proportions are shifted systematically toward the mean proportion across all demographic groups (i.e., 50%). By taking previous knowledge into account, this strategy reduces overall error—but at the cost of introducing systematic biases, with smaller values overestimated and larger values underestimated (Box & Tiao, 1973; Huttenlocher et al., 1991). We will refer to this pattern as uncertainty-based rescaling, or simply rescaling.

Uncertainty-based rescaling is thought to be ubiquitous in quantitative judgments, and evidence for rescaling comes from a wide range of situations. Systematic overestimation of small values and underestimation of large ones has been observed in judgments in diverse domains, including number estimation (Barth & Paladino, 2011; Cohen & Blanc-Goldhammer, 2011; Landy, Charlesworth, & Ottmar, 2017), reading of bar graphs and pie charts (Spence, 1990), the perceived proportion of a letter in a list of random letters (Erlick, 1964), the proportion of dots in a collection that are white or black (Varey et al., 1990), remembering the location of dots in space (Huttenlocher et al., 1991), and risky events (Gonzalez & Wu, 1999; Tversky & Kahneman, 1992). Figure 1 presents a small sample of prior results.

A second critical feature of proportional reasoning is that, under most circumstances, people process proportions not as percentages or probabilities, but as odds (Fox & Rottenstreich, 2003; Gonzalez & Wu, 1999; Shepard, 1981; Spence, 1990; Stevens, 1957; Tversky & Kahneman, 1992; Zhang & Maloney, 2012). The difference is subtle but important. In a percentage, one considers the outcome of interest as a proportion of all possible outcomes (e.g., Asian Americans/all Americans). To calculate the odds, one compares the outcome of interest with the rest of the possible outcomes (e.g., Asian Americans/non-Asian Americans). For example, you may estimate that your train runs late about two out of every ten times. You may then represent the proportion of “running late” cases and consider it against the “on time” cases, yielding an odds of .2 to .8, or .25. Odds are rarely presented in scientific contexts, but are prototypical in betting situations. Notice that although the comparison of odds forms the basis of a variety of statistical techniques, including logistic regression and risk analysis, here we consider simply the evaluation of a single odds: the relation between the proportions of times that a thing happens and that it does not. Moreover, since the mental representation of most unbounded positive quantities is roughly log-scaled (Dehaene, 2003; Fechner, 1860; Shepard, Kilpatric, & Cunningham, 1975; Zhang & Maloney, 2012), the mental representation of odds is likely also log-scaled. Thus, in accounting for proportional reasoning, we must take into account the principle that the psychological processes will operate over the relevant log odds, not the raw proportions.

Modeling the psychology of proportion estimation

A number of different models have been developed to account for rescaling and other features of proportional reasoning (Asano, Basieva, Khrennikov, Ohya, & Tanaka, 2017; Cohen & Blanc-Goldhammer, 2011; Hollands & Dyre, 2000; Huttenlocher et al., 1991; Lee & Danileiko, 2014; Petzschner et al., 2015; Prelec, 1998; Spence, 1990; Tversky & Kahneman, 1992). Although these models differ in the details of their predictions, they all capture the same pattern of systematic over- and underestimation that typically appears in proportional reasoning in general, and in demographic estimates in particular.

We sought a model that formalized both insights about the psychology of proportional reasoning: (1) People encode proportions as odds, and these odds are represented, like other unbounded positive variables, on a log scale, and (2) people have prior expectations and uncertainties about proportions, and they incorporate those priors into their explicit estimates (i.e., uncertainty-based rescaling). The first insight implies that the mental representation of a proportion, p, should consist of the log odds, \( r_p \):

$$ {r}_p=\log \left(\frac{p}{1-p}\right). $$
(1)

Conversely, to give an explicit estimate in terms of proportions, the (possibly transformed) log odds, \( r_{p^{\prime}} \), must be converted back into a proportion:

$$ {p}^{\prime} = \frac{e^{r_{p^{\prime}}}}{1+e^{r_{p^{\prime}}}}. $$
(2)

One implication of this log scaling is that the psychological distance between proportions is given by their difference in log, rather than linear, space. For example, the psychological distance between a 20% chance of my train running late and a 50% chance should be roughly the same as that between a 20% chance and a 6% chance [i.e., since log(.5/.5) = 0, log(.2/.8) ≈ –1.4, and log(.06/.94) ≈ –2.8].
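This equal-spacing claim is easy to verify numerically. The short Python sketch below applies the log-odds transformation of Eq. 1 (using natural logs, as in the bracketed values above):

```python
import math

def log_odds(p):
    """Convert a proportion p to log odds (Eq. 1)."""
    return math.log(p / (1 - p))

# Psychological distance is measured in log-odds space:
d_50_to_20 = log_odds(0.50) - log_odds(0.20)  # ≈ 1.39
d_20_to_06 = log_odds(0.20) - log_odds(0.06)  # ≈ 1.37
```

The two distances differ by only about 0.02 log-odds units, even though they span very different intervals on the raw percentage scale (30 vs. 14 percentage points).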

The second insight—uncertainty-based rescaling—implies that estimates of demographic proportions should reflect an interpolation between perceptions and a generic, domain-general prior:

$$ \psi^{\prime}\left(r_p\right) = \gamma r_p + \left(1-\gamma\right)\ln\left(\delta\right), $$
(3)

where \( r_p \) is the source proportion as perceived in log-odds space, γ is the relative weighting of the perception versus the prior, ln(δ) is the location of the prior expectation (i.e., the log odds of a “typical” proportion), and the result, \( \psi^{\prime}(r_p) \), is the predicted explicit response—all expressed as log odds rather than as raw proportions. Since political surveys require participants to respond using proportions, not log odds, Eq. 3 can be transformed using Eqs. 1 and 2 to give:

$$ \psi (p)=\frac{\delta^{\left(1-\gamma \right)}{p}^{\gamma }}{\delta^{\left(1-\gamma \right)}{p}^{\gamma }+{\left(1-p\right)}^{\gamma }}. $$
(4)

This is equivalent to Eq. 3, except that it is given in terms of the raw source proportion, p, instead of the log odds of the source.
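As a check on this equivalence, the model can be coded directly. The Python sketch below implements Eqs. 1–4 and confirms that the closed form of Eq. 4 matches the composition of Eqs. 1–3; the parameter values are hypothetical, chosen only for illustration:

```python
import math

def log_odds(p):                         # Eq. 1
    return math.log(p / (1 - p))

def inv_log_odds(r):                     # Eq. 2
    return math.exp(r) / (1 + math.exp(r))

def rescale(r, gamma, delta):            # Eq. 3: interpolate toward the prior
    return gamma * r + (1 - gamma) * math.log(delta)

def psi(p, gamma, delta):                # Eq. 4: closed form in proportion space
    num = delta ** (1 - gamma) * p ** gamma
    return num / (num + (1 - p) ** gamma)

# The closed form agrees with composing Eqs. 1-3 (hypothetical parameters):
gamma, delta = 0.6, 0.7
for p in (0.01, 0.07, 0.5, 0.9):
    composed = inv_log_odds(rescale(log_odds(p), gamma, delta))
    assert abs(psi(p, gamma, delta) - composed) < 1e-12

# Small true proportions are overestimated, large ones underestimated:
assert psi(0.03, gamma, delta) > 0.03
assert psi(0.90, gamma, delta) < 0.90
```

Note the signature pattern: with these illustrative parameters, a true proportion of 3% is "estimated" at roughly 10%, while 90% is pulled down to roughly 76%, even though the underlying perception was perfectly accurate.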

This model allows us to predict how a “perfect” observer would respond when asked to estimate demographic proportions. Even if somebody had correct, unbiased information about the exact, true values for all demographic proportions, their explicit estimates should exhibit systematic “errors” due solely to the psychological processes involved in proportional reasoning. Thus, if we assume that perceptions are perfectly correct, then we can predict how a “perfect” observer would transform their unbiased perceptions into explicit proportion estimates—estimates that, even for a “perfect” observer, will reflect the rational rescaling that is characteristic of quantitative estimates.

In fact, Eq. 4 has been used previously, sometimes with slight modifications, to account for judgments about a variety of more mundane, human-scale proportions, preferences, probabilities, and risks (Birnbaum & McIntosh, 1996; Goldstein & Einhorn, 1987; Gonzalez & Wu, 1999; Karmarkar, 1979; Lattimore, Baker, & Witte, 1992; Tversky & Fox, 1995; cf. Tversky & Kahneman, 1992). We should note, however, that our intention is not to advocate this particular model—although, as we shall see, it does an excellent job of predicting errors in demographic estimates. Rather, our intention is to illustrate the more general point that apparent errors in demographic estimates might be nothing more than one manifestation of ubiquitous properties of human proportion estimation under uncertainty.

Summary

Proportion estimation is an extremely well-studied phenomenon in human reasoning and shows a consistent, clear pattern of inward bias toward mean or expected values. But discussions in political science and the media have not considered that, as a specific instance of proportion estimation, estimates of demographic proportions will be subject to the same domain-general sources of systematic error—error that is thus not always evidence of topic-specific misinformation. Therefore, when interpreting systematic errors in demographic proportion estimation, the first step should be to compare the estimates to those of a “perfect” observer who is subject to the basic, domain-general psychological processes that govern proportional reasoning. One way to do this is to apply an established psychophysical model of proportion estimation that formalizes the basic principles that govern proportion judgments under uncertainty.

This approach still leaves a role for the previous explanations in the literature, such as media bias, disproportionate exposure to certain groups, or fear. But if domain-general psychological processing is responsible for most estimation error, then these topic-specific explanations should only be invoked after accounting for deviations that will occur naturally and rationally. Topic-specific explanations such as media bias, social contact, or xenophobia will likely be necessary to explain residual deviation between individuals’ estimates and the predicted estimates of a “perfect” observer—but should not be invoked to explain errors in the individuals’ raw estimates. This is because perfectly rational, informed people will vastly overestimate low demographic proportions and underestimate high ones. The critical question, then, is whether past reports of widespread, topic-specific biases in demographic estimates are anything other than the rational rescaling that occurs during quantitative judgments under uncertainty.

Present study

Method

To investigate whether considerations of domain-general psychological processing can account for previously reported demographic ignorance and misinformation, we reanalyzed two large, multinational surveys for which the datasets are publicly available. Both these surveys have been used to argue for widespread topic-specific biases. The first was the Ipsos MORI Perils of Perception Survey, conducted in 2014 by the public-policy polling center Ipsos Social Research Institute. They polled individuals (n = 11,527) from 14 countries across North America, Europe, and Asia. Each individual was asked about the demographic proportions of his or her own country. The sample was a non-probability-based sample that was weighted to match the national demographics. The second was a subset of the European Social Survey (ESS), in which participants (n = 38,339) from 20 European countries were asked to estimate the proportion of their own country’s population that was foreign-born (Norwegian Social Science Data Services, 2006; analyzed previously in Sides & Citrin, 2007). Both polls phrased questions in units of 100—for instance, “Out of every 100 people living in [Country], how many immigrated to that country?” For details on the sampling designs, response rates, and so forth, please see the Appendix.

The model given by Eq. 4 was hand-coded, and all statistical analyses were conducted in the R statistical environment (R Development Core Team, 2015). Our models were fit using maximum-likelihood methods and assuming normally distributed errors. The full model and dataset are available at https://osf.io/kt8wa/.
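To give a sense of how such a fit proceeds, here is a simplified Python sketch (the article's own analyses were conducted in R, and the actual model code is available at the OSF link above; the data, parameter values, and grid-search procedure below are illustrative assumptions, not the authors' implementation). With normally distributed errors, maximum-likelihood fitting reduces to least squares, so a coarse grid search over γ and δ suffices here:

```python
import math
import random

def psi(p, gamma, delta):
    """Eq. 4: predicted explicit estimate for a true proportion p."""
    num = delta ** (1 - gamma) * p ** gamma
    return num / (num + (1 - p) ** gamma)

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Synthetic "survey" responses generated from the model itself
# (hypothetical generating parameters: gamma = 0.55, delta = 0.6).
random.seed(1)
true_p = [0.01, 0.03, 0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]
observed = [psi(p, 0.55, 0.6) + random.gauss(0, 0.01) for p in true_p]

# Grid search for the least-squares (= maximum-likelihood) parameters:
best_rmse, gamma_hat, delta_hat = float("inf"), None, None
for g in [i / 100 for i in range(30, 91)]:
    for d in [i / 100 for i in range(20, 101)]:
        fit = rmse([psi(p, g, d) - o for p, o in zip(true_p, observed)])
        if fit < best_rmse:
            best_rmse, gamma_hat, delta_hat = fit, g, d

# Error in the raw responses, ignoring rescaling entirely:
raw_rmse = rmse([o - p for p, o in zip(true_p, observed)])
```

On these synthetic data, the fitted parameters land close to the generating values, and the residual RMSE after accounting for rescaling is far smaller than the raw RMSE.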

Results

For both polls, estimation errors were strongly predicted by the true value of the items being sampled. As expected, large values were systematically underestimated, and small values overestimated, regardless of the specific topic or country (Ipsos: true for 95/98 estimated items, p < .0001, binomial test; ESS: 19/20, p < .0001). We compared these estimates to what we should expect if individuals had veridical, unbiased estimates of the true proportions, but rescaled all their estimates toward a generic expected value, as predicted by the model. These predictions were an excellent fit to the actual estimates, and could account for the observed relation between the true values and raw estimation errors (Fig. 2, gray lines). Indeed, using the model to account for the systematic rescaling predicted by psychological models of proportional reasoning had a dramatic impact on apparent bias. For the Ipsos poll, the root-mean-square error (RMSE) in the raw responses dropped from 17%, without accounting for uncertainty-based rescaling, to just 6% once rescaling was taken into account. For the ESS, the RMSE dropped from 9% to 5%. In other words, the explicit estimates were actually quite close to what a “perfect” observer might say, assuming that even a perfect observer will engage in rescaling when making explicit estimates. Most of the apparent “error” in observed estimates is what we should expect if individuals were engaging in rational rescaling. Thus, error that has been interpreted previously as topic-specific “ignorance” is actually predicted systematically by a simple, issue-agnostic psychophysical model of proportion estimation under uncertainty.

Fig. 2

Results of the 2014 Ipsos MORI Perils of Perception survey (A and B) and the European Social Survey (C and D). The left panels (A and C) present country-level mean estimates of a variety of demographic proportions. Note that smaller values are systematically overestimated, while larger values are systematically underestimated. The dashed lines indicate “perfect” accuracy, in which the explicit estimates are equal to the true proportions. The gray curve in each panel indicates the predictions of a psychologically plausible model of proportion estimation assuming that individuals have unbiased perceptions of the true value but systematically adjust their explicit estimates toward a more “typical” value. The right panels (B and D) present the same results, but in terms of log odds. When converted to log odds, the curves in panels A and C become straight lines: The model predicts a linear relation between the true and estimated log odds. In these panels, the slope indicates the amount of rational rescaling toward a typical value (see Eq. 3). Delta, the log odds of the fitted prior, is given both as a proportion (left) and in log-odds space (right)

Another way to visualize this result is in terms of log odds instead of raw proportions (Fig. 2B and D). As is illustrated by Eq. 4, the model predicts a simple linear relation between the true and estimated log odds, with the slope of this relationship indicating the amount of rational rescaling toward a typical value. When interpreted as log odds, explicit estimates of demographic facts are related systematically and linearly to the true values of those facts.

If individual issues were subject to topic-specific biases such as phobic threat or media bias, these biases would manifest as reliable, systematic deviations that persist after accounting for uncertainty-based rescaling. However, the residual errors suggested patterns of bias that differed starkly from those indicated by previous work (Fig. 3). In fact, for many items, the direction of residual bias after rescaling was opposite to that of the pre-rescaling error (Ipsos: 43/98; ESS: 9/20). For instance, every single country in the Ipsos poll overestimated the proportion of immigrants—sometimes by a factor of eight. But after accounting for rescaling due to uncertainty, most countries’ estimates were actually lower than the predicted estimate for proportions of that magnitude (i.e., most green dots lie below the gray line in the online version of Fig. 2A). In other words, after controlling for domain-general psychological processing of the proportions, it appears that immigration-specific factors may be driving estimates of these populations down, not up, relative to other demographic groups of the same size. The residual error that remains after accounting for general psychophysical rescaling could be explained by immigration-specific factors, such as underrepresentation in the popular media and in many social networks.
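This sign reversal can be made concrete with a toy example (the proportions and parameter values below are hypothetical): an estimate can lie well above the true value and yet below the model's prediction for a value of that size.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical item: immigrants are 10% of the population,
# but the mean explicit estimate is 25%.
true_p, est_p = 0.10, 0.25

# Assumed rescaling parameters for illustration (delta = 0 is a 50% prior)
gamma, delta = 0.45, 0.0
model_pred = inv_logit(gamma * logit(true_p) + (1 - gamma) * delta)

raw_error      = est_p - true_p       # positive: a large raw overestimate
residual_error = est_p - model_pred   # negative: below the model's curve
```

Here the model predicts that a rescaling observer would report roughly 27% for a true value of 10%, so a 25% estimate is an overestimate relative to the truth but an underestimate relative to the psychophysical baseline.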

Fig. 3

Residual errors from the Ipsos MORI dataset, separated by country and question. The y-axis is ordered by the cross-national average of the objective proportion of each subgroup. Cool-colored bars indicate raw errors, and warm-colored bars indicate the errors after accounting for rescaling. The raw errors are both larger and more structured than the postmodel errors (i.e., subgroups with large objective values are underestimated, while those with small values are overestimated)

Discussion

Our central thesis is simple: Demographic proportion estimation is just one specific kind of proportion estimation, and thus is subject to the same psychological processes that humans use to estimate other proportions. This banal observation has immediate and major implications for the interpretation of demographic misestimation. Decades of psychological research on proportional reasoning have established that explicit estimates are not direct reflections of perceptions, but systematic transformations of those perceptions. As a result, surveys and polls that ask participants to estimate demographic proportions cannot be interpreted as direct measures of participants’ (mis)information, since a large portion of apparent “error” on any particular question will likely reflect rescaling toward a more moderate expected value, regardless of the specific demographic population being estimated. Indeed, as predicted, in the present study we found that a simple, domain-general model of proportion estimation can predict most of the “error” in demographic estimates. Of course, this work does not undermine the broader conclusion that citizens are, in general, distressingly misinformed about the world—for instance, about nonquantitative judgments (e.g., Berinsky, 2015; Nyhan & Reifler, 2010; Somin, 2013). When it comes to misinformation in quantitative estimates of demographic proportions, however, our results indicate that most of the difference between the true values and explicit estimates reflects rational rescaling under uncertainty, not topic-specific bias or misinformation.

The two datasets examined here both show massive errors in demographic estimates, errors that have been taken as evidence of topic-specific bias or misinformation. For instance, Sides and Citrin (2007) stated that “[i]f correct information about immigrant stock and flows reached the general public, our analysis suggests that the sense of ‘threat’ might wane, mitigating hostility towards immigrants” (p. 501). This interpretation makes sense only if people are, indeed, misinformed about immigrant stocks and flows. After accounting for rational rescaling, however, residual errors in the estimates of immigration were small, and often in the opposite direction from what one would infer from the raw data. According to our analysis, therefore, global overestimation of immigration follows naturally from general psychophysical biases; the puzzle that remains for topic-specific explanations is why immigration was not overestimated even more. Explaining these errors will thus require theoretical accounts that go in the opposite direction from those that are dominant in the field.

The goal of the present study was to explore how explicit demographic estimates would deviate from the true value if perceptions of the world were unbiased; we thus assumed that people’s perceptions were normally distributed around the true value. However, in reality, individuals are unlikely to be perfectly informed. Topic-specific biases, while probably not as massive as has previously been assumed, are nevertheless likely to be widespread. The model can easily be adapted to account for topic-specific biases—such as those due to media bias or xenophobia—by assuming that perceptions are distributed around some other, biased value. Future work should investigate whether incorporating topic-specific biases can help account for the small residual errors that are not explained by domain-general, rational rescaling. Media bias, for example, cannot account for the general pattern of over- and underestimation that we have documented here. But we can now ask whether, after accounting for psychophysical rescaling, media bias accounts for any of the residual error in individuals’ estimates.

In past research on how perceptions affect political judgments and decisions, most researchers have interpreted raw estimation error as a direct measure of individual misinformation or bias (but see Kuklinski et al., 2000; Pasek, Sood, & Krosnick, 2015). However, from the psychophysical perspective that we are advocating here, political expressions of beliefs will be driven, not just by perceptions themselves, but by the outcome of some process that transforms perceptions into responses. As a result, citizens may internally hold unbiased perceptions about demographic groups, but the unavoidable process of transforming those perceptions into responses will introduce systematic distortions, whether citizens are estimating proportions for a survey or using their perceptions to decide how to vote. Making sense of these different transformations, which can differ by context or task, will improve our understanding of individual-level political behavior and help develop interventions aimed at correcting perceptions and civic behavior. For instance, people who feel more threatened by minorities also provide higher estimates of the size of minority populations (Nadeau et al., 1993). But such differences in overestimation may reflect two very different phenomena among respondents who feel more threatened: greater antiminority bias (the preferred explanation of many political scientists) or increased uncertainty about the size of minority populations (which, in our account, will also lead to overestimations). Without a proper analysis that is grounded in the psychology of quantity estimation, it is impossible to distinguish these different sources of error.

Identifying the true source of errors in demographic estimation is important both practically and theoretically. Most obviously, if a person’s apparent misestimate reflects rational rescaling of an accurate perception rather than genuine ignorance, then “informing” that person of the true immigration rate may be essentially worthless. Alternatively, if people are massively misinformed but give reasonable estimates due to uncertainty-related psychophysical transformations, then incorrect perceptions may be left unaddressed by interventions that specifically target issues that are reliably misestimated. These possibilities may have contributed to previous findings that supplying relevant information, such as the rate of unemployment and poverty in the U.S., has limited utility in shifting estimates and political opinions (Lawrence & Sides, 2014). Similar psychophysical analyses may play a helpful supporting role in aiding the interpretation of different patterns of overestimation across states, counties, and individuals (Wong, 2007), but we leave this to future research.

Although past work on political ignorance has largely ignored the implications of psychological rescaling, many social scientists already rely on statistical techniques that implicitly formalize the same central insight: that extreme values should be treated with suspicion and moved toward the expected value. Ridge or lasso regressions, for instance, involve a process known as shrinkage, in which estimates of the observed values are “shrunk” toward a common value (e.g., Hastie, Tibshirani, & Friedman, 2009). From a statistical perspective, this can be an informed and even rational correction for unlikely values, and it is built into many modern statistical inference processes, including some approaches to Bayesian statistical inference (Gelman & Shalizi, 2013). Therefore, when social scientists use these techniques, they are implicitly deploying one principle of uncertainty-based rescaling. But what’s good for the goose is good for the gander. If it is rational for social scientists to adjust extremely low values upward and extremely high values downward, then it should be equally rational for everyday citizens, when faced with uncertainty about demographic proportions, to “shrink” their extreme estimates toward a more moderate expected value.
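As a minimal illustration of statistical shrinkage (with simulated, hypothetical data), the closed-form ridge solution pulls coefficient estimates toward zero, with larger penalties producing stronger shrinkage:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression problem (hypothetical data)
n_obs, n_coef = 50, 5
X = rng.normal(size=(n_obs, n_coef))
beta_true = np.array([3.0, -2.0, 0.0, 1.0, 0.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n_obs)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X'X + lam * I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols   = ridge(X, y, 0.0)   # no shrinkage: ordinary least squares
beta_ridge = ridge(X, y, 50.0)  # penalized: coefficients shrunk toward zero
```

The penalized coefficients have smaller overall magnitude than the unpenalized ones; the same logic, applied to proportion estimates rather than regression coefficients, is what the rescaling account attributes to everyday estimators.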

There are some limitations to our analytic framework. First, as we mentioned, a variety of psychological models of proportional reasoning can account for the general pattern of over- and underestimation that we observe here, and these make slightly different predictions. We do not claim that the model used here is the best among those psychophysical models. Our intention here is only to illustrate the more general point that demographic estimates must be interpreted in light of the psychology of proportional reasoning. When interpreting any particular survey, poll, or study, some psychological model that accounts for rational rescaling is a necessary first analytical step, before moving on to more topic-specific, ad-hoc explanations. Second, here we analyzed group-level data, but the model was developed to account for individual behavior (Gonzalez & Wu, 1999). This is problematic because the model is not closed under averaging—that is, the mean of two individual log-linear response patterns is not usually itself log-linear. However, for the Ipsos MORI dataset analyzed in the present study, only mean responses were publicly available. Although the average of many individual log-linear response patterns is typically a close approximation to a log-linear response pattern, future research should confirm the present results by analyzing responses on an individual level. Indeed, ongoing work has shown that patterns of over- and underestimation at the population level are well accounted for by rational rescaling of log odds (Marghetis et al., in prep.).
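The approximation claim can be checked numerically. In this sketch (the two individuals' parameter values are hypothetical), the average of two linear-in-log-odds response curves is refit with a single such curve, and the deviation between the average and its refit is small:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

def llo(p, gamma, delta):
    """Linear-in-log-odds response pattern."""
    return inv_logit(gamma * logit(p) + (1 - gamma) * delta)

p = np.linspace(0.01, 0.99, 99)

# Average the response curves of two hypothetical individuals
# with different rescaling parameters
avg = (llo(p, 0.3, 0.0) + llo(p, 0.7, -0.5)) / 2

# Refit a single log-linear curve to the averaged responses
g, b = np.polyfit(logit(p), logit(avg), 1)
refit = inv_logit(g * logit(p) + b)

# Maximum deviation between the average and its single-curve approximation
max_dev = float(np.max(np.abs(avg - refit)))
```

Although the averaged curve is not exactly log-linear, the single-curve refit tracks it closely across the whole range of proportions, which is why group-level fits can still be informative.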

Two outstanding questions remain: First, do individual differences in “uncertainty-based rescaling” of demographic proportions reflect differences in individuals’ uncertainty? In our account, the more uncertain an individual is about a true value, the more he or she should rely on the prior distribution of values typical to that category (see also Stewart et al., 2006). Indeed, in other domains, uncertainty has been found to predict the tendency to rescale judgments toward an expected or typical value (Crawford, Landy, & Salthouse, 2016; Fox & Rottenstreich, 2003; Huttenlocher et al., 1991; Martins, 2006). Second, how do individuals’ demographic estimates relate to their political behavior? That is, when people engage in political behavior, such as voting or advocating a policy, do they rely primarily on their explicit estimates—which combine perceptions with prior expectations—or do they behave on the basis of perceptions alone? In at least some decision-making contexts, rescaled probability estimates have been found to predict subsequent evaluations and behaviors (Tversky & Kahneman, 1992). Alternatively, explicit estimates may shape behavior only when the task or situation requires individuals to construct an explicit estimate, thus prompting a rescaling of their implicit perceptions; in cases in which explicit estimates are not salient, perceptions may be more potent influences on behavior.

These results have important implications for policy makers, the press, and political scientists. Overestimation of minority populations has often been taken as evidence of personal or topic-specific biases—most commonly media bias, phobic innumeracy, or social-contact biases. Our results underline the principle that overestimation is a natural consequence of making estimates about values that are smaller than the expected or typical value (e.g., the proportion of the U.S. population that is Asian, which is smaller on average than most demographic groups). To conclude that there is a truly topic-specific bias, individuals’ demographic estimates need to exceed what we would predict on the basis of rational rescaling alone. In light of this, significant prior work must be reconsidered.

Finally, these results point to the need for greater collaboration between cognitive psychologists and psychophysicists, on the one hand, and political scientists and polling agencies, on the other. While political scientists are making increasing use of psychological theories and methodologies, they and policy-relevant polling groups cannot be expected to divine the results and patterns learned over decades of research by psychologists (Druckman, Kuklinski, & Sigelman, 2009). Similarly, psychologists are unlikely to understand the full ramifications of their models and results in adjacent fields, like political science, that use similar methods to ask important and practical questions. Our results point to the potential for fruitful collaborations across disciplines, collaborations that can sharpen our responses to important theoretical and empirical questions.