On the eve of the 2008 New Hampshire Democratic presidential primary, pre-election polls predicted a sizable win for Barack Obama in the first presidential primary of the year. An Obama victory in New Hampshire would have been the second consecutive defeat for the one-time front runner Hillary Clinton and would essentially have ended her bid for President. In a surprising turn of events, Clinton defied the pre-election polls and won the New Hampshire primary. While some polling critics argued that this was evidence of the Bradley Effect, others claimed the poll discrepancies could be explained by an unexpected boost in turnout or possibly a large shift in undecided voters towards Clinton. What was omitted from this discussion, however, was the role that Clinton’s gender played in these polling discrepancies.

While the claim that polls often overstate support for African-American candidates (a phenomenon known as the Bradley Effect) is a familiar one, women too may be susceptible to systematic polling biases. We can readily identify instances where polls have overestimated the support of female candidates, such as Janet Napolitano in the 2002 Arizona Gubernatorial Election. However, there appear to be many more cases where the polls have underestimated the support of female candidates. For example, Hillary Clinton in the 2000 New York US Senate Election, Ann Richards in the 1990 Texas Gubernatorial Election, and Christine Todd Whitman in the 1993 New Jersey Gubernatorial Election were predicted to lose by sizable margins or be in very competitive contests, and yet all three won their respective elections with ease. This anecdotal evidence raises the following questions: To better understand polling inaccuracy, must we take into account the gender of the candidate, the same way we should consider race (Finkel et al. 1991; Traugott and Price 1992; Reeves 1997)? Or is the gender of the candidate inconsequential once we account for factors thought to influence polling accuracy, such as the party of the candidate, turnout, margin of victory, and the proportion of undecided voters in any poll?

Despite a very large literature on polling accuracy, there is almost no research examining the role that gender plays in polling discrepancies. The one refreshing exception is Hopkins’ (2009) study, in which he examines a rich data set of female candidates who campaigned for either the US Senate or Governor from 1989 to 2006. In his analysis, which is the first to systematically examine polling bias for female candidates, Hopkins concludes that there is no “Whitman Effect” (the term he uses for polling inaccuracies with respect to female candidates) and that polls do not overestimate support for women. However, his analysis stops short of testing alternative hypotheses or offering any theoretical explanation of why we would observe differences for female candidates. Therefore, a necessary addition to this literature is a test of whether polls underestimate support for female candidates and a theoretical explanation of why we might observe gender-related polling discrepancies.

Moreover, we build on Hopkins’ work by adding additional features to the empirical analysis. These features include the addition of a set of carefully selected white-male matched comparison cases and the introduction of other potentially important factors such as the election and social context, the incumbency status of the candidate, and gender specific turnout rates. If we find that the accuracy of polls for female candidates is significantly reduced when compared to polls for similarly situated traditional white male candidates, then this is strong evidence that such a decrease in polling accuracy is somehow associated with the candidate’s gender.

In this paper, we analyze polling accuracy for female candidates. Based on previous research pertaining to gender and socially desirable response bias, we construct two competing hypotheses. The first hypothesis claims that polls will overestimate support for women, likely because respondents want to avoid being perceived as sexist. The second hypothesis argues that polls will tend to underestimate support for women because voters may fail to publicly endorse a female candidate out of reluctance to signal support for non-traditional views of gender roles. Using data on female and white male candidates collected from the US Census, university and newspaper polls, and numerous state election resources, we run several standard OLS regressions to test our hypotheses. The results indicate that pre-election polls for female candidates significantly underestimate their ultimate electoral support, a phenomenon we call the “Richards Effect” after former Texas Governor Ann Richards.

To further investigate the motives behind the systematic underestimation of female candidates’ electoral support, we analyze contextual factors for the universe of female candidates who were campaigning for governor or US Senator from 1989 to 2008. We find that female candidates are most likely to experience larger polling discrepancies in states where fewer women are in the labor force and in states where the Congressional delegation has a poor record on supporting progressive gender issues. The results from both sets of analyses suggest that voters tend to be more supportive of female candidates in the voting booth than when responding to a pre-election poll, and this tendency intensifies in gender-conservative states.

Causes of Polling Inaccuracies

Election polls have evolved over the course of a century and have become increasingly precise instruments for measuring election outcomes. This is why most pre-election polls are very successful in predicting the eventual winner (Bolstein 1991; NCPP 1997; Mitofsky 1998; Traugott 2005; Keeter and Samaranayake 2007). Several authors have shown that the best predictor of whether or not a person will vote and who they will vote for can be found in their pre-election poll response (Mosteller 1949; Bolstein 1991; Mitofsky 1998). Despite the general accuracy of modern pre-election polls, there have been numerous instances in which the actual results do not match their predictions (the 1948 and 1996 Presidential elections are often cited as examples of particularly poor poll performance). Furthermore, under certain conditions related to the nature of the candidates and the race, the predictive power of these polls may be further diminished.

Studies of polling have identified several factors that could lead polls to be less accurate. Some of the usual culprits of imprecise polling are large numbers of undecided voters and larger than expected turnout rates (Perry 1979; Fenwick et al. 1982; Mitofsky 1998; Berinsky 1999; Durand et al. 2001; Traugott 2005). Others have shown that non-competitive races and elections with incumbents can also lead to less accurate polling, both pre- and post-election (Crespi 1988; Wright 1990; Gow and Eubank 1984). These factors have historically helped predict polling accuracy. We propose certain salient characteristics of the candidate’s identity as a necessary addition to this literature.

One of the major shortcomings of the literature on polling accuracy is that it has not taken into account the gender of the candidate, with the only major exception to this rule being Hopkins’ 2009 piece. Perhaps the chief reason for this is that until the 1990s few female candidates were ever nominated for races that got statewide or national attention. As a result, only a small number of firms conducted polling for these candidates. With the growth of female candidates in recent elections there has been a concomitant increase in the number of polls gauging their support. In what follows, we argue that polls predicting support for female candidates may be less accurate than for traditional candidates, even once context and other relevant factors have been taken into account.

Gender and Polling Inaccuracies

A commonly cited problem with polling and survey accuracy is that respondents may give misleading responses in order to appease the interviewer. This phenomenon is known as socially desirable response bias and may be expected to be most common among respondents who have the most to gain from this deception (Hatchett and Schuman 1975; Jackman and Muha 1984; Krysan 1998). In the race featuring the candidate who lends her name to the “effect” discussed in this paper—Ann Richards—gender figured into the contest in important and salient ways.Footnote 1 Because many issues regarding gender are sensitive, survey respondents may—even more so than is usually the case—perceive that they are being personally judged based on their responses (Ballou and Del Boca 1980; Lueptow et al. 1990; Northrup 1992). Such a psychological process thus could induce the Richards Effect that we document below.

While there is a dearth of literature directly theorizing regarding the direction, if any, of polling biases for female candidates, we can gain some insight into how voters will respond to these candidates based on related research. Several studies find that opinion polls measuring attitudes on topics pertaining to gender are often subject to inaccuracies (Ballou and Del Boca 1980; Lueptow et al. 1990; Northrup 1992; Huddy et al. 1997; Johnson and DeLamater 1976; DiCamillo 1991; Streb et al. 2008). This research, however, provides two competing hypotheses which predict biases in opposite directions. The principal objective of our analysis is to adjudicate, using the universe of female candidates for Governor or US Senator from 1989 to 2006, between these two contradictory hypotheses.

One strain of research indicates that men and women will often over-report their support for causes and policies related to gender equality (Huddy et al. 1997; Johnson and DeLamater 1976; DiCamillo 1991). These studies find that some men and women want to appear more supportive of issues such as pay equity, the women’s movement, and non-traditional gender roles than they actually are in order to preclude being labeled a sexist (Huddy et al. 1997; Johnson and DeLamater 1976; DiCamillo 1991). Furthermore, Streb et al. (2008) find that there is a significant socially desirable response bias in discussions regarding hypothetical female presidential candidates. Using a list experiment, Streb et al. (2008) show that many respondents are willing to tell a pollster that they will vote for a female president when in actuality, they have a small probability of supporting such a candidate.

What incentives does a respondent have to say that they will vote for a female candidate or agree with a progressive gender ideology when they in fact do not? Similar to explanations of the Bradley Effect (see Finkel et al. 1991; Traugott and Price 1992), some respondents might tell pollsters that they support a female candidate, even when they do not, in order to preclude the appearance of sexism. There is evidence indicating that female candidates may be perceived as too liberal, weak on national security and crime, and biased toward women’s issues (Sapiro 1983; Huddy and Terkildsen 1993; Sanbonmatsu 2002). As a result, we might find that respondents mislead pollsters in stating their support for the female candidate, but based on fears that she will not adequately represent their interests, vote against her. If this hypothesis is correct, we should see that female candidates perform better in the polls than in the actual election. Moreover, this effect should be exacerbated in states where residents generally hold more progressive views about gender roles. It is in this context where respondents may falsify their preference in support of female candidates in order to appear more progressive on gender issues so their friends and colleagues do not think less of them.

A second strain of research suggests that both males and females are more progressive on gendered issues than they admit (Breinlinger and Kelly 1994; Buschman and Lenart 1996; Jacobson 1981). For example, previous studies have shown that respondents will agree with the idea of progressive gender roles, including pay equity and increasing the number of women in positions of power, but these respondents will not identify as being a “feminist” or anything related to that term (Burn et al. 2000). Moreover, Theriault and Holmberg (1998) provide evidence that suggests that respondents want to appear more conservative on gender issues than they actually are. They find that the respondents in their study who score high on the social desirability scale “seem to be portraying themselves as relatively conservative and relatively traditional [on gender issues]” (Theriault and Holmberg 1998, p. 108). These authors argue that, possibly as a cultural backlash to the feminist movement of the 1960s and 1970s, some respondents may publicly distance themselves from the values that feminists fought for, while supporting these issues privately.

Why would respondents want to appear more conservative on gender issues than they actually are? There are a variety of reasons why this may be the case, including fears that a gender progressive agenda would be biased towards women and a negative cultural perception of feminism. While most men and women may publicly agree with pay equity and having more women in positions of power in the abstract, they may be uncomfortable with the methods used to obtain this equity. For example, some voters may see affirmative action as a vehicle for ending gender disparities in the work place (Bielby 2000; Northrup 1992). However, policies like affirmative action are perceived by some as examples of feminists being too aggressive and creating policies which unfairly benefit women (Northrup 1992). Those who support such policies may be negatively perceived as being too “radical” or unfair. As a result, some voters may support these issues privately, but publicly may be ashamed to support policies which are perceived as privileging women.

A persistent belief about female candidates is that they are seen as biased toward gender issues, regardless of their partisanship, and thus they are often perceived as being supportive of a progressive gender agenda (Leeper 1991; Huddy and Terkildsen 1993; McDermott 1997; Sanbonmatsu and Dolan 2009). As a result, some voters may believe that female candidates, like feminists, may advocate for policies which unfairly benefit women. While some may privately agree with and support policies that address gender inequity, they may publicly want to distance themselves from female candidates in order to preclude the appearance of being too “extreme”. Polling accuracy for women may suffer as a result as these voters may mislead pollsters by saying that they do not plan to vote for a female candidate, but then vote for her in the privacy of the voting booth.

Additionally, the negative stereotypes of feminism may lead some voters to try to distance themselves from any progressive gender issues. In addition to perceiving feminism as too aggressive, some people attribute the demise of the traditional American family to feminism and the women’s liberation movement (Burn et al. 2000; Williams and Wittig 1997; Twenge and Zucker 1999; Roy et al. 2007). As a result, voters may approve of and agree with a progressive gender agenda privately, but may be too ashamed to support such issues publicly for fear that they will be associated with feminism and these negative attributes.

If this is true, this backlash to feminism may also influence polling accuracy for female candidates. Female candidates are sometimes portrayed in the media as having the negative stereotypical qualities associated with feminism (Templin 1999). For example, Glenn Beck notes “[Hillary Clinton] is like the stereotypical…she’s the stereotypical bitch, you know what I mean? She’s that stereotypical nagging…” (National Organization for Women 2008). Democratic Strategist Paul Begala and Washington Post staff writer Tony Kornheiser have drawn comparisons between Florida US Senate candidate Katherine Harris and Disney villain Cruella De Vil (CNN 2004; Kornheiser 2000). These characterizations of female candidates as being too aggressive, cold, and calculating are often the same negative stereotypes associated with the feminist movement. Moreover, female candidates who run for elected office are often portrayed as being irresponsible home makers. Kantor and Swarns in a 2008 New York Times article identify this perception among voters “Many women expressed incredulity—some of it polite, some angry—that Ms. Palin would pursue the vice presidency given her younger son’s age…” (Kantor and Swarns 2008, p. 1). This negative dialogue about such candidates can make voters feel uncomfortable about voicing their support for these candidates publicly, even if, for ideological or other reasons, they support these candidates privately.

We expect that if such a behavioral response is exhibited, it will be more prevalent in areas in which traditional gender roles are the norm. In this context, signaling a preference for a female candidate would go against the prevailing view of traditional roles for women, and thus could be expected to lead to a loss of esteem among one’s peers. Therefore, if it is true that respondents falsify their true preference for the female candidate as a public signal of their commitment to traditional roles for women, then we should expect such behavior to be more prevalent in gender-conservative states, that is, states where the prevailing view of gender roles is the traditional one.

To summarize, previous research has shown that survey respondents may be particularly likely to give misleading information when discussing gender related topics. This makes it difficult to gauge respondents’ true feelings about these issues. Studies related to gender have found mixed results in this regard. On one hand, respondents may want to appear more progressive on gender issues than they actually are. On the other hand, respondents may want to distance themselves from being perceived as supporting feminist issues. Taken together, these empirical findings lead us to the following two competing hypotheses.Footnote 2

Hypothesis A

Pre-Election Polls will consistently overestimate support for female candidates because respondents want to appear more progressive on gender issues. And, if true, this tendency should be greatest in more gender-progressive states.

Hypothesis B

Pre-Election Polls will consistently underestimate support for female candidates because respondents want to appear more conservative on gender issues. And, if true, this tendency should be more prevalent in gender-conservative states.

In order to assess the validity of these hypotheses, we collect and analyze data on more than 100 major party female gubernatorial and US Senate candidates, and to serve as a baseline for comparison, nearly 100 cases of two white male opponents in similar contests.

Data

Our sample includes 215 distinct cases from over 40 states. It contains 123 female candidates and, to serve as controls, 92 races in which two white males were the major party candidates. We focus on prominent statewide races (those for US Senate and Governor) for several reasons. At this level of government, candidates tend to be better known than those at the local or congressional level. Also, there are many more female candidates at this level than in Presidential elections. Finally, there are more polling results available for US Senate and Gubernatorial candidates than there are for congressional and local level candidates.

The election results in this analysis were obtained from www.uselectionatlas.org. In addition to collecting information on the final election results, we also collected information on pre-election polls. This data was obtained from newspapers from the state in which the election took place. We used four main criteria in selecting these polls. First, we used the poll that was closest to the election date. Therefore, we had the most current—thus presumably the most accurate, at least on average—estimate of the candidate’s support.Footnote 3 Second, we only included surveys that were conducted by telephone. By relying on a single method of polling we can ensure that differences in our dependent variable are not driven by differences in methods of poll collection.Footnote 4 Third, we only used surveys that were conducted by polling firms, newspapers, or universities, thus eliminating the potentially biasing factor of partisan polling. Finally, we only considered surveys that contacted more than 300 respondents.

Because our principal variable of interest is the gender of the candidate, we collected polling information for the universe of female candidates from 1989 to 2008 for whom polling data was available.Footnote 5 The data was collected primarily from Lexis-Nexis, Access News Archives, and Daniel Hopkins’ publicly available data on polls for female candidates.Footnote 6 For comparison cases, we also collected information on white male candidates from similar sources.Footnote 7 All white male candidates were selected according to matching criteria, so as to have one matched “male” case for every case of an election with a female candidate in our sample. This was done in order to minimize the potential confounds introduced by ever-changing polling and weighting formulas as well as to control for certain temporal and spatial effects on polling accuracy.

Based on previous research, we also expect polling inaccuracies to be tied to the circumstances of the election and the context in which the election took place. With respect to the election context, we considered whether the candidate of interest was a Democrat or Republican and whether the candidate or his or her opponent was an incumbent.Footnote 8 Several social scientists argue that polls are more likely to overestimate Democratic candidates’ support and encounter difficulties in estimating incumbents’ vote shares (Crespi 1988). We also include a dummy for whether the election of interest was a gubernatorial election or a US Senate election. Research shows that some voters are more hesitant to support a female candidate for an executive position than they are for legislative office (Huddy and Terkildsen 1993; Dolan 1997). Therefore, the office being sought by the female candidate may influence the level of polling accuracy.

With respect to other features of the election, we also considered whether the election had an uncharacteristically high (or low) turnout, which could lead to less accurate polls (Crespi 1988; Bolstein 1991; Mitofsky 1998). This variable was constructed using voting age population (VAP) turnout rates for each state of interest from George Mason University’s United States Elections Project (http://elections.gmu.edu/voter_turnout.htm). For each state, we calculate a standard score for both Presidential and non-Presidential elections from 1988 to 2008. Using standard scores allows us to see if the election of interest had much higher or lower turnout than expected and allows us to make comparisons across states. Turnout of certain subgroups may be particularly important in elections with female candidates, so we also calculated a standardized score for female turnout. If we do find that polling for female candidates is less accurate than polling for male candidates, a potential culprit may be an unexpected increase in female turnout when there is a female candidate on the ballot. A large swell in female turnout may be particularly problematic for pollsters because, all else being equal, female voters prefer female candidates (Brians 2005; Dolan 2004; Fox 1997).
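The standard-score construction described above can be illustrated with a minimal sketch. The function name `standard_scores` and the turnout figures are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used. Presidential and non-Presidential series would be scored separately, each with its own mean and standard deviation.

```python
from statistics import mean, stdev


def standard_scores(turnout_by_year):
    """Return {year: z-score} for one state's turnout series of a single election type."""
    values = list(turnout_by_year.values())
    mu = mean(values)
    sigma = stdev(values)  # sample SD: an assumption, not specified in the text
    return {year: (rate - mu) / sigma for year, rate in turnout_by_year.items()}


# Hypothetical VAP turnout rates (percent) for one state's non-Presidential elections.
midterm_turnout = {1990: 38.0, 1994: 40.0, 1998: 42.0}
midterm_scores = standard_scores(midterm_turnout)
```

A positive score marks an election with higher turnout than is typical for that state and election type, which is what makes the scores comparable across states.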

We also include a variable for the number of undecided voters, because it is well established that large numbers of undecided voters can lead to polling inaccuracies (Perry 1979; Fenwick et al. 1982; Mitofsky 1998; Berinsky 1999). We also controlled for the percent of third party support and the percent of “Don’t Know” voters in the poll by subtracting the candidate of interest’s vote share, their opponent’s vote share, and the percent undecided from 100. Next, we used the margin of victory in the pre-election polls as an indicator of competitiveness.Footnote 9 Some research suggests that polling in uncompetitive races will be less accurate because there is often a regression toward the mean (i.e., toward a 50–50 split of the vote share between the two candidates), which continues past the final poll into the election (Hopkins 2009). As a result, candidates who are leading by a large margin will tend to perform significantly worse in the election than their poll-based expectations and candidates losing by large margins according to polls will likely outperform expectations. However, polls for candidates in close elections do not have much room to regress to the mean as the (estimated) vote shares are already near an equal split. Finally, we include a variable that measures the number of days away from the day of the election that the poll was conducted to ensure that our results are not driven by the time frame of the poll’s collection of data.
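As a small worked illustration of the residual category described at the start of this paragraph, the sketch below computes the poll share left for third parties and “Don’t Know” responses. The function name `other_share` and the poll figures are hypothetical.

```python
def other_share(p_candidate, p_opponent, p_undecided):
    """Poll percentage not accounted for by the two major candidates or undecided respondents."""
    return 100.0 - p_candidate - p_opponent - p_undecided


# Hypothetical final poll: 44% candidate of interest, 47% opponent, 6% undecided.
residual = other_share(p_candidate=44.0, p_opponent=47.0, p_undecided=6.0)
```

In this hypothetical poll, 3% of respondents fall into the third-party and “Don’t Know” category.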

We also sought to account for the social context in which each election occurred, since this may affect the decision making calculus of the survey respondents. We use four variables to measure the social context of the election: percent of women in the state legislature, percent of women in the labor force, the political culture surrounding gender issues in the state, and the Democratic Presidential candidate’s vote share in the state compared to his national vote share. The percent of women in the state legislature variable was collected from Rutgers’ Center for American Women and Politics. The percent of women in the labor force was collected on an annual basis through the Bureau of Labor Statistics summaries of the Current Population Survey. The women in the labor force variable measures the percent of women who are employed in that state who are over 16 years old and not institutionalized.Footnote 10

To approximate the gender political culture in the state we use the average American Association of University Women (AAUW)Footnote 11 voting records for Congressional representatives and US Senators from each state. The AAUW assigns scores for each US Senator and Congressional representative based on their support for bills which advance educational, economic, and social equity for women. High AAUW scores indicate that the particular representative supports a gender progressive agenda. Our AAUW measure is calculated by averaging the AAUW scores for the entire Congressional state delegation. Finally, our proxy for partisanship of the state was created by subtracting the Democratic presidential candidate’s vote share in the state from his vote share in the general election. The partisanship score for each state is based on the previous Presidential Election. For example, the state’s partisanship score for 2002 is based on that state’s support for Gore in 2000 relative to Gore’s national support.

If Hypothesis A is correct, we should expect that respondents in states with higher percentages of women in the labor force, women in the state legislature, and states with higher AAUW scores to be the most likely to over-report their intentions to support female candidates in an attempt to express views which better fit these presumably more gender-progressive areas. Conversely, we should expect a tendency to under-report in states with lower values of these variables if Hypothesis B is correct.

Methods

The measure we use as our dependent variable is often referred to as Mosteller #5 (Mosteller 1949). To construct this measure, we first calculate the margin of victory as measured by pre-election polling, meaning that we subtract the level of polling support for the opponent ($P_b$ below) from the level of polled support for our candidate of interest ($P_a$ below). We then repeat this calculation for the actual election margin of victory ($V_a$ and $V_b$ below). Finally, we subtract the election margin of victory from the predicted margin of victory. If this difference is negative, then there is putative evidence that polls underestimate support for female candidates. That is, when compared to his/her opponent, the candidate’s expected performance (measured by the pre-election poll) was worse than his/her actual performance. The reverse is true if this difference is positive. Equation 1 below provides a more formal definition of our dependent variable.Footnote 12

Construction of polling inaccuracy measure

$$ \text{Mosteller \#5} = \left( P_{a} - P_{b} \right) - \left( V_{a} - V_{b} \right) \tag{1} $$

where $P_i$ is the percentage of poll respondents indicating their intent to vote for candidate $i$, and $V_i$ is the percentage of votes for candidate $i$, where $i = a, b$. In our application, candidate $a$ is our candidate of interest, i.e. the candidate with respect to whom we are assessing the polling discrepancy.
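A short worked example may help fix the sign convention of Equation 1. The sketch below is illustrative only: the function name `mosteller_5` and all poll and vote figures are hypothetical.

```python
def mosteller_5(poll_a, poll_b, vote_a, vote_b):
    """Polled margin minus actual margin for candidate a, in percentage points."""
    return (poll_a - poll_b) - (vote_a - vote_b)


# Hypothetical race: candidate a trails 44-48 in the final poll but wins 50-49.
score_a = mosteller_5(poll_a=44.0, poll_b=48.0, vote_a=50.0, vote_b=49.0)

# The same race scored from the opponent's side gives the mirror image.
score_b = mosteller_5(poll_a=48.0, poll_b=44.0, vote_a=49.0, vote_b=50.0)
```

Here `score_a` is negative, meaning the poll underestimated candidate a relative to the opponent, while `score_b` has the same magnitude with the opposite sign. This is why a candidate of interest must be designated before an average bias can be computed.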

This measure of polling inaccuracy suits our purpose for several reasons. It can be calculated with respect to a specific candidate. It is unaffected by the pre-election margin of victory (i.e., competitiveness) because it is a measure based on absolute rather than relative error. Moreover, it is not a function of the level of support for the candidate of interest, making it less susceptible to variation due to changes in third-party support or undecided voters. Finally, it is the measure of (in)accuracy most often reported by the media and preferred by social scientists (Mitofsky 1998).

To understand the relationship between gender and the accuracy of pre-election polls we use several statistical methods. To obtain a basic understanding of this relationship, we first review the descriptive statistics and perform two difference of means tests. The first difference of means test measures whether male and female candidates’ support in pre-election polls is significantly different from their actual election support. In order to isolate the effect of the candidate’s gender on polling accuracy, we utilize a second difference of means test. This time we use a paired two sample t-test measuring differences in polling accuracy for female candidates and their white male comparisons. As described above, for each female candidate we chose a comparison statewide election (i.e., one which involves two white males). The polls for the comparison cases are often collected by the same polling firm as that which conducted the polls for the election of interest.Footnote 13
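The paired comparison described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `paired_t_statistic` and the Mosteller #5 scores are hypothetical, and the sketch stops at the test statistic and its degrees of freedom rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(female_scores, matched_male_scores):
    """Return (t, degrees of freedom) for a paired two-sample t-test."""
    if len(female_scores) != len(matched_male_scores):
        raise ValueError("each female case needs exactly one matched male case")
    # Work on within-pair differences so pair-level factors cancel out.
    diffs = [f - m for f, m in zip(female_scores, matched_male_scores)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical Mosteller #5 scores for three female candidates and their matched cases.
female = [-5.0, -3.0, -7.0]
male = [-3.0, 1.0, -1.0]
t_stat, dof = paired_t_statistic(female, male)
```

Because each difference is taken within a matched pair, anything shared by the two cases (for example the state, the period, or the polling firm) drops out of the comparison.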

Using this matching method, we are able to control for geographic, temporal, and polling firm-specific effects to the maximum extent possible. Various studies have identified many different types of general survey mode effects (Tourangeau and Smith 1996; Krysan et al. 1994; Fricker et al. 2005; Kreuter et al. 2008), and moreover, the way in which a candidate’s gender interacts with these previously identified mode effects is entirely unknown. In light of this, the use of matched cases becomes even more crucial. Without them, we could not be sure that any polling inaccuracy that exists for female candidates is not attributable to the mode in which the survey was conducted. Utilizing the matched cases we can isolate the effect of gender and obviate any confounds due to mode effects. Finally, this technique allows us to compare the average polling bias for one set of candidates of interest (white females) with the average for another set of carefully chosen candidates of interest (within carefully chosen elections of interest), with the result being that the two groups are as similar as possible, except for the polar differences in the gender identity of the candidates. Without identifying a candidate of interest in a particular election, the average polling-performance gap (to borrow Hopkins’ (2009) phrase) is always zero, as an overestimate for one candidate perfectly cancels an underestimate for the other. However, because our interest is in the effect of gender on polling accuracy, the female candidate is our default candidate of interest, allowing us to calculate a female-specific bias.

Because there are multiple factors, aside from the candidate’s gender, which we expect will affect polling accuracy, we estimate four different OLS regressions which predict the accuracy of pre-election polls. In the first regression, we analyze the differences between white male and white female candidates without any controls. This model will allow us to determine if polling accuracy for female candidates is significantly different from polling accuracy for males. The second and third models will test the same relationship, but in these models we control for the context of the election (Party ID, non-competitiveness, standardized voting age population turnout, female turnout, percent undecided, incumbent in the race, and a gubernatorial election indicator variable). What distinguishes the third model from the second is that the third accounts for female turnout instead of voting age population turnout.Footnote 14 Our final model will test the effects of both election and state (social) context variables on the accuracy of polls.
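As a rough illustration of the first (no-controls) regression, the sketch below fits a one-regressor OLS model in closed form. It assumes the only predictor is an indicator equal to 1 for a female candidate of interest and 0 for a white male. In that special case the slope equals the difference between the two group means and the intercept equals the male mean. The function name `simple_ols` and the data are hypothetical, and the models with controls would require a multiple regression routine instead.

```python
import statistics

def simple_ols(x, y):
    """Closed-form OLS estimates (intercept, slope) for y = intercept + slope * x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: indicator is 1 for a female candidate, 0 for a white male.
is_female = [1, 1, 1, 0, 0, 0, 0]
# Hypothetical polling errors (polled margin minus actual margin, in points).
error = [-3.0, -1.0, -5.0, 1.0, -1.0, 0.5, -0.5]

intercept, slope = simple_ols(is_female, error)
print(f"male mean error = {intercept:.2f}, female minus male = {slope:.2f}")
```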

Results

Table 1 presents descriptive statistics for the size distribution of the polling inaccuracy for our groups of interest. For each group, we report the mean dependent variable score (the average difference between the predicted and actual margins of victory), its standard deviation, the percent of cases in which the polls fall outside the margin of error, and, of those which fall outside the margin of error, the percentage of cases which over-predict and under-predict the electoral performance of the candidate. Figure 1 is a box plot of levels of polling accuracy for male and female candidates which presents the descriptive statistics in Table 1 and the distribution of the data visually.
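The quantities reported for each group can be computed along the following lines. This is a sketch under stated assumptions: errors are signed (negative values mean the poll understated the candidate's support), and each poll's margin of error is compared directly against the absolute error. Whether the study applies the margin of error to the margin of victory or to a single candidate's share is not specified here, so that choice is an assumption. The function name `summarize` and the data are hypothetical.

```python
import statistics

def summarize(errors, margins_of_error):
    """Summarize signed polling errors (negative = poll understated support)."""
    outside = [e for e, moe in zip(errors, margins_of_error) if abs(e) > moe]
    under = sum(1 for e in outside if e < 0)
    over = len(outside) - under
    return {
        "mean": statistics.mean(errors),
        "sd": statistics.stdev(errors),
        "pct_outside_moe": 100 * len(outside) / len(errors),
        # Shares are taken only among polls that missed by more than the margin of error.
        "pct_under_of_outside": 100 * under / len(outside) if outside else None,
        "pct_over_of_outside": 100 * over / len(outside) if outside else None,
    }

# Hypothetical errors and margins of error, in percentage points.
errors = [-6.0, -4.5, 1.0, 5.0, -0.5, -2.0]
moes = [4.0, 4.0, 3.5, 4.0, 3.0, 4.0]
summary = summarize(errors, moes)
print(summary)
```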

Table 1 Summary of polling discrepancies for male and female Senate and gubernatorial candidates, with t-test of means
Fig. 1 Box plot of polling accuracy for male and female US Senate and gubernatorial candidates from 1989 to 2008. The box plot displays the results in Table 1 graphically. It indicates that the median female candidate generally outperforms her pre-election poll prediction. Moreover, the graph shows that most female candidates in the upper quartile of the distribution perform better in the election than polls predict. The box plot also shows that there is not a large discrepancy between the median white male candidate’s polled support and electoral support.

As indicated by the first column, in the aggregate there is essentially no difference between the pre-election polls and the final results for white male candidates. The average difference between the pre-election polls and the final result for these candidates is well under half of one percentage point. Therefore, the discrepancies for the individual cases are not biased in either a positive or negative direction, so they, for the most part, cancel each other out. The same table shows that pre-election polls tend to underestimate the final results for female candidates by 2.12 percentage points. Furthermore, this difference between the pre-election poll and the final result is statistically significant, which indicates fairly unequivocally that polls underestimate support for female candidates.Footnote 15

The difference in polling accuracy between white male and female candidates is also indicated in Fig. 1. According to the figure, polls tend to underestimate female candidates’ electoral support. This appears to occur for a majority of the women in our sample, a fact which becomes very apparent when examining the upper middle quartile of the plot. Even female candidates above the median in terms of polling accuracy mostly have values which are less than zero indicating that polls underestimate their support. Figure 1 thus dramatically demonstrates that polls for female candidates are much more likely to underestimate their support compared to polls for white male candidates.

While polls may be quite accurate in predicting the winning candidate, about half of the polls included in this analysis did not predict the margin of victory within the margin of error. In spite of the similarities in terms of the proportion of elections predicted within the margin of error, there appears to be a clear directionality to the nature of the error for female candidates. In nearly 70% of the female cases, the pre-election polls underestimate performance in the election for these candidates. This is in contrast to the set of white males, for which polls are much closer to being equally split (43–57%) between underestimating and overestimating support.

As we discussed earlier, female candidates may not be the only ones susceptible to polling inaccuracies. Therefore, we use the paired sample t-test to compare all of the female candidates to their male comparisons (elections with two white male candidates). The results of this t-test are reported in Table 2.Footnote 16 When compared to white male candidates in their state who ran for US Senate or governor around the same time (within 6 years), and mostly under similar circumstances and from the same party, female candidates perform significantly better in the final results than would be predicted by pre-election polls, by about three and a half percentage points. This difference in polling accuracy for male and female candidates is significant at well below the 0.01 level. This finding provides further evidence that a candidate’s gender is systematically associated with the accuracy of pre-election polls. However, we would like to be confident that possible confounds such as competitiveness, partisanship, and turnout do not affect our conclusion that the Richards Effect is real. In order to do so, we turn to a standard Ordinary Least Squares Regression model.

Table 2 Matched paired t-tests comparing women candidates to white male candidates from the same state and time period

A Comparison of Female and Male Candidates: Polling Versus Electoral Performance

The results of our regression models are reported in Table 3.Footnote 17 The baseline model again shows that, when compared to polling for white male candidates, female candidates perform significantly worse in pre-election polls than in the final results, by almost 2.7 percentage points.Footnote 18 The magnitude of this coefficient weakens slightly but remains significant when we take into account the election context. According to Model 2, the election context model, female candidates perform 2.15 percentage points worse in their respective polls than in the election when compared to white male candidates. When one controls for the same factors as Model 2 but substitutes female turnout for voting age population (VAP) turnout, female candidates still perform better in the final election than polls predict, by 2.12 percentage points. Finally, when we account for social context in addition to the election context, women continue to out-perform their pre-election polls by 2.41 percentage points compared to polls for similarly situated white males. This social context model was also the strongest model used in this analysis, as it explained 24% of the variance in our dependent variable.Footnote 19

Table 3 Predicting the Richards Effect: OLS regression predicting the difference between the latest pre-election poll and the final results for male and female candidates from 1988 to 2008

The only other variables that were significant in our analysis were the competitiveness measure and the size of the female labor force in the state. For every point increase in the predicted margin of victory, polls underestimated the candidate’s support by about 0.1 percentage points in the election context models and 0.092 percentage points in the social context model, holding all else equal. This suggests that candidates who are losing by a large margin are more likely to perform (slightly) better in the final election than polls predict. Polls also performed worse in states where women make up a large proportion of the labor force. For every one percentage point increase in the percent of women in the labor force, polls overestimate support of the candidate of interest by 0.4 percentage points. Contrary to our expectations, the proportion of undecideds, the proximity of the poll to the election date, and turnout had no significant effect on polling inaccuracy.Footnote 20

Context and Polling Discrepancy

The regression results from the previous section provide consistent evidence that voters may be more apprehensive about admitting their support for female candidates than they are for male candidates. In the theory section, we argued that this result may be explained by men and women trying to appear more conservative on gender related issues to distance themselves from feminism. Our findings thus far are consistent with Hypothesis B. Nonetheless, the data presented in Table 3 alone do not provide sufficient evidence that voters are falsifying their preferences for female candidates in order to appear more gender conservative. Therefore, we explore the state context in search of additional support for this hypothesis.

As we mentioned in the theory section, if voters falsify their preference to support male candidates when they intend to vote for female candidates, we should see larger polling discrepancies in contexts where residents adhere to traditional gender roles and in states where voters are not supportive of advancing a gender progressive agenda, as potential voters in such areas may be more reluctant to disclose their (true) support for a female candidate.

While there are no direct measures of these conditions, there are several proxies that we utilize to approximate these contexts. First, to measure whether states adhere to traditional gender roles we can examine the percent of women in the work force and the number of women in state government. Several studies have shown that voters are much more supportive of women in politics in states where a large percentage of women are in the workforce (Iversen and Rosenbluth 2008; Matland and Studlar 1998; Andersen 1975; Andersen and Cook 1985). When more women enter the work force, support of traditional gender roles for women generally declines and voters are more approving of women in politics (Andersen 1975; Andersen and Cook 1985). Based on this research, we should expect that states with fewer women in the workplace would be more conservative about traditional gender roles in politics and be the most apprehensive about professing their support for female candidates. Therefore, there should be additional evidence for Hypothesis B if the results demonstrate that polls tend to more drastically under-predict female candidates’ performance in states where fewer women are in the workforce.

We also expect that states with fewer women in the state legislature will be less accepting of women in politics. The amount of female representation in the state thus should be a valid proxy of the progressiveness of a state regarding women and politics. If polls underestimate support of female candidates at a greater rate in states with fewer women in the state legislature, then we consider this as additional support for Hypothesis B.

Finally, to measure the culture toward women’s issues in a state we will look at the representatives’ voting records on women’s issues in that state. While there may be a slight disconnect between voters and their representatives, there is a vast literature which shows that US House and Senate representatives follow the public opinion of their constituency when voting (Page et al. 1984; Erikson 1978; Bartels 1991). Therefore, representatives in states where voters tend to be more conservative with regard to gender issues should have poor voting records on bills which are considered to be supportive of women’s issues.

To further test Hypothesis B, we run several OLS regressions predicting polling accuracy for only women US Senate and gubernatorial candidates from 1989 to 2008. This method will provide information on which female candidates are most susceptible to polling inaccuracies. The first regression model only includes variables that measure the percent of women in the labor force, the average American Association of University Women voting record scores of US Senators and Congressional Representatives in each state and year, and the percent of women in the state legislature. The second model uses these same variables but also includes controls for several election and social context variables to isolate the effect of the state’s culture toward gendered issues on polling accuracy for female candidates. In addition to running an OLS regression for only female candidates, we run one model with all the controls for male candidates to determine whether the partisanship of the candidate and the social context effects on polling accuracy are unique to female candidates.

The results from all three models provide additional support for Hypothesis B. In the first model, polls are significantly more likely to underestimate female candidates’ support when there are fewer women in the labor force. For every one percentage point increase in the percent of women in the labor force, polls overestimated support for female candidates by half of a percentage point. Both the average AAUW scores for the state and the percent of women in the state legislature were inconsequential without the appropriate controls.

In the second set of models, polls in states where women make up a small portion of the labor force continue to significantly underpredict support for female candidates. In the full model, for every one percentage point decrease in the percent of women in the labor force, polls underestimate support of female candidates by about a fourth of a percentage point. Also, when controlling for the election context, polls for women candidates in states where Congressional representatives and US Senators have low AAUW scores are significantly more likely to underestimate female candidates’ electoral support.Footnote 21 The final model in Table 4, which replicates the election context model for only white male candidates, demonstrates that the male candidate’s polling accuracy is not significantly influenced by the percent of women in the labor force or the average AAUW scores for representatives in their state. This indicates that these variables have a unique effect on female candidates.Footnote 22

Table 4 Context, candidate characteristics, and polling inaccuracy: OLS regression predicting sources of polling inaccuracy for only women candidates from 2000 to 2008

Contrary to expectations, the percent of women in the state legislature had no effect on polling accuracy for female candidates. While this result runs counter to our expectations based on Hypothesis B, there are several reasons why the percent of women in the state legislature may not have the influence that we expected. First, because most Americans are unaware of how many women are in the state legislature (Sanbonmatsu 2003), female representation in the state legislature may not have a significant effect on altering perceptions about women in politics. Second, the partisanship of the women in the state legislature may influence this result. If a large proportion of the women in the state legislature are Republicans who advocate for traditional gender roles, then this measure may not accurately gauge how progressive the state is on gender issues.

The only other variable that was significant in any of the specifications was the predicted margin of victory. For every one percentage point increase in the female candidate’s support, she performed 0.12 percentage points worse in the actual election. For male candidates, polls understated their support by 0.06 percentage points for every percentage point lead they had on their opponent.

Discussion

In the 1990 Texas Gubernatorial election, many polls predicted that Clayton Williams would replace the retiring Bill Clements as the Governor of Texas. To everyone’s surprise, Williams’ lead (which was as much as 8 points in some polls) dissipated on election day and Democratic candidate Ann Richards was victorious. When a similar polling discrepancy occurred a year earlier in Virginia, where black gubernatorial candidate Douglas Wilder’s large estimated pre-election lead proved wildly incorrect as he won by less than one percentage point, many people attributed this polling discrepancy to his race. In the 1990 Texas Gubernatorial election, gender was omitted from the list of potential causes of polling error. Our study provides new insight into this puzzle. Perhaps it was not only the traditional polling problems that led polls to be less accurate; Ann Richards’ gender may have also played a vital role in these polling discrepancies. Our results indicate that female candidates, and in particular female candidates from gender-conservative states, like Ann Richards in Texas, tend to do worse in pre-election polls than in actual elections.

The findings in this article suggest that there may be a stigma attached to supporting a female candidate. While polling biases for gender related issues are nothing new (e.g. Ballou and Del Boca 1980; Lueptow et al. 1990; Northrup 1992) the finding that female candidates are also susceptible to a polling bias is a novel one. In spite of the growth of high profile female candidates and elected officials in recent years, some Americans still appear to be hesitant to offer their true opinions regarding women in positions of power. While some have argued that respondents will want to appear more supportive of women politicians than they actually are (Streb et al. 2008), our results suggest the opposite. The findings in this paper indicate that voters—and especially those living in states with more traditional views—may want to appear less supportive of female candidates. The results in this paper suggest that pollsters and researchers should be concerned about the gender of the candidate as much or possibly even more than they are about race.

In addition to showing that female candidates tend to see their vote shares systematically underestimated by pre-election polls, we were also able to show that it is more likely to occur in areas with a more conservative view of gender roles. While both analyses lend support to our hypotheses that voters may want to appear less supportive of female candidates as an attempt to distance themselves from feminists—i.e., they tend to falsify their true preference when responding to polls—we can only make inferences based on aggregate data. Due to the proprietary nature of polling data and the secret ballot process, testing claims regarding why we observe these systematic polling inaccuracies for these groups is extremely difficult. While we acknowledge that inferences about individual behavior based on aggregate data are not entirely satisfactory, this analysis offers the first explanation of these empirically robust results.

Therefore, more research into this topic is necessary, not least because 2010 has been dubbed the “year of the woman”, with female candidates in many major races across the country. Future research should focus on the role that undecided voters and partisanship play in the underestimation of electoral support for female candidates. Because the data limitations inherent in proprietary polling and the secret ballot process often prevent a more disaggregated analysis of polling response and vote choice, perhaps the best methodology for this type of an investigation would be a survey experiment. Such an experiment would allow us to isolate the factors that may affect the willingness of respondents to announce their support of a female candidate.

Moreover, future studies should conduct a content analysis of elections with female candidates to determine whether the electoral context, like social context, also influences the magnitude of the Richards Effect. It may be that races where gender is a prominent issue, such as the 2010 South Carolina Republican primary, the 2000 New York US Senate Election, or the 1990 Texas Gubernatorial Election, are where the Richards Effect is most prominent. Further investigation of this issue is especially important given the increasing number of female candidates, and we hope this manuscript lays a foundation for such an understanding.

On the surface, the Richards Effect appears to be innocuous and in some ways may appear beneficial for the increasing number of female candidates, but under-reporting female candidates’ support can have negative consequences. There is a well-documented “bandwagon” effect in American and international politics where likely voters change their preferences from an underdog candidate to support a leading candidate in order to conform to what they perceive to be the political norm (Marsh 1985; McAllister and Studlar 1991). If support for female candidates is consistently and systematically underestimated because of polling discrepancies, some voters who would have otherwise supported a woman for political office may adjust their preference to support her competitor. As a result, the Richards Effect may cause female candidates to be less competitive than they would be if polls were correctly gauging their support.

The Richards Effect could also be problematic for female candidates who are trying to attract financial backing. Candidates who perform better in pre-election polls often have an easier time raising resources for their campaigns (Gross et al. 2002). Potential donors may be less willing to give money to female candidates if their polling support appears to be low. This may be especially problematic for female candidates, who, because they are relative newcomers to the political scene, struggle to demonstrate their viability (Carroll 1994). By underestimating support for female candidates, polls may thus weaken female candidates’ appeals for financial support. Taken together, these arguments suggest that polling discrepancies for female candidates add to an already extensive literature about disadvantages women face in the political system.