1 Introduction

Gender gaps in politics matter for the political arena and people’s private lives. For political parties, gender imbalances in the electorate can make gender equity more of a party-political battleground than a common goal (Kaufmann and Petrocik 1999; Shorrocks 2018). At the same time, gender as a cleavage has consequences for people’s private lives. For most potential dividing lines, such as class, religiosity, or location, people tend to stand on the same side as their partners and family. When gender becomes a cleavage, however, the dividing line cuts right through many people’s homes (Chen and Rohla 2018; Muxel 2014).

This paper traces gender gaps in voting behavior in Germany using information from the German representative election statistics, a globally unique source (Kobold and Schmiedel 2018) that counts actual votes cast by demographics. The empirical analyses have two main parts. First, I analyze long-term changes in (a) gender differences in voting behavior separately for each party’s electorate and (b) summary measures for the nondirectional degree of dissimilarity in voting patterns by gender and the directional left–right gender gap. I also analyze trends in the aggregate measures by age groups. Findings from these analyses partly contrast with those of previous international survey-based studies. The second part compares results from real ballots with estimates from survey data to explore potential methodological reasons for the discrepancy.

I offer three main contributions to the literature. First, the much larger sample than those of previous studies allows for precise estimates, even at the age group and party levels. Second, whereas relevant surveys started in the 1970s or later (Inglehart and Norris 2003), the data source here enables analyses of behavioral change since the 1950s. Finally, previous studies relied on self-reports from surveys. The data on actual voting behavior I analyze rule out issues of selective participation and social desirability bias, which is frequently observed in political polls (Stout et al. 2021). In Germany, for example, people more often vote for the radical right-wing AfD (Alternative for Germany) than they admit in surveys (Gschwend et al. 2018). If the degree or direction of social desirability bias differs between women and men, this may bias estimates of gender gaps (Hebert et al. 1997; Johann et al. 2016; Paunonen 2016). This study thus calculates gender gaps that are free from such bias and, by comparing real ballots with surveys, further tests whether survey data are generally suitable for studying political gender gaps.

2 Background

It is well documented that women tended to report more conservative voting and more right-wing ideological self-placement in most Western countries during the 1970s and 1980s (Dassonneville 2021; Norris and Inglehart 2000). Women’s adherence to the more conservative parties has been explained by women’s traditional values and, most prominently, greater religiosity (Bremme 1956; Hartwig 1927, 1931; Lipset 1960).

Previous research has documented the move from the “traditional gender gap” toward the “modern gender gap” in Western societies (Dassonneville 2021; Harteveld et al. 2019; Norris and Inglehart 2000). Today and in many countries, women report more left-wing positions than men, concerning both self-reported voting behavior and self-placement on left–right scales. This reversal toward the “modern gender gap” has been attributed more to women becoming more left-wing than to men becoming more right-wing (Norris and Inglehart 2000). In the USA, women increasingly favored the Democratic over the Republican party as early as the 1980s (Box-Steffensmeier et al. 2004; Norris and Inglehart 2000). In the 1990s, women in most Western societies placed themselves more to the left of the political spectrum and reported more left-wing voting than men (Abendschön and Steinmetz 2014; Giger 2009; Norris and Inglehart 2000). In the most thorough available study on ideological self-placement, Dassonneville (2021) combines information from different international surveys and confirms the move toward the modern gender gap for the majority of OECD countries. In that study, Germany represents the average OECD pattern well, with women having placed themselves more to the left than men since the 1990s. One previous study from Austria analyzed changes in gendered voting using real ballots and found that women had been voting more left-wing since the 1970s (Koeppl-Turyna 2021). However, this study is limited in scope because it only analyzes municipal elections in one specific urban area until 1991.

2.1 Explanations for a Move Toward the “Modern Gender Gap”

Explanations for women’s move to the left and the emergence of the modern gender gap include the topics of religion, education and employment, the salience of gender equity-related topics, and a cultural backlash. Of course, this list of potential explanations does not claim to be exhaustive.

Most Western societies have experienced secularization and declines in religiosity, which resulted in weakened church–party linkages (Norris and Inglehart 2000). Women have been, on average, more religious (Stark 2002), and the decline in religiosity may weaken the pull toward religiously oriented parties, which was particularly strong among women. In Germany, the link between the Catholic electorate and the Christian Democrats has weakened (Elff and Roßteutscher 2022).

Women’s employment rates have increased over recent decades and women are now overrepresented in some lower-paying jobs as well as the public sector and educational and occupational groups that are sometimes referred to as “new middle class” or “socio-cultural (semi‑)professionals” (Abou-Chadi and Hix 2021; Oesch 2006). These groups are typically associated with voting for left-wing (green) parties and less associated with voting for (radical) right-wing parties (Abou-Chadi and Hix 2021; Elff and Roßteutscher 2022; Klein 2022; Norris and Inglehart 2000; Oesch 2006). When voting based on material motives, partnered women and men might be affected by either their individual situation, or by the household’s situation. Consider a low-earner who is married to a very high-earner. If that person focuses on their individual situation and the potential situation after divorce, the benefits of high taxes and redistribution are more salient. A focus on the high household income would, on the contrary, center the harms of redistribution. Decreased rates of marriage and increased divorce rates could shift the focus from the household situation toward the individual situation (Hudde and Engelhardt 2023; Van de Kaa 1987). For women, a shift toward the individual perspective could increase the material motives to vote for left-wing parties that favor more economic redistribution as well as spending on the public sector (Debus 2016).

Gender differences in voting might stem from parties’ different stances on gender equity-related issues. Left-wing parties have a long tradition of focusing more on gender equity topics than conservative parties (Debus 2016), women have more egalitarian attitudes toward gender on average (Grunow et al. 2018; Hudde 2018), and today, those with more egalitarian gender role attitudes are more likely to vote for left-wing parties (Diabaté et al. 2023). If the salience of gender equity-related issues increases in a society, this area might become more important in electoral decisions and could lead to a stronger alignment of women with the left-of-center parties that emphasize gender equity (Norris and Inglehart 2000; Vachudova 2021).

In general, a modern gender gap might (partly) arise because radical right movements or parties emerge that gain more popularity among men than among women. Potential explanations come from theories of “losers of modernization” or “cultural backlash.” These approaches are sometimes contrasted, but for this brief review, I focus on their important similarities rather than their differences (Suckert 2022). According to both approaches, economic and cultural changes over recent decades have produced perceived and/or actual “losers” who resent the new order and vote for parties that promise to bring back the “good old days” (Betz and Johnson 2004; Norris and Inglehart 2019; Steenvoorden and Harteveld 2018; Suckert 2022). Such promises are often found among the populist or radical right, with examples such as Trump’s “make America great again” or Brexiteers’ “take back control” (Norris and Inglehart 2019; Suckert 2022).Footnote 1

Changes over recent decades include globalization with increased migration, market competition, and labor market insecurities, as stressed in the “losers of modernization” literature. The change further includes the cultural domain, with shifts in values and attitudes regarding topics such as migration and integration, environmental protection, or family and gender roles. In these domains, “liberal” or “progressive” values and attitudes, such as egalitarian gender role attitudes, have diffused into society as a whole and gained a dominant position (Baldassarri and Park 2020; Ebner et al. 2020; Henninger and Von Wahl 2019).

In economic and cultural domains, men might more often belong to those who see their status threatened. For instance, men are now less likely to achieve a high level of education (Hudde and Engelhardt 2023) and somewhat more likely to hold values and attitudes that have lost normative ground, such as a preference for gender-separate spheres (Davis and Greenstein 2009; Ebner et al. 2020; Grunow et al. 2018; Hudde 2018). Of course, the leveling of inequities, such as an increasingly gender-balanced political representation, might also be perceived as a loss of relative status. By these arguments, men might have become more susceptible to backlash radical right parties and the promise of the “good old days” (Norris and Inglehart 2019).

2.2 Historic Research from Germany

In Germany, ballots have been counted by gender since the first elections with women’s suffrage in 1919. However, compared with the post-World War II period, these data were gathered less systematically and only for certain elections and regions (Bremme 1956; Hartwig 1927, 1931). The available evidence shows that the “traditional gender gap” has existed since the introduction of women’s voting rights. In the first elections with universal suffrage, women were already voting less for left-wing and more for religious, national, or conservative parties (Bremme 1956; Childers 1983; Duverger 1955; Falter 2020). This implies that women’s right to vote “harmed the parties that demanded it but benefited those that rejected it” (Hartwig 1927, p. 510). The Communist Party (KPD) was the party with the most men-leaning electorate, and the conservative Catholic Centre Party (Zentrum) had the most women-leaning electorate. The nationalist DNVP (German National Peopleʼs Party) was moderately more successful among women, and the Social Democrats (SPD) were slightly more successful among men (Bremme 1956; Falter 2020).

Hitler’s NSDAP (National Socialist German Workersʼ Party) was initially more popular among men than among women. This initial gender gap is often explained by the argument that women were, on average, more religious than men, and that religious Catholics were the group that was most distant from the NSDAP (Falter 2020). In addition, given women’s much lower participation in the labor market at the time, women may have voted more often for religious considerations than, for example, job-related ones. However, by around 1930/1933, when the party softened its anti-church rhetoric and specifically targeted women in its campaigns, the NSDAP found roughly gender-balanced electoral support (Bremme 1956; Childers 1983; Falter 2020; Hamilton 1982).

Concerning the post-war period and potential changes toward the modern gender gap in Germany, previous survey research mainly finds Germany in line with the general pattern across Western societies (Abendschön and Steinmetz 2014; Giger 2009; Inglehart and Norris 2003). The large-scale analyses of ideological self-placement find that Germany represents the average OECD pattern well, with women having placed themselves more to the left than men since the 1990s (Dassonneville 2021). For electoral behavior in Germany, however, one is not limited to polling data but has access to extensive and high-quality information on real electoral processes—a data treasure that remains understudied.

2.3 Differences by Cohort vs. Period and by Ideological Subdimensions

A recurrent question in the social sciences is whether period or cohort effects drive social change. If people’s political ideas and party identifications form during young adulthood and remain stable thereafter (Campbell et al. 1960; Lipset and Rokkan 1967), only generational replacement can bring change. However, changes in party preferences also occur among adults (Arzheimer and Schoen 2016; Dejaeghere and Dassonneville 2017; Kuhn 2009). In the United States and on average across OECD countries, the move to the modern gender gap has been attributed to cohort rather than period effects (Dassonneville 2021; Shorrocks 2018; but for more mixed findings see Harsgor 2018). I analyze voting behavior by the broad age groups available in the data and discuss how far they allow conclusions regarding cohort versus period effects.

Researchers argue that besides the general left–right scale, two subdimensions are needed to describe the current German party landscape, namely an economic one and a socio-cultural one, both of which my analysis includes (Faas and Klingelhöfer 2019). The socio-cultural dimension is also called “GAL-TAN” and contrasts green-alternative-libertarian positions with traditional-authoritarian-nationalist positions. In recent decades, the socio-cultural divide has become more salient and party competition over it has increased (Norris and Inglehart 2019; Vachudova 2021). Gender gaps might be larger for the socio-cultural than for the economic dimension because the former includes topics of gender equity and is more closely associated with the new middle class (Abou-Chadi and Hix 2021; Abou-Chadi and Wagner 2020; Norris and Inglehart 2019). Moreover, support for right-wing populist parties, which are more salient in terms of their socio-cultural than their economic views, is greater among men than among women (Harteveld 2021; Lengfeld and Dilger 2018; Norris and Inglehart 2019).

3 Data and Methods

3.1 Data: Voting Behavior

I analyze actual votes in German federal elections collected by the statistical offices (Der Bundeswahlleiter 2021).Footnote 2 When going to the polls on election day, people show their ID card and voter’s notification and are handed a ballot paper. Those who live in a voting district that is part of the representative sample are handed a ballot paper with their gender and grouped year of birth printed at the top.Footnote 3 The votes are counted in two steps. First, they are counted on-site, disregarding the information on gender and age. Second, “After the official result has been established, the electoral boards send the ballot papers to the statistical offices of the states (Länder) or, in municipalities which have their own statistical unit, to the municipalities. These sort the ballot papers by population group and determine the election result for each group” (Der Bundeswahlleiter 2021). The election results are then published by gender and age group.

For reasons of data protection, results by age and gender are never published at a small-scale geographic level. Voters who are part of the sample are informed about this at various stages, including the letter with the voter’s notification and available handouts at the polling station (however, given the amount of information available in such letters, the large ballot papers, and the small print of age and gender, it still seems plausible that many voters are not aware that they are part of this sample). Voters can opt out of the data collection but would need to take additional bureaucratic steps: “a polling card may be applied for in due time before the election. It can be used to vote in any other polling district of the constituency or by post” (Der Bundeswahlleiter 2021).

The sample includes 1.5–4.0% of people eligible to vote, e.g., about 1.9 million of 61.2 million in 2021 (Der Bundeswahlleiter 2021). The data are available for elections since 1953, except for 1994 and 1998, when the data collection was suspended owing to concerns about vote secrecy (Schoen 1999). Pre-1990 elections cover only Western Germany. For more information on data and sampling, see the Online Appendix.

3.2 Data: Parties’ Left–Right Position

Information on parties’ left–right orientation comes from the Chapel Hill Expert Survey (Jolly et al. 2022) and its predecessor, the Ray–Marks–Steenbergen survey (Ray 1999; Steenbergen and Marks 2007). These surveys of academic experts have been widely used (Laver 2014) and judged to be valid and reliable for Germany (Bakker et al. 2015; Bruinsma and Gemenis 2020; Thomeczek et al. 2019).Footnote 4

Three measures for parties’ placements are available. First is the general left–right placement, surveyed as parties’ “overall ideological stance. 0 = extreme left; 10 = extreme right” (Jolly et al. 2022, p. 2). The second available measure captures parties’ economic left–right position and the third their socio-cultural position, which is associated with Inglehart’s notion of postmaterialism (Inglehart 1990). The reliability, measured as agreement in judgment between different experts, is highest for the general scale and lowest for the socio-cultural scale (Bakker et al. 2015). I analyze all three measures.Footnote 5

Parties’ general left–right placement is available for ten timepoints between the years 1984 and 2019. Information on the two subdimensions of left–right placement, the socio-cultural and the economic dimension, is available for six timepoints between 1999 and 2019 (see Fig. 1). For further analyses (Figs. 3, 4 and 6), the years in between are interpolated and 2019 data are used for the election in 2021.Footnote 6 Further, the earliest available data point of the general left–right scale from 1984 is also used for the elections from 1953 to 1983. This might be a valid approximation given the relative stability of party orientation in Germany (see Fig. 1; also Bruinsma and Gemenis 2020), but the greater the distance to the year 1984, the more caution is advised when interpreting results.
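The gap-filling rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the analyses: it assumes linear interpolation between expert-survey waves (the text does not specify the interpolation method), and the function name `lr_score_for_year` as well as the example years and scores are hypothetical.

```python
def lr_score_for_year(year: int, observed: dict[int, float]) -> float:
    """Return a party's left-right score for an election year.

    Assumptions (not confirmed by the source): linear interpolation between
    survey waves; before the first wave the earliest value is reused, and
    after the last wave the latest value is reused.
    """
    years = sorted(observed)
    if year <= years[0]:
        return observed[years[0]]   # e.g. elections before the first wave
    if year >= years[-1]:
        return observed[years[-1]]  # e.g. an election after the last wave
    for left, right in zip(years, years[1:]):
        if left <= year <= right:
            weight = (year - left) / (right - left)
            return observed[left] + weight * (observed[right] - observed[left])
    raise ValueError("year could not be located between survey waves")


# Hypothetical scores for one party on the 0-10 scale (made up for illustration).
example_scores = {1984: 6.0, 1999: 5.0, 2019: 6.5}

for election_year in (1953, 1990, 2021):
    print(election_year, round(lr_score_for_year(election_year, example_scores), 2))
```

With these made-up values, an election before 1984 receives the 1984 score, an election between two waves receives a weighted average of the neighboring waves, and an election after 2019 receives the 2019 score.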

Fig. 1 Left–right positions of parties over time. (Source: Chapel Hill Expert Survey (Jolly et al. 2022) and its predecessor, the Ray–Marks–Steenbergen survey (Ray 1999; Steenbergen and Marks 2007))

Measure: Gender Gap in Voting by Party

Consider two scenarios: (A) a party receives 30% of women’s and 20% of men’s votes, and (B) the party receives 20% of women’s and 30% of men’s votes. What is the party’s gender gap? In (A), women are 50% more likely than men to vote for that party, and in (B), 33% less likely to do so. However, the gender gap is equally large in both scenarios; only the direction differs. The standardized measure represents this by calculating the difference between women’s and men’s vote share, +10 percentage points in (A) and −10 percentage points in (B), and dividing it by their average, 25% in both (A) and (B). The result, +40% and −40%, is the percentage difference, a standardized measure of the gender gap (Countryman 2013). This standardization ensures that the bars in both exemplary scenarios described above are equally long and differ only in direction.

$$\text{stand. gap party}_{i}=\frac{\text{vote share}_{\text{women}}-\text{vote share}_{\text{men}}}{0.5\times\left(\text{vote share}_{\text{women}}+\text{vote share}_{\text{men}}\right)}$$
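As a quick check of the arithmetic, the standardized gap can be computed for scenarios (A) and (B) with a short Python sketch. The function name `standardized_gap` is hypothetical and chosen only for illustration; vote shares are entered as proportions.

```python
def standardized_gap(share_women: float, share_men: float) -> float:
    """Percentage difference: (women - men) divided by the average of both shares."""
    average = 0.5 * (share_women + share_men)
    if average == 0:
        raise ValueError("the party received no votes from either group")
    return (share_women - share_men) / average


gap_a = standardized_gap(0.30, 0.20)  # scenario (A)
gap_b = standardized_gap(0.20, 0.30)  # scenario (B)
print(f"A: {gap_a:+.0%}, B: {gap_b:+.0%}")  # A: +40%, B: -40%
```

Both scenarios yield the same magnitude with opposite signs, which is the symmetry that a simple ratio of vote shares (50% more versus 33% less) lacks.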

Calculation of the Nondirectional and Directional Measure for Dissimilarity

The nondirectional measure for gender dissimilarity can be understood as follows. Suppose we matched 100 women voters with 100 men voters according to their party choice and could form 95 couples, whereas 5 women and 5 men would remain unmatched. Then, the nondirectional gender gap would take the value of 10, because the parties’ surpluses among women and among men each sum to 5 percentage points. In formal terms, it is the sum of the absolute values of all parties’ differences in vote shares between women and men, multiplied by 100. In the formula, p refers to the vote share and i refers to the party (there are six parties, A–F). Therefore, p_women,i is party i’s vote share among women and p_men,i is its vote share among men.

$$\text{nondirectional gender gap}=\sum_{i=A}^{F}\left|p_{\text{women},i}-p_{\text{men},i}\right|\times 100$$
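The formula can be illustrated with a minimal Python sketch. The function name `nondirectional_gap` and the three-party vote shares are hypothetical and serve only to show the calculation; the actual analyses use six parties.

```python
def nondirectional_gap(shares_women: dict[str, float],
                       shares_men: dict[str, float]) -> float:
    """Sum of absolute differences in party vote shares (proportions), times 100."""
    parties = set(shares_women) | set(shares_men)
    return 100 * sum(
        abs(shares_women.get(party, 0.0) - shares_men.get(party, 0.0))
        for party in parties
    )


# Hypothetical vote shares (proportions summing to 1 within each group).
women = {"A": 0.30, "B": 0.45, "C": 0.25}
men = {"A": 0.20, "B": 0.45, "C": 0.35}

print(round(nondirectional_gap(women, men), 1))
```

In this made-up example, party A is 10 percentage points stronger among women and party C is 10 percentage points stronger among men, so the index is about 20. The sign of each difference is discarded, which is why the measure carries no left–right information.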

The measure for the directional, left–right gender gap is the average left–right score of parties that men voted for minus the average left–right score of parties that women voted for. Positive values represent that men’s voting behavior is more to the right and negative values indicate that women’s voting behavior is more to the right.

$$\text{left–right gender gap}=\sum_{i=A}^{F}p_{\text{men},i}\times\text{lr\_score}_{i}-\sum_{i=A}^{F}p_{\text{women},i}\times\text{lr\_score}_{i}$$
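A short Python sketch shows how the directional measure follows from vote shares and party scores. The function names, the three parties, and their left–right scores are hypothetical; only the sign convention (men minus women, higher scores meaning further right) is taken from the definition above.

```python
def mean_lr_position(shares: dict[str, float], lr_scores: dict[str, float]) -> float:
    """Vote-share-weighted average left-right score of the parties a group voted for."""
    return sum(share * lr_scores[party] for party, share in shares.items())


def left_right_gap(shares_men: dict[str, float],
                   shares_women: dict[str, float],
                   lr_scores: dict[str, float]) -> float:
    """Men's average position minus women's; positive means men vote further right."""
    return mean_lr_position(shares_men, lr_scores) - mean_lr_position(shares_women, lr_scores)


# Hypothetical party scores on the 0 (extreme left) to 10 (extreme right) scale.
lr_scores = {"A": 2.0, "B": 5.0, "C": 8.0}
women = {"A": 0.30, "B": 0.45, "C": 0.25}
men = {"A": 0.20, "B": 0.45, "C": 0.35}

print(round(left_right_gap(men, women, lr_scores), 2))
```

Here men's average position is 5.45 and women's is 4.85, giving a gap of about 0.6 scale points to the right for men. Unlike the nondirectional index, this measure can be zero even when women and men vote for very different parties, as long as the differences cancel out along the left–right scale.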

4 Results: Gender Gaps in Real Ballots

4.1 Overview: Left–Right Positions of Parties

Figure 1 plots the parties’ left–right orientations over time. The ordering of parties remains stable except for the socially liberal and pro-business FDP, which is coded to the left of the Christian democrats CDU/CSU in the 1980s and early 1990s but slightly to the right of the CDU/CSU in more recent years.

The positions of CDU/CSU and the social-democratic SPD on the general left–right scale are consistent with their respective positions on the economic and socio-cultural scales; the other parties take varying positions on the different scales. Most recently, the right-wing populist AfD is far-right in socio-cultural terms and moderately right in economic terms. The FDP is coded right-wing in economic terms but center-left in socio-cultural terms. The Greens are economically center-left and socio-culturally far-left. The Left and its predecessor the PDS (Party of Democratic Socialism) are the opposite: economically far-left but socio-culturally center-left.

4.2 Party-Specific Gender Differences

Figure 2 arranges parties according to their left–right position. The bar width represents the parties’ size (the average between the vote share among women and men) and the height and direction of the bars show the percentage difference, the standardized gender gap in voting.

Fig. 2 Voting behavior by gender. Results for each election and party. The figure contains information on the size of the party’s electorate (x‑axis), the gender gap in voting (y‑axis), and rough information on the party’s position (they are ordered according to their average left–right position over time). The names below the election years are the chancellors in office following election

Over time, the bars become more narrow and more numerous, showing the remarkable fractionalization of the German party system (Hudde et al. 2022). The number of relevant parties at the federal level has doubled from three to six since 1953.

Some parties’ gender gaps remained relatively stable over time. For example, the FDP was considerably more popular among men in almost all elections. Since its appearance in 2013, the same has been the case for the AfD. The CDU/CSU mainly fluctuated between being more popular among women and displaying no gender gap. The other parties’ gender balances changed considerably. Most remarkable is the case of the Greens, who gradually moved from having the most men-leaning electorate in 1980 (lowest value on the standardized gender gap) to having the most women-leaning electorate in all elections since 2005 (highest value on the standardized gender gap). However, their electorate became more gender-equal in 2021, breaking the previously monotonic trend. The Left had the most gender-imbalanced voters in 2005, with an electorate dominated by men, and the most gender-balanced voters in 2021. The SPD shows a rather fluctuating gender pattern in the new millennium.

Overall, the party-specific gender differences do not always follow a linear or straightforward left–right scheme. For example, in 2013 and 2017, both the most left-wing party and the most right-wing party were more popular among men.

4.3 Nondirectional and Directional Gender Differences

Figure 3 summarizes and condenses the detailed information from Fig. 2. It shows a nondirectional measure for gender differences and a directional, left–right measure. Because parties’ general left–right placement has only been available since 1984, Figs. 3 and 4 also use the 1984 value of left–right scores for all previous elections. This is likely a reasonable approximation because parties’ orientations were relatively stable in that period (Bruinsma and Gemenis 2020). However, the values should be interpreted with some caution.

Fig. 3 Level and direction of gender differences in voting over time

Fig. 4 Level and direction of gender differences in voting over time, by age group. Note that the specific intervals of the age groups in the data change over time. For example, people aged 30 were in the 30–59 category in the elections 1953–1961, in the 30–44 category in the elections 1965 and 1972, and in the 25–34 category thereafter

In broad terms, gender differences were large in the 1950s and 1960s, relatively small in the 1970s and 1980s, and have widened substantially in the new millennium. Initially, the increasing nondirectional gender dissimilarities did not coincide with a growing split in left–right orientation. The most extreme case is in 2013: women and men voted very differently in nondirectional terms, but almost identically in terms of the average left–right position.

The well-known move from the traditional gender gap towards the modern gender gap, where women are more left-wing than men, only emerges as late as 2017. This is remarkable because previous survey-based studies identified the new gender gap in Europe, including Germany, as early as the 1990s (Giger 2009). Just as remarkable is the speed of change: the directional gender gap jumped from zero to the second-largest gap of the entire post-World War II period in only 8 years (2013–2021).Footnote 7

Overall, gender gaps are similar for the general left–right orientation and its two subdimensions. Larger gender gaps concerning the socio-cultural than the economic dimension were expected, but this only shows marginally. The levels and trends are practically identical for the general score and the socio-cultural dimension.

4.4 Comparisons Between Age Groups

Figure 4 depicts changes in the directional and nondirectional gender gaps by age brackets. The figure broadly shows younger voters in solid orange lines (for greyscale: light grey), middle-aged voters in dashed blue (medium grey), and older voters in short-dashed, dark grey lines. Overall, period differences across age groups are much larger than age/cohort differences at one point in time. The traditional gender gap in the 1950s and 1960s tended to be driven by older voters, whereas the modern gender gap is largely driven by young and middle-aged voters. The age-specific perspective confirms that the modern gender gap is a very recent phenomenon. Among the youngest voters, there were already hints of a modern gender gap emerging in 2013, but the gap only really became visible in 2017.

The broad age groups available cannot be harmonized for cohort analyses across elections.Footnote 8 However, it is apparent that cohort succession cannot explain the rapid change in recent years. Consider the elections of 2013 and 2017 with the age group 25–34 as a focal group. Between these two elections, people from the 18–24 group moved into the focal group, and people moved from the focal group into the 35–44 group. If cohort succession were the driver of change, the focal group would have become more similar to the 18–24 group’s 2013 value and the 35–44 group would have become more similar to the focal group’s 2013 value. However, the gender gap was similar across age groups in both elections and all age groups moved rapidly in the same direction. Likewise, the disappearance of the traditional gender gap between 1969 and 1972 cannot plausibly be explained by cohort changes. The patterns, therefore, hint at period effects.

The election of 2021 stands out. There, young people aged 18–24 show the greatest gender gap ever recorded for any group at any time, both in the directional and nondirectional perspectives. The value of the nondirectional index of 16 is 43% higher than the pre-2017 record from voters aged 60+ in 1969. Young women’s voting behavior in 2021 was 0.65 points more to the left than that of their male counterparts. In absolute terms, this value is 44% higher than the pre-2017 record (age 45–59 in 1969; gap in the opposite direction). The 2021 election is also unique in heterogeneity by age group. In all previous elections, the gender gap was relatively similar across age groups. Concerning the nondirectional measure, the spread across age groups (highest minus lowest value) was below 5 points in all elections before 2017 but at 11.4 points in 2021. For the directional gender gap, the spread was below 0.25 in all pre-2017 elections but at 0.50 in 2021.

4.5 A Spotlight on the 2021 Election

The 2021 election is unique for its large gender gap and the vast age differences therein. Figure 5 shows which specific parties drive the gender gap in the different age groups. Gender gaps were largest among the youngest voters, for whom they aligned most clearly with the left–right scales. The three (center‑)left parties were substantially more popular among women whereas the others were gender-balanced (Christian democrats) or substantially more popular with male voters (liberals and right-wing populists). The two most popular parties among young voters, the Greens and liberals, were also the most gender unbalanced in that age group. The liberals and the Greens generally received less but more gender-balanced support from older voters. The Left was more popular among the youngest female voters, but gender-balanced to more popular with men in the other age groups. The only party whose gender gap was similar in all age groups is the AfD, which received more votes from men.

Fig. 5 Zooming in on the election of 2021. Voting behavior by age group and gender. The figure contains information on the party’s size (x‑axis), the gender gap in voting (y‑axis), and rough information on the party’s position (they are ordered according to their average left–right position)

5 Comparison of Gender Gaps in Real Ballots and Surveys

To summarize, the results from actual votes cast show that the modern gender gap emerged only in the 2010s, which contrasts with previous international studies that identified it in most Western countries decades earlier (Abendschön and Steinmetz 2014; Dassonneville 2021; Giger 2009; Inglehart and Norris 2003, p. 85). First, we need to distinguish studies that identified the modern gender gap in voting behavior from studies that identified it in ideological self-placement on a left–right scale. Most importantly, Dassonneville (2021) analyzed a very large, six-digit survey sample for Germany and found that women have been identifying as more left-wing than men for almost 30 years now. Political ideology and voting are certainly closely related, but they are not the same. For example, ideology is considered to be more stable whereas voting might be affected by situational factors such as characteristics of a specific election campaign, candidates, or strategic considerations (Shorrocks 2018). Therefore, the findings from Dassonneville (2021) and the analyses presented here are not in direct contradiction.

Potentially more concerning are differences in the conclusions between previous studies on self-stated voting and the actual voting behavior from real ballots reported here. Such differences could have two sources: (1) random survey error, because previous international studies had small sample sizes at the period and country level, or (2) survey bias and gender differences therein.

(1) Concerning sample size and random error, previous international studies did indeed have mainly small samples for each individual country and, for example, reported gender differences for Germany that were partly statistically insignificant or substantively small (Abendschön and Steinmetz 2014; Giger 2009; Inglehart and Norris 2003, p. 85). They identified the modern gender gap for several countries together and found no clear or statistically significant indication that Germany differed from the general, international trend. Such studies might have underestimated heterogeneity between countries. The implication for future research would be that greater caution should be exercised in deriving country-specific interpretations from cross-national studies with small-to-moderate samples.

(2) The previous finding of an early emerging modern gender gap in Germany might result from gendered survey bias concerning sampling, selective participation, and, in particular, social desirability bias. Social desirability bias is common in political polls and often leads people to hide more extreme political positions (Johann et al. 2016; Stout et al. 2021). Previous research from nonpolitical fields found greater social desirability bias among women than among men (Chung and Monroe 2003; Dalton and Ortegren 2011; Hebert et al. 1997; Paunonen 2016). If this also holds for political surveys, preferences for radical parties will be more strongly underestimated among women than among men, which biases the estimated gender gap for extremist voting and potentially the left–right gap overall. If this were the case, researchers should keep the existence and the probable direction of bias in mind when interpreting survey-based findings of political gender gaps.

To explore the possibility of such bias, I compare the results on voting behavior from real ballots with estimates from survey data. Further, I test whether the survey results are biased for the most likely case of social desirability bias: the radical right-wing party AfD. Across Western societies, right-wing populist parties receive the most negative feelings from other partisans (Gidron et al. 2023; Harteveld 2021). This pattern also shows in Germany, where supporters of the right-wing populist AfD face strong dislike from other members of society (Hudde 2022). Indeed, people more often vote for the populist right-wing AfD than they admit in surveys (Gschwend et al. 2018). More generally speaking, these analyses inform us about whether survey data give unbiased estimates of political gender gaps, including in cases of high expected levels of social desirability bias, e.g., because of higher support for radical parties.

5.1 Survey Data Sources

I use two sources of survey data to compare with the real ballots, which have complementary advantages and disadvantages.

First is the post-election cross-sectional study, the German Longitudinal Election Study (GLES 2020, 2022), which includes the German survey of the Comparative Study of Electoral Systems (CSES). This survey is chosen because it has the advantage that it surveys the same outcome, namely voting behavior in the federal election, follows the highest scientific standard, and is widely used by international scholars. The disadvantage is that this study is limited in its coverage and sample size. It covers the elections from 2009 to 2021 and sample sizes range between 1900 and 3400 per election—which is not very large in absolute terms but still larger than the CSES samples of most other countries.

Second is the Politbarometer, a monthly survey of voting intentions, commissioned by the public TV channel ZDF (Forschungsgruppe Wahlen 2022). Its scientific standards and reputation are possibly not on par with the purely academic GLES/CSES, but the Politbarometer is nevertheless recognized and used by the scientific community (Wüst 2003). The Politbarometer is chosen because it has two main assets compared with the GLES/CSES: it has been running since 1977 and has a very large sample of almost one million respondents overall. Data are available up to 2020. For the comparison with the real ballots, the Politbarometer has two relevant disadvantages: it surveys voting intention and not past voting behavior, and it does not refer to the same time points (the Politbarometer is running continuously whereas real ballots refer to specific election dates).

5.2 Results: Aggregate-Level Comparison of Gender Gaps in Real Ballots and Survey Data

Figure 6 plots the gender gap from real ballots against the estimates from the two survey data sources. The left-hand panel plots the nondirectional gender gap (see the gray dots in Fig. 3) and the right-hand panel the directional gender gap (see the brown diamonds in Fig. 3).

Fig. 6

Comparing the gender gap as estimated by real ballots (colored dots, connected with lines) and two types of surveys. Data from the Politbarometer are smoothed using local mean smoothing with Epanechnikov kernel weights and a bandwidth of 1.5 years (Fan and Gijbels 1996; Gutierrez et al. 2003)
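The smoothing named in the caption can be illustrated with a kernel-weighted local mean. The following Python sketch is not the author's implementation: it assumes a local constant (Nadaraya-Watson) estimator in which observations further than the bandwidth from an evaluation point receive zero weight, and the function names, the yearly evaluation grid, and the synthetic monthly series are hypothetical. Kernel scaling conventions differ between software packages, so the original analysis may differ in detail.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov weights: 0.75 * (1 - u**2) for |u| <= 1, zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def local_mean_smooth(x, y, grid, bandwidth=1.5):
    """Kernel-weighted local mean of y, evaluated at each point of grid.

    Assumption: observations further than `bandwidth` (in years) from a
    grid point receive zero weight.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    y = np.atleast_1d(np.asarray(y, dtype=float))
    grid = np.atleast_1d(np.asarray(grid, dtype=float))
    # weights[i, j] is the weight of observation j at grid point i
    weights = epanechnikov((grid[:, None] - x[None, :]) / bandwidth)
    totals = weights.sum(axis=1)
    smoothed = np.full(grid.shape, np.nan)  # NaN where no data are in range
    ok = totals > 0
    smoothed[ok] = (weights[ok] @ y) / totals[ok]
    return smoothed


# Hypothetical monthly survey estimates of a gender gap (synthetic data).
rng = np.random.default_rng(0)
months = np.arange(1977, 2021, 1 / 12)
noisy_gap = 0.1 * (months - 1977) + rng.normal(0.0, 2.0, size=months.size)
yearly_grid = np.arange(1977, 2021)
smoothed_gap = local_mean_smooth(months, noisy_gap, yearly_grid)
```

A larger bandwidth averages over more survey waves and gives a flatter line; a smaller one tracks short-term movements but passes more sampling noise through.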

When comparing real ballots with the Politbarometer, we need to keep in mind that the real ballots refer to specific reference dates (i.e., election dates) whereas the Politbarometer line shows the smoothed line of continuous data collection. Overall, the Politbarometer shows a relatively similar picture to the real ballots. However, for the directional gender gap, the line of the Politbarometer is continuously above the connected dots from real ballots in the period between around 1990 and 2015. The difference is not very large, but it still suggests that the Politbarometer might be slightly biased in the direction that, relative to men, women are estimated to vote more left-wing than they actually do.

Estimates from the GLES/CSES seem very noisy overall. For example, GLES/CSES estimates the nondirectional gender gap as being very high for the election of 2009 and relatively low for 2013. Real ballots show that it was, on the contrary, slightly higher in 2013 than in 2009. Furthermore, for the directional gender gap, the GLES/CSES gives a substantially wrong picture for the 2013 election. According to GLES/CSES, there is already a major modern gender gap, whereas the real ballots show that the directional gender gap was actually zero. The GLES/CSES hence estimates a sudden emergence of the modern gender gap between 2009 and 2013, even though real ballots show that this change only occurred between 2013 and 2017. Overall, and at the aggregate level, there is no clear evidence for a time-constant bias in the GLES/CSES data because the deviations of GLES/CSES from real ballots go in different directions in different years. Rather, it seems that these survey data are simply very noisy and one should refrain from interpreting election-to-election changes substantively.

5.3 Results: Party-Level Comparison of Gender Gaps in Real Ballots and Survey Data

Aggregating voting behavior over all parties may hide relevant survey bias at the party level. The gender gaps estimated by the Politbarometer are similar to results from real ballots for CDU/CSU, SPD, and The Left; and moderately similar for the Greens and the FDP (Fig. 7). For the FDP, there seems to be a bias in the direction that the Politbarometer estimates it to be more men-leaning than is actually true for all periods since around 2000. However, there is a major difference in the estimation of the gender gap concerning the right-wing party AfD: the Politbarometer estimates the gender gap to be much larger than what real ballots show. The difference between results from real ballots and Politbarometer estimation does not stem from random error: Over the total period between 2013 (the emergence of the AfD) and 2020, the Politbarometer estimates the standardized gender gap at −80.9, with a narrow confidence interval, ranging from −84.0 to −77.8 (95% confidence interval computed via bootstrapping). The standardized gender gap according to real ballots, averaged over the elections 2013, 2017, and 2021, is −51.3 and therefore differs strongly from the survey-based confidence interval. This shows that the gender gap for the AfD as estimated by the Politbarometer is strongly biased.
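To make the logic of a bootstrapped confidence interval for a gender gap concrete, the following Python sketch resamples survey respondents with replacement and recomputes the gap each time. It is a minimal illustration under stated assumptions, not the procedure used in the paper: the definition of the standardized gap in `standardized_gap` (difference between women's and men's vote shares, divided by the mean of the two shares, times 100) is assumed for illustration, survey weights are ignored, and the respondent data are synthetic.

```python
import numpy as np


def standardized_gap(votes_party, is_woman):
    """Assumed definition: 100 * (women's share - men's share) / mean share."""
    votes_party = np.asarray(votes_party, dtype=bool)
    is_woman = np.asarray(is_woman, dtype=bool)
    share_women = votes_party[is_woman].mean()
    share_men = votes_party[~is_woman].mean()
    return 100.0 * (share_women - share_men) / ((share_women + share_men) / 2.0)


def bootstrap_ci(votes_party, is_woman, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling respondents with replacement."""
    votes_party = np.asarray(votes_party, dtype=bool)
    is_woman = np.asarray(is_woman, dtype=bool)
    rng = np.random.default_rng(seed)
    n = votes_party.size
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one resample of respondent indices
        draws[b] = standardized_gap(votes_party[idx], is_woman[idx])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return float(lower), float(upper)


# Synthetic respondents: 10,000 women (7% choose the party), 10,000 men (13%).
is_woman = np.repeat([True, False], 10_000)
votes_party = np.concatenate([
    np.arange(10_000) < 700,
    np.arange(10_000) < 1_300,
])
point = standardized_gap(votes_party, is_woman)
lower, upper = bootstrap_ci(votes_party, is_woman)
```

With a sample as large as the pooled Politbarometer, such an interval becomes narrow, so a large distance between the survey estimate and the real-ballot value is better explained by bias than by sampling noise.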

Fig. 7

Comparing the standardized gender gap as estimated by real ballot (colored dots, connected with dashed lines) and two types of surveys

Party-level results from the GLES/CSES also seem noisy, with relevant deviations for some parties and elections. For example, the estimated gender gaps for The Left and FDP fluctuate strongly and differ from the results of real ballots. Looking at the estimated gender gap for the AfD, the party with the strongest expected social desirability bias, the GLES/CSES estimates a gender gap similar to real ballots for 2013 and 2021. For 2017, the GLES/CSES estimates a strongly inflated gender gap. Taking all three elections together, the gender percentage difference from GLES/CSES is −57.3, with the 95% confidence interval ranging from −79.4 to −35.2. Thus, the estimate from the GLES/CSES for all elections combined is somewhat too large, but the confidence interval is rather broad and the difference between survey and real ballots is not statistically significant. It is thereby not clear whether the difference between real ballots and the GLES/CSES is caused by random error or bias.
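The contrast between the two surveys can be restated as a simple interval check using only the figures reported above. This Python sketch treats the real-ballot average of −51.3 as a fixed benchmark, which is a simplifying assumption: it ignores the fact that the survey periods and the election dates do not coincide exactly, and it takes the GLES/CSES "gender percentage difference" to be on the same scale as the standardized gender gap. The helper name `lies_within` is hypothetical.

```python
def lies_within(value, interval):
    """Return True if value falls inside the closed interval (lower, upper)."""
    lower, upper = interval
    return lower <= value <= upper


# Figures as reported in the text for the AfD.
REAL_BALLOTS_AFD = -51.3            # average over the 2013, 2017, 2021 elections
POLITBAROMETER_CI = (-84.0, -77.8)  # 95% CI, 2013 to 2020
GLES_CSES_CI = (-79.4, -35.2)       # 95% CI, three elections combined

print(lies_within(REAL_BALLOTS_AFD, POLITBAROMETER_CI))  # False
print(lies_within(REAL_BALLOTS_AFD, GLES_CSES_CI))       # True
```

The benchmark lies far outside the narrow Politbarometer interval but inside the wide GLES/CSES interval, which is why only the former deviation can be attributed to bias rather than random error.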

Based on these different findings for AfD voting, I investigated the survey-estimated gender gap in AfD voting using three additional data sets that have previously been used in studies on AfD voting: the European Social Survey (ESS), the German General Social Survey (ALLBUS), and the Socio-Economic Panel (SOEP) (e.g., Hartmann et al. 2022; Lengfeld 2018; Lux 2018; Steinmann 2022; Tutić and von Hermanni 2018). Results show that (1) the point estimates from all surveys overestimate the gender gap in AfD voting, (2) this overestimation is much less severe in the other surveys than in the Politbarometer, and (3) among the additional surveys, the overestimation of the gender gap is statistically significant only in ALLBUS, where it is also moderately strong (−66.2 in ALLBUS vs. −51.3 in the real ballots). The SOEP seems to yield the most reliable estimate. A figure comparing these estimates is shown in the Online Appendix.

Overall, the comparison with survey data shows that (1) very large survey data provide reasonable estimates for the gender gap for most parties, but (2) not necessarily for radical right-wing parties and presumably other parties where social desirability bias is a major issue. In Germany, where the share of such right-wing parties has been relatively low, this averages out to fairly accurate estimates with only a low bias at the aggregate level. However, when more extremist and/or populist parties rise, the estimates based on survey data such as the Politbarometer could become substantially biased as well. (3) The data from the smaller but—presumably—higher quality datasets partly hint toward the same direction of bias, but at a smaller magnitude. Overall, sources such as the GLES/CSES have samples that are too small and produce estimates of gender gaps that are too noisy to interpret any election-to-election changes for single countries, let alone single parties or subpopulations such as age groups.

6 Discussion

This article analyzed data from actual cast votes to trace the gender voting gap in Germany over seven decades. The data analyzed are free from sample bias and the large sample allows for precise estimations, even at the age group and party level.

Women tended to vote for more conservative parties than men until the 1970s. Then, women and men voted for parties that were similar on the left–right scale up to and including 2013. Women and men have already been voting relatively differently in nondirectional terms since around 2009, but the directional, modern gender gap in voting only appeared with the 2017 election. Once the move toward women voting more to the left than men started, it was so fast that it cannot be explained by cohort succession.

Results contrast with those of previous international studies that identified the modern gender gap in most Western countries decades earlier. To uncover the reasons for this contrast, I compared the results from real ballots with estimates based on survey sources. Concerning methodology, this comparison showed that (1) large surveys provide reasonable estimates at the aggregate level but (2) can suffer from bias for more radical parties. In particular, the gender gap in radical right-wing voting is smaller than surveys such as the Politbarometer suggest. Finally, surveys such as the GLES/CSES are too noisy for interpreting results for single years, countries, or even single parties. Concerning substantive conclusions, the survey estimates also show a rather late emergence of the modern gender gap. The difference between the results reported here and previous conclusions about an earlier gender gap therefore does not seem to stem from biased estimates of the gender gap. Rather, international studies had samples that were too small to draw conclusions about individual countries that might deviate from the average cross-country trend, such as Germany.

Post-hoc reasoning might suggest that differences in the estimation of the AfD gender gap between different surveys could stem from the survey mode. Social desirability bias has been found to be large in telephone interviews, the survey mode that the Politbarometer uses (Holbrook et al. 2003; Jäckle et al. 2006, 2010). ALLBUS and ESS were conducted face-to-face and self-administered, either online or with a paper questionnaire sent via mail. The seemingly lower social desirability bias in SOEP would be in line with the previous finding of particularly low bias in long-running panel studies where interviewer and interviewee know each other (Kühne 2018). Future applied research should keep in mind that survey results could overestimate the degree to which radical (right-wing) parties are more popular among men. Future methodological research could elaborate more on the role of survey attributes, such as survey mode, for gender differences in social desirability bias.

The speed with which the modern gender gap has emerged means that it cannot be explained by gradual, long-term changes alone. However, long-term trends, such as the decline in religiosity or women’s increased participation in the labor market, could interact with period effects, such as the emergence of a new party or some parties’ leading candidates. Long-term change could build up a push in one direction, and period effects could either temporarily block or catalyze that push.

To better understand the late-but-fast emergence of the modern gender gap, we can disentangle which parties’ electorates have contributed to it. In part, it is due to the emergence and growth of the AfD. This party had its biggest electoral success in 2017, and the popularity of AfD with the male electorate was the largest single contributor to the rise of the modern gender gap between 2013 and 2017. In most other Western democracies, radical right political parties had emerged much earlier than in Germany (Arzheimer 2015). The reason for the late emergence of the modern gender gap in Germany might therefore, at least partly, lie within the party system. It is imaginable that the gendered demand for nostalgic, right-wing representation was already present before 2013, but there was no party that supplied this type of politics (Arzheimer and Berning 2019; Schulte-Cloos 2022).

However, the modern gender gap in Germany is more than just a radical right gender gap. For example, the gender gap grew moderately between 2017 and 2021, although the AfD’s contribution to this gender gap declined as its electorate became both smaller and slightly more gender balanced. This gender gap beyond the radical right is particularly evident among the youngest voters. The gender gap is the largest in this age group but the contribution of AfD voting to the gap is comparatively small. More influential is that young men vote most often for the liberal FDP, whereas young women vote most often for the Greens. In fact, among young voters, all three left-of-center parties—the Left, the Greens, and the Social Democrats—are substantially more popular among women, which was not the case in any previous election.

The role of the AfD in the emergence of the gender gap is at least partly in line with the cultural backlash and related theories. According to these approaches, the modern gender gap may occur because men more often belong to the perceived or actual losers of modernization and therefore elect parties that counter modernization and promise to bring back the “good old days.” The early AfD, after its foundation in 2013, was described as a nonradical party that combined “soft Euroscepticism with economic liberalism and socially conservative policies” (Arzheimer and Berning 2019, p. 1). It was only in 2015 that the party placed more emphasis on migration and cultural issues including gender roles, and turned more radical and populist (Arzheimer and Berning 2019; Jankowski et al. 2021). One continuity between the Euro focus and the culture focus is the nostalgic character, a rejection of globalization and modernization. The slogan of the AfD’s 2021 campaign was “Germany. But normal” (in German: “Deutschland. Aber normal.”; see for example, Ruhose 2021; compare also Suckert 2022). This might be less blatantly nostalgic than “make America great again” or the Brexiteers’ “take back control”, but it might still play into a similar vein.

However, these considerations cannot explain the modern gender gap beyond the radical right, especially men’s higher propensity to vote for the liberal FDP and women’s higher propensity to vote for the Greens or the other left-wing parties. The FDP is generally market liberal but center-left in terms of socio-cultural aspects (Jolly et al. 2022). On a religious–secular dimension, both the FDP and the Greens are secular (Euchner and Preidel 2018). In the elections of 2017 and 2021, when the FDP received almost 20% and more than 30% of votes from the youngest men, respectively, the party did not center on socio-cultural topics. Rather, it pushed standard economically liberal positions, such as lower taxes, and further stressed the importance of investing in education, boosting entrepreneurship, and advancing digitization (Höhne and Jun 2019; Jesse 2021). Unlike the nostalgic radical right parties, and as is typical for liberal parties (Steenvoorden and Harteveld 2018), the FDP’s campaigns in 2017 and 2021 struck optimistic tones and advocated more change and more modernization. It seems that the party wanted to attract young voters who are the perceived or actual winners of globalization, digitization, and social change. In sum, gender differences in voting between the Greens and FDP cannot be explained by religious considerations or cultural backlash and related approaches.

Could parties’ positions on gender equity-related issues explain the patterns among young voters? Today’s gendered voting patterns among young voters partly align with the parties’ positions on gender roles (see Diabaté et al. 2023). According to respondents of the Open Expert Survey 2021, the Greens most strongly and most saliently endorse political interventions toward egalitarian gender roles, followed by the Left and SPD (Jankowski et al. 2021). The FDP takes a middle position on gender roles, combined with a low salience of this topic. However, a change toward a modern gender gap cannot easily be explained by the parties’ positions on gender roles, because these differences have existed for many decades. In fact, Henninger and Von Wahl (2019) argue that German parties, with the strong exception of the AfD, have rather depolarized and aligned their positions on gender-equity topics in the direction of more egalitarian gender roles. Parties’ gender-role positions could only be a promising explanation if combined with a sharp increase in the degree to which they are decisive for vote choice. Socio-cultural issues have generally and gradually become more salient (Norris and Inglehart 2000), but it remains open whether the importance of gender-equity topics for party choice increased so strongly and rapidly that it could explain the changes in voting behavior.

Finally, rapid change could always stem from the period effect of candidate-driven voting and a same-gender preference (Plutzer and Zipp 1996). In particular, one could think of a ‘Merkel effect’: the popular female leader drew women toward her conservative CDU/CSU. However, voting patterns do not show clear evidence for this. In 2005, when Angela Merkel ran against Gerhard Schröder, who was often labeled ‘macho’ (e.g., Kister 2019), her party’s electorate was gender balanced, and Schröder’s party was more popular among women than men. Most of the shift of women toward the Christian democrats occurred only after Merkel was already in power. When Merkel left the political scene in 2021 and the CDU/CSU nominated a male candidate, Armin Laschet, the party’s electorate shifted somewhat toward men, but women were still more likely to vote for CDU/CSU than men. The pattern among the Greens shows no signs of same-gender voting. In earlier elections, the Greens had mixed-gender lead teams, but in 2021, they put Annalena Baerbock in the first position, instead of the popular Robert Habeck. However, the Greens’ electorate was somewhat more gender balanced in 2021 than in the previous elections. These observations do not hint at the lead candidate’s gender as being the most central force behind the emerging gender gap in voting behavior in Germany (Debus 2017). However, one can only speculate about counterfactuals, for instance, one where the CDU/CSU had a male candidate instead of Merkel. Would the party’s electorate have become more male-leaning, which would have contributed to an earlier emergence of the modern gender gap?

The findings from the analyses presented here are based on exceptionally large and reliable data, but the tentative theoretical interpretations of these results must be treated with caution. Some of the empirical findings were surprising and the discussion of possible theoretical explanations involves post-hoc reasoning. These non-exhaustive considerations should therefore be seen as a starting point for more in-depth research and hypothesis testing, rather than a conclusion. Future research could, for instance, more explicitly test the role of “good old days” appeal and motives regarding a cultural backlash in the emergence of the modern gender gap. Furthermore, the question to what degree young voters are mobilized by gender equity-related topics and how that differs between women and men, for instance, seems understudied. A promising avenue might also be to analyze whether the move to the modern gender gap is primarily driven by certain demographic groups, such as those from East or West, more urban or more rural areas, and higher or lower levels of education.

After decades of relative similarity between the genders, men and women today are voting more and more differently. Throughout Germany’s post-World War II period, the political division between women and men has never been as large as among young voters in 2021. This suggests that gender could become a relevant political cleavage. If so, it could undermine progress on gender equity by moving the issue to a party-political battleground instead of a field of constructive cooperation. Growing gender gaps could also mean that politics increasingly seeps into private lives, as people are more likely than ever to disagree with their different-gender family members, spouses, or potential dating partners.