1 Introduction

In response to aging populations, many governments are reducing the generosity of retirement pensions for younger cohorts through increases in eligibility ages or reductions in benefit amounts. A key aim of these reforms is to encourage older workers to remain in the labor force, making it vital to understand how these policies affect retirement decisions. While several studies find large effects of these reforms on retirement decisions (e.g., Börsch-Supan & Schnabel, 1998; Cribb et al., 2016; Geyer & Welteke, 2021; Hanel & Riphahn, 2012; Lalive et al., 2017; Snyder & Evans, 2006; Staubli & Zweimüller, 2013; Vestad, 2013), it has been difficult to disentangle the key mechanisms, as these reforms may affect retirement behavior through several channels. For example, these reforms may encourage people to delay retirement through (i) a decrease in individuals’ lifetime pension wealth, known as the wealth effect, (ii) the opportunity to work for a longer period and thus accrue a larger pension, known as the accrual effect, (iii) a change in social norms, (iv) the presence of liquidity constraints, or (v) a delay in any income/earnings tests that apply only while a pension is received and that make it unattractive for people to continue working after they become eligible for retirement benefits.

An important paper in this literature is Atalay and Barrett (2015). Atalay and Barrett (2015) study an Australian reform in 1994 that gradually raised women’s eligibility age for the means-tested public retirement pension from 60 to 65 based on their birthdates (see Fig. 1). Atalay and Barrett (2015) argue that this reform is useful for understanding mechanisms because Australia’s retirement pension is non-contributory. This means that there is no accrual effect — an effect which has been emphasized in the literature (Gruber & Wise, 2004; Samwick, 1998) — and any impact on retirement decisions must result from other mechanisms. Atalay and Barrett (2015) estimate the impact of the reform on female labor force participation at ages 60 to 64 using repeated household surveys ranging from 1994 to 2010 and a differences-in-differences design in which male cohorts form the comparison group. Their headline estimate is large, reflecting the stronger increase in female participation rates over this period (see Fig. 2). They estimate that each one-year increase in women’s eligibility age increased female labor force participation by 12 percentage points, an effect they interpret as evidence of strong wealth effects.Footnote 1

Fig. 1

Age pension age (APA) by date of birth and gender. Notes: This figure shows the eligibility age of Australian men and women for the Age Pension based on their date of birth. The phased increases in women’s Age Pension Age (APA) from 60 to 65 were legislated in 1994.

Fig. 2

Labor force participation rates of men and women aged 60–64 over time. Notes: This figure replicates Fig. 1 in Atalay and Barrett (2015). The black lines plot aggregate time series data from the Labor Force Survey, while the gray lines show the calculations based on the Income and Housing Costs Surveys (with the X indicating each survey). I also include a dashed line denoting the counterfactual change in female labor force participation (LFP) over Atalay and Barrett’s (2015) sample period (1994–2010); this line shows how female LFP would have changed if it followed the same trend as male LFP over the sample period. Evidently, female LFP increased considerably more than male LFP over this period.

In this paper, I start by replicating Atalay and Barrett’s (2015) labor force participation estimates. Using the same source data, I apply the sample restrictions and methods described in Atalay and Barrett’s (2015) paper and find that their estimates can be successfully replicated. Namely, due to the clarity of Atalay and Barrett’s (2015) paper, I am able to closely match the sample and regression estimates without needing to use original program files. Then, I extend the analysis with earlier data, building on a literature that is increasingly scrutinizing difference-in-difference methods (e.g., Bertrand et al., 2004; Borjas, 2017; Cameron et al., 2008; Donald & Lang, 2007; Goodman-Bacon, 2018; Wolfers, 2006). In the Australian context, one concern with using men as the comparison group is that, as in many countries, the participation rate for women has risen considerably over recent decades, while the rate for men has been relatively stable. With these gender-specific time trends in mind, later cohorts of women may have higher participation rates for reasons other than their higher pension age. If this is the case, Atalay and Barrett’s (2015) estimates may be biased upwards. As Atalay and Barrett’s (2015) sample period started after the reform was announced, they were unable to use their data to examine whether the parallel-trends assumption held before the reform for the cohorts in their sample.

I extend the analysis using two sources of data from before the reform was announced. First, I use aggregate data on participation rates by age and gender to examine whether the parallel-trends assumption held before the reform for the cohorts in Atalay and Barrett’s (2015) sample. This builds on Fig. 1 in Atalay and Barrett (2015), which showed similar pre-reform trends in male and female participation rates near the retirement age among earlier cohorts. For the relevant cohorts, I find opposing trends in male and female participation rates before the reform (at younger ages), with a strong upward trend in female participation rates and a weak downward trend in male participation rates (see Fig. 3). For example, at ages 45 to 49, women’s participation rate increased by 18.6 percentage points in the fifteen years before Atalay and Barrett’s (2015) sample period (from 52.5% to 71.1%), while men’s participation rate decreased by 2.4 percentage points (from 93.5% to 91.1%). Moreover, I find similar gender-specific trends among these cohorts at both older and younger ages, suggesting that women would have reduced the gender gap in participation rates at ages 60 to 64 over Atalay and Barrett’s (2015) sample period even if the reform had not occurred. This indicates that Atalay and Barrett’s (2015) estimates, which attribute the convergence in male and female participation rates to the reform, may be biased upwards.

Fig. 3

Labor force participation rates of the relevant cohorts at younger ages. Notes: These figures plot labor force participation (LFP) rates from the Labor Force Survey (in black) and the Income Distribution Surveys (in gray). Between the vertical dashed lines, the cohorts in Atalay and Barrett’s (2015) sample were aged 45–49 in (a) and 50–54 in (b). During these periods, male and female LFP rates follow opposing trends. Thus, the parallel-trends assumption did not hold for the relevant cohorts before the reform.

Second, I use repeated household surveys from 1982 to 1990 to examine whether the different trends of men and women can be explained by the control variables in Atalay and Barrett’s (2015) regressions. Specifically, I estimate placebo regressions using Atalay and Barrett’s (2015) differences-in-differences approach to test for an impact on the participation rates of female cohorts before the reform was announced. In these regressions, positive estimates of the (placebo) treatment effect would indicate that Atalay and Barrett’s (2015) estimates are likely to be biased upwards. I find that the estimates are positive, highly statistically significant, and broadly consistent with the size of Atalay and Barrett’s (2015) estimates in percentage terms. Thus, Atalay and Barrett’s (2015) estimates are likely to confound women’s responses to the reform with a persistent cohort trend in female participation rates.

In the final part of the paper, I use the replication sample to re-estimate the effects of the reform on female labor force participation. I follow the empirical strategy of several recent studies examining phased increases in pension-eligibility ages (Cribb et al., 2016; Oguzoglu et al., 2020; Staubli & Zweimüller, 2013). This approach differs from Atalay and Barrett’s (2015) approach by directly controlling for the trend in female participation rates across cohorts. These controls may partially or even fully absorb any wealth effects from the reform but not effects from other mechanisms, such as a change in social norms, liquidity constraints or the pension’s income test. I find that the estimated increase in female labor force participation falls by two-thirds to 4.1 percentage points and becomes statistically indistinguishable from zero at the 10% level (with a p value of 0.109). This estimate provides only weak evidence that the reform had any effect on female labor force participation. This modest impact is consistent with the raw data, which shows little evidence of a drop in female participation rates at the eligibility age (and only a small decrease in male participation rates at their constant eligibility age of 65). Moreover, the estimated effect is similar in magnitude to (unpublished) estimates of the reform by Morris (2019) based on more detailed survey data.Footnote 2 These impacts are not only smaller than Atalay and Barrett’s (2015) estimates but also, as discussed below, smaller than estimates of similar reforms in other countries.Footnote 3

More generally, the results in this paper have implications for studies estimating the impact of retirement reforms. Over recent decades, female labor force participation rates have increased considerably across the developed world; from 1970 to 2016, the participation rate of women aged 25 to 64 in OECD countries increased from 46.3% to 68.1% (OECD, 2018). In the context of Australia, my results show that these trends can result in meaningful differences across cohorts in female participation rates near the retirement age. Even when these differences appear to reflect logical responses to a decrease in the generosity of retirement benefits for later cohorts — and even when differences across earlier cohorts are minimal near the retirement age — it is important, where possible, to examine trends across the relevant cohorts in earlier time periods. This recommendation applies generally and is not specific to studies focused on the effects on female labor supply; studies focused on other potentially correlated outcomes like welfare receipt or health should also be careful in attributing cohort differences in outcomes to the effects of pension reforms.

2 Background and replication of Atalay and Barrett (2015)

2.1 Institutional background

Australia’s public retirement pension, the Age Pension, is received by around 70% of older Australians. Unlike in many other countries, employment history does not affect pension eligibility or payment rates. Instead, there is a means test based on the current income and assets of the household; from a maximum level, payments are gradually reduced to zero for households with higher income and assets. In practice, around two-thirds of pension recipients receive the maximum payment (Barrett & Tseng, 2008). In 2008, near the end of Atalay and Barrett’s (2015) sample period, this payment was $546.80 every two weeks for singles (~$14,300 per annum) and $913.60 for couples combined (~$23,800 per annum).Footnote 4 For perspective, the median equivalized household income in Australia in 2008 was $692 per week. Thus, a single household receiving the maximum payment and no other income would have income equal to 40% of the median household income in the Australian population (44% for couples).Footnote 5
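The income shares quoted above can be checked with a few lines of arithmetic. The sketch below uses only the dollar figures given in the text. The equivalence factor of 1.5 for a couple is an assumption (it corresponds to the modified OECD scale); the text does not state which scale underlies the 44% figure.

```python
# Figures quoted in the text (2008 dollars).
SINGLE_FORTNIGHTLY = 546.80      # maximum payment, singles
COUPLE_FORTNIGHTLY = 913.60      # maximum payment, couples combined
MEDIAN_EQUIV_WEEKLY = 692.00     # median equivalized household income

# ASSUMPTION: a two-adult household is equivalized by dividing by 1.5.
# This factor is not stated in the text and is used for illustration only.
COUPLE_EQUIV_FACTOR = 1.5


def share_of_median(fortnightly: float, equiv_factor: float = 1.0) -> float:
    """Equivalized weekly pension income as a share of the median."""
    weekly = fortnightly / 2
    return weekly / equiv_factor / MEDIAN_EQUIV_WEEKLY


single_share = share_of_median(SINGLE_FORTNIGHTLY)
couple_share = share_of_median(COUPLE_FORTNIGHTLY, COUPLE_EQUIV_FACTOR)
print(f"singles: {single_share:.1%}, couples: {couple_share:.1%}")
```

Under this assumed scale, the calculation gives shares close to the 40% and 44% reported above.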

Households are eligible for the maximum payment if their assessable income and assets are below the respective income and asset disregards.Footnote 6 Otherwise, payments are determined by either the income test or the assets test, whichever implies a lower payment. In 2008, each dollar of assessable income above the income disregard reduced Age Pension payments by 40 cents (20 cents each for couples) and each $1000 of assessable assets above the asset disregard reduced payments by $1.50. In practice, the income test binds for many more pensioners than the assets test (Chomik et al., 2018).
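The interaction of the two tests described above can be illustrated with a minimal sketch. The taper rates (40 cents per dollar of income, $1.50 per $1000 of assets) come from the text; the function name, the disregard values, and the assumption that income and payments are measured over the same period are hypothetical and are chosen only to make the example run. For couples, the text gives an income taper of 20 cents each, which can be passed through the `income_taper` argument.

```python
def age_pension_payment(
    max_rate: float,
    income: float,
    assets: float,
    income_disregard: float,
    asset_disregard: float,
    income_taper: float = 0.40,          # payment lost per dollar of excess income
    asset_taper_per_1000: float = 1.50,  # payment lost per $1000 of excess assets
) -> float:
    """Payment implied by the binding test (whichever test gives the lower payment)."""
    income_excess = max(0.0, income - income_disregard)
    asset_excess = max(0.0, assets - asset_disregard)
    under_income_test = max_rate - income_taper * income_excess
    under_asset_test = max_rate - asset_taper_per_1000 * asset_excess / 1000
    # Payments cannot fall below zero.
    return max(0.0, min(under_income_test, under_asset_test))


# HYPOTHETICAL disregards, for illustration only.
INCOME_DISREGARD = 100.0
ASSET_DISREGARD = 150_000.0

income_bound = age_pension_payment(546.80, 300.0, 100_000.0, INCOME_DISREGARD, ASSET_DISREGARD)
asset_bound = age_pension_payment(546.80, 300.0, 250_000.0, INCOME_DISREGARD, ASSET_DISREGARD)
nothing = age_pension_payment(546.80, 5_000.0, 0.0, INCOME_DISREGARD, ASSET_DISREGARD)
```

In the first call the income test binds, and in the second the assets test binds, because it implies the lower payment.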

To be eligible for the Age Pension, individuals must reach the Age Pension Age (APA). The APA is most similar to an Early Retirement Age (ERA); it is the first age individuals can claim the Age Pension. However, there is a stronger incentive to start claiming at the APA because payment rates do not depend on the age at which individuals start claiming. Prior to the Social Security Legislation Amendment Act (No. 2) 1994, the APA was 60 for women and 65 for men. This reform, proposed in 1993, legislated that women’s APA would increase to 65. As shown in Fig. 1, the increases were phased in based on women’s date of birth. For example, the APA increased to 60.5 for women born between July 1935 and December 1936, and it increased to 61 for women born between January 1937 and June 1938. The APA continued increasing in this manner until it reached 65 — the APA of men — for women born after December 1948.
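The phase-in schedule described above can be written as a short function. This is a sketch that assumes the pattern stated in the text continues regularly: the APA rises by six months for each successive 18-month birth window, starting with women born in July 1935, until it reaches 65. The function name is hypothetical.

```python
from datetime import date


def womens_apa(birth: date) -> float:
    """Women's Age Pension Age implied by the phased schedule (assumed regular)."""
    # Months elapsed between July 1935 and the month of birth.
    months_since_start = (birth.year - 1935) * 12 + (birth.month - 7)
    if months_since_start < 0:
        return 60.0  # born before July 1, 1935: APA unchanged
    steps = months_since_start // 18 + 1  # one step per 18-month birth window
    return min(65.0, 60.0 + 0.5 * steps)


for d in (date(1935, 6, 30), date(1935, 7, 1), date(1937, 1, 1), date(1949, 1, 1)):
    print(d.isoformat(), womens_apa(d))
```

The endpoints agree with the text: the first affected window maps to 60.5, the second to 61, and women born after December 1948 reach the male APA of 65.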

Other sources of retirement funding exist besides the Age Pension. For example, many Australians receive other government payments before they reach the APA. Oguzoglu et al. (2020) show that, depending on the cohort, around 30–45% of women receive a regular payment before they reach the APA, with the most common payment being the Disability Support Pension. Like the Age Pension, other payments are non-contributory and means tested, and many have the same payment rates. These payments have additional eligibility criteria but no age conditions. Typically, recipients of other payments are transferred onto the Age Pension at the APA (Oguzoglu et al., 2020). Australians can also fund their retirement privately through voluntary savings and a mandatory scheme called superannuation, in which employers make contributions to their employees’ private retirement accounts.Footnote 7 However, the Age Pension is the main source of retirement income for older women.Footnote 8

2.2 Sample

Atalay and Barrett’s (2015) sample comes from 11 cross-sections of the nationally representative Australian Bureau of Statistics Income and Housing Costs Surveys (IHCS). The first survey was in 1994/95 and the last in 2009/10, with surveys conducted over the course of a financial year and a gap of at most one year between surveys.Footnote 9 Atalay and Barrett (2015) restrict their sample to men and women aged 60–64. They also exclude immigrants who have lived in Australia for less than ten years, who would not satisfy the residency condition for the Age Pension. Using the same source data and the information in Atalay and Barrett’s (2015) paper, I am able to closely replicate their sample. Appendix Table B1 shows that the replication sample is similar to Atalay and Barrett’s (2015) sample in terms of the number of observations and the means of the key variables.Footnote 10 Overall, women in the sample have low rates of labor force participation (LFP), with just 27% of women participating compared to 52% of men. A relatively high proportion of women in the sample receive a government payment (56%),Footnote 11 with 39% of women receiving a payment before they are eligible for the Age Pension. Due to the means-testing of payments, payment recipients are typically in lower income households. In 2008 dollars, female payment recipients have average equivalized household income of $441 (including pension income),Footnote 12 compared to $860 for non-recipients.

2.3 Mechanisms affecting the impact on female labor force participation

Before describing Atalay and Barrett’s (2015) estimates, it is worth discussing the relevant factors that may affect the size of the reform’s impact on female LFP. The reform may increase female LFP at ages 60–64 for two main reasons. First, it may be optimal for some women to permanently leave the labor force (i.e., retire) exactly at the APA. This optimality may arise due to (i) the income test for the Age Pension, which results in an increase in the effective marginal tax rate on earnings of up to 40 percentage points at the APA, or (ii) liquidity constraints, since individuals cannot borrow against their future Age Pension wealth. The APA may also influence social norms by providing a signal as an appropriate age to retire (Cribb et al., 2016). Second, the reform reduces the present value of women’s expected lifetime income from the Age Pension. This negative wealth effect is the key mechanism emphasized by Atalay and Barrett (2015). Atalay and Barrett (2015) construct a model that indicates that women’s optimal retirement age will increase as long as leisure is a normal good. The model implies that the reform will affect retirement decisions at all ages, including below the old APA and above the new APA. However, studies of comparable reforms indicate that most of the impact is likely to result from a shift in retirement decisions at ages in which pension eligibility is affected (Cribb et al., 2016; Geyer & Welteke, 2021; Staubli & Zweimüller, 2013).Footnote 13

There are also several factors that may attenuate the impact of the reform on female LFP. First, women in these cohorts had very low rates of LFP, with many women having left (or never entered) the labor force before age 60. Second, other government payments provided a substitute form of retirement income for many women until they reached the new APA (Atalay & Barrett, 2015; Oguzoglu et al., 2020).Footnote 14 Third, the substitution to other payments was particularly strong among single women (Oguzoglu et al., 2020), who are more likely to be financially constrained and who faced a larger reduction in income from the retirement pension (due to the higher per-person payment rates for singles). Fourth, the Age Pension is unavailable to many older women due to the means test; Oguzoglu et al.’s (2020) estimates imply that the reform had no effect on Age Pension receipt for 48.1% of women.

Finally, there is a mismatch between women who are likely to claim the Age Pension at the APA and women who are most likely to be in the labor force at ages 60–64. Appendix Fig. B1 shows how Age Pension receipt and LFP vary across the distribution of household income for women in the sample. I sort women into quintiles based on their (current) equivalized household income, excluding their own income from the Age Pension (and other payments). In Appendix Fig. B1a, I present Age Pension receipt for women in the sample who are above the APA, and in Appendix Fig. B1b I present the LFP rates of women at age 59, before any of them have reached the APA. As expected, Age Pension receipt is much higher for women in low-income households, while LFP is much higher for women in high-income households. Thus, the group of women who are more likely to remain in the labor force at ages 60–64 was likely to be relatively unaffected by the reform.

2.4 Atalay and Barrett’s (2015) estimates

Atalay and Barrett (2015) estimate the causal effect of the reform on female LFP using a differences-in-differences design in which male cohorts form the comparison group. Figure 2 shows this approach graphically. Figure 2 replicates Fig. 1 in Atalay and Barrett (2015), showing the trends in male and female LFP at ages 60–64 over time. The black lines plot aggregate time series data from the Australian Bureau of Statistics Labor Force Survey, while the gray lines show the calculations based on the IHCS data (with the X indicating each survey).Footnote 15 I also include a dashed line denoting the counterfactual change in female LFP; this line shows how female LFP would have changed over Atalay and Barrett’s (2015) sample period if it followed the same trend as male LFP. Evidently, female LFP increased by 13.2 percentage points more than male LFP over the sample period. If men are a suitable control group, this comparison indicates that the reform has caused a substantial increase in female LFP.Footnote 16

Atalay and Barrett (2015) formalize their approach using a regression framework. In their regressions, they estimate the impact of the reform on the LFP of women who were affected by the reform, those born after June 30, 1935 (the “after” cohort). Their approach compares the LFP of women in the after cohort with the LFP of women born before July 1, 1935 (the “before” cohort), whose APA remained at 60, using the equivalent male cohorts as the comparison group. Atalay and Barrett (2015) estimate the following regression to implement this approach:

$$LFP_{i}=\beta x_{i}+\alpha_{0}Female_{i}+\alpha_{1}\Pr (CohortA_{i}^{\ast}=1)+\delta \left[Female_{i}\times \Pr (CohortA_{i}^{\ast}=1)\right]+u_{i} \tag{1}$$

where \(LFP_{i}\) is an indicator for individual i participating in the labor force; \(Female_{i}\) is an indicator for individual i being female; \(\Pr (CohortA_{i}^{\ast}=1)\) is the probability that individual i is in the after cohort; and \(x_{i}\) includes controls for age, marital status, an indicator for having a bachelor’s degree or higher, household size and state of residence.Footnote 17 The key explanatory variable is \(Female_{i}\times \Pr (CohortA_{i}^{\ast}=1)\). Its coefficient, δ, identifies the effect of the reform on the LFP rates of women in the after cohort under the assumption that, if not for the reform, the difference in the LFP rates of women in the before and after cohorts would have been equal to the corresponding difference among men.
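To make the structure of Eq. (1) concrete, the following sketch estimates it as a linear probability model by ordinary least squares on simulated data. It is not the original estimation code: the data are synthetic, the controls are reduced to an intercept, the coefficient values are arbitrary illustrative choices, and standard errors are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
TRUE_DELTA = 0.077  # illustrative value only, chosen to echo the text

# Synthetic regressors (hypothetical data, not the survey sample).
female = rng.integers(0, 2, size=n).astype(float)
# Probability of being in the after cohort: 0 or 1 when the cohort is known,
# fractional when the birth date is known only up to a window.
p_after = rng.choice([0.0, 0.25, 0.75, 1.0], size=n)

# Data-generating process with the same form as Eq. (1), intercept-only controls.
p_lfp = 0.50 - 0.25 * female - 0.02 * p_after + TRUE_DELTA * female * p_after
lfp = (rng.random(n) < p_lfp).astype(float)

# Design matrix: intercept, Female, Pr(CohortA = 1), and their interaction.
X = np.column_stack([np.ones(n), female, p_after, female * p_after])
coef, *_ = np.linalg.lstsq(X, lfp, rcond=None)
delta_hat = coef[3]  # coefficient on the interaction term
print(f"estimated delta: {delta_hat:.3f}")
```

In this simulation the male and female cohort trends are parallel by construction, so the interaction coefficient recovers the treatment effect. If the simulated female trend differed across cohorts for other reasons, the same coefficient would absorb that difference as well.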

Atalay and Barrett’s (2015) estimate of δ is 7.7 percentage points (p < 0.01). This estimate indicates that, on average, the reform increased the LFP of women in the after cohort by 7.7 percentage points at ages 60–64. This estimate implies a large effect, given that it is an average impact among women in the affected cohorts and 41% of women in these cohorts had already reached the APA when surveyed. For example, if we were to assume that this impact only applied to women below the APA (consistent with the findings of similar studies), this estimate implies an increase in LFP of 13.1 percentage points.Footnote 18 Overall, Table 1 shows that I am able to closely replicate this estimate. I find an estimate of δ of 7.8 percentage points, and the standard error is almost identical to the one reported in Atalay and Barrett (2015) to three decimal places.
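The rescaling of the 7.7 percentage point estimate can be written out explicitly. It rests on the assumption stated above that the effect is zero for the 41% of women in the affected cohorts who had already reached the APA, so that the average effect is carried entirely by the remaining 59%:

$$\frac{7.7}{1-0.41}=\frac{7.7}{0.59}\approx 13.1\ \text{percentage points}$$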

Table 1 Replication of baseline estimates.

In addition, I am able to closely replicate Atalay and Barrett’s (2015) other LFP estimates (see Appendix Table B2); these estimates come from modified versions of Eq. (1) that allow the treatment effect to vary based on women’s APA cohort (and APA cohort and age). For example, I am able to closely replicate Atalay and Barrett’s (2015) headline estimate that each one-year increase in women’s APA increased female LFP by 12.0 percentage points (p < 0.05).

For robustness, Atalay and Barrett (2015) also estimate their regressions without men as a control group (dropping the female dummy and interaction terms from their regressions). These estimates are larger, which Atalay and Barrett (2015) argue validates the use of male cohorts as a control group. However, this analysis provides no evidence that male cohorts are correctly accounting for any unobserved cohort and time factors affecting female LFP. Atalay and Barrett (2015) also estimate regressions on the subsample of single men and women.Footnote 19 Atalay and Barrett (2015) motivate this analysis as allowing them to assess the robustness of the results to potential within-family spillovers. Atalay and Barrett (2015) estimate larger effects on this subsample (replicated in Appendix Table B2 for Eq. (1)). I examine whether single men are a suitable comparison group for single women in the next section.

From here on, I focus on the validity of the baseline specification (Eq. (1)). I do this because at least one of the identifying assumptions underpinning the expanded specifications will not hold if the baseline parallel-trends assumption does not hold. See Appendix A.1 for more details.

3 Robustness to cohort trends in female participation

In this section, I extend Atalay and Barrett’s (2015) analysis with earlier data. Starting with aggregate time series data on labor force participation rates by age and gender, I examine whether the parallel-trends assumption held before the reform for the cohorts in Atalay and Barrett’s (2015) sample.

Figure 3a and b show the trends in male and female LFP at ages 45–49 and 50–54 respectively. For now, focus on the black lines, which plot aggregate time series data from the Australian Bureau of Statistics Labor Force Survey. These lines show opposing LFP trends for men and women. For example, at ages 45–49, female LFP increased from 52.1% to 79.7% over the period from 1978 to 2010, while male LFP decreased from 94.0% to 89.8%. Figure 3a and b also include vertical dashed lines to denote Atalay and Barrett’s (2015) sample period shifted back fifteen and ten years respectively. Between these lines, the sample consists of individuals born between 1929 and 1950, the cohorts in Atalay and Barrett’s (2015) sample. During these periods, male and female LFP rates follow very different trends, with a large increase in female LFP rates and a small decrease in male LFP rates. Therefore, the parallel-trends assumption did not hold for the relevant cohorts before the reform.Footnote 20
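To give a sense of scale, the aggregate rates quoted above for ages 45–49 over 1978 to 2010 imply the following raw change in the gender gap. This is simple arithmetic on the quoted figures rather than a regression estimate:

$$(79.7-52.1)-(89.8-94.0)=27.6+4.2=31.8\ \text{percentage points}$$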

A critical question is whether these gender-specific trends are driven by transitory factors or factors likely to persist through to Atalay and Barrett’s (2015) sample period. To assess this question, I compare the trends in LFP rates across the relevant cohorts at several points in time. In Fig. 4a, I present LFP rates at ages 50–54 over the period from 1983 to 1994 against the five-year lag of LFP rates at ages 45–49 (here, both series correspond to individuals born between 1928 and 1944). In Fig. 4b, I present LFP rates at ages 45–54 over the period from 1978 to 1994 against the ten-year lag of LFP rates at ages 35–44 (here, both series correspond to individuals born between 1923 and 1949).Footnote 21 In each figure, there are strong positive correlations for men and women between (i) the LFP rate in a given year and (ii) the lagged LFP rate of the same cohorts at a younger age. For example, there are strong upward trends in (i) female LFP rates at ages 45–54 from 1978 to 1994 and (ii) female LFP rates at ages 35–44 from 1968 to 1984, while the corresponding trends for men are weaker and in the opposite direction. Hence, persistent cohort factors can explain a large part of the gender-specific trends in Fig. 3.
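The cohort-matching logic behind this comparison can be sketched as follows: the LFP rate of an older age group in year t is paired with the rate of the younger age group in year t minus the lag, so that both observations refer to the same birth cohorts. The series below are hypothetical placeholders, not the Labor Force Survey data, and the function name is an assumption.

```python
import numpy as np


def lagged_pairs(older: dict, younger: dict, lag: int) -> np.ndarray:
    """Pair the older-age rate in year t with the younger-age rate in year t - lag."""
    years = sorted(y for y in older if (y - lag) in younger)
    return np.array([[younger[y - lag], older[y]] for y in years])


# HYPOTHETICAL female LFP rates (percent), for illustration only.
younger_45_49 = {1978 + k: 52.0 + 1.0 * k for k in range(12)}   # 1978..1989
older_50_54 = {
    1983 + k: 45.0 + 0.9 * k + (0.3 if k % 2 else -0.3)         # 1983..1994
    for k in range(12)
}

pairs = lagged_pairs(older_50_54, younger_45_49, lag=5)
corr = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
print(f"{len(pairs)} cohort-matched pairs, correlation = {corr:.3f}")
```

A strong positive correlation in such pairs is what the text interprets as persistence of cohort-specific participation behavior.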

Fig. 4

Persistence of labor force participation trends across cohorts. Notes: These figures combine data from the Labor Force Historical Timeseries (1968–1977) with the Labor Force Survey (1978–1994). Panel (a) plots labor force participation (LFP) rates at ages 50–54 from 1983 to 1994 against the five-year lag of LFP rates at ages 45–49 (here, both series correspond to the 1928–1944 cohorts). Panel (b) plots LFP rates at ages 45–54 from 1978 to 1994 against the ten-year lag of LFP rates at ages 35–44 (here, both series correspond to the 1923–1949 cohorts).

To further understand whether these trends would have persisted if not for the reform, it is useful to understand the characteristics of women behind the rise in female LFP. For example, if the rise in female LFP was entirely driven by women in low-income households, many of these women may have dropped out of the labor force once they turned 60 and became eligible for the Age Pension. To examine this, I use three earlier surveys administered by the Australian Bureau of Statistics: the 1982, 1986 and 1990 Income Distribution Surveys. These surveys are appropriate for this analysis because they (i) provide a nationally representative snapshot of LFP by age and gender at several points in time prior to the reform and (ii) have information on household income. Below, I also use these surveys to check whether the different trends of men and women can be explained by the controls in Atalay and Barrett’s (2015) regressions and whether single men are likely to be an appropriate comparison group for single women.Footnote 22 As outlined by Siminski et al. (2003), these surveys differ from the surveys used by Atalay and Barrett (2015) in important ways. For example, age information in the pre-reform surveys is in five-year groups rather than single years, and the pre-reform surveys were conducted over a shorter period of time (less than three months) than the post-reform surveys, which were conducted over the course of a financial year. Nonetheless, I show in Appendix A.3 that the placebo regression estimates below are only marginally affected by these and other differences in the pre-reform surveys.

With these surveys, I exclude immigrants who have lived in Australia for less than ten years (to be consistent with Atalay and Barrett (2015)), and I restrict the sample to people aged 45–49 or 50–54.Footnote 23 The gray lines in Fig. 3a and b plot the LFP rates of men and women in each survey (with the X indicating each survey). As the graphs demonstrate, the LFP rates in these surveys are broadly consistent with the aggregate data at the time of each survey.Footnote 24

Figure 5 shows that the increase in female LFP rates has come from women in both high- and low-income households. Using the pre-reform surveys, I divide women in half based on the median equivalized household income among women of the same age in the relevant survey.Footnote 25 Figure 5 shows that although women in high-income households had much higher baseline rates of LFP, they exhibited a similar (and slightly larger) percentage-point increase in LFP rates between 1982 and 1990 at ages 45–49 and 50–54.Footnote 26 Assuming that most of these women would have remained in high-income households, they would have had limited access to government support at ages 60–64 even if the reform had not occurred. As such, it seems likely that women would have reduced the gender gap in LFP rates at ages 60–64 over Atalay and Barrett’s (2015) sample period even if the reform had not occurred.

Fig. 5

Pre-reform female labor force participation rates by household income. Notes: These figures plot the labor force participation (LFP) rates of women from the 1982, 1986 and 1990 Income Distribution Surveys. I divide women in half based on the median equivalized household income among women of the relevant age in the relevant survey. Between the vertical dashed lines, the cohorts in Atalay and Barrett’s (2015) sample were aged 45–49 in (a) and 50–54 in (b). These graphs show that women in low- and high-income households contributed to the higher LFP rates among later cohorts prior to the reform. See Section 3 for more details.

One key question remains: can the control variables in Atalay and Barrett’s (2015) regressions explain the different trends of men and women across the relevant cohorts? To examine this question, I use the pre-reform sample to estimate placebo regressions with Atalay and Barrett’s (2015) baseline regression approach. That is, I estimate Eq. (1) on the pre-reform sample to test whether the LFP of women in the after cohort was affected before the reform. To estimate these regressions, I have to define \(\Pr (Cohort A_{i}^{\ast}=1)\), the probability that individual i is in the after cohort. With age information in five-year groups and surveys conducted over a three-month period, each individual’s date of birth lies within a window of five years and three months. This relatively large window makes it less appealing to follow Atalay and Barrett’s (2015) assumption that each individual’s date of birth is uniformly distributed within the possible window. Hence, I construct \(\Pr (Cohort A_{i}^{\ast}=1)\) in a more refined way, using annual population counts by gender and single year of age from the Australian Bureau of Statistics to account for age-specific mortality and differences in the size of individual birth cohorts (see Appendix A.2 for details).Footnote 27 This approach is validated by the fact that these surveys are strongly representative of the Australian population in terms of age (see Appendix Fig. B5).Footnote 28
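The idea of weighting by population counts can be illustrated with a simplified sketch. It is not the procedure in Appendix A.2: it treats each survey as taking place at a single point in time (ignoring the three-month window), assumes births are uniform within each single year of age, and uses hypothetical counts. Dates are expressed as decimal years, so the cutoff of July 1, 1935 becomes 1935.5.

```python
def prob_after_cohort(counts_by_age: dict, survey_time: float, cutoff: float = 1935.5) -> float:
    """Population-weighted probability of being born on or after the cutoff.

    counts_by_age maps each single year of age in the respondent's five-year
    age group (for their gender) to a population count.
    """
    total = sum(counts_by_age.values())
    weighted = 0.0
    for age, count in counts_by_age.items():
        # People aged `age` at survey_time were born in
        # (survey_time - age - 1, survey_time - age].
        latest_birth = survey_time - age
        # Share of that one-year birth interval falling after the cutoff,
        # ASSUMING births are uniform within the interval.
        share_after = min(1.0, max(0.0, latest_birth - cutoff))
        weighted += count * share_after
    return weighted / total


# HYPOTHETICAL counts for the 50-54 age group in a survey dated 1986.0.
equal_counts = {age: 1.0 for age in range(50, 55)}
uneven_counts = {50: 2.0, 51: 1.0, 52: 1.0, 53: 1.0, 54: 1.0}

p_uniform = prob_after_cohort(equal_counts, survey_time=1986.0)
p_weighted = prob_after_cohort(uneven_counts, survey_time=1986.0)
```

With equal counts the function reduces to the uniform-window assumption, while unequal counts shift the probability toward the larger single-year age groups.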

For these regressions, positive and statistically significant estimates of δ, the placebo treatment effect, would indicate that Atalay and Barrett’s (2015) estimates are likely to be biased upwards by unobservable gender-specific differences in LFP across the relevant cohorts. Table 2 presents the results. In Panel A, I present the results for the full sample of men and women. Columns (1) and (2) show positive, large and statistically significant estimates of δ at ages 45–49 and 50–54 of 24.6 and 15.7 percentage points respectively (both p < 0.01).Footnote 29 In column (3), I remove anyone with a positive probability of being misclassified into the before and after cohorts. While this restriction precludes estimation on the age 45–49 sample (as there is no one who is definitely in the before cohort), the estimated placebo effect at ages 50–54 remains similarly large and statistically significant (16.1 percentage points, p < 0.01). In Panel B, I present the results when the sample is restricted to men and women with household income above the median (for their age and gender in the relevant year). These results are useful because high-income women would have had limited access to government payments at age 60 even if the reform had not occurred (and thus the pre-reform trends for this group may be especially likely to reflect counterfactual LFP trends at ages 60–64). The results are similar, with large and highly statistically significant placebo estimates ranging from 12.7 to 24.2 percentage points. Overall, this analysis suggests that Atalay and Barrett’s (2015) estimates are likely to be biased upwards.

Table 2 Placebo regression estimates on the pre-reform sample.

Using the pre-reform sample, I also examine whether single men are likely to be an appropriate control group for single women. Appendix Fig. B6 replicates Fig. 3 for this sample. I also present LFP rates from the Labor Force Survey for unmarried men and women, since the Labor Force Survey does not report LFP rates by single status. While ‘unmarried’ is an imperfect proxy for single status, this data provides a continuous time series over a longer period (and allows the cohorts in Atalay and Barrett’s (2015) sample to be matched exactly). Evidently, the relevant cohorts of single men and women had very different LFP trends before the reform. Single women’s LFP rates increased by approximately 15 percentage points at ages 45–49 and 50–54 between the 1982 and 1990 surveys, while there was little change in the LFP rates of single men at ages 45–49 and a decrease of 10 percentage points at ages 50–54. There are also strong gender-specific trends for the unmarried sample over the relevant years. Moreover, the gender-specific trends cannot be explained by the control variables in Atalay and Barrett’s (2015) regressions. For the pre-reform sample of single men and women, the placebo estimates are positive, large and mostly statistically significant (ranging from 16.2 to 24.6 percentage points, see Appendix Table B4). Thus, Atalay and Barrett’s (2015) estimates on single women are also likely to be biased upwards.

3.1 Discussion

It is worth making several points here. First, while the placebo estimates in Table 2 are larger than Atalay and Barrett’s (2015) estimates of women’s actual labor supply response at ages 60–64, women had much higher LFP rates at ages 45–49 and 50–54. Thus, in percentage terms, the placebo estimates imply a similar increase in female LFP to Atalay and Barrett’s (2015) estimates (37.0–55.6% compared to 35.3%).

Second, the probabilistic assignment of the after cohort means that the estimates in column 1 for the age 45–49 sample are particularly sensitive to differences in LFP rates between the 1982 survey (cohorts 1932–37) and subsequent surveys. In the 1982 survey, there is a 48.4% chance that individuals are in the after cohort, and this uncertainty magnifies the placebo estimate. A conservative way to address this is to assume that everyone in the 1982 survey is in the before cohort (i.e., to set the probability that these individuals are in the after cohort to 0). Appendix Table B5 shows that this adjustment reduces the placebo estimate by 44% to 13.7 percentage points. However, as the standard error falls by the same percentage, the t-statistic is unchanged (6.3). Note that this approach may, if anything, be conservative because the aggregate data in Fig. 3a shows that female LFP rates are slightly higher for the 1932–37 cohorts than the 1930–35 cohorts, while the reverse is true for male LFP rates; this is evident from comparing the LFP rates in 1982 (cohorts 1932–37) and 1980 (cohorts 1930–35). Thus, if it were possible to exactly match the before cohort, it is likely that the placebo estimate would be similar to 13.7 percentage points.
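The arithmetic behind the unchanged t-statistic can be sketched as follows, under the illustrative assumption that the adjustment rescales the point estimate and its standard error by a common factor \(k\). Here \(\hat{\delta}\) denotes the column 1 placebo estimate and \(se\) its standard error; both symbols are introduced only for this illustration.

$$t^{\prime}=\frac{k\,\hat{\delta}}{k\,se}=\frac{\hat{\delta}}{se}=t,\qquad k\approx 1-0.44=0.56,\qquad 0.56\times 24.6\approx 13.8$$

The last product is close to the reported 13.7 percentage points; the small gap presumably reflects rounding of the 44% figure.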

Third, there is little change in the placebo estimates with the inclusion of controls. As shown in Appendix Table B6, the placebo estimates fall by less than 5% with controls. Hence, the control variables in Atalay and Barrett’s (2015) regressions explain only a small fraction of the different trends of men and women prior to the reform.Footnote 30

Fourth, as Australia experienced recessions in 1982 and 1990, a possible concern is that the placebo estimates may be affected by macroeconomic shocks. However, the aggregate time series data in Fig. 3a and b shows no evidence of any meaningful break in the LFP trends of the relevant cohorts in any of the three survey years. Moreover, Figs. 3 and 4 show that there is nothing specific about the 1982 to 1990 period. The aggregate data in Fig. 3a shows different trends in the LFP rates of men and women in the relevant cohorts both throughout and beyond these years, and Fig. 4b shows similar gender-specific trends for the relevant cohorts ten years earlier, during a different time period.

Fifth, as the pre-reform surveys are less frequent and contain age information in five-year groups rather than single years, it is not possible to exactly match the cohorts in Atalay and Barrett’s (2015) sample. In the pre-reform sample, the age 45–49 sample consists of individuals born from 1932 to 1945, and the age 50–54 sample consists of individuals born from 1927 to 1940. Atalay and Barrett’s (2015) sample consists of individuals born from 1929 to 1950. However, an exact match is possible in the aggregate data; between the vertical dashed lines in Fig. 3a and b, where the cohorts match those in Atalay and Barrett’s (2015) sample, men and women have opposing trends. As such, we would expect to find similar results if it were possible to exactly match the cohorts in Atalay and Barrett’s (2015) sample in the survey data.

Sixth, given the slightly different cohorts in Atalay and Barrett’s (2015) sample, controls may have a greater effect on their estimates. However, controls have a similarly minor impact on Atalay and Barrett’s (2015) estimates, reducing their baseline estimate by just 10% (see Appendix Table B8).

Finally, in Appendix A.3, I verify the robustness of the placebo estimates further. Specifically, I show that the estimates are only marginally affected by other limitations of the pre-reform surveys, which are discussed in detail by Siminski et al. (2003) and include (i) changes in the measurement of labor force status across surveys, (ii) seasonality in LFP rates, and (iii) the lack of individual-level age controls.

4 Re-estimating the impact on female participation

In this section, I re-estimate the reform’s impact on female LFP using a different approach. I follow the approach of several recent studies examining phased increases in pension-eligibility ages (Cribb et al., 2016; Oguzoglu et al., 2020; Staubli & Zweimüller, 2013). Specifically, I compare the LFP of women who are just below the Age Pension Age (APA) to those who are just above, with fixed effects for women’s age and cohort. The benefit of this approach is that it directly controls for differences in female LFP across cohorts, which, as shown above, cannot be explained by controls.

For women in the replication sample, I estimate regressions based on the following specification:

$$LFP_{i}=\beta x_{i}+\delta\,\mathbf{1}(age_{i}<APA_{i})+FE\_age\_yrs_{i}+FE\_APA\_Cohort_{i}+u_{i} \tag{2}$$

where \(FE\_age\_yrs_{i}\) is a set of fixed effects for woman i’s age in years; \(FE\_APA\_Cohort_{i}\) is a set of fixed effects for woman i’s APA cohort; \(\mathbf{1}(age_{i}<APA_{i})\) is an indicator variable for woman i being below her APA; and \(x_{i}\) includes the same variables as in Eq. (1) (except age). As age is only available in years, there is potential misclassification for some women of (i) the cohort fixed effects, \(FE\_APA\_Cohort_{i}\), and (ii) the key variable, \(\mathbf{1}(age_{i}<APA_{i})\). I address this issue by replacing these variables with (i) the probabilities that woman i is in each APA cohort, \(\Pr (APA\_Cohort_{i})\), and (ii) the probability that woman i is below her (expected) APA, \(\Pr (age_{i}<{\mathbb{E}}[APA_{i}])\).Footnote 31 In these regressions, δ estimates the effect on female LFP of women remaining below the APA because of the reform if the following assumption holds: different cohorts would have had identical age-related trends in LFP if not for the reform.
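To illustrate why the cohort fixed effects in Eq. (2) matter, the following sketch estimates δ on simulated data with and without them. Everything here is hypothetical: the data-generating process, the parameter values, the one-year-per-cohort APA schedule and the function names are assumptions made for illustration. It also assumes that age and cohort are observed exactly, so the regressors are dummies rather than the probabilities described above, and it omits the controls in \(x_{i}\).

```python
import numpy as np


def simulate(n=400_000, delta=0.04, seed=0):
    """Simulate women aged 60-64 in five cohorts with rising pension ages."""
    rng = np.random.default_rng(seed)
    age = rng.integers(60, 65, size=n)       # ages 60..64
    cohort = rng.integers(0, 5, size=n)      # cohort index 0..4
    apa = 60 + cohort                        # hypothetical APA schedule
    below = (age < apa).astype(float)        # 1(age < APA)
    # Assumed data-generating process: LFP falls with age, is higher for
    # later cohorts regardless of the reform, and rises by delta below the APA.
    p = 0.50 - 0.05 * (age - 60) + 0.04 * cohort + delta * below
    lfp = (rng.random(n) < p).astype(float)
    return lfp, age, cohort, below


def dummies(values):
    """One indicator column per distinct value."""
    levels = np.unique(values)
    return (values[:, None] == levels[None, :]).astype(float)


def estimate_delta(lfp, age, cohort, below, cohort_fe=True):
    """OLS coefficient on 1(age < APA) in a linear probability model."""
    cols = [below[:, None], dummies(age)]    # the age dummies span the intercept
    if cohort_fe:
        cols.append(dummies(cohort)[:, 1:])  # drop one cohort to avoid collinearity
    X = np.hstack(cols)
    coef, *_ = np.linalg.lstsq(X, lfp, rcond=None)
    return coef[0]


lfp, age, cohort, below = simulate()
delta_with_fe = estimate_delta(lfp, age, cohort, below, cohort_fe=True)
delta_without_fe = estimate_delta(lfp, age, cohort, below, cohort_fe=False)
print(f"with cohort FE:    {delta_with_fe:.3f}")
print(f"without cohort FE: {delta_without_fe:.3f}")
```

In this simulation the specification without cohort fixed effects attributes the higher LFP of later cohorts to being below the APA, so its estimate is well above the assumed effect of 0.04, while the specification with cohort fixed effects recovers it up to sampling error. With imprecise age information, the dummy columns would be replaced by the corresponding probabilities.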

Figure 6 shows graphical evidence in support of this assumption; it shows the age-related trends in LFP for different cohorts of women. While there are considerable differences in LFP rates across cohorts, the trends with age are similar at ages 55–59, when all cohorts are below the APA. Moreover, in Appendix A.4, I show that — after controlling for age, cohort and individual factors — there is no evidence of any trend in female LFP based on the number of years until the APA (and no significant change at the APA).

Fig. 6

Female labor force participation rates by age and cohort. Notes: This figure shows female labor force participation rates by age and cohort. I assign women to their most likely cohort. The sample comes from the eleven Income and Housing Costs Surveys used by Atalay and Barrett (2015) from 1994 to 2010.

Table 3 presents estimates of the treatment effect from Eq. (2). Column 1 shows a similar estimate to Atalay and Barrett’s (2015) preferred estimate when cohort controls are excluded (13.4 vs. 12.0 percentage points). However, when cohort controls are included, column 2 shows that the estimate falls to 4.1 percentage points and becomes statistically indistinguishable from zero at the 10% level (p = 0.109).

Table 3 Re-estimating the impact on female labor force participation.

Note that the cohort controls may partially or even fully absorb any wealth effects of the reform, the main mechanism offered by Atalay and Barrett (2015) to explain their estimates.Footnote 32 Specifically, the cohort controls will absorb any wealth effects on female LFP at ages where women’s pension eligibility is not affected, which, as discussed above, are likely to be small based on studies of similar reforms (Cribb et al., 2016; Geyer & Welteke, 2019; Staubli & Zweimüller, 2013).Footnote 33 However, the treatment effect may capture wealth effects at ages where women’s pension eligibility is affected. For example, the estimates are likely to capture the response of a woman who decides to retire at age 63 rather than 60 because of the wealth effects of her APA increasing from 60 to 63. Given that changes in pension wealth may be particularly salient at ages where women’s pension eligibility is affected, the estimates may capture a sizable proportion of any wealth effects from the reform.

I also show that the estimates remain similar if I use time fixed effects rather than cohort controls in Eq. (2). Specifically, I replace the cohort controls with fixed effects for the quarter-year in which the interview took place. This specification reduces measurement error (since women’s exact cohort is unknown) and potentially allows δ to better capture differences in LFP rates across cohorts stemming from wealth effects. Cribb et al. (2016) use a similar approach to examine the importance of wealth effects in their context. Appendix Table B10 presents the estimates. The estimate is similar to, and slightly smaller than, the baseline estimate (3.9 percentage points, p = 0.075). The estimates are also similar with survey-wave or year fixed effects (not shown).

I also assess the robustness of the results to other concerns. First, I examine whether the lack of exact information on age causes attenuation in the effects. In this context, there are two sources of measurement error: (i) in the assignment of women’s APA cohort and (ii) in the assignment of whether women are below the APA. In Appendix Table B10, I examine the sensitivity of the estimates to each source of measurement error in columns 3–5. In column 3, I restrict the sample to women who can be classified as either above or below the APA with certainty, which reduces the sample size by 13%. The treatment effect remains similar and falls slightly compared to the baseline estimates (from 4.1 to 3.8 percentage points). In column 4, I restrict the sample to women who can be assigned to their APA cohort with certainty, which reduces the sample size much more, by 55%. The estimated treatment effect remains at 4.1 percentage points but becomes less precise. In column 5, I combine both restrictions, which reduces the sample size slightly more (56%). In this case the treatment effect falls to 3.1 percentage points. Thus, the estimates are relatively stable and there is little evidence that measurement error leads to attenuated estimates.

Second, since Atalay and Barrett (2015) interpret their LFP estimates as a decline in the probability of retirement, it is worth discussing under what circumstances the estimated effects on LFP are likely to reflect a change in retirement decisions. Based on the analysis in Section 3, one concern with using “not in the labor force” as a proxy for retirement is that women in earlier cohorts were more likely to have never worked (or dropped out of the labor force well before the APA), as documented in Figs. 3 and 4. This is problematic for Atalay and Barrett’s (2015) identification strategy and suggests that their estimates are more likely to reflect pre-existing differences in LFP rates across cohorts rather than changes in retirement decisions near the APA. In contrast, Eq. (2) relies on within-cohort changes in LFP around the APA. This variation is more likely to result from decisions to retire, given the low re-entry into the labor market at these ages.Footnote 34

Third, a further valid concern is reform endogeneity. There are two potential sources of endogeneity. First, it is clear that the reform was implemented in such a way that there was a strong positive correlation between women’s pre-reform rates of LFP and the size of the increase in APA. Unlike Atalay and Barrett’s (2015) estimates, which rely on cross-cohort comparisons, my preferred estimates account for this source of endogeneity by relying on within-cohort changes in female LFP around the APA. A second possible concern is that the reform may have been implemented in response to changes in social norms about the appropriate age to retire. This would imply that later cohorts would have retired later even if the reform had not occurred, biasing both estimates upwards. However, the trends in LFP rates prior to age 60 in Fig. 6 show little evidence that later cohorts systematically favored later retirement (conditional on their LFP rates at age 55). It also seems unlikely that my estimates are strongly biased upwards, given that (i) the 90% confidence interval for my estimate contains zero and (ii) economic theory suggests we should expect at least some increase in female LFP rates.

4.1 Discussion

The modest effects on female LFP are consistent with other evidence. First, the raw data shows little evidence of a drop in female LFP at the APA. Figure 7 plots the average LFP rate of women with respect to the number of years until their pension age.Footnote 35 While female LFP decreases with age, there is little evidence of a stronger drop at the APA. Moreover, Figure 4c of Morris (2019) shows a near-identical pattern in female LFP based on longitudinal survey data from 2001 to 2015 that contains precise information on birthdates and survey dates.Footnote 36 Second, with this data, Morris (2019) finds an increase in female LFP of similar magnitude (3.1 percentage points, p < 0.1) to the preferred estimate in Table 3 (4.1 percentage points). Although the two estimates correspond to the average impact of delayed pension eligibility on slightly different birth cohorts at slightly different ages, both estimates suggest that the reform had modest effects. Third, these impacts are consistent with similarly modest changes in male LFP at the APA. Using the same data as Morris (2019), Appendix Fig. B7 shows that the decrease in male LFP from ages 64 to 65 is only 5.5 percentage points larger than the decreases from 62 to 63 and 63 to 64.Footnote 37 This decline is smaller than the decline observed at the ERA in most countries.

Fig. 7

Female labor force participation around the pension age. Notes: This figure shows the average labor force participation of women by the number of years to their pension age. I assign women’s number of years to pension age based on their most likely group, assuming birthdates are uniformly distributed within the possible 15-month windows. The sample comes from the eleven Income and Housing Costs Surveys used by Atalay and Barrett (2015) from 1994 to 2010.

Overall, the estimates suggest that the Australian reform had a smaller impact on female LFP than similar reforms in other countries. For example, studies of similar reforms in the U.K. (Cribb et al., 2016), Austria (Staubli & Zweimüller, 2013) and Germany (Geyer et al., 2020; Geyer & Welteke, 2021) estimate increases in female employment of 6.4 to 14.4 percentage points. The smaller impacts found here may partly result from the fact that around one third of women do not satisfy the means test — Atalay and Barrett (2015) estimate this proportion to be 37.5%. If we assume that all of the increase in LFP occurs among women who satisfy the means test, the estimated effect on LFP increases from 4.1 to 6.6 percentage points, similar to Cribb et al.’s (2016) estimate of 6.4 percentage points for the impact of raising women’s State Pension Age from 60 to 62. However, it is worth remembering that my preferred estimate is statistically indistinguishable from zero at the 10% level.
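As a rough check of this rescaling, assume that the 4.1 percentage point estimate is an average over all women and that women who do not satisfy the means test (a share of 0.375) do not respond at all. The symbol \(\delta_{\text{means-tested}}\) is introduced only for this illustration.

$$\delta_{\text{means-tested}}=\frac{\delta}{1-0.375}=\frac{4.1}{0.625}\approx 6.6\ \text{percentage points}$$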

It is somewhat surprising that the Australian reform did not have a larger impact given that the means test discourages work beyond the APA. This stands in contrast to the reform examined by Cribb et al. (2016), in which there is no income test for the pension and thus a much weaker incentive to retire at the State Pension Age. While the absence of an accrual effect is one explanation for the modest effects, it may not be the most important one. Relative to other reforms, the Australian reform targeted a group of women with much weaker attachment to the labor market and a very high reliance on other government transfers. As shown by several studies (Atalay & Barrett, 2015; Morris, 2019; Oguzoglu et al., 2020), many women extended their receipt of other means-tested government transfers in response to the reform. Recent estimates by Oguzoglu et al. (2020) based on administrative data show how large these responses were: they indicate that 76% of women who were delayed from receiving the Age Pension ended up receiving another government payment instead. These spillovers are much larger than have been found in other countries and help explain the relatively modest labor supply responses.

5 Conclusion

In this paper, I replicate Atalay and Barrett’s (2015) labor supply estimates and extend their analysis. Atalay and Barrett (2015) estimate the effects of an Australian reform in 1994 that gradually raised women’s pension-eligibility age from 60 to 65. Using a differences-in-differences design in which male cohorts form the comparison group, Atalay and Barrett (2015) estimate that each one-year increase in women’s eligibility age increased female labor force participation by 12 percentage points. While I am able to successfully replicate this estimate, I show that the underlying parallel-trends assumption did not hold before the reform for the relevant cohorts. Specifically, there is a strong upward trend in female participation rates and a weak downward trend in male participation rates. After accounting for these gender-specific trends, I find that the estimated increase in female labor force participation falls by two-thirds to 4.1 percentage points and becomes statistically indistinguishable from zero at the 10% level. This suggests that the Australian reform had a smaller impact on female labor supply than estimates of similar reforms in other countries.

More generally, the results have implications for other studies examining the impact of retirement reforms on female labor supply or other (potentially) interdependent outcomes such as welfare receipt and health. This paper demonstrates the importance of cross-cohort trends in female labor supply in the Australian context, and given the widespread increase in female labor supply over recent decades, similar trends may also exist in other countries. Hence, studies evaluating the impact of retirement reforms on female outcomes should examine trends across cohorts before these reforms and control for them if necessary.