Skipping the doctor: evidence from a case with extended self-certification of paid sick leave

This paper examines the impact of a policy reform in a municipality in Norway that extended to workers the right to self-certify sickness absence from work. After the reform, workers were no longer obliged to obtain a certificate from a physician to receive sickness benefits. They could call in sick directly to their line leader and had to engage in a counselling program organized by the employer. To estimate the effect of this reform, we contrast the change in sickness absence among employees who were granted the extended right to self-certify absence with absence among employees who had to obtain a physician’s certificate to be entitled to sickness benefits. We use both a standard difference-in-differences method and the synthetic control method to estimate the effect of the reform. We can rule out large positive effects on absence after the reform, with strong evidence that the policy change actually resulted in a reduction in absence for female workers.


Introduction
Paid sick leave is an important form of insurance, allowing workers to smooth consumption over transitory negative health shocks. However, sickness benefit programs can, like any other insurance, be misused; employees who are fit to attend work may call in sick or may request benefits for longer periods than their health status calls for (Henrekson and Persson 2004; Hesselius et al. 2013; Dionne and St-Michel 1991). To reduce moral hazard, both welfare states and private insurers may require a medical certificate from a physician to verify sickness (OECD 2010). Physicians are used as gatekeepers to prevent illegitimate claims of paid sick leave. This is a costly practice, and it is not clear how well it works.
This paper looks at the effect of a reform that removed the physician as a sickness certifier. In 2008, one municipality in Norway allowed all workers employed in the municipal sector to self-declare health-related absence for a whole year, which is the maximum entitlement period for temporary sickness benefits in Norway. The municipal workers were free to self-report sickness absence, but had to report regularly to their line leader about their health and work capacity. In other municipalities, and in the reform municipality prior to the change, workers could self-declare periods of absence shorter than 9 days; for longer periods, they needed a medical certificate to obtain benefits. Normally, physicians also play a role in dialogue meetings (counselling) between the employer and the worker on sick leave. The reform naturally removed the physician from this role as well. Instead, the employer took a more direct and active role in the counselling of sick-listed workers.
If workers' demand for sickness absence were unaffected by the reform, removing the requirement of a medical certificate would increase absenteeism. By how much depends on how strict physicians are as gatekeepers. However, the reform may reduce workers' demand for absence. Workers' demand for sickness absence depends on their health, on the replacement rate of the benefits and on the utility difference between staying home and attending work. Allowing workers to self-certify absence will probably not directly affect the health of the workers, nor will it change the replacement rate. The reform can, however, influence the nonpecuniary costs of calling in sick.
The reform that we study here was announced (by the municipality) as the "Trust Project." Recent research in behavioral economics has shown that many individuals want to return (reciprocate) the treatment they receive from others. For example, employers who show distrust towards their employees by imposing excessive control mechanisms may induce misbehavior (Ellingsen and Johannesson 2008; Falk and Kosfeld 2006). According to this logic, allowing self-certification of absence can foster greater loyalty and motivation among employees, which in turn may lower the demand for paid sick leave. In addition, frequent meetings with the employer (the line leader) while sick, without the physician as a counselor, may also reduce the intrinsic utility (increase the hassle) of asking for sick leave.
Our data contain registered sickness absence for all workers in Norway from 2001 to 2014. To assess the impact of the reform on sickness absence, we compare absence among municipal workers in the reform municipality (Mandal) before and after the reform with the change in absence among municipal employees working in all other Norwegian municipalities. That is, we apply difference-in-differences (DD) logic, with Mandal as the treated unit and the other municipalities as controls, to assess the effect of the reform.
We also consider the same model for employees who work in the private sector or for the central government as a placebo exercise. This helps us rule out the possibility that other contemporaneous shocks affected Mandal. In addition, we use the synthetic control method to construct a control that better resembles the treated unit (Abadie and Gardeazabal 2003; Abadie et al. 2010).
We can rule out large positive effects on absence in Mandal in the post-reform period, with strong evidence that the policy change actually resulted in a reduction in absence for female workers. One of the stylized facts regarding sickness absence is that in almost every country, women tend to have higher absence levels than men (Mastekaasa and Melsom 2014). It is interesting that extending self-certification of sick leave reduces the gender gap in absence. Since the female share of the labor force is around 80% in Mandal's municipal sector (which is in line with the rest of the municipal sector), the aggregate effect of the reform is far larger than it would have been had the same change in behavior occurred among male workers. We also find that the number of spells decreased, while the average length of the remaining spells increased. This suggests that the reform had a larger impact on relatively short absence spells. Our main finding stands in stark contrast to the result from a Swedish experiment in which a random sample of workers was granted 1 week of extra self-certification of sickness absence (Hesselius et al. 2009). We discuss potential reasons for these opposing results at the end of the paper.
Although the main ingredient of the Mandal reform is that municipal workers are granted the right to self-certify absence, it also contains other elements, such as stronger employer involvement in sickness absence counselling. We do not have data to disentangle the importance of the different elements of the reform, so our results should be seen as the aggregate effects of all these elements.
The main contribution of this paper is to use quasi-experimental data to estimate the effect of a policy that, at its core, removes a costly procedure for constraining moral hazard in paid sick leave. Back-of-the-envelope calculations suggest that Norwegian primary care physicians spend as much as 10-15% of their working time on sickness absence certification. 1 Our finding that sickness certification can be taken out of the hands of physicians without a subsequent rise in sickness absence should be of considerable policy relevance.
The paper unfolds as follows. Section 2 provides a brief introduction to the sickness insurance system in Norway and a description of the reform. Section 3 presents a conceptual framework for analyzing the relation between medical health certificates, gatekeeping and sickness absence, and discusses how our paper relates to the relevant literature. Section 4 describes our data, while Section 5 presents our empirical setup. The results are reported in Section 6, and Section 7 offers some suggestions about the underlying channels. Section 8 discusses our findings and concludes the paper.

Sickness benefits and sickness absence in Norway
Sickness insurance is mandatory in Norway and covers all workers employed for more than 4 weeks. The wage compensation ratio is 100% from day one, for a maximum period of 1 year. 2 The employer pays sickness benefits for the first 16 days; thereafter, the benefits are financed by the National Insurance Administration (NAV) for a maximum of 50 weeks. Municipal workers do not need a medical certificate for sickness spells lasting less than 9 days. Spells of 9 days or longer require a medical certificate, usually from a general practitioner, and spells lasting more than 8 weeks require an expanded certificate.
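The financing and certification rules in this paragraph can be condensed into a small function. The sketch below is illustrative only: the helper `benefit_rules` and its field names are hypothetical, a spell is counted in days from its first day, "more than 8 weeks" is taken to mean more than 56 days, and the NAV period is capped at 50 weeks after the 16 employer-paid days. The `mandal_reform` flag encodes the Mandal exception described in the next section, under which a physician's certificate is optional for the whole benefit period.

```python
from dataclasses import dataclass

EMPLOYER_DAYS = 16          # the employer pays the first 16 days
NAV_MAX_DAYS = 50 * 7       # NAV then pays for at most 50 weeks
SELF_CERT_LIMIT = 9         # spells shorter than 9 days need no certificate
EXPANDED_CERT_DAYS = 8 * 7  # assumption: "more than 8 weeks" = more than 56 days


@dataclass
class SpellRules:
    employer_days: int
    nav_days: int
    uncompensated_days: int  # days beyond the maximum benefit period
    certificate: str         # "none", "standard", "expanded" or "optional"


def benefit_rules(spell_days: int, mandal_reform: bool = False) -> SpellRules:
    """Hypothetical helper: split one spell between payers and state the certificate rule."""
    if spell_days < 0:
        raise ValueError("spell_days must be non-negative")
    employer = min(spell_days, EMPLOYER_DAYS)
    nav = min(max(spell_days - EMPLOYER_DAYS, 0), NAV_MAX_DAYS)
    uncompensated = spell_days - employer - nav
    if spell_days < SELF_CERT_LIMIT:
        cert = "none"
    elif mandal_reform:
        cert = "optional"  # physician's certificate optional for the whole period
    elif spell_days > EXPANDED_CERT_DAYS:
        cert = "expanded"
    else:
        cert = "standard"
    return SpellRules(employer, nav, uncompensated, cert)


print(benefit_rules(20))
print(benefit_rules(20, mandal_reform=True).certificate)
```

For instance, a 20-day spell splits into 16 employer-paid days and 4 NAV-paid days and requires a standard certificate under the ordinary rules; with `mandal_reform=True`, the same spell needs no physician's certificate. Every spell with NAV-paid days is, by construction, a long-term spell in the sense used in this paper.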
The level of sickness absence is high in Norway: around 7% of contracted work hours are lost because of sickness absence (certified by a medical doctor). Around 80% of the total absence days come from periods remunerated by NAV (lasting more than 16 days), which we define as long-term absence in the present paper. The public expenditures associated with sickness absence are on the order of 2.5% of GDP. Individuals who obtain long-term absence certificates have a high risk of never returning to ordinary work and of being transferred to permanent benefits (Markussen et al. 2012).
Short-term absence in Norway is remarkably stable over time and across individual characteristics. Long-term absence, on the other hand, varies substantially over the business cycle (Askildsen et al. 2005) and across gender, age, education, and occupation (Mastekaasa 2015). The majority of sick leave certificates from doctors classify the health issues as diffuse and subjective health problems; mental disorders and musculoskeletal symptoms accounted for about 60% of the cases in 2012. As the term "diffuse diagnoses" suggests, these conditions cannot be objectively verified by the physician, and it is difficult to prescribe evidence-based treatment for them. Diffuse diagnoses dominate in the long-term spells. Diagnoses that are easily verifiable, e.g., cancer and cardiovascular diseases, play only a limited role; cardiovascular diseases, for example, account for only 5% of the absence days. Short-term absence also includes diagnoses that are difficult to verify (chronic pain, etc.), but for shorter spells, uncomplicated and observable diagnoses (respiratory infections, etc.) make up a larger share than for the long-term spells.
The difference between short- and long-term absence suggests that it is long-term sickness absence that is most influenced by the sick-listed worker's own judgment, in particular when it comes to the length of spells (Mastekaasa 2015). Hence, moral hazard is likely to be most relevant for long-term absence.
A noticeable pattern in sickness absence in most developed countries is that women have considerably and persistently higher absence rates than men (Avdic and Johansson 2017). In Norway, the average number of sick days per year is now 60-70% higher for female than for male workers. Pregnancy and other biological differences explain only some of the gap, and they do not explain the increasing gender difference in absence over the last 50 years. Before the 1980s, the rate of sickness absence was more or less equal for men and women. To explain the increasing gender gap, most research therefore looks to women's entry into the labor market and the corresponding shift from single- to dual-earner families. Empirical evidence from Norway is rather inconclusive, however. Based on EU Labor Force Surveys from 1983 to 2011,  finds no support for the increasing representation of mothers of small children in the workforce as an explanation. Based on administrative register data, the "double burden" hypothesis (that women have the main responsibility for household production also when they work in the labor market) is tested but rejected in Cools et al. (2017). 3 Furthermore,  finds no support for occupational segregation (women working in high-absence occupations) as an explanation of the increasing difference, while he finds some support for a changing composition of the female labor force (more women with health problems or with lower job motivation).
Finally, the observed gap in sickness absence may be explained by gender differences in health-related behavior, preferences and norms. Women may be more concerned about health and/or less devoted to their job and career, and therefore have a lower threshold for reporting sick. Alternatively, they may be more susceptible to influence from the local absence culture. Even with the rich Norwegian register data, hypotheses like these are notoriously hard to test. An attempt is found in Hauge et al. (2015), who combine survey and register data from the city of Oslo. They do find gender differences in relevant attitudes, norms and preferences, but not of a size that can explain the large differences in sickness absence.
Summing up, previous research has identified several factors that do not explain the persistent, and even increasing, gender gap in sickness absence, but knowledge about the actual causes is still lacking. Nevertheless, it is important to determine whether men and women respond differently to the Mandal reform. Approximately 80% of the employees in the municipal sector are women, and the gender gap in sickness absence is as large here as in other sectors. Hence, from the municipality's perspective as an employer, the success of the reform depends critically on how female workers respond to it.

The reform: extended self-certification of sickness absence in Mandal
In 2014, there were 428 municipalities in Norway. They are all responsible for producing the same services: compulsory education (up to the 10th grade), outpatient health services, senior citizen services, and maintenance of the road infrastructure within the municipality. In 2012, 23% of the total workforce was employed by municipalities. The vast majority (about 75%) of the municipal workers are women. Although they all serve the same functions, municipalities vary widely in size. The smallest has fewer than 300 inhabitants and the largest more than 600,000. In 2012, Mandal, the reform municipality, had 15,000 inhabitants and 1200 employees (around 900 of them in full-time positions), which is slightly above the average municipality size in Norway.
Historically, the level of sickness absence for municipal employees in Mandal has been around the average for the sector in Norway. During the last decade, several municipalities, and firms more generally, have experimented with various local reforms to reduce sickness absence. This is also the case for the municipality of Mandal: at the end of 2003, it launched the so-called "presence project" to reduce sickness absence among municipal workers. From this project grew an initiative directed to the Ministry of Labour, requesting permission to "bypass" the physician as a sickness absence certifier. The suggestion was to allow municipal employees to self-certify their sickness absence for the entire benefit period (1 year).
Regarding the reform we consider here, the municipal administration predicted that employees would respond positively to extended trust and counselling in relation to sickness absence. The administrative leadership in Mandal worked out a detailed plan for how lower-level leaders should follow up workers who self-certified sickness absence. The idea was that stronger involvement from the employees' line managers would substitute for the physician's involvement and advice. For shorter spells, leaders were instructed to call the absentees (after 3 days and after 8 days). For longer spells, leaders were instructed to initiate a number of meetings for individual counselling and follow-up plans, and also to contact the absentee regularly, for example by sending cards and flowers. A hierarchical system of email-based action reminders among the leaders ensured that the follow-up plan was implemented.
The application for a system with extended self-certification of sickness absence was approved by the Ministry in June 2007. The "Trust Project" (with a handshake as its official logo) was officially launched on July 1, 2007. With this, Mandal became the only municipality (and the only firm) in Norway with permission to operate a sickness insurance scheme that made the medical certificate from a physician optional for the full length of the sickness period. After some months of piloting, a web-based system for self-reported absence was in place in May 2008. At the end of 2008, almost 90% of all sickness absence was self-reported.

Demand for health-related work absence
Sickness absence insurance allows workers to be absent from work and receive benefits in periods when their health temporarily drops below the level required to perform their work tasks. Health is a multidimensional and complex entity, but for our purpose here it can be represented by a scalar $h$, with higher $h$ indicating better health. The implicit sickness absence insurance agreement is that if $h$ drops below $h_0$, the worker is unable to do his or her job and it is legitimate to call in sick.
Workers' demand for sickness absence depends on several factors: (i) the health condition of the worker, (ii) the replacement rate of the sickness benefit scheme, (iii) the costs associated with obtaining permission to be absent and receive benefits, and (iv) the non-pecuniary utility difference between being absent due to sickness at home and being at work.
In this context, moral hazard is the potential problem that workers who are fit to work ($h > h_0$) nevertheless demand sickness absence. The standard way to constrain this problem is a system in which workers must obtain a medical certificate from a physician in order to receive paid sick leave. The idea is that doctors will screen those who request sick leave and deny a medical certificate to those who are healthy enough to do their job. This is optimistic. It is often difficult to diagnose a patient, and the health status that separates legitimate from illegitimate sickness absence is open to interpretation. In addition, it is not obvious that doctors will act as gatekeepers to welfare benefits. Many physicians consider themselves their patients' advocates; requests for sick leave certificates might then be difficult to deny (Svärdsudd and Englund 2000; Carlsen et al. 2020; Markussen et al. 2013). Their own economic interests may also weaken the role of general practitioners (GPs) as gatekeepers, as they may lose patients if they decline requests for a sickness absence certificate.
We consider what might happen to overall sickness absence when a system with doctor certification is replaced by a system where workers can self-certify their absences. In this case, workers would instead have to enter into a counselling relationship with their line leader at the workplace.
If the reform leaves the demand for absence unchanged, there will be an increase in absenteeism after self-certification is introduced. By how much depends on the magnitude of the moral hazard problem, and by how lenient physicians are as gatekeepers. There are, however, good reasons to expect that the reform will change the demand for absence. Self-certification means that workers can skip the trip to the doctor and avoid the costs associated with obtaining a sick leave certificate. This change would increase demand for sick leave. Several other aspects of the self-certification reform, however, may reduce the demand for sick leave.
An employer-initiated reform that allows workers to self-declare health issues that reduce their work capacity signals both generosity and trust. Indeed, the reform in Mandal was branded as "The Trust Project." After being granted permission to self-declare sickness absence, workers may therefore feel more guilt if they call in sick, or they may derive a higher intrinsic utility from attending work. 4 Another reason why the reform may reduce demand for sickness absence is that the arrangement implies frequent and direct consultations with the employer. The physician no longer acts as mediator in the dialogue meetings where the absentee, the absence certifier and the employer discuss workplace adjustments that could make full or partial work resumption possible. With no certifier, there is only a direct dialogue between the absentee and the employer. Direct counselling and activation, without the physician acting as the patient's advocate, may also reduce the intrinsic utility of staying home on sick leave.
A third reason for lower absence is lost legitimacy. Workers with diffuse symptoms may themselves be uncertain whether they are unfit for work. Some of these workers will probably feel that a medical certificate relieves them of some of the guilt and remorse that come with calling in sick, given their health status. In a regime with self-certification of absence, the legitimacy conferred by a medical certificate disappears, and this may lower demand for absence.
Lower demand for absence will affect both the extensive and the intensive margin, that is, both the number of spells that are realized and the length of those spells. A negative shift in demand is likely to reduce the length of the sickness absence spells that are realized. Lower demand will also eliminate spells for which the net utility of demanding sick leave was just above zero. If these marginal spells are among the shorter spells, this effect on the extensive margin implies that we should observe longer average spells when demand for sickness absence declines.
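The composition argument can be illustrated with a toy calculation. The spell lengths below are invented for illustration and are not taken from the data; the sketch assumes that lower demand (i) removes the two shortest spells, which are the marginal ones on the extensive margin, and (ii) shortens every remaining spell by 2 days on the intensive margin.

```python
def summarize(spells):
    """Return (number of spells, total absence days, average spell length)."""
    return len(spells), sum(spells), sum(spells) / len(spells)


# Hypothetical long-term spells in days (all longer than 16 days).
baseline = [20, 22, 25, 30, 45, 60, 90, 120]

# Extensive margin only: the marginal (here: shortest) spells are no longer realized.
extensive = [d for d in baseline if d > 22]

# Intensive margin only: every realized spell is 2 days shorter.
intensive = [d - 2 for d in baseline]

# Both margins together.
both = [d - 2 for d in baseline if d > 22]

for label, spells in [("baseline", baseline), ("extensive", extensive),
                      ("intensive", intensive), ("both", both)]:
    n, total, avg = summarize(spells)
    print(f"{label:>9}: spells={n}, days={total}, mean length={avg:.1f}")
```

With both margins at work, total absence falls from 412 to 358 days and the number of spells from 8 to 6, while the average length of the remaining spells rises from 51.5 to about 59.7 days. Fewer spells, longer average spells and lower total absence can therefore all occur together.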
Theoretically it is, therefore, ambiguous how a reform granting workers the right to self-certify sickness absence will affect the absence level. For a given demand for sickness absence, skipping the physician as a sickness certifier will increase absence, but, as we have argued, the demand for sick leave may fall because of extended self-certification and employer involvement. It is also unclear what we should expect with respect to the length of the spells.

Data
Our unit of observation is yearly sickness absence at the municipality level, broken down by sector (municipal employees versus others, i.e., private sector and central government employees), by gender, and by four age intervals: 16-39, 40-49, 50-59, and 60-69. We have data from 2001 until 2014, implying 7 years of observations before (2001-2007) and 7 years after (2008-2014) the reform.
The sickness absence rate (in percent) is defined as the aggregated number of sick days (for the respective sector and gender) divided by the corresponding sum of contractual workdays. The latter is defined as the number of days a person has agreed to work for their employer in a given period, adjusted for fraction of employment, weekends and public holidays. 5 (This is how Statistics Norway commonly reports sickness absence.) Every sick leave is counted separately in the aggregate measures, regardless of whether several leaves come from the same individual.
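The ratio defined above can be written out for a toy workforce. This is a minimal sketch under stated assumptions: the `Worker` type and `absence_rate` function are hypothetical, the 230 working days per year are an illustrative figure assumed to already exclude weekends and public holidays, and lost workdays are assumed to be recorded in the same full-time-equivalent units as contracted workdays.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    # Share of a full-time position, e.g. 0.5 for a half-time job.
    employment_fraction: float
    # Workdays lost in each sick leave, one entry per leave, measured in the
    # same full-time-equivalent units as contracted workdays (assumption).
    lost_workdays: list = field(default_factory=list)


def absence_rate(workers, working_days_in_period):
    """Sickness absence rate in percent: aggregated sick days / contracted workdays.

    `working_days_in_period` is assumed to exclude weekends and public holidays;
    each worker's contracted workdays are scaled by the employment fraction.
    Every leave is counted, including repeated leaves by the same worker.
    """
    sick_days = sum(sum(w.lost_workdays) for w in workers)
    contracted = sum(w.employment_fraction * working_days_in_period for w in workers)
    return 100.0 * sick_days / contracted


staff = [
    Worker(1.0, [20, 30]),  # two separate leaves by the same person
    Worker(0.5, [10]),
    Worker(1.0),
]
print(round(absence_rate(staff, 230), 2))
```

Here 60 lost days are divided by 575 contracted workdays (230 + 115 + 230), giving an absence rate of about 10.43%. The part-time worker contributes only half a year of contracted days to the denominator.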
Our analysis is based on absence spells that are longer than 16 days. We use long spells primarily because of data reliability. The data are obtained from the National Insurance Administration (NAV). Employers (municipalities in our case) pay for absence spells of up to 16 days, and the government takes over the financial responsibility after 16 days. To be reimbursed for absence benefits beyond 16 days, employers must report all absence spells that last longer than this to NAV. Hence, it is in the employer's economic self-interest to report long-term sickness absence to NAV. For short spells, only absence certified by a physician is reported to NAV. Hence, if we were to use NAV data on short-term spells, absence in Mandal would drop mechanically after the reform, simply because only physician-certified absence is recorded in the NAV data.
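The mechanical drop described above can be reproduced with the two reporting rules alone. The sketch below is hypothetical throughout (function names and spell lengths are invented), and it assumes for simplicity that before the reform every spell of 9 days or more was physician-certified, while after the reform every spell in Mandal was self-certified.

```python
LONG_SPELL_DAYS = 16  # employer-paid period; longer spells are reimbursed by NAV


def recorded_in_nav(spell_days, physician_certified):
    """Whether a spell appears in the NAV data under the reporting rules described."""
    if spell_days > LONG_SPELL_DAYS:
        return True  # employers report every long spell in order to be reimbursed
    return physician_certified  # short spells appear only if physician-certified


def measured_days(spells, long_only):
    """Absence days visible in NAV data; spells are (days, physician_certified) pairs."""
    return sum(days for days, certified in spells
               if recorded_in_nav(days, certified)
               and (days > LONG_SPELL_DAYS or not long_only))


# Hypothetical spells with identical true lengths before and after the reform.
lengths = [3, 5, 12, 14, 20, 40]
before = [(d, d >= 9) for d in lengths]  # certificate required from 9 days
after = [(d, False) for d in lengths]    # all spells self-certified in Mandal

print(measured_days(before, long_only=False), measured_days(after, long_only=False))
print(measured_days(before, long_only=True), measured_days(after, long_only=True))
```

Counting all recorded spells, measured absence falls from 86 to 60 days even though true absence is unchanged; restricting to spells longer than 16 days gives 60 days in both regimes, so the long-spell measure is not affected by the change in certification.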
We do not consider our focus on long-term spells to be a serious limitation of our study. As explained above, and perhaps contrary to conventional wisdom, moral hazard problems appear to be especially relevant for long-term absence. In addition, since we are concerned with the number of working days lost because of illness, long-term spells dominate: among municipal employees, around 80% of the contracted workdays lost because of sickness absence come from periods that extend beyond 16 days. Furthermore, all municipal workers in Mandal, and in other municipalities in Norway, could already self-certify spells shorter than 9 days before 2007. Hence, to the extent that the Mandal reform affected short-term spells, the effect could only apply to spells lasting 9-16 days.
We present descriptive statistics from Mandal and other municipalities in Appendix Table 3.

Empirical setup
We rely on a difference-in-differences (DD) design to estimate the effects of the reform, where we contrast measures of absence between Mandal and other municipalities before and after the reform. More specifically, we construct a balanced municipality-year panel dataset with all municipalities in Norway, and estimate the following equation (footnote 6):

$$ y_{it} = \alpha_i + \beta_t + \delta \, DD_{it} + \varepsilon_{it} \qquad (1) $$

where $y_{it}$ is an outcome variable for municipality $i$ in year $t$, $\alpha_i$ are municipality fixed effects, $\beta_t$ are year fixed effects, and $\varepsilon_{it}$ is an error term. The variable $DD_{it}$ is an indicator equal to one for the treated municipality Mandal in the post-reform period, and zero otherwise, so that $\delta$ measures the effect of the reform. Note that the indicator variables for the post-reform period and for the treated municipality are absorbed by the fixed effects. In our main specifications, the outcome variable is the percentage of working days lost because of health-related absence for municipal workers. We also consider the effect of the reform on the length of absence spells and on the use of graded sickness absence (where the workers are partly at work).
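The two-way fixed effects regression in Eq. (1) can be sketched with plain least squares. The example below uses NumPy and simulated data (30 municipalities, 11 years standing in for 2004-2014, a true effect of -1 percentage point from the fifth year); the function names are hypothetical. It also computes the simple difference of pre/post means, which in a balanced panel with a single treated unit coincides with the regression coefficient.

```python
import numpy as np


def twfe_dd(y, treated, first_post):
    """OLS estimate of delta in y_it = alpha_i + beta_t + delta * DD_it + e_it.

    y: balanced panel of shape (n_units, n_years); `treated` is the row of the
    treated unit and `first_post` the column of the first post-reform year.
    """
    n, t = y.shape
    dd = np.zeros((n, t))
    dd[treated, first_post:] = 1.0
    units = np.kron(np.eye(n), np.ones((t, 1)))         # alpha_i dummies
    years = np.kron(np.ones((n, 1)), np.eye(t))[:, 1:]  # beta_t dummies, first year dropped
    X = np.column_stack([units, years, dd.ravel()])
    coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
    return coef[-1]


def dd_of_means(y, treated, first_post):
    """(treated post - treated pre) - (controls post - controls pre)."""
    controls = np.delete(y, treated, axis=0)
    change_treated = y[treated, first_post:].mean() - y[treated, :first_post].mean()
    change_controls = controls[:, first_post:].mean() - controls[:, :first_post].mean()
    return change_treated - change_controls


# Simulated panel: 30 municipalities, 11 years, treated unit in row 0,
# reform effect of -1 percentage point from column 4 (the fifth year) onwards.
rng = np.random.default_rng(0)
y = 7.0 + rng.normal(scale=0.5, size=(30, 11))
y[0, 4:] -= 1.0
print(twfe_dd(y, 0, 4), dd_of_means(y, 0, 4))
```

The two numbers agree up to floating-point error, which illustrates why the separate post-reform and Mandal indicators are absorbed by the fixed effects. The point estimate says nothing about inference, which needs separate treatment when there is only one treated unit.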
The identifying assumption in this DD framework is that the average absence among all municipal workers in Norway (except Mandal) and in the treated municipality (Mandal) would have followed parallel trends in the absence of the reform. As we discuss in detail in Section 6.1.1, this assumption seems plausible when we use 2004-2007 as the pre-treatment period. Therefore, we focus on estimation of Eq. (1) using data from 2004 to 2014.
As a robustness check, we also estimate the same model for employees who work in the private sector or for the central government. Since those workers were not directly affected by the reform, we should not expect to find significant effects for them if the identifying assumption is valid. Estimating the effects on those workers is therefore informative about potential contemporaneous shocks to sickness absence specific to Mandal that could invalidate our main results. We also consider an alternative DD specification in which we compare municipal and non-municipal workers within Mandal.
Finally, we also estimate the effect of the reform with a more tailored control group of municipalities, using the synthetic control method (SCM) developed by Abadie and Gardeazabal (2003) and Abadie et al. (2010). The essence of this method is to use the pre-reform period to construct a synthetic control unit ("Synthetic Mandal"), a convex combination of potential control municipalities that resembles the treated unit along the dimensions that are important predictors of sickness absence in the post-reform period. Intuitively, the idea is to construct a comparison unit that is affected by potential common shocks in the same way as the treated unit, thereby relaxing the parallel trends assumption (see Abadie et al. 2010).
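The weight-construction step of the synthetic control method can be sketched as a constrained least-squares problem. This is a simplified version under stated assumptions: Abadie et al. (2010) weight the predictors with a matrix chosen in a data-driven way, whereas the sketch below weights all pre-reform predictors equally and finds non-negative weights summing to one by projected gradient descent, using the standard Euclidean projection onto the probability simplex. Function names are hypothetical and the data are simulated.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def synthetic_weights(x_treated, X_controls, n_iter=20000):
    """Convex weights w minimizing ||x_treated - X_controls @ w||^2.

    x_treated: (K,) pre-reform predictors of the treated unit.
    X_controls: (K, J) the same predictors for J control units.
    Equal weighting of predictors is a simplifying assumption.
    """
    J = X_controls.shape[1]
    w = np.full(J, 1.0 / J)
    step = 1.0 / (2.0 * np.linalg.norm(X_controls, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 * X_controls.T @ (X_controls @ w - x_treated)
        w = project_to_simplex(w - step * grad)
    return w


# Simulated standardized pre-reform predictors (12 rows) for 5 control units;
# the treated unit is built as an exact convex combination of the controls.
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 5))
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
w_hat = synthetic_weights(X @ w_true, X)
print(np.round(w_hat, 3))
```

Once the weights are found, the estimated effect in each post-reform year is the treated unit's outcome minus the weighted average of the control units' outcomes in that year.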

Validation of the DD assumptions
We start by presenting graphical evidence on the validity of the parallel trends assumption. Figure 1 plots long-term sickness absence in Mandal and in the control municipalities, using yearly data. Since absence levels are in general higher for women than for men, and since around 80% of the workers in the municipal sector are women, we also provide a separate plot for female workers. Figure 1 depicts yearly averages, and the vertical line indicates the time of the reform.
Looking first at the pre-reform development in absence rates, there is a visible drop in absence between 2003 and 2004, both in Mandal and in the average of the other municipalities. This drop came in the wake of a major nationwide reform of the absence certification regulation, implemented in July 2004. The 2004 reform and its effects on absence are discussed in Markussen et al. (2012). The 2004 drop seems larger for Mandal than for the other municipalities, which suggests that the effects of that reform were potentially heterogeneous across municipalities, with stronger effects in Mandal. From 2004 to 2007, however, the difference between Mandal and the control municipalities remains stable, suggesting that the parallel trends assumption is reasonable if we do not include pre-2004 data. After 2007, the graphical evidence suggests that the gap between Mandal and the control municipalities widens just after the reform. This widening seems particularly large for female workers.
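The visual comparison described above amounts to tracking the yearly treated-minus-control gap and checking whether it is flat before the reform. The sketch below, with hypothetical function names and stylized numbers rather than the study's data, computes that gap and the slope of a linear trend fitted to it over 2004-2007. A slope near zero is consistent with parallel pre-trends, although this is only an informal check and not the formal conditions in Appendix B.

```python
import numpy as np


def yearly_gap(treated, controls):
    """Treated outcome minus the average control outcome, year by year."""
    return treated - controls.mean(axis=0)


def pre_trend_slope(gap, years, last_pre_year):
    """OLS slope of the gap on calendar year over the pre-reform years only."""
    pre = years <= last_pre_year
    slope, _intercept = np.polyfit(years[pre], gap[pre], 1)
    return slope


# Stylized series for 2004-2014: a common trend, a constant pre-reform gap of
# 0.5 points, and a 1-point drop for the treated unit from 2008 onwards.
years = np.arange(2004, 2015)
trend = 7.0 + 0.1 * (years - 2004)
controls = np.vstack([trend, trend + 0.2, trend - 0.2])
treated = trend + 0.5 - 1.0 * (years >= 2008)

gap = yearly_gap(treated, controls)
print(pre_trend_slope(gap, years, 2007))                      # close to 0: flat pre-trend
print(gap[years >= 2008].mean() - gap[years <= 2007].mean())  # close to -1
```

The change in the average gap from the pre- to the post-reform years is exactly the DD contrast, so a stable pre-reform gap and a post-reform shift are the two features to look for in the figure.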
To provide more evidence on the validity of the DD assumptions, Table 1 tabulates the before and after mean values for some key variables. Comparing levels before the reform (2004-2007) and after the reform (2008-2014), Table 1 shows that Mandal moves roughly in tandem with the average of all other municipalities on all variables, with one important exception: sickness absence. There is a large drop in absence in Mandal, especially for women, while there is no such drop in the average long-term absence for all other municipalities. We can also see that the length of the spells increases by 14% in Mandal, while there is a small drop in other municipalities. While the number of female employees and the number of female contracted workdays increased slightly more in Mandal than in other municipalities, these differential changes are not statistically significant. The differential change in unemployment is also not statistically significant. 7

Figure 1 Long-term sick absence (% of contracted work days) by year, 2001-2014, Mandal and other municipalities. Panels: all workers; female workers.

Regression results
To quantify the effects depicted in Fig. 1, we estimate Eq. (1). Following the discussion in Section 6.1.1, we take 2004-2007 as the pre-treatment periods and 2008-2014 as the post-treatment periods. In Appendix B, we formalize the conditions under which the DD estimator is valid once we exclude the years before 2004. From column 1 of Table 2, we estimate a 13% drop in absence relative to the baseline. Cluster-robust standard errors at the municipality level are not reliable when there is only one treated cluster (see, e.g., Conley and Taber (2011)). Indeed, the inference assessment proposed by Ferman (2019a) indicates that we should expect over-rejection on the order of 91% for a 5%-level test if we relied on cluster-robust standard errors at the municipality level. Therefore, to evaluate whether this effect is statistically different from zero, we estimate standard errors and calculate p-values and confidence intervals using the method proposed by Ferman and Pinto (2019). This method extends the inference method proposed by Conley and Taber (2011) and is suited to settings with only one treated municipality when there is heteroskedasticity generated by variation in municipality sizes. It allows for unrestricted serial correlation in the errors (a problem well documented by Bertrand et al. (2004) for DD designs) and, since we have only one treated municipality, it even allows for some forms of spatial correlation (see Ferman (2020)). Appendix C provides more details on the implementation of this inference method. When we consider the full sample of workers, the p-value of a test that the effect of the reform is zero is 0.125. While we cannot reject the null hypothesis in this case at standard significance levels, we can rule out large positive effects of the reform, which is evidence that the reform did not lead to a large increase in absences. The upper bound of our 90% confidence interval would imply an increase of less than 1% in the absence rate relative to the baseline.

Table 1 Descriptive statistics, before (2004-2007) and after (2008-2014) the reform. Mandal municipality (treated) and other municipalities (control). Columns: Mandal; Other municipalities (N = 440).
Table 1 compares the municipal sector in Mandal to all other municipal sectors in Norway before and after the reform. The numbers are averaged over the respective time intervals (2004-2007 and 2008-2014). Sickness absence (% of workdays) is the fraction of workdays lost as a percentage of the contracted workdays, counting only periods > 16 days. The average length of absence is the number of absence days divided by the number of sickness spells during a year. The percentage of graded spells is the fraction of all spells during which the worker is partly at work and partly on sick leave. Female employees refer to female workers in the municipal sector.
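Eq. (1) is not reproduced in this section, so the following is only a minimal sketch of the point estimate under a stated assumption: that Eq. (1) is a two-way fixed-effects regression with municipality and year effects and a single treated municipality. In a balanced panel with a common reform date, the coefficient on the treatment dummy in such a regression equals the simple 2x2 comparison computed below (the treated unit's post-minus-pre change minus the equally weighted average change of the controls). The function `dd_estimate`, the definition of "baseline" as the treated unit's pre-reform mean, and the simulated panel are hypothetical; covariates, any weighting, and the Ferman and Pinto (2019) inference step are left out.

```python
import numpy as np


def dd_estimate(y, treated, post):
    """Difference-in-differences with one treated unit on a balanced panel.

    y       : array (n_units, n_years) of absence rates (hypothetical layout)
    treated : row index of the single treated unit
    post    : boolean array (n_years,) marking post-reform years
    Returns the DD estimate and that estimate relative to the treated
    unit's pre-reform mean (our assumed definition of "baseline").
    """
    y = np.asarray(y, dtype=float)
    post = np.asarray(post, dtype=bool)
    controls = np.delete(np.arange(y.shape[0]), treated)
    # Post-minus-pre change in the treated unit
    change_treated = y[treated, post].mean() - y[treated, ~post].mean()
    # Post-minus-pre change in each control, averaged with equal weights
    change_controls = (y[controls][:, post].mean(axis=1)
                       - y[controls][:, ~post].mean(axis=1)).mean()
    alpha = change_treated - change_controls
    baseline = y[treated, ~post].mean()
    return alpha, alpha / baseline


# Simulated panel (not the paper's data): unit and year effects plus a
# hypothetical reform effect of -1.0 percentage points for unit 0 from 2008.
rng = np.random.default_rng(0)
years = np.arange(2004, 2015)
post = years >= 2008
unit_fe = rng.normal(7.0, 0.5, size=5)
year_fe = rng.normal(0.0, 0.3, size=years.size)
y = unit_fe[:, None] + year_fe[None, :]
y[0, post] -= 1.0

alpha, rel = dd_estimate(y, treated=0, post=post)
print(f"DD estimate: {alpha:.3f}, relative to baseline: {rel:.1%}")
```

Because the simulated data satisfy parallel trends exactly, the estimate recovers the injected effect; with real data the common year effects would only hold approximately, which is what the placebo and pre-trend checks in this section probe.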
We consider the estimated effects separately for men and women in columns 2 and 3 of Table 2. We find a small and non-significant effect for men, but a large and significant reduction in absence for women (p-value = 0.026). This effect for women is also economically meaningful, representing a drop of 17% in absence. Our data do not allow an investigation into the causal mechanisms behind the observed gender differences. Still, it is interesting that our results suggest that the policy reduced the gender gap in absence rates. Moreover, since the female share of the labor force is around 80% in Mandal municipality (in line with the rest of the municipal sector), it is particularly important to estimate the effects on female workers to assess the impact of such a policy. All results remain virtually the same if we include the unemployment rate as a covariate in the DD regression, as presented in Appendix Table 4.
In columns 4 to 6 of Table 2, we consider a placebo exercise in which we estimate the effects on workers who live in Mandal but are employed in other sectors. Since these workers were not directly affected by the reform, we should not expect to find significant effects. Both unconditionally and when we separate by gender, we find non-significant and economically small estimated effects in these placebo regressions (the p-values are always greater than 0.60). Appendix Figure 4 presents the trajectories of absence rates for non-municipal workers in Mandal and in other municipalities. Appendix Table 5 and Appendix Figure 5 also present the results from a DD model comparing municipal and non-municipal workers in Mandal before and after the reform. All results are remarkably similar to the findings presented in columns 1 to 3 of Table 2. Overall, these results indicate that the effects we estimate in columns 1 to 3 do not capture Mandal-specific shocks other than the 2008 reform, giving us confidence that our main results are not driven by municipality-level unobserved variables that are not included in the DD model.
We also consider, in Appendix Table 6, the changes in the gap between Mandal and other municipalities after the 2004 national reform, contrasting data from 2001 to 2003 with data from 2004 to 2007. Consistent with the graphical inspection of Fig. 1, we find a reduction in absence, which could indicate that Mandal was differentially affected by the 2004 reform, although these effects are not statistically significant at standard levels. Interestingly, the point estimates of the effects of the 2004 national reform are very similar for workers in the municipal sector and for workers in other sectors, which is consistent with the fact that the national reform affected both types of workers alike. This makes us even more confident that, if there were relevant time-varying unobservables that differentially affected Mandal, the placebo exercise presented in columns 4 to 6 of Table 2 would have captured them. While the effects we estimate for the 2004 national reform are not statistically significant, the point estimates are relatively large, so we still regard the DD estimates as more reliable when we exclude the 2001-2003 data from the main analysis of the Mandal reform. Including the 2001-2003 data in the main DD analysis would lead to larger negative estimated effects of the Mandal 2008 reform, which would be statistically significant even for the full sample.
We also test whether the pre-trends from 2004 to 2007 were different between Mandal and other municipalities by estimating a DD model on those periods that includes a placebo dummy equal to one for Mandal after 2005. The point estimates are very small, and the p-values are very large, ranging from 0.55 to 0.77, providing further evidence in favor of our identification assumption that Mandal and other municipalities would have followed parallel trends from 2004 to 2014 in the absence of the reform.
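The pre-trend test can be sketched as a DD restricted to 2004-2007 with a fake reform after 2005. This is a hedged illustration, assuming the same unweighted 2x2 form of the DD as above; the helper `placebo_pretrend` and the simulated data are hypothetical, and the paper's p-values come from the Ferman and Pinto (2019) procedure, which is not reproduced here.

```python
import numpy as np


def placebo_pretrend(y, years, treated, start, end, fake_reform_after):
    """DD on pre-reform years with a fake reform date (hypothetical helper).

    Keeps years in [start, end], treats years > fake_reform_after as the
    placebo "post" period, and returns the placebo DD estimate for the
    treated unit against the equally weighted average of the controls.
    """
    y = np.asarray(y, dtype=float)
    years = np.asarray(years)
    keep = (years >= start) & (years <= end)
    y_win, yrs = y[:, keep], years[keep]
    fake_post = yrs > fake_reform_after
    change = y_win[:, fake_post].mean(axis=1) - y_win[:, ~fake_post].mean(axis=1)
    controls = np.delete(np.arange(y.shape[0]), treated)
    return change[treated] - change[controls].mean()


# Simulated absence rates (not the paper's data) for 2001-2014.
rng = np.random.default_rng(1)
years = np.arange(2001, 2015)
y = rng.normal(7.0, 0.5, size=(5, 1)) + rng.normal(0.0, 0.3, size=(1, years.size))

# Parallel pre-trends: the placebo estimate is zero up to rounding error.
est_parallel = placebo_pretrend(y, years, treated=0, start=2004, end=2007,
                                fake_reform_after=2005)

# Unit 0 drifts by 0.2 points per year from 2004: the placebo picks this up.
y_drift = y.copy()
y_drift[0] += 0.2 * (years - 2004)
est_drift = placebo_pretrend(y_drift, years, treated=0, start=2004, end=2007,
                             fake_reform_after=2005)
print(est_parallel, est_drift)
```

In the drifting case, the treated unit's mean rises by 0.4 points between 2004-2005 and 2006-2007, so a diverging pre-trend shows up directly as a non-zero placebo estimate.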
Finally, a potential concern is that other municipalities may have implemented other reforms in the same period to reduce absence rates. If this is the case, and if these reforms were effective in reducing absence rates, this would bias our DD estimator toward finding an increase in absence rates in Mandal. Our DD estimator would then provide an upper bound on the effects of Mandal's reform, making our evidence even more convincing that the reform did not lead to large increases in absence rates. The same rationale applies to the SC analysis we present next.

Synthetic control
This section assesses the effects of the reform using the synthetic control method (SCM), which was developed by Abadie and Gardeazabal (2003) and Abadie et al. (2010) for settings with aggregate data in which a single unit is treated. The SCM uses information from the pre-reform periods to construct a synthetic Mandal, namely a convex combination of the control municipalities that best resembles the trajectory of the treated unit (Mandal) prior to the reform. Following Ferman et al. (2020), and to avoid specification searching in the choice of predictor variables, we use the outcomes of all pre-treatment periods as predictors. In this case, we choose weights for the control municipalities, w = (w_1, w_2, ..., w_N), that minimize the root mean squared prediction error (RMSPE) between the weighted control units and the treated unit in the pre-treatment levels of the outcome variable (sickness absence). These weights are restricted to be non-negative and to sum to one. Following Ferman and Pinto (2021), we demean the data using the pre-treatment periods before estimating the SC weights, to adjust for possible bias due to differences in levels between the treated and the synthetic municipality in the pre-treatment periods.
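A hedged sketch of the weight-estimation step, assuming the setup just described: all pre-treatment outcomes as predictors, non-negative weights that sum to one, and demeaning by each unit's pre-treatment mean. The solver, an accelerated projected-gradient method with a Euclidean projection onto the simplex, is our own choice for a self-contained NumPy example; the text does not say which optimizer the authors used. The functions `project_to_simplex`, `sc_weights`, and `synthetic_control` and the simulated data are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def sc_weights(x1, X0, n_iter=5000):
    """Simplex-constrained least squares by accelerated projected gradient.

    x1 : (T_pre,) demeaned pre-treatment outcomes of the treated unit
    X0 : (T_pre, J) demeaned pre-treatment outcomes of the J controls
    Minimizes the mean squared pre-treatment prediction error.
    """
    T, J = X0.shape
    lipschitz = 2.0 * np.linalg.norm(X0, 2) ** 2 / T  # gradient Lipschitz constant
    w = np.full(J, 1.0 / J)                           # start from equal weights
    z, t = w.copy(), 1.0
    for _ in range(n_iter):
        grad = -2.0 * X0.T @ (x1 - X0 @ z) / T
        w_next = project_to_simplex(z - grad / lipschitz)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_next + ((t - 1.0) / t_next) * (w_next - w)  # momentum step
        w, t = w_next, t_next
    return w


def synthetic_control(y, treated, n_pre):
    """Demeaned SC: returns control weights and the treated-minus-synthetic gap."""
    y = np.asarray(y, dtype=float)
    # Subtract each unit's pre-treatment mean before fitting the weights
    y_dm = y - y[:, :n_pre].mean(axis=1, keepdims=True)
    controls = np.delete(np.arange(y.shape[0]), treated)
    w = sc_weights(y_dm[treated, :n_pre], y_dm[controls, :n_pre].T)
    gap = y_dm[treated] - w @ y_dm[controls]
    return w, gap


# Simulated data (not the paper's): 4 controls, 2001-2014, reform in 2008.
rng = np.random.default_rng(2)
controls = rng.normal(7.0, 1.0, size=(4, 14))
y_treated = 0.5 * controls[0] + 0.3 * controls[1] + 0.2 * controls[2] + 0.8
y_treated[7:] -= 1.0  # hypothetical effect from 2008 onward
y = np.vstack([y_treated, controls])

w, gap = synthetic_control(y, treated=0, n_pre=7)
pre_rmspe = np.sqrt(np.mean(gap[:7] ** 2))
print(np.round(w, 3), pre_rmspe, gap[7:].mean())
```

The constant 0.8 added to the simulated treated unit is removed by the demeaning, so a convex combination of the controls can still fit it exactly in the pre-treatment periods; without demeaning, such a level difference could not be matched by non-negative weights that sum to one.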
In contrast to the analysis in Section 6.1, where we considered only 2004-2007 as pre-treatment periods, here we use 2001-2007 as pre-treatment periods. The reason is that the SC estimator is well suited to accommodate departures from parallel trends such as the one observed after the 2004 national reform. More specifically, we believe the trends of Mandal and of the average of the other municipalities are not parallel because Mandal was differentially affected by the 2004 national reform. In this case, the SC estimator aims to construct a weighted average of the control municipalities that was affected by the 2004 national reform in the same way as Mandal. We present this idea more formally in Appendix B. While the number of pre-treatment periods in our setting is not very large, we should expect the SC estimator to have a lower bias than DD if there were other common shocks after 2008 that differentially affected Mandal. 8 We also report SC estimates using only 2004-2007 as pre-treatment periods.
Fig. 2 compares Mandal with synthetic Mandal. The figure shows a very good fit in the pre-treatment periods, followed by a large drop in absence after the reform, particularly for female workers. Averaging the effects across all post-treatment periods, the SC estimator yields a point estimate of −0.961 for the full sample, which is remarkably similar to our DD estimate using 2004-2007 as pre-treatment periods (−0.949). 9 The estimates are also very similar for women (−1.620 using SC vs. −1.348 using DD) and for men (−0.294 using SC vs. −0.043 using DD). In all cases, the SC point estimate lies within the 90% confidence interval of the DD estimator. Overall, these results suggest that the parallel trends assumption underlying the DD analysis is reasonable once we restrict the sample to 2004-2014. In contrast, if we applied the DD estimator to all periods (2001-2014), the DD estimates would be larger than the SC estimates in absolute value, raising concerns about the validity of the DD assumptions. Appendix Table 7 reports, for each of these SC estimates, the weights of the municipalities that received the largest weights.
To assess the uncertainty of the synthetic control estimates, Abadie et al. (2010) suggest comparing the effect of the actual reform with placebo reforms in all the control units (that is, comparing each municipality x that did not extend self-certification of sickness absence with its own synthetic x). Appendix Figure 6 shows the differences between Mandal and synthetic Mandal, together with the differences between the placebos and their synthetic controls. To take into account that the pre-treatment fit might vary across placebo municipalities, Abadie et al. (2010) construct a test statistic given by the ratio of the post-treatment to the pre-treatment RMSPE. If this statistic takes an extreme value for the treated municipality relative to the placebos, we reject the null that the reform had no effect. Appendix Figure 7 presents the distribution of post-pre RMSPE ratios. We find a p-value of 0.028 when we do not restrict by gender, and a p-value of 0.052 for women. While these results suggest that the reform had statistically significant effects on absence rates, we treat such p-values with caution. Since the number of municipalities is much larger than the number of pre-treatment periods, the pre-treatment RMSPE is very close to zero for many municipalities (including Mandal), which might distort the distribution of the test statistic. 10 In any case, it is reassuring that Mandal's post-pre RMSPE ratio is extreme relative to the distribution of placebos, especially when combined with the finding that the SC and the DD estimates are remarkably similar. Since the SC estimator attempts to construct a comparison municipality that follows a similar business cycle as the treated municipality, this gives us additional confidence that our results are not driven by variations in the business cycle.

As a robustness check, we re-estimate the SC model using only 2004-2007 as pre-treatment periods. Appendix Figure 8 presents these results. The estimated average effects are again very similar to the DD estimates and to the SC estimates using all pre-treatment periods (−0.949 for all workers and −1.244 for female workers). As a final robustness check, we re-estimate the SC model using only 2001-2006 as pre-treatment periods. In this case, the weights are not chosen to minimize the distance between Mandal and synthetic Mandal in 2007. Still, since the treatment only started in 2008, we should expect the synthetic Mandal to capture the trajectory of Mandal in 2007 if the SC method is working well. Appendix Figure 9 presents these results. The synthetic Mandal reconstructs the outcome for Mandal in 2007 remarkably well, even though this year was not used in the estimation of the weights. The estimated treatment effect in this exercise, averaged over 2008-2014, is −1.037 for all workers and −1.597 for female workers, again very similar to our findings from the other DD and SC specifications.
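The placebo test of Abadie et al. (2010) described above can be sketched once the gaps between each unit and its synthetic control are available. This is a hedged illustration: it assumes the gaps have already been computed by re-running SC with every control municipality as the "treated" unit, and it uses the common convention of counting the treated unit itself when forming the p-value. The function `rmspe_ratio_pvalue` and the gap values are hypothetical.

```python
import numpy as np


def rmspe_ratio_pvalue(gaps, treated, n_pre):
    """Placebo p-value from post/pre RMSPE ratios.

    gaps : (n_units, T) unit-minus-synthetic gaps, one row per unit, where
           each control row comes from re-running SC with that unit as treated
    """
    gaps = np.asarray(gaps, dtype=float)
    pre = np.sqrt(np.mean(gaps[:, :n_pre] ** 2, axis=1))
    post = np.sqrt(np.mean(gaps[:, n_pre:] ** 2, axis=1))
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = post / pre  # a pre-RMSPE near zero inflates the ratio
    # Share of units (treated included) whose ratio is at least the treated one
    p_value = np.mean(ratio >= ratio[treated])
    return ratio, p_value


gaps = np.array([
    [0.1, -0.1, 0.1, -1.0, -1.0, -1.0],  # hypothetical treated unit
    [0.2, -0.2, 0.2, 0.3, -0.3, 0.3],
    [0.5, -0.5, 0.5, 0.5, -0.5, 0.5],
    [0.1, 0.1, -0.1, 0.2, 0.2, -0.2],
    [0.3, -0.3, 0.3, 0.6, 0.6, 0.6],
])
ratio, p = rmspe_ratio_pvalue(gaps, treated=0, n_pre=3)
print(np.round(ratio, 2), p)
```

The division by the pre-treatment RMSPE is exactly where the caveat in the text bites: when many units fit their pre-treatment periods almost perfectly, small denominators can produce very large ratios that reflect fit quality rather than treatment effects.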

Channels
To better understand why extended self-certification and employer involvement led to a reduction in absence, we examine the fraction of spells that are graded (partial sickness absence) and also the number and length of the absence spells.
A worker with a graded absence certificate has moderate health problems and some remaining work capacity and should, accordingly, spend some time at work. Markussen et al. (2012) study a nationwide Norwegian reform in 2004 that, among other things, encouraged the substitution of graded for non-graded sick leave certificates. They argue that the reform led to shorter spells of sickness absence, which in turn reduced absence levels: with graded sick leave, workers use their remaining work capacity, which leads to faster recovery and fewer sickness benefit claims. Normally, the physician, together with the employer and the worker, decides the grading of absence spells. After the reform in Mandal, the physician no longer took part in this decision; the employer (the line leader) and the worker decided the grading together. Could it be that the Mandal reform increased the use of grading, which then reduced overall absence, as found in Markussen et al. (2012)? Figure 3 plots both the fraction of graded spells and the average length of spells. Compared with the control municipalities, there is no evidence that graded sickness absence is used more frequently in Mandal. There is a general trend toward more grading of absence, but Mandal basically follows the trend.
Turning to the number and length of the absence spells, Table 1 shows that the number of absence spells in Mandal dropped by almost 20% after the reform, while it increased in other municipalities. Dividing the number of spells per year by the number of municipal employees gives the number of spells per employee, which drops from 0.25 in the years before the reform to 0.18 in the post-reform period, a decline of 28%. In the other municipalities, the number of spells per employee is the same before and after the reform (0.30). We conclude that the reform apparently lowered sickness absence at the extensive margin.
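The 28% figure follows directly from the two ratios quoted in the paragraph above; a quick check using only those reported values:

```python
# Spells per employee, before and after the reform (values from the text)
pre_mandal, post_mandal = 0.25, 0.18
pre_other, post_other = 0.30, 0.30

relative_change = (post_mandal - pre_mandal) / pre_mandal
print(f"Mandal: {relative_change:.0%}")                                  # -28%
print(f"Other municipalities: {(post_other - pre_other) / pre_other:.0%}")  # 0%
```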
According to Fig. 3(b), in the post-reform period the remaining spells became longer in Mandal than in the average of the other municipalities. When we use the length of spells as the outcome variable and estimate Eq. (1) on yearly data, we obtain a DD estimate of 8.5 days with a standard error of 4.7 days. The p-value using the inference method proposed by Ferman and Pinto (2019) is 0.059. Measured against a pre-reform average length of 53 days (both in Mandal and in the average of all other municipalities), this amounts to an increase of about 16% in the length of spells in Mandal in the reform period. When we use data from the private and central government sectors (workers not affected by the Mandal reform), the placebo DD estimate is very close to zero (0.5 days).
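The relative magnitude of the spell-length effect can be checked from the three numbers reported in the paragraph above. The estimate-to-standard-error ratio is shown only as a descriptive quantity; the reported p-value comes from the Ferman and Pinto (2019) procedure, not from a normal approximation.

```python
dd_days, se_days, base_days = 8.5, 4.7, 53.0   # values from the text

relative_increase = dd_days / base_days
print(f"Relative increase: {relative_increase:.1%}")        # 16.0%
print(f"Estimate / standard error: {dd_days / se_days:.2f}")  # 1.81
```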
Combining these results, we find evidence that the self-certification reform in Mandal reduced the incidence of shorter, marginal spells (among spells lasting more than 16 days). This is consistent with the conceptual framework presented earlier, under which the reform shifted the demand for absence downward, so that workers with minor health issues no longer demanded sick leave. 11

Discussion and conclusion
We find that allowing workers to self-declare absence, and thus to skip the doctor's certificate, did not lead to increased sickness absence in Mandal. On the contrary, for female workers, the reform resulted in a significant drop in absence. We believe that the DD estimator we use captures the effect of the reform. The pre-reform trend in the treated municipality moves in parallel with the average of all other municipalities, and there is a sharp drop in absence in the treated municipality just around the time of the reform. If there had been a contemporaneous Mandal-specific shock to absence at this time, we would expect a similar drop in absence for workers in Mandal who were not affected by the reform. That did not happen. We explain the effect as a decline in the demand for sickness absence. For a given demand for absence, removing the doctor as an absence certifier, that is, as a gatekeeper, would increase the level of absence. By how much depends on the magnitude of moral hazard, that is, how many workers who are healthy enough to work claim absence benefits, and on how rigorous physicians are as gatekeepers. However, as we explain in detail in Section 3, several features of this reform may induce workers to lower their demand for absence. Our results show that the decline in demand dominates the direct effect of removing the physician as a gatekeeper.
There is little prior empirical research on the effects of medical certificates. One exception is the assessment of a Swedish experiment with extended self-certification of work absence by Hesselius et al. (2013) and Hesselius et al. (2009). A random sample of workers in two different municipalities could self-certify one extra week of sickness absence (two weeks instead of one). Compared with workers who did not obtain the extra week, the treatment group increased their absence; on average, absenteeism increased by 0.8 days per year, from 11.8 to 12.6 days. In the Swedish case, the gatekeeper effect dominated the demand effect.
There are several differences between the Swedish experiment and the self-certification reform we study. First, the Mandal reform differs in the level of discretion and trust it grants employees. In Sweden, workers received one extra week, while in the present case they can self-certify for the whole entitlement period (1 year). The Mandal reform was branded (with a handshake logo) as the "Trust Project" and appealed openly to workers' responsibility and reciprocity. In addition, and perhaps just as important, the reform in Mandal also implied stronger employer involvement in the counselling of sick workers, which was not the case in Sweden. This intense counselling may have increased the hassle and costs of being absent. Olsen and Jentoft (2012) report on focus group interviews with leaders and employees who participated in the Mandal project. Some of the interviewees reported that the meticulous registration of absence and the frequent meetings between line leaders and absent employees felt intrusive and added to the costs of being absent (p. 104).
Our finding is important for policy. Using medical personnel to certify absence is costly for the doctors, the patients, and the insurer (which reimburses the medical doctors). Our analysis indicates that sickness certification can be taken out of the hands of physicians without a subsequent rise in sickness absence. In fact, extended self-certification of sickness absence in Mandal appears to be a win-win reform, with less absence and fewer resources spent on certification. Note, however, that extended self-certification of absence implies extended employer involvement, which is likely to require administrative resources in the municipality.
A natural question at this point concerns the extent to which our findings have external validity. First, since the reform combined self-certification with stronger involvement from the municipal administration, we cannot guarantee that the effects would be the same without this involvement. Still, since these follow-ups by the employees' line managers are much less costly than physician certification, our results indicate that it is possible to drop the requirement of physician certification without large increases in absence rates. Moreover, even if we identify a reform effect in Mandal, there could be specific attributes of Mandal that made skipping the medical doctor as a sickness certifier especially effective. It is reassuring that Mandal appears to be very much an "average" municipality in the pre-reform data (on sickness absence and on other variables such as age, gender composition, and unemployment). Another possible source of singularity is that the key persons who initiated the Trust Project in Mandal may have been just as important as the reform itself. Again, it is reassuring that a team of Trust Project leaders was also present in the years before extended self-certification was introduced. Finally, physicians in other countries may be stricter gatekeepers than physicians in Norway; hence, if the same reform were introduced in another country, the direct effect of skipping the doctor as an absence certifier, which pulls toward more absence, might dominate the decline in demand for absence. Although we cannot address this issue, our findings should encourage more sickness absence insurers to experiment with extended self-certification of sickness absence.

Table 3 compares Mandal municipality with all other municipalities in Norway. The numbers are averaged over the time interval of our analysis [2001, 2014]. Standard errors in parentheses. Employees in "non-municipal" sectors include all workers in the private sector and those working for the central government. Sickness absence (% of contracted workdays) is the fraction of workdays lost as a percentage of the contracted workdays, counting only periods > 16 days. The average length of absence is the number of absence days divided by the number of sickness spells during a year. The percentage of graded spells is the fraction of all spells during which the worker is partly at work and partly on sick leave.

Figure 4 Absence for non-municipal workers before and after the reform. Figure 4 compares the non-municipal workers in Mandal to non-municipal workers in all other municipalities in Norway, for each year in the time interval of our analysis [2001, 2014]. Sickness absence (% of workdays) is the fraction of workdays lost as a percentage of the contracted workdays, counting only periods > 16 days. Importantly, the differences in trends before and after the reform are not statistically different from zero, as presented in columns 4 to 6 of Table 2.

Since, asymptotically, the distribution of the DD estimator depends only on W_1, our estimate for the standard error of the DD estimator, which is presented in Table 2, is given by √0.398 = 0.631. In our application, the estimated variance of W_i conditional on M_i ranges from 0.323 to 2.473, suggesting a relevant level of heteroskedasticity coming from variation in population sizes. Interestingly, the median variance of W_i is 0.528, which is larger than the variance of W_1. Therefore, if we followed Conley and Taber's (2011) approach, we should expect their inference method to be too conservative. The reason is that many municipalities have smaller populations than Mandal, so without taking this into account we would recover a more dispersed distribution for W_1 than we should.

Appendix A: Appendix Tables and Figures
Assuming further that W_i has the same distribution for all i up to a scale parameter, we can recover the distribution of W_1 by dividing the residuals Ŵ_i of the controls by the square root of the estimated variance conditional on M_i (which asymptotically recovers a distribution with variance equal to one), and then multiplying by the square root of the estimated variance conditional on M_1 (which then asymptotically recovers a distribution with the variance of the linear combination of the errors of the treated unit). We denote these rescaled residuals by W̃_i. With this estimated distribution of W_1, we can calculate the p-value for the test, which is given by the proportion of control municipalities in which the absolute value of W̃_i is greater than the absolute value of the DD estimate. 12 We can also construct confidence intervals from the quantiles of W̃_i among the control municipalities. In this setting with only a single treated municipality, Ferman (2020) shows that this approach is valid even when we allow for spatial correlation in the errors, provided we assume a strong mixing condition.
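The rescaling and p-value steps just described can be sketched as follows. This is a minimal illustration, assuming the control residuals Ŵ_i and the conditional variance estimates for M_i and M_1 are already available; how those variances are estimated is not reproduced here. The confidence-interval construction (inverting the relation between the estimate and W_1 using quantiles of W̃_i) is our assumption about how "looking at the quantiles" is implemented, and the function name `rescaled_residual_inference` and all inputs are hypothetical. The standard error quoted in this appendix corresponds to the square root of the estimated variance of W_1 (√0.398 ≈ 0.631).

```python
import numpy as np


def rescaled_residual_inference(alpha_hat, resid_controls, var_controls,
                                var_treated, level=0.10):
    """Sketch of Conley-Taber-type inference with a heteroskedasticity correction.

    alpha_hat      : DD estimate for the treated municipality
    resid_controls : residuals W_hat_i for the control municipalities
    var_controls   : estimated Var(W_i | M_i) for each control
    var_treated    : estimated Var(W_1 | M_1) for the treated municipality
    """
    resid = np.asarray(resid_controls, dtype=float)
    var_c = np.asarray(var_controls, dtype=float)
    # Standardize each control residual, then rescale to the treated unit's variance
    w_tilde = resid / np.sqrt(var_c) * np.sqrt(var_treated)
    # p-value: share of controls whose |W~_i| exceeds |alpha_hat|
    p_value = np.mean(np.abs(w_tilde) > abs(alpha_hat))
    # Assumed CI construction: alpha_hat - alpha = W_1, so use quantiles of W~_i
    q_low, q_high = np.quantile(w_tilde, [level / 2, 1 - level / 2])
    ci = (alpha_hat - q_high, alpha_hat - q_low)
    return w_tilde, p_value, ci


# Hypothetical inputs (not the paper's values)
alpha_hat = -0.6
w_tilde, p, ci = rescaled_residual_inference(
    alpha_hat,
    resid_controls=[0.5, -1.0, 2.0, -0.2],
    var_controls=[1.0, 4.0, 4.0, 0.25],
    var_treated=1.0,
)
print(w_tilde, p, ci)
```

With only four hypothetical controls the quantiles are very coarse; in the application, the 440 control municipalities provide the empirical distribution of W̃_i.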