Introduction

Economists are increasingly interested in the effect of education on economic and non-economic outcomes, especially the relationship between education and health. Since Grossman [15], the relationship has been actively investigated both theoretically and empirically, with the literature reviewed by Grossman [16] and updated by Eide and Showalter [11] and Grossman [17]. While most empirical studies to date focus on the causal effect of education on health, less attention has been paid to the specific underlying mechanism, which Grossman [17] pointed out is an area of future work that has important implications for effective health policy implementation.

One possible mechanism is that highly educated individuals might respond to new information and change their behaviors more quickly. In other words, individuals for whom education has been a successful endeavor may be more “teachable.” This possibility has received support from a range of studies on the uptake of newly-approved drugs [29]; for example, a national information campaign on the HIV/AIDS epidemic in Uganda [9], a medical research publication which demonstrated that the risks of attempting a vaginal birth after having a previous C-section birth are higher than previously thought [35], and several studies since the 1950s of the growing awareness of the negative effects of smoking, as reported in the popular press [10] and the 1964 U.S. Surgeon General’s Report on Smoking and Health [1, 10].

A detailed study of the specific mechanisms behind heterogeneous responses by level of education can help policymakers consider new effective policies. In some of the abovementioned studies, however, the interval between the receipt of information and subsequent behavior is long, which makes it difficult to determine the specific mechanism in detail. For example, there are at least two possible paths by which level of education could lead to a heterogeneous response to new health information, each with very different implications for policy. One is that more highly educated individuals might have better access to new health information, but another possibility is that they might respond more quickly or efficiently to new health information even when access is equal. If highly educated individuals have better access to new health information, it is important to find ways to disseminate this information widely. If access to information is not the problem, however, then it is important to devise more effective ways to encourage more widespread behavioral change.Footnote 1 This might include, for example, health campaigns that are much easier to understand, with practical recommendations, specific implications, and concrete examples. This study uses a change in the health checkup system in Japan aimed at reducing obesity as the clear mechanism underlying the causal relationship between education and health. Because all workers were aware of the reform and it did not alter participation in the checkup itself, this study discusses the latter possible path by which differences in education lead to heterogeneous responses to policy reform.

Another limitation of the extant literature is that most studies are not quasi-experimental in that they compare changes in outcome variables among subjects who are highly or poorly educated without defining clear treatment and control groups. Therefore, while the heterogeneous responses to new information received by these two groups may seem intuitive, the studies cannot claim direct causal effects. By contrast, we use an institutional setting, worker type, to identify the policy effects by comparing the changes in health outcomes and behaviors between two groups with higher and lower proportions of members affected by the policy reform.

While economists have studied the effects of an exogenous variation in assimilating health-related information through various interventions, including health checkups and screening programs, on health outcomes and behaviors in Austria [18], China [40], Japan [13, 21, 22, 26], Korea [27], and the United States [2, 24, 33] to date, there is no unified view of the effects. Some studies have found evidence for the improvement of health by receiving health information through health checkups, health guidance, and health screenings [e.g., [13, 21, 27]]. Iizuka et al. [21], for example, found health improvement due to diabetes diagnoses among individuals with high health risks. They also found that health improvement is worth the medical spending on other preventive care increased due to the diagnoses. Fukuma et al. [13] found evidence for weight loss, BMI reduction, and waist circumference reduction due to the Specific Health Guidance in Japan one year after screening, but the health improvements were attenuated a few years later.Footnote 2 In contrast, others found no evidence for health improvement [e.g., [2, 18, 24]]. For the effects on health behaviors, there are previous studies that found statistically significant changes in health behaviors towards health improvement [e.g., [26, 33, 40]], while the others found no evidence [e.g., [2, 24, 27]].

One possible reason for the mixed results is the estimation methodology used in previous studies. Most of the studies in the extant literature apply a regression discontinuity design (RDD) with a biomarker threshold for diagnosing a health condition such as high blood pressure [40], diabetes [2, 21, 27], obesity [27], or waist circumference [13]. Since biomarkers are affected by various exogenous factors such as timing, measurements just above and below a threshold are likely to be random and the effect of a diagnosis on subsequent health outcomes and behaviors can be estimated around that threshold. Although this strategy has strong identification power, the estimated effects appear to be highly localized, and it is difficult to interpret insignificant effects.Footnote 3 Further, these studies have utilized a variety of thresholds for the identification strategy, and, moreover, have found heterogeneous effects based on numerous factors including level of income [40], age [18], and level of health risk [21, 27]. Taking this into consideration, as it is likely that these studies investigated different estimands making the estimated results somewhat incomparable, further investigation of the effects of health checkups is necessary to gain consensus.

Additionally, there are few studies of health diagnoses focusing on heterogeneity by level of education. One study by Zhao et al. [40] utilizes an RDD framework to analyze the heterogeneous effects of a hypertension diagnosis on nutrition intake by both education and income, finding that the effect on fat intake is stronger among those with a lower education and higher income, which is inconsistent with the literature on the relation between education and health. One possible reason is that an individual who knows the diagnosis threshold may change their behavior upon learning that their blood pressure rating is just below the threshold because they understand that they are still at risk. Highly educated individuals may be more likely than poorly educated individuals to know the diagnosis threshold. Therefore, this strategy does not seem ideal for analyzing the heterogeneity of effects by level of education,Footnote 4 leaving ample room for more studies of heterogeneous effects of health checkups by level of education.

This study utilizes a reform of the health checkup system in Japan aimed at reducing metabolic syndrome to analyze the heterogeneous responses of health behaviors and outcomes to the policy reform by education level. Japan has one of the most aged populations in the world and faces the problem of extended medical expenditure on lifestyle-related diseases including diabetes and hypertension. The percentage of males who are overweight, a risk factor for lifestyle-related diseases, increased by 32 % in the 20 years to 2017, and the number of people strongly suspected of having diabetes also increased by 45 % in the 20 years to 2016. Therefore, the Japanese government has reformed the health checkup system aimed at reducing the overweight population and eventually the people with lifestyle-related diseases.Footnote 5 Recognizing that the reform did not affect the participation rate in checkups but that the proportion of workers affected by the reform differed exogenously according to their work status, the study uses a difference-in-differences (DID) approach to compare the changes in the outcome variables between two groups with higher and lower proportions of members affected by the policy reform.

The results show that while the DID estimates of weight and body mass index (BMI) are statistically significantly negative for university graduates at higher risk of obesity, there are no significant changes among non-graduates at higher risk of obesity or individuals at relatively low risk of obesity. This highly educated group achieved significant weight loss, reducing BMI to levels that minimize all-cause mortality. Further, among university graduates at high risk of obesity, health behaviors such as physical activity and eating habits also changed. These results suggest that highly educated individuals are more likely to respond to a health checkup diagnosis and/or health guidance to improve their health. A secondary analysis suggests that cognitive functioning may be a key factor explaining this heterogeneity of response, which is consistent with other recent discussions of the role of cognition in the causal relationship between education and health [e.g., [3, 4, 8]].

The remainder of the paper is organized as follows: “Institutional background” explains the institutional setting; “Data” discusses the data and descriptive statistics; “Identification strategy” describes the identification strategy and estimation model; “Estimation results” discusses the estimation results; “Discussion” provides some additional remarks; and “Conclusion” summarizes the study, discusses limitations, and makes suggestions for further research.

Institutional background

Japan’s annual health checkup

Since the 1970s, an annual health checkup has been provided as part of Japan’s health promotion policy. The Industrial Safety and Health Act of 1972 requires all employers to provide a health checkup for their employees, who are, in turn, legally obligated to take it. Thus, virtually all salaried workers in Japan undergo a health checkup each year. Although the checkup may include additional tests, a set number are required by law. As such, salaried workers across Japan all receive a uniform minimum-level checkup. In addition to the checkups provided by employers, local governments also provide checkups for residents over 40 who are not salaried workers. Thus, all middle-aged and above Japanese residents have the opportunity to receive an annual health checkup, although it is mandatory for salaried workers and voluntary for others, including the self-employed.

However, in the early 2000s, despite these publicly provided checkups having taken place annually for decades, health had not sufficiently improved. Specifically, according to the mid-term evaluation of the “Health Japan 21” policy aimed at reducing the prevalence of lifestyle-related diseases and implemented in 2000 at the turn of the 21st century, the incidence of lifestyle-related health conditions such as diabetes and obesity had increased. As these health conditions comprise a large proportion of public health expenditure, the problems with the health checkups were investigated and summarized in a Council of Governments report.Footnote 6 The first problem identified was that the existing intervention did not work for people who already had a disease. While screening is most effective for people at high risk of a disease but who do not yet have it,Footnote 7 the purpose of the checkups, up to that point, was to select for intervention patients who were already in the early stages of a disease. The second problem with the health checkups was that the intervention was insufficient. While Knowler et al. [28], for example, found that for those at high risk of diabetes, intervention to help change lifestyle habits was more effective in preventing the onset of diabetes than medication, the health checkups merely provided those identified as high risk with general information about the disease and a recommendation to see a doctor. The final problem was that the content of the health checkups, conducted by providers across the country under various local laws, was not unified. Addressing these identified inadequacies of the existing health checkups thus required a reform of the system to provide a more substantial intervention targeting those at high risk of disease and implemented uniformly across all institutions nationwide. This new system is also a part of the “Health Japan 21” policy.

Specific health checkups and specific health guidance

In April 2008, a new health checkup system, Specific Health Checkups and Specific Health Guidance, was introduced, aimed at preventing lifestyle-related diseases by providing participants with objective assessments of their health risks and specific guidance from health professionals. This new health checkup system now focuses on metabolic syndrome, a condition represented by a confluence of biomarkers including excess body fat, high blood pressure, and high blood sugar which, together, identify people at high risk of lifestyle-related diseases. The policy reform was introduced uniformly for the target population of individuals covered by public health insurance and their dependents aged between 40 and 74. Since Japan has a universal health care insurance system, this target population covers almost all residents of the country. Importantly for this study, the reform did not alter participation in health checkups. As seen in Panel (a) of Fig. 1, there is no surge in the checkup participation rate of the middle-aged around the policy reform.

Fig. 1
figure 1

Source: Comprehensive Survey of Living Conditions. Participation rate is calculated for males aged 50–62

Participation rate in health checkups of the middle-aged.

The current system is divided into two parts: a health checkup to screen for participants at high risk of metabolic syndrome, followed by face-to-face guidance by a doctor, public nurse, or dietitian aimed at prevention by changing lifestyle habits. The content of the checkup is based on medical and scientific evidence for identifying metabolic syndrome, and includes body measurements, blood tests, and questionnaires about such topics as smoking and medical histories. As excess body fat is a marker of metabolic syndrome, a measure of abdominal girth was added to the new system to estimate the amount of visceral fat. Based on the results of the health checkup, an objective assessment of metabolic syndrome risk is determined.

Participants then receive health guidance specifically tailored to their physical condition. Those at high risk are given health guidance about their lifestyle habits, aimed at informing participants of the benefits and risks of their lifestyle habits and providing support to change behavior. There are two types of health guidance: Motivational support for participants who have risk factors of metabolic syndrome, but the number of risk factors is relatively low; and active support for participants who have more risk factors of metabolic syndrome. Active support guidance is a support program with higher intensity than motivational support guidance. In both types of guidance, the participants are required to set their weight loss goals and make a lifestyle-change plan to achieve them.Footnote 8

In short, the policy reform provides checkup participants with objective knowledge of the risks associated with their health condition and specific information about the benefits and risks of their health behaviors.Footnote 9

Data

The main dataset used in this study is the Japanese Study of Aging and Retirement (JSTAR), a biennial panel survey of Japanese people aged over 50 which is a sister dataset of the U.S. Health and Retirement Study (HRS), the English Longitudinal Survey on Ageing (ELSA), and the Survey on Health, Ageing, and Retirement in Europe (SHARE). The first wave of the JSTAR was conducted in 2007 in five cities in Japan. Although other sample cities were added in later waves, this study employs only the five original cities in order to obtain data both before and after the health policy reform of 2008. The JSTAR includes comprehensive information on demographics, labor force status, economic variables, health investment behaviors, and health outcomes to analyze the impact of the checkup system reform on health outcomes and behaviors.Footnote 10 The purpose of the policy reform was to reduce the number of people at high risk of metabolic syndrome, and one solution is weight loss. This is captured in this study through measures of weight and body mass index (BMI), or body weight adjusted by height, a common measure of obesity.

In addition to the JSTAR, the Comprehensive Survey of Living Conditions, the Longitudinal Survey of Middle-aged and Elderly Persons, and the General Survey on Working Conditions are also used in this study to interpret and provide further context for the main estimation results.

Identification strategy

This study utilizes heterogeneity in the health checkup participation rate to identify the effects of the policy reform on health outcomes and behaviors. As noted above, the new checkup system was implemented uniformly, making it difficult to assign participants to treatment and control groups on that basis, so a different institutional setting was used based on employment status. As explained above, the annual health checkup has always been mandatory for salaried workers, so their participation rate is higher than others. Panel (b) of Fig. 1 shows the participation rate of middle-aged workers according to employment status and indicates that the participation rate of salaried workers is about 90% and constant before and after the policy reform. Therefore, about 90% of salaried workers were affected by the policy reform but their participation in health checkups did not change. On the other hand, the participation rate of self-employed workers is substantially lower, at about 50%, and also shows no significant change in participation around the time of the policy reform. We can, therefore, conclude that about 50% of self-employed workers were affected by the reform, but their participation in health checkups did not change.

We use this stable difference in the proportion of workers affected by the policy reform as the identification strategy, applying a difference-in-difference (DID) framework to compare the before-after change in health outcomes between the salaried worker, i.e., the treatment group with a higher proportion of members affected by the reform, and the self-employed worker, that is, the control group with a lower proportion of members affected.Footnote 11Footnote 12 Within this DID approach, the estimated effects are deducted by the difference in the participation rate between salaried and self-employed workers, so if the signs of both groups are the same, the DID estimate indicates the lower bound of the magnitude of the effect in absolute value. The interpretation of the DID estimate is discussed further in “Discussion”.

In order to ensure the validity of the DID approach, a number of assumptions must hold: First, it is important for the identification strategy that the relative participation in health checkups does not change between the treatment and control groups. As the checkup has always been mandatory for salaried workers, one would not expect their participation to change, and this has been confirmed above. For self-employed workers, however, the checkup is voluntary and so it is conceivable that their participation in the improved checkups might increase. If their participation rate were to rise to something approximating that of salaried workers, the shrinking difference in participation rates would reduce the effectiveness of the DID approach. However, as discussed above, the participation rate did not change significantly after policy reform for either salaried or self-employed workers (Panel (b) of Fig. 1).

Second, an important assumption for the internal validity of DID is the common trend assumption; namely, that counterfactual changes in outcomes among salaried and self-employed workers must be the same if policy reform is considered to not have occurred. This allows us to measure the effect of the reform. The typical means of testing for this is to check the trends in target outcomes before the reform, but as our dataset includes only one period before the policy reform, we must test this via other means. First, as the JSTAR asks respondents to self-report the change in their health from one year before the survey to the survey date, we use this information to assess whether the trends of salaried and self-employed workers are heterogeneous. Second, as the dataset is comprised of longitudinal data that includes rich information about demographic, economic, and health related variables, we can control for individual observable characteristics related to health such as age and economic condition, as well as time-invariant individual heterogeneity. Third, in order to provide further evidence of a common trend, a placebo regression using a health variable less related to the newly introduced system was estimated. While the details of these validity checks are discussed below, at this point we can state that there is no evidence implying the violation of the common trend assumption.

Estimation equation

Controlling for any potential bias caused by heterogeneity in the trends of salaried and self-employed workers, the estimation equation is as follows:

$$\begin{aligned} y_{it}= & {} \beta _{0} + \beta _{1} SalariedWork07_{i} + \beta _{2} After_{t} \nonumber \\&+ \beta _{3} SalariedWork07_{i} \cdot After_{t} + x'_{it} \delta + \theta _{i} + \epsilon _{it} \end{aligned}$$
(1)

where i and t are indices of individual and time. The dependent variable \(y_{it}\) represents health outcomes such as weight and BMI, and health investment behaviors such as physical activity and eating habits. \(SalariedWork07_{i}\) takes one if the respondent was a salaried worker in 2007 before the policy reform, while \(After_{t}\) takes one after the policy reform. The vector \(x_{it}\) is a set of control variables that includes age dummy variables, marital status, number of children, household income, house ownership, hours of work, stress condition at the workplace, occupation variables,Footnote 13 and prefecture-level macroeconomic variables. Parameter \(\theta _{i}\) captures the unobserved individual fixed effects and parameter \(\epsilon _{it}\) is an unobserved error term. In Eq. (1), parameter \(\beta _{3}\) corresponds to the DID estimate and is the parameter of interest in this study. This captures the difference in the change in the outcome variable between salaried and self-employed workers in 2007. Equation (1) is estimated for both university graduates and non-graduates and the DID estimates are compared in order to discover the heterogeneous effects of education on the outcomes of policy reform.

Verification of common trend assumption

As discussed above, any causal interpretation of the DID estimate requires the common trend assumption to hold. Although our dataset includes only one period before the reform, we can use the 2007 self-reported change in health from the previous year to check for any heterogeneity in the trends. We present the self-reported change in health from 2006 to 2007 by level of education and employment status in Table 1.Footnote 14 We see that the patterns of the self-reported change in health are not statistically significantly different between salaried and self-employed workers for both university graduates and non-graduates. Most of the males in our sample answered that their health conditions were the same as the previous year’s. This suggests that the common trend assumption between salaried and self-employed workers is not violated, at least by what can be captured by self-reported changes in health.

Next, as an additional measure to control for any heterogeneity in the trends between salaried and self-employed workers, observable characteristics were added to the model and a fixed effects estimation was conducted using the panel structure of the JSTAR. In addition to the demographic variables of age, marital status, and number of children, household income and home ownership economic variables were included as control variables because previous studies have shown a relationship between health and economic conditions [e.g., [6, 7, 38]]. Further, because the analysis sample includes the JSTAR data during the financial crisis of 2008 which may have affected workplace and regional economic conditions heterogeneously, workplace-related variables (hours worked, physical stress at workplace, job stress at workplace, occupation category dummy variables), and time-variant regional characteristics (prefecture-level GDP and per capita income) were also included as controls. Additionally, although the accumulation of health stock until middle age and health preferences could also cause heterogeneity in the trends, the fixed effects estimation controls for such time-invariant unobserved individual heterogeneity.

As a final measure to ensure that the common trend assumption is satisfied, placebo regressions were run using mental health measurements and height as the dependent variables. As alluded to above, one concern regarding the validity of the common trend assumption is the financial crisis of 2008. The analysis sample includes the JSTAR data during the financial crisis which could change individual behaviors heterogeneously, resulting in heterogeneous changes in health conditions, including body weight. Therefore, heterogeneous behavior changes due to the financial crisis may cause some violations of the common trend assumption. We run the same estimation model with the measurements of mental health conditions, which have been said to be associated with economic recessions in previous studies, as dependent variables to discuss whether the financial crisis heterogeneously affected individuals.

Height is also used as a dependent variable for the placebo regression as it is a component used to calculate BMI and it is known to decline after middle-age. For Japanese middle-aged males, height loss begins at age 41, and annual height loss is estimated between − 0.01 and − 0.17 cm [34].Footnote 15 Suppose that height declines only among self-employed workers and body weight remains the same among both salaried and self-employed workers. This could produce results implying BMI decline due to the policy revision even without weight loss.Footnote 16 The financial crisis could change individual behaviors heterogeneously, resulting in heterogeneous height loss. In addition, heterogeneous height loss could also occur due to a difference in health decline between salaried and self-employed workers, because significant height loss after middle age is used as a proxy for health decline [12, 19]. To eliminate these possibilities, we also implement the placebo regression using height as the dependent variable.

Analysis sample restrictions

For the estimation, the analysis sample was first restricted to males aged between 50 and 62 because the identification strategy requires workers and in Japan, and males are more likely to be working. Next, although the newly introduced health checkup system was made available to everyone aged 40 and over, a minimum age of 50 was chosen for this study because the JSTAR data included only those aged 50 and over. Meanwhile, the maximum age was restricted to 62 in order to eliminate workers who were eligible to retire with a pension from the sample, as previous studies have found that this affects health conditions and behaviors [e.g., [25, 30,31,32, 41]]. In the first wave of the JSTAR, 62 is the age that people are eligible for a full pension.Footnote 17

Next, the sample was divided into two groups according to their risk of metabolic syndrome before the policy reform because, as mentioned above, individuals at a high risk received objective results of the checkup and guidance from a health expert. Moreover, low risk individuals do not need to change their behaviors because they are already healthy. For these reasons, there is a possibility that the reform effects might be heterogeneous according to the individual’s potential risk of metabolic syndrome.

Fig. 2
figure 2

Source: JSTAR 2009. Sample: Males aged over 50

Relationship between BMI and eligibility for health guidance (2009).

BMI was used as the criterion to divide the sample by health condition before the reform. As explained above, a range of measurements are used to evaluate the risk of metabolic syndrome, but as the 2007 JSTAR includes only BMI information, it is difficult to construct a full picture of the pre-reform risk of metabolic syndrome from that alone. Fortunately, however, the 2009 JSTAR also includes information on girth of abdomen and blood pressure that can be used to gauge an individual’s eligibility for guidance.Footnote 18 A plot of these two pieces of information, i.e., eligibility for guidance and BMI, was used to determine the BMI criterion for dividing the sample into high and low risk. From Fig. 2, we see that the probability of receiving guidance increases as BMI rises, but there is a jump of about 30 percentage points when BMI reaches 23.5, from 40 % at BMI = 23–70 % at BMI = 23.5. Thus, BMI greater than 23.5 became the criterion for defining the high-risk portion of the sample, with others at low-risk of metabolic syndrome. Because those at high risk have an incentive to change their behavior and reduce their obesity, the policy reform effects should be stronger for these individuals; therefore, the analysis sample was further restricted to them.

Estimation results

Descriptive statistics

Table 1 Differences in characteristics between salaried and self-employed workers by education before the policy reform

Before presenting the estimation results, this section discusses relevant descriptive statistics of the sample. First, looking at males between 50 and 62 who were at high risk of metabolic syndrome before the policy reform, Table 1 shows the average values of observable characteristics and the difference in those characteristics between salaried and self-employed workers by level of education in 2007. According to Table 1, after conditioning for health risk before the reform, most of the characteristics are not statistically different between salaried and self-employed workers, regardless of whether they were university graduates or not (Columns (3) and (6)). Average BMI is similar for all groups, ranging from 25.66 to 25.93. Therefore, while we were concerned that if more obese individuals decreased their weight more, this would lead to heterogeneity in the effects of the policy reform on BMI, this concern appears unfounded because BMI is similar across groups. We also notice that salaried workers with university degrees appear to be taller, but the difference is small (about 1.8 %). Next, looking at workplace conditions, we notice that among those with lower education, the self-employed work about 10% more hours than salaried workers and are more physically stressed. This means that salaried workers enjoy more leisure time, which can be used to invest in positive health behaviors, such as exercise. Additionally, the difference in physical effort required at the workplace may affect weight loss. Therefore, because these two differences can cause the estimates to be biased, it is important to control for these workplace conditions, especially for non-graduates.

Fig. 3
figure 3

Source: JSTAR. Sample: Males aged 50–62 whose BMI before the policy reform was greater or equal to 23.5

Changes in BMI among individuals at high pre-reform risk by employment status and level of education.

Before presenting the estimation results for Eq. (1), we first discuss the DID estimates via Fig. 3. Figure 3 summarizes the changes in average BMI among individuals at high pre-reform risk of metabolic syndrome by their pre-reform employment status and level of education. Panels (a) and (b) show the changes for university graduates and non-graduates, recalling that average BMI was approximately the same for all groups in 2007. From Fig. 3, we see that for salaried workers with university degrees, average BMI falls below 25 in 2009 and remains at that level throughout (solid line in Panel (a)), while all other groups show no decrease in average BMI (dashed line in Panel (a) and lines in Panel (b)). This suggests that the goal of the reform, that is, obesity reduction, occurred only for salaried workers, for whom a larger proportion of members were affected by the reform, and only if they had a higher level of education.

While this result does not completely rule out the possibility of bias arising from the common trend assumption failing to hold, in the estimations discussed below, an attempt is made to minimize any such bias by controlling for observable characteristics and time-invariant individual heterogeneity.

Effect on health outcomes

Table 2 Effects of policy reform on body measurements by education (males aged 50–62 with high pre-obesity risk)

This section discusses the estimation results for the effect of the reform on the measurements of BMI, weight, and height. Table 2 shows the estimation results for those at higher risk of obesity before the policy reform by level of education, with Columns (1), (2), and (3) showing university graduates and Columns (4), (5), and (6) showing non-graduates. Fixed effects (FE) estimation was used in all specifications to control for individual time-invariant heterogeneity, and the DID estimate, \(\beta _{3}\) of Eq. (1), is reported.

According to Table 2, the DID estimate of BMI for university graduates at higher risk of obesity before the policy reform is − 1.065 and statistically significant at the 1% level after controlling for observable characteristics and unobserved time-invariant heterogeneity (Column (1)). Since average BMI for salaried workers with university degrees before the policy reform was 25.78, this means that their BMI decreased by about 4.1% to approximately 24.72. A medical study by Tsugane et al. [39] analyzing the relationship between BMI and all-cause mortality for middle aged Japanese finds that the mortality profile for males has a U-shape and bottoms out at a BMI range of 23.0–24.9. Consequently, we can interpret the observed reduction of BMI among university graduates to within this range as an improvement in their health condition. The effect of the policy reform on BMI is non-linear: the higher the pre-reform BMI, the larger the magnitude of BMI reduction due to the policy reform.Footnote 19 As explained in “Institutional background”, the higher the risk of metabolic syndrome, the more intense the health guidance that is implemented. The non-linear effects of the policy reform on pre-reform BMI may reflect the difference in the intensity of health guidance by the risk of metabolic syndrome.

Additionally, among university graduates, we find a statistically significant 4.0% weight loss (Column (2)) but no significant change in height (Column (3)), suggesting that the decline in BMI is associated with the weight loss.Footnote 20 From this we can further infer, as discussed earlier, that the lack of significant height shrinkage suggests that the BMI and weight estimates are less likely to have suffered from bias due to heterogeneous health shocks. In addition, there are no statistically significant changes in mental health conditions, suggesting that the results of BMI and weight are less likely to be suffering from bias due to the financial crisis of 2008.Footnote 21

In contrast to university graduates, non-graduates show no statistically significant changes in any of the body measurements (Columns (4), (5), and (6)), indicating that health improvement with weight loss is observed only among university graduates at higher risk of obesity.Footnote 22 From this, it appears that the policy reform was effective only for individuals with a higher level of education.

To round out the analysis, for individuals at lower risk of obesity before the policy reform, we found no statistically significant changes in any of the body measurements (Table 12) for either university graduates or non-graduates. This is not surprising, as individuals who are at lower risk of obesity are less likely to have an incentive to change their health condition because they are already healthy, at least by this measure.

Effect on health investment behaviors

Table 3 Effects on health behaviors (University graduates with high pre-obesity risk)

This section discusses the estimation results for the effect of the reform on health investment behaviors related to obesity, focusing on physical activity, energy intake, drinking habits, and eating habits. Table 3 shows the estimation results for university graduates at higher pre-obesity risk. To measure physical activity, a dummy variable was constructed that takes a value of one if the respondent both walks 90 minutes on a normal day and takes part in some form of physical activity on holiday (Column (1)). I constructed the distance between the distance between actual and ideal levels of energy intake as a measure of energy intakes (Column (2)). I defined the ideal level as the estimated energy requirement (the level of energy intake required to maintain current body weight)(hereafter referred to EER) for individuals with low physical activity.Footnote 23, Footnote 24 Column (3) reports estimates for a drinking dummy variable which takes value 1 if alcohol intake is greater than zero, and Columns (4) to (8) show the results for the daily intake of staple foods, main dishes, meat dishes, fish dishes, and vegetables, where main dish intake is the sum of meat and fish dishes intake. Column (9) shows the result for the ratio of fish dishes to main dishes intake.

From Table 3, the DID estimate for the physical activity dummy variable is positive and statistically significant, and the magnitude of the estimate can be interpreted as a 197 % increase in the amount of physical activity compared to the average value for salaried workers before the policy reform (Columns (1)). The DID estimate for the energy intake variable is negative and statistically significant, indicating that, for university graduates, the actual energy intake comes close to the ideal energy intake for weight loss (Columns (2)). The results suggest that, among university graduates, energy expenditure increases, and energy intake reaches the ideal level, which should result in weight loss. Further, among university graduates, salaried workers, compared to self-employed workers, stopped drinking after the policy reform at a significance level of 10 %, and the magnitude can be interpreted as a 26.87 % decline compared to the average before the policy reform (Column (3)). Salaried workers also statistically significantly changed their daily intake of meat and fish dishes, but not of staple foods, main dishes, or vegetables (Columns (4) to (8)). As the DID estimates for meat and fish dishes have opposite signs (meat is -63.843 and fish is 38.332) and the estimate for the ratio of fish to main dish intake is positive and statistically significant, this suggests that university graduates replaced meat with fish after the policy reform.Footnote 25 In contrast, among non-graduates, no systematic changes in energy intake was observed while physical activity decreased (Table 13). Therefore, systematic changes in health behaviors were observed only among university graduates.

To sum up, we found that among university graduates at higher risk of obesity prior to the health policy reform, the revised health checkup resulted in changes in both health condition and health investment behaviors. BMI decreased to within the range associated with minimal all-cause mortality for middle aged Japanese men, and physical activity and eating habits improved. For all other groups, however, no change was observed in either health outcomes or behaviors.

Discussion

Interpretation of the DID estimates and comparison of university graduates and non-graduates

In the previous section, educational heterogeneity in the response to the policy reform is analyzed by comparing the DID estimates of university graduates and non-graduates. This section discusses three possible situations that complicate the comparison of the different effects between university graduates and non-graduates even when the common trend assumption between the treatment and control groups (salaried and self-employed workers) is satisfied. These are: (1) when the participation rate of salaried and self-employed workers is different, which is the case as this difference was important for our identification strategy, (2) when the health condition of salaried and self-employed workers is different prior to the reform, and (3) when employment status might be related to company size.

First, as discussed in “Identification strategy”, the DID estimate is interpreted as the lower bound of the magnitude of the effect of the policy reform in absolute value. Specifically, the estimate corresponds to the effect of the policy reform deducted by the difference in health checkup participation rates between salaried and self-employed workers. Therefore, it is difficult to compare the DID estimates for university graduates and non-graduates when the participation rates of salaried and self-employed workers among these two groups differ. For example, suppose that among university non-graduates the health checkup participation rate for salaried and self-employed workers is similar, so that the difference in participation rate is close to zero. Suppose, however, that there is a difference among university graduates. In such a situation, the DID estimate for non-graduates would be smaller in absolute value than for graduates even if the magnitude of the effect of the policy reform is the same for both the graduates and non-graduates. This occurs because there is no difference in the participation rate-which is the source of the difference in treatment intensity among the treatment and control groups-only for non-graduates.

Table 4 Differences in participation rate in health checkups by level of education

To investigate this possibility, Table 4 shows the difference in health checkup participation rates by level of education using two additional datasets: the Comprehensive Survey of Living Conditions (Panel A) and the Longitudinal Survey of Middle-aged and Elderly Persons (Panel B). The first and second columns show the participation rates of salaried and self-employed workers, and the third column shows the difference in their participation rates. The three rows of each panel show the results for the whole sample, that is, university graduates and non-graduates. Considering both panels, the difference in the participation rate ranges from 29 to 38 percentage points, with the difference for university graduates larger than for non-graduates (0.35 vs 0.29, and 0.38 vs 0.34), a single-digit percentage point difference. However, as seen in Table 2, the DID estimate of BMI for university graduates is several orders of magnitude larger (about 50 times larger) than for non-graduates (Column (1) vs Column (4)). Consequently, we can infer that the difference in the checkup participation rates of the two groups is not a major source of the difference in the estimation results.

A second concern in interpreting the DID estimates occurs when there is a difference in the pre-reform health conditions of salaried and self-employed workers because, as explained in “Institutional background”, the newly introduced system is comprised of two programs, health checkup and health guidance, and the specific intervention received depends on the participant’s assessed risk of metabolic syndrome. If salaried workers were less healthy than self-employed workers before the policy reform, they would have been more likely to receive health guidance than self-employed workers, causing the intensity of the effect of the policy reform to be larger among salaried workers. This would be problematic because it would conflate the DID estimates, with the difference in the DID estimates of university graduates and non-graduates perhaps only capturing the different health tendencies. For this reason, before conducting the estimation procedure, the sample was divided according to pre-reform health condition in order to address this issue.

Fig. 4
figure 4

Source: General survey on working conditions 2007

Relationship between firm size and provision of extra health checkup items.

Fig. 5
figure 5

Source: General survey on working conditions 2011. Average monthly cost per capita

Relationship between firm size and benefit expenses not required by law.

A third issue regarding the interpretation of the estimation results involves a possible conflation of employment status and workplace conditions. Recall that we found that only university graduates among salaried workers significantly changed their health conditions relative to self-employed workers. In Japan, salaried workers with university degrees are likely to work at large companies, which often provide more fringe benefits than smaller companies, and this could include extra health checkup items beyond those items required by law. If this is the case, then the significant effects observed for salaried university graduates might merely capture large company trends, thus we want to eliminate this possibility. According to the 2007 General Survey on Working Conditions, the ratio of firms providing extra health checkup items in addition to those required by law was almost the same regardless of firm size (Fig. 4). Similarly, the 2011 report shows that the cost to firms of fringe benefits not obligated by law related to health checkups is almost the same for all firm size categories (“health checkup” in Fig. 5). However, the cost of medical and health services for companies with over one thousand employees is approximately three times more than for smaller companies (“medical and health services” in Fig. 5).Footnote 26 In order to address this issue, the model was estimated again including a firm size variable,Footnote 27 and the results were found to be robust after this addition (Table 14).

To sum up, we found that the three potential issues raised in this section are not of concern, and so the DID estimates of the university graduates and non-graduates are comparable for discussing the heterogeneity of the response to the policy reform by level of education. The next section discusses possible reasons why the DID estimates are heterogeneous by level of education.

Causes of heterogeneity by level of education and policy implications

This section discusses possible reasons for the observed heterogeneity by level of education in the responses to the health checkup, followed by policy implications. One possible explanation for the heterogeneous responses is a difference in cognitive functioning. In recent studies of the relation between education and health, there is much discussion about the relative contribution of variables at early stages of life, including cognitive and non-cognitive skills and socioeconomic background in childhood [4, 8]. According to Bijwaard et al. [4], at least half of the difference in the survival probability between educational groups can be explained by the selection of education choice, based mainly on cognitive skill. While Bijwaard and Van Kippersluis [3] finds that people with higher education are more efficient producers of health investment to survival probability than people with lower education; when cognitive skill is accounted for, the difference disappears.

In order to test this cognitive functioning hypothesis, the JSTAR memory and numeracy skill tests of cognitive functioning were used.Footnote 28 Additionally, the discount rate, health preferences, and self-reported probability of being alive at age 80 are used to discuss other possible explanations of the heterogeneous response.Footnote 29

Table 5 Differences in characteristics of salaried and self-employed workers by education

First, according to Table 5, university graduates have statistically significantly higher cognitive functioning scores than non-graduates on the word recall test of memory functioning and the serial 7s numeracy test (8.0% and 5.3%, respectively). Moreover, the proportion of individuals whose word recall and serial 7s test scores are greater or equal to the 3rd quartile point is higher for university graduates than non-graduates (48.5% higher and 17.5% higher, respectively). In contrast, differences between university graduates and non-graduates on other characteristics such as the discount rate and self-reported probability of being alive at age 80 are not significant.

Next, in order to decompose the educational heterogeneity of the policy reform response into cognitive functioning and other factors, the university graduate and non-graduate samples were combined, and the model was estimated with the interaction terms of the DID term and variables, including university dummy variable, health preferences, cognitive functioning test score, discount rate, and self-rated probability of being alive at age 80. For this estimation, the latter three dummy variables took a value of one if the response was equal to or greater than the 3rd quartile point.

Table 6 Decomposition

Columns (1)–(3) of Table 6 show the results of the estimation without any interaction terms other than the university dummy variable. We see that the coefficients for the interaction of DID and university (“\(\times\) Univ. = 1”) on BMI and weight are negative and statistically significant (Columns (1) and (2)), indicating that the DID estimates of BMI and weight are statistically significantly different between university graduates and non-graduates. However, after adding the other interaction terms, the coefficients for the interactions of DID and university on BMI and weight are no longer statistically significant (Columns (4) and (5)). Further, the coefficients for the interaction of DID and cognitive functioning test scores (“\(\times\) Serial7 score \(\ge\) 3rd quartile” and “\(\times\) Word Recall score \(\ge\) 3rd quartile”) on BMI and weight are statistically significantly negative while all other interaction terms are insignificant. This implies that cognitive functioning is one of the key factors explaining the educational heterogeneity of the responses to the health policy reform.Footnote 30

This discussion concludes with a limitation of the estimation related to cognitive functioning test scores. Recall that the identification strategy relies on the difference in the participation rate between salaried and self-employed workers. As discussed above, if the difference in the participation rate were to differ between people with higher and lower cognitive functioning test scores, it would be difficult to interpret the coefficient of the DID and high cognitive test score interaction term. Because the Comprehensive Survey of Living Conditions and the Longitudinal Survey of Middle-aged and Elderly Persons do not include information about cognitive abilities, we were not able to conduct a secondary analysis to verify this. Therefore, we cannot unequivocally state that the coefficients of these interaction terms represent heterogeneity in the effects rather than heterogeneity in the participation rate of the health checkup.

Conclusion

This study analyzed the effects of a policy reform of the health checkup system in Japan aimed at giving participants more objective information about their risk of obesity and health guidance from professionals on the health outcomes and health investment behaviors of participants, finding heterogeneity in the effects by level of education. According to the results of the estimation conducted using a DID framework, the estimates of BMI and weight were statistically significantly negative for university graduates at higher risk of obesity, indicating that the reform of the health checkup produced favorable health outcomes for these participants, who also improved their health investment behaviors including physical activity and eating habits. There were no significant changes observed, however, for non-graduates or even for graduates with relatively low risk of obesity either in terms of health outcomes or investment behaviors. The differences in the DID estimates between university graduates and non-graduates were statistically significant. These results suggest that highly educated people are more likely to respond to a health policy reform to change their behavior and improve their health condition.

In order to discover why this might be, a secondary analysis was conducted and revealed that cognitive functioning appears to be a key factor in explaining the educational heterogeneity of the responses to the policy reform. The descriptive statistics show that the university graduates in the analysis sample have statistically significantly higher cognitive functioning test scores than non-graduates, and a secondary analysis adding the interaction terms of DID and high cognitive test score dummies found these interaction terms to be statistically significantly negative. However, the difference in the DID estimates between university graduates and non-graduates was no longer significant, indicating that the reform was effective not strictly for university graduates but only for individuals with high cognitive functioning scores.

One possible mechanism is that it is easier for professionals such as doctors, public nurses, and dietitians to teach individuals with high cognitive functioning scores (i.e., they are more “teachable”). They may be more capable of efficiently processing the information from the checkup and subsequent guidance because of their higher cognitive functioning or because of more experience of being taught in their lives. In Specific Health Guidance, as discussed in “ Institutional background”, the participants are required to set their weight loss goals and make a lifestyle change plan to achieve them. They are also required to update their goals and plans as necessary. Thus, it is necessary for the participants to be able to process the information from the professionals, and teachability could be one of the key factors for the success of health guidance.

The policy implication is that interventions should be provided depending on individual characteristics, including not only pre-intervention health risk, but also teachability. For the interventions to be more successful, more intense interventions may be necessary for individuals who lack teachability. For the Japanese health checkup programs, a more effective health checkup with more wide-ranging effects could be provided through a more accessible presentation of health information and guidance that includes a clearly articulated explanation of the effects of specific identified risk factors and directly linking these risk factors to a concrete individualized action plan, especially for individuals who lack teachability. Health promotion policies that take individuals’ teachability into account could also have an important role in achieving health improvement for the entire population in other policy settings, and in other countries as well.

The limitations of the study that should be addressed in the future are as follows. First, this study holistically analyzed the effect of the policy reform of the Japanese checkup system which is now a two-part system of assessment and guidance, but it did not separate the relative effects of the assessment and guidance. Decomposing the reform effects into these two components is the subject of future work, as it may shed additional light on the observed educational heterogeneity of the reform. Second, as the first wave of the JSTAR only included five Japanese cities, and the sample size is relatively small, we need to pay attention to the external validity of the results. An expansion of this study to a more generalized population could also be the subject of future work. Third, although we tried to examine the validity of the common trend assumption by looking at self-reported health changes before the policy reform, using the observable characteristics as control variables, and implementing placebo regressions, we cannot confirm the parallel trend of the outcome variables between the treated and control groups, which is the typical way to discuss validation of the common trend assumption used in recent studies using the DID strategy. This is because, as explained in “Identification strategy”, we only have one period before the policy reform. Fourth, previous studies have argued that BMI is not a perfect measure of obesity. One reason is that BMI relies not only on fat, but also fat-free mass such as muscle and bone [5].Footnote 31 Using other measures of obesity such as waist circumference and fat mass as outcome variables could be another subject of future work.