Does moderate weight loss affect subjective health perception in obese individuals? Evidence from field experimental data

This paper analyzes whether moderate weight reduction improves subjective health perception in obese individuals. Besides simple regression models, we use randomized monetary weight loss incentives as an instrument for weight change in a simultaneous equation framework, to address possible endogeneity bias. In contrast to related earlier work that also employed instrumental variables estimation, identification does not rely on long-term, between-individual weight variation, but on short-term, within-individual weight variation. Yet, our results do not suggest that the simple regressions suffer from much endogeneity bias, since instrumental variables estimation yields similar, though far more noisily estimated and statistically insignificant, estimates. In qualitative terms, our results do not contradict previous findings pointing to weight loss in obese individuals resulting in improved subjective health. Our results suggest that a reduction of body weight by one BMI unit is associated with an increase in the probability of reporting self-rated health to be ‘satisfactory’ or better by 3 to 4 percentage points. This finding may encourage obese individuals in their weight loss attempts, since they are likely to be immediately rewarded for their efforts by subjective health improvements.


Introduction
It is well documented in the literature that excessive accumulation of body fat (obesity) is associated with many undesirable health outcomes such as heart disease (Hubert et al. 1983), type 2 diabetes (Mokdad et al. 2003), and several forms of cancer (Calle et al. 2003). A recent meta-analysis (Di Angelantonio 2016) even finds that obese individuals face a higher risk of all-cause mortality than their normal-weight counterparts. Although, at least in western societies, the general public by now seems to be well aware of the health risks associated with obesity (Tompson et al. 2012), its prevalence is at an all-time high and still increasing worldwide (WHO 2000; Ng et al. 2014).
Even for moderate weight loss (5-10% of body weight) in obese individuals, substantial benefits for objectively measurable health outcomes, such as blood lipid profiles or cardiovascular risk factors, have been established (Blackburn 1995; Wing et al. 2011). However, despite likely health benefits from losing weight, many obese individuals struggle to achieve even small, sustained reductions in body weight. This ubiquitous everyday experience is also well documented in the empirical literature. In a systematic review of long-term weight management schemes, Loveman et al. (2011), for instance, find that short-run reductions of body weight are commonly offset by subsequent weight regain. A better understanding of the mechanisms that make weight loss sustainable and of the factors that make weight loss efforts fail is, hence, crucial for battling the obesity 'epidemic'.
One possible explanation is that moderate weight loss does not sufficiently improve perceived health in the short term. Objective health measures, for which beneficial effects are well established, do not necessarily reflect patients' subjective health perception. Yet the latter is likely to matter much for health- and obesity-related behavior. If one achieves some weight loss with great effort without feeling better, it may be tough to keep up the discipline to maintain or further reduce one's body weight.
In order to contribute to this discussion, we empirically address the question of whether moderate weight loss causally influences the subjective health perception of obese individuals. Several analyses have examined the relationship between self-rated health (SRH) and excess body weight. The vast majority of the existing literature finds a significant negative association, that is, poor self-rated health accompanies obesity. Using a national survey of Americans, Ferraro and Yu (1995) find that, even after controlling for morbidity and functional limitations, obese individuals have a higher probability of bad self-rated health than normal weight individuals. Okosun et al. (2001) confirm this finding, also analyzing a sample of Americans. Phillips et al. (2005), Prosper et al. (2009), and Baruth et al. (2014) are further, more recent examples of analyses yielding similar results based on US data. This general pattern is not confined to studies using data from the USA. Guallar-Castillón et al. (2002), for instance, analyze a sample of Spanish women and find that overweight and obese individuals are significantly more likely to report poor health than normal weight women. Molarius et al. (2007) find that overweight (body mass index [BMI] ≥ 25 kg/m²) and obese (BMI ≥ 30 kg/m²) Swedes have a higher probability of rating their health as poor than normal weight survey respondents. Using data from Finland, Johansson et al. (2009) establish a statistically significant and negative correlation between self-assessed good health and any measure of overweight they consider in their analysis, specifically raw weight, fat mass, waist circumference, and BMI. This result holds for both men and women. Examining health surveys from Portugal and Switzerland, Marques-Vidal et al. (2012) find that obese subjects rate their health significantly worse than their normal weight counterparts. This also holds for UK residents, as shown in Ul-Haq et al. (2013b).
Only very few empirical studies yield mixed findings or do not find a significant association between self-rated health and obesity at all. Looking at American cross-sectional data over a time span of 30 years, Macmillan et al. (2011) confirm the above pattern for women. Yet, for men, the association between obesity and SRH is weaker and only significant in roughly half of the considered years. Imai et al. (2008) find that the association of BMI and SRH varies significantly across ages and sexes. They generally confirm previous findings, stating that being underweight or severely obese is associated with bad SRH. However, they find no significant association for obese men older than 65. Darviri et al. (2012) find no significant association between SRH and BMI for a rural population in Greece, and neither do Kepka et al. (2007) using a sample of Hispanic immigrants in the USA.
Although a close association of excess body weight and SRH is very well documented in the literature, the question remains unsettled whether excess body weight causally affects self-rated health. Such an effect is crucial if subjectively perceived health improvements are to encourage obese individuals in their weight loss efforts. However, the mere correlation may just capture the influence of confounding third factors, such as certain lifestyles that affect both body weight and self-perceived health. An example of a confounding third factor is sleep duration. Studies find short sleep duration to be associated with poor self-rated health (Frange et al. 2014) as well as obesity (Patel and Hu 2008). Stress may serve as another example of such confounding factors. An increase in stress is likely to have detrimental effects on self-rated health. At the same time, stress may induce overeating (Zellner et al. 2006). Moreover, reverse causality may also be an issue. One may, for instance, think of individuals who feel well and healthy and are motivated by this to practice an active lifestyle that prevents them from becoming overweight.
The above-mentioned studies analyze the relationship between inter-individual weight variation and self-rated health in cross-sectional data sets. They spend little effort on establishing causality in the link between SRH and obesity. One notable exception in this literature is Cullinan and Gillespie (2016), who employ instrumental variables estimation to identify a causal link. Following several examples from the literature (Ali et al. 2014; Cawley and Meyerhoefer 2012; Sabia and Rees 2011; Kline and Tobias 2008; Lindeboom et al. 2010), they use the body weight of biological relatives (children) as instrumental variable. This choice of instrument seems to be well justified by evidence from adoption (Vogler et al. 1995; Sacerdote 2007) and twin studies (see Elks et al. 2012; Maes et al. 1997, for surveys of this literature), which suggests that shared genetics explain intra-family correlation of BMI much better than the shared social environment. 1 However, despite the major importance of genetic disposition, household-level environmental conditions may still play some role for intra-family correlation of body weight. 2 They may, in turn, contaminate biological relatives' body weight as an instrument, since such conditions may also matter for health and subjective health perception. More importantly, even if close relatives' body weight is a valid and strong instrument for the level of BMI or overweight status in a cross section of data, it can hardly be used as instrumental variable if the analysis is concerned with the effects of relatively small changes in body weight, which are observed over a relatively short period of time.
This is precisely the focus of the present analysis that aims at identifying subjective health effects of a moderate, short-term weight loss in obese individuals. Our contribution is to develop an empirical strategy that allows for identifying such intra-individual short-term effects. Following Cullinan and Gillespie (2016) and earlier work, we rely on instrumental variables estimation to establish a causal link. Yet, we do not adopt their instrument, which provides an exogenous source of variation in the long-term level of BMI. We rather make use of a randomized controlled experiment that exogenously induced short-term variation in body weight, 3 and hence provides a basis for identifying short-term effects attributable to moderate weight loss.
To summarize our analysis in a nutshell, we use data on 695 obese patients of four rehabilitation clinics, who voluntarily participated in a field experiment. Upon discharge, all participants were set an individual weight-loss target, which they were prompted to realize within 4 months. The participants were randomly assigned to one control group and two incentive groups. Only the latter could earn monetary rewards of up to €150 or €300, respectively, for successfully reducing body weight. The participants were asked about their subjective health both by the end of the rehab stay and by the end of the 4-month weight-loss phase. Weight loss over 4 months turns out to be significantly associated with self-assessed health reported by the end of the weight-loss phase. In an instrumental variables (IV) estimation approach, we use only the weight variation that is externally induced by the monetary incentives for identification. In the IV estimation, the point estimates do not change much compared to the simple 'naïve' estimation approach. Yet the estimates become far noisier, so that the IV estimates cannot be judged statistically significant. Still, since statistical tests do not point to endogeneity being a major issue and since the point estimates are similar, we regard the IV results as in concordance with our earlier findings. In quantitative terms, our results suggest that reducing body weight by one BMI unit increases the probability of rating one's health as 'satisfactory' or better by roughly three percentage points.
The remainder of this paper is structured as follows: In Sect. 2, we introduce our data. In Sect. 3, we describe our estimation procedure. In Sect. 4, we show the results of our estimations. Finally, in Sect. 5, we summarize and discuss our main findings and present a conclusion.
1 A closely related identification strategy is to directly use genetic information as instrument for body weight. A few recent contributions (Norton and Han 2008; Fletcher and Lehrer 2011; von Hinke et al. 2016; Willage 2018), which consider different outcomes than subjective health, have adopted this strategy.
2 Price and Swigert (2012) find substantial weight differences among siblings who reside in the same households. The authors mention differing parental behaviors across siblings as a possible explanation.
3 Reichert (2015) and Reichert et al. (2015) use the same source of exogenous weight variation but consider different outcomes than health.

The field experiment
The data used in the present analysis originate from a field experiment that was conducted by RWI-Leibniz-Institut für Wirtschaftsforschung. Its prime objective was to test whether monetary incentives are an effective instrument for assisting obese individuals in losing body weight. Four medical rehabilitation clinics operated by the German Pension Insurance of the federal state of Baden-Württemberg and the association of pharmacists of Baden-Württemberg cooperated with RWI in this project. The Pakt für Forschung und Innovation, which is part of the excellence in research initiative of the German federal government, provided funding. The study protocol of the project was approved by the ethics commission of the Chamber of Medical Doctors of Baden-Württemberg. See Augurzky et al. (2018) and Augurzky et al. (2014) for a more detailed discussion of the project.
Upon admission to one of the four involved clinics, 695 4 obese individuals were recruited for participation in the experiment between March 2011 and August 2012. The medical staff in charge was advised to approach any new patient whose BMI exceeded 30 5 and to invite him or her to take part in the experiment. Yet, participation was entirely voluntary and had no consequence for any treatment or advice the patient received over their rehab stay, which usually takes 3 weeks. The prime objective of rehab stays in these clinics is to preserve, or to restore, patients' ability to work. Our study population is hence biased toward the working population, which is, however, no challenge to the internal validity of our analysis. For the vast majority of participants, obesity was not the prime reason for being sent to rehabilitation. Yet, many suffered from health problems related to overweight, such as chronic back pain. Hence, all obese patients, irrespective of participation in the experiment, were advised to reduce their body weight.
At rehab discharge, participants' body weight was measured again and participants were set an individual weight-loss target by the physician in charge, which they were prompted to realize within 4 months. Physicians were asked to choose a weight-loss target of about 6-8% of current body weight. Yet they were in principle free to deviate from this guideline. Near the end of the rehab stay, the participants received a questionnaire, which they were prompted to answer. The questionnaire covered a wide range of questions regarding socio-economic characteristics and weight-related behavior, such as exercising or eating habits. Most importantly, participants were also asked about their current health status. Two health questions addressed self-rated health and physical well-being in a standard fashion. The questionnaire was collected (in a sealed envelope) at the appointment with the physician in which the weight-loss target was fixed.
4 Originally 700 patients were recruited, yet five had to be excluded because of ex-post violation of the inclusion criteria (pregnancy, developing cancer) or missing documents.
5 In addition to BMI > 30, a detailed list of inclusion and exclusion criteria needed to be met. The inclusion criteria were: age between 18 and 75 years and residence in the federal state of Baden-Württemberg. The exclusion criteria were: pregnancy, psychiatric illness, eating disorder, carcinosis within the past 5 years, drug or alcohol abuse, a significant language barrier, and a severe generalized disease. Since the latter, broadly defined criterion refers to a generalized disease, local diseases that affect only specific organs or bodily functions were in principle no criterion for exclusion.
Right after rehab discharge, the participants were randomly assigned to one control and two treatment/incentive groups, and subsequently informed about the result of the randomization by regular mail (intervention). While in this letter all participants were prompted to realize their weight-loss target, treatment group members were informed about the monetary reward they could earn by being successful in losing weight. 6 For one treatment group, the maximum reward was €150; for the other, it was €300. If participants failed to realize at least 50% of the contractual weight loss, they did not receive any money. If they were partially successful, i.e., they lost more than 50% but less than 100% of the contractual weight loss, they were rewarded proportionally to the degree of target achievement. 7 By the end of the 4-month weight loss period, all participants received another letter, by which they were prompted to visit a specified pharmacy in a specific week for a weigh-in. Body weight measured in the pharmacy served as the basis for the payout of rewards. Upon attending the weigh-in, all participants, irrespective of the experimental group they were assigned to, received an expense allowance of €25. Each letter was accompanied by a questionnaire, which included the same set of questions as the questionnaire the participants had answered by the end of the rehab stay. In particular, the questions addressing subjective health were exactly the same and made no reference to information the patients had provided earlier.
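The payout rule described above can be illustrated by a minimal sketch. The function name and arguments are hypothetical. Two details are assumptions rather than statements from the text: achieving exactly 50% of the target is treated as partially successful, and the reward is capped at the group maximum once the target is met or exceeded.

```python
def reward(target_kg: float, lost_kg: float, max_reward: float) -> float:
    """Illustrative payout in euros for one participant of an incentive group.

    target_kg:  contractual weight-loss target set at rehab discharge
    lost_kg:    weight loss measured at the pharmacy weigh-in (negative = gain)
    max_reward: maximum reward of the treatment group (150 or 300)
    """
    if target_kg <= 0:
        raise ValueError("target_kg must be positive")
    achievement = lost_kg / target_kg  # degree of target achievement
    if achievement < 0.5:
        # Less than half of the contractual weight loss: no money.
        return 0.0
    # Proportional reward, capped at the maximum (cap is an assumption).
    return max_reward * min(achievement, 1.0)


if __name__ == "__main__":
    print(reward(8.0, 3.0, 300.0))   # below 50% of the target
    print(reward(8.0, 6.0, 300.0))   # 75% of the target
    print(reward(8.0, 10.0, 150.0))  # target exceeded
```

The expense allowance of €25 for attending the weigh-in is paid to all participants and is therefore not part of this function.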
The experiment included two further phases: a 6-month weight-maintenance phase, which directly followed the weight reduction phase, and a subsequent 12-month follow-up phase. In the weight-maintenance phase, participants who were at least partially successful in meeting their weight-loss target were offered another monetary reward for not exceeding their target weight. In the follow-up phase, participants were not exposed to any monetary incentives for weight loss. In both phases, the weigh-in procedure was the same as for the weight-loss phase. The present analysis only uses information up to the end of the weight reduction phase. The reason for this is that in the weight reduction phase the exogenous source of weight variation, i.e., being a member of the control or the treatment arm of the experiment, is clearly random by the design of the experiment. This applies less to the subsequent weight-maintenance phase, since the second randomization was conditional on success in the previous phase.
6 At recruitment, all participants were informed about the design of the experiment (randomization, monetary rewards). Control group members, hence, knew that they missed the chance of financially benefitting from losing weight. In consequence, the intervention may have had adverse motivational effects in the control group. Indeed, 55% of the members of this group reported (in the second survey) disappointment about the randomization outcome. Twelve individuals even reported having eaten more in response to not being assigned to an incentive group. Nevertheless, the data do not reveal a significant correlation between the level of disappointment and the achieved weight loss in the control group. Moreover, possible adverse effects on the weight loss motivation are no challenge to using the group assignment as instrument. It still provides an exogenous source of weight variation, and the monotonicity assumption is not violated, since possible adverse motivational effects operate in the same direction as the lack of financial incentives.
7 One may suspect that individuals who participate in the experiment are more likely to be motivated to lose weight than obese individuals in the general population. Due to the random treatment assignment, this is, however, no threat to the internal validity of our analysis. Yet, we cannot rule out that the effect of weight loss on health in our sample differs from that in the general obese population, which is why external validity may be limited.
The econometric analysis rests on information that was collected at rehab discharge and by the end of the weight-loss phase. While the information regarding body weight is complete for the first time of measurement, this does not hold for the second, since roughly one-fourth of the participants did not attend the weigh-in by the end of the weight-loss phase. In consequence, weight-change information is available for only 517 participants. Augurzky et al. (2018) comprehensively discuss the issue of experiment drop-out and its possible implications. Using a battery of different econometric techniques, they find that the results are rather robust to correcting for selective drop-out. Unlike body weight, which was measured in the clinic or the pharmacy, the information regarding self-rated health and physical well-being was collected through a written questionnaire. This renders item nonresponse an issue, which further reduces the size of the estimation sample to 485 individuals in the self-rated health estimation and 468 in the physical well-being estimation, for which weight and health information is available for both times of measurement.

Variables used in the empirical analysis
We employ two variables to measure the outcome, subjective health perception: (i) self-rated health (SRH) and (ii) physical well-being (PWB). Self-rated health is measured by asking the respondents "how would you describe your current health status?" and allowing for five possible answers: "excellent", "good", "satisfactory", "poor" and "bad". 8 Physical well-being is measured by asking the respondents "how would you describe your current physical well-being?", allowing for the same five possible answers.
While both variables measure subjective health perception, they potentially capture different aspects of it. PWB emphasizes subjectiveness in health perception even more strongly, while SRH leaves more room for objectifying the reported health status. For instance, an obese individual without any health impairments might rate her physical well-being as excellent. At the same time, she is probably aware that her excess weight is a risk for her health. Although feeling healthy, she might therefore report a relatively poor SRH to account for potential health risks.
While every questionnaire the participants were asked to fill in included questions about SRH and PWB, the present empirical analysis focuses on SRH and PWB as reported by the end of the 4-month weight reduction phase. These variables, denoted as SRH₁ and PWB₁, enter the econometric model on the left-hand side. 9 The analysis also makes use of self-rated health and physical well-being reported at rehab discharge, i.e., at the outset of the weight reduction phase. As single-item measures that do not refer to any objective health indicator but are purely subjective in nature, SRH and PWB are well suited for analyzing self-perceived rather than objectively measured health effects. Table 1 displays the (joint and marginal) sample distribution of SRH for both considered times of measurement. Not surprisingly, since all respondents underwent medical rehabilitation for some reason, the share of individuals who regarded themselves as being in excellent or good health is smaller than in general population surveys such as the German Socioeconomic Panel (SOEP). Nevertheless, SRH exhibits substantial heterogeneity between individuals. From Table 1, it also becomes obvious that self-rated health varies considerably at the individual level over the observation period. 10 For 54% of the participants, we observe a change in SRH (off-diagonal elements in Table 1), while 46% report the same category of SRH at the beginning and by the end of the weight reduction phase (values highlighted in bold in Table 1). 60% of all changes are improvements in SRH (cells above the principal diagonal). Among the participants who reported SRH changes, 81% report a change to an adjacent category. Yet, some rather drastic shifts in SRH, e.g., from 'excellent' to 'poor' or the other way round, are observed.
8 A wide variety of methods to assess subjective health perception have been suggested in the literature. These methods include multi-item measures as well as single-item measures. An example of a multi-item measure is the often used Medical Outcomes Study Short Form 36 (SF-36) (Ware et al. 1993). Most studies using the SF-36 find obesity to be associated with poor subjective health perception (see Kroes et al. 2016; Ul-Haq et al. 2013a; Kolotkin et al. 2001; Fontaine and Barofsky 2001, for reviews).
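The shares read off the cross-tabulation of SRH at the two times of measurement (share of changes, share of improvements among changes, share of moves to an adjacent category) can be computed as in the following minimal sketch. The function name is hypothetical, and the counts in the usage example are made up for illustration; they are not the values of Table 1. The sketch assumes that rows index the category at rehab discharge, columns index the category at the end of the weight-loss phase, and categories are ordered from worst to best, so that cells above the principal diagonal are improvements.

```python
def transition_summary(table):
    """Summarize a square cross-tabulation of an ordinal variable at two dates.

    table[i][j] = number of individuals in category i at period 0 and
    category j at period 1, categories ordered from worst to best (assumption).
    """
    n = len(table)
    total = sum(sum(row) for row in table)
    unchanged = sum(table[i][i] for i in range(n))  # principal diagonal
    changed = total - unchanged
    if changed == 0:
        raise ValueError("no within-individual changes in the table")
    improved = sum(table[i][j] for i in range(n) for j in range(n) if j > i)
    adjacent = sum(
        table[i][j] for i in range(n) for j in range(n) if abs(i - j) == 1
    )
    return {
        "share_changed": changed / total,
        "share_improved_among_changes": improved / changed,
        "share_adjacent_among_changes": adjacent / changed,
    }


if __name__ == "__main__":
    # Made-up 3x3 example, not data from the paper.
    example = [[2, 1, 0],
               [1, 2, 1],
               [1, 0, 2]]
    print(transition_summary(example))
```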
The corresponding (joint and marginal) sample distribution of PWB at rehab discharge (PWB₀) and at the end of the weight-loss phase (PWB₁) is displayed in Table 2. 11 Comparable to SRH, physical well-being exhibits substantial heterogeneity between individuals and varies at the individual level over time. For 60% of the participants, we observe a change in PWB (off-diagonal elements in Table 2), while 40% report the same category of PWB at the beginning and by the end of the weight reduction phase (values highlighted in bold in Table 2). 60% of all changes are improvements in PWB (cells above the principal diagonal). Among the participants who reported PWB changes, 79% report a change to an adjacent category. Self-rated health and physical well-being are obviously closely related measures and are strongly correlated in the sample. However, as their correlation is far from perfect, the two variables seem to capture different aspects of subjective health perception. Table 3 displays the (joint and marginal) sample distribution of SRH and PWB at the end of the weight-loss phase. Most respondents (61%) report the identical answer category for both variables. However, 25% of the respondents reported better SRH, while 14% of the respondents reported better PWB. 12 Only 1% of the respondents deviated by more than two answer categories (bad SRH and excellent PWB). 13
Body weight, which is the key explanatory variable in the present analysis, is measured in terms of the body mass index. 14 Rather than its level, we consider the absolute change (BMI loss ≡ BMI₀ − BMI₁) between rehab discharge and the end of the weight-reduction phase as regressor. By this choice, we emphasize that the focus of the analysis is on the effects of within-individual weight loss rather than between-individual heterogeneity in the level of BMI. 15 The variation of weight change in the sample is quite substantial. 81% of the participants lost weight. Mean weight loss is 1.56 BMI units. The median of the weight loss distribution (1.49) is close to the mean. The 95% quantile is 4.71, indicating that a substantial share of participants managed to materially reduce body weight over the 4-month weight-loss phase. Yet, the 5% quantile is −1.37, indicating that substantial weight gain is not a rare phenomenon in the sample; see Fig. 1 for the sample distribution of BMI loss. Figure 1 also illustrates that members of the incentive groups were on average clearly more successful in reducing body weight (cf. Augurzky et al. 2018). While the median weight loss is 1.82 BMI units for the incentive groups, it is only 0.85 BMI units for the control group. Yet, it also becomes visible that the weight loss variation within the respective groups is substantial and exceeds the variation between the groups.
From the first panel of Table 4, one can see that, in a descriptive sense, participants who lost weight are more likely to report good health. Among this group, around 40% reported good or excellent health, while this only holds for around 19% of the participants who gained weight. 38% of the latter reported poor or bad health. In contrast, the corresponding share among participants who lost weight is only 19%. According to a Wilcoxon rank-sum test, the distribution of SRH₁ clearly differs (p-value 0.000) between individuals who lost weight and individuals who did not. The estimated probability for an individual from the former group to be in better health than an individual from the latter is 0.65. These descriptive findings line up with the general pattern of results found in the literature that less body weight is associated with better self-ratings of health. Considering physical well-being instead of self-rated health yields a very similar picture. Again, according to a Wilcoxon rank-sum test, the distribution of PWB₁ clearly differs (p-value 0.000) between individuals who lost weight and individuals who did not.
9 The subscript 1 is a time index that refers to the information gathered by the end of the weight-loss phase (period 1). The subscript 0 indicates pre-intervention values, that is, SRH₀ (PWB₀) denotes self-rated health (physical well-being) at rehab discharge (period 0). This notation analogously applies to all variables that are measured at different points in time, such as the body mass index BMI₁ and BMI₀.
10 If no within-individual variation of SRH were observed, linking changes in SRH to weight change would arguably make little sense.
11 The number of observations for the variables measuring subjective health perception differs: individuals reported their self-rated health status slightly more often.
12 This pattern is similar when we look at the relationship of self-rated health and physical well-being at the end of the rehab phase (correlation coefficient of 0.64). Here 58% of respondents reported the same answer category for both variables, while 28% of respondents reported better SRH and 14% reported better PWB. See Table 7 in the Appendix.
13 Excluding these individuals from the analysis does not change our results in qualitative terms.
14 For decades, BMI and its commonly used threshold value of 30 kg/m² (WHO 2016) have been criticized as an, at least in certain circumstances, inappropriate measure of clinical obesity (Garn et al. 1986). We nevertheless stick to this frequently used measure. Since we consider changes of BMI over a relatively short period of time, rather than comparing the level of BMI between individuals, several shortcomings of the BMI (age dependence, indifference regarding lean and fat tissue, etc.) are arguably of little importance. Using percentage change in body weight instead of absolute change in BMI as weight change measure yields largely equivalent results in our empirical analysis. Moreover, the problem of misreported height and weight (cf. Gorber et al. 2007) is of little relevance to our study, since body weight is not self-reported but measured by clinic staff or pharmacy staff.
15 We include the pre-intervention level BMI₀ as control. Hence technically, our preferred specification is equivalent to including both the pre- and post-intervention levels BMI₀ and BMI₁ on the right-hand side of the regression model.
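The probability reported alongside the Wilcoxon rank-sum test, i.e., the chance that a randomly drawn individual from one group is in better health than a randomly drawn individual from the other, can be sketched as follows. The function name and the numeric coding of the answer categories (1 = bad to 5 = excellent) are hypothetical, and the example values are made up. Counting tied pairs with weight one half is an assumption about how such a probability is usually computed with ordinal data; the text does not state how ties were handled.

```python
def prob_superiority(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) over all pairs (a in x, b in y).

    x, y: lists of ordinal health ratings coded so that larger = better.
    Ties are split evenly between the two groups (assumption).
    """
    if not x or not y:
        raise ValueError("both groups must be non-empty")
    wins = sum(1 for a in x for b in y if a > b)
    ties = sum(1 for a in x for b in y if a == b)
    return (wins + 0.5 * ties) / (len(x) * len(y))


if __name__ == "__main__":
    # Made-up ratings, not data from the paper.
    lost_weight = [3, 4, 5]
    gained_weight = [1, 3, 4]
    print(prob_superiority(lost_weight, gained_weight))
```

A value of 0.5 corresponds to identical distributions in the two groups; values above 0.5 indicate that the first group tends to report better health.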
If the same descriptive analysis is applied to self-rated health measured at the beginning of the weight-loss phase, i.e., to SRH₀ instead of SRH₁, we still find a significant (p-value 0.044), though less distinct, deviation in the distribution of self-rated health. On the one hand, this suggests that weight loss might be endogenous and, in turn, calls for an empirical approach that does not interpret the mere correlation as a causal effect. On the other hand, this pattern suggests analyzing the effect of BMI loss on SRH conditional on its initial level SRH₀, in order to account for persistent unobserved heterogeneity and to eliminate variation in the dependent variable that cannot be explained by a change in BMI. For this reason, SRH₀ enters the econometric analysis as control variable.
If the same analysis is applied to physical well-being measured at the beginning of the weight-loss phase (PWB₀), we do not find a clearly significant difference (p-value 0.159) in the distribution of physical well-being. Yet, the share of respondents who rate their physical well-being as at least satisfactory is still higher for respondents who lost weight. Hence, analogously to the regression explaining self-rated health, we control for PWB₀ when our outcome variable is PWB₁ in order to account for persistent unobserved heterogeneity. 16 As another approach to account for unobserved heterogeneity, we also control for the initial body mass index BMI₀. Though all participants were obese at the time of recruitment, BMI₀ exhibits pronounced heterogeneity, ranging from 28 up to 60. 17 The average of the initial BMI is 37.26, while the median value of 36.03 is somewhat smaller, indicating that the distribution of initial BMI is skewed to the right.
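Controlling for BMI₀ alongside BMI loss means that the specification is a reparameterization of one that includes both BMI levels. A short sketch of this equivalence for a generic linear index follows; the symbols $h_1$, $\beta$ and $\gamma$ are illustrative placeholders and not the notation of the estimation section.

```latex
\begin{align}
h_1 &= \beta\,(\mathrm{BMI}_0 - \mathrm{BMI}_1) + \gamma\,\mathrm{BMI}_0 + \dots \\
    &= (\beta + \gamma)\,\mathrm{BMI}_0 - \beta\,\mathrm{BMI}_1 + \dots
\end{align}
```

Under this reading, the coefficient on BMI loss equals the negative of the coefficient that BMI₁ would receive in a model with both levels, so the two formulations carry the same information.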
Due to the relatively small estimation sample, we abstain from specifying a rich regression model with a large number of controls. As basic socioeconomic characteristics we only control for age and gender. 18 As discussed in Sect. 1, we use exposure to monetary weight loss incentives as instrument for weight change. Though random assignment to the experimental groups is a very strong argument for the instrument being exogenous, direct effects of the group assignment on subjective health might still be a challenge for exogeneity. One such channel is anxiety about not earning the reward because of insufficient weight loss, which may negatively affect health. 19 We cannot rule out that this channel plays some role. However, this effect should bias the estimated effect downward, since only the members of the incentive groups are subject to such adverse effects of the incentives. This possible direct effect should hence not generate a spurious regression result in the IV estimation. Moreover, to dig deeper into this issue, we stratified the analysis of weight loss effects with respect to: (i) the weight-loss target [kg] (sample split at the median) and (ii) the size of the reward (€150 and €300). One may hypothesize that a more ambitious target and a higher amount of money at stake are more prone to elicit anxiety. Yet, regarding the effect of weight loss, we see no significant differences between the respective groups. 20 We take this as an indication that possible anxiety about not earning the reward does not generate a major endogeneity problem. Another possible channel is that not weight loss itself, but the measures taken to achieve the reduction in body weight, affect subjective health. Though it is almost impossible to disentangle these two channels, the results of Augurzky et al. (2018), who find a much stronger effect of weight loss incentives on weight loss than on weight-reducing activities such as doing sports and healthy eating, argue in favor of weight loss being the prime channel through which the incentives operate.
Though the experiment involved two treatment groups which were offered incentives of different size, in the regression analysis we use a simple dummy that indicates random assignment to one of the treatment groups. Pooling the treatment groups is in line with the finding of Augurzky et al. (2018) that the size of the offered monetary reward proved to be immaterial for realized weight loss. Descriptive statistics for all variables that enter the preferred regression model are provided in Table 5.
16 Both SRH 0 and PWB 0 enter the model in a linear way. Estimating the model with dummy variables for the different categories of SRH and PWB does not alter our results.
17 Many individuals already lost weight over the rehab stay. This is the reason for some participants entering the weight-loss phase with a BMI smaller than 30.
18 We also estimated models with more explanatory variables (controlling for education, income and employment); however, the results of those models (reported in Tables 12 and 13 in the Appendix) are similar to the results of our preferred specification, where the number of observations is higher.
19 We would like to thank the reviewer for pointing us to this issue.

Estimation procedure
In order to take the ordered categorical nature of our dependent variables SRH 1 and PWB 1 into account, the econometric analysis rests on ordered probit models. We start with estimating a conventional specification of this model that regards all regressors as exogenous. Besides the key explanatory variable BMI loss, pre-intervention body weight BMI 0, age and gender enter the models on the right-hand side. Additionally, we control for pre-intervention self-rated health (SRH 0) or pre-intervention physical well-being (PWB 0), depending on the dependent variable that is used. This basic model specification serves as reference.
Yet, as discussed above, results from conventional ordered probit estimation are most likely biased, due to unobserved confounders affecting both subjective health perception and BMI loss, as well as reverse causality. To tackle possible endogeneity bias, and to allow for identifying a causal effect of BMI loss on subjective health perception, our preferred empirical model does not rely on naïve ordered probit estimation alone, but taps an exogenous source of variation in body weight for identifying the effect under scrutiny. Random assignment to either the control or the treatment arm of the experiment generates weight variation, which by the experimental design is exogenous. Moreover, as shown elsewhere (Augurzky et al. 2018), the incentive treatment was clearly effective and hence induced exogenous variation in weight loss. Technically, the binary indicator incentive, which indicates assignment to one of the two incentive groups, serves as instrument for BMI loss. 21 If health were measured on a continuous scale, two-stage least squares would be an obvious choice for the estimation procedure. However, this choice would conflict with the ordered categorical nature of SRH 1 and PWB 1. 22 We, hence, opt for a more parametric approach to instrumental variables estimation. That is, we augment the equation of prime interest by a second equation that specifies the endogenous regressor BMI loss as a function of the instrument incentive and the covariates that enter the main equation, and assume joint normality of the two error terms. The cross-equation error correlation, hence, captures possible endogeneity of BMI loss. Joint estimation by full-information maximum likelihood (ML) is straightforward for this model. 23
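The two-equation model described in this paragraph can be written out as follows. This is a sketch in notation chosen here for illustration: the symbols \beta, \gamma, \pi, \delta, \kappa_j, \rho and \sigma_u are assumptions and do not appear in the text, x_i collects the covariates named above (BMI 0, age, gender, and SRH 0 or PWB 0), and the unit variance of the first error term is the usual probit normalization. The equations are shown for SRH 1; the model for PWB 1 is analogous.

```latex
\begin{align}
  SRH_{1i}^{*} &= \beta\,\mathit{BMIloss}_i + x_i'\gamma + \varepsilon_i
    && \text{(latent health, main equation)} \\
  SRH_{1i} &= j \quad \text{if } \kappa_{j-1} < SRH_{1i}^{*} \le \kappa_j,
    \quad j = 1,\dots,5
    && \text{(observed category, } \kappa_0 = -\infty,\ \kappa_5 = \infty\text{)} \\
  \mathit{BMIloss}_i &= \pi\,\mathit{incentive}_i + x_i'\delta + u_i
    && \text{(auxiliary equation)} \\
  \begin{pmatrix} \varepsilon_i \\ u_i \end{pmatrix}
    &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
      \begin{pmatrix} 1 & \rho\,\sigma_u \\ \rho\,\sigma_u & \sigma_u^{2} \end{pmatrix} \right)
    && \text{(joint normality)}
\end{align}
```

In this notation, \rho is the cross-equation error correlation; \rho = 0 corresponds to BMI loss being exogenous, in which case the main equation reduces to the conventional ordered probit model.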

Results for the basic model
In this section, we present and discuss results for the regression models we introduced in the previous section. Columns one and two in Table 6 display the estimation results of the naïve model that does not take possible endogeneity into account.
The results are in line with those of the majority of the related literature. In both specifications we find a statistically significant association between weight change and subjective health perception, where weight loss is positively associated with the inclination to report a better status of subjective health. In terms of magnitude, the estimated coefficient is similar for both measures of subjective health perception.
In quantitative terms the point estimate of 0.15 (self-rated health) translates into an average increase in the probability of rating one's health 'satisfactory' or better of 3.7 percentage points if one reduces her or his body weight by one BMI unit. For physical well-being the average marginal effect is of similar magnitude. A reduction of body weight by one BMI unit is associated with an increase in the probability of rating one's physical well-being 'satisfactory' or better by 4.7 percentage points.
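How a probit coefficient translates into an average marginal effect can be illustrated with a short sketch. It assumes a standard ordered probit with a latent index and a single cut point separating 'satisfactory' or better from the lower categories; the linear indices and the cut point below are hypothetical values chosen for illustration, not estimates from the paper, so the printed number will not reproduce the 3.7 percentage points reported above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal error term of the probit model


def prob_satisfactory_or_better(index, cutpoint):
    """P(latent health > cutpoint) for latent health = index + e, e ~ N(0, 1)."""
    return 1.0 - STD_NORMAL.cdf(cutpoint - index)


def average_marginal_effect(beta, indices, cutpoint):
    """Sample average of dP(satisfactory or better)/d(BMI loss).

    For each individual the derivative is beta * pdf(cutpoint - index);
    the average marginal effect is the mean of these individual effects.
    """
    effects = [beta * STD_NORMAL.pdf(cutpoint - index) for index in indices]
    return sum(effects) / len(effects)


# Hypothetical linear indices x'b for five participants and a hypothetical
# cut point; only the coefficient 0.15 is taken from the text.
example_indices = [-0.4, 0.1, 0.6, 1.2, 1.9]
example_cutpoint = 0.5
ame = average_marginal_effect(0.15, example_indices, example_cutpoint)
print(f"average marginal effect: {ame:.4f}")
```

Because the normal density is at most about 0.4, the average marginal effect of a coefficient of 0.15 cannot exceed roughly 6 percentage points; how close it gets depends on how many individuals have an index near the cut point.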
Turning to the coefficients of the control variables, BMI 0 is not significantly associated with subjective health perception. Yet, not surprisingly, the coefficients estimated for initial subjective health perception (SRH 0 and PWB 0) are positive and highly significant, revealing pronounced persistence in subjective health perception, which has already been observed in Tables 1 and 2. The simple ordered probit regressions indicate a gender differential in subjective health perception, with women exhibiting a less favorable subjective health rating both in terms of SRH and PWB. The regression analysis does not yield a significant association between age and subjective health perception. 24 We also find no significant influence of age on physical well-being.
Table 6 Coefficient estimates of naïve ordered probit and IV-ordered probit models (full sample)

Results from IV estimation
As discussed above, the results presented in columns one and two of Table 6 might suffer from endogeneity bias regarding the coefficient attached to BMI loss. In this subsection, we discuss estimates that address this issue by the use of an instrumental variable. 25 Columns three and four of Table 6 display coefficients for the model that relies on weight variation induced by randomly assigned weight loss incentives for identifying the coefficient of prime interest. Besides the coefficients of the equation of prime interest (upper panel), average marginal effects of BMI loss as well as estimates for the auxiliary equation explaining BMI loss (lower panel) are also displayed. Starting with the instrumental equation, estimation results indicate that cash incentives have a substantial effect on achieved weight loss; see the second panel of Table 6, columns three and four. This result, which has already been established in the literature (e.g., Augurzky et al. 2018; Volpp et al. 2008; John et al. 2011; Cawley and Price 2013; Paloyo et al. 2014), is important for the present analysis, as it points to the experiment generating exogenous variation in BMI that can be used for identification. Indeed, the indicator incentive proved to be a rather strong instrument for BMI loss. The relevant F-statistic 26 is 27.72 and 28.54, respectively, 27 both of which clearly exceed the conventional threshold value of 10 (Stock et al. 2002). Besides this key result regarding instrument relevance, the estimates for the auxiliary equation indicate that those who start with a high initial BMI are more likely to lose weight. Yet it is worth mentioning that including controls in the instrumental variable equation is of minor importance since the randomization balances the covariates between the groups. 28 Dropping the controls from the auxiliary equation thus has very little effect on the results.
Turning to the equation of main interest, we see almost no change in the coefficients of the control variables as compared to simple ordered probit. Yet, with respect to the effect of weight change on subjective health perception, we find a pattern of results that in some respect deviates from its counterpart from naïve estimation. The point estimates are smaller and the coefficients of BMI loss turn statistically insignificant. However, the estimated BMI loss coefficients are still positive. For the model explaining self-rated health, the deviation in the point estimates and the corresponding marginal effects is very small. Yet, due to the large standard errors the 95% confidence interval around the IV estimate is wide, more precisely [−0.092, 0.345]. For the model explaining physical well-being the deviation of the IV estimate from its counterpart from simple ordered probit is more pronounced. But still, the associated confidence interval, which is [−0.144, 0.287], includes the coefficient from the naïve approach.
24 Including age² as an additional regressor does not point to a non-linear relationship between SRH and age.
25 From a purely technical perspective, one could argue that instrumental variables are not required for identification, which could rest solely on the non-linearity of the model. Yet, this will rarely work in practice. Indeed, in the present application the optimization procedure runs into serious convergence problems if incentive is not included as instrument.
26 It is calculated from estimating the auxiliary equation separately by least squares.
27 The first-stage regressions differ slightly between the models explaining SRH 1 and PWB 1, since either SRH 0 or PWB 0 enters the model on the right-hand side.
28 A joint balancing test (Pei et al. 2019, p. 212) yields an F-statistic as small as 1.06 (p-value 0.375), i.e., the test is far from rejecting the null of no association of any covariate with the group assignment.
The same line of argument applies to the corresponding marginal effects. The 95% confidence intervals around the estimated mean effects are [−0.022, 0.085] and [−0.042, 0.086], respectively. They clearly include the relatively precisely estimated effects from simple ordered probit estimation. Yet, they are so wide that one cannot reject effects which are much smaller (even zero effects and effects in the opposite direction) or much bigger than those one obtains from the naïve estimation approach.
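The width of these intervals can be made concrete with a small calculation. The sketch below assumes that the reported intervals are symmetric, normal-based (Wald) 95% intervals; under that assumption the midpoint approximates the point estimate and the half-width divided by about 1.96 approximates the standard error. The implied values are back-of-the-envelope figures derived from the intervals, not numbers reported in the paper.

```python
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def wald_interval_summary(lower, upper):
    """Return (midpoint, implied standard error) of a symmetric 95% interval."""
    midpoint = (lower + upper) / 2.0
    std_error = (upper - lower) / (2.0 * Z_95)
    return midpoint, std_error


def covers(lower, upper, value):
    """True if value lies inside the closed interval [lower, upper]."""
    return lower <= value <= upper


# Intervals reported in the text: BMI loss coefficients (first two entries)
# and average marginal effects (last two entries).
intervals = {
    "coefficient, SRH": (-0.092, 0.345),
    "coefficient, PWB": (-0.144, 0.287),
    "marginal effect, SRH": (-0.022, 0.085),
    "marginal effect, PWB": (-0.042, 0.086),
}
for name, (lower, upper) in intervals.items():
    mid, se = wald_interval_summary(lower, upper)
    print(f"{name}: midpoint {mid:.4f}, implied s.e. {se:.4f}, "
          f"covers zero: {covers(lower, upper, 0.0)}")

# Naive estimates quoted in the text: coefficient 0.15 (SRH) and marginal
# effects of 0.037 (SRH) and 0.047 (PWB).
print(covers(*intervals["coefficient, SRH"], 0.15))
print(covers(*intervals["marginal effect, SRH"], 0.037))
print(covers(*intervals["marginal effect, PWB"], 0.047))
```

All four intervals cover zero as well as the naive estimates quoted in the text, which is the sense in which the IV results are compatible with both.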
The lack of statistical significance of the estimated effects in the IV approach, thus, seems first of all to be a standard error issue, and can hardly be interpreted as evidence for the absence of a weight loss effect on subjective health. Evidently, although the instrument is not weak, augmenting the naïve model by the instrumental equation inflates the noisiness of the estimates substantially. The reason for this is that the instrument, despite the large F-statistic, still explains a relatively small fraction of the variation in the endogenous variable BMI loss. The partial R-squared is, indeed, just 0.051 despite the relatively large F-value of 27.72; compare Fig. 1. 29 The finding that instrumenting BMI loss does not fundamentally change the estimated key coefficients in our main specifications is mirrored by the estimates of the cross-equation error correlation. The estimate is positive but of moderate or even negligible magnitude and statistically insignificant. Though the positive sign suggests that unobserved confounders may play some role in the correlation of subjective health perception and BMI, the estimates still provide little evidence for endogeneity of BMI loss being a major issue. Moreover, based on linear specifications (see next subsection) that employ OLS and 2SLS rather than ordered probit and IV ordered probit, Hausman-Wu tests do not yield evidence for any systematic deviation between instrumental variables and naïve estimation. 30 One possible reason for this pattern is that relying on short-term, within-individual variation cuts or at least weakens several channels (for instance, health-conscious attitudes, educational and family background, and certain genetic endowments) that are likely to be major sources of endogeneity in analyses based on cross-sectional data.
29 The issue of why a small share of variation in the endogenous regressor that can be attributed to variation in the instrument inflates the variance becomes clearer if one considers a two-step control function approach (e.g., Wooldridge 2015), which is a close alternative to joint ML estimation. In the present analysis the two-step estimator yields almost the same coefficient estimates as those reported in Table 6, columns 3 and 4. In the control function approach the first-stage residual enters the main equation as regressor in addition to BMI loss. The less variation is explained by the first-stage regression, the stronger the residual is correlated with the endogenous regressor. Hence, in technical terms, additionally including the first-stage residual means that another explanatory variable enters the model that is substantially correlated with the regressor of primary interest. This necessarily inflates the variance. In this context it is important to note that the argument of a 'sufficiently' strong instrument primarily addresses the issue of the IV finite-sample bias, but not the issue of instrumental variables estimation inflating the variance; see e.g., Angrist and Pischke (2009, p. 208).
30 For the majority of specifications the p-value exceeds 0.9.
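The variance argument of footnote 29 can be quantified in a stylized way. The sketch below uses a linear control function benchmark with homoskedastic errors and an unchanged error variance, which is an assumption made here for illustration; the jointly estimated ordered probit model need not follow it exactly. In that benchmark, once the covariates are partialled out, the correlation between the endogenous regressor and the first-stage residual is the square root of one minus the partial R-squared, and adding the residual as a regressor inflates the variance of the coefficient of interest by the factor 1 / (partial R-squared). The function names are hypothetical.

```python
from math import sqrt


def residual_correlation(partial_r2):
    """Correlation between the (partialled-out) endogenous regressor and the
    first-stage residual when the instrument explains a share partial_r2."""
    return sqrt(1.0 - partial_r2)


def variance_inflation(partial_r2):
    """Variance inflation factor from adding the first-stage residual as a
    regressor: 1 / (1 - corr^2), which equals 1 / partial_r2."""
    corr = residual_correlation(partial_r2)
    return 1.0 / (1.0 - corr ** 2)


def std_error_inflation(partial_r2):
    """Factor by which the standard error grows in the linear benchmark."""
    return sqrt(variance_inflation(partial_r2))


partial_r2 = 0.051  # partial R-squared of the instrument reported in the text
print(f"correlation with first-stage residual: {residual_correlation(partial_r2):.3f}")
print(f"variance inflation factor: {variance_inflation(partial_r2):.1f}")
print(f"standard error inflation: {std_error_inflation(partial_r2):.2f}")
```

With a partial R-squared of 0.051, the benchmark implies standard errors that are roughly 4.4 times as large as those of the naive estimator, even though the first-stage F-statistic is well above 10.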
To sum up this discussion, though the instrumental variables estimation does not yield a clear-cut result regarding the effect of weight change on subjective health perception, the pattern of results is still telling. While the point estimates argue for the naïve empirical approach suffering from some upward bias, the rather noisy IV estimates can hardly put into question the general finding that weight gain in obese individuals affects subjective health perception detrimentally. This in particular holds since the IV approach reveals little evidence for the naïve estimates suffering from a severe endogeneity bias. In qualitative terms our results, hence, do not conflict with the bulk of the literature. They also do not contradict those of Cullinan and Gillespie (2016) 31, who carefully designed their analysis to allow for a causal interpretation of the link between body weight and self-rated health. This appears to be an interesting finding, given that the present analysis exploits a source of variation for identification that is very different from what is used in Cullinan and Gillespie (2016) and that, in consequence, the nature of the estimated effect also differs. While Cullinan and Gillespie (2016) rest on genetics as a persistent and long-term determinant of body weight, the present analysis uses short-term extrinsic incentives. Thus, the result of the former can be interpreted such that permanently reducing the BMI of an obese individual to a normal level will improve her SRH substantially. In contrast, our results, at least in terms of the point estimates, suggest that even a small reduction in body weight will make an obese individual instantaneously feel healthier. This distinction is important for the question of how to motivate obese individuals to lose weight. Even if substantial weight loss is known to pay off in the long run, obese individuals may still need some instantaneous improvement in subjectively perceived health in order to keep the discipline to continue their weight loss efforts.
Comparing our results in quantitative terms to those of Cullinan and Gillespie (2016) is not straightforward, as their analysis relies on a very different source of weight variation and since they consider a categorical measure of the weight status (healthy weight, overweight, obese grade I, obese grade II) as key explanatory variable, rather than a continuous measure of weight change as we do in our analysis. Nevertheless, we use a simple simulation to translate our results into figures that can be compared to the results of Cullinan and Gillespie (2016). More specifically, based on the point estimates from the IV model (Table 6, column 3), for each participant we calculate the change in the probability of reporting to be in excellent health (this is the probability Cullinan and Gillespie (2016) focus on when discussing their results) that would occur if she reduced her body weight to a BMI of 25. We hence consider a shift from obesity to a healthy body weight as Cullinan and Gillespie (2016) do. Moreover, following the line of how they present their results, we examine the mean change in this probability separately for men and women and for grade II obese (BMI > 35) and grade I obese individuals. 32 At least for grade I obese individuals our results are surprisingly similar to those of Cullinan and Gillespie (2016). For grade I obese women we calculate a mean change of 7.0 percentage points while the corresponding value in Cullinan and Gillespie (2016, IV with controls) is 9.1. For grade I obese men our approach yields a mean change of 9.7 percentage points while the corresponding value in Cullinan and Gillespie (2016) is 11.6. For the grade II obese the results are less well aligned, but the overall pattern is still similar. Cullinan and Gillespie (2016) report effects of 17.1 (women) and 20.1 (men) percentage points on the probability to be in excellent health, while we calculate mean effects of 31.7 (women) and 40.0 (men) percentage points, respectively. 33
31 They identify a strong link between BMI and SRH in obese individuals, while their instrumental variables estimation results are inconclusive for moderately overweight individuals.
32 Because some participants lost some weight during their rehab stay, this group includes a few individuals who are not obese in the sense that BMI 0 does not exceed 30; see footnote 17. Yet, because of the small number of individuals to whom this applies and because even the least overweight participant left the rehab clinic with a BMI that exceeded 28, we do not distinguish between being grade I obese and being just overweight.
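The logic of this simulation can be sketched in a few lines. The sketch rests on several assumptions that are not spelled out in the text: each participant's fitted latent index is shifted by the BMI loss coefficient times the additional loss needed to reach a BMI of 25, 'excellent' is the top category above a single cut point, and the BMI used is the participant's current one. The participant records, the coefficient and the cut point below are hypothetical and do not reproduce the figures reported above.

```python
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()


def prob_excellent(index, top_cutpoint):
    """P(latent health > top cut point) in an ordered probit model."""
    return 1.0 - STD_NORMAL.cdf(top_cutpoint - index)


def change_if_bmi_25(index, bmi, beta, top_cutpoint, target_bmi=25.0):
    """Change in P(excellent) if the participant lost enough weight to reach
    target_bmi, holding everything else in the index fixed."""
    additional_loss = max(bmi - target_bmi, 0.0)
    shifted_index = index + beta * additional_loss
    return prob_excellent(shifted_index, top_cutpoint) - prob_excellent(index, top_cutpoint)


def mean_change_by_group(participants, beta, top_cutpoint):
    """Mean simulated change by gender and obesity grade (grade II: BMI > 35)."""
    groups = {}
    for person in participants:
        grade = "grade II" if person["bmi"] > 35 else "grade I"
        key = (person["gender"], grade)
        change = change_if_bmi_25(person["index"], person["bmi"], beta, top_cutpoint)
        groups.setdefault(key, []).append(change)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical participants: fitted latent index, BMI and gender.
example_participants = [
    {"index": 0.2, "bmi": 31.0, "gender": "female"},
    {"index": 0.5, "bmi": 33.5, "gender": "male"},
    {"index": -0.1, "bmi": 38.0, "gender": "female"},
    {"index": 0.3, "bmi": 44.0, "gender": "male"},
]
# Hypothetical coefficient and top cut point.
result = mean_change_by_group(example_participants, beta=0.12, top_cutpoint=2.0)
for (gender, grade), value in sorted(result.items()):
    print(f"{gender}, {grade}: {100 * value:.1f} percentage points")
```

Because the simulated shift grows with the distance to a BMI of 25, the sketch mechanically yields larger changes for the grade II obese, which is in line with the pattern of the figures reported above.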

Robustness checks
We ran several robustness checks in order to test how sensitive the results are to changes of the model specification. (i) We estimated all discussed model specifications with additional controls for education, employment, and income measured at rehab discharge; see Tables 12 and 13 in the Appendix. This does not change the overall pattern of results. The coefficients of the naïve model are hardly affected. Their counterparts from instrumental variables estimation remain inconclusive, as they do not change consistently in the same direction. For SRH the point estimate of the BMI loss coefficient gets smaller, while it gets bigger for PWB. (ii) We reduced the number of health categories to just three, merging 'good' with 'excellent' and 'bad' with 'poor'. This affects the coefficient estimates only marginally; see Table 11, first panel, in the Appendix. As another robustness check, (iii) we excluded individuals with extreme changes in BMI, in order to check whether a few extraordinary cases drive the empirical results. We considered two definitions of extreme, (BMI loss > 5) and (BMI loss < −2 | BMI loss > 5). For both, the pattern of results remains largely unchanged; see Tables 17 and 18 in the Appendix.
In order not to rely exclusively on a fully parametric model, (iv) we also ran two-stage least squares (2SLS) regressions. To avoid interpreting SRH as being measured on an interval scale, we recoded the dependent variable to have just two categories and, in consequence, we estimated linear probability models by 2SLS and, as reference, also by OLS. Since transforming five-category SRH to a binary health indicator involves the somewhat arbitrary choice of a cutoff category, we tried all four possible variants; see Table 11, second to fifth panel, and Figs. 2-7 in the Appendix. If the left-hand side variable is specified to indicate one of the two extreme categories ('better than good' or 'bad'), the effect of BMI loss gets very small or even vanishes. Interestingly, this holds for both OLS and 2SLS. One possible explanation is that these categories are rather rare in the sample, hampering the identification of effects on these extreme categories; see Table 1. An alternative, less technical, explanation is that relatively small changes in body weight will rarely be the reason for a shift in subjective health perception to an extreme. If an interior category is chosen as cutoff, the linear model mirrors the results of ordered probit estimation in that the naïve estimator yields a significant and favorable effect of weight loss on SRH, while IV does not. For the variant with the left-hand side variable indicating poor or bad SRH, the 2SLS coefficient even turns negative. This does not apply to the variant with an indicator for 'SRH neither good nor excellent' serving as dependent variable. There, the 2SLS coefficients are similar to or larger than their OLS counterparts.
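The recoding and the linear IV estimator can be illustrated with a stripped-down sketch. It makes two simplifying assumptions that differ from the analysis described above: the binary indicator is coded as 'at or above the cutoff category' for all four cutoffs (the paper partly uses indicators of the lower categories, which flips the sign), and the covariates are dropped, in which case 2SLS with a single binary instrument reduces to the Wald ratio of the reduced-form difference to the first-stage difference. The category ordering and the toy data are hypothetical.

```python
from statistics import mean

# Assumed ordering of the five SRH categories, from worst to best.
SRH_LEVELS = ["bad", "poor", "satisfactory", "good", "excellent"]


def at_least(label, cutoff):
    """Binary health indicator: 1 if label is at or above the cutoff category."""
    return int(SRH_LEVELS.index(label) >= SRH_LEVELS.index(cutoff))


def wald_iv_estimate(outcome, regressor, instrument):
    """IV estimate with one binary instrument and no covariates:
    (difference in mean outcome) / (difference in mean regressor)."""
    treated = [i for i, z in enumerate(instrument) if z == 1]
    control = [i for i, z in enumerate(instrument) if z == 0]
    reduced_form = mean([outcome[i] for i in treated]) - mean([outcome[i] for i in control])
    first_stage = mean([regressor[i] for i in treated]) - mean([regressor[i] for i in control])
    return reduced_form / first_stage


# Hypothetical toy data for six participants.
incentive = [1, 1, 1, 0, 0, 0]
bmi_loss = [2.0, 1.0, 3.0, 0.0, 1.0, -1.0]
srh = ["good", "satisfactory", "excellent", "poor", "satisfactory", "bad"]

# One linear probability model per cutoff category (four variants in total).
for cutoff in SRH_LEVELS[1:]:
    healthy = [at_least(label, cutoff) for label in srh]
    estimate = wald_iv_estimate(healthy, bmi_loss, incentive)
    print(f"cutoff '{cutoff}': IV estimate {estimate:.3f}")
```

With covariates included, as in the paper, the estimator is no longer a simple ratio of mean differences, but the interpretation as a reduced-form effect scaled by the first-stage effect carries over.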
Since, due to drop-out, the estimation sample is considerably smaller than the initial 695 recruited participants, non-random sample attrition might bias our results. (v) To address this concern, we estimated model specifications that include a third equation explaining experiment attrition that is jointly estimated with the remaining two. As an 'instrument' for attrition this additional equation includes a dummy variable 'pharmacy in town', which indicates whether the assigned pharmacy for the weigh-in lies in the same zip code as the respondent's place of residence. The results from these specifications suggest that non-random sample attrition is unlikely to be an issue; see Tables 15 and 16 in the Appendix. 34 As displayed in Table 5, initial body weight varies substantially in our sample. To check whether this heterogeneity is mirrored by heterogeneity in the effect of weight loss on self-perceived health, (vi) we stratified the analysis by initial BMI. Figure 8 in the Appendix displays the respective estimated coefficients 35 of BMI loss, which do not reveal a striking pattern of heterogeneity with respect to BMI 0.
Since earlier studies found gender differences in the relationship between BMI and SRH (Imai et al. 2008), besides the pooled model, we also conduct separate regression analyses for males and females in the specification of reference as well as in the robustness checks. The naïve regression models yield results that are rather similar to those from the pooled model. However, for the stratified instrumental variable analysis the results deviate more from the naïve estimation than in the pooled model: in the male sample, the estimated effect of BMI loss on PWB gets smaller, and the same holds in the female sample for the effect of BMI loss on SRH. However, the cross-equation error correlation is significant neither in the male nor in the female sample, which is in line with our main results, stating that endogeneity of BMI loss does not seem to be a major issue. Due to the rather small sample sizes and the instrument becoming relatively weak in these sub-samples, the instrumental variable analysis seems not to be overly informative and has to be taken with a grain of salt.

Discussion and conclusion
This paper analyzes the relationship between moderate weight loss and subjective health perception in obese individuals. We confirm the results of the related literature, which finds a significant association of body weight and subjective health perception. Unlike the bulk of the existing literature, the present analysis is not only concerned with the association of body weight and self-rated health, but employs instrumental variables estimation to establish a causal link. In doing this, it follows Cullinan and Gillespie (2016), who also use instrumental variables estimation to identify a causal effect of BMI on SRH. Our analysis differs from this key reference by tapping a completely different source of exogenous weight variation. While Cullinan and Gillespie (2016) use BMI of biological relatives as instrument and rest on exogenous, genetically determined, long-term, between-individuals weight variation for identification, we use cash incentives of a weight loss intervention as instrument and, hence, rely on short-term, within-individual variation. Though our instrumental variable approach does not establish statistical significance of the effect under scrutiny, the pattern of results suggests that the positive association of subjective health perception and weight loss is not primarily due to unobserved confounders. Our results hence appear not to conflict with the finding of Cullinan and Gillespie (2016). Our analysis nevertheless adds a relevant aspect to the insights into how weight loss affects subjective health perception in obese individuals. While Cullinan and Gillespie (2016) establish that obese individuals' health perception will improve if they manage to become normal-weight, our analysis yields some evidence for subjective health improvements accompanying even small initial weight reduction. 
This finding may encourage obese individuals in their weight loss attempts, since they can expect to be immediately rewarded for their efforts by subjective health improvements.
With respect to external validity our results have, however, to be interpreted with some caution. Our analysis uses a very specific sample of obese individuals. In particular, besides being admitted to a rehab clinic and meeting the inclusion criteria discussed in Sect. 2, all participants actively selected themselves into the sample by agreeing to participate in the field experiment. It is, hence, likely that our results are based on a sample that is selective with respect to the motivation for losing body weight and probably with respect to the subjective likelihood of being successful in reducing overweight. Our findings may hence not apply one-to-one to the obese population in general. Nevertheless, we regard our results as relevant, as our discussion focusses on obese individuals who try to lose body weight, that is, a subpopulation which in this respect is similarly selected as the study population. Moreover, as discussed above, our results are astonishingly similar to earlier findings that are based on very different samples of data. We take this as evidence that our conclusions are not completely specific to our study population.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A Appendix
See Tables 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17 and 18.
The number of observations differs between the main (468) and the auxiliary (487) equation of the econometric model. Descriptive statistics for the explanatory variables are for the auxiliary equation.
Standard errors in parentheses; * p ≤ 0.1, ** p ≤ 0.05, *** p ≤ 0.01; † Sample average of individual marginal effects of BMI loss on the probability of rating one's health or physical well-being 'satisfactory' or better. The number of observations differs in the IV-ordered probit model: due to joint estimation, participants for whom SRH 1 and PWB 1, respectively, is not observed but the endogenous regressor BMI loss is observed contribute to the log-likelihood function and enter the estimation sample. Cut points of the ordered probit estimations are displayed in Table 14 in the Appendix.