The ‘Gender Life-Satisfaction/Depression Paradox’ Is an Artefact of Inappropriate Control Variables

Previous research has suggested that there is a ‘gender paradox’ associated with life satisfaction and depression: women are said to have higher levels of life satisfaction on average but also a higher likelihood of experiencing depression. That finding comes from quantitative analyses that include socio-demographic control variables. In this article I show that the inclusion of these control variables leads to biased results. In general, controls are to be selected on the basis that they are antecedents of the focal independent variable (as well as the dependent variable). When the focal independent variable is gender (or, more precisely, sex), no controls are required: there are no determinants of life satisfaction that also determine someone’s sex. If we include socio-demographic controls, we get biased results – because the controls themselves are affected by sex. More precisely: if we include controls (e.g. for income) to discern the difference between women’s and men’s life satisfaction, we get a result that fails to reflect the way women experience specific disadvantages (e.g. lower income) that contribute to lower life satisfaction. The same points apply to an analysis of depression. In a properly specified model (using data from the European Social Survey), there is no difference between women’s and men’s life satisfaction – so, there is no paradox with respect to depression.


Introduction
A recent article in this journal (Becchetti & Conzo, 2022) presented an analysis indicating that there is a 'gender paradox' in connection with life satisfaction and depression: on average, women have higher life satisfaction than men, but they are also more likely to report being depressed. Having shown that the paradox exists, the authors then proceed to offer an explanation for it: women are (it seems) more affected by events (whether good or bad) and less resilient in the face of challenges.
In this article I demonstrate that the appearance of a paradox in these terms follows from a misunderstanding informing the models used to draw their conclusions. To summarise their findings more precisely: Becchetti & Conzo (2022) find higher life satisfaction among women 'after controlling for all relevant socio-demographic factors' (2022: p. 35). My core argument in this article is that other socio-demographic factors should not be controlled in an analysis seeking to learn whether women have higher life satisfaction than men. If we include controls in this context, we create bias in our results. (Angrist & Pischke, 2009, use the term 'bad controls' to describe instances of this sort.) Controls are of course sometimes relevant to a quantitative analysis of patterns in life satisfaction (and other forms of subjective well-being) – but a more explicit understanding of their function is required, so that we can identify clear criteria for selecting specific controls.
The position taken here is again that no controls for socio-demographic factors are required – and indeed that if we include these controls we create biased results. I establish this point in general terms by discussing in depth what controls do in the context of a regression model, with attention to the relationship between (potential) controls and the focal independent variable (here, gender). Many researchers consider controls only in connection with the relationship they have with the dependent variable (here, life satisfaction) – but I show why it is necessary to consider how they relate to the focal independent variable as well. I then analyse the gender/life-satisfaction relationship using the same data (i.e., from the European Social Survey, ESS) used in the earlier article (Becchetti & Conzo, 2022). The key finding is that when we exclude socio-demographic controls, there is no difference in the life satisfaction of women and men in Europe (and thus no paradox in connection with differences in depression).
The 'action' of this paper is methodological. I therefore do not review previous research on the 'paradox' in substantive terms (i.e., why we might expect to find one, or not to find one). There is a related larger body of research exploring sex differences in life satisfaction, as reviewed by Batz & Tay (2018), who conclude that this research consists of 'highly inconsistent' results. Use of 'bad controls' in research on this topic is not hard to find (see e.g. Montgomery 2022); lack of clarity about how to select controls for this question is likely a contributing factor to the inconsistency of results. There is an evident need for greater attention to this general methodological topic. So, I turn immediately to the task of identifying the right criterion for control-variable selection for this question (rather than engaging in a substantive review, beyond the initial comments above, of earlier research).

What Do Controls Do in Regression Models?
For many quantitative researchers, the idea that regression models need some control variables is typically taken for granted. There is commonly a discourse of 'net effects', a term that is understood to get us closer to a 'real' effect. In a bivariate analysis, we might discern a difference between two groups, but a bivariate analysis is typically only a starting point. We would then proceed to explore that difference by introducing controls. If a regression model with controls eliminates the difference (between the two groups) evident in a bivariate analysis, the typical conclusion is that the difference was not real.
The logic underpinning that approach is that we should control for 'other determinants' of the dependent variable. Sometimes this logic is made explicit (e.g. Hou 2014); in other instances it is not articulated directly but is nonetheless discernible. In the article by Becchetti & Conzo (2022), the key passage in these terms (on p. 39) is as follows: '…the gender paradox is confirmed after controlling for concurring factors affecting our dependent variables…'. This passage gets very close to articulating the more general idea that we should include 'other determinants' of the dependent variable. This idea is at best incomplete and will sometimes give us biased results.
For this discussion it is helpful to use some shorthand notation. Our key question asks about the impact of a focal independent variable (X) on the dependent variable (Y) – a proposition we can summarise as X→Y. The controls we might use can be labelled W. So, the perspective holding that we should control for other determinants of the dependent variable says: control for W where W→Y.
What is not explored in this perspective – indeed what needs much greater attention – is the relationship a potential control might have with the focal independent variable. We can identify three general patterns. (1) A particular control might be an antecedent of the focal independent variable; in other words, W→X. (2) The focal independent variable might influence the control; in other words, X→W. (3) Perhaps there is no relationship between X and W.
The inclusion of any particular control in a regression model will have different consequences for our estimate of X→Y, depending on which scenario applies for that control. A great deal hinges on these distinctions – and in particular on the difference between W→X and X→W. To see why, it helps to be clear about the purpose we would have for the inclusion of controls. The relevant purpose is: to avoid (or minimize) bias in our estimate of X→Y. If we do not have the right controls, our estimate of that relationship might give us a 'spurious' result, i.e., a result that doesn't reflect the real effect. The estimate might be entirely spurious (i.e., there is no impact of X on Y, at all), or partly spurious. In that latter scenario, we would see that there is an effect, but our estimate is biased upwards or biased downwards, i.e., overstating or understating the real effect.
In the classic example, a bivariate analysis of shoe-size (X) and academic ability (Y) gives biased results. The bivariate analysis fails to take into account the way the apparent relationship (having larger feet means higher ability) is actually generated by age (W). Once we control for age, we see that there is no X→Y result. The example works because the control fits the right criterion for selection of controls: we need controls that are antecedents of the focal independent variable (W→X) (see e.g. Gangl 2010; Pearl & Mackenzie, 2018). If however we include controls where X→W, we exacerbate bias (rather than remedying it). Suppose we want to estimate the impact of unemployment on life satisfaction. If we control for 'other determinants' of life satisfaction (W→Y), we are likely to control for income. But when we consider the relationship between unemployment and income, we can see that the important pattern here is X→W: when people lose their jobs, their income goes down. If we control for income in this context, we now have a result for the impact of unemployment on life satisfaction (X→Y) that keeps income constant. The model now reflects a scenario that we know not to be true: it imagines that unemployment has no impact on income. By including this control, we obscure part of the impact of unemployment on life satisfaction: losing one's job leads to lower income, which leads to lower life satisfaction. A model that includes this control induces downward bias in our estimate of X→Y (Bartram, 2021a).
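The contrast between the two scenarios can be illustrated with a small simulation. The sketch below is not part of the analysis reported in this article: the effect sizes, the sample size, and the helper `ols` are all hypothetical choices made only to show the mechanics. In the first scenario the control is an antecedent of X (age behind shoe size), and adding it removes a spurious slope. In the second the control is a consequence of X (income following unemployment), and adding it hides part of the effect.

```python
import random

def ols(y, *xs):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    k = len(cols)
    # Augmented matrix [X'X | X'y]
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(k):
        pivot_row = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot_row] = a[pivot_row], a[i]
        pivot = a[i][i]
        a[i] = [v / pivot for v in a[i]]
        for r in range(k):
            if r != i:
                factor = a[r][i]
                a[r] = [vr - factor * vi for vr, vi in zip(a[r], a[i])]
    return [row[k] for row in a]

rng = random.Random(0)
n = 20000  # hypothetical sample size

# Scenario 1 (W -> X): age (w) drives both shoe size (x) and ability (y);
# shoe size has no effect of its own.
w1 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [w + rng.gauss(0, 1) for w in w1]
y1 = [2.0 * w + rng.gauss(0, 1) for w in w1]
confounded_naive = ols(y1, x1)[1]         # spurious slope, close to 1
confounded_adjusted = ols(y1, x1, w1)[1]  # close to 0 once age is controlled

# Scenario 2 (X -> W): unemployment (x) lowers income (w); both affect
# life satisfaction (y). Total effect of x is -0.5 + (-1.0 * 1.0) = -1.5.
x2 = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
w2 = [-1.0 * x + rng.gauss(0, 1) for x in x2]
y2 = [-0.5 * x + 1.0 * w + rng.gauss(0, 1) for x, w in zip(x2, w2)]
mediated_naive = ols(y2, x2)[1]           # close to the total effect, -1.5
mediated_controlled = ols(y2, x2, w2)[1]  # close to -0.5: part of the effect is hidden

print(confounded_naive, confounded_adjusted, mediated_naive, mediated_controlled)
```

With these made-up parameters, the same act of 'adding a control' removes bias in the first scenario and understates the magnitude of the effect in the second.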
We are now in a position to see why inclusion of any socio-demographic controls creates bias in our estimate of the way life satisfaction might differ between men and women. For that analysis, we can ask about the relationship between controls and the focal independent variable (gender). In every case, the pattern is almost certain to be X→W. The pattern cannot be W→X. People's gender (or, more precisely, their sex) does not depend on other socio-demographic factors. People are not women (as against men) on the basis of their income, or their level of education, or the frequency of their social engagements, etc. Instead, we would see (and indeed expect to see) that sex has an impact on many other socio-demographic factors (so, X→W). The obvious example is the pay gap: women typically earn less than men, via a variety of processes that are rooted in the way people (including employers) relate to sex, including outright discrimination. If we control for these factors, we will induce bias in our estimates of how women and men differ in their life satisfaction.
The situation with respect to income here is analogous to the unemployment/income/life-satisfaction example above. If being a woman leads (via a variety of socio-economic processes e.g. discrimination) to having a lower income on average, then if we control for income our estimate will fail to reflect the way being a woman is likely to be associated with lower life satisfaction. In other words, a model containing a control for income is likely to give an estimate for X→Y that is biased upwards. That outcome (upward bias) might follow from inclusion of any control pertaining to situations where women experience socio-economic disadvantages: if we control for a sex-based disadvantage experienced by women, our models will hide part of the impact of being female on people's life satisfaction.
There is perhaps one caveat to consider, in connection with the possibility that some controls might fit the pattern W→X for this question. If people changed their sex in significant numbers, then perhaps we could see other socio-demographic factors as relevant controls. In other words, if changes in sex were widespread and (in particular) more common among higher earners (or, perhaps, among more educated people, or healthier people, etc.), then these variables would make some sense as controls. But the number of people changing their sex is far too low for this possibility to become relevant. The example also shows why it is important to use precise language here: what is in play is not a 'gender paradox' but rather a sex paradox. That idea becomes evident via close consideration of the ESS variable we can use for the analysis: it is labelled 'gender' (gndr), but the answers offered in the survey are male and female, i.e., sex. Changes in gender are more common – but what is relevant here as the focal independent variable is in fact sex.
Once again, then, we can see that we do not need socio-demographic controls here because those controls are not antecedents of sex; on the contrary, the pattern is X→W, so including these controls creates bias. We therefore proceed now to an analysis that accords with this perspective.

Data and Analysis
The question now becomes whether women have higher life satisfaction than men (along with greater likelihood of depression) when we include only the controls where W→X and take particular care to exclude controls where X→W (in other words, when we exclude socio-demographic controls). For this purpose I draw on data from the European Social Survey, at first using the same rounds (4 through 8, corresponding to 2008, 2010, 2012, 2014, and 2016) as in Becchetti & Conzo (2022). In this initial stage, the total sample size is 250,871. Non-response on certain key variables (and list-wise deletion) reduces this figure to an analytical sample of 150,447. In a subsequent step where socio-demographic control variables are omitted, the analytical sample is 249,335. A further analysis uses all nine rounds (spanning the period of 2002 to 2018), with an analytical sample of 430,523. This analysis draws on all 38 countries that have participated in the ESS. (In the initial analysis using rounds 4 through 8, it is not possible to include Turkey, Serbia, Kosovo, Romania, Albania, Montenegro, and Luxembourg.)
The variables needed for a correctly specified analysis are straightforward. The focal independent variable is sex, with two possibilities (male and female). Life satisfaction, one of the key dependent variables, is given on a scale of 0 to 10; 0 is labelled 'extremely dissatisfied', 10 is labelled 'extremely satisfied', and the numbers in between are unlabelled. A variable for depression (the other dependent variable) is rooted in a question that offers four options for response regarding frequency: none or almost none of the time (in the past week), some of the time, most of the time, and all or almost all of the time. In line with the analysis in Becchetti & Conzo (2022), this variable is recoded into a binary variable: 0 for none or some of the time and 1 for most or all of the time.
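The recoding of the depression item can be sketched as follows. The numeric codes (1 to 4, in the order the response options are listed above) and the function name `recode_depression` are assumptions made for illustration; they are not taken from the ESS codebook or from the replication files.

```python
def recode_depression(response_code):
    """Collapse the four frequency options into a binary indicator.

    Assumed coding: 1 = none or almost none of the time, 2 = some of the time,
    3 = most of the time, 4 = all or almost all of the time.
    Anything else is treated as non-response and returned as None.
    """
    if response_code in (1, 2):
        return 0
    if response_code in (3, 4):
        return 1
    return None

recoded = [recode_depression(code) for code in (1, 2, 3, 4, 8)]
print(recoded)  # [0, 0, 1, 1, None]
```

Returning None for other codes keeps non-response visible, so that any later list-wise deletion is an explicit step.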
The analysis below will explore whether it makes sense to include controls for country and time (i.e., survey round).
I draw on additional variables used by Becchetti & Conzo (2022), to replicate their results, and to ensure that differences in my results vs. theirs are not an artefact of their reduced sample size following from list-wise deletion of non-responders. If non-response e.g. on income questions is associated with both sex and life satisfaction, the omission of these sample members might be another source of bias in the estimation of male/female differences in life satisfaction when the controls are included. For the replication, I draw on variables for age (agea, recoded into 10-year categories), economic activity status (mnactic, eight categories), household income (hinctnta, giving country-specific deciles), household size (hhmmb), education (eisced, seven categories), marital status (harmonised across maritala and maritalb, giving married, civil partner, separated, divorced, widowed, and never married), health (five categories), frequency of social interaction (sclmeet, seven options from never to every day), left-right political ideology (lrscale, options from 0 to 10), and feeling about household income (hincfel, four options, from 'living comfortably' to 'very difficult').
The analysis for life satisfaction below consists of ordinary least-squares regression models. An ordered probit analysis is superfluous, given that there are 11 options for response on the dependent variable (Ferrer-i-Carbonell and Frijters 2004). For analysis of depression, the analysis consists of logistic regression models. Table 1 gives univariate summaries of the variables used here. The table gives separate values for women and men – a separation that will help us interpret differences in analytical results below.
Model 1 in Table 2, which has no controls, gives a result in line with Table 1: women have slightly lower life satisfaction than men. The difference is statistically significant but also small – one tenth of a point on the 11-point scale. Statistical significance here comes not from a large difference between men's and women's life satisfaction but from the very large size of the sample (which has the consequence of reducing the size of the standard error, relative to a smaller sample). When we introduce controls for countries (Model 2) and then also survey round (Model 3), the coefficient for sex is almost identical to zero, and of course it is no longer statistically significant. The 'action' is in the inclusion of the control for countries; time (survey round) does not matter.
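The point about sample size and statistical significance can be made concrete with a back-of-the-envelope calculation. The sketch below assumes two equal-sized groups and a common standard deviation of 2.2 on the 0 to 10 scale; both are hypothetical simplifications chosen for illustration and are not values taken from Table 1.

```python
import math

def z_for_mean_difference(diff, sd, n_per_group):
    """z statistic for a difference in means between two equal-sized groups
    that share the same standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error

DIFF = 0.1        # the gap of about one tenth of a point discussed above
ASSUMED_SD = 2.2  # hypothetical spread of the 0-10 scale, not taken from Table 1

z_small_sample = z_for_mean_difference(DIFF, ASSUMED_SD, 500)
z_large_sample = z_for_mean_difference(DIFF, ASSUMED_SD, 75000)
print(round(z_small_sample, 2), round(z_large_sample, 2))
```

Under these assumptions the same one-tenth-of-a-point gap falls well short of the conventional 1.96 threshold with 500 respondents per group and clears it comfortably with 75,000 per group, even though the substantive difference is unchanged.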

Do we have grounds to prefer Model 2 over Model 1? Given the already small effect size in Model 1, the choice is not especially consequential. In both cases, we see no evidence supporting the idea that women have higher life satisfaction than men (a core component of the 'paradox' in Becchetti & Conzo 2022). I see no reason to argue for the opposite conclusion: if men have higher life satisfaction than women (i.e., in Model 1), the difference is quite small, and we should not use the presence of asterisks to conclude otherwise.
To insist on making a choice even so, we would need clarity on what the control for country would be doing in this context. At an aggregate level, using country as a control could make sense, in line with the idea that we want controls where W→X. Living in one particular country as against a different country does not determine someone's sex on an individual level – but different countries have different sex compositions in their populations. These differences could be consequential for life satisfaction, especially if the differences pertain mainly to older people (i.e., different life expectancies for women vs. men). Contrary to the popular 'u-shape' idea, life satisfaction decreases on average as people become very old (e.g. Kratz and Bruderl 2021). If that decrease is generally sharper for women than for men (see e.g. Inglehart 2002) and men in certain countries are more likely (relative to the rates in other countries) to die at a younger age, then inclusion of country as a control could make sense when estimating the impact of sex on life satisfaction. We likely don't have to worry about the possibility that the X/W relationship here is X→W; being male vs. female does not have any great impact on which country someone lives in. One scenario to consider has to do with migration, i.e., the possibility of a sex differential in migration rates. If there is a difference in migration propensity associated with (male vs. female) differences in life satisfaction, then including country as a control could perhaps induce bias. For our purposes, the choice is again not consequential. If we include country as a control (in Models 2 and 3), we do not see higher life satisfaction among women (relative to men). Instead, the levels are essentially identical. (In general, we are enjoined not to 'accept the null hypothesis' – but here the coefficients themselves are zero at two decimal points.)
In the absence of evidence that women have higher life satisfaction than men, the idea of a 'gender paradox' disappears.
In Model 4, we see a result closer to Becchetti and Conzo's core finding (2022 – see their Table 3). In Model 4, women have higher life satisfaction than men – a difference of 0.14, on the scale running from zero to ten. (The full table, giving results for the control variables, is in an on-line supplement.) What should we make of this result? The meaning of the gender coefficient in this model is: women have higher life satisfaction as long as we set aside some of the important disadvantages they have. The disadvantages in question are evident in Table 1, especially in connection with income: we see an average level of income for women that is substantially lower than the average for men (5.1 vs. 5.6 – recall that these numbers pertain to country/round-specific deciles). The lower incomes for women are also reflected in the way respondents feel about their incomes: female respondents are less likely to say that they are 'living comfortably' on their current incomes, and more likely to say they are finding things 'difficult' or 'very difficult'. Another important difference is evident in Table 1: women are more likely than men to report being widowed (13.6 per cent vs. 4.5 per cent) or divorced (11.3 per cent vs. 8.2 per cent). These differences are obviously relevant to any sex-specific patterns in life satisfaction. People who earn less tend to be less satisfied (e.g. Easterlin 1974; Clark, 2011). Losing one's partner (via death or divorce) has a large impact on life satisfaction (e.g. Layard 2005).
The key point is then that a regression model that includes these variables as controls will misrepresent the differences (if any) between men's vs. women's life satisfaction. The 'female' coefficient in Model 4 gives the difference in life satisfaction for women (vs. men) when holding other variables constant. If income is held constant, the model implies that women and men generally have the same income. Table 1 demonstrates that this implication is untrue. It then matters that the relationship between these two variables (sex and income, with income as the control) can only be X→W: income does not determine sex (W→X); instead, sex helps determine income. Including income as a control, then, induces bias in the estimate of the sex difference in life satisfaction. The idea many people have about controls is that the coefficient for the focal independent variable tells us its effect net of the impacts of the controls. But when the relationship between the independent variable and a control is X→W, the coefficient for X now gives us a quantity that is net of part of the effect of X itself. More precisely: when we control for income, the coefficient for sex tells us only about part of the effect of sex – it now excludes another part, i.e., the portion that has to do with the way being female means having a lower income (on average).
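The sense in which the coefficient becomes 'net of part of the effect of X itself' can be written out in a simple linear sketch. The symbols a, b, c and the error terms u and e are introduced here purely for illustration (they do not appear in the models reported in this article), and the sketch assumes linear relationships with errors unrelated to X and to each other.

```latex
\begin{align*}
W &= \alpha_0 + aX + u            && \text{(sex affects income: } X \rightarrow W\text{)} \\
Y &= \beta_0 + cX + bW + e        && \text{(life satisfaction depends on both)} \\
Y &= (\beta_0 + b\alpha_0) + (c + ab)X + (bu + e)
                                  && \text{(substituting the first line into the second)}
\end{align*}
```

A regression of Y on X alone recovers c + ab, the full difference associated with X. A regression that also includes W recovers only c. If being female lowers income (a < 0) and income raises life satisfaction (b > 0), then ab < 0 and c is larger than c + ab: the coefficient from the model with the control overstates the position of women relative to men.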
The same point applies to the inclusion of a control for partner/marital status. Women in the sample are much more likely to have experienced the death of a spouse. Including a control for marital status means comparing women's vs. men's life satisfaction while holding marital status constant. A model of this sort imagines that there is no relevant difference in marital status following from being male vs. being female. But being female in fact means a higher likelihood of losing one's spouse, which typically results in lower life satisfaction. So, a model that controls for marital status hides part of the (negative) impact of being female. Or (what amounts to the same thing) it obscures the way being female involves this particular disadvantage, a feature of women's lives that weighs against other advantages they might have in connection with life satisfaction (from Table 1: a lower incidence of unemployment). The inclusion of the control induces upward bias in the coefficient describing the impact of being female on life satisfaction. That coefficient is not merely 'net' of the impact of marital status; in addition, it is net of part of the effect of sex itself, given the impact sex (being female) has on marital status.
As noted above, the sample size for a model that includes controls is diminished by non-response on the control variables. There is a potential source of bias here: if men and women have different rates of non-response and non-response is also associated with life satisfaction, exclusion of non-responders from the analysis might lead to a biased estimate of sex differences in life satisfaction. As is evident from Table 1, non-response on certain variables is substantial – especially for income, and also for left-right political ideology.
To explore this possibility, in Table 3 I present results from an analysis that does not exclude non-responders. (In Table 2, all models – even the ones that did not include controls – excluded non-responders, so that the same sample is used for the different models.) The sample size is of course much larger – 249,335 (vs. 150,447). The conclusions we draw, however, are not different. In particular, there is no sign here of a higher level of life satisfaction among women.
The analysis so far uses data from Rounds 4 through 8 of the ESS. Once we accept that we do not need control variables, it becomes easier to use the entire dataset, i.e., all nine rounds, with 430,523 respondents (which also enables use of all 38 participating countries, including Turkey, Serbia, Kosovo, Romania, Albania, Montenegro, and Luxembourg). The results for an equivalent analysis (available on request) are not different: the conclusions we draw are identical, and in particular there is no indication of any substantial life satisfaction advantage among women.
The key point in the presentation of results so far is that results from analyses that include controls (i.e., Model 4 in Table 2) give an unwarranted impression that women have higher life satisfaction than men. The inclusion of controls results in upwards bias for this analysis; when we take care to exclude inappropriate controls, there is no support for the idea that women have higher life satisfaction.
What about the depression variable? Results for a logistic analysis of depression are given in Table 4, which has the same structure as Table 2. In Model 1, with no controls, we see a positive coefficient (0.53). That coefficient equates to an odds ratio of 1.7: so, women's odds of reporting feeling depressed (most or all of the time) are 70 per cent higher than men's. That result changes a bit when we control for country: in Model 2, the coefficient for female is 0.49, which tells us (via an odds ratio of 1.63) that women's odds of reporting feeling depressed are 63 per cent higher. In Model 4 (which adds socio-demographic controls), the result for 'female' (the coefficient of 0.36) gives us an odds ratio of 1.43 – so, women's odds of reporting feeling depressed are now only 43 per cent higher than men's. The inclusion of controls gives us biased results, on the basis that the pattern for the controls is X→W. In substantive terms, the bias consists of the way we now see a smaller female/sex effect. In reality (as given to us by Models 1 and 2), the higher incidence of depression among women is more substantial than what we see via Model 4. The reasons for the bias in Table 4 are similar to the ones discussed above: when we hide the impact of women's disadvantages (by including controls pertaining to those disadvantages), we see a smaller magnitude for women's greater risk of experiencing depression. This result is not the 'true' impact of being female on experiencing depression. On the contrary, this result hides part of that impact, because it holds constant some of the disadvantages of being female, disadvantages that likely contribute to the risk of experiencing depression. The consequence of including inappropriate controls is the same as for the analysis of life satisfaction.
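The conversion from the logistic coefficients reported for Table 4 to odds ratios is a simple exponentiation, sketched below. The second helper shows how an odds ratio translates into a ratio of probabilities at a given baseline; the baseline of 0.05 is a hypothetical value used only for illustration and is not an estimate from the data.

```python
import math

def odds_ratio(logit_coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(logit_coefficient)

def probability_ratio(logit_coefficient, baseline_probability):
    """Ratio of probabilities implied by an odds ratio at a given baseline."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio(logit_coefficient)
    new_probability = new_odds / (1.0 + new_odds)
    return new_probability / baseline_probability

coefficients = {"Model 1": 0.53, "Model 2": 0.49, "Model 4": 0.36}
odds_ratios = {m: round(odds_ratio(b), 2) for m, b in coefficients.items()}
print(odds_ratios)  # {'Model 1': 1.7, 'Model 2': 1.63, 'Model 4': 1.43}

HYPOTHETICAL_BASELINE = 0.05  # illustrative only, not an ESS estimate
ratio_model_1 = probability_ratio(0.53, HYPOTHETICAL_BASELINE)
print(round(ratio_model_1, 2))
```

An odds ratio and a ratio of probabilities are close when the outcome is rare and diverge as the baseline probability rises, which is worth bearing in mind when translating the coefficients into statements about likelihood.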
As with the analysis of life satisfaction, these results do not change if we use more of the available sample (e.g. by not deleting sample members who do not respond to some of the questions used for control variables in Model 4 – results available on request).

Discussion and Conclusion
There is a 'gender paradox' of life satisfaction and depression only if it is true that women have higher life satisfaction than men. The analysis above shows that women do not have higher life satisfaction than men. The appearance of higher life satisfaction among women in Europe comes only when we introduce socio-demographic controls. The intention behind inclusion of controls is to mitigate the possibility of bias in our results. But mitigation of bias follows from use of the right controls (the ones where W→X). In the present context, where our X is sex, using the right criterion for selection of controls means that we do not need any socio-demographic controls. The results from models containing socio-demographic controls are biased, for reasons we can understand, in line with the discussion of income and widowhood above. Women do have a higher likelihood of experiencing depression – indeed, that increased likelihood is more substantial when we exclude inappropriate controls. But there is no life satisfaction advantage among women in Europe – so, no 'gender paradox', at least in the way described by Becchetti & Conzo (2022). At most, one might say only that women are more likely to be depressed, in contrast to the way they are not more likely to report substantially lower life satisfaction than men.
To gain clarity on the right way to think about control variables for questions of this sort, we can focus on some of the language researchers typically use to articulate what their results mean. We commonly see a notion of 'net effects': in the presence of controls, the coefficient for one variable is said to be net of the effects of all the other variables in the model. That language ignores the important distinction between W→X and X→W. If our goal is to estimate X→Y but we include some control variables where X→W, we do not get a result for X that is net of the effects of (only) the other variables. Our estimate for X is now also 'net' of part of the effect of X itself. In other words, it is biased. The term 'net effects' does not help us here. Our target quantity is more effectively expressed with different language. What we want is simply an unbiased estimate of the impact of X on Y. To achieve that aim, we need the right control variables (W→X). Using this criterion for selection, it then sometimes works out that we do not need any control variables at all.
There is of course more complexity that could be explored to gain further understanding of differences in subjective well-being between men and women. That complexity pertains in part to different concepts and measures, beyond life satisfaction (e.g. happiness, flourishing, negative/positive affect, etc.). Another angle is to consider the way differences between men and women might change over the life-course: Inglehart (2002) finds that women are happier than men at younger ages but then less happy than men at older ages. It would also be possible to investigate other regions; perhaps outside Europe there is more evidence of a difference between men and women. Regardless of the angle or focus, as long as gender/sex is the focal independent variable it remains necessary to think carefully about the control variables needed for one's analysis. A perspective focusing on 'other determinants' of the dependent variable might lead to use of 'bad controls'. Clarity on the relationship between the focal independent variable and the (potential) control (W→X vs. X→W) offers an important corrective to that more conventional perspective.
Declarations
The author declares that there are no competing interests associated with this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.