1 Introduction

Social isolation is defined in the literature as a deterioration of human relationships in both quantity and quality. It has become increasingly central to the discussion of industrialized societies, in which individuals experience new patterns of intergenerational ties, social mobility and living arrangements, due in part to dual-career families, increased instability in unions and a decline in large families (Holt-Lunstad et al. 2010).

Recent literature has investigated deeply the consequences of social isolation, mainly documenting the relationship between a deprivation in social relationships and health (Holt-Lunstad et al. 2010; Steptoe et al. 2013; House et al. 1988). In more detail, the literature has underlined that low levels of social relationships have a negative impact on health conditions. The mechanism through which a deprivation in social contacts undermines health is well summarized in two theoretical streams: the “buffering” (Cohen et al. 2000) and the “direct effect” models. The “buffering” model states that social support is beneficial when individuals are exposed to the pathogenic effect of stressful events, by moderating the negative impact of those stressors.

On the other hand, the “direct effect” model suggests (Cohen and Wills 1985) that good social relationships have positive effects irrespective of whether individuals are under the effect of health stressors or not.

Empirical evidence (Cohen and Wills 1985) supports both the illustrated points of view, confirming in any case the positive impact of social networks on mental and physical health.

While this field of research has plenty of contributions which investigate and support (Ong et al. 2016; Cornwell and Waite 2009) the impact of social isolation (and/or loneliness) on health and, in particular, on the risk of death in the older cohorts, less attention has been devoted to the relationship between human relationships and health in large samples of adults of all ages.

In this paper, we consider a large representative dataset of more than 40,000 individuals aged sixteen and older and coming from 21 countries in Europe, with the aim of establishing a relationship between social isolation and health.

The data include self-reported responses as a measure of health in a cross-sectional setting. Subjective health, being self-reported, is commonly used in nationally representative surveys to measure health conditions, but the literature shows that self-reported measures of health are in certain cases more subject to measurement error than objective measures (Butler et al. 1987; Johnston et al. 2009).

Nevertheless, subjective measures of health are widely used in the literature when objective measures of health are not available. In addition, self-reported scales are more reliable when individuals are asked to report on their current health rather than on their health over a long period of time (Bourne 2009), as is the case in our data.

Using data from the European Social Survey (ESS), we aim to assess whether social isolation increases or decreases subjective health. Because measurement errors may be present both in subjective measures of well-being and in the perceived component of the social isolation index adopted in the paper, the relationship between the two variables may be affected by endogeneity. In econometric practice, endogeneity may arise when unobservable factors influence both the main predictor (social isolation) and the outcome (subjective health). In this framework, measurement error in self-reported indices of social isolation and subjective well-being represents one of the most common sources of bias in the estimates.

An instrumental variable approach, through the identification of exogenous variables that are strongly correlated with the main predictor (social isolation) but presumably not with the error term of the equation that explains subjective well-being as a function of social isolation, may protect estimates from the bias produced by endogeneity.

The remainder of the paper is organized as follows. The section “Data and Methods” describes the dataset, defines subjective well-being, details the construction of the social isolation index used in the analysis, and presents the instrumental variable methodology used to evaluate the effect of social isolation on subjective well-being. The section “Results and Discussion” presents the outcome of the regression analysis and discusses both the main and the corollary results, while the final section concludes.

2 Data and methods

2.1 Data

To analyze the relationship between social isolation and subjective general health, we employ data from round 7 of the ESS (2014). The ESS is an academically driven cross-national survey which issues a multidimensional questionnaire across several European countries every two years. In the seventh round, there were 21 countries surveyed: Austria, Belgium, Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Ireland, Israel, Lithuania, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland and the United Kingdom. We decided to choose this particular round not only because it is one of the most recent available, but also because it is the only one containing a specific module on social inequalities in health. Thus, it includes health measurements which are crucial for our analysis.

For this particular round, the total sample consists of 40,185 individuals aged 16 and older (Mean = 49.34; SE = 0.09). The analysis focuses on the individuals who answered all questions on subjective general health and social isolation, that is, 39,122 individuals.

2.2 Methods

The main model we use to estimate the impact of social isolation on the health status dummy variable is a probit regression with endogenous regressors specified as follows:

$$Y_{i} = \Pr \left( S_{i} = 1 \mid \begin{pmatrix} X_{i} \\ \boldsymbol{W}_{i} \end{pmatrix} \right) = \Phi \left( \beta_{0} + \beta_{1} X_{i} + \underline{\beta_{2}^{\prime}} \boldsymbol{W}_{i} + \varepsilon_{i} \right) \tag{1}$$
$$\Phi^{-1} \left( Y_{i} \right) = \beta_{0} + \beta_{1} X_{i} + \underline{\beta_{2}^{\prime}} \boldsymbol{W}_{i} + \varepsilon_{i} \tag{2}$$

where Si is a binary variable that represents subjective general health for individual i; Yi, the dependent variable, represents the probability that the chosen health-level variable takes the value 1 for individual i; Xi is the main explanatory variable, a measure of social isolation for individual i; and Wi is a vector of all the other exogenous regressors, that is, all the other variables that influence health according to the literature. Finally, εi represents the error term and Φ is the cumulative distribution function (CDF) of the standard normal distribution. Since, as mentioned before, this relationship is likely to be affected by endogeneity, that is, social isolation is potentially correlated with the error term, we argue that an IV-probit regression model (Newey and West 1987) would lead to more reliable estimates than a standard probit regression. IV-probit estimation involves an auxiliary linear regression in which Xi is a function of the exogenous control variables Wi and of instruments, i.e. a set of predictors collected in the vector Zi that must be uncorrelated with the error component of the main equation. The auxiliary equation is specified as follows:

$$X_{i} = \alpha_{0} + \underline{\alpha_{1}^{\prime}} \boldsymbol{Z}_{i} + \underline{\alpha_{2}^{\prime}} \boldsymbol{W}_{i} + u_{i} \tag{3}$$

where Zi is the vector of instrumental variables not correlated with the error term of Eq. (2). After estimating [α0, α1, α2] using Eq. (3), the obtained prediction of Xi replaces Xi in Eqs. (1) and (2), giving robust estimates of the coefficient β1.
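The logic of instrumenting can be illustrated with a small simulation. The sketch below is not the IV-probit estimator used in this paper: it swaps in a linear model with a single instrument and no controls, and all names (`simulate_iv`, `cov`) and parameter values are hypothetical. An unobserved factor u drives both the regressor and the outcome, so the naive regression slope is biased, while the ratio cov(z, y) / cov(z, x) recovers the assumed true coefficient.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate_iv(n=20000, seed=0, beta=0.5):
    """Return (naive_slope, iv_slope) for simulated data with a confounder.

    Illustrative assumptions: z is the instrument, u is an unobserved
    factor that raises x and lowers y, and beta is the true effect of x.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of u
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi - ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    naive_slope = cov(x, y) / cov(x, x)  # biased because x is correlated with u
    iv_slope = cov(z, y) / cov(z, x)     # uses only the variation in x driven by z
    return naive_slope, iv_slope
```

Calling `simulate_iv()` returns the two slopes side by side; under these assumptions the naive slope should fall well below the true value of 0.5, while the instrumented one should be close to it.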

2.2.1 Dependent variable

Subjective general health (SGH) is the dependent variable of the main equation. It has been obtained by generating a dummy from the following question:

How is your health in general? Would you say it is… [Very good, good, fair, bad, or, very bad?].

The dummy SGH takes a value of 1 when respondents choose the answers “very good” or “good” (we denote this condition as characterizing individuals with high self-reported health) and 0 otherwise, denoting those respondents that we classify in a low self-reported health category. The choice to dichotomize the outcome variable is justified by the availability of econometric tests for supporting the choice of an instrumental variable procedure.
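The recoding of the five-category answer into the SGH dummy can be sketched as follows. The function name `sgh_dummy` and the treatment of unrecognized answers as missing are assumptions made for illustration; the actual ESS variable codes are not reproduced here.

```python
HIGH_HEALTH = {"very good", "good"}
VALID_ANSWERS = {"very good", "good", "fair", "bad", "very bad"}


def sgh_dummy(answer):
    """Map a general-health answer to the SGH dummy.

    Returns 1 for "very good" or "good", 0 for "fair", "bad" or "very bad",
    and None for anything else (treated here as missing, an assumption).
    """
    key = answer.strip().lower()
    if key not in VALID_ANSWERS:
        return None
    return 1 if key in HIGH_HEALTH else 0
```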

2.2.2 Endogenous variable and instruments

The main explanatory variable of our analysis is social isolation. The ESS has a specific section about social exclusion, within which we focus on the measures of social isolation. In order to construct a robust index of isolation, we follow the recent literature (Gustafsson et al. 2015) and build this measure by adding together the scores assigned to the response categories of three related ESS variables, namely, family status, social contacts and having close friends. These items have been used in many past studies (Gustafsson et al. 2013, 2014; Floderus 2009).

2.2.2.1 Family status

This item, indicating whether the respondent cohabits with a partner and/or has children living at home, is obtained by the combination of the following two questions of the ESS survey:

Respondent lives with husband/wife/partner? [Yes, No] and Respondent has children living at home? [Yes, No].

In particular, the variable family status has the following four response categories, and each was assigned a score from 1 to 4 points (p):

  • Lone without children (1p)

  • Lone with children (2p)

  • Cohabiting without children (3p)

  • Cohabiting with children (4p)

2.2.2.2 Social contacts

This variable indicates the frequency of the respondents’ social meetings and is recorded after asking this question of the interviewee:

How often do you meet socially with friends, relatives or work colleagues?

  • Never/less than once a month (1p)

  • Once a month (2p)

  • Several times a month/once a week (3p)

  • Several times a week/every day (4p)

2.2.2.3 Having close friends

This last item indicates whether the respondent has some close friends or not; it is thus coded as a binary variable.

How many people, if any, are there with whom you can discuss intimate and personal matters?

  • None (1p)

  •  ≥ 1 (4p)

The index obtained following the past literature (Gustafsson et al. 2015) is reverse coded; that is, low values on the index mean that the individual experiences a high degree of social isolation, whereas a high score implies less social isolation.

Table 1 shows the social isolation items and the distribution of their response categories.

Table 1 Descriptive statistics of social isolation items (percentages)

Hence, the social isolation index is obtained by adding together the scores of each individual’s three answers. The index decreases with the level of isolation and varies between 3 and 12p: individuals scoring low on the index experience high levels of social isolation, while individuals scoring high on the index experience less social isolation. Since this variable can assume 10 different values (10-point scale), it can be treated in practice as a continuous predictor (Rhemtulla et al. 2012) without a great loss of information.
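The scoring scheme described above can be written compactly. In this sketch the function name `isolation_index` and the category labels used as dictionary keys are hypothetical stand-ins for the ESS response codes; only the point values come from the text.

```python
# (lives_with_partner, children_at_home) -> family status score
FAMILY_SCORE = {
    (False, False): 1,  # lone without children
    (False, True): 2,   # lone with children
    (True, False): 3,   # cohabiting without children
    (True, True): 4,    # cohabiting with children
}

# Hypothetical labels for the four social-contact categories
CONTACT_SCORE = {
    "never_or_less_than_monthly": 1,
    "once_a_month": 2,
    "several_times_a_month_or_weekly": 3,
    "several_times_a_week_or_daily": 4,
}


def isolation_index(lives_with_partner, children_at_home, contact_frequency, n_confidants):
    """Sum the three item scores; low values mean high social isolation."""
    family = FAMILY_SCORE[(bool(lives_with_partner), bool(children_at_home))]
    contacts = CONTACT_SCORE[contact_frequency]
    friends = 4 if n_confidants >= 1 else 1  # binary item scored 1p or 4p
    return family + contacts + friends
```

For example, a respondent living alone without children, meeting others less than once a month and having no confidant obtains the minimum score of 3, while a cohabiting parent with daily contacts and at least one confidant obtains the maximum of 12.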

The relationship between social isolation and subjective health is documented in Table 2. The first column shows percentages for the whole sample, while the remaining ones show them at different quartiles of social isolation.

Table 2 Descriptive statistics of social isolation and subjective general health (percentages)

Looking at the last four columns, we can immediately see that health increases for individuals who experience higher levels of social inclusion.

As outlined above, the main issue with the estimation strategy is endogeneity: social isolation is correlated with health through different mediating factors, so the coefficient of the isolation factor would likely be a biased estimate of a causal effect. The reason may be: (i) omitted variable bias, if unobserved individual factors affect health; (ii) reverse causality, if bad health conditions lead people to isolate themselves; or (iii) persistent measurement error, if isolated individuals tend to underreport their health conditions. In order to overcome these issues, we specify an instrumental variable model, where the variation in the isolation factor exploited in the main regression comes from some exogenous variables. Our instruments are membership in ethnic minorities and the experience of past conflicts within the family. The underlying assumption is that minority status and family conflicts determine the isolation of an individual, which in turn undermines her health. According to the literature (Smith and Calasanti 2005), individuals belonging to racial and ethnic minorities have less social support and feel that they do not fit properly into society. The link between these variables is further confirmed in the literature (Boden-Albala et al. 2005), since membership in an ethnic minority group is considered a proxy for differences in social resources, in particular, social isolation. On the other hand, ethnic minorities can have worse health for reasons other than stronger social isolation: minorities may have lower access to health care and less information about services, or different cognitive and non-cognitive skills which may affect health. In order to preserve the robustness of the estimates, we control in both the first-stage and second-stage regressions for household income and for family and individual education, which are proxies of individual skills. In addition, to strengthen confidence in the estimation procedure, econometric tests can provide evidence on the validity of the instruments in an overidentified setting.

Overidentification requires the introduction of a second instrumental variable, which is the presence of family conflicts. With respect to past family conflicts, the literature shows how adolescents’ unresolved feelings of loneliness, linked with the tensions experienced between household members, may be an obstacle to the formation of normative social relationships through the development of anxiety and social avoidance (Johnson et al. 2001; Lau and Kong 1999; Vernberg 1994).

This approach may not be sufficient to fully overcome the problem of endogeneity. First, health may be related to ethnicity, if ethnic groups have different health attitudes. In the ESS there is no information about ethnicities or area of origin: we do not know the immigration history of the respondent and her family; some may have moved to their current country to benefit from better services, including health care. The best we can do with these data is to control for area of residence (small or large town), occupation and country effects. Second, past family conflict is generally a problematic concept because it is a subjective variable, like health. Thus, it is plausible that the bias in reporting the two variables is correlated, for instance, if perception about one’s own health affects the level of trust in others (e.g. doctors) or vice versa. Third, social isolation is due to a set of factors, so it is reasonable to expect that family conflict, minority status and other observable covariates are able to explain only a limited part of its variability in the sample. Hence, we could run into weak instrument problems: instruments with small explanatory capacity would give artificially high IV coefficients and inflate inconsistency and finite sample bias (Bound et al. 1995). Although the structure of the data does not allow us to solve all these problems, the results of econometric tests displayed in the following section may help to support or reject the candidate instruments.

The questions of interest for building instruments are:

Do you belong to a minority ethnic group? [Yes, No];

Tell me how often there was serious conflict between the people living in your household when you were growing up? [Always, often, sometimes, hardly ever, never].

The first variable is a dummy indicating whether the respondent belongs to a minority ethnic group in their country or region. The second one is a categorical variable which we recode as a dummy, taking a value of 1 when the respondent answers “always” or “often” and 0 otherwise.
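As a minimal sketch of this recoding, the two instruments can be derived from the raw answers as shown below. The function names are hypothetical, and answers are assumed to arrive as text labels rather than as the numeric ESS codes.

```python
def minority_dummy(answer):
    """1 if the respondent reports belonging to a minority ethnic group, else 0."""
    return 1 if answer.strip().lower() == "yes" else 0


def family_conflict_dummy(answer):
    """1 if serious household conflict occurred "always" or "often", else 0."""
    return 1 if answer.strip().lower() in {"always", "often"} else 0
```

Missing or refused answers would need to be dropped before applying these functions; that step is omitted here.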

2.2.3 Exogenous explanatory variables

We use as controls some variables which affect health and social contacts. First, we include basic demographic variables: age classes and gender. Second, we include other socio-economic controls (Venetoklis 2019): paid work status, dummies for urban area of residence (big city, suburbs, village) and level of education attained by respondents and also by their parents. Finally, we use country dummies to rule out any country-level effect.

2.2.3.1 Age

Older people are generally less healthy than younger people (Ecob 1996). We decided to use a set of dummies for age classes, rather than a continuous variable, because the relationship between age and isolation (or health) may be nonlinear.

2.2.3.2 Gender

According to past studies (Wuest 2002), the inclusion of gender as a health determinant is crucial in order to control for sex-related social norms and structures that influence vulnerabilities to illness, health status, access to preventive and curative measures and quality of care.

2.2.3.3 Occupational status

Work status may be associated with anxiety and other health-affecting symptoms (Benavides et al. 2000). In order to control for these effects, we included in our regression a set of dummies representing the following categories: employed, students, unemployed, retired and other. The latter category includes individuals such as houseworkers and those in community or military service.

2.2.3.4 Religiosity

Scientific research neglected the connection between religiosity and health until recently; however, during the last few years the interest in the relationship between these two variables has started to grow (Aukst-Margetic and Margetic 2005; Seeman et al. 2003). Given the prevalence and importance of religiosity among the population, it is reasonable to consider the impact that religious beliefs, practices and traditions may have on physical and mental health outcomes. Although a large proportion of published work suggests a positive association between religion and health outcomes (Casar Harris 1995), some studies show a negative relationship (Trenholm et al. 1998). Considering the contradictory positions in the existing literature, further research is needed to investigate this relationship.

2.2.3.5 Domicile

The area of residence may significantly impact health outcomes by influencing factors such as healthcare access and behaviors (Humphreys and Carr-Hill 1991; Macintyre et al. 1993; Mullen 1988). However, the impact of area characteristics on health is still not completely clear because the existing studies show contradictory findings (Ecob 1996). By including a set of dummies representing residents of big cities, small cities, suburbs, country villages and the countryside, we try to capture these effects.

2.2.3.6 Education

The subjects surveyed were asked: “What is the highest level of education you have successfully completed?” Answers were then recoded according to the International Standard Classification of Education (ISCED), which was developed by UNESCO and includes seven different levels. This coding frame facilitates comparisons of education statistics and indicators across countries. In our analysis, we reduced the possible answers to three categories and generated three dummy variables accordingly:

  • Primary education: assumes a value of 1 if the subject completed ISCED level 1 and 0 otherwise;

  • Secondary education: takes a value of 1 if the subject completed level 2–5 of the ISCED and 0 otherwise;

  • Tertiary education: a value of 1 is indicated if the individual successfully completed either level 6 or 7 of the ISCED and 0 otherwise.

Those who achieved a tertiary level of education represent the reference category.
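The grouping of ISCED levels into education dummies can be sketched as follows. The function name `education_dummies` and the choice to raise an error for levels outside 1 to 7 are assumptions for illustration. Because tertiary education is the reference category, only the primary and secondary dummies are returned, as they would enter the regression.

```python
def education_dummies(isced_level):
    """Return the primary and secondary dummies for an ISCED level (1 to 7).

    Tertiary education (levels 6 and 7) is the omitted reference category,
    so a tertiary-educated respondent has both dummies equal to 0.
    """
    if isced_level == 1:
        category = "primary"
    elif 2 <= isced_level <= 5:
        category = "secondary"
    elif isced_level in (6, 7):
        category = "tertiary"
    else:
        raise ValueError(f"unexpected ISCED level: {isced_level}")
    return {
        "primary": int(category == "primary"),
        "secondary": int(category == "secondary"),
    }
```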

In the literature, it is well documented that the well-educated have better health than the poorly educated, as indicated by high levels of self-reported health (Guralnik et al. 1993; Gutzwiller et al. 1989; Winkleby et al. 1992) with a directional path that goes from education to health and not vice versa (Doornbos et al. 1990; Fox et al. 1985; Wilkinson 1986).

2.2.3.7 Country dummies

Finally, we included a dummy for each country in order to take into account the residual unobserved heterogeneity at country level, such as country-specific institutional effects.

3 Results and discussion

We now turn to the results of our analysis. Table 3 displays the results of the first-stage regressions using the social isolation index as the endogenous variable. Notice that due to missing values in the control variables, the total sample is reduced to 31,726 observations.

Table 3 Estimates for the determinants of subjective general health

The results were in line with expectations: individuals who belong to an ethnic minority and/or who grew up in a household with serious conflicts between family members are more likely to be socially isolated. We can claim that ethnic minority and family conflict are strong determinants of social isolation, since the coefficients of the two instrumental variables are statistically significant at the 0.1% level. Most of the socio-economic controls have significant coefficients, namely age class, gender, religion, area of residence and education. The only exception is the education level of the father, although this is compensated by the results for the education of the mother. All the variables used in the regression, i.e. the instruments, the demographic and socioeconomic controls and the country dummies, were jointly significant with an F-statistic greater than 10. Thus, they are highly correlated with our endogenous variable. This is a good argument in favor of the effectiveness of our IV-probit regression (Baum 2003), supporting the relevance of the selected instruments.

The second column of Table 3 reports the results of the second-stage regression, where health status is the dependent variable and the index of social isolation is used as an approximation of a continuous endogenous variable. The coefficient of social isolation is positive and strongly significant, meaning that health benefits from a higher level of social integration. This effect is robust to the inclusion of all our controls and country dummies. This basic model shows evidence that social isolation has adverse effects on health as reported in the ESS.

The last column of Table 3 presents ML estimates (standard probit estimates), which show the same sign for social isolation: in both regressions, the social isolation index is positively linked with subjective health, although the coefficient is larger in the IV model. This means that subjective health is better for individuals who do not come from families with past conflicts and who are not part of an ethnic minority.

With respect to the other explanatory variables, ML results are in line with expectations: self-assessed health is worse in older cohorts and in retired and unemployed individuals. Higher levels of education are also associated with better subjective health.

Table 4 displays the country fixed effects from the IV-probit estimation. Portugal, which is the country with the worst performance, has been chosen as the reference category.

Table 4 Country fixed effects in IV estimation

The results are surprising: holding constant the effect of isolation and of the other covariates, Ireland shows the best performance in terms of self-assessed health. This result is partially consistent with the literature (Winkleby et al. 1992), which reports high health standards in Ireland.

In order to further support the choice of the IV-probit method over the standard probit regression and to assess the validity of our instruments, we ran the Amemiya-Lee-Newey test and the Wald test of exogeneity. The results, reported in Table 5, favor the IV-probit method. The Amemiya-Lee-Newey statistic can be calculated when the model is “overidentified,” that is, when the number of instruments exceeds the number of endogenous variables. This test is the counterpart of the Sargan test for standard IV regressions and is used to judge whether the instruments are valid (i.e. they are exogenous and affect the dependent variable only indirectly). The results did not reject the null hypothesis that the variables we used to instrument the index of social isolation, i.e. ethnic minority and family conflict, are exogenous. The Wald test is instead a test of the exogeneity of the instrumented variable. The results rejected the null hypothesis of exogeneity; thus, the error terms of the first- and second-stage regressions are correlated. This suggests that the IV-probit estimator, unlike the standard probit, produces consistent estimates of the impact of social isolation on subjective general health.

Table 5 Instrumental variable estimation diagnostic

Our analysis has some objective shortcomings. As explained in the previous sections, there are endogeneity issues which still remain in our model, even after instrumenting.

First, the validity of our instruments must be tested since the exclusion restriction might not be satisfied. In fact, belonging to an ethnic minority might be correlated with health status, even if the direction of the correlation is ambiguous. A stream of literature in public economics focuses on overutilization and underutilization of welfare services, including healthcare, by immigrants and ethnic minorities. A possible explanation for underutilization is that ethnic minorities and immigrants face barriers to accessing healthcare services due to a lack of information. This could result in worse health conditions for those belonging to an ethnic minority. Nevertheless, immigrants tend to be younger and, on average, healthier; hence, the health effect of being part of an ethnic minority is ambiguous. Controlling for age classes should mitigate this problem. Our second instrument, namely, family conflict, is a lagged variable because it refers to possible tensions between family members in the past; thus, it is a good candidate for being an exogenous instrument. The validity of both our instruments is supported by the result of the Amemiya-Lee-Newey overidentification test, which does not reject the exogeneity of the instruments used in our model, so the model seems to be specified correctly.

A second issue is that ethnic minority and family conflict are not the only determinants of social isolation. In fact, the coefficients from the first-stage regressions are quite small (Table 3), even if jointly strongly significant. The F-statistic on excluded instruments in the first stage is greater than 10, so following the rule of thumb by Staiger and Stock (1997), we do not worry about weak instrument problems, even if this does not guarantee that the coefficients are unbiased.

Nevertheless, our objective is to test a relationship which is outlined in the literature. In this respect, it is important to focus on the qualitative conclusion: there is strong evidence that isolation has a negative effect on health. Our findings are consistent with those reported in the medical literature.

Our analysis provides additional evidence to support the hypothesis that social isolation deteriorates people’s health. Although our results are coherent overall with the existing literature, they are not fully comparable. The first difference is the measure for social isolation: most studies use the UCLA loneliness scale, while, as explained above, we used the social isolation index.

Secondly, while many papers focus on specific medical conditions, often related to the circulatory system, such as cardiovascular diseases, blood pressure, myocardial infarction etc. (Umberson 2010; Cacioppo et al. 2002; Hawkley et al. 2003), we resort to self-reported health status at a single moment in time. A similar measure was however employed in the literature (Luo et al. 2011), together with other measures: using a cross-lagged model, that study found that loneliness has a modest negative impact on self-rated health over two years. As we have mentioned, researchers who focus on health behaviors obtain contrasting figures.

4 Conclusions

In this paper, we investigate the relationship between social isolation and health. In particular, we test the hypothesis that higher isolation worsens the health status of individuals. However, since this relationship is ambiguous and affected by endogeneity, a simple correlation between these variables does not prove our hypothesis. For this reason, in order to investigate the causal relationship, we implemented an instrumental variable approach in which the instruments are the variables “belonging to an ethnic minority” and “family conflict”, both of which are highly correlated with social isolation.

Using different data from a cross-sectional survey and a different methodology from those employed before, we find results in line with past research in medicine and psychology but extended to self-reported health for a large, representative sample of European citizens (not necessarily belonging to the older cohorts). High levels of social isolation, defined as a lack of personal contacts with peers and an absence of profound relationships, are found to favor a decline in the perceived health of individuals. The validity of the model providing these results is supported by the implementation of statistical tests of validity and the relevance of the instruments.

All in all, our results show that a lack of social connections impacts the health status of individuals. This is an important result for policymakers, as exploring the channels through which social status affects people’s health may help in designing preventative interventions. The relationship between isolation and health becomes even more important if one considers the peculiarities of modern European societies. On the one hand, an aging population is likely to increase the already high healthcare expenditures. On the other hand, family size is decreasing with plausible consequences on social ties and feelings of loneliness. Hence, addressing the issues of social isolation and loneliness might be an effective strategy to improve the population’s health, which in turn will benefit governments. Finally, we cannot ignore the role of the internet and social media, the effects of which on social relationships and loneliness are still not completely clear.