Data
This research draws on Phase II of the RLMS-HSE (Footnote 2), which is a series of annual household surveys designed to monitor the health and economic welfare of households and individuals in Russia. Data has been collected each autumn since 1994 (other than in 1997 and 1999) and represents the only long-term, nationally representative source of household and individual level data for the Russian Federation. In this paper we use the adult survey data from 2000 to 2017 inclusive (the years when information about VHI is included in the survey), restricting our sample to adults above mandatory schooling age (17) and up to age 72, beyond which labour market engagement is limited (Footnote 3).
The survey is based on the principle of ‘repeated sampling of dwellings’, in which all household members are interviewed in each survey (if they can be contacted within three visits), and then the dwelling itself (rather than the household) is followed over time. In combination with regular annual replenishments, this sampling strategy maintains the cross-sectional representativeness of the sample for each round. Additionally, there is a component of the panel which is followed regardless of dwelling, and further attempts are also made to follow up individuals who have moved out of the household. The round-by-round attrition, at a little under 10%, is not out of line with equivalent household panel surveys from elsewhere [54], while the 97% response rate of individuals within surveyed households is testament to the robustness of the survey protocol. The relative richness and reliability of these data allow us to draw on their longitudinal component to explore the relationship between VHI and health-related behaviours and outcomes, including how this relationship varies across important socio-economic and demographic groups.
We identify eight health outcome and behaviour (dependent) variables as well as a rich range of explanatory variables, including the VHI indicator. Full variable definitions are provided in Table 3. The first dependent variable, ‘visits to doctor’, has five categories, increasing sequentially from ‘less than annually’ through to ‘several times per month’, and represents a proxy for individual interaction with the health care system. This question has only featured in the RLMS-HSE survey since 2004, so the corresponding estimates refer to a slightly reduced sample. There are then six variables which proxy for individual health behaviours: smoking or not; quantity of cigarettes smoked among the sub-set of smokers; drinking or not; frequency of alcohol consumption among the sub-set of drinkers; regularity of physical exercise (question not included in the 2007 survey); and body mass index (BMI), disaggregated into deciles, as a proxy for health outcomes concerned with under- and overweight. Finally, self-assessed health (SAH) is a standard survey variable reflecting an individual’s assessment of their own overall health state on a 5-point scale, ranging from very bad to very good.
The main explanatory variable of interest is drawn from a question, included in the RLMS-HSE survey since 2000, asking if individuals have supplementary VHI. Respondents were subsequently asked whether they had paid for the VHI themselves, whether their enterprise/organisation had paid, or whether it was financed through some ‘other’ source. The share of VHI contracts paid by enterprises is approximately 80%, though higher in Moscow and St. Petersburg (Footnote 4). In addition, we include a standard set of explanatory variables which are theoretically and empirically linked with health outcomes. At the individual level we control for the log of monthly income measured in (Moscow, December 2006) roubles, the highest level of education achieved, and self-reported chronic illness (specifically heart, lung, kidney, liver, spinal or gastrointestinal). Income and education are traditional controls in so far as they determine access to and knowledge of medical services and healthy lifestyles. Table 4 demonstrates that the mean values of the health indicators differ across respondents’ education levels, while Table 5 confirms, as expected, that chronic illness is a significant predictor of the dependent variables. At the household level, we follow the literature [39, 55, 56] in controlling for the presence of children aged under three, as a proxy for ‘young family’ status. Table 5 shows that, in most cases, the mean values of the healthy life indicators for people with young children differ significantly from those without. At the macro level we control for year fixed effects to capture the influence of aggregate time trends.
Table 1 presents descriptive statistics for the dependent and explanatory variables according to whether the individual respondent reports having no VHI, having enterprise VHI (ever during 2000–2017 and concurrently with the response year), or having self-provided VHI (ever during 2000–2017 and concurrently with the response year). Those with enterprise VHI include more males, are richer, report fewer chronic illnesses, have more young children and are more likely to be located in Moscow and St. Petersburg. They are more educated than those who never had VHI during the period but marginally less educated than those who are self-insured. Compared to those with no VHI, the self-insured include fewer males but are richer, more educated and report more chronic illness. Turning to the eight health outcomes of interest, those without VHI have the lowest number of visits to the doctor, the lowest propensity to drink but also the lowest engagement with physical exercise and the lowest self-assessment of their health status. Comparing enterprise and self-provided VHI, visits to the doctor and physical exercise are lower among the former, but smoking, drinking and BMI are all higher. Self-assessed health is higher among those currently reporting VHI but lower among those ever having reported VHI.
Table 1 Descriptive statistics (mean values, standard deviations, number of observations, and number of respondents)
To further understand the data, Figs. 1 and 2 present cumulative hazard functions that take into account the left truncation of VHI in our data. The hazard functions are estimated on the sample of observations for which we have non-missing values of the explanatory variables and which we use in our subsequent regression estimates. Figure 1 confirms that respondents have a greater chance of leaving an uninsured state for enterprise provided VHI rather than for self-funded VHI. Figure 2 suggests that there is slightly greater persistence in the enterprise funded case and that the differences become noticeable after 10 years. Table 6 illustrates that the sample churning (e.g., major sample changes in 2001, 2006, 2010 and 2014) and left and right censoring would render any single cumulative hazard curve (i.e., not considered in comparison) largely meaningless. However, for our purposes, where we compare the behaviour associated with two contract types, the long panel component of the RLMS-HSE data facilitates analysis in which we can control for time invariant individual unobservable characteristics using fixed effects (FE) models. Moreover, as Table 7 shows, insurance episodes tend to be of short duration (i.e., less than 3 years) which, combined with the churning evident in Table 6, means that we observe substantial within-individual variation in the corresponding explanatory variables, allowing us to reliably estimate the correlation between insurance status and a range of potential moral hazard behaviours.
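To make the role of left truncation concrete, the following is a minimal sketch of a Nelson-Aalen type cumulative hazard estimate with delayed entry. It is an illustration only: the text does not state which estimator underlies Figs. 1 and 2, and the function name `nelson_aalen`, the spell layout and the toy spells are all assumptions rather than RLMS-HSE data.

```python
from collections import Counter

def nelson_aalen(spells):
    """Cumulative hazard from (entry, exit, event) spells.

    entry: duration already elapsed when the spell is first observed
           (0 if observed from its start, > 0 if left-truncated).
    exit:  duration at which the spell ends or is censored.
    event: True if the transition is observed at `exit`, False if censored.
    Returns a list of (duration, cumulative_hazard) pairs.
    """
    spells = list(spells)
    events = Counter(exit_ for _, exit_, event in spells if event)
    cumulative, curve = 0.0, []
    for t in sorted(events):
        # A spell is at risk at t only if it was already under observation
        # (entry < t) and had not yet ended (exit >= t).
        at_risk = sum(1 for entry, exit_, _ in spells if entry < t <= exit_)
        cumulative += events[t] / at_risk
        curve.append((t, cumulative))
    return curve

# Hypothetical toy spells measured in years (not survey data).
toy = [(0, 2, True), (0, 3, False), (1, 3, True), (2, 5, True), (0, 1, True)]
curve = nelson_aalen(toy)
```

The only difference from the textbook estimator is the `entry < t` condition: a spell first observed part-way through does not enter the risk set until the duration at which it is actually seen.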
Methods
The descriptive statistics discussed above begin to reveal a pattern across health outcomes and behaviours. However, evaluating effects by comparing simple descriptive statistics overlooks the respondent specific characteristics, including environmental differences, which may be important drivers of health-related outcomes. To explore these patterns more systematically we use regression analysis, controlling for the observed confounding factors (income, education, chronic illness, children, year) described above. However, a further part of the between-respondent variation in the outcome variables of interest could be due to unobserved confounders. To the extent that such factors are individual-specific and time-invariant, we can control for any consequent bias by using the deviations of variables from their individual means in the FE regression models. In the regression models, with various health indicators as dependent variables, \(self\) and \(ent\) are the key (binary) variables of interest. For this approach to be valid, we rely on the assumption that the trend behaviour across different categories is consistent across time, regardless of movement between categories. To guard against the possibility that this may not hold, as a robustness check, we repeat estimates for different samples, defined according to age, gender, education, region and, finally, without the respondents who were never insured in the period 2000–2017.
We estimate eight FE base regressions (corresponding to the eight dependent variables):
$$y_{it} = \alpha_{i} + \beta_{1} {\text{self}}_{it} + \beta_{2} {\text{ent}}_{it} + x^{\prime}_{it} \delta + \mu_{t} + \varepsilon_{it} , \qquad (1)$$
where \(y_{it}\) is the response of individual \(i\) in period \(t\) to questions relating to the eight dependent variables; \(x_{it} = (x_{1it} \; x_{2it} \ldots x_{kit} )^{\prime}\) is the column-vector of the \(k\) control variables (income, education, chronic illness, presence of children under 3 years in household); \(\beta_{1}\) and \(\beta_{2}\) are the main parameters of interest (revealing the excess in the mean value of the dependent variable, \(y\), for the corresponding insurance state in comparison with the uninsured state, holding all other control variables, time, and individual effects fixed); \(\delta\) is a column-vector of parameters for the control variables; \(\alpha_{i}\) captures the respondent’s time invariant individual-specific unobservable characteristics (which could correlate with the regressors); \(\mu_{t}\) denotes the time fixed effects; and \(\varepsilon_{it}\) is the error term that captures unobservable individual and environmental characteristics that may vary between respondents and over years. We first estimate each base model regression for the full pooled sample, before then re-estimating for different sub-samples defined according to age, gender, education and region.
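The within transformation behind Eq. (1) can be sketched as follows. This is a minimal illustration on simulated data, not the estimation code used for the paper: the function `within_ols`, the simulated panel and the "true" coefficients (0.5 and -0.3) are all hypothetical, and the control vector \(x_{it}\) is omitted for brevity.

```python
import numpy as np

def within_ols(y, X, ids):
    """Within (FE) estimator: demean y and X by individual, then run OLS."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    ids = np.asarray(ids)
    y_dm, X_dm = y.copy(), X.copy()
    for i in np.unique(ids):
        rows = ids == i
        y_dm[rows] -= y[rows].mean()          # removes alpha_i from y
        X_dm[rows] -= X[rows].mean(axis=0)    # same transformation for regressors
    coef, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return coef

# Simulated panel (assumption): 200 individuals observed over 5 years.
rng = np.random.default_rng(0)
n, T = 200, 5
ids = np.repeat(np.arange(n), T)
year = np.tile(np.arange(T), n)
alpha = rng.normal(size=n)[ids]               # individual effect alpha_i
u = rng.uniform(size=n * T)
p_ent = 0.5 / (1.0 + np.exp(-alpha))          # insurance status correlated with alpha_i
ent = (u < p_ent).astype(float)
self_ = ((u >= p_ent) & (u < p_ent + 0.2)).astype(float)
year_dummies = (year[:, None] == np.arange(1, T)).astype(float)  # mu_t, base year dropped
y = alpha + 0.5 * self_ - 0.3 * ent + 0.1 * year + rng.normal(scale=0.1, size=n * T)

X = np.column_stack([self_, ent, year_dummies])
beta_self, beta_ent = within_ols(y, X, ids)[:2]
```

Because insurance status is simulated to correlate with \(\alpha_{i}\), pooled OLS would be biased here, whereas the demeaned regression recovers values close to the simulated coefficients. Only respondents whose insurance status changes over time contribute to identifying \(\beta_{1}\) and \(\beta_{2}\).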
We then repeat this exercise in what amounts to an extensive series of robustness checks. First, we simply exclude from the sample those respondents who never had any form of VHI. Second, we estimate FE logit models for the binary dependent variables (‘smoke’ and ‘drink’):
$$\Pr (y_{it} = 1|{\text{self}}_{it} ,{\text{ent}}_{it} ,x_{it} ) = \Lambda (\alpha_{i} + \beta_{1} {\text{self}}_{it} + \beta_{2} {\text{ent}}_{it} + x^{\prime}_{it} \delta + \mu_{t} ), \qquad (2)$$
where \(\Lambda\) is the logistic cumulative distribution function. Third, given the ordered choice nature of five of the dependent variables (doctor, drink_n, sport, bmi_dec, and sah), we might consider estimating the corresponding FE ordered choice logit:
$$\left\{ \begin{array}{l} y_{it}^{*} = \alpha_{i} + \beta_{1} {\text{self}}_{it} + \beta_{2} {\text{ent}}_{it} + x^{\prime}_{it} \delta + \mu_{t} + \varepsilon_{it} ,\quad i = 1,2, \ldots ,n,\quad t = 1,2, \ldots ,T, \\ y_{it} = j,\quad \gamma_{i,j - 1} < y_{it}^{*} \le \gamma_{ij} ,\quad j = 1,2, \ldots ,M - 1,\quad \gamma_{i0} = - \infty ,\quad \gamma_{iM} = \infty , \end{array} \right. \qquad (3)$$
where \(y_{it}^{*}\) in each model is one of the five latent variables which theoretically depend on the individual utility function values concerned with the respondent’s choice of the frequency of visits to doctor, alcohol consumption, level of physical activity, BMI decile, and self-assessed health; \(y_{it}\) is the respondent’s answer to the corresponding ordered choice question or BMI decile; and \(M\) is the number of possible responses, from which the respondent chooses the \(j\)-th response (or else is mechanically located in the \(j\)-th BMI decile).
Unfortunately, in the case of the non-linear FE model (3), the \(\hat{\beta }\) estimates derived from short panels are inconsistent due to the so-called ‘incidental parameters problem’ [57, 58], which arises when estimating the \(\gamma_{ij} - \alpha_{i}\) differences necessary to obtain the \(\hat{\beta }\)’s. Since we cannot ‘naturally’ extend \(T\), we attenuate the inconsistencies in our \(\hat{\beta }\) estimates through the application of a ‘BUC’ (Blow up and Cluster) methodology. In this approach we replace each observation in the sample by \(M - 1\) copies of itself (so-called ‘Blow-Ups’), and each of these \(M - 1\) replications of the individual’s choice is dichotomised at a different cut-off point \(\gamma_{ij}\). Crucially, this process of dichotomisation, which preserves as much information as possible concerning the changes in the dependent variable, means the BUC methodology is robust to finite samples [3] and is well suited to our data. The \(\beta\) parameters, for each specification of Eq. (3), are estimated by the conditional logit model on the ‘blown up’ sample, clustering the standard errors by individual \(i\) and removing the need to estimate \(\alpha_{i}\).
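The ‘blow-up’ step described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used for the estimates: the function `blow_up`, the row layout (dicts with keys `id` and `y`, with `y` coded 1 to \(M\)) and the toy rows are hypothetical, and the subsequent conditional logit estimation with standard errors clustered by `id` is deliberately left out.

```python
from collections import defaultdict

def blow_up(rows, n_categories):
    """Replace each row by M - 1 dichotomised copies (cut-offs k = 2..M).

    rows: list of dicts with keys 'id', 'y' (coded 1..M) and any regressors.
    Returns the clones, keeping only (id, cut-off) groups in which the
    binary outcome 'd' varies over time.
    """
    clones = defaultdict(list)
    for row in rows:
        for k in range(2, n_categories + 1):
            clone = dict(row)
            clone["cutoff"] = k
            clone["d"] = int(row["y"] >= k)    # dichotomise at cut-off k
            clone["group"] = (row["id"], k)    # conditional-logit group
            clones[clone["group"]].append(clone)
    blown = []
    for group_rows in clones.values():
        # Groups with constant d drop out of the conditional likelihood.
        if len({r["d"] for r in group_rows}) > 1:
            blown.extend(group_rows)
    return blown

# Hypothetical toy panel: two respondents, two years, a 5-point outcome.
toy = [
    {"id": 1, "y": 2, "ent": 0},
    {"id": 1, "y": 4, "ent": 1},
    {"id": 2, "y": 3, "ent": 0},
    {"id": 2, "y": 3, "ent": 1},
]
blown = blow_up(toy, n_categories=5)
```

In the toy panel, respondent 1 moves from category 2 to 4, so only the clones dichotomised at cut-offs 3 and 4 vary over time and are retained; respondent 2 never changes category and contributes nothing. The conditional logit would then be fitted to `d` within each `group`, with standard errors clustered on `id` because clones of the same respondent are not independent.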
The selection mechanism
At the core of this research lies the identifying assumption that there is no systematic selection mechanism at work in the way that individuals become associated with enterprise provided VHI. Given the importance of this assumption, it merits further elaboration before we present our results, with respect to both the institutional framework prevailing in Russia and the empirical stylised facts in the Russian data. As we have seen, accounting for just 5.4% of all medical appointments, VHI remains relatively niche in Russia. Most Russian citizens have little familiarity with insurance principles in general and very limited understanding of the benefits associated with VHI services. In part this reflects the absence of clear legislation separating the varieties of service available through VHI and MHI, and in part it reflects the restrictions within VHI contracts themselves. Specifically, the contracts are limiting in terms of the range of services, the range of locations and the number of doctors and specialists but, most significantly, they do not cover the (significant) costs of prescribed medication.
From the companies’ perspective, there are substantive benefits from offering VHI contracts to employees. These include benefits (up to 6% of payroll budget) relating to salary payments, exemptions from social taxes on VHI contract payments, and exemptions from personal and value added tax on the premiums and benefits associated with VHI. Large corporations, particularly in Moscow and St. Petersburg where the health care market is more developed, therefore face strong incentives to offer VHI as part of the employment package. Notwithstanding these incentives, there is little to suggest that Russian job seekers place high value on the provision of supplementary health insurance. Recent surveys by two of Russia’s largest employment agencies, Headhunter and Superjob, confirm that salary, job prospects, job stability, work environment, colleagues, location and the nature of the work are all seen, by job seekers, as more important than any accompanying social package, which may include VHI (Footnote 5).
We consider that this restrictive institutional setting renders it very unlikely that enterprise provided VHI suffers from the kind of adverse selection that we may expect in more mature health markets. Indeed, even for the US, the evidence suggests that adverse selection is unlikely to be a significant issue for employer-provided insurance [53]. Nevertheless, to reassure ourselves further, we examine the empirical regularities within the RLMS-HSE data for evidence of adverse selection.
Specifically, we construct five binary health indicators: self-assessed health (1 if SAH is average, good, or very good); health problems in the last 30 days; hospitalisation during the previous 3 months; operation during the last 12 months; and whether the household faces financial constraints in obtaining treatment/medicine. We also construct three (log) real income/expenditure variables: per capita household spending on medical services; per capita household spending on medicine; and total individual income received in the last 30 days. We then implement a series of t tests comparing the means of these indicators across two groups: those in employment who had no VHI in period t and changed their place of work during the subsequent year for employment in t + 1 that (1) included VHI or (2) did not include VHI. We then repeat this exercise for those who were not in employment (including secondary employment) in period t. The results (Tables 8, 9, 10, 11) reveal no systematic evidence of adverse selection into employer provided VHI. If anything, the results demonstrate that workers changing jobs are healthier and financially better equipped to undertake health expenditure (Footnote 6).
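As an illustration of the group comparison described above, the sketch below computes a two-sample t statistic for one binary indicator. The text does not say whether equal variances are assumed, so the unequal-variance (Welch) form used here is an assumption, as are the function name `welch_t` and the toy data, which are not taken from the RLMS-HSE.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) / na   # squared standard error of mean, group A
    vb = variance(group_b) / nb   # squared standard error of mean, group B
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical toy indicator (1 = SAH average or better) measured in period t
# for job changers whose new job in t + 1 includes VHI versus does not.
moved_to_vhi = [1, 1, 1, 0, 1, 1, 0, 1]
moved_to_no_vhi = [1, 0, 1, 0, 0, 1, 1, 0]
t_stat, df = welch_t(moved_to_vhi, moved_to_no_vhi)
```

Under adverse selection one would expect those moving into jobs with VHI to look less healthy in period t, that is, a significantly negative statistic for indicators such as good SAH. The same comparison would be repeated for each of the eight variables and for each starting group.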
In sum, while we do not claim to have a perfect natural experiment, our understanding of the Russian institutional context and the stylised facts that we observe in the data provide both strong arguments and empirical evidence that Russian workers do not choose their employer based on the expectation of receiving VHI. This being so, we are confident that we can distinguish the effects of post-contract opportunistic behaviour from adverse selection and can compare pre- and post-contract behaviours and outcomes across groups. To this end, Russia follows France [24] in providing a rare opportunity to contribute robustly to a US-centric literature with a European example.