Data
Given our multi-level framework, we need data at both the individual and the aggregate (provincial) level. Individual-level SRH, physical measures (girth, blood pressure, and handgrip strength), and covariates come from the 2011 national baseline survey of the China Health and Retirement Longitudinal Study (CHARLS). CHARLS, modelled after the US Health and Retirement Study, provides a wide range of information, from socio-economic status and social support to health conditions, for a randomly selected and nationally representative sample of Chinese residents aged 45 and above living in households. The total analytical sample of the baseline national wave comprises 17,368 individuals across 28 provinces. Among the interviewed subjects, 11,635 individuals (67%) agreed to provide a blood sample, which gives us biomarker information on glycosylated haemoglobin, triglycerides, cholesterol ratio and C-reactive protein. Since women had a slightly higher blood response rate than men (69 versus 65%), we use the sample weights that the CHARLS team has calculated to correct for household and individual non-response as well as non-participation in the blood collection (for details, see Zhao et al. 2014).
Outcome variables
We study SRH, seven single biomarkers, and AL as outcomes. SRH and the single biomarkers are coded as dichotomous indicators; AL is treated as a continuous variable. The seven single biomarkers are girth, glycosylated haemoglobin, blood pressure, triglycerides, cholesterol ratio, C-reactive protein, and grip strength. These biomarkers are good predictors of disease. For instance, Ridker et al. (2003) found that C-reactive protein helps predict the risk of heart attack and stroke. Abdominal girth (Alberti and Zimmet 1998), glycosylated haemoglobin (an indicator for diabetes) (Seccareccia et al. 2001), triglycerides, cholesterol ratio, and blood pressure (Chobanian et al. 2003) are risk factors for cardiovascular disease (CVD). For each of the single biomarkers, a binary indicator of bad health is created: a score of “1” is assigned to those with “high-risk” values and a score of “0” to those with “lower risk” values. The values separating high from lower risk are based on cut-off values commonly accepted in clinical practice and in the literature for Chinese or Asian populations (see Table 1). However, because each of these biomarkers only measures the potential for a specific type of disease, its effect on general health could be fairly small, and a type II error may occur. Hence, we also include AL as a summary measure representing the number of biomarkers falling within high-risk values. AL in this study refers to the group allostatic load index (Juster et al. 2010), which is equal to the sum of “high-risk” conditions weighted by the number of non-missing values. Respondents missing more than three biomarkers are excluded (15%). AL is natural-log transformed to reduce skewness and subsequently standardised for ease of interpretation. A number of variables have missing or unknown values, with the highest share found for household income (5.8%).
After deleting these cases we are left with 16,249 respondents for SRH models; the sample size for biomarker models varies from 10,802 to 12,827.
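The construction of the AL index described above can be illustrated with a minimal Python sketch. It is not the authors' code and rests on stated assumptions: "weighted by the number of non-missing values" is read here as dividing the count of high-risk indicators by the number of observed biomarkers, and, because the text does not say how zero scores are handled before the log transformation, the sketch uses log(1 + AL) purely for illustration. The function names and the example respondents are hypothetical.

```python
import math
from statistics import mean, pstdev

MAX_MISSING = 3  # respondents missing more than three biomarkers are excluded


def allostatic_load(indicators):
    """Share of observed biomarkers in the high-risk range, or None if excluded.

    `indicators` has one entry per biomarker: 1 (high risk), 0 (lower risk)
    or None (missing). Dividing by the number of observed biomarkers is an
    assumed reading of "weighted by the number of non-missing values".
    """
    observed = [v for v in indicators if v is not None]
    if len(indicators) - len(observed) > MAX_MISSING:
        return None
    return sum(observed) / len(observed)


def standardised_log_al(raw_scores):
    """Log-transform and z-standardise AL scores.

    ASSUMPTION: log(1 + x) is used so that scores of zero stay defined;
    the source only states that a natural log transformation is applied.
    """
    logged = [math.log1p(s) for s in raw_scores]
    m, sd = mean(logged), pstdev(logged)
    return [(x - m) / sd for x in logged]


# Hypothetical respondents, seven biomarkers each.
respondents = [
    [1, 0, 0, 1, 0, 0, 0],
    [1, 1, None, 1, 0, None, 0],
    [None, None, None, None, 1, 0, 0],  # four missing values: excluded
    [0, 0, 0, 0, 0, 0, None],
]
raw = [allostatic_load(r) for r in respondents]
kept = [s for s in raw if s is not None]
z_scores = standardised_log_al(kept)
```

In this sketch the exclusion rule is applied before standardisation, so the mean and standard deviation are computed only over respondents who remain in the biomarker sample.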
Table 1 Cut-off points for high risk values of individual biomarkers
Provincial-level independent variables
Most provincial-level information is obtained from the 2012 China Statistical Yearbook. Nine variables are selected from the available official statistics to reflect the economic situation and healthcare organisation of each province: GDP per capita, urban median income, rural median income, level of urbanisation (measured by the proportion of a province’s population living in urban areas), government expenditure for health care, the numbers of hospitals, primary care institutes and doctors weighted by provincial population, and the ratio of government expenditure for health care to total government expenditure. Four variables (GDP per capita, urban median income, rural median income, and level of urbanisation) are transformed using the natural logarithm. To limit the number of provincial-level variables and to avoid multicollinearity problems, we carry out a principal component analysis and extract two components, which we interpret as representing a general economic development dimension and a health infrastructure dimension.
Consistent with the literature on income inequality, we use the Gini coefficient. However, given that there are no officially published Gini coefficients at the provincial level, we construct them from CHARLS. More specifically, we equivalise household income using the square root scale and calculate the Gini coefficient for each of the 28 provinces using the package INEQDECO in Stata 12.0, taking the design weights into account. The Gini coefficient for the jth province is computed as (Jenkins 2015):
$$G_j = 1 + \frac{1}{n_j} - \left( \frac{2}{n_j^2 \cdot \mu_j} \right) \sum_{i=1}^{n_j} \left( n_j + 1 - i \right) y_{ij}, \quad j = 1, 2, \ldots, k$$
(1)
In (1), \(n_j\) represents the number of households in the province, \(\mu_j\) is the average equivalised household income, and \(y_{ij}\) denotes the equivalised household income for household i in province j (with households sorted by their income).
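As an illustration of equation (1), the following Python sketch equivalises household income with the square root scale and computes an unweighted Gini coefficient for one province. It is a simplified sketch, not a reproduction of the INEQDECO routine: it ignores the design weights used in the analysis, the function names and the example households are hypothetical, and it sorts incomes in ascending order, the ordering under which this form of the formula returns zero for a perfectly equal distribution.

```python
import math


def equivalised_income(household_income, household_size):
    """Square root scale: household income divided by sqrt(household size)."""
    return household_income / math.sqrt(household_size)


def gini(incomes):
    """Unweighted Gini coefficient following equation (1).

    Incomes are sorted in ascending order so that the lowest income
    receives the largest rank weight (n + 1 - i).
    """
    y = sorted(incomes)
    n = len(y)
    mu = sum(y) / n
    weighted_sum = sum((n + 1 - i) * y_i for i, y_i in enumerate(y, start=1))
    return 1 + 1 / n - (2 / (n ** 2 * mu)) * weighted_sum


# Hypothetical households in one province: (total income, household size).
households = [(40000, 4), (9000, 1), (30000, 9), (16000, 4)]
province_incomes = [equivalised_income(inc, size) for inc, size in households]
province_gini = gini(province_incomes)
```

Two limiting cases are a quick check on the formula: identical incomes give a coefficient of zero, and a distribution in which one of n households holds all income gives (n - 1) / n.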
Individual-level control variables
At the individual level, we use three SES-related control variables: urban/rural residency, education, and equivalised household income. Other socio-demographic covariates are age, age squared, gender, marital status, and living arrangement.
Analytical strategy
We use a set of two-level logit regression models with the same specification for the different outcome variables, with individuals nested in provinces. The generic equation for a binary health outcome variable \(y_{ij}\) for individual i living in province j has the following form:
$$\log \frac{\Pr \left( y_{ij} = 1 \right)}{1 - \Pr \left( y_{ij} = 1 \right)} = \gamma_{01} X_{ij} + \gamma_{02} W_j + \mu_{0j}$$
(2)
where \(X_{ij}\) is a vector of individual-level independent variables; \(W_j\) is a vector of provincial-level variables; \(\gamma_{01}\) and \(\gamma_{02}\) are vectors of parameters related, respectively, to individual- and provincial-level covariates; and \(\mu_{0j}\) is a random intercept at the provincial level (with the usual assumptions of normal distribution, independence from observed variables, expected value of zero and variance \(\sigma_{\mu_0}^2\)).
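To make the role of the random intercept in equation (2) concrete, the short Python sketch below converts the linear predictor into a predicted probability for one individual placed in two provinces that differ only in their random intercept. All coefficient and covariate values are hypothetical and chosen purely for illustration; the sketch evaluates the model equation and does not estimate it.

```python
import math


def predicted_probability(x_ij, w_j, gamma_01, gamma_02, mu_0j):
    """Invert the logit link of equation (2) for a single individual.

    x_ij, w_j: individual- and provincial-level covariate values.
    gamma_01, gamma_02: matching coefficient vectors.
    mu_0j: random intercept of province j.
    """
    eta = (
        sum(g * x for g, x in zip(gamma_01, x_ij))
        + sum(g * w for g, w in zip(gamma_02, w_j))
        + mu_0j
    )
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical values, not estimates from the paper.
x_ij = [1.0, 0.5]        # two individual-level covariates
w_j = [0.3]              # one provincial-level covariate, e.g. a Gini coefficient
gamma_01 = [0.4, -0.2]
gamma_02 = [1.0]

# Same individual, two provinces with different random intercepts.
p_low = predicted_probability(x_ij, w_j, gamma_01, gamma_02, mu_0j=-0.6)
p_high = predicted_probability(x_ij, w_j, gamma_01, gamma_02, mu_0j=0.6)
```

Because the random intercept enters the linear predictor additively, two otherwise identical individuals have different predicted probabilities of poor health when their provinces have different values of \(\mu_{0j}\).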