Introduction

Poverty and people's health status are intimately connected, yet the relationship between them is complex and bi-directional [1, 2]. On one hand, ill-health may lead to economic poverty [1], or a decrease in expendable income due to high medical bills and/or via a direct reduction, or loss, of wages throughout an illness [3]. On the other hand, poor health may result from poverty [1], including an inability to afford adequate nutrition, sanitation, housing, education and healthcare, and poverty-related lifestyle factors that increase disease risk and/or decrease access to medical facilities and services [4, 5]. In the People's Republic of China (P.R. China), rapid economic growth and human development over the past three decades has brought over 300 million people out of poverty (arbitrarily defined as living on less than US$ 1 per day) and has vastly improved the overall health status of the population [6]. However, it has also affected the course of income distribution such that disparities in socio-economic position (SEP; for a definition, see Appendix) are currently among the most important social policy issues in the country [7]. Inequalities appear to be widening both across and within different provinces in P.R. China, with the rural-urban gap of particular concern [7]. Since SEP is an important determinant of health, it is conceivable that such disparities will lead to large gaps in health care provision within P.R. China [8]. In order to plan, implement and monitor health programs and other publicly or privately provided services in an equitable way, it is necessary to identify the poor, including individuals or households with low SEP, who might be more vulnerable to poor health outcomes [5].

While SEP can be measured on multiple levels [1], in the past it was mostly determined using an individual's education level, sometimes in combination with their occupation. Currently, approaches for measuring household SEP include 'direct' measures of economic status, including (i) income, (ii) expenditure, and (iii) financial assets (e.g., savings and pensions), and 'proxy' measures (e.g., household durable assets (Appendix), housing characteristics and access to utilities and sanitation) developed from the wealth index originally proposed by Rutstein in the mid-1990s [9]. Direct measurements can be expensive to collect and may require complex statistical analyses that are beyond the scope of many population health studies [5, 10-12]. In developing country settings in particular, large seasonal variability in earnings and a high rate of self-employment, together with potential recall bias and false reporting, may render such data inaccurate or even unreliable [10]. Proxy measures are thought to be more reliable, since they require only data collected using readily available household questionnaires supported by direct observation. A study carried out in southeast Nigeria, however, questioned whether proxy measures are indeed more reliable than direct measurements [11]. From a public health point of view, the proxy wealth index approach is more useful than that of direct measures, since it explains the same, or a greater, amount of the differences between households on a set of health indicators than an income/expenditure index, while requiring far less effort from respondents, interviewers, data processors and analysts [10]. Additionally, proxy measures might be more accurate approximations of SEP, as they measure financial stock ('permanent income') rather than flow ('current income'), and hence are less prone to fluctuation [10, 12-14].

Due to the large volume of potentially redundant asset data produced, a data reduction technique known as exploratory factor analysis is often utilized. Exploratory factor analysis evaluates the most meaningful basis to re-express a large, pre-determined set of variables, exploring the relationships between them and filtering out noise to reveal indicators that map most strongly to an underlying latent structure. Two common methods of extracting that structure are principal components analysis (PCA; Appendix) and principal axis factoring (PAF; Appendix), which describe variation among the observed variables via a set of derived uncorrelated variables referred to as principal components (PCs) or principal factors (PFs), respectively [15]. Although these two methods often yield similar results, the former is preferred as a method for data reduction, while the latter is widely used for detecting structure within the data. Previously, studies have used either PCA or PAF but comparisons between these two approaches are rare. Based on the inter-relationship between the set of variables, exploratory factor analysis also assigns weights to ownership of the assets. The weights correspond to the factor loadings (eigenvectors; Appendix) of the first derived variable, and are used to generate an index of relative SEP. Using weights derived through exploratory factor analysis may be a more appropriate method of assigning weights to the variables than the more simplistic equal weights method, the complex weighted-by-price-of-item approach or on an ad-hoc basis [16].

Few studies have attempted to verify the extent to which the asset-based index approach is a good proxy for household economic wealth. Concerns include the handling of publicly provided goods and services, and the direct effects of the indicator variables that make up indices, as well as ways of adjusting for household size and age-composition [17, 18]. The increasingly widespread use and application of proxy measurements of household economic wealth and SEP, and growing use of exploratory factor analysis, in public health studies calls for further research in this area, particularly in low-income settings and transitional countries.

Here we report the application of exploratory factor analysis to household data that were collected during a survey of parasitic infections in Hunan province, P.R. China. Our aim was to calculate and examine asset-based proxy wealth indices generated by PCA and PAF, and to compare them to other measures of wealth based on purely economic variables, including self-reported annual household income and savings. Results are reported for a rural and a peri-urban (Appendix) setting and aggregated between the two.

Methods

Study area and population

The study was carried out in two villages; namely (i) Wuyi, in Hanshou county, southern Dongting Lake area, and (ii) Laogang, in Yueyang county, eastern Dongting Lake area. Both villages are located in Hunan province. The surveys were conducted between November and December 2006. The villages were selected on the basis of previous studies investigating the epidemiology of parasitic infections, including schistosomiasis [19]. Wuyi is situated in a rural area, whereas Laogang is peri-urban, located on the outskirts of Yueyang, the major city in the Dongting Lake region. All individuals from both villages were invited to participate in the study.

Field procedures

Senior personnel from two local schistosomiasis control stations were involved in co-ordinating the study. Basic demographic information was obtained from a census performed one year previously in both villages. The questionnaire was translated into Mandarin Chinese, back-translated into English, pre-tested in a nearby village and readily adapted to the local setting. It was administered to the heads of households, and included questions on household demographics, the number of wage earners and non-wage earners, annual household income (7 categories: <500; 500-1999; 2000-4999; 5000-9999; 10,000-29,999; 30,000-49,999; ≥50,000 CNY) and savings (6 categories: <500; 500-699; 700-999; 1000-1499; 1500-1999; ≥2000 CNY), the primary and secondary sources of income, ownership of 22 household durable assets (e.g., color TV, washing machine, air conditioner, etc.), 10 housing characteristics (e.g., floor material, wall material, roof material, etc.) and six utility (Appendix) and sanitation variables (e.g., tap water, toilet in house, etc.).

Interviewers were familiar with the local setting and dialect and were acquainted with qualitative methods. The head of each household was invited to respond to the questions; if the household head was absent on the day of interview, the interviewer returned to that residence the following day, for up to a period of 14 days, after which the next of kin was asked to respond.

Consent and ethical approval

Ethical clearance for this study was obtained from the Medical Ethics Committee of Hunan province and Queensland Institute of Medical Research. Village authorities were informed about the aims and procedure of the study and provided written informed consent. Oral informed consent was obtained from each individual.

Data management and statistical analysis

Data management

Data were double-entered into a bilingual Microsoft® Access 2002 database, cross-checked and subsequently analyzed with SPSS version 16.0 (Illinois, USA).

Socio-economic data and asset ownership

Household income and savings data were equivalized to adjust for household needs based upon the number of household members (per-capita) and a combination of number and age (per-adult, defined as individuals aged >16 years). This was done using the median value of each income or savings band and dividing by the number of members or adult members per household. Annual per-capita income and per-adult income, primary source of income, ability to save (yes or no binary variable) and annual per-capita and per-adult savings were then examined using a χ2 test. The Student's t-test statistic was used to compare the mean age and mean household size between the two villages. A stepwise multinomial logistic regression analysis with annual per-capita income bands as the dependent variable and ownership of household durable asset as independent binary covariates was used to test the association between household income and asset ownership within each setting separately and for the pooled data from both villages. Covariates were included at a significance level of <0.2. Covariates that were not significantly associated with income were removed in a stepwise backward elimination process. Adjusted odds ratios (OR) and 95% confidence intervals (CI) were computed for associations with p-values <0.05.
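The equivalization step can be illustrated with a minimal Python sketch. The band labels and the helper name `equivalized_value` are hypothetical. Two further assumptions are made because the text does not specify them: the lowest band is given a lower bound of 0 CNY, and the open-ended top band is represented by its lower bound.

```python
# Annual household income bands (CNY) from the questionnaire as (low, high).
# Assumption: "<500" spans 0-499; ">=50000" has no upper bound (high is None).
INCOME_BANDS = {
    "<500": (0, 499),
    "500-1999": (500, 1999),
    "2000-4999": (2000, 4999),
    "5000-9999": (5000, 9999),
    "10000-29999": (10000, 29999),
    "30000-49999": (30000, 49999),
    ">=50000": (50000, None),
}


def band_midpoint(band):
    """Return the central value of a band (lower bound if open-ended)."""
    low, high = INCOME_BANDS[band]
    if high is None:
        return float(low)  # assumption: open-ended band uses its lower bound
    return (low + high) / 2


def equivalized_value(band, n_members):
    """Divide the band midpoint by the number of (adult) household members."""
    if n_members < 1:
        raise ValueError("a household needs at least one member")
    return band_midpoint(band) / n_members


# A household in the 2000-4999 CNY band with four members, two of them adults
per_capita = equivalized_value("2000-4999", 4)
per_adult = equivalized_value("2000-4999", 2)
```

The same function serves the per-capita and the per-adult variants; only the divisor changes (all members, or members aged >16 years).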

Construction of asset-based proxy wealth indices using PCA and PAF

A detailed protocol of how we constructed asset-based proxy wealth indices is given in the Additional File. In brief, the binary data on household durable assets, housing characteristics and utility and sanitation variables were organized into a matrix with m households as rows (where m rural = 258 and m peri-urban = 246) and n variables as columns. The initial n = 38-item correlation matrices for each setting were examined for internal consistency (Table 1). To enable the matrix to be factorable, only variables with sufficient correlation (φ > |0.3|) with at least three other variables were included in further analyses. If any variable correlated highly (φ > |0.8|) with other variables, only one variable from the group of correlated variables was arbitrarily selected and included in further analyses, to avoid multicollinearity. Factorability of the m by n matrices was determined using Bartlett's test of sphericity (Appendix) and the Kaiser-Meyer-Olkin (KMO) test (Appendix). Variables were excluded in a stepwise manner until a factorable m by n correlation matrix with a KMO >0.7 was reached, for each village separately. Diagonal and off-diagonal values of the anti-image correlation matrix (Appendix) were used to assess the sampling adequacy.
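The first screening rule (retain a variable only if it correlates at φ > |0.3| with at least three other variables) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the data layout (a dictionary of 0/1 lists) is an assumption, and the removal of highly correlated variables and the KMO-based stepwise exclusion are not covered.

```python
import math


def phi(x, y):
    """Phi coefficient between two equal-length lists of 0/1 values."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(x, y):
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A variable with no variation has an undefined phi; treat it as 0 here.
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def screen_variables(data, min_abs_phi=0.3, min_partners=3):
    """Keep variables with |phi| > min_abs_phi to at least min_partners others."""
    kept = []
    for name, column in data.items():
        partners = sum(
            1
            for other, other_column in data.items()
            if other != name and abs(phi(column, other_column)) > min_abs_phi
        )
        if partners >= min_partners:
            kept.append(name)
    return kept
```

For binary variables the φ coefficient equals the Pearson correlation, which is why a correlation matrix of ownership indicators can be screened in this way.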

Table 1 Demographic variables of the respondents in a household-based questionnaire survey in rural (Wuyi village) and peri-urban (Laogang village) settings of Hunan province, China

Next, components and factors were extracted from each of the final two correlation matrices using PCA and PAF, respectively. Components and factors, respectively, were extracted without and with rotation (Appendix) and the best method was selected according to the maximum squared factor loadings and the relative simplicity of the model. In each case, eigenvalues >1 (Appendix), examination of the scree plots and the cumulative proportion of variance explained by each component or factor were taken as criteria for extraction. For simplicity, a cut-off eigenvector > |0.3| was used to signify component or factor loadings of interest and, where variables loaded equally on more than one component or factor, the Cronbach's coefficient α (Appendix) was used to select the component or factor on which to place the variable.
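The eigenvalue >1 criterion and the proportion of variance explained can be illustrated with a short sketch. It assumes that the complete set of eigenvalues of a correlation matrix is supplied, so that they sum to the number of variables (as in PCA); the function name and the example eigenvalues are hypothetical and are not taken from the study data.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """List (rank, eigenvalue, share, cumulative share) for eigenvalues > threshold.

    Assumes `eigenvalues` is the full set from a correlation matrix, so the
    total variance equals the number of variables.
    """
    n_vars = len(eigenvalues)
    retained = []
    cumulative = 0.0
    for rank, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        if ev <= threshold:
            break
        share = ev / n_vars
        cumulative += share
        retained.append((rank, ev, share, cumulative))
    return retained


# Hypothetical eigenvalues for a four-variable correlation matrix
for rank, ev, share, cumulative in kaiser_retained([2.0, 1.2, 0.5, 0.3]):
    print(f"component {rank}: eigenvalue {ev:.2f}, "
          f"{share:.1%} of variance, {cumulative:.1%} cumulative")
```

In practice this numerical rule would be read alongside the scree plot, as described above, rather than applied mechanically.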

The PC and PF loadings were used to compute standardized indices of relative household wealth within each village, according to the following equation:

$$A_i = \hat{\gamma}_1 \alpha_{i1} + \dots + \hat{\gamma}_k \alpha_{ik}$$

where $$\alpha_{ik} = (x_{ik} - \bar{x}_k) / s_k$$

such that $A_i$ is the standardized asset index score per household $i$, the $\hat{\gamma}_k$s are the factor loadings or weights of each asset $k$, estimated by either PCA or PAF, and the $\alpha_{ik}$s are the standardized values of asset $k$ for household $i$ (i.e., $x_{ik}$ is the ownership of asset $k$ by household $i$, where 0 represents not owning the asset and 1 represents owning the asset, and $\bar{x}_k$ and $s_k$ are the sample mean and standard deviation (SD) of asset $k$ for all households).
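As a worked illustration of the index equation, the following sketch standardizes each binary asset and sums the weighted values per household. The function name, the toy ownership matrix and the loadings are hypothetical. The use of the n-1 denominator for the SD is an assumption, since the text only refers to the sample SD.

```python
import statistics


def asset_index(ownership, loadings):
    """Compute one index score per household.

    ownership: list of rows, one per household, each a list of 0/1 values.
    loadings:  one weight per asset (e.g., first-component loadings).
    """
    n_assets = len(loadings)
    columns = [[row[k] for row in ownership] for k in range(n_assets)]
    means = [statistics.mean(col) for col in columns]
    # Assumption: sample SD with the n-1 denominator
    sds = [statistics.stdev(col) for col in columns]
    if any(sd == 0 for sd in sds):
        raise ValueError("an asset with no variation cannot be standardized")
    return [
        sum(loadings[k] * (row[k] - means[k]) / sds[k] for k in range(n_assets))
        for row in ownership
    ]


# Four hypothetical households and two assets with equal hypothetical loadings
scores = asset_index([[1, 1], [1, 0], [0, 1], [0, 0]], [0.5, 0.5])
```

Because every asset is centered on its mean, the scores sum to zero across households; a positive score therefore indicates above-average asset wealth relative to the other households in the same sample.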

The association between the PCA- and PAF-based proxy wealth indices was estimated by the Spearman's rank correlation coefficient (Appendix). Based on the overall small sample size of our study, we chose to divide each index into quartiles, rather than the standard quintiles or tertiles, representing: (i) most poor (MP), (ii) below average (BA), (iii) above average (AA), and (iv) most wealthy (MW) households.
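The division of index scores into the four wealth categories might be sketched as follows. The cut points are taken from Python's `statistics.quantiles` with its default method, which is an assumption and may differ slightly from the quartile definition used by SPSS; the function name is hypothetical.

```python
import bisect
import statistics

LABELS = ["MP", "BA", "AA", "MW"]  # most poor ... most wealthy


def wealth_quartiles(scores):
    """Assign each index score to one of the four wealth categories."""
    cuts = statistics.quantiles(scores, n=4)  # three cut points
    # A score equal to a cut point falls into the lower category.
    return [LABELS[bisect.bisect_left(cuts, score)] for score in scores]


print(wealth_quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
# ['MP', 'MP', 'BA', 'BA', 'AA', 'AA', 'MW', 'MW']
```

When many households share the same score, the categories can become unequal in size, because tied scores are always placed in the same category.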

Proxy wealth indices and self-reported income and savings

Corresponding wealth quartiles were also generated based on annual household per-capita income and on a combination of household income and savings, as follows: (i) high income (≥4000 CNY per person per year) with savings, (ii) high income without savings, (iii) low income (< 4000 CNY per person per year) with savings, and (iv) low income without savings. Households' categorical position for each respective index was assessed by a Kappa agreement, using the following cut-offs: 0, no agreement; 0.01-0.2, poor agreement; 0.21-0.4, fair agreement; 0.41-0.6, moderate agreement; 0.61-0.8, substantial agreement; 0.81-1, almost perfect agreement [20]. Households that were re-ranked into different quartiles were examined in further detail. Mean scores per category were examined by means of Kruskal-Wallis (Appendix) analyses and a ratio of MW to MP was calculated. This entire process was then repeated for the pooled data from both villages (m total = 504).
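The Kappa agreement between two categorical rankings, together with the cut-offs listed above, can be illustrated with an unweighted Cohen's kappa. This is a sketch under stated assumptions: the function names are hypothetical, the text does not state whether a weighted variant was used, and negative values are labeled 'no agreement' here although the cut-offs above do not cover them.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("inputs must be non-empty and of equal length")
    p_observed = sum(1 for x, y in zip(a, b) if x == y) / n
    counts_a, counts_b = Counter(a), Counter(b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        return float("nan")  # undefined when both rankings use one category
    return (p_observed - p_expected) / (1 - p_expected)


def interpret_kappa(kappa):
    """Map kappa to the agreement categories used in the text."""
    if kappa <= 0:
        return "no agreement"  # assumption: negative values are grouped with 0
    if kappa <= 0.2:
        return "poor"
    if kappa <= 0.4:
        return "fair"
    if kappa <= 0.6:
        return "moderate"
    if kappa <= 0.8:
        return "substantial"
    return "almost perfect"
```

For example, `cohen_kappa(pca_quartiles, paf_quartiles)` would compare two lists of quartile labels (hypothetical variable names) assigned to the same households in the same order.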

Results

Study compliance and operational results

From a total of 646 households in both villages, 504 (78.0%) had complete datasets. This corresponded to 258/294 (87.8%) in the rural setting and 246/352 (69.9%) in the peri-urban setting. Demographic variables are summarized in Table 1.

Comparison of income, savings, and possession of assets

Table 2 shows annual household per-capita income and per-adult income, the primary source of income and the ability to save based on the primary source of income, for both settings. Annual household per-capita income was significantly associated with village setting (χ2 test p <0.001), as was annual household per-adult income (χ2 test p <0.001). The primary source of household income was also associated with village setting (χ2 test p <0.001). Overall, 156 (31.0%) households reported the ability to save money; however this was more frequent in the rural setting (106 or 41.1% vs. peri-urban: 50 or 20.3%; χ2 test p <0.001). Within both villages, saving was positively associated with annual household per-capita income (χ2 test p <0.001 and χ2 test p <0.001) and varied significantly according to primary source of income (χ2 test p = 0.040 and χ2 test p = 0.027) for Wuyi and Laogang, respectively. In Wuyi, younger household heads were more likely to save than their older counterparts (χ2 test p = 0.006), while there was no significant difference in Laogang. Within both settings, the amount of money saved per capita was also positively associated with annual household per-capita income (χ2 test p <0.001 and χ2 test p = 0.001), but not with the primary source of household income or with age for rural and peri-urban settings.

Table 2 Self-reported annual household income and savings in rural (Wuyi village) and peri-urban (Laogang village) settings of Hunan province, China

Table 3 shows the complete list of household durable assets, housing characteristics and utility and sanitation variables for both settings. Item ownership varied between and within villages. For example, all 246 peri-urban households but only 5 (1.9%) rural households had tap water in the house. While 229 (88.8%) rural households owned animals, the respective number and percentage was 46 (18.7%) among peri-urban households.

Table 3 Prevalence of ownership of household durable assets, housing characteristics and utility and sanitation variables in rural (Wuyi village) and peri-urban (Laogang village) settings of Hunan province, China

Table 4 summarizes all significant associations between annual per-capita household income and ownership of household durable assets across pooled data from both rural and peri-urban settings, with the model accounting for 48.6% of variation in the data.

Table 4 Significant associations between annual per-capita household income and ownership of household durable assets, as assessed by a stepwise multinomial logistic regression analysis.

Comparison of asset-based indices constructed using PCA and PAF

Examination of the initial correlation matrix for both settings identified inter-item φ correlations >0.8, excluding numerous variables from further analyses. The final correlation matrix consisted of 15 variables for Wuyi and 11 variables for Laogang, and 14 variables for the pooled data (Table 5).

Table 5 Principal components (PCs) or principal factors (PFs) extracted by principal components analysis (PCA) and principal axis factoring (PAF), showing component or factor loadings.

Bartlett's test of sphericity was significant in both settings (rural: χ2 test p <0.001 and peri-urban: χ2 test p <0.001) and for the pooled data (χ2 test p <0.001), and the respective KMO statistics were 0.788, 0.726 and 0.863. The anti-image correlation matrix measures of sampling adequacy were above 0.636 and the off-diagonal values were below |0.345| for each setting and pooled data. Cronbach's coefficient α for the 15-item scale was 0.652 in Wuyi, 0.615 for the 11-item scale in Laogang, and 0.667 for the 14-item scale pooled data.
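For reference, Cronbach's coefficient α for a k-item scale compares the sum of the item variances with the variance of the total score. The sketch below uses the standard formula; the function name and the column-wise data layout are assumptions made for illustration.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of columns, one list per item."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(values) for values in zip(*items)]  # total score per household
    total_variance = statistics.variance(totals)
    if total_variance == 0:
        raise ValueError("total scores have no variation")
    item_variance_sum = sum(statistics.variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

Two identical items give α = 1, whereas two uncorrelated items give α = 0, so values between 0.6 and 0.7, as reported here, indicate items that are related but far from interchangeable.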

PCA and PAF revealed four components or factors with eigenvalues >1.0 in the rural setting and three in the peri-urban setting. In each case the first component or factor comprised several heavily loaded variables (eigenvectors >0.3) and accounted for 24.3% and 27.8% of the variation in the data from Wuyi and Laogang, respectively, while the remaining components or factors had fewer variables and explained a smaller proportion of the variation (Table 5). For the pooled data, three components or factors had eigenvalues >1.0 and the first component or factor accounted for 33.9% of the variation in the data. The un-rotated extraction method was selected for PCA and PAF in both settings and for the pooled data, as rotation did not add measurably to the simplicity or fit of each of the models. The relative magnitude and direction of the weights in the PCA and PAF models were consistent within settings (Table 5) and across pooled data (data not shown).

For both settings, standardized indices of relative wealth were created using heavily loaded variables of the first PC, or the first PF, with variables weighted according to their eigenvector, as in the equation. All four indices showed evidence of clumping and truncation (Figure 1). The PCA and PAF indices correlated well with each other within each village (for both settings Spearman's rho = 0.99, p <0.001). The Kappa agreement was found to be almost perfect, with values of 0.91 and 0.81 for Wuyi and Laogang, respectively. In Wuyi, 17 (6.6%) households were in different quartiles according to factor extraction method, while this was the case for 35 (14.2%) households in Laogang (Figure 2).

Figure 1

Distribution of the standardized asset-based proxy wealth index scores created using exploratory factor analysis with the principal components analysis (PCA) extraction method (a, c) and the principal axis factoring (PAF) extraction method (b, d) in rural (Wuyi village) and peri-urban (Laogang village) settings, Hunan province, P.R. China.

Figure 2

Correlation of the standardized asset-based proxy wealth index scores created using exploratory factor analysis with the principal components analysis (PCA) and the principal axis factoring (PAF) extraction methods in (a) rural (Wuyi village) and (b) peri-urban (Laogang village) settings of Hunan province, P.R. China. Lines vertical to the axes define the respective wealth quartiles of each index for rural (dashed) and peri-urban (dotted) settings, respectively.

Comparison of proxy wealth indices with self-reported income and savings

Both PCA and PAF indices showed a weak, but significant, positive correlation with annual household per-capita income (Spearman's rho = 0.27, p <0.001 for PCA and Spearman's rho = 0.26, p <0.001 for PAF), with annual household per-adult income (Spearman's rho = 0.30, p <0.001 for PCA and Spearman's rho = 0.29, p <0.001 for PAF) and with annual household per-capita savings (Spearman's rho = 0.16, p = 0.016 for PCA and Spearman's rho = 0.16, p = 0.017 for PAF) in the rural setting. Similarly, in the peri-urban setting both indices were weakly, but significantly, correlated with annual household income (per-capita income Spearman's rho = 0.27, p <0.001 for PCA and Spearman's rho = 0.28, p <0.001 for PAF and per-adult income Spearman's rho = 0.36, p <0.001 for PCA and Spearman's rho = 0.37, p <0.001 for PAF) and with annual household per-capita savings (Spearman's rho = 0.26, p <0.001 for PCA and Spearman's rho = 0.27, p <0.001 for PAF). Mean asset-based index scores, derived either by PCA or PAF, were significantly different between combined income and savings categories in both settings (PCA rural score (i) high income with savings: 21.5, (ii) high income without savings: 12.8, (iii) low income with savings: -8.0, and (iv) low income without savings: -16.3; PCA peri-urban score (i) high income with savings: 25.4, (ii) high income without savings: 11.5, (iii) low income with savings: 2.6, and (iv) low income without savings: -16.8; Kruskal-Wallis = 27.6, degrees of freedom (d.f.) = 3, p <0.001 for the rural setting and Kruskal-Wallis = 32.0, d.f. = 3, p <0.001 for the peri-urban setting) (Figure 3).

Figure 3

Mean asset-based index scores, derived by principal components analysis (PCA), according to income and savings categories. Values shown are for rural (Wuyi village) (filled) and peri-urban (Laogang village) (blank) settings, Hunan province, P.R. China.

We found wide disparities among the asset-based proxy wealth quartiles in mean annual household per-capita income and per-adult income. To illustrate this issue, using the PCA extraction method, we found highly significant Kruskal-Wallis test results both for rural and peri-urban settings (annual household per-capita income in rural setting Kruskal-Wallis = 14.7, d.f. = 3, p = 0.002 and peri-urban setting Kruskal-Wallis = 21.0, d.f. = 3, p <0.001 and for per-adult income in rural setting Kruskal-Wallis = 23.7, d.f. = 3, p = 0.001 and peri-urban setting Kruskal-Wallis = 35.1, d.f. = 3, p <0.001). Similarly, we found disparities among wealth quartiles in a household's ability to save for both settings (χ2 test p = 0.014 and χ2 test p <0.001 for rural and peri-urban settings, respectively) (Table 6, rural setting only). This pattern was also confirmed when comparing mean annual household per-capita savings among wealth quartiles for the peri-urban setting (Kruskal-Wallis = 17.3, d.f. = 3, p <0.001) but not for the rural setting (Kruskal-Wallis = 6.9, d.f. = 3, p = 0.077). Disparities in a combination of annual household income and saving were also apparent between MW and MP quartiles, in both settings (Table 6, rural setting only).

Table 6 The relationship between the proxy wealth index generated using principal components analysis (PCA) and income and savings, among households in a rural (Wuyi village) setting, Hunan province, China

When the analyses were repeated for pooled data, we found that households from each setting were highly unequally distributed among the proxy wealth quartiles (Figure 4). Both PCA- and PAF-based indices showed weak, but significant, positive correlations with annual household per-capita income (Spearman's rho = 0.27, p <0.001 for PCA and Spearman's rho = 0.28, p <0.001 for PAF), per-adult income (Spearman's rho = 0.21, p <0.001 for PCA and Spearman's rho = 0.22, p <0.001 for PAF) and per-capita savings (Spearman's rho = 0.26, p <0.001 for PCA and Spearman's rho = 0.27, p <0.001 for PAF). Kappa agreements of the PCA and PAF indices with the index based on per-capita income were poor (0.12 and 0.13, respectively). Wide disparities in household durable assets, housing characteristics, utilities and sanitation were clear among the four proxy wealth categories. Disparities in a combination of annual household income and saving were also apparent between MW and MP quartiles (Table 7).

Figure 4

Proportion of households within each proxy wealth quartile that are in the rural (Wuyi village) (blue) and peri-urban (Laogang village) (green) settings of Hunan province, P.R. China, respectively. Quartiles represent (a) most poor (n = 127); (b) below average (n = 126); (c) above average (n = 126); and (d) most wealthy (n = 125) categories for pooled data from both villages.

Table 7 The relationship between the proxy wealth index generated using principal components analysis (PCA) and income and savings, among households in rural (Wuyi village) and peri-urban (Laogang village) settings, Hunan province, China

Discussion

This study contributes methodologically and analytically to research into measurements of wealth and SEP in a country undergoing rapid social, economic, demographic and health transitions [21]. Using household-level data collected with a pre-tested and standardized questionnaire in a rural and a peri-urban setting in Hunan province, P.R. China, we examined asset-based proxy wealth measurements constructed by two common exploratory factor analysis approaches. Our results confirm that, although they have different underlying theoretical assumptions, both PCA and PAF are equally effective statistical techniques in evaluating relative wealth among households. Consistent with the proxy wealth indices derived in the Demographic and Health Surveys (DHS) [9], we selected the first un-rotated component/factor, which accounted for 24.3% (rural) and 27.8% (peri-urban) of the overall variation in the data. Proxy wealth index scores were significantly associated with wealth quartiles based on a household's self-reported annual income, and a combination of income and savings, but not savings alone. We found large discrepancies between MW and MP households within and between the two study villages. However, further analyses of pooled results suggest that when combining data from the two settings, these differences may be structural, owing more to urbanization, modernization and accessibility of goods and services than to wealth per se. This may be particularly true for P.R. China, which is undergoing a long-term, yet spatially heterogeneous, period of industrialization and development [22-24].

Salaries in the rural setting were frequently at the low (e.g., CNY <2000) and the high (e.g., CNY >7000) ends of the spectrum, while those in the peri-urban setting seemed clustered in the middle. This is possibly explained by reporting bias, as many of the peri-urban respondents were the next of kin and not the household head, or, by unaccounted externalities (e.g., government policies) imposing a spatial correlation on household income [25]. As in most of rural P.R. China, household income was predominantly (in the case of 220, or 85.3%, households) sourced from fishing and/or farming activities [26]. However, in the peri-urban setting our questionnaire survey failed to capture the most common source of primary income (175, or 71.1%, peri-urban household respondents reported 'other' primary source of income), although anecdotal evidence suggests that these are mainly remittances from non-resident household members and occasionally government payments or basic pension schemes. Saving was more commonly reported in the rural setting which, assuming no reporting bias, is likely a result of less secure employment, and hence greater income uncertainty [27], and a weakened social security system bringing about high user charges for public services [28]. We found that age was an important factor in saving patterns of rural households, implying that younger households smooth consumption, perhaps in order to invest so that their living standards can be enhanced in the future. Furthermore, stronger social networks in the rural setting may impact on decision making behavior such as household expenditure patterns, while costs of basic needs may also be substantially lower in rural areas [29, 30].

Though proxy measures of wealth are welcome tools in international health research [15], the construction of indices based on exploratory factor analysis has been criticized for being subjective and unstandardized [12, 31]. Conversely, several studies have reported that the asset-based index is a more accurate indicator of long-term wealth than income and consumption data [14, 15]. Nonetheless, the reliability of the asset-based index has also been questioned by some authors [11]. Indeed, using binary data, such as ownership of a particular asset, may violate underlying assumptions that the measured variables are related in a linear fashion to the underlying latent constructs (i.e., wealth). Our results confirm those in other settings [15, 32], indicating internal consistency and robustness in both methods, particularly for higher ranking households. While household income showed a significant association with ownership of numerous household durable assets, the correlation between the asset-based proxy wealth indices and the direct measures of wealth was low. The proxy wealth models explained a higher proportion of data in the peri-urban setting than the rural setting (27.8% vs. 24.3% for PCA), which may add strength to the concern that an asset-based index is a more 'appropriate' measure of wealth in urban areas compared with rural areas [18, 31]. To increase these percentages, other data analysis tools such as the modified hierarchical ordered probit (HOPIT) model [33] and multiple correspondence analysis (MCA) [34] may be used to weight the indicators and should be explored further in subsequent studies.

Questions remain regarding the choice and number of variables to be included, although it has been suggested that the data should comprise 10-15 subjects per variable [35]. With 15 and 11 variables in the rural and peri-urban villages, respectively, our sample size of 246-258 households was satisfactory. Sampling adequacy was further confirmed by the KMO measure (KMO >0.7 is said to be 'meritorious'), and by Bartlett's test of sphericity, which indicated that the correlation matrices were not identity matrices, and hence the factor model was appropriate. Retaining only components or factors with eigenvalues >1.0 ensured that they explained at least as much variance in the data as one measured variable, since the variance accounted for by each of the components is its associated eigenvalue. However, Cronbach's coefficient α was just below 0.7 for each setting, indicating that up to 50% of the variance in the items may be attributable to measurement error. Similar to other studies, we found that the first PC and the first PF only explained a low percentage of variation in the data (20-30%). This finding suggests that, while the derived indices do provide a proxy measure of wealth, it is estimated with a considerable level of inaccuracy [31, 32, 36]. Although inclusion of the remaining components or factors helps explain some of the remaining variation, it is unclear if, and how, this should be done [37].

Consistent with findings from other studies, both PCA and PAF showed signs of clumping and truncation, hindering their ability to accurately classify households lying close to the boundaries between wealth quartiles, although this was less obvious in the rural data [18, 32, 34] (see Figure 1). Clumping may be a statistical phenomenon caused by a lack of input variables that can adequately distinguish between households of a similar economic status [18], or it may be a product of social and economic homogeneity stemming from half a century of socialist rule in P.R. China [27, 31]. In the peri-urban setting, including ownership of computers and household Internet service may have helped further differentiate between the AA and MW. Differentiating between the age, price, condition and quantity of specific assets may reduce the effect of clumping and/or truncation and should be explored in greater detail, although results from previous studies imply that this information may not add to the accuracy or robustness of the index [14, 38]. Furthermore, in our study the few households that were re-classified into different quartiles according to the factor extraction method employed only moved to immediately adjacent quartiles.

Notably, our asset-based proxy wealth index includes utility and sanitation variables, which can have direct effects on health, hence making it difficult to separate out indirect effects on health, via improved living conditions, from direct ones. Furthermore, a distinction should be made between variables that may be determinants of wealth, such as means of production, communication or transport, and those that are purely indicators of wealth, such as certain leisure goods [18]. Where quantifying the extent of inequality is the major goal, the concentration index and its associated concentration curve may be used [39, 40]. Alternative approaches to measuring wealth, for example participatory wealth ranking (PWR), may be borrowed from development studies or from econometrics, potentially providing new insight for public health researchers [41].
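The concentration index mentioned above can be sketched as follows. The sketch assumes the covariance form C = 2 cov(h, r) / mean(h), where h is a health variable and r is the fractional rank of each household in the wealth distribution. The function name, the absence of tie handling and the toy values are illustrative assumptions and are not taken from this study.

```python
def concentration_index(health, wealth):
    """Concentration index of `health` across households ranked by `wealth`.

    Assumes the covariance form C = 2 * cov(h, r) / mean(h), with
    fractional ranks r_i = (i + 0.5) / n, poorest household first.
    Ties in wealth are not treated specially in this sketch.
    """
    n = len(health)
    order = sorted(range(n), key=lambda i: wealth[i])  # poorest first
    h = [health[i] for i in order]
    ranks = [(i + 0.5) / n for i in range(n)]
    mean_h = sum(h) / n
    mean_r = sum(ranks) / n
    cov = sum((hi - mean_h) * (ri - mean_r) for hi, ri in zip(h, ranks)) / n
    return 2.0 * cov / mean_h


# Hypothetical data: a health variable that rises with the wealth score.
wealth_example = [10, 20, 30, 40]
health_example = [1, 2, 3, 4]
print(concentration_index(health_example, wealth_example))
```

Under this convention a positive value indicates that the health variable is concentrated among wealthier households, a negative value indicates concentration among poorer households, and zero indicates no wealth-related inequality.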

An important drawback of the household survey method employed in our study is that the population sample did not include migrant populations, who tend to be the poorest and most socially disadvantaged households in society [42], or information on informal remittances from temporarily migrating household members [31]. Furthermore, the census data employed were obtained one year before our survey, and may have become inaccurate in the fast-changing living environment of contemporary P.R. China. Finally, compliance in the peri-urban setting was considerably lower than in the rural setting (70% vs. 88%) and no further information was available on non-compliant individuals for comparison [43]. Including the migrant population may have significantly altered the patterns emerging from our aggregated data and the apparently wide systemic rural to peri-urban gap [17].

While it is beyond the scope of this paper to comprehensively explore the factors behind the disparities both within and between settings, we call for further research into the complex interactions between these and other assets such as human capital, public capital and land assets [44, 45]. This would help to establish the driving forces of the observed differences between direct and proxy measures of wealth and to further examine how these differences impact on health service utilization, research and health policy [44, 45]. Improved living conditions and diminished inequality gaps are not only important as distal and proximal determinants of health, but are also vital factors for national and regional socio-political stability [29]. Closing the rural to urban gap in particular is currently a top policy priority in P.R. China, with the 11th Five Year Plan (2006-2010) having introduced the "Building a Socialist New Countryside" campaign [46]. In order to monitor and evaluate this campaign, however, it is crucial to have a time- and cost-effective appraisal of relative SEP [47]. This paper supports the use of the asset-based index as a proxy measure of wealth, with weights derived from either PCA or PAF, although we recommend caution when comparing aggregated data from various settings. Given the renewed interest in the role of inequalities in economic inefficiency [48], and the important role of P.R. China in achieving the Millennium Development Goals (MDGs) [49], it is conceivable that these methods will be of use in numerous other applications, as well as in other geographical locations.

Appendix: Definitions of economics and statistics terms used in this paper

Anti-image correlation matrix

A matrix containing the negatives of the partial correlation coefficients. Most of the off-diagonal elements should be small in a good factor model.

Asset

An item of ownership convertible into cash.

Bartlett's test of sphericity

A method to test whether the correlation matrix is an identity matrix, which would indicate that the factor model is inappropriate.
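As a rough illustration, the test statistic can be computed from a correlation matrix as sketched below. The sketch assumes the commonly used form chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, where n is the number of observations and p the number of variables. The function names and the example matrix are hypothetical, and the p-value is left to be read from a chi-square table.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom) for matrix `corr`."""
    p = len(corr)
    det = determinant(corr)
    if det <= 0.0:
        raise ValueError("correlation matrix must be positive definite")
    statistic = -((n_obs - 1) - (2 * p + 5) / 6.0) * math.log(det)
    return statistic, p * (p - 1) // 2


# Hypothetical 2 x 2 correlation matrix from 100 observations.
stat, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_obs=100)
print(f"chi-square = {stat:.2f}, df = {df}")
```

For an identity matrix the determinant is 1 and the statistic is zero, so larger values indicate stronger departures from an identity matrix.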

Cronbach's coefficient α

A method of assessing the internal consistency, or reliability, of a set of items, where [(1-α²) × 100] indicates the percent of variance in the items that could be attributed to measurement error.
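A minimal sketch of the calculation follows. It assumes the standard formula α = k/(k - 1) × (1 - sum of item variances / variance of the total score) for k items, and applies the error-share relationship stated in the definition above. The function names and the toy item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for `items`, a list of k lists that each hold
    one score per respondent (all lists have the same length)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def error_share_percent(alpha):
    """Percent of item variance attributed to measurement error, (1 - alpha^2) x 100."""
    return (1 - alpha ** 2) * 100


# Hypothetical binary ownership data for three assets and five households.
toy_items = [
    [0, 1, 0, 1, 1],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
alpha = cronbach_alpha(toy_items)
print(f"alpha = {alpha:.3f}, error share = {error_share_percent(alpha):.1f}%")
```

With α = 0.7 the relationship gives (1 - 0.49) × 100 = 51%, which is the order of magnitude referred to in the Discussion.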

Durables

Manufactured products such as an automobile or a household appliance that can be used over a relatively long period without being depleted or consumed.

Eigenvalue

The scalar of the associated eigenvector, indicating the amount of variance explained by each PC or each PF.

Eigenvector

A vector that results in a scalar multiple of itself when multiplied by a matrix. It corresponds to the weights in a linear transformation when computing PCA and PAF.

Kaiser-Meyer-Olkin (KMO)

A measure of sampling adequacy which tests whether the partial correlations among items are small.
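The overall KMO value can be sketched as below, assuming the usual definition: the sum of squared off-diagonal correlations divided by the sum of squared off-diagonal correlations plus squared off-diagonal partial correlations. The partial correlations are obtained here from the inverse of the correlation matrix. The function names and the example matrix are hypothetical.

```python
import math


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure for correlation matrix `corr`."""
    p = len(corr)
    inv = invert(corr)
    sum_r2 = 0.0  # squared correlations, off-diagonal
    sum_q2 = 0.0  # squared partial correlations, off-diagonal
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_q2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_q2)


# Hypothetical matrix: three variables with all pairwise correlations equal to 0.5.
equal_corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(f"KMO = {kmo(equal_corr):.3f}")
```

Values close to 1 arise when the partial correlations are small relative to the ordinary correlations, which is the situation in which a factor model is considered suitable.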

Kruskal-Wallis

A non-parametric method for testing equality of population medians among groups.

Peri-urban

Immediately adjoining an urban area; between the suburbs and the countryside.

Principal axis factoring (PAF)

A data reduction technique which uses squared multiple correlations as initial estimates of the communalities. The communalities are entered into the diagonals of the correlation matrix before factors are extracted from the matrix, allowing the variance of each item to be a function of both item communality and non-zero unique item variance.

Principal components analysis (PCA)

A data reduction technique using the principal components model. It assumes that components are uncorrelated and that the communality of each item sums to one for all components, therefore implying that each item has zero unique variance.
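To illustrate how an asset index can be derived from the first principal component, the sketch below standardizes each asset variable, builds the correlation matrix, and extracts the leading eigenvector by power iteration. Statistical packages normally use a full eigendecomposition, so power iteration is a simplification chosen here to keep the example dependency-free. All names and the toy ownership data are hypothetical, and the sketch is not the procedure used in this study.

```python
import math


def standardize(columns):
    """Convert each column (one list per asset variable) to z-scores."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        result.append([(x - mean) / sd for x in col])
    return result


def correlation_matrix(z):
    """Correlation matrix of already standardized columns."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(corr, iterations=500):
    """Leading eigenvector (weights) and eigenvalue by power iteration."""
    v = [1.0] * len(corr)
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in corr]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(vi * sum(r * x for r, x in zip(row, v)) for vi, row in zip(v, corr))
    return v, eigenvalue


def wealth_scores(columns):
    """Household scores as the weighted sum of standardized asset variables."""
    z = standardize(columns)
    weights, eigenvalue = first_component(correlation_matrix(z))
    n_households = len(columns[0])
    scores = [sum(w * zc[h] for w, zc in zip(weights, z)) for h in range(n_households)]
    return scores, weights, eigenvalue


# Hypothetical ownership (1 = owns, 0 = does not) for six households.
tv = [0, 0, 1, 1, 1, 1]
fridge = [0, 0, 0, 1, 1, 1]
phone = [0, 1, 0, 0, 1, 1]
scores, weights, eigenvalue = wealth_scores([tv, fridge, phone])
print([round(s, 2) for s in scores], round(eigenvalue / 3, 3))
```

The resulting scores can then be ranked and divided into quartiles, and the eigenvalue divided by the number of variables gives the proportion of variance explained by the first component.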

Rotation

Turning the reference axes of the factors about their origin in order to achieve a simpler and theoretically more meaningful factor solution than is produced by the unrotated factor solution; the positions of the items are fixed in geometric space while the factor axes are rotated through specified angles.

Socio-economic position (SEP)

An aggregate concept that includes both resource-based and prestige-based measures. Resource-based measures refer to material and social resources and assets, including income, wealth, and educational credentials. Prestige-based measures refer to individuals' rank or status in a social hierarchy, typically evaluated with reference to people's access to, and consumption of, goods, services and knowledge, as linked to their occupational prestige, income, and educational level.

Utility

A public service such as plumbing, electricity or a railroad line.