Background

Population ageing is quickly becoming a problem worldwide. The World Health Organization (WHO) reported that there are currently 900 million people aged 60 years and older, which may increase to 2 billion by 2050 [1]. Furthermore, in 2050, approximately 80% of the elderly are predicted to live in countries that are currently low- or middle-income [2]. The World Health Statistics reported that the life expectancy in most countries was greater than 60 in 2015, and the global average life expectancy was 71.40 years [3]. The mortality rate of the elderly is decreasing, which is the primary reason for the increasing life expectancies in high-income countries [4]. Although there have been considerable research developments regarding the medical and public health of the elderly, the health status of the elderly is not significantly better than that of their parents [1].

However, the definition of health is no longer merely the absence of diseases. The ability for routine functioning is also important and should be given proper attention when assessing health status [1]. The WHO has stated that “health is a state of complete physical, mental and social well-being and not merely the absence of disease or infirmity” [5]; however, the requirement of “complete…well-being” does not apply to the aged population. Many elderly individuals with one or two chronic diseases consider themselves “well enough” to be aging successfully, which refers to a status characterized by a low probability of diseases and related disabilities, high cognitive and physical functioning, and active social engagement [6, 7]. Therefore, a specialized measurement of health status for the aged population should be developed separately for an accurate description of elderly health status.

It is more difficult to establish the norm of social health compared to that of psychological or physical health [8]. Social health contains two aspects: individual social health and the social health of society or a population [9]. Social health of an individual is usually explained as “well-being”, “adjustment” or other terms rather than “health” [10], and it can be measured from two aspects: social support (SS) and social adjustment (SA). The assessment of SS mainly discusses the processes and outcomes of support from relatives, friends or other people. The measurement of SA usually refers to relationships with others and the performances of social roles [9]. SS places emphasis on the level of social support the subject receives from others whereas SA focuses on the adaptive capacity of subjects to actively interact with the community where they live. Some studies have assessed the relationships between SS, SA and other health outcomes and reported that SS was a significant factor contributing to loneliness in the elderly [11]; moreover, emotional support has a positive effect on reducing the mortality of the elderly [12]. Some researchers have mentioned that SA is related to quality of life [13] and that psychotherapy is effective for improving the SA of elderly individuals with suicide attempts [14].

Another important tradition of social health assessment concerns the characteristics of society, that is, the social health of the society as a whole. A healthy society is defined as follows: “A society is healthy when there is equal opportunity for all and access by all to the goods and services essential to full functioning as a citizen” [10]. In addition, previous studies have indicated that the neighbourhood environment can significantly influence the psychological and physical health of the elderly [15]. Therefore, the “social health of society” mainly reflects the neighbourhood environment. The utilization of health services was partly determined by the perceived health status [16]. Similarly, the use and perception of the same objective environment might differ between any two people [17] and are influenced by their demands and criteria. Perceived environmental indicators are therefore more suitable than objective environmental indicators for assessing the support received from the environment. Accordingly, to assess the social health of society, this study took perceived environment resources (PERs) into account, which refer to the perceived built environment, community management and services. The relationships between PER and health outcomes have been reported, and previous studies have demonstrated that PER was marginally associated with greater possibilities of poor self-rated health [18] and was associated with depressive symptoms, anxiety and physical symptoms [19].

To improve the health status of the Chinese elderly, the development of a specialized and comprehensive measuring tool that can accurately evaluate the social health status of the Chinese elderly is required. Social health is an important part of health. However, a measuring tool for the Chinese elderly has not been previously developed. This study aimed to develop a scale to assess the social health status of the elderly that evaluated both the social health of the individual (SS and SA) and the social health of society (PER). The scale could contribute to a more comprehensive measurement of the health status of Chinese elderly.

Methods

Design

We developed the Social Health Scale for the Elderly (SHSE) over 4 phases, which are discussed in detail below.

Phase 1

Based on the literature review findings, the items in the original draft scale were chosen. Some items were excluded after consulting with experts, and a revised version of the draft scale was developed.

Phase 2

Pilot testing aimed at selecting the items for the revised draft scale. In this phase, a test-retest reliability analysis, Cronbach’s alpha analysis, a correlation analysis, a distinguishability analysis and a principal component analysis were conducted for item selection, and then the final versions (some items in the long form were deleted in the short form) of the SHSE were generated.

Phase 3

Field testing was conducted to assess the validity and reliability of the scales (SHSE-L: long form of the SHSE; SHSE-S: short form of the SHSE). The test-retest reliability, internal consistency reliability, inter-rater reliability, concurrent validity, construct validity, convergent validity and discriminant validity were calculated in this phase.

Phase 4

Based on the field testing data, the raw score distributions among the different groups could be compared, and two norms (standard norm and percentile rank norm) of social health were generated.

Development of the draft scale

The draft scale was generated by reviewing published books, systematic reviews and original articles [9, 15, 20,21,22]. Objective evaluation indicators, such as the frequency of communication with children and duration of optimistic mindset, were considered the better choices. The item pool included items related to social health as much as possible, and each question intended to reflect a specific aspect of some items.

After consulting with sociology experts and public health experts, the items in the original draft scale that contained repeated content or were not suitable for the Chinese elderly were deleted, and necessary missing items were added. The questions and options were modified for better intelligibility.

Data collection

Before the pilot testing, a trial survey was conducted to test the investigation ability of the interviewers after training. Each interviewer was required to participate in standardized training and then to interview, following the standard procedure, at least one person aged 60 years or older. Four communities in the Gongshu district were randomly selected. The Gongshu district is located in the centre of Hangzhou, and the proportion of elderly individuals there is similar to that in Hangzhou as a whole [23]. The minimum sample size was calculated to ensure that there were at least 10 subjects per item in the factor analyses [24]. The target population was the general healthy population aged 60 years and older. After the Health Records in community public health service stations were checked, persons who were bed-ridden, had serious physiological or psychological illnesses, and/or had hearing disorders were excluded before sampling. Then, stratified random sampling by age and gender was conducted. The community doctors contacted potential participants by telephone before the interviews to improve resident compliance. Each participant was required to sign an informed consent form if he or she agreed to be an interviewee. The interview was conducted face to face at the Community Health Service Centre of the community in which the participant lived, and the participants were required to attend in person. If, during the interview, the interviewer judged that a participant met the exclusion criteria, the data of this interviewee were not included. Participants who did not attend the interview in time but did not refuse to participate were contacted by telephone more than once because the elderly might forget the designated appointment time because of poor memory.

The field testing procedure was similar to that of the pilot testing. The main differences were the field and the method of sampling. Considering the compliance and the number of aged residents, eight communities in Gongshu district and nine villages in Xihu district were selected. The former was the sample source of urban residents, and the latter was that of rural residents. The sample size of each district should be 40 times larger than the number of items in the final version of the SHSE-L [25]. Convenience sampling was used for field testing. Here, convenience sampling refers to a procedure in which community doctors contacted potential participants in advance of the interview, after which the interviewers remained in the field for one week or less to interview participants. Residents who did not participate in the interview in time but did not refuse to participate were reminded by telephone calls, but the interviewers did not wait for them if they did not come to the site for the interview within the stipulated time. The chi-square test was used to compare the distributions of the subjects in the two tests.

Item selection

After calculating the raw scores of the revised draft scale, we selected items to generate the final versions of the SHSE (SHSE-L and SHSE-S). We utilized five statistical methods to select the items in the revised draft scale.

Test-retest reliability analysis

The interval between the test and re-test did not exceed two weeks [26, 27]. The correlation coefficient between the raw score of a particular item in the first interview and that in the second should be larger than 0.30 (P < 0.05) for this item to be retained. If the correlation of some item was too small or the P-value was not less than 0.05, then the test-retest reliability of this item was unsatisfactory.
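As a minimal sketch of this retention rule, the snippet below computes a rank correlation between the first and second interview scores of one item and compares it with the 0.30 cut-off. It assumes Spearman's rank correlation, because the coefficient type is not stated for item selection, and it omits the P < 0.05 check, which would need a statistics package. All function names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def passes_retest_threshold(first, second, threshold=0.30):
    """Coefficient part of the rule only; significance is not tested here."""
    return spearman(first, second) > threshold
```

An item whose scores keep roughly the same ordering across the two interviews passes the coefficient check; the significance condition would still have to be verified separately.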

Cronbach’s alpha analysis

We calculated the standardized Cronbach’s α coefficients of this scale before and after eliminating some items. If the standardized Cronbach’s α coefficient of the scale increased after eliminating some items, then these items were deleted to obtain better internal consistency of the scale [28].
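The "alpha if item deleted" check can be illustrated with a short sketch. It assumes the usual definition of the standardized coefficient, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the mean inter-item Pearson correlation; the text does not spell out the formula, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def standardized_alpha(items):
    """items: one list of scores per item, all over the same subjects."""
    k = len(items)
    r_bar = mean([pearson(a, b) for a, b in combinations(items, 2)])
    return k * r_bar / (1 + (k - 1) * r_bar)

def items_raising_alpha(items):
    """Indices of items whose removal increases the standardized alpha."""
    base = standardized_alpha(items)
    return [i for i in range(len(items))
            if standardized_alpha(items[:i] + items[i + 1:]) > base]
```

With at least three items, `items_raising_alpha` lists the candidates for deletion under the rule described above; whether to delete them one at a time or together is a design choice the sketch does not settle.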

Correlation analysis

The raw score of each retained item should be significantly correlated with the raw score of the dimension to which it belongs (r > 0.40, P < 0.05). Meanwhile, each retained item should be statistically unrelated (P ≥ 0.05) or only weakly related (r < 0.30) to the other two dimensions.
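The two conditions can be written as a single decision rule. The sketch below is illustrative only: it takes correlation coefficients and P-values that have already been computed elsewhere, and the function name and argument layout are hypothetical.

```python
def keep_item(own, others, alpha=0.05):
    """Apply the correlation criteria to one item.

    own    -- (r, p) for the item against its own dimension
    others -- list of (r, p) pairs against the other two dimensions
    """
    r_own, p_own = own
    related = r_own > 0.40 and p_own < alpha
    # Each other dimension must be non-significant or only weakly related.
    separated = all(p >= alpha or r < 0.30 for r, p in others)
    return related and separated
```

For example, an item with r = 0.55 (P = 0.001) against its own dimension and r = 0.25 (P = 0.01) against another dimension would be kept, because the second correlation is significant but weak.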

Distinguishability analysis

We compared the raw scores of a particular item between the high-score group (P75) and the low-score group (P25). An item was determined to lack distinguishability when the difference in distribution was not statistically significant (P ≥ 0.05).

Principal component analysis

A principal component analysis was used to extract the factors after performing Bartlett’s test and calculating the Kaiser-Meyer-Olkin (KMO) measure (Bartlett’s test: P < 0.05; KMO > 0.60) [29]. The number of factors was preset to equal the number of sub-dimensions (see Table 1) because we considered that the sub-dimensions were reasonable and could independently explain the social health of the Chinese elderly. The factors were rotated by Varimax because no pair of items (see Table 1) was significantly correlated (the correlation coefficient of any two items was less than 0.30, or P ≥ 0.05). Items were retained if their factor loadings were greater than or equal to 0.40 [30].

Table 1 The draft structure of Social Health Scale for the Elderly

Reliability and validity assessments

The reliability and validity of the final versions were assessed after calculating the raw scores. The scoring method was the same as that in item selection.

Test-retest reliability

The time interval between the test and re-test should be no longer than two weeks. A larger correlation coefficient indicated better test-retest reliability of the scale or dimension. Generally, a correlation coefficient larger than 0.80 is considered desirable.

Internal consistency reliability

Cronbach’s α was used to assess the internal consistency of the scale or dimension. In most cases, a standardized Cronbach’s α coefficient greater than 0.70 indicates satisfactory internal consistency [31].

Inter-rater reliability

The McNemar-Bowker test was used to assess the agreement between two interviewers who had interviewed the same person. A good agreement meant that the weighted kappa was not less than 0.75 [32].
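For readers unfamiliar with the weighted kappa, the following sketch computes it for two raters who assign subjects to ordered categories. It assumes linear disagreement weights by default (the weighting scheme is not stated in the text) and a hypothetical set of ordinal categories; it does not implement the McNemar-Bowker test.

```python
def weighted_kappa(ratings_a, ratings_b, categories, power=1):
    """Weighted kappa for two raters.

    categories -- ordered list of possible ratings (at least two)
    power      -- 1 for linear weights, 2 for quadratic weights
    """
    n = len(ratings_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    # Observed proportion of subjects in each (rater A, rater B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]  # chance disagreement
    return 1 - disagree_obs / disagree_exp

def good_agreement(kappa, cutoff=0.75):
    """Cut-off taken from the text: kappa not less than 0.75."""
    return kappa >= cutoff
```

Identical ratings give a kappa of 1, and ratings that agree no more often than chance give a kappa near 0.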

Concurrent validity

First, the external criteria were instruments that are widely used in Chinese populations and have satisfactory reliability and validity. Each external criterion was used to assess only one of our dimensions because a comprehensive criterion for the SHSE does not exist. The correlation coefficient between the raw score of the relevant dimension and the external criterion score should be statistically significant (P < 0.05). Additionally, the correlations between the external criterion score and the raw scores of the unrelated dimensions should be comparatively low or not statistically significant (P ≥ 0.05).

Construct validity

A confirmatory factor analysis was performed to assess construct validity, and the maximum likelihood estimation was selected. If the goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI) were larger than 0.95 and 0.90, respectively, then the fitness of the model was desirable [33]. In addition, the root mean square error of approximation (RMSEA) can also be used to assess the degree of fit. If the RMSEA is less than 0.05, then the degree of fit is satisfactory; 0.05–0.08 indicates good fitness, and an RMSEA of less than 0.10 indicates moderate fitness [34].
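The cut-offs above can be summarized in a small helper. This is only a reading aid under the thresholds quoted in this section; the band label for values of 0.10 or more and the function names are hypothetical.

```python
def rmsea_band(rmsea):
    """Map an RMSEA value to the fit bands quoted in the text."""
    if rmsea < 0.05:
        return "satisfactory"
    if rmsea <= 0.08:
        return "good"
    if rmsea < 0.10:
        return "moderate"
    return "poorer than moderate"  # hypothetical label, not from the source

def gfi_agfi_desirable(gfi, agfi):
    """Both indices must exceed their cut-offs (0.95 and 0.90)."""
    return gfi > 0.95 and agfi > 0.90
```

Because the text uses strict inequalities, values that sit exactly on a cut-off, such as an RMSEA of 0.10, fall outside the stricter band.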

Convergent and discriminant validity

The average variance extracted (AVE) of the scale was calculated. If the AVE is larger than 0.50, then the convergent validity is good [35]. Discriminant validity is acceptable when the squared correlation coefficient of any two factors (factors were extracted when their eigenvalues were larger than 1 in the principal component analysis) is smaller than the AVEs of the associated factors [36].
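A short sketch may make the two checks concrete. It assumes the common definition of the AVE as the mean of the squared standardized loadings of a factor, which the text does not state explicitly; the function names are hypothetical.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings of one factor."""
    return sum(value * value for value in loadings) / len(loadings)

def convergent_ok(ave, cutoff=0.50):
    """Convergent validity is taken as good when the AVE exceeds 0.50."""
    return ave > cutoff

def discriminant_ok(ave_a, ave_b, correlation):
    """The squared correlation of two factors must be below both AVEs."""
    return correlation ** 2 < min(ave_a, ave_b)
```

For instance, two factors with AVEs of 0.5 and 0.6 that correlate at 0.3 pass the discriminant check, because 0.09 is smaller than both AVEs.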

Development of norms

The raw scores were calculated, and the t-test or Wilcoxon rank sum test was used to compare the distributions across binary variables. Variables with more than two categories were compared using an analysis of variance or the Kruskal-Wallis H test. For better application of the SHSE, the standard norm and percentile rank norm were developed. The former can be applied when comparing two or more populations with different characteristics. The latter is easier for non-professionals to understand, but the norm might not be descriptive for all Chinese elderly unless the sample was perfectly representative.

Standard norm

The equation for converting the raw score of a subject to the standard score (T score) was as follows [37]:

$$ {\mathrm{T}}_i=50+10\times \left({\mathrm{R}}_i-{\mathrm{M}}_{\mathrm{n}}\right)/{\mathrm{SD}}_{\mathrm{n}} $$

where $\mathrm{T}_i$ is the standard score of subject i; $\mathrm{R}_i$ is the raw score of subject i; $\mathrm{M}_{\mathrm{n}}$ is the mean of the raw scores; and $\mathrm{SD}_{\mathrm{n}}$ is the standard deviation of the raw scores.
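The conversion can be sketched in a few lines. The function and parameter names are hypothetical, and the mean and standard deviation are passed in rather than estimated, so the sketch makes no assumption about how the norm statistics were computed.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a T score: 50 + 10 * (raw - mean) / SD."""
    return 50 + 10 * (raw - norm_mean) / norm_sd
```

With an illustrative mean of 70 and standard deviation of 10, a raw score of 80 corresponds to a T score of 60, that is, one standard deviation above the mean.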

Percentile rank norm

This norm showed the range of the raw score in each percentile rank [38].
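As an illustration of how such a norm can be read, the sketch below looks up the percentile rank of a raw score in a sorted norm sample. It assumes one common definition, the percentage of the norm sample scoring at or below the given score; the definition used for the published norm is not stated here, and the names and sample values are hypothetical.

```python
from bisect import bisect_right

def percentile_rank(score, sorted_norm_scores):
    """Percentage of norm-sample scores that are at or below `score`."""
    at_or_below = bisect_right(sorted_norm_scores, score)
    return 100 * at_or_below / len(sorted_norm_scores)

# Hypothetical norm sample, sorted in ascending order.
norm_sample = sorted([60, 70, 70, 80, 90])
```

In this toy sample, a raw score of 70 has a percentile rank of 60 because three of the five scores are at or below it.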

Results

Phase 1: Development of draft scale

There were 3 dimensions, 9 sub-dimensions and 40 items in the revised draft scale (see Table 1). Only one item entitled “quality of natural environment” was added after consulting experts, and the other 39 items were selected from references. The questions and options and the scoring method of the items in the revised draft scale are shown in the Additional file 1. The raw score ranged between 40 and 200. A higher score represents a better social health status.

Phase 2: Pilot testing and items selection

The pilot survey was performed from December 14, 2015 to January 8, 2016. Based on the ratio of subjects to items, the minimum sample size was 400. Considering the low response rates of similar surveys in China, the number of randomly drawn potential participants was nearly twice the minimum, and 271 of them refused to participate when community doctors approached them by telephone. Finally, 430 subjects were included in the statistical analysis, and 107 were interviewed twice. Six interviewees were excluded because of serious illness (physically or mentally disabled).

Table 2 shows the characteristics of the pilot test subjects. Mobility, self-care, daily activities, pain or discomfort, and anxiety or depression were the five dimensions assessed with the European Quality of Life-5 Dimensions questionnaire [39]. The “chronic diseases” in the variable “number of confirmed chronic diseases” included 12 diseases found in the top 10 lists of disease burden for the Chinese elderly [40]. The distributions of the two tests were significantly different regarding the type of household, religion, marital status and quality of sleep (P < 0.05). In addition, there were differences in mobility, daily activities, pain/discomfort and anxiety/depression between the pilot testing and field testing.

Table 2 Characteristics of the subjects in two tests

Based on 5 different statistical methods, the items in the revised draft scale were extracted. The items in the final versions of the SHSE are shown in Table 3. There were 25 items in the SHSE-L and 14 items in the SHSE-S.

Table 3 Items in the Social Health Scale for the Elderly after selection

Phase 3: Field testing, reliability and validity assessments

The field testing was performed from November 6, 2016 to January 20, 2017. A total of 2415 residents were interviewed, and 11 of them were excluded before the statistical analysis because of missing data in the SHSE. In total, 494 subjects were interviewed twice. The differences between the distributions of subjects in the two tests were not statistically significant for gender, age group, education level, the status of living alone, smoking status, drinking status, the ability of self-care, or the number of confirmed chronic diseases (see Table 2).

Test-retest reliability

The correlations (Spearman’s correlation analysis) of any two items in the SHSE-L ranged from 0.41 to 0.87. The correlations of scales were 0.77 (SHSE-L) and 0.78 (SHSE-S). In the SHSE-L, the correlations of dimensions were 0.61 (SS), 0.81 (SA) and 0.78 (PER), and those correlations were 0.49, 0.79 and 0.78 in the SHSE-S, respectively. Each correlation was statistically significant.

Internal consistency reliability

In terms of the SHSE-L, the standardized Cronbach’s α coefficient of scale was 0.79, and those of dimensions were 0.85 (SS), 0.61 (SA) and 0.65 (PER). With regard to the SHSE-S, the standardized Cronbach’s α coefficient of scale was 0.65, and those of dimensions were 0.69 (SS), 0.55 (SA) and 0.63 (PER).

Inter-rater reliability

In total, 43.12% of the subjects who were interviewed twice were interviewed by different interviewers. Both the McNemar-Bowker tests (SHSE-L and SHSE-S) indicated disagreement between the interviewers (P < 0.01). The weighted kappas were 0.44 (SHSE-L) and 0.43 (SHSE-S).

Concurrent validity

The Social Support Rate Scale (SSRS) has been widely used to assess social support of the Chinese [41], and it was selected as the external criterion of SS. One question used to assess the relationship between the interviewee and his or her colleagues was removed, so the maximum aggregate score was 62. A total of 2358 subjects did not have missing data in the SSRS. Spearman’s correlation analyses were conducted to assess the correlations between SSRS and SS, SA, or PER. Moderate correlations were identified between the SSRS and SS parts of the SHSE-L and SHSE-S. The correlations between the SSRS and SS were 0.64 (P < 0.01) and 0.61 (P < 0.01) in the SHSE-L and SHSE-S, respectively. In addition, the SSRS was uncorrelated or weakly correlated with SA and PER in both the SHSE-L (SA: r = 0.23, P < 0.01; PER: r = 0.03, P > 0.05) and the SHSE-S (SA: r = 0.20, P < 0.01; PER: r = 0.01, P > 0.05).

Construct validity

Two models were constructed, one based on the SHSE-L (model I) and another based on the SHSE-S (model II). Model I was listed as follows: x1 = a1*f1 + e1, x2 = a2*f1 + e2, x3 = a3*f1 + e3, x4 = a4*f2 + e4, x5 = a5*f2 + e5, x6 = a6*f2 + e6, x7 = a7*f3 + e7, x8 = a8*f3 + e8. Model II was listed as follows: x1 = a1*f1 + e1, x2 = a2*f1 + e2, x3 = a3*f2 + e3, x4 = a4*f2 + e4, x5 = a5*f2 + e5, x6 = a6*f3 + e6, x7 = a7*f3 + e7. In the equations, ai and ei represent coefficients and xi and fi represent sub-dimensions and dimensions, respectively. Figure 1 shows the relationships between sub-dimensions (xi) and dimensions (fi) in the two models. In model I, GFI = 0.95, AGFI = 0.90, and RMSEA = 0.10. In model II, GFI = 0.97, AGFI = 0.93, and RMSEA = 0.09.

Fig. 1 The structures of model I (a) and model II (b) in confirmatory factor analysis

Convergent and discriminant validity

The AVEs of the SHSE-L and SHSE-S were 0.54 and 0.53, respectively. Table 4 shows the matrix of factor loadings after being rotated by Varimax in the principal component analysis. Six and four factors were extracted in the principal component analysis of the SHSE-L and SHSE-S, respectively. The AVEs of every two factors were larger than the squared correlation coefficients of related factors in both versions of the SHSE (SHSE-L: the AVEs of the factors ranged from 0.31 to 0.78, and the maximum squared correlation coefficient was 0.14; SHSE-S: the AVEs of the factors ranged from 0.33 to 0.66, and the maximum squared correlation coefficient was 0.10).

Table 4 The matrix of factor loadings after being rotated by Varimax

Phase 4: Development of norms

Table 5 shows the distributions of raw scores in the field testing. Except for the status of living alone and the number of confirmed chronic diseases, the distributions of the other variables were similar between the SHSE-L and SHSE-S. The differences were statistically significant for gender, age group, type of household, religion, education level, marital status, quality of sleeping, smoking status, the ability of mobility, the ability of self-care, the ability of daily activities, and anxiety status. Female, young elderly, Christian, highly educated, and married persons had better social health. Living alone; poor quality of sleep; current smoking; poor ability of mobility, self-care and daily activities; and serious anxiety/depression might imply worse social health. The standard norm and percentile rank norm are shown in the Additional file 1. Generally, SS and SA changed with age, so the same norm was not suitable for every age group. Taking these results into consideration, we generated three different norms for the three age groups.

Table 5 The distribution of raw score of the Social Health Scale for the Elderly

Discussion

This study developed two versions of the SHSE, with 25 items in the long form and 14 items in the short form. Each form could assess three dimensions of social health, and both social health of the individual and social health of society were measured. The reliability and validity of the two versions were acceptable. Two norms could reflect the social health status of the generally healthy elderly living in Hangzhou. We believe that the SHSE-L can be used to explore the risk or protective factors of social health, and the SHSE-S can be combined with other domains of health status (e.g., mental health) to assess comprehensive health status. Usually, the short forms of scales are generated based on their longer forms, such as the SF-12 [42]; therefore, we suggest further studies for the development of the SHSE-S, although the reliability and validity results of the SHSE-S were similar to those of the SHSE-L.

This study had the following limitations. First, the response rate of the pilot testing was low [43], so non-response bias existed. Neither a random sampling survey nor a census was performed during the field testing. The field testing sample differed from the pilot testing sample in several respects (Table 2); thus, the representativeness of the field testing sample was not desirable, and volunteer bias was inevitable. All the participants lived in Hangzhou, which further limited the representativeness of the sample. Second, the test-retest reliability and inter-rater reliability of the SHSE-L and SHSE-S were acceptable but far from perfect. The internal consistency of the SHSE-S was lower than the optimum level. All of the above limitations might result from imperfect design of the questions and options. Because of the lack of applicable external criteria for the SHSE as a whole, SA and PER, the concurrent validity assessment was incomplete. Third, the applicability of the SHSE is limited because multiple cultures were not considered when the draft scale was developed; therefore, the scale might not be suitable for assessing Chinese elderly who live in different cultures. Finally, this study lacked a comprehensive outcome variable to assess the contribution of social health to the comprehensive health status of the elderly.

The social adjustments of people in different cultures are diverse [44]. China is a multi-ethnic society; therefore, the existence of multi-cultures is inevitable in China. Similarly, the levels of SS and PER might also be diversified. It was difficult to generate a scale/norm that could be applied universally in China based on one study. For better utility, the validity and reliability of the SHSE-L and SHSE-S should be assessed based on a representative sample or total population. Then, the SHSE-L and SHSE-S should be revised to improve their reliability and validity. Finally, the norms of the SHSE-L and SHSE-S could be widely used in the assessment of social health status of all Chinese elderly.

Previous studies have indicated that the agreement between answers obtained with self-report scales and those obtained with short-interview scales is poor [45]. Therefore, we do not suggest that residents complete the SHSE-L or SHSE-S by themselves; rather, we recommend that trained personnel complete the scales by interviewing the participants. Additionally, there were some problems with the interviewers, such as improper ways of asking sensitive questions, time and site constraints, and interviewer bias. Self-report versions of the SHSE-L and SHSE-S should be generated in the future.

Conclusion

For successful ageing, a suitable instrument to measure health status is necessary. This study developed a long and short form of the SHSE (SHSE-L and SHSE-S, respectively) to measure the social health status of the Chinese elderly, which fills a gap in social health assessment. The standard norms and percentile rank norms of the social health of the elderly in Hangzhou city were generated, which can be used as references in other studies.