Introduction

Health-Related Quality of Life (HRQoL) is distinguished from quality of life in that it is concerned primarily with those factors that fall under the purview of health care providers and the health care system [1]. As a multi-dimensional concept, HRQoL describes the effect of diseases and illnesses on a person’s physical, social and mental well-being [2, 3]. It is important in estimating the efficacy of medical interventions on quality of life [4–6] and in monitoring community health [2, 3].

The literature is replete with HRQoL measures or test batteries, which can be either generic or disease-specific [2–5, 7–11]. Among the various measurement tools, the Medical Outcomes Study (MOS) Short Form 36-item Health Survey (SF-36) is one of the most widely used generic measures of HRQoL, with good psychometric properties and substantial data on its applicability in clinical and research settings [6, 9–14]. The SF-36 was derived from the original 245-item RAND MOS Questionnaire as a set of generic, coherent, and easily administered quality of life measures [9, 10, 15]. The 36-item health survey tool assesses eight health dimensions referred to as subscales, namely Physical Functioning (PF: 10 items), Role Limitations due to Physical Problems (RP: 4 items), Bodily Pain (BP: 2 items), General Health (GH: 5 items), Vitality (VT: 4 items), Social Functioning (SF: 2 items), Role Limitations due to Emotional Problems (RE: 3 items) and Mental Health (MH: 5 items) [9, 10]. Individual SF-36 items are recoded, summed and transformed. The health concepts described by the SF-36 range in score from 0 to 100, with higher scores indicating higher levels of function or better health. Scores on the eight subscales can be summarized into two composite domains, the Physical Health Component (PC) and the Mental Health Component (MC) [9, 10].

The SF-36 has been employed to compare quality of life between different disease groups and populations [7, 16]. However, Cheung [17] opined that SF-36 scores cannot be used to make valid inferences about racial/ethnic group differences when measurement equivalence has not been established, because such differences might be due to item response bias rather than true differences in self-reported health. Non-equivalence can occur when differences in values, attitudes, language and overall world view cause respondents to respond to survey questions differently, leading to differential item functioning [18]. Therefore, the cultural sensitivity and potential bias of the SF-36 necessitate its adaptation and translation into different languages. Consequently, the SF-36 has been translated for use in both general and condition-specific populations in many languages such as Arabic [14], Chinese [19], Malay [20] and Persian [21], among others.

The SF-36 has been used by some Nigerian researchers [22–26], with two studies reporting content and criterion validity of its Yoruba translations among patients with hypertension [25] and low-back pain [26] respectively. However, cross-culturally adapted versions of the SF-36 in indigenous Nigerian languages, based on internationally accepted guidelines, seem not to be available for reference. Therefore, this study was conducted to cross-culturally adapt the SF-36 into the Yoruba language and determine its reliability and validity. Yoruba is one of the major Nigerian languages, spoken in Southwest Nigeria and also in countries like Benin and Togo. In addition, there are pockets of Yoruba population, especially in the UK, Brazil and the USA [27, 28].

Methods

The study protocol was approved by the Ethical Review Committee, Institute of Public Health, College of Health Sciences, Obafemi Awolowo University, Ile-Ife, Nigeria (IPHOAU/12/156). A total of 1087 individuals (657 males and 430 females), comprising students, workers and other residents of Ile-Ife, Osun State, Nigeria, volunteered for this study, yielding a response rate of 98.8 % (i.e. 1087/1100). Informed consent was obtained from all the respondents. Eligible respondents were 18 years and older, literate in the English and Yoruba languages, and had no reported history of cognitive or mental impairment or current medical condition. A multistage sampling technique was employed in the study. Respondents from Obafemi Awolowo University, Ile-Ife comprised students and staff. The students were recruited from four randomly selected students’ halls of residence (two each for male and female students). Every odd-numbered room in each block in the halls of residence was sampled. Staff respondents were volunteers sampled from ten randomly selected departments using a fishbowl technique. Respondents who were resident outside the university community were recruited based on the World Health Organization guideline for conducting community surveys [29]. These respondents were randomly selected from five out of eleven political wards of Ile-Ife Local Government Area. Every odd-numbered house was selected for survey.

Based on the International Quality of Life Assessment (IQOLA) Project, the English version of the SF-36 was translated into Yoruba language. The IQOLA project was established in 1991 with the goal of developing validated translations of a health status questionnaire as required for their use internationally in order to avoid bias in interpretation and adaptation [10, 19, 30].

The protocol was carried out in sequential order as highlighted by Fukuhara et al. [12]:

  a. Forward translation of the items and response choices of the English version of the SF-36 into the Yoruba language by two native Yoruba speakers with fluency in English (Translators A and B). These translators are linguists and educators in the Yoruba language at the University who worked independently to produce two initial Yoruba versions of the SF-36. Both translators were instructed, as described in the IQOLA protocol, to aim for conceptual rather than literal translation and to keep the language colloquial and compatible with a reading level of age 14, as described by Fukuhara et al. [12]. The translators were also asked to give different translations for each response choice where possible.

  b. Harmonization and reconciliation of the two different translations was carried out. Another bilingual translator reviewed the items in the two Yoruba translated questionnaires in order to produce a single, reconciled and harmonized translation. Items were included in the reconciled translation based on consensus among the translators. For items that were linguistically or culturally problematic, the translators decided on the most acceptable option after exhausting all reasonable options.

  c. Two native speakers who were literate in oral and written Yoruba assessed the Yoruba consensus translation for comprehensibility and ambiguity, giving difficulty and quality ratings in terms of clarity, common language usage, and conceptual equivalence. A scale of 0 to 100 (0 means “not at all difficult” and 100 means “extremely difficult”) was used for the difficulty rating, while quality was also rated on a scale of 0 to 100 (where 100 indicates perfection). The first assessor rated the harmonized translation 70 and 80 % on the difficulty and quality scales respectively, while the other assessor’s ratings were 80 and 90 % on the difficulty and quality scales respectively.

  d. The harmonized forward translation was back-translated into English by two bilingual (English and Yoruba) professional translators to check conceptual equivalence with the original source version.

  e. In order to validate the back-translated English version against the source English version, independent rating of the equivalence of the backward translations to the English version was carried out.

  f. Problematic items and response options were reconciled through an iterative procedure.

  g. The pre-final version of the Yoruba SF-36 was pilot tested among 32 individuals. The pilot test aimed to explore the clarity and applicability of the translated Yoruba SF-36 in terms of perception, understanding of the various terminologies used, and interpretation. The results of the cognitive debriefing from the pilot study were used to further refine the pre-final version in terms of the words used and the format or layout of the questionnaire.

The clustering and ordering of items in the translated Yoruba SF-36 was the same as the English version of the SF-36. However, in order to give the translated Yoruba SF-36 a conceptual equivalence to the English SF-36, arrangement and wording of some parts of the questionnaire were altered. The following cultural adaptations were made to the translated Yoruba SF-36:

  i. Alphabetic labelling of items in the translated Yoruba SF-36 is not consistent with the English version. The alteration was made because ‘C’ does not exist in the Yoruba alphabet, while letters such as ‘Ẹ’ and ‘GB’, which are in the Yoruba alphabet, do not exist in the English alphabet. For example, the first eight letters of the Yoruba alphabet are a, b, d, e, ẹ, f, g, gb. Therefore, ‘d’ is the 3rd and ‘gb’ is the 8th Yoruba letter, hence the altered alphabetic labelling of the translated Yoruba SF-36.

  ii. Item 1 was rearranged in order to enhance its meaning in the Yoruba language (because in Yoruba the instruction comes before the question).

  iii. In question ‘3b’, “pushing a vacuum cleaner” and “bowling” were changed to “floor mopping” and “archery” respectively. Pushing a vacuum cleaner and bowling are uncommon activities in the study context. Therefore, “floor mopping” was considered as an alternative moderate activity for “pushing a vacuum cleaner”. Bowling, which refers to a number of sports or activities involving throwing a bowling ball towards a target, is strange to the Yoruba culture. However, archery, which refers to a sport involving shooting arrows at a target using a bow, is easily relatable for the Yoruba people.

  iv. ‘Blocks’ as a measure of distance in questions ‘3h’ and ‘3i’ of the English SF-36 (which correspond to ‘3gb’ and ‘3i’ in the Yoruba SF-36) were changed in the Yoruba SF-36 to ‘electric pole distance’. This is because blocks are less known as a descriptive measure of distance in the study setting compared with electric pole distance. However, the distance between two high-tension electric poles is commonly 50 m, which is not equivalent to “one block”, which actually means a distance of 100 m. Nonetheless, it is also known that the distance between blocks is not fixed and varies widely between settings.

Respondents completed both English and Yoruba versions of SF-36 as well as questions on socio-demographic variables. Thereafter, the respondents rated the English and Yoruba versions of SF-36 separately based on standard method of rating SF-36 Questionnaire recommended by IQOLA on the same day. Two weeks later, the respondents were asked to rate their quality of life on the Yoruba version of SF-36 again and the scores were compared with the initial rating.

Computation of the SF-36 scores involves, firstly, recoding of the pre-coded numeric values based on the SF-36 scoring key for the required 35 out of the 36 items. Each item is recoded on a 0 to 100 range. Items “1, 2, 6, 8, 11b, 11d”, with pre-coded numeric values 1 through 5, are recoded inversely to values of 100, 75, 50, 25 and 0 respectively. Items “7, 9a, 9d, 9e, 9h”, with pre-coded numeric values 1 through 6, are recoded inversely to values of 100, 80, 60, 40, 20 and 0 respectively. On the other hand, items “3a, 3b, 3c, 3d, 3e, 3f, 3g, 3h, 3i, 3j”, with pre-coded numeric values 1 through 3, are recoded in the same direction to values of 0, 50 and 100 respectively. Items “4a, 4b, 4c, 4d, 5a, 5b, 5c”, with pre-coded numeric values 1 and 2, are recoded in the same direction to values of 0 and 100 respectively. Items “10, 11a, 11c”, with pre-coded numeric values 1 through 5, are recoded in the same direction to values of 0, 25, 50, 75 and 100 respectively, while items “9b, 9c, 9f, 9g, 9i”, with pre-coded numeric values 1 through 6, are recoded in the same direction to values of 0, 20, 40, 60, 80 and 100 respectively.
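As an illustration only, the recoding step described above can be sketched in Python. The table and function names (`RECODE_MAP`, `recode_item`) are hypothetical and do not come from any scoring software used in the study; item 3j is included on the assumption that all ten PF items follow the same three-level rule.

```python
# Lookup tables for the recoding rules stated in the text.
INVERSE_5 = {1: 100, 2: 75, 3: 50, 4: 25, 5: 0}
INVERSE_6 = {1: 100, 2: 80, 3: 60, 4: 40, 5: 20, 6: 0}
DIRECT_2 = {1: 0, 2: 100}
DIRECT_3 = {1: 0, 2: 50, 3: 100}
DIRECT_5 = {1: 0, 2: 25, 3: 50, 4: 75, 5: 100}
DIRECT_6 = {1: 0, 2: 20, 3: 40, 4: 60, 5: 80, 6: 100}

# Which table applies to which item (item labels follow the English SF-36).
ITEM_GROUPS = [
    (("1", "2", "6", "8", "11b", "11d"), INVERSE_5),
    (("7", "9a", "9d", "9e", "9h"), INVERSE_6),
    # Assumption: 3j is recoded like 3a to 3i.
    (("3a", "3b", "3c", "3d", "3e", "3f", "3g", "3h", "3i", "3j"), DIRECT_3),
    (("4a", "4b", "4c", "4d", "5a", "5b", "5c"), DIRECT_2),
    (("10", "11a", "11c"), DIRECT_5),
    (("9b", "9c", "9f", "9g", "9i"), DIRECT_6),
]

RECODE_MAP = {item: table for items, table in ITEM_GROUPS for item in items}


def recode_item(item, raw_value):
    """Return the 0 to 100 recoded value for one pre-coded response."""
    table = RECODE_MAP[item]  # KeyError for an unknown item label
    if raw_value not in table:
        raise ValueError(f"invalid response {raw_value!r} for item {item}")
    return table[raw_value]


def recode_responses(responses):
    """Recode a dict of {item: raw_value}; unanswered items (None) are skipped."""
    return {
        item: recode_item(item, raw)
        for item, raw in responses.items()
        if raw is not None
    }
```

A lookup table per response format keeps each rule visible next to the item list it applies to, which makes the sketch easy to check against the scoring key.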

Secondly, items in the same hypothesized scale are averaged together to create the eight scale scores. The scales and their constituent items are 1) GH – “1, 11a, 11b, 11c, 11d”; 2) PF – “3a, 3b, 3c, 3d, 3e, 3f, 3g, 3h, 3i, 3j”; 3) RP – “4a, 4b, 4c, 4d”; 4) RE – “5a, 5b, 5c”; 5) SF – “6, 10”; 6) MH – “9b, 9c, 9d, 9f, 9h”; 7) BP – “7, 8”; and 8) VT – “9a, 9e, 9g, 9i”. Thirdly, scales in the same domain are averaged together to create the two domain scores. The domains and their constituent scales are 1) PC – “GH, PF, RP, BP”; and 2) MC – “MH, RE, SF, VT”.
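The second and third steps can be sketched as follows, assuming the items have already been recoded to the 0 to 100 range. The names `SCALES`, `DOMAINS`, `scale_scores` and `domain_scores` are illustrative, and averaging over answered items only is an assumption of this sketch, not something stated in the text.

```python
from statistics import mean

# Scale membership as listed in the text.
SCALES = {
    "GH": ["1", "11a", "11b", "11c", "11d"],
    "PF": ["3a", "3b", "3c", "3d", "3e", "3f", "3g", "3h", "3i", "3j"],
    "RP": ["4a", "4b", "4c", "4d"],
    "RE": ["5a", "5b", "5c"],
    "SF": ["6", "10"],
    "MH": ["9b", "9c", "9d", "9f", "9h"],
    "BP": ["7", "8"],
    "VT": ["9a", "9e", "9g", "9i"],
}

# Domain membership as listed in the text.
DOMAINS = {
    "PC": ["GH", "PF", "RP", "BP"],
    "MC": ["MH", "RE", "SF", "VT"],
}


def scale_scores(recoded):
    """Average the recoded items of each scale.

    Assumption: items missing from `recoded` are ignored, and a scale with
    no answered items is left out of the result.
    """
    scores = {}
    for scale, items in SCALES.items():
        answered = [recoded[item] for item in items if item in recoded]
        if answered:
            scores[scale] = mean(answered)
    return scores


def domain_scores(scales):
    """Average the scale scores of each domain (all four scales required)."""
    return {
        domain: mean([scales[name] for name in members])
        for domain, members in DOMAINS.items()
    }
```

Item 2 does not appear in any scale, which is consistent with 35 of the 36 items contributing to the scores.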

In order to determine the psychometric properties of the Yoruba version of the SF-36, it was hypothesized that item, scale and domain scores would correlate significantly (r > 0.40) with those of the English SF-36. Based on correlation coefficient (r) cut-off points for high (≥0.70), moderate (0.40 to <0.70) and low (<0.40) correlation, high correlations were considered desirable because they would indicate good validity of the translated Yoruba SF-36 (Appendix).
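The cut-off points above amount to a simple three-way classification. A minimal sketch, with a hypothetical function name and the boundary at exactly 0.70 treated as high, is:

```python
def classify_correlation(r):
    """Label a correlation coefficient using the cut-offs given in the text.

    Assumption: the coefficient is used as reported (validity correlations
    are expected to be positive), so no absolute value is taken.
    """
    if r >= 0.70:
        return "high"
    if r >= 0.40:
        return "moderate"
    return "low"
```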

Data analysis

Descriptive statistics of the scales and domains of the Yoruba version of the SF-36 were determined by analyzing mean score, confidence interval, skewness and kurtosis. Concurrent validity of the Yoruba SF-36 was determined by correlating scores on the English and Yoruba versions of the SF-36 using Pearson’s product moment correlation. Intra-class correlation (ICC) was used to determine the test-retest reliability of the Yoruba SF-36. Multi-trait scaling analysis (i.e. item-scale correlations) was used to confirm item discriminant validity (i.e. correlations between each item and its hypothesized scale). Known-groups validity of the Yoruba version of the SF-36 was tested by comparing scale and domain scores by gender and age group using the independent t-test and one-way ANOVA respectively. Data were analyzed using SPSS (Statistical Package for Social Sciences) version 16.0. Alpha level was set at p < 0.05.
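The analyses in this study were run in SPSS. Purely to illustrate the two correlation-based steps, the sketch below computes Pearson’s r in plain Python and an item-scale correlation in which the item is correlated with the mean of the other items in its scale. The function names are hypothetical, and the overlap correction is an assumption about how the item-scale correlations were formed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def item_scale_r(item_scores, other_items_scores):
    """Correlate one item with the mean of the remaining items of its scale.

    `item_scores` holds one recoded score per respondent; `other_items_scores`
    is a list of such sequences, one per remaining item in the scale.
    """
    rest_of_scale = [sum(row) / len(row) for row in zip(*other_items_scores)]
    return pearson_r(item_scores, rest_of_scale)
```

For concurrent validity, `pearson_r` would be applied to each respondent’s English and Yoruba scores on the same scale or domain.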

Results

The respondents’ ages ranged between 18 and 70 years, with a mean of 27.9 ± 9.48 (SD) years. The socio-demographic characteristics of the respondents are presented in Table 1. The respondents were mostly of the Yoruba tribe (95.8 %), single (67.8 %), Christians (41.3 %) and had tertiary education (83.8 %). The mean, confidence interval, skewness and kurtosis of the scores for the eight scales/dimensions of the Yoruba version of the SF-36 are presented in Table 2. The results show that the mean scores for the scales ranged between 83.2 and 88.8. The highest and lowest scores were observed in MH (88.8) and RE (83.2) respectively. PC and MC domain scores were 85.6 ± 13.7 and 85.9 ± 15.4 respectively. The scale and domain scores on the Yoruba version of the SF-36 yielded negative skewness ranging from −2.08 to −0.98.

Table 1 Socio-demographic characteristics of the respondents (n = 1087)
Table 2 The mean score, standard deviation, confidence interval, skewness and kurtosis of each scale and component (domain) of the Yoruba SF-36 (n = 1087)

Table 3 shows the Pearson correlation analysis between respondents’ scores on the English and Yoruba SF-36 (concurrent validity). The correlation coefficients (r) of the scales and domains ranged between 0.749 and 0.902. GH and BP had the highest (r = 0.902) and lowest (r = 0.749) correlation coefficients respectively. Correlations between each item and its hypothesized scale (i.e. the scale score computed from all other items in that scale, as a test of item internal consistency) were all above 0.50, except for item 1 and GH (i.e. “In general, would you say your health is?”), where r = 0.421. The highest item-scale correlation coefficient was between item 8 (i.e. “During the past four weeks, how much did pain interfere with your normal work, including both work outside the home and housework?”) and the other items on the BP sub-scale, yielding a correlation coefficient of 0.907. The details of the item-scale correlations (discriminant validity) for the Yoruba SF-36 are presented in Table 4. The results show that items in the VT, SF and MH scales had correlations greater than 0.23 with scales other than their hypothesized scales, except the correlations between item 9a (“Did you feel full of pep?”) and each of RP (0.192) and PF (0.146). Correlations of item 11c (“I expect my health to get worse”) of GH with other scales were less than 0.3. Correlations of items in the hypothesized GH scale with items in the MH, BP and EF scales were less than 0.3, except the correlations between 4b (“Accomplished less than you would like”) and GH, and between 4c (“Were limited in the kind of work or other activities”) and each of MH and EF (Table 4).

Table 3 Pearson correlation analysis between respondents’ scores on the English and Yoruba SF-36 (concurrent validity) (n = 1087)
Table 4 Item-scale correlations (discriminant validity) of the Yoruba SF-36 (n = 1087)

For the known-groups validity of the Yoruba version of the SF-36 by gender and age, Table 5 shows the result of the independent t-test comparison of scales and domains by gender. The result showed that men had significantly higher mean scores in the GH (p = 0.022), RP (p = 0.054), RE (p = 0.013) and SF (p = 0.013) scales. There were no significant gender differences in domain scores (p > 0.05). On the other hand, Table 6 shows the result of the one-way ANOVA comparison of scales and domains by age group. There were significant differences in the mean scores of the Yoruba SF-36 scales and domains (p < 0.05). The youngest age group (18–24 years) had significantly higher mean scale and domain scores (p < 0.05). A decline in mean scores with higher age was observed across the different scales and domains, except within the 35–44 years age bracket, where high mean scores were found (except for the GH scale).

Table 5 Independent t-test comparison of scales and domains score of the Yoruba version of the SF-36 by gender
Table 6 One way ANOVA comparison of the Yoruba version of the SF-36 scales and domains by age group

The correlations between the domains and scales of the Yoruba version of the SF-36 (internal consistency of the scales and domains) are presented in Table 7. The result shows that correlations between scales and hypothesized domains (PC and MC) were above 0.50, except the correlations between GH and each of PC (r = 0.477) and MC (r = 0.28). The highest scale-domain correlation was between RE and MC (r = 0.826) and RP and MC (0.826). PC was strongly correlated (≥0.70) with each of PF (r = 0.711), RP (0.823) and BP (0.700), while MC was strongly correlated with each of RE (0.826), SF (0.811), MH (0.789) and VT (0.793). The Intra-Class Correlation (ICC) of scores on the Yoruba SF-36 on two occasions (test-retest reliability) (n = 249) is presented in Table 8. The ICC ranged between 0.636 and 0.843 for scales, and between 0.783 and 0.851 for domains.

Table 7 Correlation of physical health components and mental health domains (in horizontal axis) with the 8 scales (vertical axis) in Yoruba version of SF-36 (n = 1087)
Table 8 Intra-Class Correlation of scores on the Yoruba SF-36 on two occasions (test-retest reliability) (n = 249)

Discussion

Translation and cross-cultural adaptation of HRQoL tools into languages other than that of the population in which the tool was developed enhances understanding and facilitates acceptance of the tool by the accessible population [31–33]. This study was conducted to cross-culturally adapt the SF-36 into the Yoruba language and determine its reliability and validity. A response rate of 98.8 % was achieved in this study, suggesting that the Yoruba SF-36 was an acceptable tool for measuring HRQoL in the Yoruba population. Based on difficulty and quality rating, the Yoruba SF-36 had a high rate of data completion with good quality data in the study population.

The concurrent validity of the Yoruba SF-36 was high, with scales and domains having coefficients greater than 0.70, which was considered desirable for good validity of a new tool. The correlation coefficient ranges for concurrent validity obtained in this study are consistent with the scale and domain ranges of 83.2 to 88.8, and 85.6 and 85.9 respectively, reported in previous studies [34–41]. In this study, all scale scores showed negative skewness in the sample population, implying that respondents gave answers that tilt towards the positive end of the health spectrum. The skewness of the Yoruba SF-36 scales follows a pattern similar to previous findings on the SF-36 in Hong Kong [36], Australia [37], the Netherlands [30], New Zealand [41], Brazil [42], Malaysia [20] and Turkey [16], among others.

The result of the test of known-groups validity of the Yoruba SF-36 indicated that many dimensions of the SF-36 are influenced by socio-demographic variables such as age and gender. Men had higher mean scores in all scales (except MH) and domains. This finding is consistent with previous reports [11, 14, 39, 42]. However, the reason for higher HRQoL scores in men is still a subject of debate. Hopman et al. [39] attributed poorer HRQoL scores among women to a higher incidence of psychological symptoms and greater psychological distress compared with men; in addition, women are more expressive of their symptoms and well-being. The findings of this study also showed that all the scales (except BP) and domains were associated with age. There was an obvious decline in mean score with older age across different scales and domains, except within the 35–44 years age bracket. The result also revealed that the subscales of PC (i.e. PF, GH, BP and RP) decreased with older age, while age seems to have less influence on the subscales of MC (MH, RE, SF and VT). This finding is also consistent with earlier reports [35, 37, 39, 41, 42].

The findings of this study showed a high level of item-scale correlations (i.e. correlations of an item with its own scale), greater than the minimum value of 0.4 recommended by Ware et al. [10]. Definite scaling success was met because the differences between the item-hypothesized scale and item-other scale correlations were >2 S.D. (i.e. >0.15), as recommended [10]. All items in the Yoruba SF-36 correlated more strongly with their hypothesized scale than with scales measuring other concepts, except the correlation of item 1 (i.e. “In general, would you say your health is?”) of GH with the RE, SF, BP and VT scales. Also, item-scale correlations were comparable within each scale, except for item 1, which is similar to the findings of a previous study by Sararaks et al. [20]. Therefore, the pattern of item-scale correlations in this study was consistent with the recommendations for good psychometric criteria for SF-36 translations and cultural adaptations [31, 32, 34–41]. In addition, the test-retest results of the Cronbach’s α and ICC confirm high reliability of the Yoruba SF-36 at the level of scales and domains, greater than the 0.7 coefficient level for good reliability for group-level analyses [8, 42–44].
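The scaling-success criterion described above can be expressed as a small check. The sketch below takes the 0.15 margin quoted in the text as its default; the function name and the dictionary input format are hypothetical.

```python
def failed_scaling_comparisons(own_scale_r, other_scale_rs, margin=0.15):
    """List the competing scales for which an item fails the scaling test.

    `own_scale_r` is the item's correlation with its hypothesized scale.
    `other_scale_rs` maps each other scale's name to the item's correlation
    with that scale. A comparison passes only when the item-own correlation
    exceeds the item-other correlation by more than `margin`.
    """
    return [
        scale
        for scale, other_r in other_scale_rs.items()
        if own_scale_r - other_r <= margin
    ]
```

An empty list means the item met the criterion against every other scale; a non-empty list names the scales that would need closer inspection.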

This study’s findings on concurrent and discriminant validity, reliability and internal consistency indicate that the Yoruba SF-36 is a valid tool to assess HRQoL among the Yoruba populace. The Yoruba SF-36 showed excellent psychometric properties comparable to the original American and other versions. However, item 1 (“In general, would you say your health is?”) scored poorly on discriminant validity. Caution is recommended in the interpretation of the finding on item 1 pending further studies. To validate the findings on the Yoruba SF-36 obtained in this study, further studies among various healthy and diseased populations are needed. The heterogeneity of the sample population, the mixed methods of sampling, and the use of the distance between two adjacent electric poles as the equivalent of one block are potential limitations of this study.

Conclusion

The data quality, concurrent and discriminant validity, reliability and internal consistency of the Yoruba version of the SF-36 are adequate, and it is recommended for measuring health-related quality of life among the Yoruba population.