The SF-36 and SF-12 summary scores were derived using an uncorrelated (orthogonal) factor solution. We estimate SF-36 and SF-12 summary scores using a correlated (oblique) physical and mental health factor model.
We administered the SF-36 to 7,093 patients who received medical care from an independent association of 48 physician groups in the western United States. Correlated physical health (PCSc) and mental health (MCSc) scores were constructed by multiplying each SF-36 scale z-score by its respective scoring coefficient from the obliquely rotated two factor solution. PCSc-12 and MCSc-12 scores were estimated using an approach similar to the one used to derive the original SF-12 summary scores.
The estimated correlation between SF-36 PCSc and MCSc scores was 0.62. The oblique factor solution produced far fewer negative factor scoring coefficients than the standard orthogonal factor solution. Similar results were found for the PCSc-12 and MCSc-12 summary scores.
Correlated physical and mental health summary scores for the SF-36 and SF-12 derived from an obliquely rotated factor solution should be used along with the uncorrelated summary scores. The new scoring algorithm can reduce inconsistent results between the SF-36 scale scores and physical and mental health summary scores reported in some prior studies.
(Subscripts C = correlated and UC = uncorrelated)
Health-related quality of life (HRQOL) refers to functioning and well-being in physical, mental and social dimensions of life. The SF-36 and the SF-12 are the most frequently used multi-item HRQOL instruments [1, 2]. The SF-36 is composed of 8 multi-item scales (35 items) assessing physical function (10 items), role limitations due to physical health problems (4 items), bodily pain (2 items), general health (5 items), vitality (4 items), social functioning (2 items), role limitations due to emotional problems (3 items) and emotional well-being (5 items). These eight scales can be aggregated into two summary measures: the Physical (PCS) and Mental (MCS) Component Summary scores. The 36th item, which asks about health change, is not included in the scale or summary scores. The SF-12 is a 12-item subset of the SF-36 that has two summary measures: the Physical (PCS-12) and Mental (MCS-12) Component Summary scores. Higher scores represent better health.
The standard scoring algorithm for the SF-36 and SF-12 version 1 summary measures is based on a factor analytic technique that forces the scores to be orthogonal [2, 3]. Figure 1 depicts the conceptual framework on which the orthogonal component summary scores are based. The model assumes that physical and mental health constructs are uncorrelated (Φ = 0). Recent studies have shown inconsistent results between the 8 SF-36 scale scores and the PCS and MCS [4–7]. For example, a study of 482 patients initiating antidepressant treatment found improvements from baseline to 3 months of 0.28–0.49 SD units on the physical health scales (physical functioning, role limitations due to physical health problems, pain, general health), yet the PCSuc was essentially unchanged (from 51 to 50). These same patients had large improvements on the emotional well-being scale (1.67 SD).
Taft et al. concluded that the discrepancies between results for the SF-36 scale scores and component scores are a result of the negatively weighted scales used in the PCS and MCS scoring algorithm [5, 6]. The scoring algorithm for PCS includes positive weights for the physical functioning, role-physical, bodily pain, general health and vitality scales and negative weights for the social functioning, role-emotional and emotional well-being scales. The scoring algorithm for MCS includes positive weights for the vitality, social functioning, role-emotional, and emotional well-being scales and negative weights for the physical functioning, role-physical, bodily pain and general health scales. As such, higher mental health scale scores drive the PCS down and higher physical health scale scores drive the MCS down (and vice versa).
The objective of this study is to estimate the SF-36 summary scores (PCSc and MCSc) from a correlated (oblique) physical and mental health factor solution. In addition, we derive weights that can be used to create SF-12 component summary scores from the correlated factor model (PCSc-12 and MCSc-12). We hypothesize that the correlated factor model will produce better correspondence between the scale and summary scores. The results are compared to those obtained from the standard uncorrelated approach. (Summary scores with a subscript "c" are based on oblique [correlated] factor analysis whereas summary scores with the subscript "uc" are created via orthogonal [uncorrelated] factor analysis.)
The sample consists of a random selection of patients receiving medical care from the Unified Medical Group Association (UMGA), an independent association of physicians in the western United States [9, 10]. Patients were at least 18 years of age and had a minimum of one provider visit during the year prior to the data collection period from October 1994 to June 1995. Study participants were mailed $2 cash along with a 12-page questionnaire assessing HRQOL, patient evaluations of health care, utilization and demographic characteristics. Those who had not yet responded were sent a questionnaire two weeks later and were given a reminder telephone call. There were 7,093 respondents, a 59% response rate after adjusting for undeliverable surveys, ineligible respondents, and deceased patients. Our analysis was conducted on patients who had complete data for the SF-36 (n = 6,931).
Deriving Weights for Correlated SF-36 PCSc and MCSc
The method used here is identical to that used by Ware et al. except that the factors were allowed to be correlated. Factor analysis of the 8 SF-36 scale scores with a two-factor oblique rotation was used to estimate the physical and mental health factor scoring coefficients (weights). PCSc was then constructed by multiplying each SF-36 scale z-score by its respective physical factor scoring coefficient and summing the eight products. Similarly, MCSc was created by multiplying each SF-36 scale z-score by its respective mental factor scoring coefficient and summing the products. The component scores were then transformed so that each had a mean of 50 and a standard deviation of 10 (T-score) in the sample.
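The scoring steps described above (standardize each scale, weight, sum, rescale to a T-score) can be sketched in a few lines. This is a minimal illustration only: the function name, the simulated scale scores and the weights are placeholders invented for the example, not the coefficients estimated in this study (those are reported in Table 3), and the use of the sample standard deviation (ddof=1) is an assumption.

```python
import numpy as np

# Column order assumed for this sketch: PF, RP, BP, GH, VT, SF, RE, EWB.
def composite_t_score(scale_scores, coefficients):
    """Weight scale z-scores, sum them, and rescale to mean 50, SD 10."""
    x = np.asarray(scale_scores, dtype=float)
    w = np.asarray(coefficients, dtype=float)
    # z-score each of the 8 scales within the sample
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # weighted sum of the eight z-scores for each respondent
    raw = z @ w
    # T-score transformation within the sample
    return 50.0 + 10.0 * (raw - raw.mean()) / raw.std(ddof=1)

# Simulated 0-100 scale scores and PLACEHOLDER weights (not from Table 3).
rng = np.random.default_rng(0)
fake_scores = rng.uniform(0, 100, size=(500, 8))
placeholder_pcs_weights = np.array([0.30, 0.30, 0.25, 0.20, 0.05, 0.05, 0.00, -0.03])

pcs_c = composite_t_score(fake_scores, placeholder_pcs_weights)
```

The same function would yield MCSc when given the mental factor scoring coefficients in place of the physical ones.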
To illustrate the potential differences in scores produced by the weights derived from the uncorrelated versus correlated factor analysis, we computed summary scores for the case in which the scales that load heavily on physical health (physical functioning, role-physical, bodily pain, general health) have z-scores of 1 and the scales that load heavily on mental health (vitality, social functioning, role-emotional and emotional well-being) have z-scores of 0.3. We then computed the summary scores for the reverse case, in which the z-scores for scales loading heavily on physical health are 0.3 and the z-scores for scales loading heavily on mental health are 1.
Deriving Correlated SF-12 PCSc and MCSc
To derive weights for the SF-12 summary measures, the SF-36 PCSc and MCSc were regressed in separate models on the SF-12 items. Dummy variables were created for each of the response choices of the 12 items, allowing the relationship of each level of each SF-12 item to vary rather than assuming a linear relationship. Following Ware et al., the most favorable response choice for a question was the holdout category. As such, the parameters (weights) estimated are decrements associated with different SF-12 response choices. The predicted values in the models were the PCSc-12 and MCSc-12 scores, respectively.
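The dummy-variable regression described above can be sketched as follows. This is an illustrative outline under stated assumptions: responses are coded as integers with 0 as the most favorable (holdout) choice, the function names are hypothetical, ordinary least squares is assumed as the estimator, and the two-item data set and its decrements are invented for the demonstration.

```python
import numpy as np

def dummy_code(responses, n_levels):
    """One 0/1 column per non-holdout response choice of each item.

    responses: integer array (n_people, n_items); 0 = most favorable choice.
    n_levels:  number of response choices for each item.
    """
    columns = []
    for item, levels in enumerate(n_levels):
        for level in range(1, levels):  # level 0 is the holdout category
            columns.append((responses[:, item] == level).astype(float))
    return np.column_stack(columns)

def fit_decrements(responses, n_levels, summary_score):
    """Least-squares fit; returns (intercept, decrements)."""
    x = dummy_code(responses, n_levels)
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, summary_score, rcond=None)
    return coef[0], coef[1:]

# Invented example: item A has 3 choices, item B has 2 choices.
levels = [3, 2]
responses = np.array([[a, b] for a in range(3) for b in range(2)])
true_intercept = 60.0
true_decrements = np.array([-2.0, -5.0, -3.0])  # A=1, A=2, B=1
score = true_intercept + dummy_code(responses, levels) @ true_decrements

intercept, decrements = fit_decrements(responses, levels, score)
```

With the holdout coded this way, the intercept is the predicted score for a respondent who chooses the most favorable response on every item, and each coefficient is the change in score associated with a less favorable choice.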
Thirty-five percent of the sample was male. The majority was Caucasian (80%). The average age was 50 (SD = 18). The majority of the sample had either gone to vocational school, had some college, or completed college (55%) and had a household income greater than $20,000 (77%). Other sample characteristics and average scale and summary scores are given in Table 1. There were no differences between the demographic characteristics (gender, race, age, education, income) of the total respondent sample (n = 7,093) versus the analytic sample (n = 6,931). We also compared adult members in the sampling frame who visited the physician within the last 365 days (n = 1,203,001) and those who returned the questionnaire (n = 7,093). Those who returned a questionnaire were slightly more likely to be older and female, to have hypertension, and to have visited the physician group more recently.
Factor Analysis Results
The oblique two-factor solution indicated that role-physical (0.76), physical functioning (0.71), bodily pain (0.66) and general health (0.53) loaded heavily on the physical factor whereas emotional well-being (0.84), role-emotional (0.59), vitality (0.58) and social functioning (0.39) loaded most heavily on the mental factor. The estimated correlation between the two factors was 0.62 (Table 2).
The oblique factor solution yielded fewer negative factor scoring coefficients than the orthogonal factor solution used by Ware et al. For the physical health factor, only emotional well-being had a negative coefficient (-0.03); for the mental health factor, only physical functioning had a negative coefficient (-0.02). The magnitudes of the negative factor scoring coefficients are smaller than those derived in the orthogonal model (Table 3).
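For readers who want to see how factor scoring coefficients relate to an oblique solution, the sketch below uses the regression (Thurstone) method, in which the coefficients are the inverse of the correlation matrix multiplied by the factor structure matrix (pattern loadings times the factor correlation matrix). The text does not state which scoring method the software applied, so the choice of method is an assumption. The four-variable pattern matrix is invented for illustration; only the factor correlation of 0.62 is taken from the results above.

```python
import numpy as np

def regression_scoring_coefficients(pattern, phi):
    """Regression-method factor scoring coefficients for an oblique solution."""
    p = np.asarray(pattern, dtype=float)
    f = np.asarray(phi, dtype=float)
    structure = p @ f              # correlations between variables and factors
    implied_r = p @ f @ p.T        # correlations implied by the common factors
    np.fill_diagonal(implied_r, 1.0)  # unit diagonal = communality + uniqueness
    return np.linalg.solve(implied_r, structure)

# HYPOTHETICAL pattern loadings: two physical and two mental indicators.
pattern = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.1, 0.7],
    [0.0, 0.8],
])
phi = np.array([[1.0, 0.62],
                [0.62, 1.0]])      # factor correlation reported above

coefficients = regression_scoring_coefficients(pattern, phi)
```

Setting the off-diagonal of phi to 0 gives the orthogonal case, which allows the two sets of coefficients to be compared for the same pattern matrix.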
Sensitivity Analysis Results
As shown in Table 4, when the SF-36 physical health scale scores are 1 SD and the mental health scales are 0.3 SD above the mean, the PCSuc score is 62.2 (1.2 SD above the mean) and the MCSuc score is 49.6 (equal to the mean). As such, the MCSuc does not reflect the fact that the mental health scales are better than the mean. The alternative scoring algorithm results in a PCSc score that is 1 SD above the mean (60.0) and a MCSc score that is 0.5 SD above the mean (54.6). Similar results were found when the physical health scale scores were 0.3 SD above the mean and the mental health scale scores were 1 SD above the mean, resulting in a PCSuc score of 50.1 (at the mean) and a MCSuc score of 62.8 (1.2 SD above the mean). However, the alternate scoring algorithm produced a PCSc score of 55.1 and a MCSc score of 60.3 (0.5 SD and 1 SD above the mean, respectively).
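The uncorrelated scores in this sensitivity analysis can be reproduced with a short calculation. The weights below are the orthogonal factor scoring coefficients for the US general population as commonly reproduced from the Ware et al. user's manual; they do not appear in this text and should be checked against the manual before any real use. The oblique coefficients from Table 3 are not reproduced here, so only the PCSuc and MCSuc values are computed.

```python
import numpy as np

# Scale order: PF, RP, BP, GH, VT, SF, RE, EWB.
# ASSUMPTION: orthogonal weights as commonly cited from the 1994 manual.
PCS_UC_WEIGHTS = np.array(
    [0.42402, 0.35119, 0.31754, 0.24954, 0.02877, -0.00753, -0.19206, -0.22069]
)
MCS_UC_WEIGHTS = np.array(
    [-0.22999, -0.12329, -0.09731, -0.01571, 0.23534, 0.26876, 0.43407, 0.48581]
)

def t_score(z_scores, weights):
    """Weighted sum of scale z-scores expressed as a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * float(np.dot(z_scores, weights))

# Scenario 1: physical scales 1 SD, mental scales 0.3 SD above the mean.
physical_high = np.array([1.0, 1.0, 1.0, 1.0, 0.3, 0.3, 0.3, 0.3])
# Scenario 2: physical scales 0.3 SD, mental scales 1 SD above the mean.
mental_high = np.array([0.3, 0.3, 0.3, 0.3, 1.0, 1.0, 1.0, 1.0])

pcs_1 = round(t_score(physical_high, PCS_UC_WEIGHTS), 1)  # 62.2
mcs_1 = round(t_score(physical_high, MCS_UC_WEIGHTS), 1)  # 49.6
pcs_2 = round(t_score(mental_high, PCS_UC_WEIGHTS), 1)    # 50.1
mcs_2 = round(t_score(mental_high, MCS_UC_WEIGHTS), 1)    # 62.8
```

In scenario 1 the negative weights on the four physical scales (sum of about -0.47) offset the positive contribution of the mental scales at 0.3 SD (about +0.43), which is why the MCSuc lands at the mean even though every scale is above average.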
Regression Analysis Results
Table 5 lists the SF-12 items, the variable names, the parameters estimated previously from the regression models in which the orthogonal PCSuc and MCSuc were regressed on the SF-12 items, and the parameters estimated here from regressing the obliquely derived PCSc and MCSc scores on the SF-12 items.
It is informative to compare the parameters estimated for the PCSc-12 and MCSc-12 to those estimated for the PCSuc-12 and MCSuc-12. Since the most favorable response choice for each item is the reference group, the y-intercept is the PCS-12 or MCS-12 score for a person who is in the best possible health (the respondent selects the most positive response choice for all questions). Hence, the parameters estimated are decrements associated with each response choice for the items. For an individual item, response choices that represent a more favorable health state should have smaller decrements than response choices that represent a less favorable health state, so we would expect coefficients that are negative and grow more negative as the response choice becomes less favorable. This is not the case for four items in the PCSuc-12 model and five items in the MCSuc-12 model. In fact, some of the estimated parameters are positive, implying an increase in score if the respondent chooses a less favorable response choice over the most favorable one. These items are marked with "*" or "+" in Table 5. In the PCSc-12 model, all parameters estimated were negative and in descending order except for the response choices for two items (SF2 and EWB3). Similarly, in the MCSc-12 model, three items have higher estimates for less favorable response choices (PF02, PF04, and SF2). The magnitudes of the weighting discrepancies are smaller than those obtained in the orthogonal model.
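The expected pattern for a well-behaved item can be written as a simple check: every coefficient is negative, and the coefficients do not increase as the response choice becomes less favorable. The function name and the example values below are hypothetical and are not taken from Table 5.

```python
def decrements_are_ordered(decrements):
    """True if all decrements are negative and non-increasing.

    decrements: coefficients for one item, ordered from the second most
    favorable response choice to the least favorable (holdout excluded).
    """
    values = list(decrements)
    all_negative = all(v < 0 for v in values)
    non_increasing = all(later <= earlier
                         for earlier, later in zip(values, values[1:]))
    return all_negative and non_increasing

# Invented coefficients for three items.
consistent_item = [-1.2, -3.4, -5.9]    # expected pattern
positive_weight_item = [0.4, -1.1, -2.0]  # a worse response raises the score
out_of_order_item = [-2.0, -1.5, -4.0]  # worse response has smaller decrement

flags = [decrements_are_ordered(item)
         for item in (consistent_item, positive_weight_item, out_of_order_item)]
# flags == [True, False, False]
```

Applying such a check to each item's coefficients is one way to count the weighting discrepancies that the paragraph above describes.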
Correlations between the SF-36 and SF-12 summary measures are similar whether the summary measures are derived using the correlated or the uncorrelated algorithm. The correlation between PCSc and PCSc-12 was 0.98 whereas the correlation between the PCSuc and PCSuc-12 was 0.96. Similarly, the correlation between the MCSc and the MCSc-12 was slightly higher (0.97) than the correlation between the MCSuc and MCSuc-12 (0.96) (Table 6).
The SF-36 is one of the most commonly used HRQOL measures. Summary scores can be used to minimize problems with multiple comparisons. Ware et al. argue that the orthogonal method of developing summary scores is mathematically simpler and makes the interpretation of each scale less complicated compared to the oblique method [11, 12]. However, several studies have shown that product-moment correlations between the physical and mental health factors range from 0.32 to 0.66, suggesting a moderate to strong correlation between the two components. Summary scores that are forced to be uncorrelated may yield contradictory results compared to the scale scores. Our data demonstrate that this can be problematic if one assesses the significance of summary scores first and then assesses the scale scores only if the summary scores are significant. Alternatively, if the summary scores are presented alone, without the scale scores, the study may fail to detect an effect of an intervention or an important association with physical health, mental health or both. In fact, specific guidance regarding the SF-12 emphasizes the use of the summary scores because of the limitations of the 8 scale scores [14, 15]. The present study suggests that the limitations of the summary scores need to be taken into account as well.
This paper provides an alternative scoring algorithm for the SF-36 (version 1) and the SF-12 (version 1) physical and mental health summary scores. Our approach to constructing these scores is the same as the approach taken by Ware et al. [2, 3] except that we allow the physical and mental health constructs to be correlated. Allowing the constructs to be correlated reduces the negative weights that were causing scale and summary score inconsistencies in the scoring algorithm for the uncorrelated SF-36 summary measures. Similarly, our approach reduced the positive weights in the scoring algorithm for the uncorrelated SF-12 summary measures that result in weighting discrepancies. Thus, we conclude that removing the constraint of "uncorrelated factors" is likely to reduce the discrepancies between the scale and composite scores.
While this manuscript focused on the method of composite score construction developed by Ware et al. [2, 3], it is important to note that an alternative algorithm for the construction of correlated mental health and physical health summary measures exists [16, 17]. The RAND-36 method is based on item response theory (IRT) scoring for scale scoring and uses only the 4 scales that are primarily indicative of physical health (physical functioning, role limitations due to physical health problems, pain, general health perceptions) and mental health (emotional well-being, role limitations due to emotional problems, social functioning, vitality), respectively, in creating the summary scores. Future research should also examine whether the RAND-36 method resolves inconsistent results between the SF-36 scale scores and the summary scores.
We recognize that there are several limitations inherent to this study. First, our sample includes only those receiving care from UMGA health plans, which may limit generalizability. When comparing the UMGA sample characteristics to those of the general population studied by Ware et al. [2, 3], there were some differences with respect to age, gender and race between the two samples [1, 18]. Second, the majority of the study sites included in this study were from the West Coast, which would also limit generalizability. Third, non-responders accounted for 41% of the patients contacted. As such, we do not know if the characteristics of the non-responders are the same as those of the responders. Hence, while this study derived weights based on one sample, we recommend that a similar approach be applied in other samples, including the original sample from the general population that was used to generate the uncorrelated summary scores [18, 19]. Lastly, even with the correlated factor solution, there are still some negative factor scoring coefficients.
Summary scores that are forced to be uncorrelated may yield inconsistent results compared to the scale scores from which they are derived. This manuscript provides an alternative approach of deriving summary scores that allows the scores to be correlated. In this sample, the alternate scoring algorithm produced weights for scale scores and items that make it more likely that consistent results will be obtained for summary scores and scale scores. When presenting results from the SF-36 and SF-12 version 1, we recommend presenting the summary scores for the PCSc and MCSc derived from an obliquely rotated factor solution along with the scale scores and uncorrelated summary scores. Future research should be dedicated to deriving a scoring algorithm from an optimal correlated physical and mental health factor solution that is based on the general population, but the scoring algorithm presented in this manuscript can be employed until that is available. Lastly, we recommend that a similar approach be applied to derive summary measures for version 2 of the SF-36 and SF-12.
SF-12: Short-Form 12 Item Health Survey
SF-36: Short-Form 36 Item Health Survey
MCSc: Mental Component Summary – Correlated
MCSuc: Mental Component Summary – Uncorrelated
PCSc: Physical Component Summary – Correlated
PCSuc: Physical Component Summary – Uncorrelated
HRQOL: Health-related Quality of Life
Ware JE, Kosinski M, Gandek B: SF-36® Health Survey: Manual & Interpretation Guide. Lincoln, RI: QualityMetric Incorporated; 1993; 2000.
Ware JE, Kosinski M, Turner-Bowker DM, Gandek B: How to Score Version 2 of the SF-12 Health Survey (With a Supplement Documenting Version 1). Lincoln, RI: QualityMetric Incorporated; 2002.
Ware JE, Kosinski M, Keller SD: SF-36 Physical and Mental Health Summary Scales: A User's Manual. Boston, MA: The Health Institute; 1994.
Hays RD, Morales LS: The RAND-36 measure of health-related quality of life. Ann Med 2001, 33: 350–357.
Taft C, Karlsson J, Sullivan M: Do SF-36 summary component scores accurately summarize subscale scores? Qual Life Res 2001, 10: 395–404. 10.1023/A:1012552211996
Taft C: Reply to Drs Ware and Kosinski. Qual Life Res 2001, 10: 415–420. 10.1023/A:1012552211996
Blanchard CM, Cote I, Feeny D: Comparing short form and RAND physical and mental health summary scores: results from total hip arthroplasty and high-risk primary-care patients. Int J Technol Assess Health Care 2004, 20: 230–235. 10.1017/S0266462304001011
Simon GE, Revicki DA, Grothaus L, Vonkorff M: SF-36 summary scores: are physical and mental health truly distinct? Med Care 1998, 36: 567–572. 10.1097/00005650-199804000-00012
Cunningham WE, Nakazono TT, Tsai KL, Hays RD: Do differences in methods for constructing SF-36 physical and mental health summary measures change their associations with chronic medical conditions and utilization? Qual Life Res 2003, 12: 1029–1035. 10.1023/A:1026191016380
Hays RD, Brown JA, Spritzer KL, Dixon WJ, Brook RH: Member ratings of health care provided by 48 physician groups. Arch Intern Med 1998, 158: 785–790. 10.1001/archinte.158.7.785
Ware JE, Kosinski M, Gandek B, Aaronson NK, Apolone G, Bech P, Brazier J, Bullinger M, Kaasa S, Leplege A, Prieto L, Sullivan M: The factor structure of the SF-36 health survey in 10 countries: results from the IQOLA project. J Clin Epidemiol 1998, 51: 1159–1165. 10.1016/S0895-4356(98)00107-3
Ware JE, Kosinski M: Interpreting SF-36 summary health measures: a response. Qual Life Res 2001, 10: 405–413. 10.1023/A:1012588218728
Hays RD, Morales LS: The RAND-36 measure of health-related quality of life. Ann Med 2001, 33: 350–357.
Ware JE, Kosinski M, Keller SD: A 12-item Short-Form Health Survey: Construction of scales and preliminary tests of reliability and validity. Medical Care 1996, 34: 220–226. 10.1097/00005650-199603000-00003
Johnson JA, Maddigan SL: Performance of the RAND-12 and SF-12 summary scores in type 2 diabetes. Qual Life Res 2004, 13: 449–456. 10.1023/B:QURE.0000018494.72748.cf
Hays RD, Sherbourne CD, Mazel RM: The RAND 36-item Health Survey 1.0. Health Econ 1993, 2: 217–221. 10.1002/hec.4730020305
Hays RD, Prince-Embury S, Chen H: R-36 HIS: RAND-36 Health Status Inventory. San Antonio, TX: The Psychological Corporation; 1998.
McHorney CA, Kosinski M, Ware JE: Comparisons of the costs and quality of norms for the SF-36 health survey collected by mail versus telephone interview: results from a national survey. Med Care 1994, 32: 551–567. 10.1097/00005650-199406000-00002
General Social Survey Codebook [http://webapp.icpsr.umich.edu/GSS/]
Drs. Cunningham and Hays were partially supported by UCLA/Drew CHIME and RCMAR (AG-02-004) and the UCLA/Drew Project EXPORT, National Institutes of Health, National Center on Minority Health & Health Disparities (P20-MD00148-01). Dr. Hays was also supported by a grant from the National Institute of Aging (AG 20679-01). The authors wish to thank Karen Spritzer for reviewing the statistical programs.
The author(s) declare that they have no competing interests.
Drs Cunningham, Farivar and Hays conceived of the study. Dr. Farivar wrote the first draft. All authors wrote and contributed sections of the manuscript. Dr. Farivar performed the statistical analysis. All authors read and approved the final version of the manuscript.
Cite this article
Farivar, S.S., Cunningham, W.E. & Hays, R.D. Correlated physical and mental health summary scores for the SF-36 and SF-12 Health Survey, V.1. Health Qual Life Outcomes 5, 54 (2007). https://doi.org/10.1186/1477-7525-5-54