Patients with multiple distressing somatic symptoms are prevalent in primary, secondary and tertiary care settings [1–4]. The Patient Health Questionnaire (PHQ)-15 [5] is an economical, self-administered instrument that has been used as a screening tool in several studies. Moreover, the DSM-5 Workgroup on Somatic Symptom Disorders (SSD) has suggested the PHQ-15 as a measure of somatic symptom severity for the classification of SSD [6, 7].

The PHQ-15 was developed from its precursors, the “Primary Care Evaluation of Mental Disorders” (PRIME-MD) [8] and the “PRIME-MD Patient Health Questionnaire” (PRIME-MD PHQ) [9]. Previous studies [5, 10–12] suggest that individual somatic symptoms frequently cluster into four groups: cardiopulmonary, gastrointestinal, pain, and general.

Among 40 self-reported somatic symptom scales investigated in a review, the PHQ-15 and the 12-item Symptom Checklist–90 somatization scale [13] were identified as the most appropriate measures for large-scale studies because of their well-established psychometric properties, relevance to symptoms, brevity, and availability in multiple languages [11].

The PHQ-15 in Hong Kong and Mainland China

Historically, there has been a popular belief that Asians manifest a lower prevalence of mood and anxiety disorders than their Western counterparts because they are more prone to experiencing and manifesting distress via somatic pathways [14–16]. Among Chinese patients receiving psychiatric services, somatic symptoms such as pain, insomnia and fatigue have been associated with depressive and anxiety disorders [17].

The validity and reliability of the Chinese version of the PHQ-15 [18] were examined in the general population of Hong Kong. The Hong Kong version of the PHQ-15 exhibited satisfactory internal consistency (Cronbach’s alpha = 0.79) and stable 1-month test-retest reliability. Somatic symptom severity was positively associated with functional impairment and health service use.

In mainland China, the validity and reliability of the PHQ-15 were tested in the outpatient clinics of general hospitals in Shanghai [19]. Cronbach’s alpha was 0.73, and the test-retest reliability coefficient was 0.75. There were moderate positive correlations between the PHQ-15 score and anxiety and depression values.

No PHQ-15 data are available for tertiary hospital inpatients in China, and no item response theory (IRT) analyses have been performed. IRT is a probabilistic test theory that represents a strong paradigm for the analysis of tests or questionnaires. Unlike the “simpler” classical test theory, IRT does not assume that each item is equally difficult.

Furthermore, we found several inconsistencies in translation among the English, Hong Kong and Shanghai versions of the PHQ-15 (see Methods section).

The objective of the present study was to assess the validity of the Chinese version of the PHQ-15 for the detection of distressing somatic symptoms in a sample of inpatients at a tertiary hospital.

We aimed to answer the following research questions:

  1. What somatic symptoms are reported most often by patients?

  2. What are the internal consistency and discriminant validity of the Chinese PHQ-15?

  3. Is the PHQ-15 consistent with IRT?


Study design

We conducted an observational cross-sectional survey. The study was initiated under normal clinical conditions on a random day in October 2013.

Participants in this study were inpatients recruited from 10 departments (oncology, cardiology, respiratory medicine, rehabilitation, geriatrics and gerontology, general practice, pain management, thyroid and breast surgery, rheumatology, and hepatic surgery) of the West China Hospital of Sichuan University. The West China Hospital of Sichuan University is a “3 A hospital,” indicating that it meets the highest standards in China. The West China Hospital provides primary, secondary and tertiary care and has a full complement of services, including the departments mentioned above.

All inpatients of these departments were considered to be potential participants in our study. The following inclusion criteria were used: (1) treatment as an inpatient in the selected wards; (2) sufficient language skills to understand the questionnaires; and (3) informed consent to participate in the research. Exclusion criteria were (1) discharge from the hospital on the day of survey completion and (2) inability to independently complete the self-reported questionnaire due to serious physical debilitation or mental status. The investigators were well-trained medical doctors, nurses or medical students. A pilot study was performed in advance to confirm the feasibility of the study, e.g., that the patients would agree to participate and would understand the questionnaires. The investigators collected the questionnaires from the patients.

The validation of the PHQ-15 is a component of a larger project investigating the prevalence and recognition of inpatients with emotional distress and their treatment needs at a general hospital.

Assessment instruments


The PHQ-15 is a self-administered somatic symptoms subscale derived from the full PHQ [5, 9]. The PHQ-15 includes 15 prevalent somatic symptoms or symptom clusters that represent over 90 % of the symptoms observed in primary care (exclusive of self-limited upper respiratory symptoms such as cough, nasal symptoms, sore throat, and ear ache) [5]. The patients were asked to rate the severity of their symptoms during the previous 4 weeks on a 3-point scale as either 0 (“not bothered at all”), 1 (“bothered a little”) or 2 (“bothered a lot”). Two items consisted of questions regarding “feeling tired and having little energy” and “trouble sleeping”; these items are taken from the depression module of the PHQ.

The classification of somatic symptom severity included minimal (0–4), mild (5–9), moderate (10–14) and severe (15–30). The total symptom severity score ranged from 0 to 30.

The reliability of the PHQ-15 was initially supported by the results of one study of 6000 patients from general internal medicine and family practice clinics in Phoenix [5]. In that study, the PHQ-15 demonstrated good internal consistency (Cronbach’s alpha = 0.80) and was related to criterion indices or physical dysfunction, self-reported disability days, clinic visits, and the amount of difficulty that the patients attributed to their symptoms. Furthermore, linear regressions were performed to examine the ability of the PHQ-15, along with other variables, such as depression scores and medical comorbidities, to independently predict clinical outcomes (e.g., bodily pain and physical functioning).

To determine prevalence rates, a cut-off score of ≥ 10 was used because the range of 10 to 30 reflects moderate to high somatic symptom severity. The selection of this cut-off score was based on previous studies [20, 21].
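The scoring rules described above (15 items rated 0 to 2, four severity bands, and a cut-off of ≥ 10) can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for illustration, and the sketch does not address missing responses, which the scoring description above does not cover.

```python
from typing import Sequence


def phq15_total(item_scores: Sequence[int]) -> int:
    """Sum the 15 item ratings, each coded 0, 1 or 2 (total range 0-30)."""
    if len(item_scores) != 15:
        raise ValueError("expected exactly 15 item ratings")
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each rating must be 0, 1 or 2")
    return sum(item_scores)


def severity_band(total: int) -> str:
    """Map a total score to the severity bands given in the text."""
    if not 0 <= total <= 30:
        raise ValueError("total score must lie between 0 and 30")
    if total <= 4:
        return "minimal"
    if total <= 9:
        return "mild"
    if total <= 14:
        return "moderate"
    return "severe"


def screens_positive(total: int, cutoff: int = 10) -> bool:
    """Apply the cut-off of >= 10 used here to determine prevalence."""
    return total >= cutoff


# Illustrative response pattern, not taken from the study data.
ratings = [1, 2, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 2, 2]
total = phq15_total(ratings)
print(total, severity_band(total), screens_positive(total))
```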

Each item is measured using a ranking scale; therefore, one open question is whether the sum of these data can be interpreted as metric data. Additionally, the merits of a 3-point scale compared with a 2-point scale are discussed.

The PHQ-15 has been translated into other languages and has been examined in samples from many countries, e.g., Saudi Arabia, Germany, Spain, Belgium, Korea, and the Netherlands. This evidence offers the potential for comparisons between ethnic groups.

Translation of the PHQ-15

This study is part of the Sino-German research cooperation, which was started in 2010. Workshops and a multicenter study on illness perception and illness attribution in patients with somatoform disorders were funded by a grant from the Sino-German Center of Research Promotion in Beijing. A working group of three native Chinese speakers who resided in Germany and were fluent in written and spoken English and German (one psychiatrist, one psychologist, and one educator) was established to revise the Chinese version of the PHQ-15. One translator regularly participated in project meetings. Translations were discussed during the project meetings [22, 23].

We used the Chinese version translated from English to Mandarin by colleagues from the Shanghai Mental Health Center [19]. Because “stomach pain” (胃痛) in item 1 was narrowly translated to mean “gastric pain” in the mainland Chinese version, we changed this to “stomach and abdominal pain” (胃痛或肚痛), in accordance with the suggestions of Lee et al. [18]. In item 8, “fainting spells” was translated to “occasional fainting” (偶尔昏晕过去) in the Shanghai version, but we preferred the Hong Kong wording of “brief fainting” [短時間暈倒 (Cantonese), 短时间晕倒 (Mandarin)]. Please see Additional file 1.

Other than these slight changes, back-translation of the Chinese PHQ-15 showed perfect concordance with the English language version of the PHQ-15.

The Mandarin version of the PHQ-15 used in this study can be provided upon request to the corresponding author.

Depression scale (PHQ-9)

The Patient Health Questionnaire-9 (PHQ-9) assesses each of the nine DSM-IV depression criteria on a scale of “0” (not at all) to “3” (nearly every day) [24]. The PHQ-9 demonstrated acceptable psychometric properties for the screening of patients with late-life depression in Chinese primary care settings, as this questionnaire showed a sensitivity of 0.86 and a specificity of 0.77 [25].

Generalized Anxiety Disorder scale (GAD-7)

A seven-item anxiety scale (GAD-7) was used to assess the severity of generalized anxiety [26]. In a Chinese general hospital population, this instrument showed good reliability and good criterion, construct, factorial, and procedural validity [27].

Statistical analyses

A single sample was analyzed using IBM SPSS 23.0, Stata 14 and Mplus 7.3. For descriptive analyses of the quantitative variables, mean, standard deviation and range were calculated, and for analyses of the qualitative variables, frequencies and percentages were used. The distribution of the total scores obtained using the Chinese version of the PHQ-15 was studied, and the percentage of patients with each of the possible total scores was calculated.

Three types of analyses were performed to evaluate validity. First, to examine the discriminant validity of the PHQ-15, we investigated the correlations of the PHQ-15 scores with sociodemographic data and the PHQ-9 and GAD-7 scale scores. Based on the results from previous studies of the PHQ-15, we expected that women would have higher somatic symptom severity (SSS) scores than men and that the SSS scores would increase with increasing age and decreasing education level [19]. Second, reliability was analyzed in terms of internal consistency using Cronbach’s alpha coefficient for the total scale score. Exploratory factor analysis was performed to reveal the structure of the internal consistency of the PHQ-15. Finally, IRT analysis was performed to assess the thresholds of the items because each item had only three answer options and could only be interpreted as rank data.
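For readers unfamiliar with the internal consistency coefficient named above, the following sketch computes Cronbach’s alpha from its textbook definition, alpha = k/(k − 1) · (1 − Σ item variances / variance of the total score). It is an illustration using the Python standard library and made-up toy ratings, not the SPSS procedure used in this study; the function name is hypothetical.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores."""
    n_items = len(rows[0])
    if n_items < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in rows):
        raise ValueError("every respondent needs the same number of items")
    # Sample variance of each item across respondents.
    item_variances: List[float] = [
        variance([row[i] for row in rows]) for i in range(n_items)
    ]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy ratings on a 0-2 scale (five respondents, four items); not study data.
toy = [
    [0, 1, 0, 1],
    [1, 1, 1, 2],
    [2, 2, 1, 2],
    [0, 0, 0, 1],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(toy), 3))
```

Alpha rises when items covary strongly relative to their individual variances, which is why an item with almost no variance contributes little to the coefficient.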

Statistical analyses were conducted using an alpha level of 1 % to avoid alpha inflation resulting from multiple tests.


Description of the sample

Of the 1662 inpatients approached in the 10 departments, 151 patients were excluded based on the exclusion criteria, and 149 patients refused to participate in the study. The main reasons that patients gave for their non-participation were lack of time (n = 27) or interest (n = 60). The final sample consisted of 1362 subjects, corresponding to an overall response rate of 90.1 %. Patients for whom more than 15 % of the data were missing were excluded. Therefore, 1329 eligible patients were included in our study.
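The participant flow above can be checked with a few lines of arithmetic. The sketch assumes that the response rate was computed relative to approached patients who were not excluded; the text does not state the denominator, but this reading reproduces the reported 90.1 %. Variable names are illustrative.

```python
approached = 1662          # inpatients approached in the 10 departments
excluded = 151             # met an exclusion criterion
refused = 149              # declined to participate
analysed = 1329            # retained after the missing-data rule

eligible = approached - excluded         # assumed response-rate denominator
participants = eligible - refused        # final sample before data cleaning
response_rate = participants / eligible
removed_for_missing = participants - analysed

print(participants, f"{response_rate:.1%}", removed_for_missing)
```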

The mean total score on the PHQ-15 was 6.79, with a standard deviation of 4.94 (minimum = 0; maximum = 28). We divided the PHQ-15 data from the sample into two groups: the somatoform symptom-negative (SOM-) group (PHQ-15 score < 10, n = 960, mean = 4.34, SD = 2.80) and the somatoform symptom-positive (SOM+) group (PHQ-15 score ≥ 10, n = 369, mean = 13.18, SD = 3.28).
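As a consistency check, the overall mean can be recovered from the two group means weighted by group size. The sketch below uses only the figures reported above; the helper name is hypothetical.

```python
from typing import List, Tuple


def pooled_mean(groups: List[Tuple[int, float]]) -> float:
    """Size-weighted mean of several (n, mean) groups."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


# (n, mean PHQ-15 score) for the groups below and at or above the cut-off.
groups = [(960, 4.34), (369, 13.18)]
total_n = sum(n for n, _ in groups)
overall = pooled_mean(groups)
print(total_n, round(overall, 2))
```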

The PHQ-15 score moderately correlated with the PHQ-9 score (r = 0.565) and the GAD-7 score (r = 0.512). The Spearman rank correlation coefficients of the PHQ-15 score with income (r = −0.069) and education levels (r = −0.075) were near zero.

The sociodemographic data of the sample are presented in Table 1. Based on an alpha level of 0.1, there were no significant differences in sociodemographic characteristics between the SOM- and SOM+ groups. The distributions of education level and income were comparable to the distributions observed in other studies performed at the general hospitals in China [28–30].

Table 1 Sociodemographic characteristics

The clinical data are presented in Table 2.

Table 2 Clinical measures

A comparison of all departments showed no significant difference in the mean PHQ-15 score between the SOM- and SOM+ groups considering an alpha level of 1 % (F(9, 1319) = 1.904, p = 0.048, partial η² = 0.013). Significant differences (based on MANOVA) in the mean PHQ-9 and GAD-7 scale scores between the SOM- and SOM+ groups were found.
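The reported effect size is consistent with the F statistic when the degrees of freedom are read as 9 and 1319. A minimal sketch, using the standard identity partial eta squared = (F · df1) / (F · df1 + df2); the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    effect = f_value * df_effect
    return effect / (effect + df_error)


eta_p2 = partial_eta_squared(1.904, 9, 1319)
print(round(eta_p2, 3))
```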

Item and scale characteristics

The item distributions displayed extreme floor effects (see Table 3). In particular, for item 4, 91.1 % of the responses fell in category 0, compared to 7.4 % in category 1 and 1.4 % in category 2. Other items, such as items 8 and 11, displayed comparable floor effects. Item 4 showed a very small variance of 0.35. Other items displayed greater variances, but none of the items displayed an ideal difficulty of approximately 1 on a scale from 0 to 2. An ideal difficulty of 1 would be valuable for improving the reliability of the questionnaire because this is a requirement for variance and for a large Cronbach’s alpha.

Table 3 Descriptive statistics for the PHQ-15

The PHQ-15 displayed a Cronbach’s alpha of 0.833. Excluding item 4 slightly increased Cronbach’s alpha to 0.837. The item-to-item correlations were in the range of 0.32 to 0.56. The item-to-item correlations with item 4 were less than 0.15, and the item-to-item correlations of 6 items exceeded 0.50.

The deduced determination coefficients showed a common variance with the PHQ-15 score of 31.9 % for the PHQ-9 score and 26.2 % for the GAD-7 scale score. The discriminant validity of the PHQ-15 is therefore acceptable because the PHQ-15 measures different constructs than the PHQ-9 or the GAD-7 scale in this sample. Cronbach’s alpha was 0.908 for the PHQ-9 and 0.815 for the GAD-7 scale.
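The determination coefficients quoted above are simply the squared correlation coefficients reported earlier (r = 0.565 for the PHQ-9 and r = 0.512 for the GAD-7). A short sketch of that arithmetic, with an illustrative function name:

```python
def shared_variance_percent(r: float) -> float:
    """Coefficient of determination (r squared) expressed in percent."""
    return r ** 2 * 100


for scale, r in (("PHQ-9", 0.565), ("GAD-7", 0.512)):
    print(scale, round(shared_variance_percent(r), 1))
```

Roughly two thirds or more of the PHQ-15 variance is therefore not shared with either scale, which is the basis of the discriminant validity argument.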

Factorial validity

To examine the internal structure of the scale, we performed exploratory factor analysis on the categorical data using Mplus software (see Table 4). All subjects were included in this analysis. By adopting the Kaiser criterion (eigenvalue > 1), three factors were extracted; these factors accounted for 55.97 % of the total variance. The eigenvalues of the three factors were as follows: factor 1 = 6.026, factor 2 = 1.279 and factor 3 = 1.091. Based on this factor structure, the items loading on the three factors may be termed “cardiopulmonary”, “gastrointestinal” and “pain”. Thirteen items of the PHQ-15 loaded on only one of the factors; in contrast, items 1 and 2 cross-loaded on two of the factors. The chi-square test of model fit was significant (χ² = 371.064, df = 63, p < 0.0001), which was considered acceptable given the large sample size. The root mean square error of approximation (RMSEA) of 0.061 was acceptable (90 % C.I.: 0.055 - 0.067). The Comparative Fit Index (CFI) was adequate (0.961), and the Tucker-Lewis Index (TLI) was acceptable (0.935). The standardized root mean square residual (SRMR) was approximately 0.048. The geomin-rotated factors showed correlations between 0.418 and 0.531.
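The explained variance follows from the eigenvalues if the total variance is taken to equal the number of items (15), as is the case for eigenvalues of a correlation matrix. That is an assumption about how the percentage was derived, but it reproduces the reported 55.97 %. A minimal sketch:

```python
N_ITEMS = 15  # assumed total variance: one unit per standardized item

# Eigenvalues of the three extracted factors, as reported in the text.
eigenvalues = [6.026, 1.279, 1.091]

# Kaiser criterion: retain factors with an eigenvalue greater than 1.
retained = [value for value in eigenvalues if value > 1]

per_factor_percent = [value / N_ITEMS * 100 for value in retained]
total_percent = sum(retained) / N_ITEMS * 100

print(len(retained), [round(p, 2) for p in per_factor_percent],
      round(total_percent, 2))
```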

Table 4 Geomin-rotated factor loadings

Ten of the fifteen variables in this model had significant double- or, in some cases, triple-loadings. Some of these double-loadings had nearly the same values. However, a one-factor model displayed a worse fit (χ² = 928.208, df = 90, p < 0.0001). The RMSEA of 0.084 was not acceptable (90 % C.I.: 0.079 - 0.089). The CFI was marginal (0.894), and the TLI was marginal or unacceptable (0.876). The SRMR was approximately 0.077. Because the three factors were moderately correlated, the one-factor model displayed a poor fit, and the authors of the questionnaire used the sum of all items as an outcome, we conducted second-order factor analysis considering these three factors and a second-order factor. The chi-square test of model fit was significant (χ² = 451.988, df = 85, p < 0.0001), which was considered acceptable given the large sample size. The RMSEA of 0.057 was acceptable (90 % C.I.: 0.052 - 0.062). The CFI was adequate (0.954), and the TLI was acceptable (0.943). The weighted root mean square residual (WRMR) was approximately 1.554. For this model, double-loading of items 1 and 2 was assumed. The R² of the three factors was high (Factor 1 = 0.733, Factor 2 = 0.622, and Factor 3 = 0.684).
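The RMSEA values reported for the three models can be approximately reproduced from the chi-square statistics and degrees of freedom with the common point-estimate formula sqrt(max(chi-square − df, 0) / (df · (N − 1))). This is a sketch under two assumptions: that N = 1329 applies to all three models, and that the software's estimator-specific formula gives the same result to three decimals. The function name is illustrative.

```python
from math import sqrt


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of the root mean square error of approximation."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


N = 1329  # analysed sample size, assumed for every model

models = {
    "three-factor EFA": (371.064, 63),
    "one-factor": (928.208, 90),
    "second-order": (451.988, 85),
}
for name, (chi_square, df) in models.items():
    print(name, round(rmsea(chi_square, df, N), 3))
```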

IRT analysis

IRT analysis using the partial credit model showed that all of the items fit the model. The problematic item 4 displayed two thresholds in the appropriate order, but these thresholds did not markedly differ (2.498 vs 2.583). All of the other items showed greater differences in their thresholds and showed adequate results based on IRT analysis (see Table 5).
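To show what two nearly identical thresholds imply, the sketch below evaluates the category probabilities of the partial credit model for item 4. It assumes the standard Rasch-family parameterization with step thresholds on the logit scale; the parameterization behind Table 5 is not stated here, so the function and its interpretation are illustrative only.

```python
from math import exp
from typing import List, Sequence


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities (0..m) under the partial credit model."""
    numerators = [1.0]          # category 0: exp of an empty sum
    running = 0.0
    for delta in thresholds:
        running += theta - delta  # cumulative sum of (theta - delta_k)
        numerators.append(exp(running))
    total = sum(numerators)
    return [value / total for value in numerators]


item4_thresholds = [2.498, 2.583]  # thresholds reported for item 4

for theta in (0.0, 2.498, 2.54, 2.583, 4.0):
    probs = pcm_probabilities(theta, item4_thresholds)
    print(theta, [round(p, 3) for p in probs])
```

Under this assumption, the middle category is the most likely response only for trait levels between the two thresholds, an interval of less than 0.1 logits, so the item behaves almost like a dichotomous one.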

Table 5 Thresholds for the PHQ-15 based on IRT analysis

In three further IRT analyses using the partial credit model, all of the items within the three factors showed good model fit. Although item 4 remained problematic, the three-factor solution was acceptable (see Table 6).

Table 6 Thresholds for the three possible factors of the PHQ-15 based on IRT analysis


The present study evaluated the Chinese version of the PHQ-15 in a large tertiary hospital inpatient setting in Chengdu. The results revealed satisfactory reliability (Cronbach’s alpha = 0.83) of this scale and good evidence of its validity. Cronbach’s alpha in this study was higher than that in Western and Chinese studies (between 0.78 and 0.82).

The correlations of the PHQ-15 scores with the PHQ-9 depression scale and the GAD-7 anxiety scale scores were similar to the correlations between these instruments in other studies; this evidence suggests that the PHQ-15 has discriminant validity [31].

The correlations of the PHQ-15 score with the PHQ-9 and GAD-7 scale scores were not sufficiently high to completely attribute the PHQ-15 results to coexisting depressive and anxiety symptoms. Aside from medical comorbidities, functional or bodily distress symptoms were observed as factors (discriminant validity).

In a factor analysis of a former version of the PHQ-15 in a US clinical study, three factors were identified: cardiopulmonary, gastrointestinal, and general pain/fatigue (explanation of the total variance: 46 %) [32]. A study from Hong Kong [18] determined four clinically meaningful factors that explained 49.7 % of the total variance: “cardiopulmonary,” “gastrointestinal,” “pain” and “neurological”. A study from Shanghai [19] identified three factors: “general discomfort,” “gastro-intestinal discomfort” and “cardiothoracic discomfort” (explanation of the total variance: 54 %). Based on factorial analysis in our study, we identified three factors, referred to as “cardiopulmonary,” “gastrointestinal” and “pain/neurological,” which explained 56 % of the total variance. A second-order factor analysis including these three factors produced an acceptable model. Because of substantial double-factor loadings, a unidimensional model is also discussed.

Item 4 (menstrual problems), item 8 (fainting spells) and item 11 (sexual problems) displayed extreme floor effects. These floor effects were also found in previous Chinese and Western studies [18, 19, 21, 29, 33, 34]. Additionally, item 4 displayed a very small variance of 0.35 and showed very small differences in its thresholds based on IRT analysis.

Because of their limited associations with other items, rare symptom prevalence, and limited associations with measures of functioning, quality of life, and health service use, these three items were not included in a new questionnaire, termed “The Somatic Symptom Scale-8” (SSS-8) [35].

Strengths and limitations

This is the first validation study of the PHQ-15 in a large sample of patients at a major Chinese tertiary hospital that has a full complement of services for a broad range of medical conditions.

The sample included the most important departments of a general hospital. The patients were representative of general hospital inpatients with respect to sex, marital status, education level and income level. The overall response rate was very high (90.1 %). The validation process included IRT analysis, which is a new analysis of the PHQ-15.

However, there were some limitations of our study. (1) We did not perform a structured clinical interview; therefore, the sensitivity and specificity of the PHQ-15 for assessing somatoform disorders could not be established. However, the PHQ-15 is best characterized as a measure of somatic symptom severity rather than a diagnostic instrument for somatoform disorders [5]. It would be important to diagnose patients with a new classification system of somatic symptom disorders in China. (2) The study was cross-sectional. Longitudinal studies are needed to determine the test-retest reliability of the PHQ-15 and its responsiveness to treatment. (3) There was no assessment of functional status or health-related quality of life. (4) There was no systematic assessment of medical conditions or independent measure of healthcare utilization. (5) Indigenous and common expressions of somatic distress among Chinese patients are not captured by the PHQ-15. (6) A multi-center study would be an optimal approach.


The PHQ-15 displayed adequate reliability and good evidence of validity for detecting patients with severe somatic symptoms in a Chinese hospital. Several of the current findings were consistent with previous research regarding the PHQ-15. To improve the diagnostic quality of the PHQ-15, items 4, 8 and 11 can be omitted.

Future research should examine whether differences in factorial structure and the cross-loading of items across populations are related to sampling, methodological factors and/or cultural differences in experiences with somatic disorders.

Ethics approval and consent to participate

The study was approved by the Ethics Committee of West China Hospital of Sichuan University. Written informed consent was obtained from all participants.

Consent for publication

Not applicable.

Availability of data and materials

All the data supporting our findings are contained within the manuscript.