Validation of the Patient Health Questionnaire-9 for Major Depressive Disorder in the Occupational Health Setting

Purpose Because of the increased risk of long-term sickness leave for employees with a major depressive disorder (MDD), it is important for occupational health professionals to recognize depression in a timely manner. The Patient Health Questionnaire-9 (PHQ-9) has proven to be a reliable and valid instrument for screening MDD, but has not been validated in the occupational health setting. The aim of this study was to validate the PHQ-9 for MDD within a population of employees on sickness leave by using the MINI-International Neuropsychiatric Interview (MINI) as a gold standard. Methods Participants were recruited in collaboration with the occupational health service. The study sample consisted of 170 employees on sickness leave between 4 and 26 weeks who completed the PHQ-9 and were evaluated with the MINI by telephone. Sensitivity, specificity, positive and negative predictive value, efficiency and 95 % confidence intervals (95 % CIs) were calculated for all possible cut-off values. A receiver operator characteristics (ROC) analysis was computed for PHQ-9 score versus the MINI. Results The optimal cut-off value of the PHQ-9 was 10. This resulted in a sensitivity of 86.1 % [95 % CI (69.7–94.8)] and a specificity of 78.4 % [95 % CI (70.2–84.8)]. Based on the ROC analysis, the area under the curve for the PHQ-9 was 0.90 [SE = 0.02; 95 % CI (0.85–0.94)]. Conclusion The PHQ-9 shows good sensitivity and specificity as a screener for MDD within a population of employees on sickness leave.


Introduction
Major depressive disorders (MDD) are highly associated with sickness leave, and lead to personal suffering and high societal costs [1,2]. The yearly prevalence of MDD in the working population of the Netherlands is 4.8 % [3]. Moreover, employees with MDD are at risk for long-term sickness leave [4,5]. Long-term sickness leave is responsible for enormous costs for patients, companies and society as a whole. The loss in productivity and the payments for disability benefits place a substantial burden on the economies of many developed countries [6].
Because of the increased risk of long-term sickness leave for employees with MDD, it is important for occupational health professionals (e.g., occupational physicians) to be able to recognize depression and to start treatment or refer for treatment in a timely manner. Several studies have shown that it is difficult to recognize MDD, because patients do not always present themselves with mental health problems [7,8]. As such, the availability of good screening instruments for depression among employees on sickness leave is important. For the occupational health (OH) setting, these instruments must be brief, easy to use, and reliable and valid for the specific population.
The Patient Health Questionnaire (PHQ) is a short, self-report version of the Primary Care Evaluation of Mental Disorders (PRIME-MD) [9]. The PHQ-9, the depression subscale of the PHQ, is a reliable and valid instrument for screening MDD [10,11]. Several studies have reported good psychometric qualities of the PHQ-9 in primary care settings as well as in the general population [11][12][13][14]. A meta-analysis showed that the optimal cut-off points for diagnosing depression with the PHQ-9 are between 8 and 11 [15].
The commonly used cut-off value for the PHQ-9 is 10 [10]. However, the optimal cut-off score may differ depending on the setting [15]. A validation study of the PHQ-9 in primary care in the Netherlands found an optimal cut-off value of 6 [13], whereas a validation study among diabetes patients in specialized outpatient clinics in the Netherlands found an optimal cut-off value of 12 [16]. In a population that suffers from other physical conditions, a higher cut-off value of the PHQ-9 may be needed, because the PHQ-9 could register the symptoms of those conditions as depressive symptoms.

Rationale
To our knowledge, validation of the PHQ-9 in the OH setting has not yet been performed. For many people, working is an important aspect of daily life, and absence from work is associated with social isolation or loss of daily routines, which are also symptoms of MDD [17,18]. Furthermore, sick-listed employees often have other physical disorders or conditions with symptoms that can also occur as symptoms of MDD, such as pain and fatigue [19]. This may cause higher scores on the PHQ-9 in a population of sick-listed employees than in the general population. Therefore, it is possible that a higher cut-off value is necessary to correctly identify MDD within a population of sick-listed employees. The aim of the current study is to validate the PHQ-9 for the OH setting by comparing the PHQ-9 with the Dutch version of the MINI-International Neuropsychiatric Interview (MINI) as the gold standard [20].

Design
This validation study was performed as part of a randomized controlled trial (RCT) evaluating cost-effectiveness of an e-health module embedded in collaborative occupational health care for common mental health disorders. The design of this RCT is described extensively elsewhere [21].
In February 2011, the medical ethics committee at the Institutions for Mental Health, Utrecht, the Netherlands, approved the study protocol. Data for this validation study were collected in the recruitment phase of the RCT.

Setting
The study was conducted in an occupational health setting.

Participants
Employees on sickness leave for any reason between 4 and 26 weeks received written information about the study from the occupational health service, together with an information leaflet from the Trimbos-institute, an informed consent form and a screener that contained the PHQ-9.
They were asked to participate in the RCT and, if they agreed, to sign the informed consent form and return it to the researchers together with the completed screener. For the RCT, employees with a positive score on the PHQ-9 were contacted by telephone for a diagnostic interview, the MINI [20]. For this validation study, during a period of 4 months in the recruitment phase of the RCT, employees with a negative PHQ-9 score were also contacted for a diagnostic interview. Employees who could not be contacted for a diagnostic interview within 30 days were excluded from the validation study. The interviewers were blinded to the results of the screener.

Demographics
Age, gender and duration of sickness absence were assessed at the start of the study.

The PHQ-9
The PHQ-9 is the subscale for depression of the self-administered version of the PRIME-MD diagnostic instrument for common mental disorders [10]. The PHQ-9 contains nine questions corresponding to the nine DSM-IV symptoms for MDD during the past 14 days. The answer categories were based on a 4-point response scale, with the categories 'not at all' (0), 'various days' (1), 'more than half of the days' (2) and 'nearly every day' (3). As such, the summed PHQ-9 score could range from 0 to 27. A score of ≥5 is considered an indication of mild depression, a score of ≥10 moderate depression, a score of ≥15 moderately severe depression and a score of ≥20 is an indication of severe depression [10].
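The scoring rule described above can be illustrated with a minimal Python sketch. The function names and the label used for scores below 5 are our own assumptions for illustration and are not part of the PHQ-9 itself.

```python
from typing import Sequence

# Lower bound of each severity band, as described in the text.
SEVERITY_BANDS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
]


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the nine item scores (each 0-3) into a total between 0 and 27."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly nine item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2 or 3")
    return sum(item_scores)


def phq9_severity(total: int) -> str:
    """Map a total score to the severity band whose lower bound it reaches."""
    for lower_bound, label in SEVERITY_BANDS:
        if total >= lower_bound:
            return label
    # Label for scores below 5 is an assumption made for this sketch.
    return "none or minimal"
```

For example, nine answers of 'various days' give a total of 9, which falls in the mild band, whereas a total of 10 reaches the moderate band.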

MINI-International Neuropsychiatric Interview
The MINI-International Neuropsychiatric Interview is a short structured diagnostic interview, developed jointly by psychiatrists and clinicians, for diagnosis of the most common DSM-IV and ICD-10 psychiatric disorders [20]. For the current study, a Dutch version of the interview was used [22]. The MINI includes 23 disorders; however, for the current study, only the modules for depressive and anxiety disorders were used. All interviewers were trained in carrying out the interview and were able to consult a psychiatrist in case of diagnostic uncertainty.

Statistical Analysis
First, the demographic characteristics and the mean PHQ-9 scores were compared between the group of employees who, according to the MINI, had MDD, and the employees who did not have MDD. Chi square tests and independent samples t tests were used to test for significant differences. It was expected that the mean PHQ-9 score was higher in the MINI MDD group than in the MINI non-MDD group. This supports the construct validity of the scale, using the ''known groups'' method [23]. Cohen's d was calculated for reporting effect size [24].
The diagnostic validity of the PHQ-9 was analysed in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and efficiency for all possible cut-off values of the PHQ-9 ranging from 0 to 27. Youden's J (= sensitivity + specificity - 1) was computed to find the optimal balance between sensitivity and specificity. The optimal cut-off value is the value for which J reaches its maximum.
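The classification indices and the selection of the cut-off by Youden's J (sensitivity + specificity - 1) can be sketched as follows. This is an illustrative Python sketch, not the SPSS procedure used in the study; the function names are hypothetical, a score at or above the cut-off is assumed to count as a positive screen, and ties in J are resolved in favour of the lowest cut-off.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple


def confusion_counts(
    scores: Sequence[int], has_mdd: Sequence[bool], cutoff: int
) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN), treating score >= cutoff as a positive screen."""
    tp = fn = fp = tn = 0
    for score, diseased in zip(scores, has_mdd):
        positive = score >= cutoff
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute the classification indices from a 2x2 table."""

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "efficiency": ratio(tp + tn, tp + fn + fp + tn),
        "youden_j": sensitivity + specificity - 1,
    }


def optimal_cutoff(
    scores: Sequence[int],
    has_mdd: Sequence[bool],
    cutoffs: Iterable[int] = range(0, 28),
) -> Optional[int]:
    """Return the lowest cut-off at which Youden's J is maximal."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in cutoffs:
        counts = confusion_counts(scores, has_mdd, cutoff)
        j = diagnostic_metrics(*counts)["youden_j"]
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff


# Hypothetical toy data for illustration only (not study data).
toy_scores = [2, 4, 7, 9, 11, 12, 15, 20]
toy_mdd = [False, False, False, False, True, False, True, True]
toy_best = optimal_cutoff(toy_scores, toy_mdd)
```

Efficiency is computed here as the proportion of all participants who are classified correctly.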
Furthermore, to assess precision, 95 % confidence intervals (95 % CI) were calculated for the sensitivity, specificity, PPV, NPV and efficiency for each cut-off value. The 95 % CIs were computed using the method suggested by Agresti and Coull because this method also produces accurate 95 % CIs for observed proportions close to 0 or 1 [25,26]. For cut-off values at the extremes of the PHQ-9, the sample sizes were too small to calculate accurate 95 % CIs for the NPVs and PPVs. Therefore, we only report the 95 % CI of the NPV and PPV if the sample sizes were ≥15 [25].
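A minimal Python sketch of the adjusted Wald interval in the general form given by Agresti and Coull is shown below. The function name is hypothetical, and the exact variant and rounding applied in the study are not specified here, so the output need not agree with the tabulated intervals to the last digit.

```python
import math
from typing import Tuple


def agresti_coull_interval(
    successes: int, n: int, z: float = 1.959963984540054
) -> Tuple[float, float]:
    """Agresti-Coull interval for a proportion (default z gives 95 % coverage).

    The observed counts are adjusted by adding z^2 / 2 pseudo-successes and
    z^2 / 2 pseudo-failures before applying the usual Wald formula.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to [0, 1], since the adjusted interval can overshoot the bounds.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)
```

Because the centre of the interval is pulled towards 0.5, the interval remains usable when the observed proportion is 0 or 1, where the unadjusted Wald interval collapses to zero width.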
A receiver operating characteristic (ROC) analysis was performed, from which the area under the curve (AUC) for the PHQ-9 was calculated. The AUC can be interpreted as the discriminative ability of the test: the probability that a randomly chosen participant with MDD has a higher screening score than a randomly chosen participant without MDD [27].
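The probabilistic interpretation of the AUC can be made concrete with a short Python sketch that compares every participant with MDD to every participant without MDD and counts ties as half. The function name and the toy scores are hypothetical; statistical packages typically compute the same quantity from the ROC curve.

```python
from typing import Sequence


def auc_from_scores(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Proportion of (MDD, non-MDD) pairs in which the MDD score is higher.

    Tied pairs count as half. This equals the area under the empirical
    ROC curve.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for pos in pos_scores:
        for neg in neg_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy scores for illustration only (not study data).
toy_auc = auc_from_scores([11, 15, 20], [2, 4, 7, 9, 12])
```

An AUC of 0.5 corresponds to a screener that does no better than chance, and an AUC of 1.0 to perfect separation of the two groups.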
The statistical analyses were performed in SPSS version 22.0 [28].

Flowchart
In total, 3569 employees sick-listed due to any cause were approached to fill out the PHQ-9 questionnaire (and to participate in the RCT), of whom 188 employees returned the questionnaire. It is not known whether the 3381 non-responders had already fully returned to work and therefore did not complete the PHQ-9, or whether they did not respond for another reason. Of the 188 employees who returned the questionnaire, 18 could not be reached for the MINI interview within 30 days after they completed the PHQ-9. As a result, data from 170 employees were included in the analyses. Of the 170 employees interviewed with the MINI, 36 scored positively for MDD (prevalence = 21.2 %). Figure 1 shows the flowchart of the participants in this study.

Demographic Characteristics
The mean age of participants in the final study sample (N = 170) was 45.4 years (SD = 10.9); age ranged from 21 to 66 years. Gender was divided equally between male and female participants (50.0 %). The average number of weeks of sickness leave when filling out the PHQ-9 was 10.8 (SD = 3.6). The average number of days between completion of the screener and administration of the MINI was 13.7 (SD = 7.2). None of these characteristics showed a significant difference between the MINI MDD and the MINI non-MDD group.

Mean Scores PHQ-9
The mean score on the PHQ-9 for the entire group was 8.0 (SD = 7.1, range 0-27). The mean PHQ-9 score in the MINI MDD group was 16.3 (SD = 6.0, range 6-27) and the mean PHQ-9 score in the MINI non-MDD group was 5.8 (SD = 5.6, range 0-23). The difference between the means was significant (p < 0.01). This corresponds to a Cohen's d of 1.81, which indicates a large effect size [29]. Table 1 shows the sensitivity, specificity and corresponding 95 % CI for all possible cut-off values. Table 2 shows the predictive values for both positive and negative test results (PPV and NPV), efficiency and the corresponding 95 % CI for all the cut-off values of the PHQ-9.
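As a check on the effect size, Cohen's d can be recomputed from the rounded group means and standard deviations given above. The Python sketch below shows two common variants; the function names are hypothetical, and which variant was used in the study is an assumption on our part. With the rounded summary statistics, the variant that averages the two variances without weighting by group size gives 1.81, whereas the variant weighted by group size (36 and 134 participants) gives a slightly larger value.

```python
import math


def cohens_d_unweighted(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Cohen's d using the root mean square of the two standard deviations."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


def cohens_d_weighted(
    mean1: float, sd1: float, n1: int, mean2: float, sd2: float, n2: int
) -> float:
    """Cohen's d using the pooled standard deviation weighted by group size."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Rounded summary statistics as reported in the text.
d_unweighted = cohens_d_unweighted(16.3, 6.0, 5.8, 5.6)
d_weighted = cohens_d_weighted(16.3, 6.0, 36, 5.8, 5.6, 134)
```

Both variants lead to the same qualitative conclusion of a large effect.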

Classification Scores
Youden's index J is highest at a cut-off value of 10, which therefore gives the best balance between sensitivity and specificity (Table 1). At this cut-off value, the sensitivity is 86.1 %, the specificity 78.4 %, the PPV 51.7 %, the NPV 95.5 % and the efficiency 80.0 % (see Tables 1, 2).
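The indices at the cut-off value of 10 can be cross-checked against each other. The Python sketch below back-calculates a 2x2 table from the sample size (170), the number of participants with MDD (36) and the rounded sensitivity and specificity. The resulting cell counts are our reconstruction, not counts reported in the study, and the function name is hypothetical.

```python
from typing import Dict


def reconstruct_indices(
    n_total: int, n_mdd: int, sensitivity: float, specificity: float
) -> Dict[str, float]:
    """Back-calculate a 2x2 table from rounded indices and recompute them."""
    n_non_mdd = n_total - n_mdd
    tp = round(sensitivity * n_mdd)       # screen positive, MDD
    tn = round(specificity * n_non_mdd)   # screen negative, no MDD
    fn = n_mdd - tp
    fp = n_non_mdd - tn
    return {
        "tp": tp,
        "fn": fn,
        "fp": fp,
        "tn": tn,
        "sensitivity": tp / n_mdd,
        "specificity": tn / n_non_mdd,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "efficiency": (tp + tn) / n_total,
    }


# Inputs taken from the text; cell counts are reconstructed, not reported.
check = reconstruct_indices(170, 36, 0.861, 0.784)
```

Under these assumptions the reconstructed table (31 true positives, 5 false negatives, 29 false positives, 105 true negatives) yields a PPV of 51.7 %, an NPV of 95.5 % and an efficiency of 80.0 %, in line with the values given above.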

ROC Analysis
The ROC curve is shown in Fig. 2. The AUC for the PHQ-9 was 0.90 [SE = 0.02; 95 % CI (0.85-0.94)].

Main Outcomes
In the current study, the concurrent validity of the PHQ-9 in screening for MDD among employees sick-listed for any reason was evaluated. The mean scores on the PHQ-9 in the MINI MDD group versus the MINI non-MDD group were significantly different. This supports the construct validity of the PHQ-9. The PHQ-9 also showed good criterion validity characteristics; the optimal cut-off value was 10. At this value, the PHQ-9 has a sensitivity of 86.1 %, specificity of 78.4 %, PPV of 51.7 %, NPV of 95.5 % and efficiency of 80.0 %. This means that 86.1 % of sick-listed employees with MDD (according to the MINI) will be detected as such, and 78.4 % of sick-listed employees without MDD will score negative on the PHQ-9. Furthermore, 51.7 % of those with a positive PHQ-9 score will be diagnosed with MDD by the MINI, and 95.5 % of those with a negative PHQ-9 score will not be diagnosed with MDD by the MINI. The AUC, which reflects the discriminative ability of the test, is 0.90.

Comparison with Other Studies
The optimal cut-off value for the PHQ-9 in this study was 10. This cut-off value is the same value that is typically used in primary care [10]. In a meta-analysis of validation studies of the PHQ-9, a pooled sensitivity of 85 % and a pooled specificity of 89 % were found for the cut-off value of 10 [15]. The pooled sensitivity is comparable to the sensitivity found in the current study (86.1 %), although the specificity found in the current study (78.4 %) is somewhat lower.
In the Netherlands, Zuithoff et al. [13] validated the PHQ-9 in a primary care setting. The results showed that the commonly used threshold of 10 had a sensitivity of 49 % and a specificity of 95 %. The optimal cut-off value was 6, which resulted in a sensitivity of 82 % and specificity of 82 % [13]. The lower cut-off value found in the primary care setting in the Netherlands, compared with the OH setting, could be due to the fact that sick-listed employees often have other physical disorders or conditions with symptoms that overlap with the symptoms of MDD. The PHQ-9 has also been validated in the Netherlands in patients with diabetes in specialized outpatient clinics [16]. The optimal cut-off value in that setting was 12, which resulted in a sensitivity of 75.7 % and a specificity of 80.0 %. Thus, in that setting, a higher cut-off value was found than in the OH setting. It is hypothesized that this may be because patients from a specialized diabetes clinic have more severe pathology and more complications, which could be registered by the PHQ-9 as depression symptoms while in fact being diabetes symptoms [16].

Strengths and Limitations
A strength of the current study is that this was the first validation study of the PHQ-9 within a population of sick-listed employees. Another strength is that the interviewers were blinded to the results of the screener. The inclusion of the 95 % CIs using the method from Agresti and Coull is also a strength, as this indicates the precision of the estimated classification indices (e.g., sensitivity and specificity), which in turn informs researchers about the generalizability of the outcomes at the population level [25,26].
A limitation of the current study is that selection bias might have occurred, because of the high rate of non-response and the exclusion of participants who could not be reached within 30 days for the MINI interview. Unfortunately, there were no demographic data for the non-responders; as a result, a sensitivity analysis was impossible. One reason for the high rate of non-response could be that this validation study was conducted alongside a randomized controlled trial; it is likely that employees who did not want to participate in the RCT did not respond to the screener. Furthermore, it is possible that a number of the employees did not respond to the screener because they were no longer on sick leave. Another limitation may be the amount of time between completion of the screener and the diagnostic interview. It is possible that the absence or presence of MDD at the time of completion of the screener did not match the results of the MINI interview due to a change in symptoms in the time between the screener and the MINI interview. However, the test-retest reliability of the PHQ-9 over a similar two-week period, studied by Zuithoff et al. [13], was very good.
A final limitation is the lack of information about reason for sick leave, types of disabling conditions and comorbid physical symptoms of the sick-listed participants. However, Vlasveld et al. [5] showed that regardless of the reason for sick leave, depression is a predictor of a longer duration of absence from work. Therefore, it is important to detect MDD in this population of sick-listed employees regardless of their reason for absence.

Practical and Research Implications
Our findings suggest that the PHQ-9 can be used as a screener for detecting MDD in the OH setting. The choice of cut-off value depends on the decisions that are made on the basis of the screening result and on the context in which the screening instrument is used. Occupational physicians (OPs) often have to decide on referral for treatment. It is important for them to save costs by avoiding unnecessary treatment and to refer the employees who need treatment correctly. The test needs to detect the presence of the disorder in employees who actually suffer from the disorder, but it also needs to detect the absence of the disorder in employees who do not suffer from it. It should be noted that with the cut-off value of 10, the PPV is 51.7 %; thus, there is a substantial chance of false positives. The PHQ-9 and MINI used in this study are both based on the DSM-IV; during the course of this study, the DSM-5 was published [30]. The criteria for MDD are minimally changed in the DSM-5; the most important change is that bereavement is no longer an exclusion criterion. The PHQ-9 scores are not affected by this change because the questionnaire does not include an item on bereavement. However, because the MINI does include a question about bereavement, the removal of bereavement as an exclusion criterion for MDD might lead to a slightly better concurrent validity of the PHQ-9.
In the current study, the concurrent validity of the PHQ-9 in a population of sick-listed employees is studied. Further research could address other forms of validity testing and related aspects such as factor structure.

Conclusions
Due to the increased risk of long-term sickness leave for employees with MDD, it is important for occupational health professionals to recognize MDD and to start treatment or refer for treatment in a timely fashion. This study showed that the PHQ-9 is a questionnaire with good sensitivity and specificity in the OH setting. Therefore, we recommend the use of the PHQ-9 as a screening instrument for MDD in sick-listed employees.

Funding
The Trimbos-institute received funding for this study from The Netherlands organization for Health Research and Development (ZonMw) and from Achmea SZ, a Dutch insurance company. The results and conclusion reported in this paper are independent from funding sources.