Design
This validation study was performed as part of a randomized controlled trial (RCT) evaluating the cost-effectiveness of an e-health module embedded in collaborative occupational health care for common mental health disorders. The design of this RCT is described extensively elsewhere [21]. In February 2011, the medical ethics committee at the Institutions for Mental Health, Utrecht, the Netherlands, approved the study protocol. Data for this validation study were collected in the recruitment phase of the RCT.
Setting
The study was conducted in an occupational health setting.
Participants
Employees who had been on sickness leave for any reason for between 4 and 26 weeks received written information about the study from the occupational health service, together with an information leaflet from the Trimbos-institute, an informed consent form and a screener that contained the PHQ-9. They were asked to participate in the RCT and, if they agreed, to sign the informed consent form and return it to the researchers together with the completed screener. For the RCT, employees with a positive score on the PHQ-9 were contacted by telephone for a diagnostic interview, the MINI [20]. For this validation study, during a period of 4 months in the recruitment phase of the RCT, employees with a negative PHQ-9 score were also contacted for a diagnostic interview. Employees who could not be contacted for a diagnostic interview within 30 days were excluded from the validation study. The interviewers were blinded to the results of the screener.
Measurement Instruments
Demographics
Age, gender and duration of sickness absence were assessed at the start of the study.
The PHQ-9
The PHQ-9 is the subscale for depression of the self-administered version of the PRIME-MD diagnostic instrument for common mental disorders [10]. The PHQ-9 contains nine questions corresponding to the nine DSM-IV symptoms for MDD during the past 14 days. Each question is answered on a 4-point response scale, with the categories ‘not at all’ (0), ‘several days’ (1), ‘more than half the days’ (2) and ‘nearly every day’ (3). As such, the summed PHQ-9 score can range from 0 to 27. A score of ≥5 is considered an indication of mild depression, a score of ≥10 of moderate depression, a score of ≥15 of moderately severe depression and a score of ≥20 of severe depression [10].
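The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the summation and the severity thresholds quoted from [10]; the function names and the label used for totals below 5 are assumptions made for this example and are not part of the study.

```python
def phq9_total(item_scores):
    """Sum the nine item scores (each 0-3) into a total between 0 and 27."""
    if len(item_scores) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2 or 3")
    return sum(item_scores)


def phq9_severity(total):
    """Map a total score to the severity bands quoted in the text."""
    if not 0 <= total <= 27:
        raise ValueError("the total score must lie between 0 and 27")
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    # Assumption: the text gives no label for totals below 5.
    return "below mild threshold"
```

For example, a respondent answering ‘more than half the days’ (2) to every question has a total of 18, which falls in the moderately severe band.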
MINI-International Neuropsychiatric Interview
The MINI-International Neuropsychiatric Interview is a short structured diagnostic interview, developed jointly by psychiatrists and clinicians, for diagnosis of the most common DSM-IV and ICD-10 psychiatric disorders [20]. For the current study, a Dutch version of the interview was used [22]. The MINI includes 23 disorders; for the current study, however, only the modules for depressive and anxiety disorders were used. All interviewers were trained in carrying out the interview and were able to consult a psychiatrist in case of diagnostic uncertainty.
Statistical Analysis
First, the demographic characteristics and the mean PHQ-9 scores were compared between the group of employees who, according to the MINI, had MDD and the group of employees who did not have MDD. Chi square tests and independent samples t tests were used to test for significant differences. It was expected that the mean PHQ-9 score would be higher in the MINI MDD group than in the MINI non-MDD group. Such a difference would support the construct validity of the scale, using the “known groups” method [23]. Cohen’s d was calculated as a measure of effect size [24].
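As an illustration of the effect size used in the known-groups comparison, the sketch below computes Cohen’s d as the difference in group means divided by the pooled standard deviation. The text does not state which variant of d was used, so the pooled-SD form is an assumption, as is the function name.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: mean difference divided by the pooled standard deviation.

    Assumption: the pooled-SD form is used; the source does not specify it.
    Each group needs at least two observations.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

A positive d here means that the first group (for instance the MINI MDD group) scored higher on average than the second.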
The diagnostic validity of the PHQ-9 was analysed in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and efficiency (the proportion of participants classified correctly) for all possible cut-off values of the PHQ-9, ranging from 0 to 27. Youden’s J (J = sensitivity + specificity − 1) was computed to find the optimal balance between sensitivity and specificity. The optimal cut-off value is the value at which J reaches its maximum.
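The cut-off analysis can be sketched as follows. The code assumes that a participant screens positive when the PHQ-9 score is at or above the cut-off and that the MINI diagnosis serves as the reference standard; the function names, the dictionary keys and the tie-breaking rule (lowest cut-off wins) are choices made for this example only.

```python
def diagnostic_metrics(scores, has_mdd, cutoff):
    """Diagnostic accuracy measures for one cut-off value.

    scores  : PHQ-9 total scores
    has_mdd : reference diagnosis per participant (True = MDD)
    Assumption: a score >= cutoff counts as screen-positive.
    """
    tp = fp = tn = fn = 0
    for score, mdd in zip(scores, has_mdd):
        positive = score >= cutoff
        if positive and mdd:
            tp += 1
        elif positive:
            fp += 1
        elif mdd:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when no participant falls in the denominator group.
        return numerator / denominator if denominator else None

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    youden_j = None
    if sensitivity is not None and specificity is not None:
        youden_j = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "efficiency": ratio(tp + tn, tp + fp + tn + fn),
        "youden_j": youden_j,
    }


def best_cutoff(scores, has_mdd, cutoffs=range(0, 28)):
    """Return (cutoff, J) for the cut-off that maximises Youden's J.

    Ties are resolved in favour of the lowest cut-off (an arbitrary choice).
    """
    best = None
    for cutoff in cutoffs:
        j = diagnostic_metrics(scores, has_mdd, cutoff)["youden_j"]
        if j is not None and (best is None or j > best[1]):
            best = (cutoff, j)
    return best
```

Because sensitivity and specificity move in opposite directions as the cut-off rises, J typically peaks at an intermediate value; PPV and NPV additionally depend on the prevalence of MDD in the sample.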
Furthermore, to assess precision, 95 % confidence intervals (95 % CI) were calculated for the sensitivity, specificity, PPV, NPV and efficiency for each cut-off value. The 95 % CIs were computed using the method suggested by Agresti and Coull because this method also produces accurate 95 % CIs for observed proportions close to 0 or 1 [25, 26]. For cut-off values at the extremes of the PHQ-9, the sample sizes were too small to calculate accurate 95 % CIs for the NPVs and PPVs. Therefore, we only report the 95 % CI of the NPV and PPV if the sample size was ≥15 [25].
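For readers unfamiliar with the Agresti and Coull approach, the sketch below shows its usual formulation: the observed proportion is adjusted by adding z²/2 successes and z² trials before a Wald-type interval is computed. The function name and the clipping of the limits to the range 0 to 1 are assumptions of this example, not details taken from the study.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_interval(successes, n, confidence=0.95):
    """Agresti-Coull confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value, about 1.96 for a 95 % interval.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Assumption: limits are clipped so that they stay within [0, 1].
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)
```

With 8 correct classifications out of 10, this gives an interval of roughly 0.48 to 0.95, which illustrates how wide the intervals become for small subgroups.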
A receiver operating characteristic (ROC) analysis was performed to calculate the area under the curve (AUC) for the PHQ-9. The AUC can be interpreted as the discriminative ability of the test, that is, the probability that a randomly chosen participant with MDD has a higher screening score than a randomly chosen participant without MDD [27].
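The probabilistic reading of the AUC can be made concrete with a small sketch that compares every participant with MDD to every participant without MDD and counts how often the former has the higher score, with ties counted as one half. This pairwise formulation is equivalent to the area under the empirical ROC curve; the function name is hypothetical, and the study itself used SPSS rather than this code.

```python
def auc_from_scores(scores_mdd, scores_no_mdd):
    """AUC as the proportion of (MDD, non-MDD) pairs ranked correctly.

    A pair counts 1 when the MDD participant has the higher score
    and 0.5 when both scores are equal.
    """
    if not scores_mdd or not scores_no_mdd:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case_score in scores_mdd:
        for control_score in scores_no_mdd:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(scores_mdd) * len(scores_no_mdd))
```

An AUC of 0.5 corresponds to a screener that does no better than chance, and an AUC of 1.0 to one that ranks every participant with MDD above every participant without MDD.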
The statistical analyses were performed in SPSS version 22.0 [28].