INTRODUCTION

Depression affects 7–27% of the population1,2,3 but is often under-diagnosed and under-treated. By appropriately identifying and treating patients, primary care physicians can play a crucial role in improving depression care.4, 5 In 2016, the U.S. Preventive Services Task Force (USPSTF) issued a recommendation supporting systematically screening adults for depression.6 The recommendation included the caveat that “adequate systems” should be in place to ensure patients receive appropriate treatment. Yet it is unclear how to effectively integrate depression screening into health care systems since implementation of screening questionnaires alone does not increase identification or treatment.7, 8 In fact, two-thirds of patients screened for depression in primary care did not receive treatment.9

Despite this ambiguity, many value-based contracts have incorporated quality measures for depression screening.10 In response, health systems are implementing systematic processes to screen and treat patients.11 System-wide approaches have been effective in treating other diseases (e.g., hypertension) by using practice-wide disease registries, allied health professional support, and patient education.12 Efforts to integrate depression treatment have been less successful11, 13 due to physicians’ lack of comfort managing depression14 and limited access to psychiatrists and psychologists.15 Even the collaborative care model, which has been shown to improve depression management in research studies,16,17,18 has been challenging to implement in primary care practices.19

Integration of systematic depression screening with clinical decision support within the clinical workflow may alleviate barriers to depression identification and management. This study aims to identify whether integrating systematic depression screening into primary care coupled with electronic clinical decision support is associated with increased depression identification and treatment. We also assessed the test characteristics of the Patient Health Questionnaire (PHQ) when implemented as part of usual care in a large integrated health system.

METHODS

Study Design and Population

This retrospective pre-post study evaluated the impact of integrating depression screening within 37 internal and family medicine clinics in a large health system in Northeast Ohio. Practices began screening in May, June, or July of 2016. We grouped practices by their start dates (subsequently called “practice group”; see Appendix). To enable acclimation to screening, we excluded visits during the month screening began. We included adult patients with at least one primary care visit in 2016. Since depression screening is meant to increase case-finding, we excluded adults with a diagnosis of depression in 2015 or those diagnosed in 2016 prior to their first primary care visit of the year. For patients with a diagnosis of depression on the problem list, we identified treatments within 90 days of the initial diagnosis and excluded all subsequent visits. To allow adequate follow-up time, we included referrals, evaluations, or medication prescriptions that occurred in the first quarter of 2017. This study was approved by Cleveland Clinic’s Institutional Review Board.

Systematic Depression Screening

Before implementation, a project manager instructed physicians on the purpose of depression screening, the workflow, the care plan for patients diagnosed with depression, and the clinical depression smart set. Prior to implementation, physicians could, at their own discretion, screen patients on paper.

Afterwards, all clinics used the Patient Health Questionnaire (PHQ). The PHQ is both valid and reliable,20 is self-administered, and is commonly used in medical settings. Patients were screened using the PHQ-2, either through the patient portal prior to the visit or via a tablet or desktop computer during the visit. PHQ-2 scores ≥ 2 prompted patients to complete the remaining 7 questions included in the PHQ-9. If a patient’s score was ≥ 10 (i.e., at least moderate symptoms), they were administered the PHQ at each subsequent visit (assuming at least 30 days had elapsed since the last PHQ) until the score was < 5. If the initial score was ≤ 9, the PHQ was repeated annually. For simplicity, we refer to the PHQ-2 and PHQ-9 as the PHQ since the first two questions on the PHQ were always asked initially, followed by the remaining 7 questions when applicable.
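The screening cadence described above can be expressed as a small decision rule. The following Python sketch is illustrative only: the function names and the `in_follow_up` flag are hypothetical and do not describe the study's actual EHR build, and "annually" is approximated as 365 days.

```python
def needs_phq9(phq2_score: int) -> bool:
    """PHQ-2 scores of 2 or more prompt the remaining seven PHQ-9 items."""
    return phq2_score >= 2


def update_follow_up(in_follow_up: bool, phq_score: int) -> bool:
    """Track whether a patient is in the repeat-screening pathway.

    A score >= 10 starts repeat screening, which continues until a score < 5.
    """
    if phq_score >= 10:
        return True
    if phq_score < 5:
        return False
    return in_follow_up  # scores of 5-9 leave the current state unchanged


def screen_due(in_follow_up: bool, days_since_last_phq: int) -> bool:
    """Repeat screening requires at least 30 days since the last PHQ;
    all other patients are rescreened annually (assumed here to be 365 days)."""
    min_gap_days = 30 if in_follow_up else 365
    return days_since_last_phq >= min_gap_days
```

Under these assumptions, a patient who scores 12 enters the repeat-screening pathway, remains in it after a score of 7, and leaves it after a score of 4.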

Clinical Decision Support

The Knowledge Program (KP), a tool that collects structured health information, enabled the systematic collection of patient-entered data.21 PHQ responses were immediately available in the electronic health record (EHR). If a score was ≥ 10, the physician was alerted within the encounter through a clinical decision alert. The alert contained an order set that included antidepressant medications, consults to behavioral health providers, and a pre-checked order to include depression and resource information on the patient’s after visit handout.

Data Collection

We used the EHR to identify demographic, clinical, and utilization data. Patient-level covariates included demographic characteristics (i.e., age, sex, race, marital status) and diagnosis codes. Diagnoses were based on International Classification of Diseases, Tenth Revision (ICD-10) code definitions from the Medicare Chronic Conditions Data Warehouse.22

Outcomes

Our primary outcomes were depression diagnosis and treatment. We identified a person as depressed if an ICD-10 diagnosis of depression was associated with their visit. We defined treatment as receipt of an antidepressant prescription, referral to a behavioral health specialist (i.e., a psychologist, psychiatrist, or behavioral health social worker), or evaluation by a behavioral health specialist within 90 days. We used a 90-day window because patients often defer treatment, and it may take months to get an appointment with a behavioral health specialist. To test the robustness of our results to this time frame, we conducted a sensitivity analysis wherein referrals and prescriptions had to occur on the same day as the visit. These results were similar and therefore not reported.
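The treatment window can be illustrated with a short Python sketch. The function name and arguments are hypothetical, and the sketch assumes that only events on or after the diagnosis date count toward the window; setting `window_days=0` corresponds to the same-day sensitivity analysis.

```python
from datetime import date
from typing import Iterable


def treated_within_window(
    diagnosis_date: date,
    treatment_dates: Iterable[date],
    window_days: int = 90,
) -> bool:
    """Return True if any prescription, referral, or evaluation date falls
    between the diagnosis date and window_days afterwards (inclusive)."""
    return any(
        0 <= (event - diagnosis_date).days <= window_days
        for event in treatment_dates
    )


# Hypothetical patient: diagnosed 1 August 2016, referred 30 October 2016.
dx = date(2016, 8, 1)
print(treated_within_window(dx, [date(2016, 10, 30)]))                 # 90 days later
print(treated_within_window(dx, [date(2016, 10, 30)], window_days=0))  # same-day rule
```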

We calculated the test characteristics of the PHQ when it is implemented as part of usual care using a multistep process. Initially, we calculated the sensitivity and specificity of the PHQ using depression diagnosis on the problem list as the gold standard. Next, we conducted a chart review to verify the accuracy of the ICD codes, which can be subject to coding variation and misdiagnosis.23, 24 We reviewed 50 patients who scored zero on the PHQ and had a diagnosis of depression (potential false negatives), classifying them as having controlled or active depression. We also reviewed 50 patients with a score ≥ 10 and no depression diagnosis (potential false positives), classifying them as having depression, other mood disorders (anxiety/pain/fatigue), or no mention of mood disorder. This last category could encompass patients without depression and others where the diagnosis was missed. Finally, we adjusted our original sensitivity and specificity calculations based on the chart review results. We transferred the percentage of patients with controlled depression and a PHQ score < 10 from the “depressed” to “not depressed” category. We transferred the percentage of patients with active depression and a PHQ score ≥ 10 from the “not depressed” to “depressed” category.
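The adjustment described above amounts to moving a fraction of two cells of the 2 × 2 table before recomputing sensitivity and specificity. The Python sketch below shows one way this could be done. The cell counts are hypothetical (the study's counts are in its Appendix and are not reproduced here), the function names are illustrative, and the sketch assumes that the fractions observed in chart review apply to every patient in the corresponding cell.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def adjust_for_chart_review(tp, fn, fp, tn, frac_fn_controlled, frac_fp_active):
    """Reclassify cells using chart review fractions.

    frac_fn_controlled: share of apparent false negatives (diagnosis present,
        PHQ below threshold) whose depression was controlled; these move to
        the "not depressed" row and become true negatives.
    frac_fp_active: share of apparent false positives (no diagnosis, PHQ at or
        above threshold) who had depression on review; these move to the
        "depressed" row and become true positives.
    """
    moved_to_tn = fn * frac_fn_controlled
    moved_to_tp = fp * frac_fp_active
    return (tp + moved_to_tp, fn - moved_to_tn, fp - moved_to_tp, tn + moved_to_tn)


# Hypothetical counts, chosen only to show the direction of the adjustment.
raw = (30, 70, 20, 880)
adjusted = adjust_for_chart_review(*raw, frac_fn_controlled=0.5, frac_fp_active=0.25)
print(sensitivity_specificity(*raw))
print(sensitivity_specificity(*adjusted))
```

With these illustrative inputs, reclassification raises sensitivity because patients with controlled depression and a low score no longer count as missed cases.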

Statistical Analyses

We first assessed the percentage of participants who were screened, diagnosed, and treated for depression. To identify whether the percentage of visits where a patient received a diagnosis was different before versus after implementation of screening, we performed an interrupted time series analysis, using a linear regression model that included splines at the month of transition for each practice group and adjusted for patient covariates and practice group. We used a post-estimation command to determine if the intercept and slope differed significantly after implementation. We reported treatment at the first visit with an associated ICD code for depression on the problem list because these visits should be particularly affected by systematic depression screening. We used multilevel logistic regression to evaluate the effect of screening on odds of depression treatment, controlling for patient age, sex, race, insurance status, and number of chronic conditions and accounting for clustering within practice group.
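The core idea of the interrupted time series analysis, a change in level and a change in slope at the transition month, can be sketched in Python as follows. This is a simplified, unadjusted illustration rather than the model used in the study: it omits patient covariates, practice groups, and significance testing, and the monthly rates are hypothetical. Fitting a separate least-squares line to each segment gives the same point estimates as a single regression with level-change and slope-change terms.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def interrupted_time_series(months, rates, transition):
    """Return (level_change, slope_change) at the transition month.

    Months before `transition` form the pre-period; months at or after it
    form the post-period. The level change is the gap between the two
    fitted lines evaluated at the transition month.
    """
    pre = [(m, r) for m, r in zip(months, rates) if m < transition]
    post = [(m, r) for m, r in zip(months, rates) if m >= transition]
    pre_intercept, pre_slope = fit_line(*zip(*pre))
    post_intercept, post_slope = fit_line(*zip(*post))
    level_change = (post_intercept + post_slope * transition) - (
        pre_intercept + pre_slope * transition
    )
    return level_change, post_slope - pre_slope


# Hypothetical monthly diagnosis rates (%), with a jump at month 5.
months = list(range(10))
rates = [1.0 + 0.1 * m for m in range(5)] + [2.5 + 0.2 * (m - 5) for m in range(5, 10)]
print(interrupted_time_series(months, rates, transition=5))
```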

Secondary Analysis

To identify whether the PHQ score was associated with depression identification or treatment, we compared the odds of diagnosis and treatment for patients with a score right below versus right above the cutoff for clinical decision support (PHQ score ≥ 10). We defined below the cutoff as a score of 8 or 9 and above the cutoff as a score of 10 or 11. As a false-specification test, we also compared patients who scored a 10 or 11 with those who scored a 12 or 13. We used Stata 14.0 to perform the analysis.
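The threshold comparison rests on an odds ratio between two adjacent score bands. The Python sketch below computes an unadjusted odds ratio with a Wald-type 95% confidence interval from a 2 × 2 table. It is a simplification for illustration: the study reported adjusted odds ratios from regression models, and the counts and function name used here are hypothetical.

```python
import math


def odds_ratio_with_ci(events_above, total_above, events_below, total_below, z=1.96):
    """Unadjusted odds ratio (above vs. below the cutoff) with a Wald CI.

    All four cells of the 2 x 2 table must be non-zero.
    """
    a = events_above                 # outcome present, score just above cutoff
    b = total_above - events_above   # outcome absent, score just above cutoff
    c = events_below                 # outcome present, score just below cutoff
    d = total_below - events_below   # outcome absent, score just below cutoff
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 diagnosed with a score of 10-11,
# versus 10 of 100 diagnosed with a score of 8-9.
print(odds_ratio_with_ci(20, 100, 10, 100))
```

The same function could be applied to the comparison of scores of 12 or 13 with scores of 10 or 11 by changing which band is treated as the reference.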

RESULTS

Our study population included 259,411 patients. Patients had an average of 1.5 visits during the study period (SD 0.97); 26% had at least one visit before and after implementation. The median age was 55.1 years (interquartile range [IQR] 40.5–66.8), and the majority were female (56.4%), white (83.7%), married (59.9%), and privately insured (62.0%). Demographic characteristics were similar before and after implementation (Table 1).

Table 1 Characteristics of Patients with a Visit Before Versus After Implementation of Systematic Depression Screening

Depression Screening After Implementation

After implementation, 59% of patients had ≥ 1 screen. On the initial screen, the majority (96%) had a score that indicated no depression symptoms and 3.0% demonstrated moderate to severe symptoms. Three and a half percent of all patients initially screened were screened again in 2016.

Depression Diagnosis by a Physician

In 2016, 6.6% of patients had a visit diagnosis of depression. The adjusted rate of depression diagnosis per month increased by 1.2 percentage points immediately after implementation of systematic screening, from 1.7 to 2.9% (p < 0.01) (Fig. 1).
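Because the change from 1.7 to 2.9% can be read either as an absolute or as a relative increase, a short worked calculation may help. The Python sketch below uses only the two rates reported above; the function name is illustrative.

```python
def change_summary(before_pct, after_pct):
    """Return the absolute change (percentage points) and relative change (%)."""
    absolute_points = after_pct - before_pct
    relative_pct = absolute_points / before_pct * 100
    return absolute_points, relative_pct


absolute_points, relative_pct = change_summary(1.7, 2.9)
print(f"absolute: {absolute_points:.1f} points, relative: {relative_pct:.0f}%")
# prints "absolute: 1.2 points, relative: 71%"
```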

Fig. 1 Adjusted percent of primary care visits with a depression diagnosis by month

Of the 7499 patients with a diagnosis of depression after implementation of screening, 56% received a PHQ on the day of diagnosis. Of those, 42% had a score of zero and the median of the remaining scores was 10 (IQR 1–27).

Treatment

Receipt of ≥ 1 treatment within 90 days of depression diagnosis increased from 64.3 to 68.7% following implementation of screening (p = 0.001) (Table 2). The percentage of patients with depression who received an antidepressant prescription increased by 3.2% and referrals increased by 4.0% post-implementation (p < 0.001 for both). The percentage of patients who had an evaluation was unchanged. The adjusted odds of receiving any treatment within 90 days of diagnosis were 20% higher post-implementation compared with pre-implementation (AOR 1.20, 95% CI 1.12–1.28, p < 0.01).

Table 2 Depression Treatment Before and After Implementation of Systematic Screening

Secondary Analysis

Treatment After Depression Screening

There were 1686 PHQ scores between 8 and 11 (52% were scores of 8 or 9 and 48% were scores of 10 or 11). In the adjusted regression model, the odds of depression identification were 3.5 times higher (AOR 3.54, 95% CI 1.79–7.00) and the odds of treatment were three times higher (AOR 3.03, 95% CI 1.77–5.12) for patients who had a score of 10 or 11 versus 8 or 9. To assess whether this association was due to having more severe symptoms, as opposed to being a direct consequence of the screening program and its clinical decision support, we compared patients with a score of 10 or 11 with patients with a score of 12 or 13. Patients with a 12 or 13 had non-significantly increased odds of depression identification (AOR 1.44, 95% CI 0.93–2.23, p = 0.10) and non-significantly increased odds of treatment (AOR 1.52, 95% CI 0.91–2.56, p = 0.11) compared with patients with a score of 10 or 11.

Test Characteristics of the PHQ

Based on ICD codes only, the sensitivity of a PHQ ≥ 10 to identify patients with depression was 29% and specificity was 98%. In chart review, 58% of patients who scored zero on the PHQ and were diagnosed with depression had controlled depression and 42% had active depression. Of patients with a score ≥ 10 and no depression diagnosis, 16% had depression, 44% had another mood disorder, and 42% had no mention of any mood disorder. After adjusting the PHQ’s test characteristics for the chart review results, the sensitivity was 55% and the specificity was 98% (Appendix).

DISCUSSION

Our study found that implementation of systematic depression screening with electronic clinical decision support resulted in high rates of depression screening: 59% compared with the national average of 4%.25 After implementation, there was a large relative increase in the rate of depression diagnosis, coupled with higher rates of treatment compared with the pre-implementation period. The increase in treatment consisted mainly of referrals to behavioral health and secondarily of prescriptions for antidepressant medications.

As with evaluations of other complex quality improvement interventions,12, 26 it would be impossible to isolate the influence of screening as a stand-alone intervention. Although depression screening undoubtedly identified cases that would otherwise have been missed, most patients diagnosed in the post-implementation period either did not complete the PHQ or scored zero. In those cases, increased awareness of depression among physicians (resulting from introduction of screening into their clinical workflow) and among patients (from being asked about depression symptoms) may have resulted in greater sensitivity to symptoms of depression.

Our findings suggest that physicians were influenced by the clinical decision support. Patients who were screened and scored just above the threshold for clinical decision support had much greater odds of being diagnosed and treated for depression than patients just below the threshold. We did not see a similar increase when we compared patients with a score of 10 or 11 with those with a score of 12 or 13, suggesting that it was the decision support and not the severity of illness that was responsible for additional diagnosis and treatment. This might explain why one pragmatic cluster-randomized trial found no impact of systematic depression screening on depression recognition.27 The study team held training sessions for physicians and sent monthly reminders to screen, but did not incorporate decision support.27 Decision support embedded in the EHR can help physicians interpret unfamiliar data and facilitate prescribing and referrals through linked order sets.

Only 3% of our patients had a PHQ score indicative of depression, which is substantially lower than in prior studies.28, 29 A quality improvement project which implemented depression screening in primary care found that 17% of patients screened positive for depression and 56% of those had clinician-diagnosed depressive disorder, for an overall prevalence of 10%.28 Our depression prevalence was slightly lower (at 7%) but similar to a Medical Expenditure Panel Survey study that found 8% of U.S. adults had depression.9 The low rate of detected cases likely reflects that we excluded patients with a prior depression diagnosis in accordance with the “screening for depression” quality measure.30 Importantly, the rate of detection rose after screening implementation. This increase likely represents new cases that would have otherwise gone undetected. The percentage of patients diagnosed with depression after screening is similar to prior reported incidence rates.31, 32 Thus, other health systems might expect a 1% increase in patients with newly detected depression if they initiate a comparable screening program.
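The overall prevalence quoted from the quality improvement project follows from multiplying the screen-positive rate by the share of screen-positive patients with a clinician-diagnosed depressive disorder. The Python sketch below reproduces that arithmetic using only the two percentages cited above; the function name is illustrative, and the calculation counts only cases found among screen-positive patients.

```python
def diagnosed_prevalence(screen_positive_rate, confirmed_share):
    """Share of all screened patients who screen positive and are
    then diagnosed by a clinician."""
    return screen_positive_rate * confirmed_share


prevalence = diagnosed_prevalence(0.17, 0.56)
print(f"{prevalence:.1%}")  # prints "9.5%", which rounds to the 10% cited
```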

Alternatively, our low detection rate could be due to a change in the test characteristics of the PHQ when implemented as part of usual care across a health system. In the research setting, the PHQ has been validated to have a sensitivity of 88% and a specificity of 88% for major depression.20 A study in primary care practices in New Zealand found that the PHQ-9 had lower sensitivity (74%) and higher specificity (91%) than in clinical trials.29 In our practices, sensitivity was even lower (55%) and specificity higher (98%). This may have been a function of how the test was administered. In some practices, medical assistants administered the PHQ-2 verbally to decide whether the PHQ-9 was required. Patients may have been reluctant to answer truthfully, leading to low sensitivity. This may also have contributed to the test’s high specificity, as patients without depression may have been successfully screened out by the PHQ-2. After conducting a chart review, we adjusted our sensitivity and specificity to account for the inaccuracy of the ICD codes, which do not distinguish between controlled and uncontrolled depression. Even so, 45% of patients with clinician-identified depression would have been missed by the PHQ in this setting.

The USPSTF recommends screening for depression in the adult population based on evidence that screening coupled with treatment improves outcomes.6 Only one study included in their review directly compared depression screening with usual care, and it found no effect of screening on treatment or depression symptoms.6, 33 The other studies evaluated effectiveness of treatment for patients whose depression was identified through screening.6, 34 The Canadian Task Force on Preventive Health Care does not recommend routine depression screening among average-risk adults because they found insufficient evidence that screening without integrated staff support was beneficial and the availability of staff-assisted support is variable.35 Our findings demonstrate that systematic screening with clinical decision support can increase both diagnosis and treatment of depression in primary care. Similarly, the DIAMOND (Depression Improvement Across Minnesota: Offering a New Direction) study, which tested a state-wide initiative to implement depression screening along with the collaborative care model in primary care, found that screening increased treatment intensification. Despite this, they found no difference in depression outcomes.36, 37 Thus, whether screening improves outcomes remains uncertain, but given that treatment of screen-detected patients is associated with improved outcomes, our findings are encouraging.

This study has several limitations. First, there was likely variability in administration of the PHQ. However, this study of over 250,000 patients provides insights into what other large health systems might expect after implementing systematic depression screening. Second, the percentage of visits with a PHQ screen may have been under-estimated since we captured only screening in primary care, and screening was also deployed during specialty visits. Third, we used receiving a diagnosis of depression on the day of the PHQ screen as the gold standard to calculate sensitivity and specificity. Prior studies have found that physicians misdiagnose depression in primary care, which may have impacted our results.23, 38 Fourth, we may have under-estimated treatment since we only included treatment that occurred within 90 days of a visit. Further, we could only identify behavioral health visits within our health system; visits to a behavioral health provider outside of our system could not be included. Thus, the percentage of patients who saw a behavioral health provider is likely higher than reported. Importantly, we used a consistent measure of treatment before and after implementation of systematic screening, so the increase in treatment is likely real.

CONCLUSIONS

Implementing systematic depression screening in primary care within a large health care system led to high rates of screening and higher rates of depression diagnosis and treatment. Health systems implementing depression initiatives as part of value-based care contracts can anticipate that depression screening in concert with clinical decision support should improve treatment rates, particularly referrals to behavioral health care.